Add Range Join Optimization #68

GeekSheikh · 2021-03-03T15:36:06Z

At present both sides of this join are relatively small, so adding a range join hint here may not have much effect, but it's worth exploring when/if there's time.

overwatch/src/main/scala/com/databricks/labs/overwatch/pipeline/GoldTransforms.scala

Lines 373 to 379 in b8bd69c

    
    val jobRunIntermediateStates = newTerminatedJobRuns.alias("jr")
      .join(clusterPotentialIntermediateStates.alias("cpot"),
        $"jr.organization_id" === $"cpot.organization_id" &&
          $"jr.cluster_id" === $"cpot.cluster_id" &&
          $"cpot.unixTimeMS_state_start" > $"jr.job_runtime.startEpochMS" && // only states beginning after job start and ending before
          $"cpot.unixTimeMS_state_end" < $"jr.job_runtime.endEpochMS"
      )
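
For reference, a minimal sketch of what the hinted join might look like. It relies on the Databricks Runtime `range_join` hint applied through `Dataset.hint`; on open-source Spark an unrecognized hint is dropped with a warning and the join runs unchanged. The object name, the method name, and the one-hour bin size are hypothetical placeholders, not values taken from Overwatch, and the bin size would need tuning against the real distribution of state durations. Databricks documents the optimization for point-in-interval and interval-overlap conditions, so whether the containment condition below is picked up as written is untested and would need to be confirmed from the physical plan.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

object RangeJoinHintSketch {

  // Hypothetical starting bin size: one hour, in the same unit as the
  // epoch-millisecond columns being compared. This is a tuning parameter.
  val defaultBinSizeMs: Long = 60L * 60L * 1000L

  /** Same join as GoldTransforms.scala lines 373 to 379, with a range join
    * hint added on the cluster-state side. Predicates are unchanged.
    */
  def joinWithRangeHint(
      newTerminatedJobRuns: DataFrame,
      clusterPotentialIntermediateStates: DataFrame,
      binSizeMs: Long = defaultBinSizeMs
  ): DataFrame =
    newTerminatedJobRuns.alias("jr")
      .join(
        // Assumption: the hint is only honored on Databricks Runtime.
        clusterPotentialIntermediateStates.alias("cpot").hint("range_join", binSizeMs),
        col("jr.organization_id") === col("cpot.organization_id") &&
          col("jr.cluster_id") === col("cpot.cluster_id") &&
          // keep only states that start after the job starts ...
          col("cpot.unixTimeMS_state_start") > col("jr.job_runtime.startEpochMS") &&
          // ... and end before the job ends
          col("cpot.unixTimeMS_state_end") < col("jr.job_runtime.endEpochMS")
      )
}
```

Comparing `explain()` output with and without the hint is probably the quickest way to see whether the optimizer actually switches to a range join for these inputs.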

GeekSheikh added the optimization Technical Spark Optimization label Mar 3, 2021

gueniai added this to the backlog milestone Apr 24, 2023

gueniai removed this from the backlog milestone Sep 26, 2024
