Skip to content

Latest commit

 

History

History
 
 

join-library

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
 
 
 
 
 
 
 
 

Join-library

Join-library provides inner join, outer left and right join functions to Google Cloud Dataflow. The aim is to simplify the most common cases of join to a simple function call.

The functions are generic so it supports join of any types supported by Dataflow. Input to the join functions are PCollections of Key/Values. Both the left and right PCollections need the same type for the key. All the join functions return a Key/Value where Key is the join key and value is a Key/Value where the key is the left value and right is the value.

In the cases of outer join, since null cannot be serialized the user have to provide a value that represent null for that particular use case.

Example how to use join-library:

PCollection<KV<String, String>> leftPcollection = ...
PCollection<KV<String, Long>> rightPcollection = ...

PCollection<KV<String, KV<String, Long>>> joinedPcollection =
  Join.innerJoin(leftPcollection, rightPcollection);

Join-library can be found on maven-central:

<dependency>
  <groupId>org.linuxalert.dataflow</groupId>
  <artifactId>google-cloud-dataflow-java-contrib-joinlibrary</artifactId>
  <version>0.0.3</version>
</dependency>

Questions or comments: M.Runesson [at] gmail [dot] com