[streaming] ConnectedDataStream API and operator cleanup + modified windowReduceGroup functionality
gyfora authored and mbalassi committed Oct 1, 2014
1 parent 6492af0 commit 127470b
Showing 21 changed files with 427 additions and 1,752 deletions.
23 changes: 22 additions & 1 deletion docs/streaming_guide.md
@@ -254,7 +254,7 @@ With `minBy` and `maxBy` the output of the operator is the element with the curr

### Window/Batch operators

-Window and batch operators allow the user to execute function on slices or windows of the DataStream in a sliding fashion. If the stepsize for the slide is not defined then the window/batchsize is used as stepsize by default.
+Window and batch operators allow the user to execute function on slices or windows of the DataStream in a sliding fashion. If the stepsize for the slide is not defined then the window/batchsize is used as stepsize by default. The user can also use user defined timestamps for calculating time windows.

When applied to grouped data streams the data stream is batched/windowed for different key values separately.
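
The default-stepsize rule described in this hunk can be illustrated with a small, library-independent sketch. It only computes window bounds; the class name `SlideDefaultSketch`, the method `windows`, the `horizon` parameter, and the choice to start the first window at time 0 are assumptions made for illustration and are not part of the streaming API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the streaming API.
final class SlideDefaultSketch {

    /**
     * Returns the [start, end) bounds of every window that starts before
     * `horizon`. A null slideInterval means "not defined".
     */
    static List<long[]> windows(long windowSize, Long slideInterval, long horizon) {
        // When no stepsize is given, fall back to the window size itself,
        // which yields non-overlapping (tumbling) windows.
        long slide = (slideInterval == null) ? windowSize : slideInterval;
        if (windowSize <= 0 || slide <= 0) {
            throw new IllegalArgumentException("window size and slide must be positive");
        }
        List<long[]> result = new ArrayList<>();
        // Assumption: the first window starts at time 0.
        for (long start = 0; start < horizon; start += slide) {
            result.add(new long[] {start, start + windowSize});
        }
        return result;
    }

    public static void main(String[] args) {
        // Window of 10000 with a slide of 5000: overlapping windows.
        for (long[] w : windows(10000, 5000L, 20000)) {
            System.out.println("[" + w[0] + ", " + w[1] + ")");
        }
    }
}
```

With an explicit slide of 5000 the windows overlap by half; passing `null` for the slide makes each window start where the previous one ends.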

@@ -326,6 +326,27 @@ dataStream1.connect(dataStream2)
})
~~~

#### windowReduceGroup on ConnectedDataStream
The windowReduceGroup operator applies a user-defined `CoGroupFunction` to time-aligned windows of the two data streams and returns zero or more elements of an arbitrary type. The user can define the window and slide intervals and can also implement custom timestamps to be used for calculating windows.

~~~java
DataStream<Integer> dataStream1 = ...
DataStream<String> dataStream2 = ...

dataStream1.connect(dataStream2)
    .windowReduceGroup(new CoGroupFunction<Integer, String, String>() {

        @Override
        public void coGroup(Iterable<Integer> first, Iterable<String> second,
                Collector<String> out) throws Exception {

            // Do something here

        }
    }, 10000, 5000);
~~~
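
To make "time-aligned windows" more concrete, here is a minimal plain-Java simulation that does not depend on the streaming library. It is a sketch under stated assumptions, not the operator's implementation: `Timed`, `CoGroup`, `inWindow`, `apply`, and the `horizon` bound are hypothetical names, windows are assumed to start at time 0, and the two numeric arguments in the snippet above are assumed to be the window size followed by the slide interval.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation; none of these types come from the streaming API.
final class WindowCoGroupSketch {

    /** An element paired with the timestamp used for windowing. */
    static final class Timed<T> {
        final long timestamp;
        final T value;
        Timed(long timestamp, T value) { this.timestamp = timestamp; this.value = value; }
    }

    /** Stand-in for the user function; a List plays the role of the collector. */
    interface CoGroup<A, B, R> {
        void coGroup(Iterable<A> first, Iterable<B> second, List<R> out);
    }

    /** Selects the values whose timestamp falls in [start, end). */
    static <T> List<T> inWindow(List<Timed<T>> input, long start, long end) {
        List<T> result = new ArrayList<>();
        for (Timed<T> e : input) {
            if (e.timestamp >= start && e.timestamp < end) {
                result.add(e.value);
            }
        }
        return result;
    }

    /** Calls fn once per window, passing both inputs cut to the same interval. */
    static <A, B, R> List<R> apply(List<Timed<A>> first, List<Timed<B>> second,
            long windowSize, long slide, long horizon, CoGroup<A, B, R> fn) {
        List<R> out = new ArrayList<>();
        for (long start = 0; start < horizon; start += slide) {
            fn.coGroup(inWindow(first, start, start + windowSize),
                    inWindow(second, start, start + windowSize), out);
        }
        return out;
    }

    static List<String> demo() {
        List<Timed<Integer>> ints = new ArrayList<>();
        ints.add(new Timed<>(1000, 1));
        ints.add(new Timed<>(6000, 2));
        ints.add(new Timed<>(12000, 3));
        List<Timed<String>> strs = new ArrayList<>();
        strs.add(new Timed<>(2000, "a"));
        strs.add(new Timed<>(11000, "b"));

        // Emits "<sum of ints>:<concatenated strings>" for each window.
        CoGroup<Integer, String, String> fn = (first, second, out) -> {
            int sum = 0;
            for (int i : first) sum += i;
            StringBuilder joined = new StringBuilder();
            for (String s : second) joined.append(s);
            out.add(sum + ":" + joined);
        };
        return apply(ints, strs, 10000, 5000, 15000, fn);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [3:a, 5:b, 3:b]
    }
}
```

Each call to the user function sees the elements of both inputs that fall into the same time interval, so an element can appear in more than one call when the slide is smaller than the window.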


#### Reduce on ConnectedDataStream
The Reduce operator for the `ConnectedDataStream` applies a simple reduce transformation on the joined data streams and then maps the reduced elements to a common output type.


Large diffs are not rendered by default.

@@ -260,7 +260,7 @@ private void validateMerge(String id) {
* @return The {@link ConnectedDataStream}.
*/
     public <R> ConnectedDataStream<OUT, R> connect(DataStream<R> dataStream) {
-        return new ConnectedDataStream<OUT, R>(environment, jobGraphBuilder, this, dataStream);
+        return new ConnectedDataStream<OUT, R>(this, dataStream);
     }

/**

This file was deleted.

This file was deleted.

@@ -28,14 +28,14 @@
 public class GroupedWindowGroupReduceInvokable<IN, OUT> extends WindowGroupReduceInvokable<IN, OUT> {

     private static final long serialVersionUID = 1L;

     int keyPosition;
     Map<Object, StreamWindow> streamWindows;
     List<Object> cleanList;
     long currentMiniBatchCount = 0;

-    public GroupedWindowGroupReduceInvokable(GroupReduceFunction<IN, OUT> reduceFunction, long windowSize,
-            long slideInterval, int keyPosition, TimeStamp<IN> timestamp) {
+    public GroupedWindowGroupReduceInvokable(GroupReduceFunction<IN, OUT> reduceFunction,
+            long windowSize, long slideInterval, int keyPosition, TimeStamp<IN> timestamp) {
         super(reduceFunction, windowSize, slideInterval, timestamp);
         this.keyPosition = keyPosition;
         this.reducer = reduceFunction;
@@ -48,7 +48,6 @@ protected StreamBatch getBatch(StreamRecord<IN> next) {
         StreamWindow window = streamWindows.get(key);
         if (window == null) {
             window = new GroupedStreamWindow();
-            window.minibatchCounter = currentMiniBatchCount;
             streamWindows.put(key, window);
         }
         this.window = window;
@@ -62,26 +61,25 @@ protected void reduceLastBatch() {
         }
     }

-    private void shiftGranularityAllWindows(){
+    private void shiftGranularityAllWindows() {
         for (StreamBatch window : streamWindows.values()) {
             window.circularList.newSlide();
-            window.minibatchCounter+=1;
         }
     }

-    private void slideAllWindows(){
+    private void slideAllWindows() {
+        currentMiniBatchCount -= batchPerSlide;
         for (StreamBatch window : streamWindows.values()) {
             window.circularList.shiftWindow(batchPerSlide);
         }
-    }
+    }

     private void reduceAllWindows() {
         for (StreamBatch window : streamWindows.values()) {
-            window.minibatchCounter -= batchPerSlide;
             window.reduceBatch();
         }
     }

     protected class GroupedStreamWindow extends StreamWindow {

         private static final long serialVersionUID = 1L;
@@ -90,28 +88,35 @@ public GroupedStreamWindow() {
             super();
         }

+        @Override
+        public void addToBuffer(IN nextValue) throws Exception {
+            checkWindowEnd(timestamp.getTimestamp(nextValue));
+            if (currentMiniBatchCount >= 0) {
+                circularList.add(nextValue);
+            }
+        }

         @Override
         protected synchronized void checkWindowEnd(long timeStamp) {
             nextRecordTime = timeStamp;

             while (miniBatchEnd()) {
                 shiftGranularityAllWindows();
+                currentMiniBatchCount += 1;
                 if (batchEnd()) {
                     reduceAllWindows();
                     slideAllWindows();
                 }
             }
-            currentMiniBatchCount = this.minibatchCounter;
         }

         @Override
         public boolean batchEnd() {
-            if (minibatchCounter == numberOfBatches) {
+            if (currentMiniBatchCount == numberOfBatches) {
                 return true;
             }
             return false;
         }


     }
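
The counter logic in this file (increment `currentMiniBatchCount` at each mini-batch end, reduce when it reaches `numberOfBatches`, then subtract `batchPerSlide`) can be traced with a standalone sketch. This is not the invokable itself. The class `MiniBatchCounterSketch` and the method `fireTicks` are hypothetical, and the sketch assumes that the mini-batch length is the greatest common divisor of the window size and the slide interval, which this diff does not show.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical trace of the shared mini-batch counter; not the actual invokable.
final class MiniBatchCounterSketch {

    static long gcd(long a, long b) {
        return b == 0 ? a : gcd(b, a % b);
    }

    /** Returns the 1-based mini-batch ticks at which the windows would be reduced. */
    static List<Integer> fireTicks(long windowSize, long slideInterval, int ticks) {
        if (windowSize <= 0 || slideInterval <= 0) {
            throw new IllegalArgumentException("window size and slide must be positive");
        }
        // Assumption: windows are cut into mini-batches of gcd(window, slide) length.
        long granularity = gcd(windowSize, slideInterval);
        long numberOfBatches = windowSize / granularity;
        long batchPerSlide = slideInterval / granularity;

        long currentMiniBatchCount = 0;
        List<Integer> fired = new ArrayList<>();
        for (int tick = 1; tick <= ticks; tick++) {
            currentMiniBatchCount += 1;                  // as in checkWindowEnd
            if (currentMiniBatchCount == numberOfBatches) { // as in batchEnd
                fired.add(tick);                         // reduceAllWindows would run here
                currentMiniBatchCount -= batchPerSlide;  // as in slideAllWindows
            }
        }
        return fired;
    }

    public static void main(String[] args) {
        System.out.println(fireTicks(10, 4, 9));  // prints [5, 7, 9]
        System.out.println(fireTicks(4, 10, 12)); // prints [2, 7, 12]
    }
}
```

When the slide is larger than the window, as in the second call, the counter drops below zero after each reduce. Under the same assumption, that is the situation in which the `currentMiniBatchCount >= 0` check in `addToBuffer` skips elements that fall in the gap between two windows.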


This file was deleted.
