[FLINK-8764] [docs] Adjust quickstart documentation
StephanEwen committed Feb 26, 2018
1 parent c6f8406 commit 647c552
Showing 2 changed files with 52 additions and 162 deletions.
119 changes: 25 additions & 94 deletions docs/quickstart/java_api_quickstart.md
@@ -1,6 +1,6 @@
---
title: "Sample Project using the Java API"
nav-title: Sample Project in Java
title: "Project Template for Java"
nav-title: Project Template for Java
nav-parent_id: start
nav-pos: 0
---
@@ -86,120 +86,51 @@ quickstart/
│   └── myorg
│   └── quickstart
│   ├── BatchJob.java
│   ├── SocketTextStreamWordCount.java
│   ├── StreamingJob.java
│   └── WordCount.java
│   └── StreamingJob.java
└── resources
└── log4j.properties
{% endhighlight %}

The sample project is a __Maven project__, which contains four classes. _StreamingJob_ and _BatchJob_ are basic skeleton programs, _SocketTextStreamWordCount_ is a working streaming example and _WordCountJob_ is a working batch example. Please note that the _main_ method of all classes allow you to start Flink in a development/testing mode.
The sample project is a __Maven project__ that contains two classes: _StreamingJob_ and _BatchJob_, the basic skeleton programs for a *DataStream* and a *DataSet* program, respectively.
The _main_ method is the entry point of the program, both for in-IDE testing/execution and for proper deployments.

We recommend you __import this project into your IDE__ to develop and
test it. If you use Eclipse, the [m2e plugin](http://www.eclipse.org/m2e/)
test it. IntelliJ IDEA supports Maven projects out of the box.
If you use Eclipse, the [m2e plugin](http://www.eclipse.org/m2e/)
allows you to [import Maven projects](http://books.sonatype.com/m2eclipse-book/reference/creating-sect-importing-projects.html#fig-creating-import).
Some Eclipse bundles include that plugin by default, others require you
to install it manually. The IntelliJ IDE supports Maven projects out of
the box.
to install it manually.


*A note to Mac OS X users*: The default JVM heapsize for Java is too
*A note to Mac OS X users*: The default JVM heap size for Java may be too
small for Flink. You have to manually increase it. In Eclipse, choose
`Run Configurations -> Arguments` and write into the `VM Arguments`
box: `-Xmx800m`.
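
To confirm that the `-Xmx` setting took effect, you can print the maximum heap the JVM reports. The following is a minimal sketch using only the Java standard library; the class name `HeapCheck` is made up for illustration and is not part of the quickstart project.

```java
// Hypothetical helper, not part of the quickstart project.
class HeapCheck {

    // Returns the maximum heap size the JVM will attempt to use, in MiB.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // With -Xmx800m the reported value should be close to 800,
        // though the exact number depends on the JVM.
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

Run it with the same VM arguments as your job; if the reported value does not change, the run configuration is probably not the one being used.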

## Build Project

If you want to __build your project__, go to your project directory and
issue the `mvn clean install -Pbuild-jar` command. You will
__find a jar__ that runs on every Flink cluster with a compatible
version, __target/original-your-artifact-id-your-version.jar__. There
is also a fat-jar in __target/your-artifact-id-your-version.jar__ which,
additionally, contains all dependencies that were added to the Maven
project.
If you want to __build/package your project__, go to your project directory and
run the `mvn clean package` command.
You will __find a JAR file__ that contains your application, plus any connectors and libraries
that you may have added as dependencies to the application: `target/<artifact-id>-<version>.jar`.

__Note:__ If you use a different class than *StreamingJob* as the application's main class / entry point,
we recommend you change the `mainClass` setting in the `pom.xml` file accordingly. That way, Flink
can run the application from the JAR file without additionally specifying the main class.
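
To illustrate why this works: the entry point is recorded in the JAR's manifest, where tools can look it up. The sketch below uses only `java.util.jar` and assumes that the `mainClass` setting ends up as the `Main-Class` manifest attribute. The class name `ManifestMainClass` is made up for illustration, and the package `org.myorg.quickstart` is an assumed name for the generated project.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

// Hypothetical demo class, not part of the quickstart project.
class ManifestMainClass {

    // Writes a JAR that contains only a manifest declaring the given entry point.
    static void writeJar(Path jar, String mainClass) throws IOException {
        Manifest manifest = new Manifest();
        Attributes attrs = manifest.getMainAttributes();
        // Manifest-Version must be set, otherwise the attributes are not written.
        attrs.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        attrs.put(Attributes.Name.MAIN_CLASS, mainClass);
        try (OutputStream out = Files.newOutputStream(jar);
             JarOutputStream jos = new JarOutputStream(out, manifest)) {
            // No further entries are needed for this demo.
        }
    }

    // Reads the Main-Class attribute back, or returns null if there is none.
    static String readMainClass(Path jar) throws IOException {
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            Manifest manifest = jarFile.getManifest();
            return manifest == null
                    ? null
                    : manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        }
    }

    public static void main(String[] args) throws IOException {
        Path jar = Files.createTempFile("quickstart-demo", ".jar");
        try {
            // Assumed class name; use your own entry point here.
            writeJar(jar, "org.myorg.quickstart.BatchJob");
            System.out.println(readMainClass(jar));
        } finally {
            Files.deleteIfExists(jar);
        }
    }
}
```

Pointing `readMainClass` at the JAR under `target/` is a quick way to check which entry point a build actually recorded.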

## Next Steps

Write your application!

The quickstart project contains a `WordCount` implementation, the
"Hello World" of Big Data processing systems. The goal of `WordCount`
is to determine the frequencies of words in a text, e.g., how often do
the terms "the" or "house" occur in all Wikipedia texts.

__Sample Input__:

~~~bash
big data is big
~~~

__Sample Output__:

~~~bash
big 2
data 1
is 1
~~~

The following code shows the `WordCount` implementation from the
Quickstart which processes some text lines with two operators (a FlatMap
and a Reduce operation via aggregating a sum), and prints the resulting
words and counts to std-out.

~~~java
public class WordCount {

public static void main(String[] args) throws Exception {

// set up the execution environment
final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

// get input data
DataSet<String> text = env.fromElements(
"To be, or not to be,--that is the question:--",
"Whether 'tis nobler in the mind to suffer",
"The slings and arrows of outrageous fortune",
"Or to take arms against a sea of troubles,"
);

DataSet<Tuple2<String, Integer>> counts =
// split up the lines in pairs (2-tuples) containing: (word,1)
text.flatMap(new LineSplitter())
// group by the tuple field "0" and sum up tuple field "1"
.groupBy(0)
.sum(1);

// execute and print result
counts.print();
}
}
~~~

The operations are defined by specialized classes, here the LineSplitter class.

~~~java
public static final class LineSplitter implements FlatMapFunction<String, Tuple2<String, Integer>> {

@Override
public void flatMap(String value, Collector<Tuple2<String, Integer>> out) {
// normalize and split the line
String[] tokens = value.toLowerCase().split("\\W+");

// emit the pairs
for (String token : tokens) {
if (token.length() > 0) {
out.collect(new Tuple2<String, Integer>(token, 1));
}
}
}
}
~~~

{% gh_link /flink-examples/flink-examples-batch/src/main/java/org/apache/flink/examples/java/wordcount/WordCount.java "Check GitHub" %} for the full example code.

For a complete overview over our API, have a look at the
If you are writing a streaming application and are looking for inspiration for what to write,
take a look at the [Stream Processing Application Tutorial]({{ site.baseurl }}/quickstart/run_example_quickstart.html#writing-a-flink-program).

If you are writing a batch processing application and are looking for inspiration for what to write,
take a look at the [Batch Application Examples]({{ site.baseurl }}/dev/batch/examples.html).

For a complete overview of the APIs, have a look at the
[DataStream API]({{ site.baseurl }}/dev/datastream_api.html) and
[DataSet API]({{ site.baseurl }}/dev/batch/index.html) sections.

If you have any trouble, ask on our
[Mailing List](http://mail-archives.apache.org/mod_mbox/flink-user/).
We are happy to provide help.
95 changes: 27 additions & 68 deletions docs/quickstart/scala_api_quickstart.md
@@ -1,6 +1,6 @@
---
title: "Sample Project using the Scala API"
nav-title: Sample Project in Scala
title: "Project Template for Scala"
nav-title: Project Template for Scala
nav-parent_id: start
nav-pos: 1
---
@@ -173,14 +173,18 @@ quickstart/
└── myorg
└── quickstart
├── BatchJob.scala
├── SocketTextStreamWordCount.scala
├── StreamingJob.scala
└── WordCount.scala
└── StreamingJob.scala
{% endhighlight %}

The sample project is a __Maven project__, which contains four classes. _StreamingJob_ and _BatchJob_ are basic skeleton programs, _SocketTextStreamWordCount_ is a working streaming example and _WordCountJob_ is a working batch example. Please note that the _main_ method of all classes allow you to start Flink in a development/testing mode.
The sample project is a __Maven project__ that contains two classes: _StreamingJob_ and _BatchJob_, the basic skeleton programs for a *DataStream* and a *DataSet* program, respectively.
The _main_ method is the entry point of the program, both for in-IDE testing/execution and for proper deployments.

We recommend you __import this project into your IDE__. For Eclipse, you need the following plugins, which you can install from the provided Eclipse Update Sites:
We recommend you __import this project into your IDE__.

IntelliJ IDEA supports Maven out of the box and offers a plugin for Scala development.
In our experience, IntelliJ works best for developing Flink applications.

For Eclipse, you need the following plugins, which you can install from the provided Eclipse Update Sites:

* _Eclipse 4.x_
* [Scala IDE](http://download.scala-ide.org/sdk/lithium/e44/scala211/stable/site)
@@ -191,78 +195,33 @@ We recommend you __import this project into your IDE__. For Eclipse, you need th
* [m2eclipse-scala](http://alchim31.free.fr/m2e-scala/update-site)
* [Build Helper Maven Plugin](https://repository.sonatype.org/content/repositories/forge-sites/m2e-extras/0.14.0/N/0.14.0.201109282148/)

The IntelliJ IDE supports Maven out of the box and offers a plugin for
Scala development.
### Build Project

If you want to __build/package your project__, go to your project directory and
run the `mvn clean package` command.
You will __find a JAR file__ that contains your application, plus any connectors and libraries
that you may have added as dependencies to the application: `target/<artifact-id>-<version>.jar`.

### Build Project
__Note:__ If you use a different class than *StreamingJob* as the application's main class / entry point,
we recommend you change the `mainClass` setting in the `pom.xml` file accordingly. That way, Flink
can run the application from the JAR file without additionally specifying the main class.

If you want to __build your project__, go to your project directory and
issue the `mvn clean package -Pbuild-jar` command. You will
__find a jar__ that runs on every Flink cluster with a compatible
version, __target/original-your-artifact-id-your-version.jar__. There
is also a fat-jar in __target/your-artifact-id-your-version.jar__ which,
additionally, contains all dependencies that were added to the Maven
project.

## Next Steps

Write your application!

The quickstart project contains a `WordCount` implementation, the
"Hello World" of Big Data processing systems. The goal of `WordCount`
is to determine the frequencies of words in a text, e.g., how often do
the terms "the" or "house" occur in all Wikipedia texts.

__Sample Input__:

~~~bash
big data is big
~~~
If you are writing a streaming application and are looking for inspiration for what to write,
take a look at the [Stream Processing Application Tutorial]({{ site.baseurl }}/quickstart/run_example_quickstart.html#writing-a-flink-program).

__Sample Output__:

~~~bash
big 2
data 1
is 1
~~~

The following code shows the `WordCount` implementation from the
Quickstart which processes some text lines with two operators (a FlatMap
and a Reduce operation via aggregating a sum), and prints the resulting
words and counts to std-out.

~~~scala
object WordCountJob {
def main(args: Array[String]) {

// set up the execution environment
val env = ExecutionEnvironment.getExecutionEnvironment

// get input data
val text = env.fromElements("To be, or not to be,--that is the question:--",
"Whether 'tis nobler in the mind to suffer", "The slings and arrows of outrageous fortune",
"Or to take arms against a sea of troubles,")

val counts = text.flatMap { _.toLowerCase.split("\\W+") }
.map { (_, 1) }
.groupBy(0)
.sum(1)

// emit result and print result
counts.print()
}
}
~~~
If you are writing a batch processing application and are looking for inspiration for what to write,
take a look at the [Batch Application Examples]({{ site.baseurl }}/dev/batch/examples.html).

{% gh_link flink-examples/flink-examples-batch/src/main/scala/org/apache/flink/examples/scala/wordcount/WordCount.scala "Check GitHub" %} for the full example code.
For a complete overview of the APIs, have a look at the
[DataStream API]({{ site.baseurl }}/dev/datastream_api.html) and
[DataSet API]({{ site.baseurl }}/dev/batch/index.html) sections.

For a complete overview over our API, have a look at the
[DataStream API]({{ site.baseurl }}/dev/datastream_api.html),
[DataSet API]({{ site.baseurl }}/dev/batch/index.html), and
[Scala API Extensions]({{ site.baseurl }}/dev/scala_api_extensions.html)
sections. If you have any trouble, ask on our
If you have any trouble, ask on our
[Mailing List](http://mail-archives.apache.org/mod_mbox/flink-user/).
We are happy to provide help.
