---
title: "Quickstart: Setup"
---
Get Flink up and running in a few simple steps.
Flink runs on all UNIX-like environments: Linux, Mac OS X, and Cygwin. The only requirement is a working Java 6.x (or higher) installation.
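To confirm that a Java runtime is available before downloading anything, a quick check like the following may help (the exact version string printed depends on your installed JDK):

```shell
# Check whether a Java runtime is available on the PATH.
# The version string printed depends on the installed JDK.
if command -v java >/dev/null 2>&1; then
    echo "java found: $(java -version 2>&1 | head -n 1)"
else
    echo "java not found on PATH"
fi
```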
Download the ready-to-run binary package. Choose the Flink distribution that matches your Hadoop version. If you are unsure which version to choose, or you just want to run Flink locally, pick the package for Hadoop 1.2.
You are almost done.
- Go to the download directory.
- Unpack the downloaded archive.
- Start Flink.
```bash
$ cd ~/Downloads         # Go to download directory
$ tar xzf flink-*.tgz    # Unpack the downloaded archive
$ cd flink
$ bin/start-local.sh     # Start Flink
```
Check the JobManager's web frontend at http://localhost:8081 and make sure everything is up and running.
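As a hypothetical command-line alternative to opening a browser, you could probe the frontend with `curl` (port 8081 is the default mentioned above; adjust it if you changed the configuration):

```shell
# Probe the JobManager web frontend; prints a message either way.
if curl -s --max-time 5 http://localhost:8081 >/dev/null; then
    echo "JobManager web frontend is reachable"
else
    echo "JobManager web frontend is not reachable"
fi
```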
Run the Word Count example to see Flink at work.
- Download test data:
  ```bash
  $ wget -O hamlet.txt http://www.gutenberg.org/cache/epub/1787/pg1787.txt
  ```
- You now have a text file called hamlet.txt in your working directory.
- Start the example program:
  ```bash
  $ bin/flink run \
      --jarfile ./examples/flink-java-examples-{{site.FLINK_VERSION_STABLE}}-WordCount.jar \
      --arguments file://`pwd`/hamlet.txt file://`pwd`/wordcount-result.txt
  ```
- You will find a file called wordcount-result.txt in your current directory.
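To get a feel for what the job computes, the same tokenize-and-count logic can be sketched with plain shell tools. This is only an illustration of the result shape, not how Flink executes the job:

```shell
# Lowercase the input, split it into one word per line, and count
# occurrences -- the same shape of result WordCount produces.
printf 'To be or not to be\n' \
  | tr '[:upper:]' '[:lower:]' \
  | tr -s '[:space:]' '\n' \
  | sort \
  | uniq -c
# Produces counts such as: 2 be, 1 not, 1 or, 2 to
```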
Running Flink on a cluster is as easy as running it locally. Having passwordless SSH and the same directory structure on all your cluster nodes lets you use our scripts to control everything.
- Copy the unpacked flink directory from the downloaded archive to the same file system path on each node of your setup.
- Choose a master node (JobManager) and set the `jobmanager.rpc.address` key in `conf/flink-conf.yaml` to its IP address or hostname. Make sure that all nodes in your cluster have the same `jobmanager.rpc.address` configured.
- Add the IP addresses or hostnames (one per line) of all worker nodes (TaskManagers) to the slaves file in `conf/slaves`.

You can now start the cluster at your master node with `bin/start-cluster.sh`.
The following example illustrates the setup with three nodes (with IP addresses from 10.0.0.1 to 10.0.0.3 and hostnames master, worker1, worker2) and shows the contents of the configuration files, which need to be accessible at the same path on all machines:
`/path/to/flink/conf/flink-conf.yaml`:

```yaml
jobmanager.rpc.address: 10.0.0.1
```

`/path/to/flink/conf/slaves`:

```
10.0.0.2
10.0.0.3
```
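Conceptually, the cluster start script walks the slaves file and reaches each worker over SSH. A simplified, hypothetical sketch of that loop (using `echo` instead of a real SSH call, and a throwaway slaves file) might look like:

```shell
# Write a throwaway slaves file with the two example workers.
cat > /tmp/slaves.example <<'EOF'
10.0.0.2
10.0.0.3
EOF

# Iterate over the workers, one hostname or IP address per line.
while read -r worker; do
    # The real script would run something like:
    #   ssh "$worker" "/path/to/flink/bin/taskmanager.sh start"
    echo "would start a TaskManager on $worker"
done < /tmp/slaves.example
```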
You can easily deploy Flink on your existing YARN cluster.
- Download the Flink YARN package with the YARN client: Flink for YARN
- Make sure your `HADOOP_HOME` (or `YARN_CONF_DIR` or `HADOOP_CONF_DIR`) environment variable is set to read your YARN and HDFS configuration.
- Run the YARN client with `./bin/yarn-session.sh`. You can run the client with the options `-n 10 -tm 8192` to allocate 10 TaskManagers with 8 GB of memory each.
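Since the client needs to locate your Hadoop configuration, a small hypothetical pre-flight check for those environment variables could look like this (the `/etc/hadoop/conf` path is only an example; adjust it to your cluster):

```shell
# Example only: point HADOOP_CONF_DIR at your cluster's configuration.
export HADOOP_CONF_DIR=/etc/hadoop/conf

# Hypothetical pre-flight check before running bin/yarn-session.sh.
if [ -n "${HADOOP_HOME:-}${YARN_CONF_DIR:-}${HADOOP_CONF_DIR:-}" ]; then
    echo "Hadoop configuration variables are set"
else
    echo "Set HADOOP_HOME, YARN_CONF_DIR, or HADOOP_CONF_DIR first"
fi
```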
For more detailed instructions, check out the programming guides and examples.