forked from apache/beam
-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Adds detailed instructions on how to execute ExampleEchoPipeline.java (…
…apache#33108) * Adds detailed instructions on how to execute ExampleEchoPipeline.java * spacing in exampleechopipeline * revert header * spacing * spacing * spacing * license * spotless
- Loading branch information
1 parent
7accd8d
commit f38edb7
Showing
2 changed files
with
87 additions
and
10 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
86 changes: 86 additions & 0 deletions
86
examples/java/src/main/java/org/apache/beam/examples/subprocess/Readme.MD
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,86 @@ | ||
<!-- | ||
Licensed to the Apache Software Foundation (ASF) under one | ||
or more contributor license agreements. See the NOTICE file | ||
distributed with this work for additional information | ||
regarding copyright ownership. The ASF licenses this file | ||
to you under the Apache License, Version 2.0 (the | ||
"License"); you may not use this file except in compliance | ||
with the License. You may obtain a copy of the License at | ||
http://www.apache.org/licenses/LICENSE-2.0 | ||
Unless required by applicable law or agreed to in writing, | ||
software distributed under the License is distributed on an | ||
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
KIND, either express or implied. See the License for the | ||
specific language governing permissions and limitations | ||
under the License. | ||
--> | ||
# Apache Beam Subprocess Example | ||
|
||
This example demonstrates how to execute external C++ binaries as subprocesses within an Apache Beam pipeline using the `SubProcessKernel`. | ||
|
||
## Prerequisites | ||
|
||
* **Google Cloud Project:** A Google Cloud project with billing enabled. | ||
* **Dataflow API:** Enable the Dataflow API for your project. | ||
* **C++ compiler:** You'll need a C++ compiler (like g++) to compile the C++ binaries. | ||
|
||
## Steps | ||
|
||
1. **Create a [Maven Example project](https://beam.apache.org/get-started/quickstart-java/) that builds against the latest Beam release:** | ||
|
||
```bash | ||
mvn archetype:generate \ | ||
-DarchetypeGroupId=org.apache.beam \ | ||
-DarchetypeArtifactId=beam-sdks-java-maven-archetypes-examples \ | ||
-DarchetypeVersion=2.60.0 \ | ||
-DgroupId=org.example \ | ||
-DartifactId=word-count-beam \ | ||
-Dversion="0.1" \ | ||
-Dpackage=org.apache.beam.examples \ | ||
-DinteractiveMode=false | ||
``` | ||
|
||
2. **Build the project:** | ||
|
||
* Navigate to the root of the repository (`word-count-beam/`): | ||
|
||
```bash | ||
cd word-count-beam/ | ||
``` | ||
|
||
* Build the project using Maven: | ||
|
||
```bash | ||
mvn clean install | ||
``` | ||
|
||
3. **Run the pipeline on Dataflow:** | ||
|
||
```bash | ||
mvn compile exec:java \ | ||
-Dexec.mainClass=org.apache.beam.examples.subprocess.ExampleEchoPipeline \ | ||
-Dexec.args="--sourcePath=/absolute/path/to/your/subprocess/directory \ | ||
--workerPath=/absolute/path/to/your/subprocess/directory \ | ||
--concurrency=5 \ | ||
--filesToStage=/absolute/path/to/your/subprocess/directory/echo,/absolute/path/to/your/subprocess/directory/echoagain \ | ||
--runner=DataflowRunner \ | ||
--project=your-project-id \ | ||
--region=your-gcp-region \ | ||
--tempLocation=gs://your-gcs-bucket/temp" | ||
``` | ||
|
||
* Replace the placeholders with your actual paths, project ID, region, and Cloud Storage bucket. | ||
|
||
## Important notes | ||
|
||
* **Dependencies:** Ensure your `pom.xml` includes the Dataflow runner dependency (`beam-runners-google-cloud-dataflow-java`). | ||
* **Authentication:** Authenticate your environment to your Google Cloud project. | ||
* **DirectRunner:** On `DirectRunner`, you will see the error ` Process succeded but no result file was found`, showing that the Process is successful. | ||
|
||
## Code overview | ||
|
||
* **`ExampleEchoPipeline.java`:** This Java file defines the Beam pipeline that executes the `Echo` and `EchoAgain` binaries as subprocesses. | ||
* **`Echo.cc` and `Echoagain.cc`:** These C++ files contain the code for the external binaries. These won't be visible when running the example with the created example project. You will need to compile these (using `g++ Echo.cc -o Echo` and `g++ EchoAgain.cc -o EchoAgain`), and then provide their path via the `sourcePath` and `workerPath` flags as listed above. | ||
* **`SubProcessKernel.java`:** This class in the Beam Java SDK handles the execution of external binaries and captures their output. |