This project provides instructions on how to generate an individual example custom step jar, move it into Data Studio, and deploy your custom step.
Example custom steps provided are:
- AddVAT
- ConcatValues
- IPGeolocation
This demo project relies on the Gradle Shadow Plugin to pack all of the dependencies into a single jar. This way, our sample code can be used to easily integrate with third-party libraries, such as the Apache Commons Lang used in IPGeolocation.java.
For more details about the Gradle Shadow Plugin, refer to the user documentation.
- Run gradle build, either from the command line or from IntelliJ IDEA.
  Note: This deployment uses the IPGeolocation step as an example. For AddVAT and ConcatValues (and any other custom step), repeat these steps in their respective folders.
- The output of the build is located at build/libs/IPGeolocation-all.jar.
- Copy and paste the jar into the Data Studio addons folder.
- Once the jar is in the Data Studio addons folder, the example steps are listed in the left-hand pane.
IPGeolocation depends on specific input data that contains IPv4 addresses. You can extract the sample data from the test resources folder:
- The sample data is available at IPGeolocationData.csv.
- Add the sample data as a source under the Datasets tab.
- Select this data as the source in the Data Studio UI.
- Link it with the custom step.
- Execute/Run the workflow or click on Show step results.
- AddVAT: AddVATData.csv
- ConcatValues: ConcatValuesData.csv
- IPGeolocation: IPGeolocationData.csv
The AddVAT example step adds a user-defined VAT percentage to an input column. The column is renamed and returns the total plus the VAT amount.
The input is taken from a single column of an input node and the output is published to a single column in the output node (replacing the input column). The example step class is AddVAT.java, which contains the metadata, configuration and processor.
The VAT percentage is specified using a "Number" step property. This is demonstrated in AddVAT.java.
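The arithmetic behind the step can be sketched in plain Java as follows. This is only an illustration and not the code in AddVAT.java: the class name AddVatSketch, the method addVat and the rounding to two decimal places are assumptions made for the example.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical helper for illustration; it does not use the Data Studio SDK.
class AddVatSketch {

    // Returns amount plus vatPercent percent of amount.
    // Rounding to 2 decimal places is an assumption of this sketch.
    static BigDecimal addVat(BigDecimal amount, BigDecimal vatPercent) {
        BigDecimal multiplier = BigDecimal.ONE.add(vatPercent.divide(BigDecimal.valueOf(100)));
        return amount.multiply(multiplier).setScale(2, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        // 100.00 with 20% VAT
        System.out.println(addVat(new BigDecimal("100.00"), new BigDecimal("20")));
    }
}
```

BigDecimal is used here rather than double so that currency values are not affected by binary floating-point rounding.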
The ConcatValues example step concatenates two columns together using a user-defined delimiter into a new output column inserted after the input columns.
The input is taken from two columns of an input node and the output is published to a single column in the output node. The example step class is ConcatValues.java, which contains the metadata, configuration and processor.
The two input columns are selected using two "Column Chooser" step properties, whereas the delimiter is selected using a "Custom Chooser" step property. This is demonstrated in ConcatValues.java.
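The per-row operation amounts to joining two values with the chosen delimiter. A minimal sketch in plain Java, assuming a hypothetical helper named ConcatValuesSketch.concat and assuming that null cells are treated as empty strings (the real step may handle them differently):

```java
// Hypothetical helper for illustration; it does not use the Data Studio SDK.
class ConcatValuesSketch {

    // Joins two cell values with the delimiter.
    // Treating null as an empty string is an assumption of this sketch.
    static String concat(String first, String second, String delimiter) {
        String left = first == null ? "" : first;
        String right = second == null ? "" : second;
        return left + delimiter + right;
    }

    public static void main(String[] args) {
        System.out.println(concat("John", "Smith", "-"));
    }
}
```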
The IPGeolocation example step takes a list of IPv4 addresses as an input and maps them to their respective countries of origin.
This example step relies on ip-api, an API endpoint that identifies the country of origin (and other location specific data) based on a provided IP address. In this example, the response is returned in JSON format.
The input is taken from a single column of an input node and the output is published to a single column in the output node. As the example step is large, the main class is IPGeolocation.java, which contains the metadata and configuration. The processor is housed in a separate class (IPGeolocationProcessor.java) for better readability.
The IPGeolocation example step demonstrates the following features of the Aperture Data Studio SDK:
- HTTP requests (using the SDK HTTP Libraries/Helper Classes)
- Caching (using SDK Cache)
- Throttling (using Java Semaphore)
- Step Settings (retrieving lang settings from the UI for query)
- Concurrent asynchronous requests (using Java CompletableFuture)
The DemoAggregateStep performs various aggregate operations on a single group column.
This example relies on the SDK 2.4.0 preprocessing API.
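To illustrate what aggregating over a single group column means, here is a sketch using only the Java standard library. It does not use the SDK preprocessing API, and the Row type, the aggregate method and the choice of aggregates (count, sum, min, max, average) are assumptions for the example rather than a description of DemoAggregateStep.

```java
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical illustration; not the SDK preprocessing API.
class GroupAggregateSketch {

    // One input row: the group column value and a numeric value to aggregate.
    static final class Row {
        final String group;
        final double value;

        Row(String group, double value) {
            this.group = group;
            this.value = value;
        }
    }

    // Groups rows by the group column and collects count, sum, min, max and average per group.
    static Map<String, DoubleSummaryStatistics> aggregate(List<Row> rows) {
        return rows.stream().collect(Collectors.groupingBy(
                (Row r) -> r.group,
                TreeMap::new,
                Collectors.summarizingDouble((Row r) -> r.value)));
    }
}
```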
The HTTP requests are made using the SDK HTTP libraries/helper classes (i.e. WebHttpClient, WebHttpRequest, WebHttpResponse). First, an HTTP web client (WebHttpClient) is set up, and a request (WebHttpRequest) is sent through the client using the sendAsync() method. This returns a WebHttpResponse, which contains the location data of the IP address in JSON format.
When executing the step, it first checks if there is any data stored in the cache. If there is a valid cache, the output is populated from the cache; otherwise the data is pulled from the API endpoint. Caches are created and managed using the SDK Cache libraries/helper classes (i.e. StepCacheManager, StepCache, StepCacheConfiguration).
Configuration for the cache includes:
- Name
- Time to Live
- Scope
- Key-Value Type
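The check-cache-then-fetch flow and the time-to-live setting can be sketched with a small in-memory map. This is a stand-in written for illustration only: TtlCacheSketch and getOrLoad are hypothetical names, it does not model the Name or Scope settings, and it is not how StepCache is implemented.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical stand-in for illustration; not the SDK's StepCache.
class TtlCacheSketch<K, V> {

    // A cached value together with the time at which it stops being valid.
    private static final class Entry<V> {
        final V value;
        final long expiresAtMillis;

        Entry(V value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;

    TtlCacheSketch(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    // Returns the cached value if it exists and has not expired;
    // otherwise calls the loader (e.g. the API lookup) and caches its result.
    // The current time is passed in so the behaviour is easy to test.
    V getOrLoad(K key, Function<K, V> loader, long nowMillis) {
        Entry<V> cached = entries.get(key);
        if (cached != null && nowMillis < cached.expiresAtMillis) {
            return cached.value;
        }
        V loaded = loader.apply(key);
        entries.put(key, new Entry<>(loaded, nowMillis + ttlMillis));
        return loaded;
    }
}
```

In real use the caller would pass System.currentTimeMillis() as nowMillis.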
Throttling is demonstrated using Java Semaphore, limiting the number of concurrent HTTP requests to avoid overloading the server at the endpoint. A semaphore is set up, and a limited number of permits (set at 5 permits) are provided. Each request acquires a single permit and when the response is returned, the permit is released.
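A minimal sketch of this acquire/release pattern, using only java.util.concurrent. The class ThrottleSketch and its submit method are hypothetical names for this example, and the request is represented by any supplier of a CompletableFuture rather than by the SDK HTTP classes.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical helper for illustration; the request type is a stand-in for an SDK HTTP call.
class ThrottleSketch {

    private final Semaphore permits;

    ThrottleSketch(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    // Waits for a free permit, starts the request, and releases the permit
    // once the returned future completes, whether normally or with an error.
    <T> CompletableFuture<T> submit(Supplier<CompletableFuture<T>> request) {
        permits.acquireUninterruptibly();
        try {
            return request.get().whenComplete((result, error) -> permits.release());
        } catch (RuntimeException e) {
            // The request could not even be started, so give the permit back.
            permits.release();
            throw e;
        }
    }

    int availablePermits() {
        return permits.availablePermits();
    }
}
```

With new ThrottleSketch(5), a sixth call to submit blocks until one of the five in-flight requests completes.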
Step settings can be set under the "Step Settings" tab in the Data Studio UI. In particular, the JSON response returned by the ip-api endpoint can be configured to be in specified languages.
In IPGeolocationProcessor.java, the step setting field can be found in the StepProcessorContext and is retrieved using the getStepSettingFieldValueAsString() method.
Asynchronous requests are made using the sendAsync() method of WebHttpClient. The Java CompletableFuture handles the response. A CompletableFuture of type WebHttpResponse (i.e. CompletableFuture<WebHttpResponse>) allows Data Studio to continue execution and make other asynchronous calls. The thenAccept() method of the CompletableFuture defines what is done when the response is received.
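The thenAccept() pattern can be sketched without the SDK by treating the asynchronous call as a function that returns a CompletableFuture. In the sketch below, the lookup parameter is a hypothetical stand-in for sendAsync(), the response is simplified to a String, and AsyncLookupSketch and lookupAll are names invented for the example.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical illustration; "lookup" stands in for the SDK's asynchronous HTTP call.
class AsyncLookupSketch {

    // Starts one asynchronous lookup per IP address without waiting in between,
    // stores each response as it arrives, then waits for all of them to finish.
    static Map<String, String> lookupAll(List<String> ips,
                                         Function<String, CompletableFuture<String>> lookup) {
        Map<String, String> results = new ConcurrentHashMap<>();
        CompletableFuture<?>[] pending = ips.stream()
                // thenAccept() registers what to do when this response is received.
                .map(ip -> lookup.apply(ip).thenAccept(body -> results.put(ip, body)))
                .toArray(CompletableFuture[]::new);
        CompletableFuture.allOf(pending).join();
        return results;
    }
}
```

A ConcurrentHashMap is used because the thenAccept() callbacks may run on different threads.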