Fix website io connector 404 (apache#5347)
tuteng authored and merlimat committed Oct 24, 2019
1 parent b4e5d15 commit a71ac20
Showing 19 changed files with 412 additions and 4 deletions.
21 changes: 21 additions & 0 deletions site2/docs/io-aerospike.md
@@ -0,0 +1,21 @@
---
id: io-aerospike
title: Aerospike Sink Connector
sidebar_label: Aerospike Sink Connector
---

The Aerospike Sink connector is used to write messages to an Aerospike Cluster.

## Sink Configuration Options

The following configuration options are specific to the Aerospike Connector:

| Name | Required | Default | Description |
|------|----------|---------|-------------|
| `seedHosts` | `true` | `null` | Comma-separated list of one or more Aerospike cluster hosts; each host can be specified as a valid IP address or hostname followed by an optional port number (default is 3000). |
| `keyspace` | `true` | `null` | Aerospike namespace to use. |
| `keySet` | `false` | `null` | Aerospike set name to use. |
| `columnName` | `true` | `null` | Aerospike bin name to use. |
| `maxConcurrentRequests` | `false` | `100` | Maximum number of concurrent Aerospike transactions that a Sink can open. |
| `timeoutMs` | `false` | `100` | A single timeout value controls `socketTimeout` and `totalTimeout` for Aerospike transactions. |
| `retries` | `false` | `1` | Maximum number of retries before aborting a write transaction to Aerospike. |
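
As a sketch, a config file for this sink might look like the following YAML; the host, namespace, and bin names are illustrative placeholders rather than defaults:

```yaml
configs:
  # One or more Aerospike hosts, comma-separated; the port defaults to 3000 (illustrative value)
  seedHosts: "aerospike-node-1:3000"
  # Aerospike namespace and bin to write to (illustrative names)
  keyspace: "pulsar"
  columnName: "pulsar-bin"
  maxConcurrentRequests: 100
  timeoutMs: 100
  retries: 1
```

Such a file can then be supplied when creating the sink (for example via `pulsar-admin sinks create --sink-config-file <file>`).
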
22 changes: 22 additions & 0 deletions site2/docs/io-cassandra.md
@@ -0,0 +1,22 @@
---
id: io-cassandra
title: Cassandra Sink Connector
sidebar_label: Cassandra Sink Connector
---

The Cassandra Sink connector is used to write messages to a Cassandra Cluster.

The tutorial [Connecting Pulsar with Apache Cassandra](io-quickstart.md) shows an example of how to use the Cassandra sink
connector to write messages to a Cassandra table.

## Sink Configuration Options

All the Cassandra sink settings are listed below. All of them are required to run a Cassandra sink.

| Name | Default | Required | Description |
|------|---------|----------|-------------|
| `roots` | `null` | `true` | Cassandra contact points. A comma-separated list of one or more node addresses. |
| `keyspace` | `null` | `true` | Cassandra Keyspace name. The keyspace should be created prior to creating the sink. |
| `columnFamily` | `null` | `true` | Cassandra ColumnFamily name. The column family should be created prior to creating the sink. |
| `keyname` | `null` | `true` | Key column name. The key column is used for storing Pulsar message keys. If a Pulsar message doesn't have a key associated with it, the message value is used as the key. |
| `columnName` | `null` | `true` | Value column name. The value column is used for storing Pulsar message values. |
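
A minimal illustrative config, assuming the keyspace and table already exist (all names are placeholders):

```yaml
configs:
  # Comma-separated Cassandra contact points (illustrative addresses)
  roots: "cassandra-node-1,cassandra-node-2"
  # Keyspace and column family must be created before the sink (illustrative names)
  keyspace: "pulsar_keyspace"
  columnFamily: "pulsar_table"
  keyname: "key"
  columnName: "col"
```
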
1 change: 0 additions & 1 deletion site2/docs/io-connectors.md
@@ -25,7 +25,6 @@ Name|Java class
[Netty](io-netty-source.md)|[org.apache.pulsar.io.netty.NettySource.java](https://github.com/apache/pulsar/blob/master/pulsar-io/netty/src/main/java/org/apache/pulsar/io/netty/NettySource.java)
[RabbitMQ](io-rabbitmq-source.md)|[org.apache.pulsar.io.rabbitmq.RabbitMQSource.java](https://github.com/apache/pulsar/blob/master/pulsar-io/rabbitmq/src/main/java/org/apache/pulsar/io/rabbitmq/RabbitMQSource.java)


## Sink connector

Pulsar has various sink connectors, which are sorted alphabetically below.
21 changes: 21 additions & 0 deletions site2/docs/io-elasticsearch.md
@@ -0,0 +1,21 @@
---
id: io-elasticsearch
title: ElasticSearch Connector
sidebar_label: ElasticSearch Connector
---

## Sink

The ElasticSearch Sink Connector is used to pull messages from Pulsar topics and persist the messages
to an index.

## Sink Configuration Options

| Name | Default | Required | Description |
|------|---------|----------|-------------|
| `elasticSearchUrl` | `null` | `true` | The URL of the ElasticSearch cluster that the connector connects to. |
| `indexName` | `null` | `true` | The index name that the connector writes messages to. |
| `indexNumberOfShards` | `1` | `false` | The number of shards of the index. |
| `indexNumberOfReplicas` | `1` | `false` | The number of replicas of the index. |
| `username` | `null` | `false` | The username used by the connector to connect to the ElasticSearch cluster. If username is set, a password should also be provided. |
| `password` | `null` | `false` | The password used by the connector to connect to the ElasticSearch cluster. If password is set, a username should also be provided. |
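
An illustrative YAML sketch (the URL and index name are placeholder values):

```yaml
configs:
  elasticSearchUrl: "http://localhost:9200"  # illustrative cluster URL
  indexName: "my_index"                      # illustrative index name
  indexNumberOfShards: 1
  indexNumberOfReplicas: 1
```
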
2 changes: 1 addition & 1 deletion site2/docs/io-file-source.md
@@ -1,5 +1,5 @@
---
id: io-file
id: io-file-source
title: File source connector
sidebar_label: File source connector
---
27 changes: 27 additions & 0 deletions site2/docs/io-file.md
@@ -0,0 +1,27 @@
---
id: io-file
title: File Connector
sidebar_label: File Connector
---

## Source

The File Source Connector is used to pull messages from files in a directory and persist the messages
to a Pulsar topic.

### Source Configuration Options

| Name | Required | Default | Description |
|------|----------|---------|-------------|
| inputDirectory | `true` | `null` | The input directory from which to pull files. |
| recurse | `false` | `true` | Indicates whether or not to pull files from sub-directories. |
| keepFile | `false` | `false` | If true, the file is not deleted after it has been processed, which causes it to be picked up continually. |
| fileFilter | `false` | `[^\\.].*` | Only files whose names match the given regular expression will be picked up. |
| pathFilter | `false` | `null` | When the 'recurse' property is true, only sub-directories whose paths match the given regular expression are scanned. |
| minimumFileAge | `false` | `0` | The minimum age that a file must be in order to be processed; any file younger than this amount of time (according to last modification date) will be ignored. |
| maximumFileAge | `false` | `Long.MAX_VALUE` | The maximum age that a file can be in order to be processed; any file older than this amount of time (according to last modification date) will be ignored. |
| minimumSize | `false` | `1` | The minimum size (in bytes) that a file must be in order to be processed. |
| maximumSize | `false` | `Double.MAX_VALUE` | The maximum size (in bytes) that a file can be in order to be processed. |
| ignoreHiddenFiles | `false` | `true` | Indicates whether hidden files should be ignored. |
| pollingInterval | `false` | `10000` | Indicates how long (in milliseconds) to wait before performing a directory listing. |
| numWorkers | `false` | `1` | The number of worker threads that will be processing the files. This allows you to process a larger number of files concurrently. However, setting this to a value greater than 1 will result in the data from multiple files being "intermingled" in the target topic. |
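
For example, a source config along these lines (the directory path is a placeholder):

```yaml
configs:
  inputDirectory: "/tmp/pulsar-files"  # illustrative path to pull files from
  recurse: true
  keepFile: false
  pollingInterval: 10000  # milliseconds between directory listings
  numWorkers: 1
```
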
26 changes: 26 additions & 0 deletions site2/docs/io-hdfs.md
@@ -0,0 +1,26 @@
---
id: io-hdfs
title: HDFS Connector
sidebar_label: HDFS Connector
---

## Sink

The HDFS Sink Connector is used to pull messages from Pulsar topics and persist the messages
to an HDFS file.

## Sink Configuration Options

| Name | Default | Required | Description |
|------|---------|----------|-------------|
| `hdfsConfigResources` | `null` | `true` | A file or a comma-separated list of files containing the Hadoop file system configuration, e.g. 'core-site.xml', 'hdfs-site.xml'. |
| `directory` | `null` | `true` | The HDFS directory from which files should be read from or written to. |
| `encoding` | `null` | `false` | The character encoding for the files, e.g. UTF-8, ASCII, etc. |
| `compression` | `null` | `false` | The compression codec used to compress/de-compress the files on HDFS. |
| `kerberosUserPrincipal` | `null` | `false` | The Kerberos user principal account to use for authentication. |
| `keytab` | `null` | `false` | The full pathname to the Kerberos keytab file to use for authentication. |
| `filenamePrefix` | `null` | `false` | The prefix of the files to create inside the HDFS directory, i.e. a value of "topicA" will result in files named topicA-, topicA-, etc. being produced. |
| `fileExtension` | `null` | `false` | The extension to add to the files written to HDFS, e.g. '.txt', '.seq', etc. |
| `separator` | `null` | `false` | The character to use to separate records in a text file. If no value is provided then the content from all of the records will be concatenated together in one continuous byte array. |
| `syncInterval` | `null` | `false` | The interval (in milliseconds) between calls to flush data to HDFS disk. |
| `maxPendingRecords` | `Integer.MAX_VALUE` | `false` | The maximum number of records held in memory before acking. Setting this value to one results in every record being sent to disk before the record is acked, while setting it to a higher value allows records to be buffered before flushing them all to disk. |
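
A hedged YAML sketch; all paths and the sync interval are placeholders:

```yaml
configs:
  # Illustrative paths to the Hadoop configuration files
  hdfsConfigResources: "/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml"
  directory: "/user/pulsar/output"  # illustrative HDFS directory
  filenamePrefix: "topicA"
  fileExtension: ".txt"
  syncInterval: 1000  # flush to HDFS every second (illustrative)
```
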
25 changes: 25 additions & 0 deletions site2/docs/io-influxdb.md
@@ -0,0 +1,25 @@
---
id: io-influxdb
title: InfluxDB Connector
sidebar_label: InfluxDB Connector
---

## Sink

The InfluxDB Sink Connector is used to pull messages from Pulsar topics and persist the messages
to an InfluxDB database.

## Sink Configuration Options

| Name | Default | Required | Description |
|------|---------|----------|-------------|
| `influxdbUrl` | `null` | `true` | The URL of the InfluxDB instance to connect to. |
| `username` | `null` | `false` | The username used to authenticate to InfluxDB. |
| `password` | `null` | `false` | The password used to authenticate to InfluxDB. |
| `database` | `null` | `true` | The InfluxDB database to write to. |
| `consistencyLevel` | `ONE` | `false` | The consistency level for writing data to InfluxDB. Possible values [ALL, ANY, ONE, QUORUM]. |
| `logLevel` | `NONE` | `false` | The log level for InfluxDB request and response. Possible values [NONE, BASIC, HEADERS, FULL]. |
| `retentionPolicy` | `autogen` | `false` | The retention policy for the InfluxDB database. |
| `gzipEnable` | `false` | `false` | Flag to determine if gzip should be enabled. |
| `batchTimeMs` | `1000` | `false` | The InfluxDB batch operation interval in milliseconds. |
| `batchSize` | `200` | `false` | The batch size of writes to the InfluxDB database. |
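
An illustrative sketch (the URL and database name are placeholders):

```yaml
configs:
  influxdbUrl: "http://localhost:8086"  # illustrative instance URL
  database: "pulsar_metrics"            # illustrative database name
  consistencyLevel: "ONE"
  retentionPolicy: "autogen"
  batchTimeMs: 1000
  batchSize: 200
```
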
23 changes: 23 additions & 0 deletions site2/docs/io-jdbc.md
@@ -0,0 +1,23 @@
---
id: io-jdbc
title: JDBC Connector
sidebar_label: JDBC Connector
---

## Sink

The JDBC Sink Connector is used to pull messages from Pulsar topics and persist the messages to MySQL or SQLite.
INSERT, DELETE, and UPDATE operations are currently supported.

### Sink Configuration Options

| Name | Required | Default | Description |
|------|----------|---------|-------------|
| userName | `false` | `` | Username used to connect to the database specified by `jdbcUrl`. |
| password | `false` | `` | Password used to connect to the database specified by `jdbcUrl`. |
| jdbcUrl | `true` | `` | The JDBC url of the database this connector connects to. |
| tableName | `true` | `` | The name of the table this connector writes messages to. |
| nonKey | `false` | `` | Fields used in update events. A comma-separated list. |
| key | `false` | `` | Fields used in the WHERE condition of update and delete events. A comma-separated list. |
| timeoutMs | `false` | `500` | The JDBC operation timeout in milliseconds. |
| batchSize | `false` | `200` | The batch size of updates made to the database. |
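
A sketch with placeholder URL, table, credentials, and column names:

```yaml
configs:
  jdbcUrl: "jdbc:mysql://localhost:3306/pulsar_db"  # illustrative MySQL URL
  tableName: "pulsar_table"                         # illustrative table name
  userName: "pulsar_user"                           # illustrative credentials
  password: "secret"
  key: "id"           # illustrative column(s) for the WHERE clause of UPDATE/DELETE
  nonKey: "name,age"  # illustrative column(s) updated by UPDATE events
```
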
44 changes: 44 additions & 0 deletions site2/docs/io-kafka.md
@@ -0,0 +1,44 @@
---
id: io-kafka
title: Kafka Connector
sidebar_label: Kafka Connector
---

## Source

The Kafka Source Connector is used to pull messages from Kafka topics and persist the messages
to a Pulsar topic.

### Source Configuration Options

| Name | Required | Default | Description |
|------|----------|---------|-------------|
| bootstrapServers | `true` | `null` | A list of host/port pairs to use for establishing the initial connection to the Kafka cluster. |
| groupId | `true` | `null` | A unique string that identifies the consumer group this consumer belongs to. |
| fetchMinBytes | `false` | `1` | Minimum bytes expected for each fetch response. |
| autoCommitEnabled | `false` | `true` | If true, the consumer's offset is periodically committed in the background. If the process fails, the committed offset is used as the position from which the new consumer begins. |
| autoCommitIntervalMs | `false` | `5000` | The frequency in milliseconds that the consumer offsets are auto-committed to Kafka if `autoCommitEnabled` is set to true. |
| heartbeatIntervalMs | `false` | `3000` | The interval between heartbeats to the consumer coordinator when using Kafka's group management facilities. |
| sessionTimeoutMs | `false` | `30000` | The timeout used to detect consumer failures when using Kafka's group management facility. |
| topic | `true` | `null` | Topic name to receive records from Kafka. |
| consumerConfigProperties | `false` | `null` | The consumer config properties to be passed to Consumer. Note that other properties specified in the connector config file take precedence over this config. |
| keyDeserializationClass | `false` | `org.apache.kafka.common.serialization.StringDeserializer` | Deserializer class for key that implements the org.apache.kafka.common.serialization.Deserializer interface. |
| valueDeserializationClass | `false` | `org.apache.kafka.common.serialization.ByteArrayDeserializer` | Deserializer class for value that implements the org.apache.kafka.common.serialization.Deserializer interface. |
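
An illustrative source config sketch (broker address, group id, and topic are placeholders):

```yaml
configs:
  bootstrapServers: "kafka-broker:9092"  # illustrative host/port pair
  groupId: "pulsar-io-kafka"             # illustrative consumer group
  topic: "my-kafka-topic"                # illustrative Kafka topic
  autoCommitEnabled: true
  sessionTimeoutMs: 30000
```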

## Sink

The Kafka Sink Connector is used to pull messages from Pulsar topics and persist the messages
to a Kafka topic.

### Sink Configuration Options

| Name | Required | Default | Description |
|------|----------|---------|-------------|
| bootstrapServers | `true` | `null` | A list of host/port pairs to use for establishing the initial connection to the Kafka cluster. |
| acks | `true` | `null` | The Kafka producer acks mode. |
| batchSize | `false` | `16384` | The Kafka producer batch size. |
| maxRequestSize | `false` | `1048576` | The maximum size of a request in bytes. |
| topic | `true` | `null` | The Kafka topic to which records are written. |
| producerConfigProperties | `false` | `null` | The producer config properties to be passed to Producer. Note that other properties specified in the connector config file take precedence over this config. |
| keySerializerClass | `false` | `org.apache.kafka.common.serialization.StringSerializer` | Serializer class for key that implements the org.apache.kafka.common.serialization.Serializer interface. |
| valueSerializerClass | `false` | `org.apache.kafka.common.serialization.ByteArraySerializer` | Serializer class for value that implements the org.apache.kafka.common.serialization.Serializer interface. |
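
A corresponding sink config sketch, again with placeholder values:

```yaml
configs:
  bootstrapServers: "kafka-broker:9092"  # illustrative host/port pair
  topic: "my-kafka-topic"                # illustrative Kafka topic
  acks: "all"                            # illustrative acks mode
  batchSize: 16384
  maxRequestSize: 1048576
```
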
36 changes: 36 additions & 0 deletions site2/docs/io-kinesis.md
@@ -0,0 +1,36 @@
---
id: io-kinesis
title: AWS Kinesis Connector
sidebar_label: AWS Kinesis Connector
---

## Sink

The Kinesis Sink connector is used to pull data from Pulsar topics and persist the data into
AWS Kinesis.

### Sink Configuration Options

| Name | Required | Default | Description |
|------|----------|---------|-------------|
| awsEndpoint | `true` | `null` | Kinesis end-point URL; it can be found at https://docs.aws.amazon.com/general/latest/gr/rande.html. |
| awsRegion | `true` | `null` | Appropriate AWS region, e.g. us-west-1, us-west-2. |
| awsKinesisStreamName | `true` | `null` | Kinesis stream name. |
| awsCredentialPluginName | `false` | `null` | Fully-qualified class name of an implementation of {@inject: github:`AwsCredentialProviderPlugin`:/pulsar-io/kinesis/src/main/java/org/apache/pulsar/io/kinesis/AwsCredentialProviderPlugin.java}. It is a factory class that creates an AWSCredentialsProvider used by the Kinesis sink. If it is empty, the Kinesis sink creates a default AWSCredentialsProvider that accepts a JSON map of credentials in `awsCredentialPluginParam`. |
| awsCredentialPluginParam | `false` | `null` | JSON parameters to initialize `AwsCredentialProviderPlugin`. |
| messageFormat | `true` | `ONLY_RAW_PAYLOAD` | Message format in which the Kinesis sink converts Pulsar messages before publishing them to the Kinesis stream. |
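
An illustrative sketch; the endpoint, region, and stream name are placeholders:

```yaml
configs:
  awsEndpoint: "https://kinesis.us-west-2.amazonaws.com"  # illustrative endpoint
  awsRegion: "us-west-2"                                  # illustrative region
  awsKinesisStreamName: "my-stream"                       # illustrative stream name
  messageFormat: "ONLY_RAW_PAYLOAD"
```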

### Message Formats

The available message formats are listed below:

#### **ONLY_RAW_PAYLOAD**

The Kinesis sink directly publishes the Pulsar message payload as a message into the configured Kinesis stream.

#### **FULL_MESSAGE_IN_JSON**

The Kinesis sink creates a JSON payload with the Pulsar message payload, properties, and encryptionCtx, and publishes the JSON payload into the configured Kinesis stream.

#### **FULL_MESSAGE_IN_FB**

The Kinesis sink creates a flatbuffer-serialized payload with the Pulsar message payload, properties, and encryptionCtx, and publishes the flatbuffer payload into the configured Kinesis stream.
20 changes: 20 additions & 0 deletions site2/docs/io-mongo.md
@@ -0,0 +1,20 @@
---
id: io-mongo
title: MongoDB Connector
sidebar_label: MongoDB Connector
---

## Sink

The MongoDB Sink Connector is used to pull messages from Pulsar topics and persist the messages
to a collection.

## Sink Configuration Options

| Name | Default | Required | Description |
|------|---------|----------|-------------|
| `mongoUri` | `null` | `true` | The URI of the MongoDB instance that the connector connects to (see https://docs.mongodb.com/manual/reference/connection-string/). |
| `database` | `null` | `true` | The name of the database to which the collection belongs. |
| `collection` | `null` | `true` | The collection name that the connector writes messages to. |
| `batchSize` | `100` | `false` | The batch size of writes to the collection. |
| `batchTimeMs` | `1000` | `false` | The batch operation interval in milliseconds. |
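
An illustrative sketch with placeholder URI, database, and collection names:

```yaml
configs:
  mongoUri: "mongodb://localhost:27017"  # illustrative connection string
  database: "pulsar"                     # illustrative database name
  collection: "messages"                 # illustrative collection name
  batchSize: 100
  batchTimeMs: 1000
```
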
2 changes: 1 addition & 1 deletion site2/docs/io-netty-source.md
@@ -1,5 +1,5 @@
---
id: io-netty
id: io-netty-source
title: Netty source connector
sidebar_label: Netty source connector
---
20 changes: 20 additions & 0 deletions site2/docs/io-netty.md
@@ -0,0 +1,20 @@
---
id: io-netty
title: Netty TCP or UDP Connector
sidebar_label: Netty TCP or UDP Connector
---

## Source

The Netty Source connector opens a port that accepts incoming data via the configured network protocol and publishes it to a user-defined Pulsar topic.
This connector is recommended for containerized (e.g. Kubernetes) deployments; otherwise, if the connector runs in process or thread mode, multiple instances may conflict when listening on ports.

### Source Configuration Options

| Name | Required | Default | Description |
|------|----------|---------|-------------|
| `type` | `false` | `tcp` | The network protocol over which data is transmitted to Netty. Valid values include HTTP, TCP, and UDP. |
| `host` | `false` | `127.0.0.1` | The host name or address on which the source instance listens. |
| `port` | `false` | `10999` | The port on which the source instance listens. |
| `numberOfThreads` | `false` | `1` | The number of threads of the Netty TCP server used to accept incoming connections and handle the traffic of the accepted connections. |
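
A minimal sketch that simply spells out the defaults listed above:

```yaml
configs:
  type: "tcp"
  host: "127.0.0.1"
  port: 10999
  numberOfThreads: 1
```
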
2 changes: 1 addition & 1 deletion site2/docs/io-rabbitmq-source.md
@@ -1,5 +1,5 @@
---
id: io-rabbitmq
id: io-rabbitmq-source
title: RabbitMQ source connector
sidebar_label: RabbitMQ source connector
---