Fix website io connector 404 (apache#5347)

adamkeyser · Oct 24, 2019 · a71ac20
1 parent b4e5d15

Showing 19 changed files with 412 additions and 4 deletions.
diff --git a/site2/docs/io-aerospike.md b/site2/docs/io-aerospike.md
@@ -0,0 +1,21 @@
+---
+id: io-aerospike
+title: Aerospike Sink Connector
+sidebar_label: Aerospike Sink Connector
+---
+
+The Aerospike Sink connector is used to write messages to an Aerospike cluster.
+
+## Sink Configuration Options
+
+The following configuration options are specific to the Aerospike Connector:
+
+| Name | Required | Default | Description |
+|------|----------|---------|-------------|
+| `seedHosts` | `true` | `null` | Comma-separated list of one or more Aerospike cluster hosts; each host can be specified as a valid IP address or hostname followed by an optional port number (default is 3000). |
+| `keyspace` | `true` | `null` | Aerospike namespace to use. |
+| `keySet` | `false` | `null` | Aerospike set name to use. |
+| `columnName` | `true` | `null` | Aerospike bin name to use. |
+| `maxConcurrentRequests` | `false` | `100` | Maximum number of concurrent Aerospike transactions that the sink can open. |
+| `timeoutMs` | `false` | `100` | A single timeout value, in milliseconds, used for both `socketTimeout` and `totalTimeout` of Aerospike transactions. |
+| `retries` | `false` | `1` | Maximum number of retries before aborting a write transaction to Aerospike. |
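To make the `seedHosts` format concrete, the following Java sketch splits a comma-separated value into host/port pairs and falls back to port 3000 when an entry has no port. The `SeedHosts` class and its `parse` method are hypothetical names for illustration; this is not the connector's actual parsing code, and it assumes IPv4 addresses or hostnames only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; it is not part of the Pulsar
// Aerospike connector. IPv6 literals are deliberately not handled.
final class SeedHosts {
    static final int DEFAULT_PORT = 3000;

    // One parsed "host[:port]" entry.
    static final class HostPort {
        final String host;
        final int port;

        HostPort(String host, int port) {
            this.host = host;
            this.port = port;
        }

        @Override
        public String toString() {
            return host + ":" + port;
        }
    }

    // Splits a value such as "10.0.0.1,aero-2:3100" into host/port pairs,
    // using DEFAULT_PORT for entries that do not specify a port.
    static List<HostPort> parse(String seedHosts) {
        List<HostPort> result = new ArrayList<>();
        for (String entry : seedHosts.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate stray commas
            }
            int colon = trimmed.lastIndexOf(':');
            if (colon < 0) {
                result.add(new HostPort(trimmed, DEFAULT_PORT));
            } else {
                String host = trimmed.substring(0, colon);
                int port = Integer.parseInt(trimmed.substring(colon + 1));
                result.add(new HostPort(host, port));
            }
        }
        return result;
    }
}
```

For example, `SeedHosts.parse("10.0.0.1, aero-2:3100")` yields `10.0.0.1:3000` and `aero-2:3100`.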
diff --git a/site2/docs/io-cassandra.md b/site2/docs/io-cassandra.md
@@ -0,0 +1,22 @@
+---
+id: io-cassandra
+title: Cassandra Sink Connector
+sidebar_label: Cassandra Sink Connector
+---
+
+The Cassandra Sink connector is used to write messages to a Cassandra cluster.
+
+The tutorial [Connecting Pulsar with Apache Cassandra](io-quickstart.md) shows an example of how to use the Cassandra Sink
+connector to write messages to a Cassandra table.
+
+## Sink Configuration Options
+
+The following table lists all Cassandra sink settings. All of them are required to run a Cassandra sink.
+
+| Name | Default | Required | Description |
+|------|---------|----------|-------------|
+| `roots` | `null` | `true` | Cassandra contact points: a comma-separated `String` of one or more node addresses. |
+| `keyspace` | `null` | `true` | Cassandra Keyspace name. The keyspace should be created prior to creating the sink. |
+| `columnFamily` | `null` | `true` | Cassandra ColumnFamily name. The column family should be created prior to creating the sink. |
+| `keyname` | `null` | `true` | Key column name. The key column stores Pulsar message keys. If a Pulsar message has no key, the message value is used as the key. |
+| `columnName` | `null` | `true` | Value column name. The value column is used for storing Pulsar message values. |
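The `keyname` fallback described above can be sketched in Java as follows. `CassandraRowSketch` is a hypothetical class, and the parameterized CQL statement is an assumption about how `keyspace`, `columnFamily`, `keyname`, and `columnName` fit together; the connector's real write path may differ.

```java
import java.util.Optional;

// Hypothetical sketch of the documented behaviour; not the connector's code.
final class CassandraRowSketch {
    // The row key is the message key when present, otherwise the message value.
    static String rowKey(Optional<String> messageKey, String messageValue) {
        return messageKey.orElse(messageValue);
    }

    // Assumed shape of a parameterized CQL insert into keyspace.columnFamily,
    // with `keyname` and `columnName` as the two target columns.
    static String insertCql(String keyspace, String columnFamily,
                            String keyname, String columnName) {
        return "INSERT INTO " + keyspace + "." + columnFamily
                + " (" + keyname + ", " + columnName + ") VALUES (?, ?)";
    }
}
```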
diff --git a/site2/docs/io-connectors.md b/site2/docs/io-connectors.md
@@ -25,7 +25,6 @@ Name|Java class
 [Netty](io-netty-source.md)|[org.apache.pulsar.io.netty.NettySource.java](https://github.com/apache/pulsar/blob/master/pulsar-io/netty/src/main/java/org/apache/pulsar/io/netty/NettySource.java)
 [RabbitMQ](io-rabbitmq-source.md)|[org.apache.pulsar.io.rabbitmq.RabbitMQSource.java](https://github.com/apache/pulsar/blob/master/pulsar-io/rabbitmq/src/main/java/org/apache/pulsar/io/rabbitmq/RabbitMQSource.java)
 
-
 ## Sink connector
 
 Pulsar has various sink connectors, which are sorted alphabetically as below.

diff --git a/site2/docs/io-elasticsearch.md b/site2/docs/io-elasticsearch.md
@@ -0,0 +1,21 @@
+---
+id: io-elasticsearch
+title: ElasticSearch Connector
+sidebar_label: ElasticSearch Connector
+---
+
+## Sink
+
+The ElasticSearch Sink Connector is used to pull messages from Pulsar topics and persist the messages
+to an Elasticsearch index.
+
+### Sink Configuration Options
+
+| Name | Default | Required | Description |
+|------|---------|----------|-------------|
+| `elasticSearchUrl` | `null` | `true` | The URL of the Elasticsearch cluster that the connector connects to. |
+| `indexName` | `null` | `true` | The index name that the connector writes messages to. |
+| `indexNumberOfShards` | `1` | `false` | The number of shards of the index. |
+| `indexNumberOfReplicas` | `1` | `false` | The number of replicas of the index. |
+| `username` | `null` | `false` | The username used by the connector to connect to the Elasticsearch cluster. If `username` is set, `password` should also be provided. |
+| `password` | `null` | `false` | The password used by the connector to connect to the Elasticsearch cluster. If `password` is set, `username` should also be provided. |
diff --git a/site2/docs/io-file-source.md b/site2/docs/io-file-source.md
@@ -1,5 +1,5 @@
 ---
-id: io-file
+id: io-file-source
 title: File source connector
 sidebar_label: File source connector
 ---

diff --git a/site2/docs/io-file.md b/site2/docs/io-file.md
@@ -0,0 +1,27 @@
+---
+id: io-file
+title: File Connector
+sidebar_label: File Connector
+---
+
+## Source
+
+The File Source Connector is used to pull messages from files in a directory and persist the messages
+to a Pulsar topic.
+
+### Source Configuration Options
+
+| Name | Required | Default | Description |
+|------|----------|---------|-------------|
+| inputDirectory | `true` | `null` | The input directory from which to pull files. |
+| recurse | `false` | `true` | Whether to pull files from sub-directories. |
+| keepFile | `false` | `false` | If true, the file is not deleted after it has been processed, which causes the file to be picked up continually. |
+| fileFilter | `false` | `[^\\.].*` | Only files whose names match the given regular expression are picked up. |
+| pathFilter | `false` | `null` | When `recurse` is true, only sub-directories whose paths match the given regular expression are scanned. |
+| minimumFileAge | `false` | `0` | The minimum age that a file must have to be processed; any file younger than this (according to its last modification date) is ignored. |
+| maximumFileAge | `false` | `Long.MAX_VALUE` | The maximum age that a file can have to be processed; any file older than this (according to its last modification date) is ignored. |
+| minimumSize | `false` | `1` | The minimum size (in bytes) that a file must have to be processed. |
+| maximumSize | `false` | `Double.MAX_VALUE` | The maximum size (in bytes) that a file can have to be processed. |
+| ignoreHiddenFiles | `false` | `true` | Whether hidden files are ignored. |
+| pollingInterval | `false` | `10000` | How long to wait between directory listings. |
+| numWorkers | `false` | `1` | The number of worker threads that will be processing the files. This allows you to process a larger number of files concurrently. However, setting this to a value greater than 1 will result in the data from multiple files being "intermingled" in the target topic. |
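As a rough illustration of how the filtering options interact, this Java sketch combines the file name regex, age, size, and hidden-file checks into a single accept/reject decision, using the defaults from the table. `FileSelection` is a hypothetical class; it assumes ages are measured in milliseconds from the last-modified time, and it is not the connector's own directory-listing logic.

```java
import java.util.regex.Pattern;

// Hypothetical sketch; not the File source's implementation.
final class FileSelection {
    private final Pattern fileFilter;
    private final long minimumFileAge;   // assumed milliseconds
    private final long maximumFileAge;   // assumed milliseconds
    private final long minimumSize;      // bytes
    private final double maximumSize;    // bytes
    private final boolean ignoreHiddenFiles;

    FileSelection(String fileFilter, long minimumFileAge, long maximumFileAge,
                  long minimumSize, double maximumSize, boolean ignoreHiddenFiles) {
        this.fileFilter = Pattern.compile(fileFilter);
        this.minimumFileAge = minimumFileAge;
        this.maximumFileAge = maximumFileAge;
        this.minimumSize = minimumSize;
        this.maximumSize = maximumSize;
        this.ignoreHiddenFiles = ignoreHiddenFiles;
    }

    // Defaults from the table. "[^\\.].*" is a Java string literal for the
    // regex [^\.].* , i.e. names that do not start with a dot.
    static FileSelection defaults() {
        return new FileSelection("[^\\.].*", 0, Long.MAX_VALUE, 1, Double.MAX_VALUE, true);
    }

    // Returns true if a file with these attributes would be picked up.
    boolean accepts(String name, boolean hidden, long sizeBytes,
                    long lastModifiedMs, long nowMs) {
        long age = nowMs - lastModifiedMs;
        if (age < minimumFileAge || age > maximumFileAge) {
            return false;
        }
        if (sizeBytes < minimumSize || sizeBytes > maximumSize) {
            return false;
        }
        if (ignoreHiddenFiles && hidden) {
            return false;
        }
        return fileFilter.matcher(name).matches();
    }
}
```

With the defaults, `data.csv` is accepted, while `.hidden` is rejected by `fileFilter` and an empty file is rejected by `minimumSize`.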
diff --git a/site2/docs/io-hdfs.md b/site2/docs/io-hdfs.md
@@ -0,0 +1,26 @@
+---
+id: io-hdfs
+title: HDFS Connector
+sidebar_label: HDFS Connector
+---
+
+## Sink
+
+The HDFS Sink Connector is used to pull messages from Pulsar topics and persist the messages
+to files in HDFS.
+
+### Sink Configuration Options
+
+| Name | Default | Required | Description |
+|------|---------|----------|-------------|
+| `hdfsConfigResources` | `null` | `true` | A file or comma-separated list of files that contain the Hadoop file system configuration, e.g. `core-site.xml`, `hdfs-site.xml`. |
+| `directory` | `null` | `true` | The HDFS directory to which files are written. |
+| `encoding` | `null` | `false` | The character encoding for the files, e.g. UTF-8, ASCII. |
+| `compression` | `null` | `false` | The compression codec used to compress/de-compress the files on HDFS. |
+| `kerberosUserPrincipal` | `null` | `false` | The Kerberos user principal account to use for authentication. |
+| `keytab` | `null` | `false` | The full pathname to the Kerberos keytab file to use for authentication. |
+| `filenamePrefix` | `null` | `false` | The prefix of the files to create inside the HDFS directory; e.g. a value of "topicA" results in files named topicA-, topicA-, etc. |
+| `fileExtension` | `null` | `false` | The extension to add to the files written to HDFS, e.g. `.txt`, `.seq`. |
+| `separator` | `null` | `false` | The character used to separate records in a text file. If no value is provided, the content of all records is concatenated into one continuous byte array. |
+| `syncInterval` | `null` | `false` | The interval (in milliseconds) between calls to flush data to HDFS disk. |
+| `maxPendingRecords` | `Integer.MAX_VALUE` | `false` | The maximum number of records that the sink holds in memory before acking. Setting this value to 1 causes every record to be written to disk before it is acked, while a higher value lets the sink buffer records before flushing them all to disk. |
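The following Java sketch illustrates the `separator` option: with a separator, records are delimited; without one, their bytes are concatenated into one continuous array. `TextRecordJoiner` is hypothetical, and appending the separator after every record (rather than only between records) is an assumption, not confirmed behavior of the HDFS sink.

```java
import java.io.ByteArrayOutputStream;
import java.util.List;

// Hypothetical illustration of the `separator` option; not the HDFS sink's code.
final class TextRecordJoiner {
    // Writes each record's bytes. When a separator is configured it is
    // appended after every record (an assumption); when it is null the
    // records are concatenated back to back.
    static byte[] join(List<byte[]> records, Character separator) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] record : records) {
            out.write(record, 0, record.length);
            if (separator != null) {
                out.write(separator.charValue()); // assumes a single-byte (ASCII) separator
            }
        }
        return out.toByteArray();
    }
}
```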
diff --git a/site2/docs/io-influxdb.md b/site2/docs/io-influxdb.md
@@ -0,0 +1,25 @@
+---
+id: io-influxdb
+title: InfluxDB Connector
+sidebar_label: InfluxDB Connector
+---
+
+## Sink
+
+The InfluxDB Sink Connector is used to pull messages from Pulsar topics and persist the messages
+to an InfluxDB database.
+
+### Sink Configuration Options
+
+| Name | Default | Required | Description |
+|------|---------|----------|-------------|
+| `influxdbUrl` | `null` | `true` | The URL of the InfluxDB instance to connect to. |
+| `username` | `null` | `false` | The username used to authenticate to InfluxDB. |
+| `password` | `null` | `false` | The password used to authenticate to InfluxDB. |
+| `database` | `null` | `true` | The InfluxDB database to write to. |
+| `consistencyLevel` | `ONE` | `false` | The consistency level for writing data to InfluxDB. Possible values [ALL, ANY, ONE, QUORUM]. |
+| `logLevel` | `NONE` | `false` | The log level for InfluxDB request and response. Possible values [NONE, BASIC, HEADERS, FULL]. |
+| `retentionPolicy` | `autogen` | `false` | The retention policy for the InfluxDB database. |
+| `gzipEnable` | `false` | `false` | Whether to enable gzip compression. |
+| `batchTimeMs` | `1000` | `false` | The interval, in milliseconds, at which buffered records are flushed to InfluxDB. |
+| `batchSize` | `200` | `false` | The number of records to buffer before writing them to InfluxDB in one batch. |
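`batchSize` and `batchTimeMs` together describe a size-or-time batching policy, where a write is triggered by whichever limit is reached first; the MongoDB connector exposes options with the same names. The Java sketch below is a hypothetical `SizeOrTimeBatcher` that takes the current time as a parameter so the logic is deterministic. It is not the connector's code; a real sink would more likely flush from a scheduled timer, which also covers idle periods with no new records.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical size-or-time batcher illustrating batchSize / batchTimeMs.
final class SizeOrTimeBatcher<T> {
    private final int batchSize;
    private final long batchTimeMs;
    private final List<T> pending = new ArrayList<>();
    private long lastFlushMs;

    SizeOrTimeBatcher(int batchSize, long batchTimeMs, long startMs) {
        this.batchSize = batchSize;
        this.batchTimeMs = batchTimeMs;
        this.lastFlushMs = startMs;
    }

    // Buffers one record. Returns the batch to write when either limit is hit,
    // otherwise an empty list (nothing to write yet).
    List<T> add(T record, long nowMs) {
        pending.add(record);
        boolean full = pending.size() >= batchSize;
        boolean timeUp = nowMs - lastFlushMs >= batchTimeMs;
        if (!full && !timeUp) {
            return List.of();
        }
        List<T> batch = new ArrayList<>(pending);
        pending.clear();
        lastFlushMs = nowMs;
        return batch;
    }
}
```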
diff --git a/site2/docs/io-jdbc.md b/site2/docs/io-jdbc.md
@@ -0,0 +1,23 @@
+---
+id: io-jdbc
+title: JDBC Connector
+sidebar_label: JDBC Connector
+---
+
+## Sink
+
+The JDBC Sink Connector is used to pull messages from Pulsar topics and persist the messages to a MySQL or SQLite database.
+It currently supports INSERT, DELETE, and UPDATE operations.
+
+### Sink Configuration Options
+
+| Name | Required | Default | Description |
+|------|----------|---------|-------------|
+| userName | `false` | `""` | Username used to connect to the database specified by `jdbcUrl`. |
+| password | `false` | `""` | Password used to connect to the database specified by `jdbcUrl`. |
+| jdbcUrl | `true` | `""` | The JDBC URL of the database this connector connects to. |
+| tableName | `true` | `""` | The name of the table this connector writes messages to. |
+| nonKey | `false` | `""` | Fields used in update events. A comma-separated list. |
+| key | `false` | `""` | Fields used in the WHERE condition of update and delete events. A comma-separated list. |
+| timeoutMs | `false` | `500` | The JDBC operation timeout in milliseconds. |
+| batchSize | `false` | `200` | The batch size of updates made to the database. |
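To show how `key` and `nonKey` relate to UPDATE and DELETE events, this Java sketch builds parameterized statements from the comma-separated lists. `JdbcStatements` is a hypothetical helper, and the exact SQL the connector generates may differ.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical SQL builder; not the JDBC sink's statement generator.
final class JdbcStatements {
    // Parses a comma-separated field list such as "name, age".
    static List<String> fields(String commaSeparated) {
        return Arrays.stream(commaSeparated.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    // Renders "a = ?, b = ?" (or with " AND ") for the given fields.
    private static String assignments(String commaSeparated, String delimiter) {
        return fields(commaSeparated).stream()
                .map(f -> f + " = ?")
                .collect(Collectors.joining(delimiter));
    }

    // nonKey fields are updated; key fields identify the row.
    static String update(String tableName, String nonKey, String key) {
        return "UPDATE " + tableName + " SET " + assignments(nonKey, ", ")
                + " WHERE " + assignments(key, " AND ");
    }

    // Only key fields are needed to identify the row to delete.
    static String delete(String tableName, String key) {
        return "DELETE FROM " + tableName + " WHERE " + assignments(key, " AND ");
    }
}
```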
diff --git a/site2/docs/io-kafka.md b/site2/docs/io-kafka.md
@@ -0,0 +1,44 @@
+---
+id: io-kafka
+title: Kafka Connector
+sidebar_label: Kafka Connector
+---
+
+## Source
+
+The Kafka Source Connector is used to pull messages from Kafka topics and persist the messages
+to a Pulsar topic.
+
+### Source Configuration Options
+
+| Name | Required | Default | Description |
+|------|----------|---------|-------------|
+| bootstrapServers | `true` | `null` | A list of host/port pairs to use for establishing the initial connection to the Kafka cluster. |
+| groupId | `true` | `null` | A unique string that identifies the consumer group this consumer belongs to. |
+| fetchMinBytes | `false` | `1` | Minimum bytes expected for each fetch response. |
+| autoCommitEnabled | `false` | `true` | If true, the consumer's offset is periodically committed in the background. If the process fails, the committed offset is used as the position from which the new consumer begins. |
+| autoCommitIntervalMs | `false` | `5000` | The frequency in milliseconds at which the consumer offsets are auto-committed to Kafka if `autoCommitEnabled` is true. |
+| heartbeatIntervalMs | `false` | `3000` | The interval between heartbeats to the consumer coordinator when using Kafka's group management facilities. |
+| sessionTimeoutMs | `false` | `30000` | The timeout used to detect consumer failures when using Kafka's group management facilities. |
+| topic | `true` | `null` | The Kafka topic from which records are consumed. |
+| consumerConfigProperties | `false` | `null` | The consumer config properties passed to the Kafka consumer. Properties specified elsewhere in the connector config file take precedence over this config. |
+| keyDeserializationClass | `false` | `org.apache.kafka.common.serialization.StringDeserializer` | Deserializer class for key that implements the org.apache.kafka.common.serialization.Deserializer interface. |
+| valueDeserializationClass | `false` | `org.apache.kafka.common.serialization.ByteArrayDeserializer` | Deserializer class for value that implements the org.apache.kafka.common.serialization.Deserializer interface. |
+
+## Sink
+
+The Kafka Sink Connector is used to pull messages from Pulsar topics and persist the messages
+to a Kafka topic.
+
+### Sink Configuration Options
+
+| Name | Required | Default | Description |
+|------|----------|---------|-------------|
+| bootstrapServers | `true` | `null` | A list of host/port pairs to use for establishing the initial connection to the Kafka cluster. |
+| acks | `true` | `null` | The Kafka producer `acks` mode. |
+| batchSize | `false` | `16384` | The Kafka producer batch size. |
+| maxRequestSize | `false` | `1048576` | The maximum size of a request in bytes. |
+| topic | `true` | `null` | The Kafka topic to which records are written. |
+| producerConfigProperties | `false` | `null` | The producer config properties passed to the Kafka producer. Properties specified elsewhere in the connector config file take precedence over this config. |
+| keySerializerClass | `false` | `org.apache.kafka.common.serialization.StringSerializer` | Serializer class for key that implements the org.apache.kafka.common.serialization.Serializer interface. |
+| valueSerializerClass | `false` | `org.apache.kafka.common.serialization.ByteArraySerializer` | Serializer class for value that implements the org.apache.kafka.common.serialization.Serializer interface. |
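Both `consumerConfigProperties` and `producerConfigProperties` follow the same precedence rule: options set directly in the connector config override pass-through properties with the same key. A minimal Java sketch of that merge with `java.util.Properties`; `KafkaPropsMerge` is a hypothetical name, and it assumes the connector options (such as `bootstrapServers`) have already been mapped to Kafka property keys (such as `bootstrap.servers`).

```java
import java.util.Properties;

// Hypothetical merge illustrating the documented precedence; not connector code.
final class KafkaPropsMerge {
    static Properties merge(Properties passThrough, Properties connectorOptions) {
        Properties merged = new Properties();
        merged.putAll(passThrough);      // lower precedence: *ConfigProperties
        merged.putAll(connectorOptions); // connector options win on conflicts
        return merged;
    }
}
```

Keys that appear only in the pass-through map, such as `max.poll.records`, are kept unchanged.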
diff --git a/site2/docs/io-kinesis.md b/site2/docs/io-kinesis.md
@@ -0,0 +1,36 @@
+---
+id: io-kinesis
+title: AWS Kinesis Connector
+sidebar_label: AWS Kinesis Connector
+---
+
+## Sink
+
+The Kinesis Sink connector is used to pull data from Pulsar topics and persist the data into
+AWS Kinesis.
+
+### Sink Configuration Options
+
+| Name | Required | Default | Description |
+|------|----------|---------|-------------|
+| awsEndpoint | `true` | `null` | The Kinesis endpoint URL. Endpoints are listed at https://docs.aws.amazon.com/general/latest/gr/rande.html. |
+| awsRegion | `true` | `null` | The AWS region, e.g. `us-west-1`, `us-west-2`. |
+| awsKinesisStreamName | `true` | `null` | The Kinesis stream name. |
+| awsCredentialPluginName | `false` | `null` | Fully qualified class name of an implementation of {@inject: github:`AwsCredentialProviderPlugin`:/pulsar-io/kinesis/src/main/java/org/apache/pulsar/io/kinesis/AwsCredentialProviderPlugin.java}. It is a factory class that creates the AWSCredentialsProvider used by the Kinesis sink. If it is empty, the Kinesis sink creates a default AWSCredentialsProvider that accepts a JSON map of credentials in `awsCredentialPluginParam`. |
+| awsCredentialPluginParam | `false` | `null` | JSON parameters used to initialize `AwsCredentialProviderPlugin`. |
+| messageFormat | `true` | `ONLY_RAW_PAYLOAD` | The format into which the Kinesis sink converts Pulsar messages before publishing them to Kinesis streams. |
+
+### Message Formats
+
+The available message formats are listed as below:
+
+#### **ONLY_RAW_PAYLOAD**
+
+The Kinesis sink directly publishes the Pulsar message payload as a message into the configured Kinesis stream.
+#### **FULL_MESSAGE_IN_JSON**
+
+The Kinesis sink creates a JSON payload with the Pulsar message payload, properties, and encryptionCtx, and publishes the JSON payload into the configured Kinesis stream.
+
+#### **FULL_MESSAGE_IN_FB**
+
+The Kinesis sink creates a FlatBuffers-serialized payload with the Pulsar message payload, properties, and encryptionCtx, and publishes the FlatBuffers payload into the configured Kinesis stream.
diff --git a/site2/docs/io-mongo.md b/site2/docs/io-mongo.md
@@ -0,0 +1,20 @@
+---
+id: io-mongo
+title: MongoDB Connector
+sidebar_label: MongoDB Connector
+---
+
+## Sink
+
+The MongoDB Sink Connector is used to pull messages from Pulsar topics and persist the messages
+to a MongoDB collection.
+
+### Sink Configuration Options
+
+| Name | Default | Required | Description |
+|------|---------|----------|-------------|
+| `mongoUri` | `null` | `true` | The URI of the MongoDB instance that the connector connects to (see: https://docs.mongodb.com/manual/reference/connection-string/). |
+| `database` | `null` | `true` | The name of the database to which the collection belongs. |
+| `collection` | `null` | `true` | The collection name that the connector writes messages to. |
+| `batchSize` | `100` | `false` | The number of records to buffer before writing them to the collection in one batch. |
+| `batchTimeMs` | `1000` | `false` | The batch operation interval in milliseconds. |
diff --git a/site2/docs/io-netty-source.md b/site2/docs/io-netty-source.md
@@ -1,5 +1,5 @@
 ---
-id: io-netty
+id: io-netty-source
 title: Netty source connector
 sidebar_label: Netty source connector
 ---

diff --git a/site2/docs/io-netty.md b/site2/docs/io-netty.md
@@ -0,0 +1,20 @@
+---
+id: io-netty
+title: Netty TCP or UDP Connector
+sidebar_label: Netty TCP or UDP Connector
+---
+
+## Source
+
+The Netty Source connector opens a port that accepts incoming data via the configured network protocol and publishes it to a user-defined Pulsar topic.
+This connector is recommended for containerized (e.g. Kubernetes) deployments.
+If the connector runs in process or thread mode, multiple instances may conflict when listening on the same port.
+
+### Source Configuration Options
+
+| Name | Required | Default | Description |
+|------|----------|---------|-------------|
+| `type` | `false` | `tcp` | The network protocol over which data is transmitted to Netty. Valid values include HTTP, TCP, and UDP. |
+| `host` | `false` | `127.0.0.1` | The host name or address on which the source instance listens. |
+| `port` | `false` | `10999` | The port on which the source instance listens. |
+| `numberOfThreads` | `false` | `1` | The number of threads of the Netty TCP server that accept incoming connections and handle the traffic of the accepted connections. |
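The port conflict mentioned above can be reproduced with plain Java sockets: a second listener on the same address and port fails with `BindException`, which is what several Netty source instances sharing one `host`/`port` would run into in process or thread mode. `PortConflictDemo` is a hypothetical illustration that uses an ephemeral loopback port instead of the default `10999`; it does not use Netty or the connector itself.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.BindException;
import java.net.InetAddress;
import java.net.ServerSocket;

// Illustration only: two listeners cannot bind the same address and port.
final class PortConflictDemo {
    // Returns true if binding a second listener to an in-use port fails.
    static boolean secondBindFails() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket first = new ServerSocket(0, 50, loopback)) {
            int port = first.getLocalPort(); // ephemeral port chosen by the OS
            try (ServerSocket second = new ServerSocket(port, 50, loopback)) {
                return false; // second listener bound: no conflict
            } catch (BindException e) {
                return true;  // "Address already in use"
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```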
diff --git a/site2/docs/io-rabbitmq-source.md b/site2/docs/io-rabbitmq-source.md
@@ -1,5 +1,5 @@
 ---
-id: io-rabbitmq
+id: io-rabbitmq-source
 title: RabbitMQ source connector
 sidebar_label: RabbitMQ source connector
 ---