Note: This data source was added after the 1.2 release, so it will not be available until 1.3 is released, unless you check out the source code from GitHub and compile it yourself.
Suppose you have a collection named "newUsers" in a "gatheredData" DB.
If this is the "schema":
{
  _id: ObjectId("54107bb33f2a38e1e44e9961"),
  name: "Adolf",
  address: {
    street: "Rue Av.",
    number: 102,
    zip-code: 4106
  }
}
and you just want to consider "name" and "address.zip-code" fields:
<data-source class="no.priv.garshol.duke.datasources.MongoDBDataSource">
  <param name="server-address" value="domain.com"/>
  <param name="port-number" value="27017"/>
  <param name="database" value="gatheredData"/>
  <param name="collection" value="newUsers"/>
  <param name="projection" value="{_id:0, name:1, address.zip-code:1}"/>
  <column property="Name" name="name"/>
  <column property="ZipCode" name="address.zip-code"/>
</data-source>
Note that the fields "name" and "address.zip-code" have been mapped to the properties "Name" and "ZipCode" in the column definitions.
- Required:
- database
- collection
- Optional (and default values):
- server-address: "localhost"
- port-number: "27017"
- db-auth: "false" (other possible values: "true" and "admin", case-insensitive)
- user-name: required if db-auth is set to "true" or "admin"
- password: required if db-auth is set to "true" or "admin"
- cursor-notimeout: "false" (can be set to "true")
- query: "{}" (query all documents in the collection)
- projection: if not set, no projection will be performed
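As an illustration of the optional parameters listed above, here is a minimal sketch of a configuration that authenticates against the "admin" DB and disables the cursor timeout. The user name and password are hypothetical placeholders, not values from this project. server-address and port-number are omitted, so the defaults ("localhost" and "27017") should apply.

```xml
<data-source class="no.priv.garshol.duke.datasources.MongoDBDataSource">
  <!-- Required parameters -->
  <param name="database" value="gatheredData"/>
  <param name="collection" value="newUsers"/>
  <!-- Optional: look up the credentials in the "admin" DB -->
  <param name="db-auth" value="admin"/>
  <!-- Hypothetical credentials, replace with your own -->
  <param name="user-name" value="dukeReader"/>
  <param name="password" value="changeMe"/>
  <!-- Optional: keep the cursor alive between batches -->
  <param name="cursor-notimeout" value="true"/>
  <column property="Name" name="name"/>
  <column property="ZipCode" name="address.zip-code"/>
</data-source>
```

Since no projection is set here, whole documents are fetched and only the fields named in the column definitions are used.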
- If a parameter is invalid (port-number, server-address, collection, etc.), an error will be thrown.
- The value for port-number must be a valid port number (between 1 and 65535); otherwise the default value (27017) is kept.
- Setting a parameter to "" is the same as not setting it (commenting out or omitting the assignment).
- If port-number is not going to be specified, comment out or omit the assignment rather than setting it to "": an error with a specific message is thrown if it is set to "" or to an unparseable string.
- When cursor-notimeout is not set (or is set to "false") and BATCH_SIZE < db[collection].count(query), the cursor may already have timed out when Duke tries to fetch the next batch (by performing a MongoDB getmore). To prevent this, set cursor-notimeout to "true".
- If db-auth is set to "true", the connection looks for the credentials (user name and password) in the DB specified by the database parameter. If it is set to "admin", it looks for them in the "admin" DB.
- If db-auth is set to a value other than "false", "true" or "admin" (case-insensitive), the setting is ignored and the parameter keeps its default value.
- The query and projection parameters must be valid JSON documents. Field names do not need to be quoted, even when they refer to a nested field. If the format is not valid, an error will be thrown.
- You could skip the projection parameter and perform the projection in the column definitions only, but with many very large documents this would cause a significant slowdown due to network traffic.
- The query document can contain operators like $orderby, $showDiskLoc, etc. See http://docs.mongodb.org/manual/reference/operator/query-modifier/
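To make the notes on query and projection more concrete, here is a hedged sketch that filters on a nested field, sorts the result, and projects only the fields that are mapped to columns. It assumes that the data source passes the query document to MongoDB unchanged, so that MongoDB's usual modifier form applies (the filter under $query, the sort under $orderby). The zip-code threshold is an arbitrary example value.

```xml
<data-source class="no.priv.garshol.duke.datasources.MongoDBDataSource">
  <param name="database" value="gatheredData"/>
  <param name="collection" value="newUsers"/>
  <!-- Assumed modifier form: filter under $query, sort order under $orderby -->
  <param name="query" value="{$query: {address.zip-code: {$gte: 4000}}, $orderby: {name: 1}}"/>
  <!-- Fetch only the fields used by the columns below -->
  <param name="projection" value="{_id:0, name:1, address.zip-code:1}"/>
  <column property="Name" name="name"/>
  <column property="ZipCode" name="address.zip-code"/>
</data-source>
```

If the modifier form is not accepted in your setup, a plain filter such as {address.zip-code: {$gte: 4000}} is the simpler fallback.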
Possible future improvements (not yet supported):
- Handle arrays of values and arrays of documents. This is desirable because Duke supports multivalued columns.
- Accept different authentication methods.
- Temporary index creation if needed.
- Support for aggregate() command.