Create Extract.md
mausch committed May 2, 2015
1 parent 4e1f8ad commit 0163864
Showing 1 changed file with 23 additions and 0 deletions.
23 changes: 23 additions & 0 deletions Documentation/Extract.md
@@ -0,0 +1,23 @@
# Binary document upload

SolrNet supports Solr's "extract" feature (a.k.a. "Solr Cell") for indexing data from binary document formats such as Word and PDF.

Here's a simple example showing how to extract text from a PDF file, without indexing it:

```csharp
ISolrOperations<Something> solr = ...
using (var file = File.OpenRead(@"test.pdf")) {
    // Upload the PDF and ask Solr to return the extracted text only
    var response = solr.Extract(new ExtractParameters(file, "some_document_id") {
        ExtractOnly = true,
        ExtractFormat = ExtractFormat.Text,
    });
    // Content holds the text extracted from the document
    Console.WriteLine(response.Content);
}
```

`ExtractOnly = true` tells Solr to perform text extraction only, without indexing the uploaded document.
When `ExtractOnly = false`, the document is indexed, and you can add more fields to it with the `Fields` property.
Other options can be set through the properties of the [`ExtractParameters` class](https://github.com/mausch/SolrNet/blob/master/SolrNet/ExtractParameters.cs).
It's usually recommended to set `StreamType` to the content type of the upload, as auto-detection might fail.
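
As a rough sketch of the indexing case, the following uploads a PDF with `ExtractOnly = false`, attaches one extra field, and sets the `StreamType` explicitly. The helper name `IndexPdf`, the field name `title`, its value, and the document id are made up for illustration, and the `ExtractField(name, value)` constructor is assumed to match your SolrNet version; check the `ExtractParameters` source linked above before relying on it.

```csharp
using System.IO;
using SolrNet;

public static class ExtractExample {
    // Hypothetical helper: indexes a PDF through Solr Cell instead of only extracting its text.
    public static void IndexPdf<T>(ISolrOperations<T> solr, string path) {
        using (var file = File.OpenRead(path)) {
            solr.Extract(new ExtractParameters(file, "some_document_id") {
                // false: Solr indexes the uploaded document
                ExtractOnly = false,
                // Explicit content type, so Solr does not have to auto-detect it
                StreamType = "application/pdf",
                // Extra field stored with the document ("title" is an assumed schema field)
                Fields = new[] { new ExtractField("title", "Test document") },
            });
        }
        // Make the newly indexed document visible to searches
        solr.Commit();
    }
}
```

The field names passed in `Fields` need to exist in your Solr schema (or match a dynamic field), otherwise Solr may reject the upload.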

For more details about each option in `ExtractParameters` see the [Solr wiki](https://wiki.apache.org/solr/ExtractingRequestHandler) and the [Solr reference guide](https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika).
