jhecking/tika-lambda

 
 

Run Apache Tika as a service in AWS Lambda by scanning documents in S3 and storing the extracted text back to S3.

Based on the original Tika Lambda version in gnethercutt/tika-lambda, with subsequent enhancements from DovetailSoftware/tika-lambda and cmaxwellau/tika-lambda.

This version of the Tika Lambda function adds:

  • An AWS Serverless Application Model template to easily package & deploy the function.
  • A Gradle build file to build, package & deploy the application.
  • Configurable S3 bucket prefix & extension for extracted data.

AWS CloudFormation Template

Parameters

| Key | Default | Description |
| --- | --- | --- |
| DocumentBucket | - | S3 bucket name |
| SourcePrefix | upload/ | Folder prefix that the Lambda function watches for new uploads |
| TargetPrefix | extracted/ | Folder prefix under which the extracted data is stored |
| TargetExtension | .extracted.json | Filename extension for the extracted data files (JSON) |
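
To illustrate how the three path parameters could fit together, here is a minimal Java sketch that derives an output key from an uploaded object's key. The class `TargetKeyExample` and the method `targetKey` are hypothetical names, not taken from this repository. The mapping rule (strip the source prefix, prepend the target prefix, append the target extension) is an assumption about the intended behaviour, so the actual function may build its keys differently.

```java
import java.util.Optional;

/** Hypothetical helper, not part of this repository. */
final class TargetKeyExample {

    /**
     * Maps the key of an uploaded object to the key of its extracted output.
     * Assumed rule: replace sourcePrefix with targetPrefix and append targetExtension.
     * Returns an empty Optional for keys outside sourcePrefix and for the bare prefix itself.
     */
    static Optional<String> targetKey(String sourceKey, String sourcePrefix,
                                      String targetPrefix, String targetExtension) {
        if (!sourceKey.startsWith(sourcePrefix)
                || sourceKey.length() == sourcePrefix.length()) {
            return Optional.empty();
        }
        // Keep any sub-folders below the source prefix in the output key.
        String relative = sourceKey.substring(sourcePrefix.length());
        return Optional.of(targetPrefix + relative + targetExtension);
    }

    public static void main(String[] args) {
        // Uses the default parameter values from the table above.
        System.out.println(targetKey("upload/reports/q1.pdf",
                "upload/", "extracted/", ".extracted.json"));
        // prints "Optional[extracted/reports/q1.pdf.extracted.json]"

        System.out.println(targetKey("extracted/reports/q1.pdf.extracted.json",
                "upload/", "extracted/", ".extracted.json"));
        // prints "Optional.empty"
    }
}
```

Under this assumed rule, objects written below the target prefix never match the source prefix, so a function that shares one bucket for input and output would not pick up its own results.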
