jhecking/tika-lambda

Run Apache Tika as a service in AWS Lambda by scanning documents in S3 and storing the extracted text back to S3.
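
An upload triggers the function through an S3 event notification. In that notification, S3 URL-encodes the object key: a space arrives as `+`, and characters such as parentheses arrive as `%XX` escapes. A handler therefore has to decode the key before it fetches the object and passes it to Tika. The following is a minimal plain-Java sketch of that step only. The class and method names are hypothetical and are not taken from the repository, and the actual handler may do this differently.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical helper (not from the repository): decodes the object key
 * carried in an S3 event notification, where S3 URL-encodes key names.
 */
final class S3EventKeys {
    private S3EventKeys() {}

    static String decodeEventKey(String rawKey) {
        // URLDecoder maps '+' to ' ' and decodes %XX escapes as UTF-8 bytes,
        // so a literal '+' in the key (sent as %2B) is preserved correctly.
        return URLDecoder.decode(rawKey, StandardCharsets.UTF_8);
    }
}
```

For example, `S3EventKeys.decodeEventKey("upload/Quarterly+Report+%282024%29.pdf")` returns `upload/Quarterly Report (2024).pdf`. `URLDecoder.decode(String, Charset)` requires Java 10 or later.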

Based on the original Tika Lambda function in gnethercutt/tika-lambda, with subsequent enhancements from DovetailSoftware/tika-lambda and cmaxwellau/tika-lambda.

This version of the Tika Lambda function adds:

- An AWS Serverless Application Model (SAM) template to package and deploy the function.
- A Gradle build file to build, package, and deploy the application.
- Configurable S3 key prefixes (for source documents and for extracted data) and a configurable filename extension for the extracted data.

AWS CloudFormation Template

Parameter	Default	Description
DocumentBucket	(none, must be supplied)	Name of the S3 bucket that holds the uploaded documents and the extracted data
SourcePrefix	upload/	Key prefix ("folder") that the Lambda function watches for new uploads
TargetPrefix	extracted/	Key prefix ("folder") under which the extracted data is stored
TargetExtension	.extracted.json	Filename extension for the extracted-data files (JSON)
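
The following sketch shows one way these parameters could combine to turn an uploaded object's key into the key of its extracted-data file. It assumes that the target key is the source key with `SourcePrefix` replaced by `TargetPrefix` and with `TargetExtension` appended. The README does not state the exact rule, and the class name `ExtractionKeys` is hypothetical, so treat this as an illustration of the parameters rather than the function's actual code.

```java
import java.util.Optional;

/**
 * Hypothetical sketch: maps a source object key to its extracted-data key.
 * ASSUMPTION: target = TargetPrefix + (key minus SourcePrefix) + TargetExtension.
 * The real function may derive the target key differently.
 */
final class ExtractionKeys {
    private final String sourcePrefix;
    private final String targetPrefix;
    private final String targetExtension;

    ExtractionKeys(String sourcePrefix, String targetPrefix, String targetExtension) {
        this.sourcePrefix = sourcePrefix;
        this.targetPrefix = targetPrefix;
        this.targetExtension = targetExtension;
    }

    /** Uses the default values from the parameter table. */
    static ExtractionKeys defaults() {
        return new ExtractionKeys("upload/", "extracted/", ".extracted.json");
    }

    /** Returns the target key, or empty if the key is outside SourcePrefix. */
    Optional<String> targetKeyFor(String sourceKey) {
        if (!sourceKey.startsWith(sourcePrefix)) {
            // Not under the watched prefix: nothing to extract.
            return Optional.empty();
        }
        String relative = sourceKey.substring(sourcePrefix.length());
        return Optional.of(targetPrefix + relative + targetExtension);
    }
}
```

With the defaults, `upload/docs/report.pdf` maps to `extracted/docs/report.pdf.extracted.json`. A key already under `extracted/` yields an empty result. Keeping `SourcePrefix` and `TargetPrefix` disjoint, as the defaults do, means the function's own output is never treated as a new upload.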
