IKE (Interactive Knowledge Extraction)

Usage Guide

First-time users, please refer to the IKE Getting Started Guide.

Run Locally

1. Install PostgreSQL locally and create a database for use by IKE. IKE needs this to store the tables you create.
2. Modify the Tablestore key value setting in the [IKE config](https://github.com/allenai/ike/blob/master/src/main/resources/application.conf) with the appropriate database JDBC URL and credentials.
3. Run sbt.
4. Enter the reStart command.
5. Open http://localhost:8080 in a browser.
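
As a small aid for the database step above, the sketch below assembles a PostgreSQL JDBC URL in the standard `jdbc:postgresql://host:port/database` form. The host, port, and database name (`ike`) are assumptions for a typical local install, not values taken from the IKE config. Check application.conf for the exact keys that expect this URL.

```python
def postgres_jdbc_url(host: str, port: int, database: str) -> str:
    """Build a PostgreSQL JDBC URL: jdbc:postgresql://host:port/database."""
    return f"jdbc:postgresql://{host}:{port}/{database}"


# Hypothetical local setup: default PostgreSQL port, database named "ike".
print(postgres_jdbc_url("localhost", 5432, "ike"))
```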

The webapp will download some large files from the datastore upon first request. This could take several minutes. You will see a series of messages that look like the following:

ike 2016-05-11 13:46:27,070 INFO  org.allenai.datastore.Datastore - Downloading org.allenai.dictionary.indexes/WaterlooFilteredV2Shard4-d1.zip from the public datastore. 1.23 GB bytes read.
ike 2016-05-11 13:46:28,260 INFO  org.allenai.datastore.Datastore - Downloading org.allenai.dictionary.indexes/WaterlooFilteredV2Shard4-d1.zip from the public datastore. 1.23 GB bytes read.
ike 2016-05-11 13:46:44,521 INFO  org.allenai.datastore.Datastore - Downloading org.allenai.dictionary.indexes/WaterlooFilteredV2Shard4-d1.zip from the public datastore. 1.23 GB bytes read.

On subsequent runs, the service will start up quickly as the downloaded indexes are cached locally.

Creating and using an Index

To create an index, you need the source text either as a directory of text files, or as one file with one document per line. Once you have that, run this in sbt:

ike/runMain org.allenai.ike.index.CreateIndex --help

At the time of writing, this prints:

Usage: CreateIndex [options]
 -d <value> | --destination <value>
       Directory to create the index in
 -b <value> | --batchSize <value>
       Batch size
 -t <value> | --textSource <value>
       URL of a file or directory to load the text from
 --help

The URL for the corpus can be either a file URL or a datastore URL. A datastore URL looks like this: datastore://{public|private}/<group>/<name>-v<version>.<ext> for files, and datastore://{public|private}/<group>/<name>-d<version> for directories.

NOTE: The private datastore resources are for AI2 users only.
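
To make the datastore URL pattern concrete, here is a minimal sketch that formats a URL from its parts, following the two shapes described above (`-v<version>.<ext>` for files, `-d<version>` for directories). The helper name `datastore_url` and the example group and name are hypothetical. This is not part of IKE or the AI2 datastore library.

```python
from typing import Optional


def datastore_url(visibility: str, group: str, name: str, version: int,
                  ext: Optional[str] = None) -> str:
    """Format a datastore URL; pass ext for a file, omit it for a directory."""
    if visibility not in ("public", "private"):
        raise ValueError("visibility must be 'public' or 'private'")
    base = f"datastore://{visibility}/{group}/{name}"
    if ext is None:
        # Directories use the -d<version> suffix.
        return f"{base}-d{version}"
    # Files use the -v<version>.<ext> suffix.
    return f"{base}-v{version}.{ext}"


# Hypothetical group and corpus names, for illustration only.
print(datastore_url("public", "org.example.corpora", "MyCorpus", 1))
print(datastore_url("public", "org.example.corpora", "MyCorpus", 2, "txt"))
```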

You can also point to a corpus directory in your file system by using a file:// URL with an absolute path, e.g., file:///path/to/my/corpus/directory.
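
The sketch below shows one way to turn an absolute local path into a file URL and assemble the corresponding sbt command line from the options listed in the help output. The helper names and example paths are hypothetical. The sketch does no shell quoting, so it assumes the destination path contains no spaces, and how CreateIndex itself parses file URLs has not been verified here.

```python
from typing import Optional
from urllib.parse import quote


def file_url(absolute_path: str) -> str:
    """Turn an absolute POSIX path into a file URL with an empty host part."""
    if not absolute_path.startswith("/"):
        raise ValueError("expected an absolute path")
    # quote() percent-encodes characters such as spaces but leaves '/' alone.
    return "file://" + quote(absolute_path)


def create_index_command(text_source: str, destination: str,
                         batch_size: Optional[int] = None) -> str:
    """Assemble the line to type at the sbt prompt."""
    parts = [
        "ike/runMain", "org.allenai.ike.index.CreateIndex",
        "--textSource", text_source,
        "--destination", destination,
    ]
    if batch_size is not None:
        parts += ["--batchSize", str(batch_size)]
    return " ".join(parts)


# Hypothetical paths, for illustration only.
print(create_index_command(file_url("/data/corpus"), "/data/index"))
```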

When you have created the index, you can use it by modifying application.conf and restarting.

Index Size Limits

A BlackLab index is typically 12-13x the size of the input corpus. Our testing on an Amazon r3.2xlarge instance indicated that an index of up to 2 GB could be processed with reasonable speed when loaded into IKE. If you have a large corpus, one way to work around the size limit is to shard the corpus and create multiple indexes, each within the size limit, to load into IKE.
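
As a rough planning aid, the arithmetic above can be sketched as follows. The function estimates how many shards a corpus needs so that each index stays under the limit. It assumes the pessimistic 13x expansion factor, reads "2 GB" as 2 GiB, and assumes the corpus can be split evenly. These are illustrative assumptions rather than measured values.

```python
import math


def estimate_shards(corpus_bytes: int, expansion: float = 13.0,
                    max_index_bytes: int = 2 * 1024**3) -> int:
    """Estimate the number of corpus shards needed to respect the index limit."""
    estimated_index_bytes = corpus_bytes * expansion
    # Always plan for at least one shard, even for a tiny corpus.
    return max(1, math.ceil(estimated_index_bytes / max_index_bytes))


# A 1 GiB corpus expands to roughly 13 GiB of index, so it needs 7 shards.
print(estimate_shards(1024**3))
```

With these defaults, each shard of the input corpus should be at most about 2 GiB / 13, or roughly 157 MiB.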

Supported Platforms

IKE has been built, tested, and verified to work on Linux and Mac systems. If you are interested in developing or running IKE on Windows, please see the instructions from [diniluca1789](https://github.com/diniluca1789), an external IKE user who successfully built and ran it on Windows, as described in this thread.

AI2 Internal Information

AI2 internal users, please go to this link.
