LilyPads

This is the source code for the LilyPads application. LilyPads was presented at IVAPP 2020; see How to Cite or the NOTICE file for how to reference the publication and the code.

Installation

Building requires npm, sass, and pip. To install the prerequisites, run the following commands:

$ npm install
$ python3 -m venv env
$ source env/bin/activate
(env) $ python setup.py install

Dataset Creation

Input Dataset Format

Input datasets are CSV files with the following columns:

| Column Name | Description |
| --- | --- |
| Index | An index number unique to each row. |
| Date | The publication date, in YYYY-mm-dd format. |
| Title (Newspaper) | Title of the newspaper. |
| Location | Location of publication, as text. |
| Search term | Search term used to find the article (optional). |
| Text | The full text of the article. |
| Language | Language of the document. |
| Corpus | Corpus this was extracted from (optional). |
| Link | URL to the source. |
| place_id | ID of place (see Geolocations). |
| translated | Translated full text (optional); used for the word cloud if present. |
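For illustration, a file in this format could be generated with Python's csv module. This is only a sketch; all field values below are made-up examples:

```python
import csv

# Column order as described above; "Search term" and "Corpus" are optional,
# but the columns should still be present.
COLUMNS = ["Index", "Date", "Title (Newspaper)", "Location", "Search term",
           "Text", "Language", "Corpus", "Link", "place_id", "translated"]

rows = [{
    "Index": 0,
    "Date": "1865-04-15",                      # YYYY-mm-dd
    "Title (Newspaper)": "The Daily Picayune",
    "Location": "New Orleans",
    "Search term": "telegraph",
    "Text": "Full article text goes here.",
    "Language": "en",                          # see also Stop Words below
    "Corpus": "example-corpus",
    "Link": "https://example.com/article/0",
    "place_id": "place-1",                     # must exist in the geolocations data
    "translated": "",                          # empty if no translation is used
}]

# Place the resulting file in the data/ directory for dataset creation.
with open("example.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
```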

Stop Words

Stop words are words that can be filtered out for text analysis because they carry little meaning by themselves. Examples are "the", "a", "for". In the context of OCR (optical character recognition), mis-scanned artifacts can also be considered stop words. Stop words can be provided to the conversion program by placing a newline-separated text file of such words in data/stopwords/stopwords.<lang>.txt, where <lang> is the ISO 639-1 code for the language in question (e.g., stopwords.en.txt).
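As a hypothetical example of the file format and its effect, the following sketch writes a small English stop word list and filters tokens with it (the words and tokens are invented; the real file would live at data/stopwords/stopwords.en.txt):

```python
# A tiny stop word list; "tlie" stands in for an OCR artifact of "the".
stopwords = {"the", "a", "for", "tlie"}

# One word per line, as the conversion program expects.
with open("stopwords.en.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(sorted(stopwords)))

# Load the list back and filter a token stream with it.
with open("stopwords.en.txt", encoding="utf-8") as f:
    loaded = {line.strip() for line in f if line.strip()}

tokens = "the telegraph line for tlie city".split()
filtered = [t for t in tokens if t not in loaded]
# filtered == ["telegraph", "line", "city"]
```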

Geolocations

The program needs to look up the geographical location of places of publication. Each article has a place_id field that references one such location. The geographical data must be placed as GZIP-ed JSON files in the folder data/geolocations.d/ (e.g., data/geolocations.d/geo.json.gz). Each JSON in that directory is a dictionary with the following structure (place-1 is an example for a place_id):

{
  "place-1": {
    "formatted_address": "New Orleans",
    "geometry": {
      "location": {
        "lat": 29.9510658,
        "lng": -90.0715323
      }
    },
    "place_id": "place-1"
  },
  ...
}
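A file with this structure could be produced as follows; this is a sketch using the example entry from above, writing the GZIP-ed JSON that belongs in data/geolocations.d/:

```python
import gzip
import json

geolocations = {
    "place-1": {
        "formatted_address": "New Orleans",
        "geometry": {"location": {"lat": 29.9510658, "lng": -90.0715323}},
        "place_id": "place-1",
    },
}

# Write as GZIP-ed JSON; place the file in data/geolocations.d/.
with gzip.open("geo.json.gz", "wt", encoding="utf-8") as f:
    json.dump(geolocations, f, indent=2)
```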

Metadata

Each dataset must have an associated metadata file. That file has the same file name as the dataset file, but with the ending .meta.json (e.g., dataset.csv -> dataset.meta.json). The JSON contains at least a name field with the name of the dataset, and a roles field with an array of strings, which are the roles that may view the dataset (see User Management). The metadata can contain additional information as required, such as copyright statements or creation dates. All of that information is included in the generated datasets.
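A minimal metadata file could be written like this; the field values beyond name and roles are made-up examples of the optional extra information:

```python
import json

metadata = {
    "name": "Example dataset",            # required
    "roles": ["historians", "admin"],     # required: roles allowed to view it
    "copyright": "Example copyright statement",   # optional extra info
    "created": "2020-02-27",                      # optional extra info
}

# For a dataset stored as dataset.csv, the metadata file is dataset.meta.json.
with open("dataset.meta.json", "w", encoding="utf-8") as f:
    json.dump(metadata, f, indent=2)
```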

Creating the Datasets

The datasets can be built using the Makefile in the data/ directory. The Makefile looks for CSV files in the data/ directory and creates a dataset file from each of them. The Makefile in the root directory will also call the Makefile in the data/ directory.

User Management

The server expects an SQLite3 database in the working directory with the following structure:

CREATE TABLE users (
  id          TEXT PRIMARY KEY,
  password    TEXT,
  expires     DATE,
  roles       TEXT DEFAULT '',
  comment     TEXT DEFAULT NULL
);

The id field is the user's login name, and the password field is a hashed password entry as generated by htpasswd(1). The expires field specifies when an account expires; the user cannot log in after that date. The roles field is a comma-separated list of roles the user is part of. A user is only allowed to see and load datasets that share at least one of these roles.
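Such a database could be set up with Python's sqlite3 module, for example. Note that the password value below is a placeholder, not a real hash; the actual value must come from htpasswd(1):

```python
import sqlite3

conn = sqlite3.connect("users.db")
conn.execute("""CREATE TABLE IF NOT EXISTS users (
  id          TEXT PRIMARY KEY,
  password    TEXT,
  expires     DATE,
  roles       TEXT DEFAULT '',
  comment     TEXT DEFAULT NULL
)""")

# The password must be a hash as produced by htpasswd(1), e.g.:
#   htpasswd -nB alice
# "$placeholder$hash" below is NOT a valid hash.
conn.execute(
    "INSERT INTO users (id, password, expires, roles, comment) VALUES (?, ?, ?, ?, ?)",
    ("alice", "$placeholder$hash", "2030-12-31", "historians,admin", "demo account"),
)
conn.commit()

# A user may load a dataset if their roles intersect the dataset's "roles" list.
roles = conn.execute("SELECT roles FROM users WHERE id = ?", ("alice",)).fetchone()[0]
user_roles = set(roles.split(","))
conn.close()
```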

Build and Deployment

The entire project (JavaScript, assets, CSS, datasets) can be built using the provided Makefile:

$ make
$ # or, for a production build
$ make prod

The built project can then be bundled into a Python wheel file for deployment as follows:

$ source env/bin/activate
(env) $ python setup.py bdist_wheel

This will generate a .whl file in dist/, which can be copied to the appropriate location. To install it there, create a virtual environment and install it using pip:

$ python3 -m venv env
$ source env/bin/activate
(env) $ pip install --upgrade path/to/wheelfile.whl

The server can be started using gunicorn:

$ source env/bin/activate
(env) $ gunicorn -b localhost:8000 lilypads:app

See also the host.sh file for an example. This would be the appropriate place to pass SSL certificates to the gunicorn process. Alternatively, the server can be hosted behind an nginx or Apache httpd reverse proxy.

How to Cite

Max Franke, Markus John, Moritz Knabben, Jana Keck, Tanja Blascheck, and Steffen Koch. LilyPads: Exploring the Spatiotemporal Dissemination of Historical Newspaper Articles. In Proceedings of the 15th International Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 3: IVAPP, pp. 17--28. SciTePress, 2020. DOI:10.5220/0008871400170028.

BibTeX:

@inproceedings{franke2020lilypads,
 author = {Franke, Max and John, Markus and Knabben, Moritz and Keck, Jana and Blascheck, Tanja and Koch, Steffen},
 title = {LilyPads: Exploring the Spatiotemporal Dissemination of Historical Newspaper Articles},
 booktitle = {Proceedings of the 15th International Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 3: IVAPP},
 publisher = {SciTePress},
 year = {2020},
 month = {2},
 pages = {17--28},
 doi = {10.5220/0008871400170028},
 isbn = {978-989-758-402-2},
 organization = {INSTICC}
}