JamesMcGuigan/elasticsearch-tweets

Kaggle Tweets with ElasticSearch

ElasticSearch visualization of the Kaggle Disaster Tweets dataset using React Google Maps.

The Kaggle CSV files are enriched with geolocation data from the Google API and imported through a Node.js data pipeline that uses bulk upload and scan-and-scroll. The index uses a strict JSON schema, and the SchemaUpdate.mjs script reindexes it with zero downtime. The dataset is visualized with React Google Maps, with a dynamic search and filtering UI. The repository also includes extensive documentation on ElasticSearch features, queries and upgrade methodology.
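As an illustration of the bulk-upload step, the following is a minimal sketch of how parsed CSV rows could be turned into a request body for the Elasticsearch `_bulk` endpoint. The helper name `toBulkBody` and the example rows are hypothetical and are not taken from this repository; the actual ingest.mjs may work differently.

```javascript
// Sketch only: toBulkBody is a hypothetical helper, not code from this repository.
// It builds the newline-delimited JSON (NDJSON) body that the Elasticsearch
// _bulk endpoint expects: one action line followed by one document line.
function toBulkBody(index, docs) {
  const lines = [];
  for (const doc of docs) {
    // Action line: index this document under its own id.
    lines.push(JSON.stringify({ index: { _index: index, _id: String(doc.id) } }));
    // Source line: the document itself.
    lines.push(JSON.stringify(doc));
  }
  // The bulk API requires the body to end with a newline.
  return lines.join('\n') + '\n';
}

// Illustrative rows only (made up for this example).
const body = toBulkBody('twitter', [
  { id: 1, location: 'London', text: 'example tweet one' },
  { id: 2, location: '', text: 'example tweet two' },
]);
console.log(body);
```

The resulting string would be sent as the body of a `POST /_bulk` request with the `Content-Type: application/x-ndjson` header.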

ElasticSearch Writeups

ElasticSearch Schemas

Javascript Code Examples

ElasticSearch Hosting with Bonsai

.env

The .env files contain secrets and are not committed to GitHub. Copy the templates and fill in your own values:

cp ./.env.template       ./.env 
cp ./.env.local.template ./.env.local 
vim ./.env ./.env.local

cat .env

USERNAME=
PASSWORD=
ELASTICSEARCH=kaggle-tweets-7601590568.eu-west-1.bonsaisearch.net:443
INDEX=twitter
ELASTICSEARCH_URL=https://$USERNAME:$PASSWORD@$ELASTICSEARCH
SCHEMA=server/schema.json5
GEOCODE_API_KEY=
MAPS_API_KEY=

cat .env.local

NEXT_PUBLIC_USERNAME=
NEXT_PUBLIC_PASSWORD=
NEXT_PUBLIC_ELASTICSEARCH=kaggle-tweets-7601590568.eu-west-1.bonsaisearch.net:443
NEXT_PUBLIC_ELASTICSEARCH_URL=https://${NEXT_PUBLIC_USERNAME}:${NEXT_PUBLIC_PASSWORD}@kaggle-tweets-7601590568.eu-west-1.bonsaisearch.net:443
NEXT_PUBLIC_INDEX=twitter
NEXT_PUBLIC_MAPS_API_KEY=
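The ELASTICSEARCH_URL value above is composed from the username, password and host variables. As a rough sketch, the same composition could be done in Node.js with the built-in `URL` class, which percent-encodes special characters in the credentials. The helper name `buildElasticsearchUrl` and the example values are hypothetical, not part of this repository.

```javascript
// Sketch only: buildElasticsearchUrl is a hypothetical helper, not code from this repository.
// It mirrors ELASTICSEARCH_URL=https://$USERNAME:$PASSWORD@$ELASTICSEARCH from the .env file.
function buildElasticsearchUrl(env) {
  const url = new URL(`https://${env.ELASTICSEARCH}`);
  // Assigning through the URL properties percent-encodes characters
  // such as "@" or ":" that would otherwise break the URL.
  url.username = env.USERNAME || '';
  url.password = env.PASSWORD || '';
  return url.toString();
}

// Example values only; real credentials belong in .env.
console.log(buildElasticsearchUrl({
  USERNAME: 'user',
  PASSWORD: 'p@ss',
  ELASTICSEARCH: 'example.bonsaisearch.net:443',
}));
```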

Create Index and Reingest

nvm use --lts
bash ./server/schema.sh     
node --experimental-modules ./server/ingest.mjs
node --experimental-modules ./server/geocode.mjs 
node --experimental-modules ./server/deleteOverage.mjs

{"acknowledged":true}
{"acknowledged":true,"shards_acknowledged":true,"index":"twitter"}
green open twitter   Qews-7jyTPGhTCb45z3eyA 1 1 0 0   460b   230b
green open .kibana_1 XsYy7txoR8Oa178heSj9OA 1 1 8 0 97.6kb 35.4kb

ingest: 0 documents in kaggle-tweets-7601590568.eu-west-1.bonsaisearch.net:443/twitter
ingest: ./input/test.csv     into 3263 documents in 421ms
ingest: ./input/train.csv    into 7613 documents in 990ms
ingest: 10876 documents in kaggle-tweets-7601590568.eu-west-1.bonsaisearch.net:443/twitter
geocode: updated 198 documents in 192ms for kaggle-tweets-7601590568.eu-west-1.bonsaisearch.net:443/twitter
geocode: updated 164 documents in 124ms for kaggle-tweets-7601590568.eu-west-1.bonsaisearch.net:443/twitter
geocode: updated 228 documents in 161ms for kaggle-tweets-7601590568.eu-west-1.bonsaisearch.net:443/twitter
geocode: updated 170 documents in 117ms for kaggle-tweets-7601590568.eu-west-1.bonsaisearch.net:443/twitter
geocode: updated 112 documents in 86ms for kaggle-tweets-7601590568.eu-west-1.bonsaisearch.net:443/twitter
geocode: updated 104 documents in 105ms for kaggle-tweets-7601590568.eu-west-1.bonsaisearch.net:443/twitter
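The log above shows documents being updated in batches. A scan-and-scroll loop that pages through an index in this way might look roughly like the sketch below. The generator `scrollAll` and the injected `request(method, path, body)` function are hypothetical and assumed to return the parsed JSON response; the paths used are those of the standard Elasticsearch scroll API, and this is not necessarily how geocode.mjs is written.

```javascript
// Sketch only: scrollAll and the injected `request` function are hypothetical,
// not code from this repository. `request(method, path, body)` is assumed to
// perform the HTTP call and resolve to the parsed JSON response.
async function* scrollAll(request, index, query, size = 100) {
  // Open a scroll context with the first search request.
  let page = await request('POST', `/${index}/_search?scroll=1m`, { size, query });
  while (page.hits.hits.length > 0) {
    // Hand one batch of hits to the caller.
    yield page.hits.hits;
    // Fetch the next batch using the scroll id from the previous response.
    page = await request('POST', '/_search/scroll', {
      scroll: '1m',
      scroll_id: page._scroll_id,
    });
  }
  // Release the scroll context once all batches have been read.
  if (page._scroll_id) {
    await request('DELETE', '/_search/scroll', { scroll_id: page._scroll_id });
  }
}
```

A caller could consume it with `for await (const hits of scrollAll(request, 'twitter', { match_all: {} })) { ... }`, processing each batch before the next one is fetched.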

Update Schema and Reindex

With a .env file

node --experimental-modules ./server/SchemaUpdate.mjs

Without a .env file (or to override its values)

node --experimental-modules ./server/SchemaUpdate.mjs \
--schema ./server/schema.json5 \
--alias twitter \
--elasticsearch https://kaggle-tweets-7601590568.eu-west-1.bonsaisearch.net:443 \
--username username \
--password password
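The `--alias` option suggests that the script works against an index alias. One common way to achieve zero-downtime reindexing is to copy the data into a new index and then move the alias in a single atomic request. The sketch below builds such a request for the Elasticsearch `_aliases` endpoint; the helper name `aliasSwapRequest` and the versioned index names are assumptions for illustration, and SchemaUpdate.mjs may be implemented differently.

```javascript
// Sketch only: aliasSwapRequest is a hypothetical helper, not code from this repository.
// It describes a single POST /_aliases call that moves an alias from the old
// index to the new one; both actions are applied atomically by Elasticsearch.
function aliasSwapRequest(alias, oldIndex, newIndex) {
  return {
    method: 'POST',
    path: '/_aliases',
    body: {
      actions: [
        { remove: { index: oldIndex, alias } },
        { add: { index: newIndex, alias } },
      ],
    },
  };
}

// Index names are made up for this example.
const swap = aliasSwapRequest('twitter', 'twitter_v1', 'twitter_v2');
console.log(JSON.stringify(swap, null, 2));
```

Because clients only ever query the alias, they would see either the old index or the new one, never a partially built index.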