
Kevnz/star-wars-dataset

 
 


Star Wars Dataset

Sample dataset for test purposes.

Simply use the files in the json or xml directory, depending on your needs.
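As a minimal sketch of consuming the JSON files from Node.js, assuming the repository has been cloned locally: the helper name `loadJsonDir` is hypothetical (it is not part of this repository), and it walks the directory recursively so that it makes no assumption about how the files are laid out under `json`.

```javascript
'use strict';

const fs   = require('fs');
const path = require('path');

// Hypothetical helper (not part of this repository): recursively collect and
// parse every *.json file under `dir`. Returns an object mapping each file
// path, relative to the starting directory, to its parsed content.
function loadJsonDir(dir, root = dir, out = {}) {
    for ( const entry of fs.readdirSync(dir, {withFileTypes: true}) ) {
        const full = path.join(dir, entry.name);
        if ( entry.isDirectory() ) {
            // descend into sub-directories
            loadJsonDir(full, root, out);
        }
        else if ( entry.isFile() && entry.name.endsWith('.json') ) {
            out[path.relative(root, full)] = JSON.parse(fs.readFileSync(full, 'utf8'));
        }
    }
    return out;
}

// Usage, assuming the current directory is the repository root:
// const docs = loadJsonDir('json');
// console.log(Object.keys(docs).length);
```

Everything is read into memory at once, which should be fine for a sample dataset; for anything larger, processing the files one at a time would be the safer choice.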

If you want to tweak the way the data is collected, here is how it works. The initial dataset is fetched from the awesome SWAPI, the Star Wars API (using scrapper.js). It is then enriched a bit with description texts from Wikipedia (using enrich.js). Finally, it is split up into individual files, both JSON and XML (using entities.js).

$ npm install
$ ./src/scrapper.js > data/swapi.json
$ ./src/enrich.js   > data/enriched.json
$ ./src/entities.js

To generate the data archive files:

$ cd ..
$ tar zcf star-wars-dataset/archive/star-wars-dataset.tar.gz star-wars-dataset/{README.md,data,csv,json,mlsem,ttl,xml}
$ zip -r star-wars-dataset/archive/star-wars-dataset.zip star-wars-dataset/{README.md,data,csv,json,mlsem,ttl,xml}

MarkLogic

If you want to insert the files into a MarkLogic database, you can look at the following code and adapt it as you see fit:

'use strict';

declareUpdate();

const base = '/';
const url  = 'https://raw.githubusercontent.com/fgeorges/star-wars-dataset/master/archive/star-wars-dataset.zip';

const got = xdmp.httpGet(url);
if ( fn.head(got).code !== 200 ) {
    throw new Error(`Not 200: ${fn.head(got).code} - ${fn.head(got).message}`);
}

const zip = fn.head(fn.tail(got));
const res = {skipped: 0, inserted: 0};

for ( const entry of xdmp.zipManifest(zip) ) {
    if ( ! entry.uncompressedSize ) {
        // skip directories (and any empty file): they have no uncompressed size
        ++ res.skipped;
        continue;
    }
    const uri = base + entry.path;
    const doc = xdmp.zipGet(zip, entry.path);
    xdmp.documentInsert(uri, doc);
    ++ res.inserted;
}

res;
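One way to adapt the script above is to load only some of the top-level directories of the archive, for instance only `json`. The following is a sketch under stated assumptions, not part of this repository: `planInserts` is a hypothetical helper, and it relies only on the two entry fields the script already reads (`path` and `uncompressedSize`). It also assumes entry paths start with `star-wars-dataset/`, as produced by the `zip` command shown earlier.

```javascript
'use strict';

// Hypothetical helper: decide which archive entries to insert, and under which URI.
// `entries` is any iterable of {path, uncompressedSize} objects.
// `dirs` lists the top-level archive directories to keep, e.g. ['json'].
function planInserts(entries, base, dirs) {
    const plan = {skipped: 0, inserts: []};
    for ( const entry of entries ) {
        // The second path segment is the directory inside the archive root,
        // e.g. 'json' in 'star-wars-dataset/json/...'.
        const top = entry.path.split('/')[1];
        if ( ! entry.uncompressedSize || ! dirs.includes(top) ) {
            // skip directories, empty files, and anything outside `dirs`
            ++ plan.skipped;
            continue;
        }
        plan.inserts.push({path: entry.path, uri: base + entry.path});
    }
    return plan;
}
```

Within MarkLogic, the loop body of the script would then iterate over `planInserts(xdmp.zipManifest(zip), base, ['json']).inserts` and call `xdmp.zipGet` and `xdmp.documentInsert` for each item. Keeping the selection logic in a plain function makes it possible to check it outside the database.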
