Service that produces Translator compliant nodes given a curie

TomConlin/NodeNormalization

NodeNormalization

Introduction

Node normalization takes a CURIE and returns:

  • The preferred CURIE for this entity
  • All other known equivalent identifiers for the entity
  • Semantic types for the entity as defined by the Biolink Model
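
The three outputs above can be pictured as a lookup over equivalence records. The following is only an in-memory sketch of the idea, not the service's implementation: the record is shaped like the compendium entries shown later in this README, and the names build_index and normalize are hypothetical.

```python
# Illustrative record, shaped like the compendium entries shown later in this README.
RECORDS = [
    {
        "id": {"identifier": "CHEMBL.COMPOUND:CHEMBL1546789", "label": "CHEMBL1546789"},
        "equivalent_identifiers": [
            {"identifier": "CHEMBL.COMPOUND:CHEMBL1546789", "label": "CHEMBL1546789"},
            {"identifier": "PUBCHEM:4879549"},
            {"identifier": "INCHIKEY:FUIYIXDZTPMQEH-UHFFFAOYSA-N"},
        ],
        "type": ["chemical_substance", "named_thing", "biological_entity", "molecular_entity"],
    }
]


def build_index(records):
    """Map every known identifier to the record that contains it (hypothetical helper)."""
    index = {}
    for record in records:
        for equivalent in record["equivalent_identifiers"]:
            index[equivalent["identifier"]] = record
    return index


def normalize(curie, index):
    """Return the preferred CURIE, equivalents, and types, or None if the CURIE is unknown."""
    record = index.get(curie)
    if record is None:
        return None
    return {
        "preferred": record["id"]["identifier"],
        "equivalents": [item["identifier"] for item in record["equivalent_identifiers"]],
        "types": record["type"],
    }
```

With this sketch, normalize("PUBCHEM:4879549", build_index(RECORDS)) resolves to the record whose preferred CURIE is CHEMBL.COMPOUND:CHEMBL1546789.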

The data currently served by Node Normalization is created by the prototype project Babel, which attempts to find identifier equivalences and ensures that CURIE prefixes are Biolink Model compliant. The NodeNormalization service, however, is independent of Babel, and as improved identifier equivalence tools are developed, their results can be easily incorporated.

To determine whether Node Normalization is likely to be useful, check /get_semantic_types, which lists the Biolink semantic types for which normalization has been attempted, and /get_curie_prefixes, which lists the number of times each prefix is used for a semantic type.
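
A minimal sketch of querying these two endpoints with the Python standard library. The base URL is a placeholder assumption, not the address of the public service, and the helper names are hypothetical. The response format is not described here, so the sketch assumes the body is JSON and simply decodes it.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

# Placeholder: replace with the base URL of the instance you are querying.
BASE_URL = "http://localhost:8000/"


def endpoint_url(path, base_url=BASE_URL):
    """Join the base URL and an endpoint path such as 'get_semantic_types'."""
    return urljoin(base_url.rstrip("/") + "/", path.lstrip("/"))


def fetch_json(path, base_url=BASE_URL, timeout=30):
    """GET an endpoint and decode the body, assuming the service answers with JSON."""
    with urlopen(endpoint_url(path, base_url), timeout=timeout) as response:
        return json.load(response)
```

Against a running instance, fetch_json("get_semantic_types") and fetch_json("get_curie_prefixes") would return the decoded responses of the two endpoints.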

For examples of service usage, see the example notebook.

Most users of NodeNormalization can access it via the public service, but instructions follow for standing up a new instance.

Installation

Create a virtual environment

> python -m venv nodeNormalization-env

Activate the virtual environment

# on Linux
> source nodeNormalization-env/bin/activate
# on Windows (from a bash shell such as Git Bash)
> source nodeNormalization-env/Scripts/activate

Install requirements

> pip install -r requirements.txt

Loading Redis

Starting the Redis server

The load script puts data into a running Redis instance. In line with this, we recommend using R3 (Redis-REST with referencing).

Config

Once a redis-server is running, edit the config file located at ./config.json as follows:

{
"compendium_directory": "<path to files>",
"redis_port": <redis-server-port>,
"redis_host": "<redis-host>",
"redis_password": "<redis-password>"
}   
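
Before loading, it can help to check that the config file has the four keys shown above. This is a hedged sketch of such a check, not part of the repository: parse_config and load_config are hypothetical names, and the integer check on redis_port is an assumption based on the port being unquoted in the template.

```python
import json

REQUIRED_KEYS = ("compendium_directory", "redis_port", "redis_host", "redis_password")


def parse_config(text):
    """Parse config JSON text and fail early if a required key is missing."""
    config = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError("config is missing keys: " + ", ".join(missing))
    # Assumption: the port is unquoted in the template, so it should be an integer.
    if not isinstance(config["redis_port"], int):
        raise TypeError("redis_port must be an integer")
    return config


def load_config(path="./config.json"):
    """Read the config file from disk and validate it."""
    with open(path, encoding="utf-8") as handle:
        return parse_config(handle.read())
```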

compendium_directory is the path to the files that will be loaded into the Redis instance. An example of the files' contents looks like:

{"id": {"identifier": "PUBCHEM:50986940"}, "equivalent_identifiers": [{"identifier": "PUBCHEM:50986940"}, {"identifier": "INCHIKEY:CYMOSKLLKPIPCD-UHFFFAOYSA-N"}], "type": ["chemical_substance", "named_thing", "biological_entity", "molecular_entity"]}
{"id": {"identifier": "CHEMBL.COMPOUND:CHEMBL1546789", "label": "CHEMBL1546789"}, "equivalent_identifiers": [{"identifier": "CHEMBL.COMPOUND:CHEMBL1546789", "label": "CHEMBL1546789"}, {"identifier": "PUBCHEM:4879549"}, {"identifier": "INCHIKEY:FUIYIXDZTPMQEH-UHFFFAOYSA-N"}], "type": ["chemical_substance", "named_thing", "biological_entity", "molecular_entity"]}

Each line is a standalone, parsable JSON entry.
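
Because every line is its own JSON document, a compendium file can be read line by line. The following sketch shows one way to do that; read_compendium is a hypothetical helper, not the code in load.py, and the sample line is the first entry above.

```python
import json

# First example entry from this README, as it would appear on one line of a file.
SAMPLE_LINES = [
    '{"id": {"identifier": "PUBCHEM:50986940"}, "equivalent_identifiers": '
    '[{"identifier": "PUBCHEM:50986940"}, '
    '{"identifier": "INCHIKEY:CYMOSKLLKPIPCD-UHFFFAOYSA-N"}], '
    '"type": ["chemical_substance", "named_thing", "biological_entity", "molecular_entity"]}',
    "",  # blank lines are skipped
]


def read_compendium(lines):
    """Yield (preferred_id, equivalent_ids, types) for each non-empty JSON line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        preferred = entry["id"]["identifier"]
        equivalents = [item["identifier"] for item in entry["equivalent_identifiers"]]
        yield preferred, equivalents, entry["type"]
```

An open file handle can be passed in place of SAMPLE_LINES, since iterating over a text file also yields one line at a time.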

Loading

Once the configuration is in place, run:

> cd src
> python load.py
