Data Management Using Python Library

https://data-management-python.readthedocs.io

This repository contains the core Python library developed and maintained by the NIHR Imperial BRC Genomics Facility for managing raw and processed genomic datasets efficiently.

Key Features

1. Metadata Management

Uses an extended ENA (European Nucleotide Archive) metadata model to manage information about:
- Projects
- Samples
- Sequencing runs
- Analyses
- File paths
- Pipeline instances
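
To make the relationships between these entities concrete, the following is a minimal sketch of how projects, samples, runs, and file paths might nest. The class names, fields, and the `files_for_project` helper are hypothetical and chosen for illustration; they are not the library's actual classes or database schema.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical record types for illustration only; these are not the
# library's actual classes or database tables.


@dataclass
class Run:
    run_id: str
    file_paths: List[str] = field(default_factory=list)


@dataclass
class Sample:
    sample_id: str
    runs: List[Run] = field(default_factory=list)


@dataclass
class Project:
    project_id: str
    samples: List[Sample] = field(default_factory=list)


def files_for_project(project: Project) -> List[str]:
    """Collect every file path registered under a project."""
    return [
        path
        for sample in project.samples
        for run in sample.runs
        for path in run.file_paths
    ]


# Example identifiers and paths are made up.
project = Project(
    project_id="project_a",
    samples=[
        Sample(
            sample_id="sample_1",
            runs=[
                Run(
                    run_id="run_1",
                    file_paths=["/data/run_1/sample_1_R1.fastq.gz"],
                )
            ],
        )
    ],
)
print(files_for_project(project))
```

Keeping file paths attached to runs, and runs attached to samples, means a single traversal from the project can answer "which files belong to this project".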

2. Genomic Sequencing Run Processing

Tracks ongoing sequencing runs and initiates processing upon completion.
Generates summary reports and sends email notifications to users.
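
The tracking step above can be pictured as a periodic scan of the sequencer output directory. The sketch below assumes an Illumina-style layout in which a marker file such as `RTAComplete.txt` appears in a run folder once the run has finished. The marker name, the `find_completed_runs` helper, and the directory names are assumptions made for illustration; the library's actual detection logic may differ.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List

# Assumption: a marker file with this name signals a finished run.
COMPLETION_MARKER = "RTAComplete.txt"


def find_completed_runs(
    seq_root: Path, already_processed: Iterable[str]
) -> List[Path]:
    """Return run folders that contain the marker and are not yet processed."""
    seen = set(already_processed)
    completed = []
    for run_dir in sorted(p for p in seq_root.iterdir() if p.is_dir()):
        if run_dir.name in seen:
            continue
        if (run_dir / COMPLETION_MARKER).exists():
            completed.append(run_dir)
    return completed


# Small demonstration using a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "run_a").mkdir()
    (root / "run_a" / COMPLETION_MARKER).touch()
    (root / "run_b").mkdir()  # still sequencing, no marker yet
    ready = [p.name for p in find_completed_runs(root, already_processed=[])]

print(ready)
```

Passing the names of runs that were already handled keeps the scan idempotent, so a scheduler can call it repeatedly without starting the same run twice.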

3. Analysis Pipelines

Includes wrappers for both community-developed and vendor-provided data pipelines.
Automates:
- Configuration generation
- Input formatting
Executes external pipelines on HPC using bash script wrappers.
Manages post-processing, including:
- Custom report generation
- Analysis data validation
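
As a rough illustration of the bash wrapper approach, the sketch below renders a small shell script from a template. The template text, the `render_wrapper` function, and the example command are hypothetical; the templates and pipeline commands the library actually uses are not described in this README.

```python
import shlex
from string import Template

# Hypothetical wrapper template; the real templates and pipeline commands
# used by the library are not shown in this README.
WRAPPER_TEMPLATE = Template(
    "#!/bin/bash\n"
    "set -euo pipefail\n"
    "cd $work_dir\n"
    "$command\n"
)


def render_wrapper(work_dir: str, command: list) -> str:
    """Fill the template, shell-quoting every value that is substituted."""
    return WRAPPER_TEMPLATE.substitute(
        work_dir=shlex.quote(work_dir),
        command=" ".join(shlex.quote(arg) for arg in command),
    )


script = render_wrapper("/tmp/analysis 1", ["echo", "pipeline finished"])
print(script)
```

Quoting each argument with `shlex.quote` keeps paths that contain spaces from being split by the shell. The rendered text could then be written to a file and submitted to the HPC scheduler, or run locally with `subprocess.run(["bash", script_path], check=True)`.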

Requirements

- Python v3.9

Installation

1. Clone the Repository

git clone https://github.com/imperial-genomics-facility/data-management-python.git

2. Install Dependencies

Install the required Python libraries:

pip install -r requirements_2.6.2.txt  # For compatibility with Apache Airflow v2.6.2

3. Update PYTHONPATH

Add the core library path to PYTHONPATH, replacing /PATH with the directory into which you cloned the repository:

export PYTHONPATH=/PATH/data-management-python
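
One way to confirm that the PYTHONPATH change took effect is to ask Python whether it can locate the library without importing it. This is a small sketch: the `is_importable` helper is hypothetical, and `igf_data` is assumed to be the top-level package name based on the repository's directory layout.

```python
import importlib.util


def is_importable(package_name: str) -> bool:
    """Return True if Python can locate the package on the current path."""
    return importlib.util.find_spec(package_name) is not None


# Assumption: igf_data is the top-level package name, taken from the
# repository's directory listing.
for name in ("igf_data",):
    status = "found" if is_importable(name) else "NOT found; check PYTHONPATH"
    print(f"{name}: {status}")
```

`importlib.util.find_spec` only searches for the package, so the check does not require the library's dependencies to load cleanly.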

License

This project is licensed under the Apache-2.0 License. See the LICENSE file for details.
