About B3DB

In this repo, we present a large benchmark dataset, the Blood-Brain Barrier Database (B3DB), compiled from 50 published resources (as summarized in raw_data/raw_data_summary.tsv) and categorized based on experimental uncertainty. A subset of the molecules in B3DB has numerical logBB values (1058 compounds), while the whole dataset has categorical (BBB+ or BBB-) BBB permeability labels (7807 compounds). Some physicochemical properties of the molecules are also provided.

Citation

Please use the following citation in any publication using B3DB:

"B3DB: A Curated Database of Blood-Brain Barrier Permeability and Chemical Descriptors for a Diverse Set of Compounds", F. Meng, et al.

To be updated once the publication is out.

Features of B3DB

The largest dataset with numerical and categorical values for blood-brain barrier permeability of small molecules (to the best of our knowledge as of 2021/Feb/25).
Inclusion of stereochemistry information through isomeric SMILES with chiral specifications where available. Otherwise, canonical SMILES are used.
Characterization of uncertainty of experimental measurements by grouping the collected molecular data records.
Extended datasets for numerical and categorical data with precomputed physicochemical properties using mordred.

Usage

There are two types of datasets in B3DB, regression data and classification data, and both can be loaded directly with the pandas library. For example,

import pandas as pd

# load regression data
regression_data = pd.read_csv("B3DB/B3DB_regression.tsv",
                              sep="\t")

# load classification data
classification_data = pd.read_csv("B3DB/B3DB_classification.tsv",
                                  sep="\t")
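Once a table is loaded, a quick check of the class balance is often the first step. The following is a minimal sketch, not part of the B3DB repository: the column names `compound_name` and `BBB+/BBB-`, the helper names `load_tsv` and `label_counts`, and the three placeholder rows are all assumptions made for illustration. Check the header of the real file and adjust `label_col` accordingly.

```python
import io

import pandas as pd

# Made-up placeholder rows standing in for B3DB/B3DB_classification.tsv.
# These are NOT real B3DB records; the column names are assumptions.
SAMPLE_TSV = (
    "compound_name\tBBB+/BBB-\n"
    "placeholder_a\tBBB+\n"
    "placeholder_b\tBBB-\n"
    "placeholder_c\tBBB+\n"
)


def load_tsv(path_or_buffer):
    """Read a tab-separated file (or file-like object) into a DataFrame."""
    return pd.read_csv(path_or_buffer, sep="\t")


def label_counts(df, label_col="BBB+/BBB-"):
    """Return a dict mapping each label in `label_col` to its row count."""
    if label_col not in df.columns:
        raise KeyError(f"column {label_col!r} not found; got {list(df.columns)}")
    return df[label_col].value_counts().to_dict()


sample = load_tsv(io.StringIO(SAMPLE_TSV))
counts = label_counts(sample)
print(counts)
```

To run the same check on the real data, pass the file path instead of the in-memory buffer, for example `label_counts(load_tsv("B3DB/B3DB_classification.tsv"))`, provided the label column name matches.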

Setting up the working environment

All the calculations were performed in a Python 3.7.9 virtual environment created with Conda on CentOS Linux release 7.9.2009. The environment includes the following Python packages:

ChEMBL_Structure_Pipeline==1.0.0, https://github.com/chembl/ChEMBL_Structure_Pipeline/
RDKit==2020.09.1, https://www.rdkit.org/
openeye-toolkits==2020.2.0, https://docs.eyesopen.com/toolkits/python/index.html/
mordred==1.1.2, https://github.com/mordred-descriptor/mordred/ (required networkx==2.3.0)
numpy==1.19.2, https://numpy.org/
pandas==1.2.1, https://pandas.pydata.org/
pubchempy==1.0.4, https://github.com/mcs07/PubChemPy/
PyTDC==0.1.5, https://github.com/mims-harvard/TDC/
SciPy==1.5.2, https://www.scipy.org/
tabula-py==2.2.0, https://pypi.org/project/tabula-py/

First, we create a virtual environment named bbb_py37 with Python 3.7.9,

conda create -n bbb_py37 python=3.7.9

Because RDKit and ChEMBL_Structure_Pipeline are not available on PyPI, we install them, together with the OpenEye toolkits, using conda,

# activate virtual environment
conda activate bbb_py37

conda install -c rdkit rdkit=2020.09.1.0
conda install -c conda-forge chembl_structure_pipeline=1.0.0
# https://docs.eyesopen.com/toolkits/python/quickstart-python/linuxosx.html
conda install -c openeye openeye-toolkits=2020.2.0

Then we can install the requirements in requirements.txt with

pip install -r requirements.txt

An easier way is to run the following script with bash,

#!/bin/bash

# create virtual environment
conda create -n bbb_py37 python=3.7.9
# activate virtual environment
conda activate bbb_py37

# install required packages
conda install -c rdkit rdkit=2020.09.1.0
conda install -c conda-forge chembl_structure_pipeline=1.0.0
pip install -r requirements.txt
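After installation, it can be useful to confirm that the pip-installed packages match the pinned versions listed above. The sketch below is only an illustration and is not shipped with the repository: the `PINNED` dictionary and the `check_versions` helper are hypothetical names, and the conda-installed packages (RDKit, ChEMBL_Structure_Pipeline, OpenEye) are left out because they may not expose pip-style metadata. On Python 3.7 it assumes the `importlib_metadata` backport is installed.

```python
try:
    from importlib import metadata  # standard library in Python 3.8+
except ImportError:
    # Assumption: the importlib_metadata backport is installed on Python 3.7.
    import importlib_metadata as metadata

# Pinned versions copied from the package list above (pip-installed ones only).
PINNED = {
    "numpy": "1.19.2",
    "pandas": "1.2.1",
    "scipy": "1.5.2",
    "mordred": "1.1.2",
    "PubChemPy": "1.0.4",
    "PyTDC": "0.1.5",
    "tabula-py": "2.2.0",
}


def check_versions(pinned, get_version=metadata.version):
    """Map each package to (wanted, found, matches); found is None if missing."""
    report = {}
    for name, wanted in pinned.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        report[name] = (wanted, found, found == wanted)
    return report


for name, (wanted, found, ok) in check_versions(PINNED).items():
    status = "OK" if ok else "MISMATCH"
    print(f"{name}: wanted {wanted}, found {found} [{status}]")
```

The `get_version` parameter exists only so the lookup can be swapped out, for example in a test; in normal use the default is sufficient.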

ALOGPS version 2.1 can be accessed at http://www.vcclab.org/lab/alogps/.

The materials and data under this repo are under CC-BY-4.0 Licence.
