Any contribution to DGL is welcome. This guide covers everything about how to contribute to DGL.
A non-exhaustive list of contribution types is as follows:
- New features and enhancements.
- New NN modules.
- Bug fixes.
- Documentation improvements.
- New models and examples.
For features and bug fixes, we recommend first raising an issue using the corresponding issue template, so that the change can be fully discussed with the community before implementation. For documentation improvements and new models, we suggest posting a thread in our discussion forum.
Before development, please first read the following sections about coding styles and testing. All changes need to be reviewed in the form of pull requests. Our committers (who have write permission on the repository) will review the code and suggest any necessary changes. The PR can be merged once the reviewers approve the changes.
First, fork the DGL GitHub repository. Suppose the forked repo is https://github.com/username/dgl.
Clone your forked repository locally:
git clone --recursive https://github.com/username/dgl.git
Set up the upstream remote to point to the official DGL repository:
git remote add upstream https://github.com/dmlc/dgl.git
You can verify the remote settings by running git remote -v:
origin https://github.com/username/dgl.git (fetch)
origin https://github.com/username/dgl.git (push)
upstream https://github.com/dmlc/dgl.git (fetch)
upstream https://github.com/dmlc/dgl.git (push)
During development, we suggest working on a branch other than master:
git branch working-branch
git checkout working-branch
Once the changes are done, create a pull request so we can review your code.
Once the pull request is merged, update your forked repository and delete your working branch:
git checkout master
git pull upstream master
git push origin master # update your forked repo
git branch -D working-branch # the local branch could be deleted
For Python code, we generally follow the PEP8 style guide. Python comments follow NumPy-style docstrings.
For C++ code, we generally follow the Google C++ style guide. C++ comments should be Doxygen compatible.
A coding style check is mandatory for every pull request. To ease development, please run it locally first (this requires cpplint and pylint to be installed):
bash tests/scripts/task_lint.sh
The Python code style configuration file is tests/scripts/pylintrc. We tweak it a little from the standard. For example, the following variable names are accepted:
- i, j, k: for loop variables
- u, v: for representing nodes
- e: for representing edges
- g: for representing graph
- fn: for representing functions
- n, m: for representing sizes
- w, x, y: for representing weight, input, output tensors
- _: for unused variables
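To make these conventions concrete, here is a small, self-contained sketch that combines the accepted short names with a NumPy-style docstring. The function out_degrees and its edge-list input are hypothetical and exist only for illustration; they are not part of the DGL API.

```python
def out_degrees(n, edges):
    """Count the out-degree of every node in an edge list.

    Hypothetical helper for illustration only; it is not a DGL function.

    Parameters
    ----------
    n : int
        Number of nodes.
    edges : list of tuple of int
        Directed edges given as ``(u, v)`` pairs.

    Returns
    -------
    list of int
        ``result[u]`` is the number of edges leaving node ``u``.
    """
    degs = [0] * n
    for u, _ in edges:  # u: source node; _: unused destination node
        degs[u] += 1
    return degs


print(out_degrees(3, [(0, 1), (0, 2), (2, 1)]))  # prints [2, 0, 1]
```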
To contribute a new model within a specific supported tensor framework (e.g. PyTorch or MXNet), simply:
1. Make a directory with the name of your model (say awesome-gnn) within the directory examples/${DGLBACKEND}, where ${DGLBACKEND} refers to the framework name.
2. Populate it with your work, along with a README. Make a pull request once you are done. Your README should contain at least these:
   - Instructions for running your program.
   - The performance results, such as speed, accuracy, or any other metric, along with comparisons against alternative implementations (if available).
     - Your performance metrics do not have to beat other implementations; they are just a signal that your code is likely correct.
     - Your speed also does not have to surpass others'.
     - However, better numbers are always welcome.
3. The committers will review it, suggesting or making changes as necessary.
4. Resolve the suggestions and reviews, and go back to step 3 until approved.
5. Merge it and enjoy your day.
One often wishes to upload a dataset when contributing a new runnable model example, especially when covering a new field not in our existing examples.
Uploading data files directly into the Git repository is a bad idea, because everyone who clones the repository would then have to download the dataset whether they need it or not. Instead, we strongly suggest hosting the data files on a permanent cloud storage service (e.g. Dropbox, Amazon S3, Baidu, Google Drive, etc.).
One can either
- Make your scripts automatically download your data if possible (e.g. when using Amazon S3), or
- Clearly state the instructions for downloading your dataset (e.g. when using Baidu, where auto-downloading is hard).
If you have trouble doing so (e.g. you cannot find a permanent cloud storage), feel free to post in our discussion forum.
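As an illustration of the automatic-download option, the following is a minimal sketch using only the Python standard library. The helper name download_if_missing is hypothetical and not part of DGL; it skips the download when a cached copy already exists.

```python
import os
import urllib.request


def download_if_missing(url, path):
    """Download ``url`` to ``path`` unless the file already exists.

    Returns True if a download happened and False if the cached copy was used.
    """
    if os.path.exists(path):
        return False
    dirname = os.path.dirname(path)
    if dirname:
        os.makedirs(dirname, exist_ok=True)
    # Download to a temporary name first so that an interrupted transfer
    # never leaves a truncated file at the final path.
    tmp_path = path + ".part"
    urllib.request.urlretrieve(url, tmp_path)
    os.replace(tmp_path, path)
    return True
```

A training script could call this helper once at startup, passing the permanent URL of the dataset and a local cache path, before loading the data.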
Depending on how widely used the contributed task, model, or dataset is, we (the DGL team) may migrate your dataset to the official DGL Dataset Repository on Amazon S3. If you wish to host a particular dataset there, you can either
- DIY: make changes in the dgl.data module; see our :ref:`dataset APIs <apidata>` for more details, or
- Post in our discussion forum (again).
Currently, all the datasets of DGL model examples are hosted on Amazon S3.
We call a feature that goes into the Python dgl package a core feature.
Since DGL supports multiple tensor frameworks, contributing a core feature is no easy job. However, we do NOT require knowledge of all tensor frameworks. Instead,
1. Before making a pull request, please make sure your code is covered with unit tests on at least one supported framework; see the Building and Testing section for details.
2. Once you have done that, make a pull request, summarize your changes, and wait for the CI to finish.
3. If the CI fails on a tensor platform that you are unfamiliar with (which is often the case), please refer to the Supporting Multiple Platforms section.
4. The committers will review it, suggesting or making changes as necessary.
5. Resolve the suggestions and reviews, and go back to step 3 until approved.
6. Merge it and enjoy your day.
This is the hard one, but you don't have to know PyTorch AND MXNet (maybe AND TensorFlow, AND Chainer, etc., in the future) to do so. The rule of thumb for supporting multiple platforms is simple:
- In the dgl Python package, always avoid using framework-specific operators (including array indexing!) directly. Use the wrappers in dgl.backend or numpy arrays instead.
- If you have trouble doing so (either because dgl.backend does not cover the necessary operator, or you don't have a GPU, or for whatever reason), please label your PR with the backend support tag, and one or more DGL team members who understand CPU AND GPU AND PyTorch AND MXNet (AND TensorFlow AND Chainer AND etc.) will look into it.
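The first rule can be illustrated with a toy example. The sketch below is not DGL code: ListBackend is a hypothetical stand-in for a unified tensor interface, implemented on plain Python lists so that it runs anywhere, and its function names are assumptions chosen purely for illustration. The point is that gather_and_sum only calls wrapper functions, so switching to another backend would not require touching it.

```python
class ListBackend:
    """Hypothetical stand-in for a unified tensor interface (NOT dgl.backend)."""

    @staticmethod
    def gather_rows(data, index):
        # Row selection is hidden behind a wrapper because indexing
        # syntax and semantics differ between tensor frameworks.
        return [data[i] for i in index]

    @staticmethod
    def sum_rows(data):
        # Column-wise sum over a list of equally sized rows.
        return [sum(col) for col in zip(*data)]


F = ListBackend  # framework-agnostic code reaches the backend only through F


def gather_and_sum(feat, index):
    """Sum the selected rows of ``feat`` without framework-specific operators."""
    return F.sum_rows(F.gather_rows(feat, index))


feat = [[1, 2], [3, 4], [5, 6]]
print(gather_and_sum(feat, [0, 2]))  # prints [6, 8]
```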
To build DGL locally, follow the steps described in :ref:`Install from source <install-from-source>`. However, to ease development, we suggest NOT installing DGL but working directly in the source tree. To achieve this, export the following environment variables:
export DGL_HOME=/path/to/your/dgl/clone
export DGL_LIBRARY_PATH=$DGL_HOME/build
export PYTHONPATH=$PYTHONPATH:$DGL_HOME/python
If you are working on a performance-critical part, you may want to turn on the Cython build:
cd python
python setup.py build_ext --inplace
You can test the build by running the following command; it should print the path of your local clone.
python -c 'import dgl; print(dgl.__path__)'
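If the printed path is not what you expect, the environment variables are the usual suspects. The following sketch checks the three variables listed above against the layout described in this section. The function check_dev_env is a hypothetical helper, not something shipped with DGL, and it only compares path strings; it does not verify that a build actually exists.

```python
import os


def check_dev_env(env):
    """Return a list of problems found in a development environment mapping."""
    home = env.get("DGL_HOME")
    if not home:
        return ["DGL_HOME is not set"]

    problems = []
    # Normalize paths so that trailing or doubled slashes do not matter.
    expected_lib = os.path.normpath(os.path.join(home, "build"))
    actual_lib = env.get("DGL_LIBRARY_PATH")
    if not actual_lib or os.path.normpath(actual_lib) != expected_lib:
        problems.append("DGL_LIBRARY_PATH should be $DGL_HOME/build")

    expected_py = os.path.normpath(os.path.join(home, "python"))
    entries = [
        os.path.normpath(p)
        for p in env.get("PYTHONPATH", "").split(os.pathsep)
        if p
    ]
    if expected_py not in entries:
        problems.append("PYTHONPATH should contain $DGL_HOME/python")
    return problems


# Check the current shell environment and report any mismatch.
print(check_dev_env(os.environ) or "environment looks fine")
```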
Currently, we use nose for unit tests. The organization goes as follows:
- backend: Additional unified tensor interface for supported frameworks. The functions there are only used in unit tests, not in DGL itself. Note that the code there is not a set of unit tests by itself. The additional backend can be imported with import backend. The additional backend contains the following files:
  - backend/backend_unittest.py: stub file for all additional tensor functions.
  - backend/${DGLBACKEND}/__init__.py: implementations of the stubs for the backend ${DGLBACKEND}.
  - backend/__init__.py: when imported, it replaces the stub implementations with the framework-specific code, depending on the selected backend. It also changes the signature of some existing backend functions to automatically select dtypes and contexts.
- compute: All framework-agnostic computation-related unit tests go there. Anything inside should not depend on a specific tensor library. Tensor functions not provided in the DGL unified tensor interface (i.e. dgl.backend) should go into the backend directory.
- ${DGLBACKEND} (e.g. pytorch and mxnet): All framework-specific computation-related unit tests go there.
- graph_index: All unit tests for the C++ graph structure implementation go there. The Python API being tested in this directory, if any, should be as minimal as possible (usually simple wrappers of the corresponding C++ functions).
- lint: Pylint-related files.
- scripts: Automated test scripts for CI.
To run unit tests, run
sh tests/scripts/task_unit_test.sh <your-backend>
where <your-backend> can be any supported backend (i.e. pytorch or mxnet).
If the change is about documentation improvement, we suggest (and strongly suggest if you change the runnable code there) building the documentation and rendering it locally before making a pull request.
In general, building the docs locally involves the following:
- Install sphinx, sphinx-gallery, and sphinx_rtd_theme.
- You need both PyTorch and MXNet because our tutorial contains code from both frameworks. This does not require knowledge of coding with both frameworks, though.
- Run the following:
cd docs
./clean.sh
make html
cd build/html
python3 -m http.server 8080
- Open http://localhost:8080 and enjoy your work.
See here for more details.
If one is only changing the wording (i.e. not touching the runnable code at all), one can do so without using the Git CLI:
1. Make your fork by clicking on the Fork button on the DGL main repository web page.
2. Make whatever changes you need in the web interface within your own fork. You can usually tell whether you are inside your own fork or in the main repository by checking whether you can commit to the master branch: if you cannot, you are in the wrong place.
3. Once done, make a pull request (on the web interface).
4. The committers will review it, suggesting or making changes as necessary.
5. Resolve the suggestions and reviews, and go back to step 4 until approved.
6. Merge it and enjoy your day.
When changing code, please make sure to build it locally and check that the build does not fail.