[TF] TF backend fix and new logic to choose backend (dmlc#1393)

* TF backend fix and new logic to choose backend
* fix
* fix
* fix
* fix
* fix backend
* fix
* dlpack alignment
* add flag
* flag
* lint
* lint
* remove unused
* several fixes

Co-authored-by: Minjie Wang <[email protected]>
zhuangz-ma · Mar 30, 2020 · e9440ac
1 parent 4b4186f

Showing 23 changed files with 217 additions and 107 deletions.
diff --git a/README.md b/README.md
@@ -152,7 +152,7 @@ DGL should work on
 
 DGL requires Python 3.5 or later.
 
-Right now, DGL works on [PyTorch](https://pytorch.org) 1.1.0+, [MXNet](https://mxnet.apache.org) nightly build, and [TensorFlow](https://tensorflow.org) 2.0+.
+Right now, DGL works on [PyTorch](https://pytorch.org) 1.2.0+, [MXNet](https://mxnet.apache.org) 1.5.1+, and [TensorFlow](https://tensorflow.org) 2.1.0+.
 
 
 ### Using anaconda

diff --git a/docker/install/conda_env/mxnet_cpu.yml b/docker/install/conda_env/mxnet_cpu.yml
@@ -5,12 +5,12 @@ dependencies:
   - pip:
     - mxnet
     - pytest
-  - nose
-  - numpy
-  - cython
-  - scipy
-  - networkx
-  - matplotlib
-  - nltk
-  - requests[security]
-  - tqdm
+    - nose
+    - numpy
+    - cython
+    - scipy
+    - networkx
+    - matplotlib
+    - nltk
+    - requests[security]
+    - tqdm
diff --git a/docker/install/conda_env/mxnet_gpu.yml b/docker/install/conda_env/mxnet_gpu.yml
@@ -5,12 +5,12 @@ dependencies:
   - pip:
     - mxnet-cu101
     - pytest
-  - nose
-  - numpy
-  - cython
-  - scipy
-  - networkx
-  - matplotlib
-  - nltk
-  - requests[security]
-  - tqdm
+    - nose
+    - numpy
+    - cython
+    - scipy
+    - networkx
+    - matplotlib
+    - nltk
+    - requests[security]
+    - tqdm
diff --git a/docker/install/conda_env/tensorflow_cpu.yml b/docker/install/conda_env/tensorflow_cpu.yml
@@ -3,15 +3,16 @@ dependencies:
   - python=3.6.9
   - pip
   - pip:
-    - tensorflow==2.1.0rc1
+    - tensorflow==2.2.0rc1
+    # - tf-nightly==2.2.0.dev20200327
     - tfdlpack
     - pytest
-  - nose
-  - numpy
-  - cython
-  - scipy
-  - networkx
-  - matplotlib
-  - nltk
-  - requests[security]
-  - tqdm
+    - nose
+    - numpy
+    - cython
+    - scipy
+    - networkx
+    - matplotlib
+    - nltk
+    - requests[security]
+    - tqdm
diff --git a/docker/install/conda_env/tensorflow_gpu.yml b/docker/install/conda_env/tensorflow_gpu.yml
@@ -1,18 +1,18 @@
-
 name: tensorflow-ci
 dependencies:
   - python=3.6.9
   - pip
   - pip:
-    - tensorflow-gpu==2.1.0rc1
+    - tensorflow==2.2.0rc1
+    # - tf-nightly==2.2.0.dev20200327
     - tfdlpack-gpu
     - pytest
-  - nose
-  - numpy
-  - cython
-  - scipy
-  - networkx
-  - matplotlib
-  - nltk
-  - requests[security]
-  - tqdm
+    - nose
+    - numpy
+    - cython
+    - scipy
+    - networkx
+    - matplotlib
+    - nltk
+    - requests[security]
+    - tqdm
diff --git a/docker/install/conda_env/torch_cpu.yml b/docker/install/conda_env/torch_cpu.yml
@@ -6,12 +6,12 @@ dependencies:
     - torch
     - torchvision
     - pytest
-  - nose
-  - numpy
-  - cython
-  - scipy
-  - networkx
-  - matplotlib
-  - nltk
-  - requests[security]
-  - tqdm
+    - nose
+    - numpy
+    - cython
+    - scipy
+    - networkx
+    - matplotlib
+    - nltk
+    - requests[security]
+    - tqdm
diff --git a/docker/install/conda_env/torch_gpu.yml b/docker/install/conda_env/torch_gpu.yml
@@ -6,12 +6,12 @@ dependencies:
     - torch
     - torchvision
     - pytest
-  - nose
-  - numpy
-  - cython
-  - scipy
-  - networkx
-  - matplotlib
-  - nltk
-  - requests[security]
-  - tqdm
+    - nose
+    - numpy
+    - cython
+    - scipy
+    - networkx
+    - matplotlib
+    - nltk
+    - requests[security]
+    - tqdm
diff --git a/docs/source/install/backend.rst b/docs/source/install/backend.rst
@@ -3,14 +3,21 @@
 Working with different backends
 ===============================
 
-DGL supports PyTorch, MXNet and Tensorflow backends. To change them, set the ``DGLBACKEND``
-environcment variable. The default backend is PyTorch.
+DGL supports PyTorch, MXNet and Tensorflow backends. DGL chooses the backend from
+the following sources, in order of priority (highest first):
+
+- The ``DGLBACKEND`` environment variable. Use ``DGLBACKEND=[BACKEND] python gcn.py ...``
+  to set it for a single command, or ``export DGLBACKEND=[BACKEND]`` to set it for the current shell session.
+- The ``config.json`` file under ``~/.dgl``. Use
+  ``python -m dgl.backend.set_default_backend [BACKEND]`` to set the default backend there.
+
+Currently, ``[BACKEND]`` can be one of ``mxnet``, ``pytorch`` or ``tensorflow``.
 
 PyTorch backend
 ---------------
 
 Export ``DGLBACKEND`` as ``pytorch`` to specify PyTorch backend. The required PyTorch
-version is 0.4.1 or later. See `pytorch.org <https://pytorch.org>`_ for installation instructions.
+version is 1.1.0 or later. See `pytorch.org <https://pytorch.org>`_ for installation instructions.
 
 MXNet backend
 -------------
@@ -32,18 +39,10 @@ Tensorflow backend
 ------------------
 
 Export ``DGLBACKEND`` as ``tensorflow`` to specify Tensorflow backend. The required Tensorflow
-version is 2.0 or later. See `tensorflow.org <https://www.tensorflow.org/install>`_ for installation
-instructions. In addition, Tensorflow backend requires ``tfdlpack`` package installed as follows and set ``TF_FORCE_GPU_ALLOW_GROWTH`` to ``true`` to prevent Tensorflow take over the whole GPU memory:
-
-.. code:: bash
-
-   pip install tfdlpack  # when using tensorflow cpu version
-
-
-or
+version is 2.2.0 or later. See `tensorflow.org <https://www.tensorflow.org/install>`_ for installation
+instructions. In addition, DGL sets ``TF_FORCE_GPU_ALLOW_GROWTH`` to ``true`` to prevent Tensorflow from taking over the whole GPU memory. To install Tensorflow:
 
 .. code:: bash
 
-   pip install tfdlpack-gpu  # when using tensorflow gpu version
-   export TF_FORCE_GPU_ALLOW_GROWTH=true # and add this to your .bashrc/.zshrc file if needed
+   pip install "tensorflow>=2.2.0rc1"  # the tensorflow package includes GPU support since 2.1
 
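
The diff does not show where DGL sets this variable. For readers who want the same behaviour outside DGL, here is a minimal sketch of setting it by hand, under the assumption that it only takes effect if set before TensorFlow initializes its GPUs. The helper names `allow_gpu_memory_growth` and `set_memory_growth_per_device` are hypothetical; `tf.config.list_physical_devices` and `tf.config.experimental.set_memory_growth` are TensorFlow 2.x APIs.

```python
import os


def allow_gpu_memory_growth():
    """Ask TensorFlow to allocate GPU memory on demand instead of all at once."""
    # Assumption: only takes effect if set before TensorFlow initializes its GPUs.
    # setdefault keeps any value the user has already exported.
    os.environ.setdefault("TF_FORCE_GPU_ALLOW_GROWTH", "true")


def set_memory_growth_per_device():
    """Per-device alternative using the TensorFlow 2.x configuration API."""
    import tensorflow as tf  # imported lazily so the helper above needs no TensorFlow

    for gpu in tf.config.list_physical_devices("GPU"):
        # Raises RuntimeError if the GPUs have already been initialized.
        tf.config.experimental.set_memory_growth(gpu, True)


allow_gpu_memory_growth()
```

Either helper should run at the top of a script, before any TensorFlow op touches the GPU.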
diff --git a/include/dgl/runtime/c_runtime_api.h b/include/dgl/runtime/c_runtime_api.h
@@ -474,8 +474,8 @@ DGL_DLL int DGLArrayFromDLPack(DLManagedTensor* from,
  * \param out The DLManagedTensor handle.
  * \return 0 when success, -1 when failure happens
  */
-DGL_DLL int DGLArrayToDLPack(DGLArrayHandle from,
-                             DLManagedTensor** out);
+DGL_DLL int DGLArrayToDLPack(DGLArrayHandle from, DLManagedTensor** out,
+                             int alignment = 0);
 
 /*!
  * \brief Delete (free) a DLManagedTensor's data.

diff --git a/python/dgl/__init__.py b/python/dgl/__init__.py
@@ -5,7 +5,7 @@
 
 # Need to ensure that the backend framework is imported before load dgl libs,
 # otherwise weird cuda problem happens
-from .backend import load_backend
+from .backend import load_backend, backend_name
 
 from . import function
 from . import contrib

diff --git a/python/dgl/_ffi/_ctypes/ndarray.py b/python/dgl/_ffi/_ctypes/ndarray.py
@@ -73,15 +73,23 @@ def __del__(self):
     def _dgl_handle(self):
         return ctypes.cast(self.handle, ctypes.c_void_p).value
 
-    def to_dlpack(self):
+    def to_dlpack(self, alignment=0):
         """Produce an array from a DLPack Tensor without copying memory
 
+        Parameters
+        ----------
+        alignment : int, optional
+            Alignment requirement for the data pointer of the returned DLPack
+            tensor. If the data does not satisfy it, it is copied into a new
+            tensor that does. 0 (the default) means no alignment requirement.
+
+
         Returns
         -------
         dlpack : DLPack tensor view of the array data
         """
         ptr = ctypes.c_void_p()
-        check_call(_LIB.DGLArrayToDLPack(self.handle, ctypes.byref(ptr)))
+        check_call(_LIB.DGLArrayToDLPack(self.handle, ctypes.byref(ptr), alignment))
         return ctypes.pythonapi.PyCapsule_New(ptr, _c_str_dltensor, _c_dlpack_deleter)
 
 

diff --git a/python/dgl/_ffi/_cython/base.pxi b/python/dgl/_ffi/_cython/base.pxi
@@ -112,7 +112,8 @@ cdef extern from "dgl/runtime/c_runtime_api.h":
     int DGLArrayFromDLPack(DLManagedTensor* arr_from,
                            DLTensorHandle* out)
     int DGLArrayToDLPack(DLTensorHandle arr_from,
-                         DLManagedTensor** out)
+                         DLManagedTensor** out,
+                         int alignment)
     void DGLDLManagedTensorCallDeleter(DLManagedTensor* dltensor)
 
 cdef extern from "dgl/runtime/c_object_api.h":

diff --git a/python/dgl/_ffi/_cython/ndarray.pxi b/python/dgl/_ffi/_cython/ndarray.pxi
@@ -59,17 +59,24 @@ cdef class NDArrayBase:
         if self.c_is_view == 0:
             CALL(DGLArrayFree(self.chandle))
 
-    def to_dlpack(self):
+    def to_dlpack(self, alignment=0):
         """Produce an array from a DLPack Tensor without copying memory
 
+        Parameters
+        ----------
+        alignment : int, optional
+            Alignment requirement for the data pointer of the returned DLPack
+            tensor. If the data does not satisfy it, it is copied into a new
+            tensor that does. 0 (the default) means no alignment requirement.
+
         Returns
         -------
         dlpack : DLPack tensor view of the array data
         """
         cdef DLManagedTensor* dltensor
         if self.c_is_view != 0:
             raise ValueError("to_dlpack do not work with memory views")
-        CALL(DGLArrayToDLPack(self.chandle, &dltensor))
+        CALL(DGLArrayToDLPack(self.chandle, &dltensor, alignment))
         return pycapsule.PyCapsule_New(dltensor, _c_str_dltensor, _c_dlpack_deleter)
 
 

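The C++ side of `DGLArrayToDLPack` is not part of this excerpt. As a rough illustration of the contract stated in the `to_dlpack` docstrings (return the data unchanged if it already meets the alignment, otherwise copy it into a new buffer that does; 0 disables the check), here is a NumPy sketch. It assumes the alignment is a byte count applied to the data pointer; `is_aligned` and `to_aligned` are hypothetical helpers, not DGL functions.

```python
import numpy as np


def is_aligned(arr, alignment):
    """True if arr's data pointer is a multiple of `alignment` (0 = no requirement)."""
    if alignment == 0:
        return True
    return arr.ctypes.data % alignment == 0


def to_aligned(arr, alignment):
    """Return arr itself if aligned, else an aligned copy with the same dtype and shape."""
    if is_aligned(arr, alignment):
        return arr  # requirement already met: no copy
    # Over-allocate raw bytes so that an aligned start offset always exists.
    raw = np.empty(arr.nbytes + alignment, dtype=np.uint8)
    offset = (-raw.ctypes.data) % alignment
    out = raw[offset:offset + arr.nbytes].view(arr.dtype).reshape(arr.shape)
    out[...] = arr  # the copy
    return out
```

The zero-copy fast path mirrors the "without copying memory" intent of `to_dlpack`; a copy only happens when a consumer asks for an alignment the current buffer does not meet.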
diff --git a/python/dgl/backend/__init__.py b/python/dgl/backend/__init__.py
@@ -1,19 +1,24 @@
 from __future__ import absolute_import
 
-import sys, os
+import sys
+import os
+import json
 import importlib
 
 from . import backend
+from .set_default_backend import set_default_backend
 
 _enabled_apis = set()
 
+
 def _gen_missing_api(api, mod_name):
     def _missing_api(*args, **kwargs):
         raise ImportError('API "%s" is not supported by backend "%s".'
                           ' You can switch to other backends by setting'
                           ' the DGLBACKEND environment.' % (api, mod_name))
     return _missing_api
 
+
 def load_backend(mod_name):
     mod = importlib.import_module('.%s' % mod_name, __name__)
     thismod = sys.modules[__name__]
@@ -45,7 +50,29 @@ def load_backend(mod_name):
             else:
                 setattr(thismod, api, _gen_missing_api(api, mod_name))
 
-load_backend(os.environ.get('DGLBACKEND', 'pytorch').lower())
+
+def get_preferred_backend():
+    config_path = os.path.join(os.path.expanduser('~'), '.dgl', 'config.json')
+    backend_name = None
+    if "DGLBACKEND" in os.environ:
+        backend_name = os.environ['DGLBACKEND'].lower()
+    elif os.path.exists(config_path):
+        with open(config_path, "r") as config_file:
+            config_dict = json.load(config_file)
+            backend_name = config_dict.get('backend', '').lower()
+
+    if backend_name in ['tensorflow', 'mxnet', 'pytorch']:
+        return backend_name
+    else:
+        while backend_name not in ['tensorflow', 'mxnet', 'pytorch']:
+            print("DGL could not detect a valid backend option. Which backend would you like to work with?")
+            backend_name = input("Backend choice (pytorch, mxnet or tensorflow): ").lower()
+        set_default_backend(backend_name)
+        return backend_name
+
+
+load_backend(get_preferred_backend())
+
 
 def is_enabled(api):
     """Return true if the api is enabled by the current backend.

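A non-interactive sketch of the priority order implemented by `get_preferred_backend` above. Unlike the committed function, this version takes the environment mapping and the config path as parameters (so it can be exercised without touching `~/.dgl`) and returns `None` where the commit prompts with `input()`. `resolve_backend` and `VALID_BACKENDS` are hypothetical names.

```python
import json
import os

VALID_BACKENDS = ("pytorch", "mxnet", "tensorflow")


def resolve_backend(environ, config_path):
    """Return the backend picked by DGLBACKEND or the config file, or None if neither is valid."""
    name = None
    if "DGLBACKEND" in environ:
        # 1. Highest priority: the environment variable (case-insensitive).
        name = environ["DGLBACKEND"].lower()
    elif os.path.exists(config_path):
        # 2. Otherwise: the "backend" key of the JSON config file.
        with open(config_path, "r") as config_file:
            name = json.load(config_file).get("backend", "").lower()
    # 3. None tells the caller to ask the user; the commit prompts with input()
    #    and then saves the answer with set_default_backend().
    return name if name in VALID_BACKENDS else None
```

Called as `resolve_backend(os.environ, os.path.join(os.path.expanduser("~"), ".dgl", "config.json"))`, it follows the same rule as the commit: an invalid `DGLBACKEND` value does not fall through to the config file.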
diff --git a/python/dgl/backend/mxnet/tensor.py b/python/dgl/backend/mxnet/tensor.py
@@ -14,7 +14,7 @@
 
 MX_VERSION = LooseVersion(mx.__version__)
 if MX_VERSION.version[0] == 1 and MX_VERSION.version[1] < 5:
-    raise Exception("DGL has to work with MXNet version >= 1.5")
+    raise RuntimeError("DGL requires mxnet >= 1.5")
 
 # After MXNet 1.5, empty tensors aren't supprted by default.
 # After we turn on the numpy compatible flag, MXNet supports empty NDArray.

diff --git a/python/dgl/backend/pytorch/tensor.py b/python/dgl/backend/pytorch/tensor.py
@@ -2,15 +2,19 @@
 
 from distutils.version import LooseVersion
 
+import scipy  # Import scipy before torch: newer PyTorch versions hit a bug when scipy is imported after torch
 import torch as th
 import builtins
 from torch.utils import dlpack
 
 from ... import ndarray as nd
 from ... import kernel as K
 from ...function.base import TargetCode
+from ...base import dgl_warning
 
-TH_VERSION = LooseVersion(th.__version__)
+if LooseVersion(th.__version__) < LooseVersion("1.2.0"):
+    dgl_warning("Detected an old version of PyTorch. We recommend torch>=1.2.0 "
+                "for the best experience.")
 
 def data_type_dict():
     return {'float16' : th.float16,

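The MXNet and PyTorch version gates above use `distutils.version.LooseVersion`, which avoids the trap of comparing raw strings (`"1.10.0" < "1.2.0"` is true lexicographically). `distutils` is deprecated and was removed in Python 3.12, so here is a minimal stand-in sketch. `version_tuple` and `warn_if_older` are hypothetical helpers, `warnings.warn` stands in for DGL's `dgl_warning`, and the parser is deliberately simple: it keeps only leading numeric components, so a pre-release such as `2.2.0rc1` is treated the same as `2.2.0`, unlike a full PEP 440 comparison.

```python
import re
import warnings


def version_tuple(version):
    """Parse the leading numeric components, e.g. "1.2.0+cu101" -> (1, 2, 0)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:  # e.g. "dev20200327": stop at the first non-numeric piece
            break
        parts.append(int(match.group()))
        if match.end() != len(piece):  # e.g. "0rc1" or "0+cu101": keep 0, drop the suffix
            break
    parts += [0] * (3 - len(parts))  # pad so "1.2" compares equal to "1.2.0"
    return tuple(parts)


def warn_if_older(package, installed, minimum):
    """Warn (rather than fail) when `installed` is older than `minimum`; return True if warned."""
    if version_tuple(installed) < version_tuple(minimum):
        warnings.warn("Detected an old version of {} ({}). We recommend {}>={} "
                      "for the best experience.".format(package, installed, package, minimum))
        return True
    return False
```

Warning instead of raising matches the PyTorch gate in this commit; the MXNet gate raises `RuntimeError` instead.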
diff --git a/python/dgl/backend/set_default_backend.py b/python/dgl/backend/set_default_backend.py
@@ -0,0 +1,21 @@
+import argparse
+import os
+import json
+
+def set_default_backend(backend_name):
+    default_dir = os.path.join(os.path.expanduser('~'), '.dgl')
+    if not os.path.exists(default_dir):
+        os.makedirs(default_dir)
+    config_path = os.path.join(default_dir, 'config.json')
+    with open(config_path, "w") as config_file:
+        json.dump({'backend': backend_name.lower()}, config_file)
+    print('Set the default backend to "{}". You can change it in the '
+          '~/.dgl/config.json file or export the DGLBACKEND environment variable.'.format(
+              backend_name))
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser()
+    parser.add_argument("backend", nargs=1, type=str, choices=[
+                        'pytorch', 'tensorflow', 'mxnet'], help="Set default backend")
+    args = parser.parse_args()
+    set_default_backend(args.backend[0])
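
The script writes a one-key JSON file such as `{"backend": "pytorch"}` to `~/.dgl/config.json`. Below is a sketch of the same write path with the target directory as a parameter, so it can be tried in a temporary directory; `write_backend_config` and `parse_args` are hypothetical names. Two small deviations from the committed script: `os.makedirs(..., exist_ok=True)` replaces the separate existence check, and the positional argument drops `nargs=1`, so `args.backend` is a string rather than a one-element list.

```python
import argparse
import json
import os

VALID_BACKENDS = ["pytorch", "tensorflow", "mxnet"]


def write_backend_config(config_dir, backend_name):
    """Write {'backend': name} to config_dir/config.json and return the file path."""
    os.makedirs(config_dir, exist_ok=True)  # no error if the directory already exists
    config_path = os.path.join(config_dir, "config.json")
    with open(config_path, "w") as config_file:
        json.dump({"backend": backend_name.lower()}, config_file)
    return config_path


def parse_args(argv=None):
    """Parse the single positional backend name; argparse rejects anything else."""
    parser = argparse.ArgumentParser()
    parser.add_argument("backend", choices=VALID_BACKENDS, help="Set default backend")
    return parser.parse_args(argv)
```

From the command line, the committed script is invoked as `python -m dgl.backend.set_default_backend pytorch`.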