Merge pull request Azure#39 from Azure/jrr-environment-file
Jrr environment file
jreynolds01 authored Aug 1, 2018
2 parents f97bb20 + d3b1bec commit a4c2e42
Showing 4 changed files with 82 additions and 8 deletions.
4 changes: 4 additions & 0 deletions DataScienceUtilities/DataReport-Utils/Python/.gitignore
@@ -0,0 +1,4 @@
log.txt
tmp
.ipynb_checkpoints
__pycache__
26 changes: 26 additions & 0 deletions DataScienceUtilities/DataReport-Utils/Python/idear_env.yml
@@ -0,0 +1,26 @@
name: idear_env
channels:
- defaults
- conda-forge
dependencies:
- python=3.5.2
- pandas=0.19.2
- numpy=1.11.3
- matplotlib=2.2.2
- nbformat=4.4.0
- ipython=6.3.1
- ipywidgets=5.2.2
- scipy=0.18.1
- ipykernel=4.8.2 # to show up as a kernel...
- statsmodels=0.9.0
- seaborn=0.9.0
- pyyaml=3.12
- scikit-learn=0.18.1
# - os
# - [collections*](https://docs.python.org/2/library/collections.html)
# - [io*](https://docs.python.org/2/library/io.html)
# - sys
# - operator
# - [errno*](https://docs.python.org/2/library/errno.html)
# - string
# - functools
39 changes: 31 additions & 8 deletions DataScienceUtilities/DataReport-Utils/Python/readme.md
@@ -10,11 +10,11 @@ Interactive Data Exploratory Analysis and Reporting (IDEAR) is a tool developed
The prerequisites to run IDEAR in Jupyter Notebooks (Python 2.7 or 3.5) are:

- Jupyter Notebook with Python (2.7 or 3.5).
- A Jupyter Notebook server that is set up and running on a machine you have access to. You should be able to clone the Azure-TDSP-Utilities repository to a directory on that machine.

If you are running IDEAR Python in Azure Notebooks, you need to have:

- An Azure subscription and access to an Azure Notebooks account
- An Azure Blob storage account to which you can upload data

To start IDEAR in Jupyter Notebook running on Python (2.7 or 3.5),
@@ -26,16 +26,17 @@ To start IDEAR in Jupyter Notebook running on Python (2.7 or 3.5),

To start IDEAR in Azure Notebooks:

- Upload the data and YAML file to Azure Blob storage
- Log in to your Azure Notebooks account
- Upload [IDEAR-Python-AzureNotebooks.ipynb](IDEAR-Python-AzureNotebooks.ipynb) to your library
- Type in the Azure Blob storage credentials when prompted

For details, please read [instructions](IDEAR-Python-Instructions-JupyterNotebook.md).

### Python Modules

The Python modules used in IDEAR are listed below. If your Jupyter Notebook server runs on Anaconda Python (2.7 or 3.5), most of them are installed along with Anaconda, with a few exceptions. If you are using [Azure Data Science Virtual Machines (DSVM)](https://azure.microsoft.com/en-us/marketplace/partners/microsoft-ads/standard-data-science-vm/), all of them are already installed.

- pandas
- numpy
- os
@@ -53,9 +54,31 @@ The Python modules that are used in IDEAR are as follows. If your Jupyter Notebo
- seaborn
- string
- functools
- pyyaml
- scikit-learn

*Not included in Anaconda Python, but included in DSVM.

See the next section and the `idear_env.yml` file for specific versions of additional installs.
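
As a quick sanity check before opening the notebook, the module list above can be tested for importability. The following is a minimal sketch, not part of IDEAR: the helper name `missing_modules` is hypothetical, and it assumes Python 3.4 or later, since `importlib.util.find_spec` is not available on Python 2.7. Note that some packages are imported under a different name than they are installed under (`pyyaml` as `yaml`, `scikit-learn` as `sklearn`).

```python
import importlib.util

# Import names (not package names) for a subset of the modules listed above.
# pyyaml is imported as `yaml`; scikit-learn is imported as `sklearn`.
IMPORT_NAMES = ["pandas", "numpy", "os", "seaborn", "string",
                "functools", "yaml", "sklearn"]


def missing_modules(names):
    """Return the names that cannot be located on the current Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(IMPORT_NAMES)
if missing:
    print("Missing modules: {}".format(", ".join(missing)))
else:
    print("All listed modules can be imported.")
```

Running this in the kernel you intend to use for IDEAR shows which modules still need to be installed.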

### Conda environment

The `idear_env.yml` file provides a mechanism for creating a reproducible environment for running this tool. It requires that some version of the `conda` tool is installed (testing was done on a Windows 10 machine with conda 4.3.30). To create the conda environment, open a command prompt that has access to `conda` and run:

```
cd PATH/TO/THIS/README/FILE
conda env create -f idear_env.yml
```

To execute the `IDEAR.ipynb` notebook against this environment, you will need to do one of the following:

- (RECOMMENDED): [Install the nb_conda_kernels package](https://github.com/Anaconda-Platform/nb_conda_kernels) to the environment from which you launch Jupyter. Once the conda environment has been created and you open the notebook, you can select that environment as the kernel for execution.
- Install Jupyter as a dependency within the newly created environment and launch Jupyter from within that environment.
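
With either approach, it can help to confirm from inside the notebook that the kernel is really running from the new environment. The sketch below is an illustration only: the helper `env_name_from_prefix` is hypothetical, and it assumes the environment was created under conda's default `envs/<name>` layout, so that the last component of `sys.prefix` is the environment name.

```python
import re
import sys


def env_name_from_prefix(prefix):
    """Return the last path component of an interpreter prefix.

    Splits on both forward slashes and backslashes so that Windows and
    POSIX style paths are handled the same way.
    """
    parts = [p for p in re.split(r"[\\/]+", prefix) if p]
    return parts[-1] if parts else ""


# sys.prefix is the root directory of the interpreter running this kernel.
current_env = env_name_from_prefix(sys.prefix)
if current_env != "idear_env":
    print("Kernel is running from '{}', not 'idear_env'".format(current_env))
else:
    print("Kernel is running from 'idear_env'")
```

If the message reports a different environment, switch the notebook kernel before running IDEAR.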

**NOTE** This environment is likely to produce some warnings about differing JavaScript versions, particularly if you use the RECOMMENDED approach above.
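
To check that an existing environment still matches the versions pinned in `idear_env.yml`, the `name=version` lines can be compared with what is installed. This is a rough sketch under stated assumptions: `parse_pins` and `find_mismatches` are hypothetical helpers, `ENV_TEXT` is a shortened sample of the file, and the `installed` dictionary is made-up data standing in for whatever your environment reports (for example, from `conda list`).

```python
import re

# Shortened sample of idear_env.yml; in practice, read the file itself.
ENV_TEXT = """
dependencies:
- python=3.5.2
- pandas=0.19.2
- ipykernel=4.8.2 # to show up as a kernel...
# - os
"""

# Matches "- name=version", ignoring any trailing comment.
PIN_RE = re.compile(r"^-\s*([A-Za-z0-9_.\-]+)=([^\s#]+)")


def parse_pins(text):
    """Return {package: version} for every 'name=version' dependency line."""
    pins = {}
    for line in text.splitlines():
        match = PIN_RE.match(line.strip())
        if match:
            pins[match.group(1)] = match.group(2)
    return pins


def find_mismatches(pins, installed):
    """Return {package: (expected, actual)}; actual is None if not installed."""
    return {
        name: (expected, installed.get(name))
        for name, expected in pins.items()
        if installed.get(name) != expected
    }


pins = parse_pins(ENV_TEXT)
installed = {"python": "3.5.2", "pandas": "0.20.0"}  # hypothetical example data
for name, (expected, actual) in sorted(find_mismatches(pins, installed).items()):
    print("{}: expected {}, found {}".format(name, expected, actual))
```

Commented-out lines such as `# - os` are skipped because they do not start with a dash, so only real pins are compared.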

### Support Functions

The TDSP team at Microsoft also defined some functions to support IDEAR in Jupyter Notebook (Python 2.7 and 3.5). These functions are encapsulated in the following Python source code files, which are in the same directory as this readme.md.

- ReportMagics.py
21 changes: 21 additions & 0 deletions ReleaseNotes/Release-Notes-2018-08-01.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
# Release Notes

## Date of release

- 2018-08-01

## Content of release

The goal of this release is to facilitate setup and address outstanding issues:

- Added a conda [environment file](../DataScienceUtilities/DataReport-Utils/Python/idear_env.yml) to pin specific package versions and to facilitate usage and reproducibility.
- Added [documentation](../DataScienceUtilities/DataReport-Utils/Python/readme.md) on how to use the conda environment file.
- Added a `.gitignore` file to the Python directory to make tracking files with `git` simpler.

## Version of release

`0.14.1`

## Prior Release Notes

- [2018-01-22](Release-Notes-2018-01-22.md)
