Open-source resources for data engineering, machine learning, and data science, inspired by [The Open-Source Data Science Masters](http://datasciencemasters.org/).
The Jupyter Notebook is part of the Anaconda Distribution.
The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, machine learning and much more.
The Pandas package is part of the Anaconda Distribution.
Pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
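To give a feel for the data structures mentioned above, here is a minimal sketch that builds a small DataFrame and aggregates it. The column names and values are made up for illustration and do not come from any real dataset.

```python
import pandas as pd

# Hypothetical sample data, invented for this illustration.
df = pd.DataFrame(
    {
        "city": ["Oslo", "Oslo", "Lima", "Lima"],
        "temp_c": [3.0, 5.0, 19.0, 21.0],
    }
)

# Group rows by city and compute the mean temperature per group.
mean_temp = df.groupby("city")["temp_c"].mean()
print(mean_temp)
```

The result is a Series indexed by city, which is the typical shape of output from a group-and-aggregate step.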
-
Python for Data Analysis is a good book to start with, written by Wes McKinney, the main author of the Pandas package. The second edition is planned for release in August 2017.
-
NumPy is the fundamental package for scientific computing with Python. http://www.numpy.org/
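As a small illustration of what "scientific computing" looks like in practice, the sketch below uses NumPy arrays for element-wise arithmetic and a simple reduction. The variable names and numbers are arbitrary choices for this example.

```python
import numpy as np

# A one-dimensional array of arbitrary example values.
values = np.array([1.0, 2.0, 3.0, 4.0])

# Arithmetic applies element-wise, with no explicit Python loop.
squared = values ** 2

# Reduce the whole array to a single number.
total = squared.sum()

# Reshape the same data into a 2x2 matrix and sum down each column.
matrix = values.reshape(2, 2)
col_sums = matrix.sum(axis=0)

print(total, col_sums)
```

Working on whole arrays at once, rather than item by item, is the main habit to pick up when moving from plain Python lists to NumPy.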
-
Matplotlib is a plotting library for the Python programming language and its numerical mathematics extension NumPy. http://matplotlib.org/
-
Seaborn is a Python visualization library based on matplotlib. https://seaborn.pydata.org/