title | description | services | author | manager | editor | ms.service | ms.component | ms.topic | ms.date | ms.author | ms.custom |
---|---|---|---|---|---|---|---|---|---|---|---|
HDInsight Hadoop data science walkthroughs using Hive on Azure \| Microsoft Docs | Examples of the Team Data Science Process that walk through the use of Hive on Azure HDInsight Hadoop to do predictive analytics. | machine-learning | marktab | cgronlun | cgronlun | machine-learning | team-data-science-process | article | 09/04/2017 | tdsp | (previous author=deguhath, ms.author=deguhath) |
These walkthroughs use Hive with an HDInsight Hadoop cluster to do predictive analytics. They follow the steps outlined in the Team Data Science Process. For an overview of the Team Data Science Process, see Data Science Process. For an introduction to Azure HDInsight, see Introduction to Azure HDInsight, the Hadoop technology stack, and Hadoop clusters.
Additional data science walkthroughs that execute the Team Data Science Process are grouped by the platform that they use. For a list of these examples, see Walkthroughs executing the Team Data Science Process.
The Use HDInsight Hadoop clusters walkthrough uses data from New York taxis to predict:
- Whether a tip is paid
- The distribution of tip amounts
The scenario is implemented using Hive with an Azure HDInsight Hadoop cluster. You learn how to store and explore data from a publicly available NYC taxi trip and fare dataset, and how to engineer features from it. You also use Azure Machine Learning to build and deploy the models.
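The two prediction targets above can be made concrete with a small sketch. It is written in Python purely for illustration (the walkthrough itself does this work in Hive and Azure Machine Learning), and the bin edges, function names, and sample values are hypothetical and not taken from the walkthrough.

```python
from bisect import bisect_left
from collections import Counter
from typing import Iterable

# Hypothetical bin edges in dollars (an assumption for this sketch;
# the bins used in the walkthrough may differ).
TIP_BIN_EDGES = [0.0, 5.0, 10.0, 20.0]


def tipped(tip_amount: float) -> int:
    """Binary target: 1 if any tip was paid, otherwise 0."""
    return 1 if tip_amount > 0 else 0


def tip_class(tip_amount: float) -> int:
    """Multiclass target: index of the bin that contains tip_amount.

    Class 0 means no tip. Class k (k >= 1) means
    TIP_BIN_EDGES[k-1] < tip <= TIP_BIN_EDGES[k], and the last class
    covers every tip above the final edge.
    """
    return bisect_left(TIP_BIN_EDGES, tip_amount)


def tip_distribution(tip_amounts: Iterable[float]) -> Counter:
    """Count how many trips fall into each tip class."""
    return Counter(tip_class(t) for t in tip_amounts)


if __name__ == "__main__":
    # Made-up tip amounts, used only to exercise the functions.
    sample_tips = [0.0, 1.5, 5.0, 7.25, 12.0, 30.0]
    print([tipped(t) for t in sample_tips])
    print(sorted(tip_distribution(sample_tips).items()))
```

The binary label corresponds to "whether a tip is paid", and the class counts give one simple view of "the distribution of tip amounts".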
The Use Azure HDInsight Hadoop Clusters on a 1-TB dataset walkthrough uses a publicly available Criteo click dataset to predict whether a user clicks on an advertisement. The scenario is implemented using Hive with an Azure HDInsight Hadoop cluster to store and explore the data, engineer features, and down sample the data. It uses Azure Machine Learning to build, train, and score a binary classification model for this prediction. The walkthrough concludes by showing how to publish one of these models as a Web service.
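Down sampling a dataset of this size can be pictured as keeping each row independently with a fixed probability. The following is a minimal Python sketch of that idea, not the walkthrough's Hive implementation; the function name `down_sample`, the `fraction` and `seed` parameters, and the example values are assumptions made for illustration.

```python
import random
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def down_sample(rows: Iterable[T], fraction: float, seed: int = 0) -> Iterator[T]:
    """Yield each row independently with probability `fraction`.

    Rows are streamed one at a time, so the full dataset never has to
    fit in memory. A fixed seed makes the sample reproducible.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)
    for row in rows:
        if rng.random() < fraction:
            yield row


if __name__ == "__main__":
    # Keep roughly 1% of 100,000 made-up row ids.
    sample = list(down_sample(range(100_000), fraction=0.01, seed=42))
    print(len(sample))
```

Because each row is kept independently, the sample size is only approximately `fraction` times the input size. In exchange, the input can be processed in a single streaming pass.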
For a discussion of the key components that make up the Team Data Science Process, see Team Data Science Process overview.
For a discussion of the Team Data Science Process lifecycle that you can use to structure your data science projects, see Team Data Science Process lifecycle. The lifecycle outlines the steps, from start to finish, that projects usually follow when they are executed.