title: Team Data Science Process walkthroughs | Microsoft Docs
description: Walkthroughs show how to combine cloud and on-premises tools and services into a workflow or pipeline to create an intelligent application.
services: machine-learning
documentationcenter:
author: bradsev
manager: jhubbard
editor: cgronlun
ms.assetid: aa63d5a5-25ee-4c4b-9a4c-7553b98d7f6e
ms.service: machine-learning
ms.workload: data-services
ms.tgt_pltfrm: na
ms.devlang: na
ms.topic: article
ms.date: 10/07/2016
ms.author: bradsev

Team Data Science Process walkthroughs

Each end-to-end walkthrough listed here demonstrates the steps in the Team Data Science Process for a specific scenario. Together they illustrate how to combine cloud and on-premises tools and services into a workflow or pipeline to create an intelligent application.

Use SQL Data Warehouse

The Team Data Science Process in action: using SQL Data Warehouse walkthrough shows you how to build and deploy machine learning classification and regression models using SQL Data Warehouse (SQL DW) for a publicly available NYC taxi trip and fare dataset.

Use SQL Server

The Team Data Science Process in action: using SQL Server walkthrough shows you how to build and deploy machine learning classification and regression models using SQL Server and a publicly available NYC taxi trip and fare dataset.

Use HDInsight Hadoop clusters

The Team Data Science Process in action: using HDInsight Hadoop clusters walkthrough uses an Azure HDInsight Hadoop cluster to store, explore, and feature engineer data from a publicly available NYC taxi trip and fare dataset.

Use Azure HDInsight Hadoop Clusters on a 1-TB dataset

The Team Data Science Process in action: using Azure HDInsight Hadoop Clusters on a 1-TB dataset walkthrough presents an end-to-end scenario that uses an Azure HDInsight Hadoop cluster to store, explore, feature engineer, and downsample data from a publicly available Criteo dataset.
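
As a rough illustration of what downsampling means for a dataset too large to model in full, the sketch below keeps each record with a fixed probability. It is not taken from the walkthrough, which works on a Hadoop cluster; the function name `downsample`, the `fraction` and `seed` parameters, and the stand-in `rows` list are all assumptions made for this example.

```python
import random


def downsample(records, fraction, seed=0):
    """Keep each record independently with probability `fraction`.

    Hypothetical helper for illustration only; not part of any walkthrough.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible
    return [r for r in records if rng.random() < fraction]


# Stand-in for the rows of a large dataset (assumed data, not real records).
rows = list(range(100_000))
sample = downsample(rows, fraction=0.01, seed=42)
```

Sampling each record independently means the sample size is only approximately `fraction * len(records)`, but the records can be streamed without knowing the total count in advance, which is one reason this style of sampling suits very large inputs.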

Data Science using Python with Spark on Azure

The Data Science using Spark on Azure HDInsight walkthrough uses the Team Data Science Process in an end-to-end scenario using an Azure HDInsight Spark cluster to store, explore, and feature engineer data from the publicly available NYC taxi trip and fare dataset.

Data Science using Scala with Spark on Azure

The Data Science using Scala with Spark on Azure walkthrough shows how to use Scala for supervised machine learning tasks with the Spark scalable machine learning library (MLlib) and SparkML packages on an Azure HDInsight Spark cluster. It walks you through the tasks that constitute the Data Science Process: data ingestion and exploration, visualization, feature engineering, modeling, and model consumption. The models built include logistic and linear regression, random forests, and gradient boosted trees.

Use Azure Data Lake Storage and Analytics

The Scalable Data Science in Azure Data Lake: An end-to-end Walkthrough shows how to use Azure Data Lake for data exploration and binary classification on a sample of the NYC taxi dataset, predicting whether a customer pays a tip.
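
To make the binary classification target concrete, here is a minimal sketch of how a "tip paid" label could be derived from trip records. It is an illustration under stated assumptions, not the walkthrough's code: the field names `fare_amount` and `tip_amount`, the helper `tipped_label`, and the four sample rows are all invented for this example.

```python
def tipped_label(tip_amount):
    """Return 1 if any tip was paid, otherwise 0 (hypothetical helper)."""
    return 1 if tip_amount > 0 else 0


# Invented sample rows; the field names are assumptions, not the real schema.
trips = [
    {"fare_amount": 12.5, "tip_amount": 2.0},
    {"fare_amount": 7.0, "tip_amount": 0.0},
    {"fare_amount": 30.0, "tip_amount": 6.5},
    {"fare_amount": 5.5, "tip_amount": 0.0},
]

# The label column a binary classifier would be trained to predict.
labels = [tipped_label(t["tip_amount"]) for t in trips]

# Share of positive labels, a quick check for class imbalance.
positive_rate = sum(labels) / len(labels)
```

Checking the share of positive labels before modeling is a common first step, because a heavily skewed label can make plain accuracy a misleading metric.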

Use R with SQL Server R Services

The Data Science End-to-End Walkthrough using SQL Server R Services provides data scientists with a combination of R code, SQL Server data, and custom SQL functions to build and deploy an R model to SQL Server.

Use T-SQL with SQL Server R Services

The In-Database Advanced Analytics for SQL Developers walkthrough provides SQL programmers with experience building an advanced analytics solution with Transact-SQL using SQL Server R Services to operationalize an R solution.

What's next?

For an overview of topics that walk you through the tasks that make up the data science process in Azure, see Data Science Process.