title	description	services	documentationcenter	author	manager	editor	ms.assetid	ms.service	ms.workload	ms.tgt_pltfrm	ms.devlang	ms.topic	ms.date	ms.author
Sample data in Azure blob containers, SQL Server, and Hive tables \| Microsoft Docs	How to explore data stored in various Azure environments.	machine-learning		bradsev	jhubbard	cgronlun	80a9dfae-e3a6-4cfb-aecc-5701cfc7e39d	machine-learning	data-services	na	na	article	12/19/2016	fashah;garye;bradsev

Sample data in Azure blob containers, SQL Server, and Hive tables

This document links to topics that cover how to sample data stored in one of three Azure locations:

Azure blob container data is sampled by downloading it programmatically and then sampling it with sample Python code.
SQL Server data is sampled using both SQL and the Python programming language.
Hive table data is sampled using Hive queries.
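
As a rough illustration of the Python-based approach mentioned above, the following sketch draws a simple random sample from CSV data that has already been downloaded. It is not taken from the linked topics: the `sample_rows` helper, the `fraction` and `seed` parameters, and the in-memory `csv_text` stand-in for a downloaded blob are all hypothetical, and only the Python standard library is used.

```python
import csv
import io
import random


def sample_rows(csv_text, fraction, seed=0):
    """Return the header and a simple random sample of the data rows.

    Hypothetical helper: `fraction` is the share of rows to keep.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = list(reader)
    # Sample without replacement; keep at least one row when any exist.
    k = max(1, round(len(rows) * fraction)) if rows else 0
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible
    return header, rng.sample(rows, k)


# Assumed stand-in for a file already downloaded from a blob container.
csv_text = "id,value\n" + "\n".join(f"{i},{i * i}" for i in range(1000))
header, sample = sample_rows(csv_text, fraction=0.1)
print(header, len(sample))
```

Fixing the seed keeps the sample stable between runs, which helps when comparing prototypes built on the same subset. For data too large to hold in memory, a streaming technique would be needed instead of loading every row first.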

The following menu links to the topics that describe how to sample data from each of these Azure storage environments.

[!INCLUDE cap-sample-data-selector]

This sampling task is a step in the Team Data Science Process (TDSP).

Why sample data?

If the dataset you plan to analyze is large, it's usually a good idea to down-sample the data to a smaller, more manageable size that is still representative. Working with a smaller sample facilitates data understanding, exploration, and feature engineering. In the Team Data Science Process, sampling enables fast prototyping of the data processing functions and machine learning models.
