---
title: Explore data in SQL Server Virtual Machine on Azure | Microsoft Docs
description: How to explore data that is stored in a SQL Server VM on Azure.
services: machine-learning
documentationcenter: ''
author: bradsev
manager: jhubbard
editor: cgronlun

ms.assetid: ccbb3085-af9e-4ec2-9df2-15dcab261d05
ms.service: machine-learning
ms.workload: data-services
ms.tgt_pltfrm: na
ms.devlang: na
ms.topic: article
ms.date: 12/09/2016
ms.author: bradsev
---
This document covers how to explore data that is stored in a SQL Server VM on Azure. You can do this either by data wrangling with SQL or with a programming language like Python.
The following menu links to topics that describe how to use tools to explore data from various storage environments. This task is a step in the Cortana Analytics Process (CAP).
[!INCLUDE cap-explore-data-selector]
> [!NOTE]
> The sample SQL statements in this document assume that data is in SQL Server. If it isn't, refer to the cloud data science process map to learn how to move your data to SQL Server.
Here are a few sample SQL scripts that you can use to explore data stored in SQL Server; a concrete worked example follows the list.
* Get the count of observations per day:

    ```sql
    SELECT CONVERT(date, <date_columnname>) AS date, COUNT(*) AS c FROM <tablename> GROUP BY CONVERT(date, <date_columnname>)
    ```

* Get the levels in a categorical column:

    ```sql
    SELECT DISTINCT <column_name> FROM <tablename>
    ```

* Get the number of levels in a combination of two categorical columns:

    ```sql
    SELECT <column_a>, <column_b>, COUNT(*) FROM <tablename> GROUP BY <column_a>, <column_b>
    ```

* Get the distribution for numerical columns:

    ```sql
    SELECT <column_name>, COUNT(*) FROM <tablename> GROUP BY <column_name>
    ```
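To make these templates concrete, here is a minimal sketch that applies the first two of them to a hypothetical table named nyctaxi_trip with pickup_datetime and payment_type columns; substitute your own table and column names.

```sql
-- Count of trips per day (nyctaxi_trip, pickup_datetime, and payment_type are hypothetical names)
SELECT CONVERT(date, pickup_datetime) AS trip_date, COUNT(*) AS trip_count
FROM nyctaxi_trip
GROUP BY CONVERT(date, pickup_datetime)
ORDER BY trip_date

-- Levels of the payment_type categorical column
SELECT DISTINCT payment_type FROM nyctaxi_trip
```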
> [!NOTE]
> For a practical example, you can use the NYC Taxi dataset and refer to the IPython Notebook titled NYC Data wrangling using IPython Notebook and SQL Server for an end-to-end walk-through.
Using Python to explore data and generate features when the data is in SQL Server is similar to processing data in Azure Blob storage with Python, as documented in Process Azure Blob data in your data science environment. Load the data from the database into a pandas DataFrame, and then process it further. This section documents how to connect to the database and load the data into a DataFrame.
The following connection string format can be used to connect to a SQL Server database from Python using pyodbc (replace servername, dbname, username, and password with your specific values):
```python
# Set up the connection to the SQL Server database
import pyodbc
conn = pyodbc.connect('DRIVER={SQL Server};SERVER=<servername>;DATABASE=<dbname>;UID=<username>;PWD=<password>')
```
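If you want to run one of the exploration queries shown earlier directly, without loading the results into pandas, a minimal sketch using a pyodbc cursor looks like the following (the table and column names are placeholders; substitute your own):

```python
# Run an exploration query through a pyodbc cursor and print the returned rows
cursor = conn.cursor()
cursor.execute('SELECT <column_name>, COUNT(*) FROM <tablename> GROUP BY <column_name>')
for row in cursor.fetchall():
    print(row)
cursor.close()
```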
The pandas library provides a rich set of data structures and data analysis tools for data manipulation in Python. The following code reads the results returned from a SQL Server database into a pandas DataFrame:
```python
# Query the database and load the returned results into a pandas DataFrame
import pandas as pd

data_frame = pd.read_sql('''SELECT <columnname1>, <columnname2>... FROM <tablename>''', conn)
```
Now you can work with the pandas DataFrame as covered in the topic Process Azure Blob data in your data science environment.
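For instance, a few common first exploration steps on the loaded DataFrame might look like the following sketch; these pandas calls assume only the data_frame variable created above.

```python
# Inspect the first few rows and the column types
print(data_frame.head())
print(data_frame.dtypes)

# Summary statistics for the numerical columns
print(data_frame.describe())

# Count missing values per column
print(data_frame.isnull().sum())
```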
For an end-to-end walkthrough example of the Cortana Analytics Process using a public dataset, see The Team Data Science Process in action: using SQL Server.