title	description	services	documentationcenter	author	manager	editor	ms.assetid	ms.service	ms.workload	ms.tgt_pltfrm	ms.devlang	ms.topic	ms.date	ms.author
Step 2: Upload data into a Machine Learning experiment \| Microsoft Docs	Step 2 of the Develop a predictive solution walkthrough: Upload stored public data into Azure Machine Learning Studio.	machine-learning		garyericson	jhubbard	cgronlun	9f4bc52e-9919-4dea-90ea-5cf7cc506d85	machine-learning	tbd	na	na	article	12/16/2016	garye

Walkthrough Step 2: Upload existing data into an Azure Machine Learning experiment

This is the second step of the walkthrough, Develop a predictive analytics solution in Azure Machine Learning.

Create a Machine Learning workspace
Upload existing data
Create a new experiment
Train and evaluate the models
Deploy the Web service
Access the Web service

To develop a predictive model for credit risk, we need data that we can use to train and then test the model. For this walkthrough, we'll use the "UCI Statlog (German Credit Data) Data Set" from the UC Irvine Machine Learning repository. You can find it here:
http://archive.ics.uci.edu/ml/datasets/Statlog+(German+Credit+Data)

We'll use the file named german.data. Download this file to your local hard drive.

This dataset contains 1,000 rows, one for each past applicant for credit, with 20 variables per row. These 20 variables are the dataset's features (the feature vector), which provide identifying characteristics for each credit applicant. A 21st column in each row holds the applicant's calculated credit risk, with 700 applicants identified as a low credit risk and 300 as a high risk.
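As a quick check of the shape described above, the following minimal Python sketch counts rows, features, and labels in space-separated text. The function name `summarize` is hypothetical, and the sketch assumes the risk label is the last field on each line, with 1 meaning low risk and 2 meaning high risk; confirm that encoding against the UCI description before relying on it. The sample rows are synthetic placeholders, not real data.

```python
from collections import Counter

# Assumption: the label is the last field; 1 = low risk, 2 = high risk.
def summarize(lines):
    """Return (row_count, feature_count, label_counts) for space-separated rows."""
    rows = [line.split() for line in lines if line.strip()]
    feature_counts = {len(row) - 1 for row in rows}
    if len(feature_counts) != 1:
        raise ValueError(f"inconsistent row widths: {sorted(feature_counts)}")
    labels = Counter(row[-1] for row in rows)
    return len(rows), feature_counts.pop(), labels

# Synthetic stand-in rows (not real data): 20 placeholder features plus a label.
sample = [" ".join(["f"] * 20 + [label]) for label in ("1", "1", "2")]
print(summarize(sample))
```

To run it on the downloaded file, pass an open file object, for example `summarize(open("german.data"))`. If the assumptions hold, the result should show 1,000 rows, 20 features, and a 700/300 split between the two labels.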

The UCI website provides a description of the attributes of the feature vector for this data. These include financial information, credit history, employment status, and personal information. Each applicant has been given a binary rating indicating whether they are a low or high credit risk.

We'll use this data to train a predictive analytics model. When we're done, our model should be able to accept a feature vector for a new individual and predict whether that person is a low or high credit risk.

Here's one interesting twist. The description of the dataset explains that misclassifying a person as a low credit risk when they are actually a high credit risk is 5 times more costly to the financial institution than misclassifying a low credit risk as high. One simple way to take this into account in our experiment is to replicate each entry that represents a high credit risk so that it appears 5 times. Then, if the model misclassifies that high credit risk as low, it makes the same error 5 times, once for each copy, which increases the cost of this error in the training results.
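The replication idea can be sketched in a few lines of Python. This is only an illustration, not the method the walkthrough itself uses inside Studio: the function name `replicate_high_risk` is hypothetical, the label is assumed to be the last field with the value "2" for high risk, and "5 times" is read as five copies in total of each high-risk row.

```python
HIGH_RISK = "2"  # assumption: label is the last field, "2" = high risk
COPIES = 5       # assumption: five copies in total, matching the 5x cost

def replicate_high_risk(rows, label=HIGH_RISK, copies=COPIES):
    """Return a new list in which each high-risk row appears `copies` times."""
    weighted = []
    for row in rows:
        repeat = copies if row[-1] == label else 1
        # Copy each row so the repeated entries are independent lists.
        weighted.extend(list(row) for _ in range(repeat))
    return weighted

# Toy rows (one placeholder feature plus a label), not real data.
rows = [["a", "1"], ["b", "2"], ["c", "1"]]
weighted = replicate_high_risk(rows)
print(len(weighted))
```

Under these assumptions, applying the same rule to the full dataset would give 700 + 5 × 300 = 2,200 rows.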

Convert the dataset format

The original dataset uses a space-separated format. Machine Learning Studio works better with a comma-separated values (CSV) file, so we'll convert the dataset by replacing spaces with commas.

There are many ways to convert this data. One way is by using the following Windows PowerShell command:

Get-Content german.data | ForEach-Object { $_ -replace " ", "," } | Set-Content german.csv

Another way is by using the Unix sed command:

sed 's/ /,/g' german.data > german.csv

In either case, we have created a comma-separated version of the data in a file named german.csv that we'll use in our experiment.
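If neither tool is at hand, the same replacement can be sketched in Python. The function name `spaces_to_commas` is hypothetical, and, like the sed command, the sketch assumes that every space in the file is a field separator. The demo runs on a throwaway file with made-up values rather than on the real dataset.

```python
import tempfile
from pathlib import Path

def spaces_to_commas(src, dst):
    """Write a copy of `src` to `dst` with every space replaced by a comma."""
    text = Path(src).read_text()
    Path(dst).write_text(text.replace(" ", ","))

# Demo on a temporary file with made-up values (not the real dataset).
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "sample.data"
    dst = Path(tmp) / "sample.csv"
    src.write_text("x1 6 y2 1\nx3 48 y4 2\n")
    spaces_to_commas(src, dst)
    converted = dst.read_text()
print(converted)
```

For the walkthrough file, the call would be `spaces_to_commas("german.data", "german.csv")`.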

Upload the dataset to Machine Learning Studio

Once the data has been converted to CSV format, we need to upload it into Machine Learning Studio.

Open the Machine Learning Studio home page (https://studio.azureml.net).
Click the menu in the upper-left corner of the window, click Azure Machine Learning, select Studio, and sign in.
Click +NEW at the bottom of the window.
Select DATASET.
Select FROM LOCAL FILE.
In the Upload a new dataset dialog, click Browse and find the german.csv file you created.
Enter a name for the dataset. For this walkthrough, we'll call it "UCI German Credit Card Data".
For data type, select Generic CSV File With no header (.nh.csv).
Add a description if you’d like.
Click the OK check mark.

This uploads the data into a dataset module that we can use in an experiment.
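If the uploaded dataset does not look as expected, one thing worth checking is the CSV file itself. Because the file is uploaded as a CSV with no header, each line should be a data row with the same number of fields: 20 features plus the risk label, so 21 in total. The following Python sketch lists the lines that do not match; the function name `bad_rows` and its parameters are hypothetical.

```python
import csv
import io

def bad_rows(csv_text, expected_fields=21):
    """Return 1-based row numbers whose field count differs from expected."""
    reader = csv.reader(io.StringIO(csv_text))
    return [n for n, row in enumerate(reader, start=1)
            if len(row) != expected_fields]

# Made-up content: one well-formed row followed by one short row.
good = ",".join(["v"] * 21)
print(bad_rows(good + "\n" + "v,v\n"))
```

An empty list means every row has the expected width. Blank lines are reported as well, since they contain no fields.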

You can manage datasets that you've uploaded to Studio by clicking the DATASETS tab on the left side of the Studio window.

For more information about importing other types of data into an experiment, see Import your training data into Azure Machine Learning Studio.

Next: Create a new experiment
