Skip to content

Latest commit

 

History

History
130 lines (93 loc) · 7.43 KB

machine-learning-r-csharp-cluster-model.md

File metadata and controls

130 lines (93 loc) · 7.43 KB
title description services documentationcenter author manager editor ms.assetid ms.service ms.workload ms.tgt_pltfrm ms.devlang ms.topic ms.date ms.author
Cluster Model | Microsoft Docs
Cluster Model
machine-learning
FrancescaLazzeri
jhubbard
cgronlun
51b8d012-ed44-4312-920c-9c808dfd4ff6
machine-learning
data-services
na
na
article
11/21/2016
lazzeri

Cluster Model

How can we predict groups of credit cardholders’ behaviors in order to reduce the charge-off risk of credit card issuers? How can we define groups of personality traits of employees in order to improve their performance at work? How can doctors classify patients into groups based on the characteristics of their diseases? In principle, all of these questions can be answered through cluster analysis.

[!INCLUDE machine-learning-free-trial]

Cluster analysis classifies a set of observations into two or more mutually exclusive unknown groups based on combinations of variables. The purpose of cluster analysis is to discover a system of organizing observations, usually people or their characteristics, into groups, where members of the groups share properties in common. This service uses the K-Means methodology, a commonly used clustering technique, to cluster arbitrary data into groups. This web service takes the data and the number of k clusters as input, and produces predictions of which of the k groups to which each observations belongs.

This web service could be consumed by users – potentially through a mobile app, through a website, or even on a local computer, for example. But the purpose of the web service is also to serve as an example of how Azure Machine Learning can be used to create web services on top of R code. With just a few lines of R code and clicks of a button within Azure Machine Learning Studio, an experiment can be created with R code and published as a web service. The web service can then be published to the Azure Marketplace and consumed by users and devices across the world with no infrastructure setup by the author of the web service.

Consumption of web service

This web service groups the data into a set of k groups and outputs the group assignment for each row. The web service expects the end user to input data as a string where rows are separated by commas (,) and columns are separated by semicolons (;). The web service expects 1 row at a time. An example dataset could look like this:

Sample data

Suppose the user wanted to separate this data into 3 mutually exclusive groups. The data input for the above dataset would be the following: value = “10;5;2,18;1;6,7;5;5,22;3;4,12;2;1,10;3;4”; k=”3”. The output is the predicted group membership for each of the rows.

This service, as hosted on the Azure Marketplace, is an OData service; these may be called through POST or GET methods.

There are multiple ways of consuming the service in an automated fashion (an example app is here).

Starting C# code for web service consumption:

public class Input
{
        public string value;
        public string k;
}

public AuthenticationHeaderValue CreateBasicHeader(string username, string password)
{
        byte[] byteArray = System.Text.Encoding.UTF8.GetBytes(username + ":" + password);
        return new AuthenticationHeaderValue("Basic", Convert.ToBase64String(byteArray));
}

void Main()
{
        var input = new Input() { value = TextBox1.Text, k = TextBox2.Text };
        var json = JsonConvert.SerializeObject(input);
        var acitionUri = "PutAPIURLHere,e.g.https://api.datamarket.azure.com/..../v1/Score";
        var httpClient = new HttpClient();

        httpClient.DefaultRequestHeaders.Authorization = CreateBasicHeader("PutEmailAddressHere", "ChangeToAPIKey");
        httpClient.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));

        var response = httpClient.PostAsync(acitionUri, new StringContent(json));
        var result = response.Result.Content;
        var scoreResult = result.ReadAsStringAsync().Result;
}

Creation of web service

This web service was created using Azure Machine Learning. For a free trial, as well as introductory videos on creating experiments and publishing web services, please see azure.com/ml. Below is a screenshot of the experiment that created the web service and example code for each of the modules within the experiment.

From within Azure Machine Learning, a new blank experiment was created and two Execute R Script modules pulled onto the workspace. The data schema was created with a simple Execute R Script. Then, the data schema was linked to the cluster model section, again created with an Execute R Script. In the Execute R Script used for the cluster model, the web service then utilizes the “k-means” function, which is prebuilt into the Execute R Script of Azure Machine Learning.

Experiment flow

Module 1:

#Enter the input data as a string 
mydata <- data.frame(value = "1; 3; 5; 6; 7; 7, 5; 5; 6; 7; 2; 1, 3; 7; 2; 9; 56; 6, 1; 4; 5; 26; 4; 23, 15; 35; 6; 7; 12; 1, 32; 51; 62; 7; 21; 1", k=5, stringsAsFactors=FALSE)

maml.mapOutputPort("mydata");     

Module 2:

# Map 1-based optional input ports to variables
mydata <- maml.mapInputPort(1) # class: data.frame

data.split <- strsplit(mydata[1,1], ",")[[1]]
data.split <- sapply(data.split, strsplit, ";", simplify = TRUE)
data.split <- sapply(data.split, strsplit, ";", simplify = TRUE)
data.split <- as.data.frame(t(data.split))

data.split <- data.matrix(data.split)
data.split <- data.frame(data.split)

# K-Means cluster analysis
fit <- kmeans(data.split, mydata$k) # k-cluster solution

# Get cluster means 
aggregate(data.split,by=list(fit$cluster),FUN=mean)
# Append cluster assignment
mydatafinal <- data.frame(t(fit$cluster))
n_col=ncol(mydatafinal)
colnames(mydatafinal) <- paste("V",1:n_col,sep="")

# Select data.frame to be sent to the output Dataset port
maml.mapOutputPort("mydatafinal");

Limitations

This is a very simple example of a clustering web service. As can be seen from the example code above, no error catching is implemented and the service assumes everything is a continuous variable (no categorical features allowed), as the service only inputs numeric values at the time of the creation of this web service. Also, the service currently handles limited data size, due to the request/response nature of the web service call and the fact that the model is being fit every time the web service is called.

FAQ

For frequently asked questions on consumption of the web service or publishing to the Azure Marketplace, see here.