title | description | services | documentationcenter | author | manager | editor | ms.assetid | ms.service | ms.workload | ms.tgt_pltfrm | ms.devlang | ms.topic | ms.date | ms.author |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Cluster Model | Microsoft Docs |
Cluster Model |
machine-learning |
FrancescaLazzeri |
jhubbard |
cgronlun |
51b8d012-ed44-4312-920c-9c808dfd4ff6 |
machine-learning |
data-services |
na |
na |
article |
11/21/2016 |
lazzeri |
How can we predict groups of credit cardholders’ behaviors in order to reduce the charge-off risk of credit card issuers? How can we define groups of personality traits of employees in order to improve their performance at work? How can doctors classify patients into groups based on the characteristics of their diseases? In principle, all of these questions can be answered through cluster analysis.
[!INCLUDE machine-learning-free-trial]
Cluster analysis classifies a set of observations into two or more mutually exclusive unknown groups based on combinations of variables. The purpose of cluster analysis is to discover a system of organizing observations, usually people or their characteristics, into groups, where members of the groups share properties in common. This service uses the K-Means methodology, a commonly used clustering technique, to cluster arbitrary data into groups. This web service takes the data and the number of k clusters as input, and produces predictions of which of the k groups to which each observations belongs.
This web service could be consumed by users – potentially through a mobile app, through a website, or even on a local computer, for example. But the purpose of the web service is also to serve as an example of how Azure Machine Learning can be used to create web services on top of R code. With just a few lines of R code and clicks of a button within Azure Machine Learning Studio, an experiment can be created with R code and published as a web service. The web service can then be published to the Azure Marketplace and consumed by users and devices across the world with no infrastructure setup by the author of the web service.
This web service groups the data into a set of k groups and outputs the group assignment for each row. The web service expects the end user to input data as a string where rows are separated by commas (,) and columns are separated by semicolons (;). The web service expects 1 row at a time. An example dataset could look like this:
Suppose the user wanted to separate this data into 3 mutually exclusive groups. The data input for the above dataset would be the following: value = “10;5;2,18;1;6,7;5;5,22;3;4,12;2;1,10;3;4”; k=”3”. The output is the predicted group membership for each of the rows.
This service, as hosted on the Azure Marketplace, is an OData service; these may be called through POST or GET methods.
There are multiple ways of consuming the service in an automated fashion (an example app is here).
public class Input
{
public string value;
public string k;
}
public AuthenticationHeaderValue CreateBasicHeader(string username, string password)
{
byte[] byteArray = System.Text.Encoding.UTF8.GetBytes(username + ":" + password);
return new AuthenticationHeaderValue("Basic", Convert.ToBase64String(byteArray));
}
void Main()
{
var input = new Input() { value = TextBox1.Text, k = TextBox2.Text };
var json = JsonConvert.SerializeObject(input);
var acitionUri = "PutAPIURLHere,e.g.https://api.datamarket.azure.com/..../v1/Score";
var httpClient = new HttpClient();
httpClient.DefaultRequestHeaders.Authorization = CreateBasicHeader("PutEmailAddressHere", "ChangeToAPIKey");
httpClient.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
var response = httpClient.PostAsync(acitionUri, new StringContent(json));
var result = response.Result.Content;
var scoreResult = result.ReadAsStringAsync().Result;
}
This web service was created using Azure Machine Learning. For a free trial, as well as introductory videos on creating experiments and publishing web services, please see azure.com/ml. Below is a screenshot of the experiment that created the web service and example code for each of the modules within the experiment.
From within Azure Machine Learning, a new blank experiment was created and two Execute R Script modules pulled onto the workspace. The data schema was created with a simple Execute R Script. Then, the data schema was linked to the cluster model section, again created with an Execute R Script. In the Execute R Script used for the cluster model, the web service then utilizes the “k-means” function, which is prebuilt into the Execute R Script of Azure Machine Learning.
#Enter the input data as a string
mydata <- data.frame(value = "1; 3; 5; 6; 7; 7, 5; 5; 6; 7; 2; 1, 3; 7; 2; 9; 56; 6, 1; 4; 5; 26; 4; 23, 15; 35; 6; 7; 12; 1, 32; 51; 62; 7; 21; 1", k=5, stringsAsFactors=FALSE)
maml.mapOutputPort("mydata");
# Map 1-based optional input ports to variables
mydata <- maml.mapInputPort(1) # class: data.frame
data.split <- strsplit(mydata[1,1], ",")[[1]]
data.split <- sapply(data.split, strsplit, ";", simplify = TRUE)
data.split <- sapply(data.split, strsplit, ";", simplify = TRUE)
data.split <- as.data.frame(t(data.split))
data.split <- data.matrix(data.split)
data.split <- data.frame(data.split)
# K-Means cluster analysis
fit <- kmeans(data.split, mydata$k) # k-cluster solution
# Get cluster means
aggregate(data.split,by=list(fit$cluster),FUN=mean)
# Append cluster assignment
mydatafinal <- data.frame(t(fit$cluster))
n_col=ncol(mydatafinal)
colnames(mydatafinal) <- paste("V",1:n_col,sep="")
# Select data.frame to be sent to the output Dataset port
maml.mapOutputPort("mydatafinal");
This is a very simple example of a clustering web service. As can be seen from the example code above, no error catching is implemented and the service assumes everything is a continuous variable (no categorical features allowed), as the service only inputs numeric values at the time of the creation of this web service. Also, the service currently handles limited data size, due to the request/response nature of the web service call and the fact that the model is being fit every time the web service is called.
For frequently asked questions on consumption of the web service or publishing to the Azure Marketplace, see here.