title: Binary Classifier | Microsoft Docs
description: Binary Classifier
services: machine-learning
documentationcenter:
author: jaymathe
manager: jhubbard
editor: cgronlun
ms.assetid: 8045038a-9dcf-44b9-a6de-7f1f8e791575
ms.service: machine-learning
ms.workload: data-services
ms.tgt_pltfrm: na
ms.devlang: na
ms.topic: article
ms.date: 11/21/2016
ms.author: jaymathe

Binary Classifier

Suppose you have a dataset and would like to predict a binary dependent variable from the independent variables. Logistic regression is a popular statistical technique for such predictions. Here the dependent variable is binary (dichotomous), and p is the probability that the characteristic of interest is present.
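For reference, the standard form of the logistic regression model with independent variables x_1 to x_k is shown below. The coefficients are estimated from the observations whose dependent variable is known (in R, by `glm` with the binomial family), and comparing p with a cutoff such as 0.5 turns the probability into a 0/1 prediction.

$$
\log\frac{p}{1-p} = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k,
\qquad
p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)}}
$$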


For example, a researcher might want to predict whether a prospective student is likely to accept a university's admission offer, based on information such as high-school GPA, family income, state of residence, and gender. The model estimates the probability that the prospective student accepts the offer. This web service fits the logistic regression model to the data and returns a predicted value (y) for each observation in the data.

Users could consume this web service through a mobile app, a website, or even a program on a local computer. The web service also serves as an example of how Azure Machine Learning can be used to create web services on top of R code. With just a few lines of R code and a few clicks in Azure Machine Learning Studio, you can create an experiment with R code and publish it as a web service. The web service can then be published to the Azure Marketplace and consumed by users and devices across the world, without the author of the web service setting up any infrastructure.

Consumption of web service

This web service returns predicted values of the dependent variable, based on the independent variables, for all of the observations. The end user supplies the data as a single string in which rows are separated by commas (,) and columns are separated by semicolons (;). The service processes one such string per request, and the first column must be the dependent variable. An example dataset could look like this:

Sample data

For observations whose dependent variable is unknown, enter “NA” for y. The input string for the dataset above would be: “1;5;2,1;1;6,0;5.3;2.1,0;5;5,0;3;4,1;2;1,NA;3;4”. The output is the predicted value for each row, based on its independent variables.
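A minimal R sketch of building that input string on the client. It assumes the data sits in a numeric data frame whose first column is the dependent variable; the function name `encode_input` and the column names `y`, `x1` and `x2` are hypothetical, and the values are the sample data above.

```r
# Hypothetical helper: encode a numeric data frame into the service's input string.
# Columns within a row are joined with ";" and rows are joined with ",".
# Missing values (NA) are written as the literal text "NA".
encode_input <- function(df) {
  rows <- apply(as.matrix(df), 1, paste, collapse = ";")
  paste(rows, collapse = ",")
}

# Sample data from above; the first column is the dependent variable (NA = unknown)
sample.data <- data.frame(
  y  = c(1, 1, 0, 0, 0, 1, NA),
  x1 = c(5, 1, 5.3, 5, 3, 2, 3),
  x2 = c(2, 6, 2.1, 5, 4, 1, 4)
)

input.string <- encode_input(sample.data)
print(input.string)
```

The resulting string is what goes into the `value` field of the request body. Very large or very small numbers may be formatted in scientific notation by `paste`, so check the output if your data contains such values.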

This service, as hosted on the Azure Marketplace, is an OData service, which can be called with either the POST or the GET method.

There are multiple ways of consuming the service in an automated fashion (an example app is here).

Starter C# code for consuming the web service:

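using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;
using Newtonsoft.Json;

// Request body: the whole dataset travels as a single string in "value"
public class Input
{
    public string value;
}

public static class Program
{
    // Builds an HTTP Basic authentication header from the account e-mail and API key
    public static AuthenticationHeaderValue CreateBasicHeader(string username, string password)
    {
        byte[] byteArray = Encoding.UTF8.GetBytes(username + ":" + password);
        return new AuthenticationHeaderValue("Basic", Convert.ToBase64String(byteArray));
    }

    public static async Task Main()
    {
        // Replace with your own input string (the original sample read it from TextBox1.Text)
        var input = new Input() { value = "1;5;2,1;1;6,0;5.3;2.1,0;5;5,0;3;4,1;2;1,NA;3;4" };
        var json = JsonConvert.SerializeObject(input);
        var actionUri = "PutAPIURLHere,e.g.https://api.datamarket.azure.com/..../v1/Score";

        using (var httpClient = new HttpClient())
        {
            httpClient.DefaultRequestHeaders.Authorization = CreateBasicHeader("PutEmailAddressHere", "ChangeToAPIKey");
            httpClient.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));

            // POST the JSON body and fail fast on a non-success status code
            var content = new StringContent(json, Encoding.UTF8, "application/json");
            var response = await httpClient.PostAsync(actionUri, content);
            response.EnsureSuccessStatusCode();

            var scoreResult = await response.Content.ReadAsStringAsync();
            Console.WriteLine(scoreResult);
        }
    }
}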

Creation of web service

This web service was created using Azure Machine Learning. For a free trial, as well as introductory videos on creating experiments and publishing web services, please see azure.com/ml. Below is a screenshot of the experiment that created the web service, followed by example code for each module in the experiment.

From within Azure Machine Learning, a new blank experiment was created and two Execute R Script modules were pulled onto the workspace. This web service runs an Azure Machine Learning experiment with an underlying R script. The experiment has two parts: schema definition, and model training plus scoring. The first module defines the expected structure of the input dataset, where the first variable is the dependent variable and the remaining variables are independent. The second module fits a generic logistic regression model to the input data and converts each predicted probability into a 0/1 class label using a 0.5 cutoff.

Experiment flow

Module 1:

#Schema definition  
data <- data.frame(value = "1;2;3,1;5;6,0;8;9", stringsAsFactors=FALSE) 
maml.mapOutputPort("data");  

Module 2:

# GLM modeling
data <- maml.mapInputPort(1) # class: data.frame

# Split the single input string into rows (",") and then into fields (";"),
# converting every field to numeric; the text "NA" becomes a missing value.
# (data.matrix on text columns would return factor codes, not the values.)
rows <- strsplit(data[1, 1], ",")[[1]]
fields <- lapply(strsplit(rows, ";"), as.numeric)
data.split <- as.data.frame(do.call(rbind, fields))  # columns V1, V2, ...

# Regress V1 (dependent variable) on all other columns; writing V1 rather than
# data.split$V1 keeps V1 out of the "." predictor set. Rows with V1 = NA are
# dropped from the fit by the default na.action.
model <- glm(V1 ~ ., family = binomial, data = data.split)

# Predicted probabilities for every row, including rows where V1 is NA
prob <- predict(model, newdata = data.split, type = "response")

# Label each row 1 if its probability exceeds 0.5, otherwise 0
pred2 <- data.frame(group = as.integer(prob > 0.5))
maml.mapOutputPort("pred2");

Limitations

This is a very simple example of a binary classification web service. As the example code above shows, no error handling is implemented, and the service assumes every variable is binary or continuous (categorical features are not allowed), because the service accepted only numeric input at the time it was created. The service also handles only limited data sizes, because of the request/response nature of the web service call and because the model is refit every time the web service is called.
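Categorical predictors such as the state of residence in the admission scenario therefore have to be converted to numbers on the client before they are sent. One common approach, sketched here in R under the assumption that 0/1 indicator columns are acceptable inputs, is dummy encoding with `model.matrix`. The data frame `applicants`, its columns `gpa` and `state`, and its values are hypothetical illustrations.

```r
# Hypothetical client data: y = accepted offer (1/0), gpa numeric, state categorical
applicants <- data.frame(
  y     = c(1, 0, 1),
  gpa   = c(3.5, 2.9, 3.8),
  state = factor(c("WA", "CA", "WA"))
)

# One 0/1 indicator column per state level except the first (reference) level,
# so the indicators are not collinear with the model's intercept
state.dummies <- model.matrix(~ state, data = applicants)[, -1, drop = FALSE]

# All-numeric data frame, dependent variable still in the first column
numeric.data <- cbind(applicants[, c("y", "gpa")], state.dummies)
print(numeric.data)
```

Dropping the reference level mirrors how R's own formula interface encodes factors; the numeric columns can then be joined into the semicolon/comma input string as usual.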

FAQ

For frequently asked questions on consumption of the web service or publishing to the Azure Marketplace, see here.