title: Quickstart: Analyze text content for objectionable material in C#
titlesuffix: Azure Cognitive Services
description: How to analyze text content for various objectionable material using the Content Moderator SDK for .NET
services: cognitive-services
author: sanjeev3
manager: cgronlun
ms.service: cognitive-services
ms.component: content-moderator
ms.topic: quickstart
ms.date: 10/31/2018
ms.author: sajagtap

Quickstart: Analyze text content for objectionable material in C#

This article provides information and code samples to help you get started using the Content Moderator SDK for .NET. You will learn how to execute term-based filtering and classification of text content with the aim of moderating potentially objectionable material.

If you don't have an Azure subscription, create a free account before you begin.

Prerequisites

Note

This guide uses a free-tier Content Moderator subscription. For information on what is provided with each subscription tier, see the Pricing and limits page.

Create the Visual Studio project

  1. In Visual Studio, create a new Console app (.NET Framework) project and name it TextModeration.
  2. If there are other projects in your solution, select this one as the single startup project.
  3. Get the required NuGet packages. Right-click on your project in the Solution Explorer and select Manage NuGet Packages; then find and install the following packages:
    • Microsoft.Azure.CognitiveServices.ContentModerator
    • Microsoft.Rest.ClientRuntime
    • Newtonsoft.Json

Add text moderation code

Next, you'll copy and paste the code from this guide into your project to implement a basic content moderation scenario.

Include namespaces

Add the following using statements to the top of your Program.cs file.

using Microsoft.Azure.CognitiveServices.ContentModerator;
using Microsoft.Azure.CognitiveServices.ContentModerator.Models;
using Newtonsoft.Json;
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;

Create the Content Moderator client

Add the following code to your Program.cs file to create a Content Moderator client provider for your subscription. Add the code alongside the Program class, in the same namespace. You'll need to update the AzureRegion and CMSubscriptionKey fields with the values of your region identifier and subscription key.

// Wraps the creation and configuration of a Content Moderator client.
public static class Clients
{
	// The region/location for your Content Moderator account, 
	// for example, westus.
	private static readonly string AzureRegion = "YOUR API REGION";

	// The base URL fragment for Content Moderator calls.
	private static readonly string AzureBaseURL =
		$"https://{AzureRegion}.api.cognitive.microsoft.com";

	// Your Content Moderator subscription key.
	private static readonly string CMSubscriptionKey = "YOUR API KEY";

	// Returns a new Content Moderator client for your subscription.
	public static ContentModeratorClient NewClient()
	{
		// Create and initialize an instance of the Content Moderator API wrapper.
		ContentModeratorClient client = new ContentModeratorClient(new ApiKeyServiceClientCredentials(CMSubscriptionKey));

		client.Endpoint = AzureBaseURL;
		return client;
	}
}
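If you would rather not keep the key and region in source code, the two values can be read from the environment before the client is created. The following is a minimal sketch under that assumption: the ClientSettings class and the variable names CONTENT_MODERATOR_REGION and CONTENT_MODERATOR_KEY are hypothetical and not part of the SDK. It uses only the base class library and builds the endpoint in the same format as AzureBaseURL above.

```csharp
using System;

public static class ClientSettings
{
    // Hypothetical environment variable names; choose any names you like.
    public const string RegionVariable = "CONTENT_MODERATOR_REGION";
    public const string KeyVariable = "CONTENT_MODERATOR_KEY";

    // Builds the endpoint URL in the same format as the AzureBaseURL field.
    public static string BuildEndpoint(string region)
    {
        if (string.IsNullOrWhiteSpace(region))
        {
            throw new ArgumentException(
                "A region identifier such as westus is required.", nameof(region));
        }
        return $"https://{region.Trim()}.api.cognitive.microsoft.com";
    }

    // Reads a required setting and fails early with a readable message.
    public static string ReadRequired(string variableName)
    {
        string value = Environment.GetEnvironmentVariable(variableName);
        if (string.IsNullOrWhiteSpace(value))
        {
            throw new InvalidOperationException(
                $"Set the {variableName} environment variable first.");
        }
        return value.Trim();
    }

    public static void Main()
    {
        // Prints https://westus.api.cognitive.microsoft.com
        Console.WriteLine(BuildEndpoint("westus"));
    }
}
```

With a helper like this in place, the AzureRegion and CMSubscriptionKey fields in the Clients class could be initialized from ClientSettings.ReadRequired instead of string literals.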

Set up input and output targets

Add the following static fields to the Program class in Program.cs. These specify the files for input text content and output JSON content.

// The name of the file that contains the text to evaluate.
private static string TextFile = "TextFile.txt";

// The name of the file to contain the output from the evaluation.
private static string OutputFile = "TextModerationOutput.txt";

Create the TextFile.txt input file and, if you store it somewhere other than the execution directory, update the TextFile field with its path (relative paths are resolved against the execution directory). Open TextFile.txt and add the text to moderate. This quickstart uses the following sample text:

Is this a grabage or crap email abcdef@abcd.com, phone: 6657789887, IP: 255.255.255.255, 1 Microsoft Way, Redmond, WA 98052.
These are all UK phone numbers, the last two being Microsoft UK support numbers: +44 870 608 4000 or 0344 800 2400 or 
0800 820 3300. Also, 999-99-9999 looks like a social security number (SSN).
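
Because relative paths are resolved against the execution directory, a missing input file is an easy mistake to make. The sketch below shows one way to check the path before screening. The InputFileCheck class and its ResolveInputPath method are hypothetical helpers, not part of the SDK, and use only System.IO.

```csharp
using System;
using System.IO;

public static class InputFileCheck
{
    // Resolves a relative path against the current working directory
    // and throws if no file exists at the resulting location.
    public static string ResolveInputPath(string path)
    {
        string fullPath = Path.GetFullPath(path);
        if (!File.Exists(fullPath))
        {
            throw new FileNotFoundException(
                "Input file not found. Create it or update the TextFile field.", fullPath);
        }
        return fullPath;
    }

    public static void Main()
    {
        try
        {
            Console.WriteLine("Screening {0}", ResolveInputPath("TextFile.txt"));
        }
        catch (FileNotFoundException ex)
        {
            // FileName holds the full path that was checked.
            Console.WriteLine("{0} Looked in: {1}", ex.Message, ex.FileName);
        }
    }
}
```

Printing the resolved path makes it clear which directory the program is actually running from, which in Visual Studio is typically the build output folder rather than the project folder.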

Load the input text

Add the following code to the Main method. The ScreenText method is the essential operation. Its parameters specify which content moderation operations will be done. In this example, the method is configured to:

  • Detect potential profanity in the text.
  • Normalize the text and autocorrect typos.
  • Detect personally identifiable information (PII) such as US and UK phone numbers, email addresses, and US mailing addresses.
  • Use machine-learning-based models to classify the text into three categories.

If you want to learn more about what these operations do, follow the link in the Next steps section.

// Load the input text.
string text = File.ReadAllText(TextFile);
Console.WriteLine("Screening {0}", TextFile);

// Flatten line breaks so the text is screened as a single passage.
text = text.Replace(System.Environment.NewLine, " ");
byte[] byteArray = System.Text.Encoding.UTF8.GetBytes(text);

// Save the moderation results to a file. The using blocks flush and
// dispose the writer, the stream, and the client when the block exits.
using (StreamWriter outputWriter = new StreamWriter(OutputFile, false))
using (MemoryStream stream = new MemoryStream(byteArray))
using (var client = Clients.NewClient())
{
	// Screen the input text: check for profanity,
	// autocorrect text, check for personally identifying
	// information (PII), and classify the text into three categories.
	outputWriter.WriteLine("Autocorrect typos, check for matching terms, PII, and classify.");

	// Arguments in order: content type, content stream, language,
	// autocorrect, PII detection, term list ID (null here), classify.
	var screenResult =
		client.TextModeration.ScreenText("text/plain", stream, "eng", true, true, null, true);
	outputWriter.WriteLine(
		JsonConvert.SerializeObject(screenResult, Formatting.Indented));
}

Run the program

The program will write JSON string data to the TextModerationOutput.txt file. The sample text used in this quickstart gives the following output:

Autocorrect typos, check for matching terms, PII, and classify.
{
  "OriginalText": "\"Is this a grabage or crap email abcdef@abcd.com, phone: 6657789887, IP: 255.255.255.255, 1 Microsoft Way, Redmond, WA 98052. These are all UK phone numbers, the last two being Microsoft UK support numbers: +44 870 608 4000 or 0344 800 2400 or 0800 820 3300. Also, 999-99-9999 looks like a social security number (SSN).\"",
  "NormalizedText": "\" Is this a garbage or crap email abide@ abed. com, phone: 6657789887, IP: 255. 255. 255. 255, 1 Microsoft Way, Redmond, WA 98052. These are all UK phone numbers, the last two being Microsoft UK support numbers: +44 870 608 4000 or 0344 800 2400 or 0800 820 3300. Also, 999-99-9999 looks like a social security number ( SSN) . \"",
  "AutoCorrectedText": "\" Is this a garbage or crap email abide@ abed. com, phone: 6657789887, IP: 255. 255. 255. 255, 1 Microsoft Way, Redmond, WA 98052. These are all UK phone numbers, the last two being Microsoft UK support numbers: +44 870 608 4000 or 0344 800 2400 or 0800 820 3300. Also, 999-99-9999 looks like a social security number ( SSN) . \"",
  "Misrepresentation": null,
  "Classification": {
    "Category1": {
      "Score": 1.5113095059859916E-06
    },
    "Category2": {
      "Score": 0.12747249007225037
    },
    "Category3": {
      "Score": 0.98799997568130493
    },
    "ReviewRecommended": true
  },
  "Status": {
    "Code": 3000,
    "Description": "OK",
    "Exception": null
  },
  "PII": {
    "Email": [
      {
        "Detected": "abcdef@abcd.com",
        "SubType": "Regular",
        "Text": "abcdef@abcd.com",
        "Index": 33
      }
    ],
    "IPA": [
      {
        "SubType": "IPV4",
        "Text": "255.255.255.255",
        "Index": 73
      }
    ],
    "Phone": [
      {
        "CountryCode": "US",
        "Text": "6657789887",
        "Index": 57
      },
      {
        "CountryCode": "US",
        "Text": "870 608 4000",
        "Index": 211
      },
      {
        "CountryCode": "UK",
        "Text": "+44 870 608 4000",
        "Index": 207
      },
      {
        "CountryCode": "UK",
        "Text": "0344 800 2400",
        "Index": 227
      },
      {
        "CountryCode": "UK",
        "Text": "0800 820 3300",
        "Index": 244
      }
    ],
    "Address": [
      {
        "Text": "1 Microsoft Way, Redmond, WA 98052",
        "Index": 89
      }
    ]
  },
  "Language": "eng",
  "Terms": [
    {
      "Index": 22,
      "OriginalIndex": 22,
      "ListId": 0,
      "Term": "crap"
    }
  ],
  "TrackingId": "9392c53c-d11a-441d-a874-eb2b93d978d3"
}
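
An app will usually act on a few fields of this output rather than store the whole document. The sketch below shows one possible way to read the classification flag, the category scores, and the matched terms with the Newtonsoft.Json package installed earlier. The ScreenResultReader class and its methods are hypothetical helpers, and the embedded JSON is a trimmed copy of the sample output above. When reading TextModerationOutput.txt itself, skip the first descriptive line before parsing.

```csharp
using System;
using Newtonsoft.Json.Linq;

public static class ScreenResultReader
{
    // Returns true when the classifier recommends human review.
    // A missing Classification section is treated as "not recommended".
    public static bool IsReviewRecommended(JObject result)
    {
        return (bool?)result.SelectToken("Classification.ReviewRecommended") ?? false;
    }

    // Returns the score for Category1, Category2, or Category3,
    // or null if that category is absent from the result.
    public static double? CategoryScore(JObject result, int category)
    {
        return (double?)result.SelectToken($"Classification.Category{category}.Score");
    }

    public static void Main()
    {
        // Trimmed copy of the sample output shown above.
        string json = @"{
          ""Classification"": {
            ""Category1"": { ""Score"": 1.5113095059859916E-06 },
            ""Category2"": { ""Score"": 0.12747249007225037 },
            ""Category3"": { ""Score"": 0.98799997568130493 },
            ""ReviewRecommended"": true
          },
          ""Terms"": [ { ""Index"": 22, ""Term"": ""crap"" } ]
        }";

        JObject result = JObject.Parse(json);
        Console.WriteLine("Review recommended: {0}", IsReviewRecommended(result));

        for (int i = 1; i <= 3; i++)
        {
            Console.WriteLine("Category{0} score: {1}", i, CategoryScore(result, i));
        }

        // Terms may be absent when nothing matched, so fall back to an empty array.
        foreach (JToken term in result["Terms"] ?? new JArray())
        {
            Console.WriteLine("Matched term '{0}' at index {1}",
                (string)term["Term"], (int)term["Index"]);
        }
    }
}
```

Reading fields by path keeps the sketch independent of the SDK model types. In the quickstart program you could also read the same values directly from the screenResult object instead of reparsing the JSON.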

Next steps

In this quickstart, you've developed a simple .NET application that uses the Content Moderator service to return relevant information about a given text sample. Next, learn more about what the different flags and classifications mean so you can decide which data you need and how your app should handle it.

Text moderation guide