---
title: Scaling web service | Microsoft Docs
description: Learn how to scale a web service by increasing concurrency and adding new endpoints.
services: machine-learning
documentationcenter: ''
author: neerajkh
manager: srikants
editor: cgronlun
keywords: azure machine learning, web services, operationalization, scaling, endpoint, concurrency
ms.assetid: c2c51d7f-fd2d-4f03-bc51-bf47e6969296
ms.service: machine-learning
ms.devlang: NA
ms.workload: data-services
ms.tgt_pltfrm: na
ms.topic: article
ms.date: 10/05/2016
ms.author: neerajkh
---
> [!NOTE]
> This topic describes techniques applicable to a Classic Machine Learning Web service.
By default, each published Web service is configured to support 20 concurrent requests, and you can increase this to as many as 200 concurrent requests. While the Azure classic portal provides a way to set this value, Azure Machine Learning automatically optimizes the setting to provide the best performance for your web service, and the value you set in the portal is ignored.
If you plan to call the API with a higher load than a Max Concurrent Calls value of 200 will support, you should create multiple endpoints on the same Web service. You can then randomly distribute your load across all of them.
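For example, because each added endpoint has its own URI and API key, the caller can pick one at random for each request. The following is a minimal sketch of this approach; the URIs and keys shown are placeholders to replace with your own endpoint values, and random selection is only one way to distribute the load.

```python
import random
import requests

# Placeholder Request-Response URIs and API keys; replace with the values for
# your own endpoints on the same Web service.
ENDPOINTS = [
    {"url": "https://<region>.services.azureml.net/workspaces/<workspace-id>/services/<endpoint-1-id>/execute?api-version=2.0",
     "key": "<endpoint-1-api-key>"},
    {"url": "https://<region>.services.azureml.net/workspaces/<workspace-id>/services/<endpoint-2-id>/execute?api-version=2.0",
     "key": "<endpoint-2-api-key>"},
]

def score(payload):
    """Send one scoring request to a randomly chosen endpoint."""
    endpoint = random.choice(ENDPOINTS)
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer " + endpoint["key"],
    }
    response = requests.post(endpoint["url"], json=payload, headers=headers)
    response.raise_for_status()
    return response.json()
```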
Scaling a Web service is a common task. Reasons to scale include supporting more than 200 concurrent requests, increasing availability through multiple endpoints, and providing separate endpoints for the web service. You can increase the scale by adding additional endpoints for the same Web service through the Azure classic portal or the Azure Machine Learning Web Services portal.
For more information on adding new endpoints, see Creating Endpoints.
Keep in mind that using a high concurrency count can be detrimental if you're not calling the API at a correspondingly high rate. You might see sporadic timeouts or spikes in latency if you put a relatively low load on an API configured for high load.
The synchronous APIs are typically used in situations where low latency is desired. Latency here means the time it takes for the API to complete one request, and doesn't account for any network delays. Suppose you have an API with a 50-ms latency. To fully consume the available capacity with throttle level High and Max Concurrent Calls = 20, you need to call this API 20 * 1000 / 50 = 400 times per second. Extending this further, a Max Concurrent Calls value of 200 allows you to call the API 4,000 times per second, assuming a 50-ms latency.
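The same arithmetic can be expressed as a small helper for estimating the request rate needed to saturate an endpoint. This is a minimal sketch; the function name and values are illustrative and not part of the service API.

```python
def max_requests_per_second(max_concurrent_calls, latency_ms):
    """Approximate request rate needed to fully use an endpoint's capacity,
    ignoring network delays."""
    return max_concurrent_calls * 1000 / latency_ms

print(max_requests_per_second(20, 50))   # 400.0 requests per second
print(max_requests_per_second(200, 50))  # 4000.0 requests per second
```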