diff --git a/api/detector_batch_processing/README.md b/api/detector_batch_processing/README.md index 80210b43f..e540179a3 100644 --- a/api/detector_batch_processing/README.md +++ b/api/detector_batch_processing/README.md @@ -57,23 +57,24 @@ Not yet supported. Meanwhile, once the shards of images are submitted for proces ### Inputs -| Parameter | Is required | Explanation | -|--------------------------|-------------|-------------------------------------------------------------------------------------------------------------------------------| -| input_container_sas | Yes | SAS URL with list and read permissions to the Blob Storage container where the images are stored. | -| images_required_json_sas | No | SAS URL with list and read permissions to a json file in Blob Storage. The json contains a list, where each item (a string) in the list is the full path to an image from the root of the container. An example of the content of this file: `["Season1/Location1/Camera1/image1.jpg", "Season1/Location1/Camera1/image2.jpg"]`. Only images whose paths are listed here will be processed. | -| image_path_prefix | No | Only process images whose full path starts with `image_path_prefix`. Note that any image paths specified in `images_required_json_sas` will need to be the full path from the root of the container, regardless of `image_path_prefix`. | -| first_n | No | Only process the first `first_n` images. Order of images is not guaranteed, but is likely to be alphabetical. Set this to a small number to avoid taking time to fully list all images in the blob (about 15 minutes for 1 million images) if you just want to try this API. | -| sample_n (not yet implemented) | No | Randomly sample `sample_n` images to process. | +| Parameter | Is required | Type | Explanation | +|--------------------------|-------------|-------|-------------------------------------------------------------------------------------------------------------------------------| +| input_container_sas | Yes | string | SAS URL with list and read permissions to the Blob Storage container where the images are stored. | +| images_requested_json_sas | No | string | SAS URL with list and read permissions to a json file in Blob Storage. The json contains a list, where each item (a string) in the list is the full path to an image from the root of the container. An example of the content of this file: `["Season1/Location1/Camera1/image1.jpg", "Season1/Location1/Camera1/image2.jpg"]`. Only images whose paths are listed here will be processed. | +| image_path_prefix | No | string | Only process images whose full path starts with `image_path_prefix` (case-_sensitive_). Note that any image paths specified in `images_requested_json_sas` will need to be the full path from the root of the container, regardless of whether `image_path_prefix` is provided. | +| first_n | No | int | Only process the first `first_n` images. Order of images is not guaranteed, but is likely to be alphabetical. Set this to a small number to avoid taking time to fully list all images in the blob (about 15 minutes for 1 million images) if you just want to try this API. | +| sample_n | No | int | Randomly select `sample_n` images to process. | - We assume that all images you would like to process in this batch are uploaded to a container in Azure Blob Storage. An example request using these parameters is sketched just below.
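The following is a minimal sketch of such a request. The route name (`request_detections`), host, and port are assumptions for illustration (the Dockerfile later in this diff sets `API_PREFIX=/v2/camera-trap/detection-batch`), and the SAS URL is a placeholder; consult `runserver.py` for the actual routes exposed by the service.

```python
# Hypothetical example request; route name, host, and port are assumptions.
import requests

body = {
    # required: SAS URL with list and read permissions on the image container
    'input_container_sas': 'https://storageaccount.blob.core.windows.net/images?sv=...',
    # optional filters, applied in the order documented below
    'image_path_prefix': 'Season1/Location1/',  # case-sensitive
    'first_n': 2000,
    'sample_n': 500
}

response = requests.post(
    'http://localhost:6000/v2/camera-trap/detection-batch/request_detections',
    json=body)
print(response.status_code, response.text)
```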
- Only images with file name ending in '.jpg' or '.jpeg' (case-insensitive) will be processed, so please make sure the file names are compliant before you upload them to the container (you cannot rename a blob without copying it entirely once it is in Blob Storage). +- The path to the images in blob storage cannot contain commas (this would confuse the output CSV). - By default we process all such images in the specified container. You can choose to only process a subset of them by specifying the other input parameters, and the images will be filtered out accordingly in this order: - `images_requested_json_sas` - `image_path_prefix` - `first_n` - - `sample_n` (not yet implemented) + - `sample_n` - For example, if you specified both `images_requested_json_sas` and `first_n`, only images that are in your provided list at `images_requested_json_sas` will be considered, and then we process the `first_n` of those. @@ -151,12 +152,23 @@ The second column is the confidence value of the most confident detection on the The third column contains details of the detections so you can visualize them. It is a stringified JSON of a list of lists, representing the detections made on that image. Each detection list has the coordinates of the bounding box surrounding the detection, followed by its confidence: ``` -[ymin, xmin, ymax, xmax, confidence] +[ymin, xmin, ymax, xmax, confidence, (class)] ``` where `(xmin, ymin)` is the upper-left corner of the detection bounding box. The coordinates are relative to the height and width of the image. -When the detector model detects no animal, the confidence is shown as 0.0 (not confident that there is an animal) and the detection column is an empty list. +An integer `class` comes after `confidence` in versions of the API that use MegaDetector version 3 or later. The `class` label corresponds to the following: + +``` +1: animal +2: person +4: vehicle +``` + +Note that the `vehicle` class (available in MegaDetector version 4 or later) is class number 4. Class number 3 (group) was not included in training and should be ignored if it appears in the results, as should any other class labels not listed here. + +When the detector model detects no animal (or person or vehicle), the confidence is shown as 0.0 (not confident that there is an object of interest) and the detection column is an empty list. + ## Post-processing tools diff --git a/api/detector_batch_processing/api/Dockerfile b/api/detector_batch_processing/api/Dockerfile index 9ede2658a..44c2372c1 100755 --- a/api/detector_batch_processing/api/Dockerfile +++ b/api/detector_batch_processing/api/Dockerfile @@ -3,8 +3,10 @@ FROM ai4eregistry.azurecr.io/1.0-base-py-ubuntu16.04:latest RUN echo "source activate ai4e_py_api" >> ~/.bashrc \ && conda install -c conda-forge -n ai4e_py_api numpy pandas +RUN pip install --upgrade pip + # Azure blob packages should already be installed in the base image. Just need to install Azure ML SDK -RUN pip install --upgrade azureml-sdk +RUN pip install azureml-sdk==1.0.33 # Note: supervisor.conf reflects the location and name of your api code.
# If the default (./my_api/runserver.py) is renamed, you must change supervisor.conf @@ -35,7 +37,7 @@ ENV SERVICE_OWNER=AI4E_Test \ SERVICE_MODEL_FRAMEOWRK_VERSION=3.6.6 \ SERVICE_MODEL_VERSION=1.0 -ENV API_PREFIX=/v1/camera-trap/detection-batch +ENV API_PREFIX=/v2/camera-trap/detection-batch ENV AZUREML_PASSWORD= diff --git a/api/detector_batch_processing/api/README.md b/api/detector_batch_processing/api/README.md index 4ea7e62fd..8767a5220 100644 --- a/api/detector_batch_processing/api/README.md +++ b/api/detector_batch_processing/api/README.md @@ -36,7 +36,7 @@ You can do this at the command line (where Azure CLI is installed) of the VM whe ``` az acr login --name ai4eregistry ``` -You need to have the subscription where this registry is set as the default subscription. +The subscription containing this registry needs to be set as your default subscription, and you may need to use `sudo` with this command. ### Step 3. Build the Docker container @@ -45,7 +45,7 @@ Now you're all set to build the container. Navigate to the current directory (`detector_batch_processing/api`) where the `Dockerfile` is. ``` -docker build . -t name.azurecr.io/camera-trap-detection-sync:2 +docker build . -t name.azurecr.io/camera-trap-detection-batch-v3:1 ``` You can supply your own tag (`-t` option) and build number. You may need to use `sudo` with this command. @@ -55,7 +55,7 @@ To launch the service, in a `tmux` session, issue: ``` -docker run -p 6000:80 name.azurecr.io/camera-trap-detection-batch:2 |& tee -a camera-trap-api-async-log/log20190415.txt +docker run -p 6000:80 name.azurecr.io/camera-trap-detection-batch-v3:1 |& tee -a camera-trap-api-async-log/log20190415.txt ``` Substitute the tag of the image you built in the last step (or that of a pre-built one), the port you'd like to expose the API at (6000 above), and specify the location to store the log messages (printed to console too). @@ -65,6 +65,6 @@ You may need to use `sudo` with this command.
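Once the container starts, a quick way to confirm the service is reachable under the new `/v2` prefix is to hit it from Python. This is only a sketch: the base prefix comes from `API_PREFIX` in the Dockerfile, and whether the bare prefix returns anything other than a 404 depends on the routes defined in `runserver.py`.

```python
# Hypothetical smoke test; adjust host and port to match the -p mapping used
# in `docker run`. A connection error means the container is not serving.
import requests

base_url = 'http://localhost:6000/v2/camera-trap/detection-batch'
resp = requests.get(base_url)
print('Service reachable, HTTP status:', resp.status_code)
```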
## Work items -- [ ] Rename `aml_config_scripts` to `aml_scripts` now that the cluster config file is no longer used +- [x] Rename `aml_config_scripts` to `aml_scripts` now that the cluster config file is no longer used -- [ ] Make use of Key Vault to access crendentials +- [ ] Make use of Key Vault to access credentials diff --git a/api/detector_batch_processing/api/orchestrator_api/aml_config_scripts/score.py b/api/detector_batch_processing/api/orchestrator_api/aml_scripts/score.py similarity index 99% rename from api/detector_batch_processing/api/orchestrator_api/aml_config_scripts/score.py rename to api/detector_batch_processing/api/orchestrator_api/aml_scripts/score.py index 5a69e1d6a..19c6c6cd2 100644 --- a/api/detector_batch_processing/api/orchestrator_api/aml_config_scripts/score.py +++ b/api/detector_batch_processing/api/orchestrator_api/aml_scripts/score.py @@ -1,4 +1,4 @@ -print('score.py, beginning - NEW') +print('score.py, beginning - megadetector_v3') import argparse import csv diff --git a/api/detector_batch_processing/api/orchestrator_api/aml_config_scripts/tf_detector.py b/api/detector_batch_processing/api/orchestrator_api/aml_scripts/tf_detector.py similarity index 98% rename from api/detector_batch_processing/api/orchestrator_api/aml_config_scripts/tf_detector.py rename to api/detector_batch_processing/api/orchestrator_api/aml_scripts/tf_detector.py index 1b9538fd8..810bb1f0c 100644 --- a/api/detector_batch_processing/api/orchestrator_api/aml_config_scripts/tf_detector.py +++ b/api/detector_batch_processing/api/orchestrator_api/aml_scripts/tf_detector.py @@ -124,10 +124,11 @@ def generate_detections_batch(self, images, image_ids, batch_size, detection_thr boxes, scores, classes = b_box[i], b_score[i], b_class[i] detections_cur_image = [] # will be empty for an image with no confident detections - for b, s in zip(boxes, scores): + for b, s, c in zip(boxes, scores, classes): if s > detection_threshold: li = TFDetector.convert_numpy_floats(b) li.append(float(s)) + li.append(int(c)) detections_cur_image.append(li) detections.append(detections_cur_image) diff --git a/api/detector_batch_processing/api/orchestrator_api/api_config.py b/api/detector_batch_processing/api/orchestrator_api/api_config.py index edd0887bc..442136f9b 100644 --- a/api/detector_batch_processing/api/orchestrator_api/api_config.py +++ b/api/detector_batch_processing/api/orchestrator_api/api_config.py @@ -1,15 +1,15 @@ # version of the detector model in use -MODEL_VERSION = 'models/object_detection/faster_rcnn_inception_resnet_v2_atrous/megadetector' +MODEL_VERSION = 'models/object_detection/faster_rcnn_inception_resnet_v2_atrous/megadetector_v3/step_686872' # name of the container in the internal storage account to store user facing files: # image list, detection results and failed images list. 
-INTERNAL_CONTAINER = 'async-api-v2' +INTERNAL_CONTAINER = 'async-api-v3-2' # name of the container in the internal storage account to store outputs of each AML job -AML_CONTAINER = 'aml-out' +AML_CONTAINER = 'aml-out-2' # how often does the checking thread wake up to check if all jobs are done -MONITOR_PERIOD_MINUTES = 30 +MONITOR_PERIOD_MINUTES = 15 # if this number of times the thread wakes up to check is exceeded, stop the monitoring thread MAX_MONITOR_CYCLES = 14 * 48 # 14 * 48 checks: about 1 week at the 15-minute interval above @@ -31,12 +31,12 @@ 'subscription_id': '74d91980-e5b4-4fd9-adb6-263b8f90ec5b', 'workspace_region': 'eastus', 'resource_group': 'camera_trap_api_rg', - 'workspace_name': 'camera_trap_aml_workspace', + 'workspace_name': 'camera_trap_aml_workspace_2', 'aml_compute_name': 'camera-trap-com', - 'model_name': 'megadetector', + 'model_name': 'megadetector_v3_tf19', - 'source_dir': '/app/orchestrator_api/aml_config_scripts', + 'source_dir': '/app/orchestrator_api/aml_scripts', 'script_name': 'score.py', 'param_batch_size': 8, diff --git a/api/detector_batch_processing/api/orchestrator_api/orchestrator.py b/api/detector_batch_processing/api/orchestrator_api/orchestrator.py index 15bbf186b..484a89fe2 100644 --- a/api/detector_batch_processing/api/orchestrator_api/orchestrator.py +++ b/api/detector_batch_processing/api/orchestrator_api/orchestrator.py @@ -1,9 +1,11 @@ import copy import io import os +import pickle from collections import defaultdict from datetime import datetime, timedelta +import azureml.core import pandas as pd from azure.storage.blob import BlockBlobService, BlobPermissions from azureml.core import Workspace, Experiment @@ -19,6 +21,9 @@ import api_config from sas_blob_utils import SasBlob +print('Version of AML: {}'.format(azureml.core.__version__)) + + # Service principal authentication for AML svc_pr_password = os.environ.get('AZUREML_PASSWORD') svc_pr = ServicePrincipalAuthentication( @@ -97,6 +102,7 @@ def __init__(self, request_id, input_container_sas, internal_datastore): batch_score_step = PythonScriptStep(aml_config['script_name'], source_directory=aml_config['source_dir'], + hash_paths=['.'], # include all contents of source_directory name='batch_scoring', arguments=['--job_id', param_job_id, '--model_name', aml_config['model_name'], @@ -272,7 +278,7 @@ def _generate_urls_for_outputs(self): sas = self.internal_storage_service.generate_blob_shared_access_signature( self.internal_container, blob_path, permission=BlobPermissions.READ, expiry=expiry ) - url = self.internal_storage_service.make_blob_url('async-api-v2', blob_path, sas_token=sas) + url = self.internal_storage_service.make_blob_url(self.internal_container, blob_path, sas_token=sas) urls[output] = url return urls except Exception as e: diff --git a/api/detector_batch_processing/api/orchestrator_api/runserver.py b/api/detector_batch_processing/api/orchestrator_api/runserver.py index ca8414dd4..a809f2965 100644 --- a/api/detector_batch_processing/api/orchestrator_api/runserver.py +++ b/api/detector_batch_processing/api/orchestrator_api/runserver.py @@ -7,6 +7,7 @@ import os import time from datetime import datetime +from random import shuffle from ai4e_app_insights import AppInsights from ai4e_app_insights_wrapper import AI4EAppInsights @@ -23,6 +24,7 @@ print('Creating application') api_prefix = os.getenv('API_PREFIX') +print('API prefix: ', api_prefix) app = Flask(__name__) api = Api(app) @@ -105,7 +107,7 @@ def _request_detections(**kwargs): input_container_sas = body['input_container_sas']
images_requested_json_sas = body.get('images_requested_json_sas', None) - image_path_prefix = body.get('image_path_prefix', '') + image_path_prefix = body.get('image_path_prefix', None) first_n = body.get('first_n', None) first_n = int(first_n) if first_n else None @@ -120,30 +122,48 @@ print('runserver.py, running - listing all images to process.') # list all images to process + blob_prefix = None if image_path_prefix is None else image_path_prefix image_paths = SasBlob.list_blobs_in_container(api_config.MAX_NUMBER_IMAGES_ACCEPTED, sas_uri=input_container_sas, - blob_prefix=image_path_prefix, blob_suffix='.jpg') + blob_prefix=blob_prefix, blob_suffix='.jpg') else: print('runserver.py, running - using provided list of images.') image_paths_text = SasBlob.download_blob_to_text(images_requested_json_sas) image_paths = json.loads(image_paths_text) print('runserver.py, length of image_paths provided by the user: {}'.format(len(image_paths))) + image_paths = [i for i in image_paths if str(i).lower().endswith(api_config.ACCEPTED_IMAGE_FILE_ENDINGS)] - print('runserver.py, length of image_paths provided by the user, after filtering to jpg: {}'.format(len(image_paths))) + print('runserver.py, length of image_paths provided by the user, after filtering to jpg: {}'.format( + len(image_paths))) + + if image_path_prefix is not None: + image_paths = [i for i in image_paths if str(i).startswith(image_path_prefix)] + print('runserver.py, length of image_paths provided by the user, after filtering for image_path_prefix: {}'.format( + len(image_paths))) res = orchestrator.spot_check_blob_paths_exist(image_paths, input_container_sas) if res is not None: - raise LookupError('path {} provided in list of images to process does not exist in the container pointed to by input_container_sas.'.format(res)) # apply the first_n and sample_n filters if first_n is not None: - assert first_n > 0, 'parameter first_n is zero.' + assert first_n > 0, 'parameter first_n must be greater than 0.' image_paths = image_paths[:first_n] # will not error if first_n > total number of images - # TODO implement sample_n - need to check that sample_n <= len(image_paths) + if sample_n is not None: + assert sample_n > 0, 'parameter sample_n must be greater than 0.'
+ if sample_n > len(image_paths): + raise ValueError('parameter sample_n specifies more images than available (after filtering by other provided parameters).') + + # we sample by shuffling the image paths and taking the first sample_n images + print('First path before shuffling:', image_paths[0]) + shuffle(image_paths) + print('First path after shuffling:', image_paths[0]) + image_paths = image_paths[:sample_n] + image_paths = sorted(image_paths) num_images = len(image_paths) - print('runserver.py, num_images: {}'.format(num_images)) + print('runserver.py, num_images after applying all filters: {}'.format(num_images)) if num_images < 1: api_task_manager.UpdateTaskStatus(request_id, 'completed - zero images found in container or in provided list of images after filtering with the provided parameters.') return @@ -173,7 +193,7 @@ def _request_detections(**kwargs): begin, end = job_index * num_images_per_job, (job_index + 1) * num_images_per_job job_id = 'request{}_jobindex{}_total{}'.format(request_id, job_index, num_jobs) list_jobs[job_id] = { 'begin': begin, 'end': end } - # TODO send list_jobs_submitted in a pickle to intermediate storage as a record / for restarting the monitoring thread + list_jobs_submitted = aml_compute.submit_jobs(request_id, list_jobs, api_task_manager, num_images) api_task_manager.UpdateTaskStatus(request_id, 'running - all {} images submitted to cluster for processing.'.format(num_images)) @@ -186,6 +206,7 @@ try: aml_monitor = orchestrator.AMLMonitor(request_id, list_jobs_submitted) + # start another thread to monitor the jobs and consolidate the results when they finish ai4e_wrapper.wrap_async_endpoint(_monitor_detections_request, 'post:_monitor_detections_request', request_id=request_id, @@ -205,6 +226,8 @@ def _monitor_detections_request(**kwargs): max_num_checks = api_config.MAX_MONITOR_CYCLES num_checks = 0 + print('Monitoring thread with _monitor_detections_request started.') + # time.sleep() blocks the current thread only while True: time.sleep(api_config.MONITOR_PERIOD_MINUTES * 60) diff --git a/api/detector_batch_processing/api/orchestrator_api/sas_blob_utils.py b/api/detector_batch_processing/api/orchestrator_api/sas_blob_utils.py index a0970a0d6..57f539aa8 100644 --- a/api/detector_batch_processing/api/orchestrator_api/sas_blob_utils.py +++ b/api/detector_batch_processing/api/orchestrator_api/sas_blob_utils.py @@ -181,7 +181,7 @@ def list_blobs_in_container(max_number_to_list, sas_uri=None, datastore=None, sas_uri: Azure blob storage SAS token datastore: dict with fields account_name (of the Blob storage account), account_key and container_name blob_prefix: Optional, a string as the prefix to blob names/paths to filter the results to those - with this prefix + with this prefix. Case-sensitive! blob_suffix: Optional, an all lower case string or a tuple of strings, to filter the results to those with this/these suffix(es). The blob names will be lowercased first before comparing with the suffix(es).
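The removed TODO about pickling `list_jobs_submitted`, together with the new `import pickle` in `orchestrator.py`, suggests persisting the submitted-jobs record so a restarted monitoring thread could re-attach. Nothing in this diff actually does so; the sketch below is only a hypothetical direction, and the blob path, record layout, and helper name are all assumptions.

```python
# Hypothetical sketch, not the shipped implementation: persist a summary of
# list_jobs_submitted to the internal container via the azure-storage-blob
# BlockBlobService that orchestrator.py already imports.
import pickle

def save_jobs_record(block_blob_service, container_name, request_id, list_jobs_submitted):
    # keep plain identifiers only; AML Run objects may not pickle cleanly
    record = {job_id: str(job) for job_id, job in list_jobs_submitted.items()}
    blob_path = '{}/list_jobs_submitted.pickle'.format(request_id)
    block_blob_service.create_blob_from_bytes(container_name, blob_path,
                                              pickle.dumps(record))
```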
diff --git a/api/detector_batch_processing/create_new_AML_instance.ipynb b/api/detector_batch_processing/create_new_AML_instance.ipynb new file mode 100644 index 000000000..e6eb9f265 --- /dev/null +++ b/api/detector_batch_processing/create_new_AML_instance.ipynb @@ -0,0 +1,338 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "from IPython.core.interactiveshell import InteractiveShell\n", + "InteractiveShell.ast_node_interactivity = 'all'" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "data": { + "text/plain": [ + "'1.0.33'" + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import azureml.core\n", + "azureml.core.__version__\n", + "\n", + "# to upgrade:\n", + "# pip install --upgrade azureml-sdk" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": { + "collapsed": false + }, + "outputs": [], + "source": [ + "from azureml.core import Workspace, ComputeTarget\n", + "from azureml.core.compute import AmlCompute\n", + "from azureml.core.model import Model" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Create a new AML instance\n", + "\n", + "Run this notebook to create a new instance of Azure Machine Learning to support a new instance of the batch processing API.\n", + "\n", + "Azure Machine Learning SDK for Python documentation: https://docs.microsoft.com/en-us/python/api/overview/azure/ml/intro?view=azure-ml-py" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Modify params in this section:" + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "metadata": { + "collapsed": false + }, + "outputs": [], + "source": [ + "# Workspace name has to be between 2 and 32 characters of letters and numbers.\n", + "workspace_name = 'camera_trap_aml_workspace_2'\n", + "\n", + "compute_name = 'camera-trap-com' # 2 to 16 chars\n", + "max_nodes = 16\n", + "min_nodes = 0 # set to 0 to allow the cluster to completely deallocate\n", + "idle_seconds_before_scaledown = 120\n", + "\n", + "# set these credentials so you can ssh into the nodes to debug if needed. There's no way to set this after this step!\n", + "admin_username = 'admin'\n", + "admin_user_password = 'default_password'\n", + "\n", + "assert len(workspace_name) < 33\n", + "assert len(compute_name) < 17\n", + "\n", + "# models you'd like to register. The .pb files need to be local\n", + "models = [\n", + " {\n", + " 'name': 'megadetector_v3',\n", + " 'description': 'megadetector version 3',\n", + " 'path': 'path/megadetector_v3.pb'\n", + " },\n", + " {\n", + " 'name': 'megadetector_v2',\n", + " 'description': 'megadetector version 2',\n", + " 'path': 'path/frozen_inference_graph.pb'\n", + " }\n", + "]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Modify the following section if you'd like to create the AML workspace in your own subscription." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "# keep these the same for internal deployments\n", + "subscription_id = '' # fill out the subscription_id\n", + "resource_group = 'camera_trap_api_rg'\n", + "location = 'eastus'\n", + "\n", + "# used by the workspace to save run outputs, code, logs, etc.\n", + "storage_account='subscriptions/{}/resourcegroups/{}/providers/microsoft.storage/storageaccounts/cameratrstorageiuquhxss'.format(\n", + " subscription_id, resource_group) \n", + "key_vault = 'subscriptions/{}/resourcegroups/{}/providers/microsoft.keyvault/vaults/cameratrkeyvaulthblvewsj'.format(\n", + " subscription_id, resource_group)\n", + "app_insights = 'subscriptions/{}/resourcegroups/{}/providers/microsoft.insights/components/cameratrinsightsiqgcufll'.format(\n", + " subscription_id, resource_group)\n", + "container_registry = 'subscriptions/{}/resourcegroups/{}/providers/microsoft.containerregistry/registries/cameratracrsppftkje'.format(\n", + " subscription_id, resource_group)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Log in to Azure on the CLI\n", + "\n", + "After logging in, make sure the default account shown here is the subscription specified above.\n", + "\n", + "```\n", + "az account show\n", + "```\n", + "\n", + "Otherwise, do \n", + "\n", + "```\n", + "az account set \n", + "```" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create AML workspace\n", + "\n", + "[Documentation](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.workspace%28class%29?view=azure-ml-py#create-name--auth-none--subscription-id-none--resource-group-none--location-none--create-resource-group-true--friendly-name-none--storage-account-none--key-vault-none--app-insights-none--container-registry-none--default-cpu-compute-target-none--default-gpu-compute-target-none--exist-ok-false--show-output-true-)" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Deploying Workspace with name camera_trap_aml_workspace_2.\n", + "Deployed Workspace with name camera_trap_aml_workspace_2.\n" + ] + } + ], + "source": [ + "# takes ~10 seconds and it will show a message \"Deployed Workspace with name...\" if successful.\n", + "\n", + "workspace = Workspace.create(workspace_name, \n", + " auth=None, # If None the default Azure CLI credentials will be used or the API will prompt for credentials\n", + " subscription_id=subscription_id, \n", + " resource_group=resource_group, \n", + " location=location, \n", + " create_resource_group=False, \n", + " friendly_name=None, \n", + " storage_account=storage_account, \n", + " key_vault=key_vault, \n", + " app_insights=app_insights, \n", + " container_registry=container_registry,\n", + " exist_ok=False, \n", + " show_output=True)" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": { + "collapsed": false + }, + "outputs": [], + "source": [ + "# optionally save the workspace's config to a text file - not necessary - you can identify a workspace without this file\n", + "# workspace.write_config(path='')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create a compute target in the workspace\n", + "if it doesn't yet exist\n", + "\n",
"[Documentation](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.computetarget?view=azure-ml-py#create-workspace--name--provisioning-configuration-)" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "No existing compute target found, creating...\n", + "Creating\n", + "Succeeded\n", + "AmlCompute wait for completion finished\n", + "Minimum number of nodes requested have been provisioned\n" + ] + } + ], + "source": [ + "if compute_name in workspace.compute_targets:\n", + " compute_target = workspace.compute_targets[compute_name]\n", + " if compute_target and type(compute_target) is AmlCompute:\n", + " print('Found existing compute target with this name. You can just use it.' + compute_name)\n", + "else:\n", + " print('No existing compute target found, creating...')\n", + " compute_config = AmlCompute.provisioning_configuration(\n", + " vm_size='STANDARD_NC6S_V3',\n", + " min_nodes=min_nodes, max_nodes=max_nodes,\n", + " idle_seconds_before_scaledown=idle_seconds_before_scaledown,\n", + " admin_username=admin_username, admin_user_password=admin_user_password)\n", + " \n", + " compute_target = AmlCompute.create(workspace, name=compute_name, provisioning_configuration=compute_config)\n", + " compute_target.wait_for_completion(show_output = True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Register and upload model files\n", + "\n", + "You could ask each job to load the model from blob storage, but registering them with the AML workspace allows you to switch models on the go better.\n", + "\n", + "This takes a while depending on the size of the model files." + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Registering model megadetector_v3\n", + "Registering model megadetector_v2\n" + ] + } + ], + "source": [ + "for m in models:\n", + " model = Model.register(workspace, m['path'], m['name'], description=m['description'])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Authorize our application to access this AML workspace\n", + "\n", + "We created the workspace above by authenticating to our subscription on the CLI. When our API needs to access the AML instance, it has to authenticate as an application (a service principle). We now need the AML workspace to give that application access.\n", + "\n", + "Instructions for doing this is [here](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/manage-azureml-service/authentication-in-azureml/authentication-in-azure-ml.ipynb). \n", + "\n", + "- If you already have an application/service principle that the API instance will be using, go to the step starting with \"Finally, you need to give the service principal permissions to access your workspace\"." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [] + } + ], + "metadata": { + "anaconda-cloud": {}, + "kernelspec": { + "display_name": "Python [tensorflow]", + "language": "python", + "name": "Python [tensorflow]" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.5.4" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} diff --git a/api/detector_batch_processing/postprocess_batch_results.py b/api/detector_batch_processing/postprocess_batch_results.py index eb9738412..032e951f6 100644 --- a/api/detector_batch_processing/postprocess_batch_results.py +++ b/api/detector_batch_processing/postprocess_batch_results.py @@ -630,7 +630,8 @@ def process_batch_results(options): # Using iMerit labels as ground truth (all) if True: options.ground_truth_json_file = os.path.join(baseDir,'rspb_20190409_presence.json') - options.output_dir = os.path.join(baseDir,'postprocessing_output_mdv3_presence_verified_filtered') + # options.output_dir = os.path.join(baseDir,'postprocessing_output_mdv3_presence_verified_filtered') + options.output_dir = os.path.join(baseDir,'postprocessing_output_merge_test') # Using iMerit labels as ground truth (val) if False: diff --git a/visualization/visualization_utils.py b/visualization/visualization_utils.py index fccb4f67d..1188946bf 100644 --- a/visualization/visualization_utils.py +++ b/visualization/visualization_utils.py @@ -69,7 +69,8 @@ def resize_image(image, targetWidth, targetHeight=-1): return resizedImage -def render_iMerit_boxes(boxes, classes, image, label_map=annotation_constants.bbox_category_id_to_name): +def render_iMerit_boxes(boxes, classes, image, + label_map=annotation_constants.bbox_category_id_to_name): """ Renders bounding boxes and their category labels on a PIL image. @@ -96,10 +97,11 @@ def render_iMerit_boxes(boxes, classes, image, label_map=annotation_constants.bb display_strs.append([clss]) display_boxes = np.array(display_boxes) - draw_bounding_boxes_on_image(image, display_boxes, display_str_list_list=display_strs) + draw_bounding_boxes_on_image(image, display_boxes, classes, display_strs=display_strs) -def render_db_bounding_boxes(boxes, classes, image, original_size=None, label_map=None, thickness=4): +def render_db_bounding_boxes(boxes, classes, image, original_size=None, + label_map=None, thickness=4): """ Render bounding boxes (with class labels) on [image]. 
This is a wrapper for draw_bounding_boxes_on_image, allowing the caller to operate on a resized image @@ -114,7 +116,9 @@ image_size = image.size img_width, img_height = image_size + for box, clss in zip(boxes, classes): + x_min_abs, y_min_abs, width_abs, height_abs = box ymin = y_min_abs / img_height @@ -130,12 +134,14 @@ display_strs.append([clss]) display_boxes = np.array(display_boxes) - draw_bounding_boxes_on_image(image, display_boxes, display_str_list_list=display_strs, + + draw_bounding_boxes_on_image(image, display_boxes, display_strs=display_strs, thickness=thickness) -def render_detection_bounding_boxes(boxes_and_scores, image, label_map=annotation_constants.bbox_category_id_to_name, - confidence_threshold=0.8, thickness=4, color_map=annotation_constants.bbox_color_map): +def render_detection_bounding_boxes(boxes_scores_classes, image, + label_map=annotation_constants.bbox_category_id_to_name, + confidence_threshold=0.5, thickness=4): """ Renders bounding boxes, label and confidence on an image if confidence is above the threshold. This works with the output of the detector batch processing API. @@ -156,37 +162,66 @@ """ display_boxes = [] display_strs = [] # list of lists, one list of strings for each bounding box (to accommodate multiple labels) - display_colors = [] - for detection in boxes_and_scores: + classes = [] + + for detection in boxes_scores_classes: + score = detection[4] if score > confidence_threshold: display_boxes.append(detection[:4]) - clss = 1 - if len(detection) > 5: - clss = detection[5] + + if len(detection) < 6: + clss = 1 # megadetector_v2 did not output a class label + else: + clss = int(detection[5]) + label = label_map[clss] if clss in label_map else str(clss) displayed_label = '{}: {}%'.format(label, round(100 * score)) display_strs.append([displayed_label]) - if len(color_map) == 0 or clss not in color_map: - display_colors.append('red') - else: - display_colors.append(color_map[clss]) + classes.append(clss) display_boxes = np.array(display_boxes) - draw_bounding_boxes_on_image(image, display_boxes, color=display_colors, - display_str_list_list=display_strs, - thickness=thickness) + draw_bounding_boxes_on_image(image, display_boxes, classes, + display_strs=display_strs, thickness=thickness) -# The following functions are from: + +# The following functions are modified versions of those at: # # https://github.com/tensorflow/models/blob/master/research/object_detection/utils/visualization_utils.py +COLORS = [ + 'AliceBlue', 'Red', 'RoyalBlue', 'Gold', 'Chartreuse', 'Aqua', 'Azure', 'Beige', 'Bisque', + 'BlanchedAlmond', 'BlueViolet', 'BurlyWood', 'CadetBlue', 'AntiqueWhite', + 'Chocolate', 'Coral', 'CornflowerBlue', 'Cornsilk', 'Crimson', 'Cyan', + 'DarkCyan', 'DarkGoldenRod', 'DarkGrey', 'DarkKhaki', 'DarkOrange', + 'DarkOrchid', 'DarkSalmon', 'DarkSeaGreen', 'DarkTurquoise', 'DarkViolet', + 'DeepPink', 'DeepSkyBlue', 'DodgerBlue', 'FireBrick', 'FloralWhite', + 'ForestGreen', 'Fuchsia', 'Gainsboro', 'GhostWhite', 'GoldenRod', + 'Salmon', 'Tan', 'HoneyDew', 'HotPink', 'IndianRed', 'Ivory', 'Khaki', + 'Lavender', 'LavenderBlush', 'LawnGreen', 'LemonChiffon', 'LightBlue', + 'LightCoral', 'LightCyan', 'LightGoldenRodYellow', 'LightGray', 'LightGrey', + 'LightGreen', 'LightPink', 'LightSalmon', 'LightSeaGreen',
'LightSkyBlue', + 'LightSlateGray', 'LightSlateGrey', 'LightSteelBlue', 'LightYellow', 'Lime', + 'LimeGreen', 'Linen', 'Magenta', 'MediumAquaMarine', 'MediumOrchid', + 'MediumPurple', 'MediumSeaGreen', 'MediumSlateBlue', 'MediumSpringGreen', + 'MediumTurquoise', 'MediumVioletRed', 'MintCream', 'MistyRose', 'Moccasin', + 'NavajoWhite', 'OldLace', 'Olive', 'OliveDrab', 'Orange', 'OrangeRed', + 'Orchid', 'PaleGoldenRod', 'PaleGreen', 'PaleTurquoise', 'PaleVioletRed', + 'PapayaWhip', 'PeachPuff', 'Peru', 'Pink', 'Plum', 'PowderBlue', 'Purple', + 'RosyBrown', 'Aquamarine', 'SaddleBrown', 'Green', 'SandyBrown', + 'SeaGreen', 'SeaShell', 'Sienna', 'Silver', 'SkyBlue', 'SlateBlue', + 'SlateGray', 'SlateGrey', 'Snow', 'SpringGreen', 'SteelBlue', 'GreenYellow', + 'Teal', 'Thistle', 'Tomato', 'Turquoise', 'Violet', 'Wheat', 'White', + 'WhiteSmoke', 'Yellow', 'YellowGreen' +] + + def draw_bounding_boxes_on_image(image, boxes, - color='red', + classes, thickness=1, - display_str_list_list=()): + display_strs=()): """ Draws bounding boxes on image. @@ -194,16 +229,12 @@ image: a PIL.Image object. boxes: a 2 dimensional numpy array of [N, 4]: (ymin, xmin, ymax, xmax). The coordinates are in normalized format between [0, 1]. - color: color to draw bounding box. Default is red. Can be a list of colors, one per box. thickness: line thickness. Default value is 1. - display_str_list_list: list of list of strings. + display_strs: list of list of strings. A list of strings for each bounding box. The reason to pass a list of strings for a bounding box is that it might contain multiple labels. - - Raises: - ValueError: if boxes is not a [N, 4] array """ boxes_shape = boxes.shape if not boxes_shape: @@ -212,15 +243,12 @@ # print('Input must be of size [N, 4], but is ' + str(boxes_shape)) return # no object detection on this image, return for i in range(boxes_shape[0]): - if isinstance(color,list): - boxcolor = color[i] - else: - boxcolor = color - display_str_list = () - if display_str_list_list: - display_str_list = display_str_list_list[i] - draw_bounding_box_on_image(image, boxes[i, 0], boxes[i, 1], boxes[i, 2], - boxes[i, 3], boxcolor, thickness, display_str_list) + display_str_list = () # stays empty if no labels were passed for this box + if display_strs: + display_str_list = display_strs[i] + draw_bounding_box_on_image(image, + boxes[i, 0], boxes[i, 1], boxes[i, 2], boxes[i, 3], + classes[i], + thickness=thickness, display_str_list=display_str_list) def draw_bounding_box_on_image(image, ymin, xmin, ymax, xmax, - color='red', + clss, thickness=4, display_str_list=(), use_normalized_coordinates=True): - """Adds a bounding box to an image. + """ + Adds a bounding box to an image. Bounding box coordinates can be specified in either absolute (pixel) or normalized coordinates by setting the use_normalized_coordinates argument. @@ -248,7 +277,7 @@ xmin: xmin of bounding box. ymax: ymax of bounding box. xmax: xmax of bounding box. - color: color to draw bounding box. Default is red. + clss: int, the class of the object in this bounding box. thickness: line thickness. Default value is 4. display_str_list: list of strings to display in box (each to be shown on its own line). @@ -256,6 +285,8 @@ ymin, xmin, ymax, xmax as relative to the image. Otherwise treat coordinates as absolute.
""" + color = COLORS[int(clss) % len(COLORS)] + draw = ImageDraw.Draw(image) im_width, im_height = image.size if use_normalized_coordinates: @@ -282,10 +313,13 @@ def draw_bounding_box_on_image(image, text_bottom = top else: text_bottom = bottom + total_display_str_height + # Reverse list and print from bottom to top. for display_str in display_str_list[::-1]: text_width, text_height = font.getsize(display_str) margin = np.ceil(0.05 * text_height) + + draw.rectangle( [(left, text_bottom - text_height - 2 * margin), (left + text_width, text_bottom)], diff --git a/visualization/visualize_detector_output.py b/visualization/visualize_detector_output.py index e330288a2..584fd8bae 100644 --- a/visualization/visualize_detector_output.py +++ b/visualization/visualize_detector_output.py @@ -21,12 +21,8 @@ import visualization_utils as vis_utils - #%% Settings and user-supplied arguments -viz_size = (675, 450) # width by height, in pixels - - parser = argparse.ArgumentParser(description=('Annotate the bounding boxes predicted by a detector ' 'above some confidence threshold, and save the annotated images.')) @@ -35,7 +31,8 @@ default='RequestID_all_output.csv') parser.add_argument('out_dir', type=str, - help='path to a directory where the annotated images will be saved. Created if does not already exist') + help=('path to a directory where the annotated images will be saved. ' + 'The directory will be created if does not exit')) parser.add_argument('-c', '--confidence', type=float, help=('a value between 0 and 1, indicating the confidence threshold above which to visualize ' @@ -48,17 +45,26 @@ default=None) parser.add_argument('-s', '--sas_url', type=str, - help=('SAS URL with list and read permissions to an Azure blob storage container where the ' - 'images are stored. You can use Azure Storage Explorer to obtain a SAS URL'), + help=('SAS URL, in double quotes, with list and read permissions to an Azure blob storage ' + 'container where the images are stored. ' + 'You can use Azure Storage Explorer to obtain a SAS URL'), default=None) parser.add_argument('-n', '--sample', type=int, help=('an integer specifying how many images should be annotated and rendered. Default (-1) is all ' - 'images that have detector result. There may result in fewer images if some are not ' - 'found in images_dir'), + 'images that are in the detector output file. There may result in fewer images if some are ' + 'not found in images_dir'), default=-1) +parser.add_argument('-w', '--output_image_width', type=int, + help=('an integer indicating the desired width in pixels of the output annotated images. 
' 'Use -1 to not resize.'), + default=700) + args = parser.parse_args() +print('Options to the script:') +print(args) +print() assert args.confidence < 1.0 and args.confidence > 0.0, \ 'The confidence threshold {} supplied is not valid; choose a threshold between 0 and 1.'.format(args.confidence) @@ -77,7 +83,13 @@ os.makedirs(args.out_dir, exist_ok=True) -#%% Helper functions +#%% Helper functions and constants + +DETECTOR_LABEL_MAP = { + 1: 'animal', + 2: 'person', + 4: 'vehicle' # available in MegaDetector v4 or later; class 3 (group) should be ignored, per the README +} def get_sas_key_from_uri(sas_uri): """Get the query part of the SAS token that contains permissions, access times and @@ -143,9 +155,11 @@ def get_container_from_uri(sas_uri): if images_local: image_obj = os.path.join(args.images_dir, image_id) if not os.path.exists(image_obj): - print('Image {} is not found at images_dir; skipped.'.format(image_id)) + print('Image {} is not found at local images_dir; skipped.'.format(image_id)) continue else: + print('image_id:', image_id) + print('container_name:', container_name) if not blob_service.exists(container_name, blob_name=image_id): print('Image {} is not found in the blob container {}; skipped.'.format(image_id, container_name)) continue @@ -153,10 +167,13 @@ image_obj = io.BytesIO() _ = blob_service.get_blob_to_stream(container_name, image_id, image_obj) - image = vis_utils.open_image(image_obj).resize(viz_size) # resize is to display them more quickly - vis_utils.render_detection_bounding_boxes(boxes_and_scores, image, confidence_threshold=args.confidence) + # resize is for displaying them more quickly + image = vis_utils.resize_image(vis_utils.open_image(image_obj), args.output_image_width) + + vis_utils.render_detection_bounding_boxes(boxes_and_scores, image, label_map=DETECTOR_LABEL_MAP, + confidence_threshold=args.confidence) - annotated_img_name = image_id.replace('/', '~') + annotated_img_name = image_id.replace('/', '~').replace('\\', '~') annotated_img_path = os.path.join(args.out_dir, annotated_img_name) image.save(annotated_img_path) num_saved += 1
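For reference, the output CSV produced by the batch API can also be consumed directly, per the format described in the README above (image path, confidence of the most confident detection, stringified JSON list of `[ymin, xmin, ymax, xmax, confidence, (class)]` detections). A minimal sketch follows; whether the CSV carries a header row, and the column names used here, are assumptions to adjust for your own output file.

```python
# Minimal sketch of parsing the batch API output CSV and rendering one image.
# Column names and the absence of a header row are assumptions.
import json
import pandas as pd
import visualization_utils as vis_utils

df = pd.read_csv('RequestID_all_output.csv', header=None,
                 names=['image_path', 'max_confidence', 'detections'])

row = df.iloc[0]
detections = json.loads(row['detections'])  # empty list if nothing was detected

image = vis_utils.open_image(row['image_path'])  # assumes a local copy of the image
vis_utils.render_detection_bounding_boxes(
    detections, image,
    label_map={1: 'animal', 2: 'person', 4: 'vehicle'},
    confidence_threshold=0.8)
image.save('annotated_example.jpg')
```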