
Merge branch 'master' into users/dan/batch_postprocessing
# Conflicts:
#	visualization/visualization_utils.py
agentmorris committed May 12, 2019
2 parents 375c472 + 6e4c2ff commit bcddfa1
Showing 13 changed files with 523 additions and 89 deletions.
32 changes: 22 additions & 10 deletions api/detector_batch_processing/README.md
@@ -57,23 +57,24 @@ Not yet supported. Meanwhile, once the shards of images are submitted for proces

### Inputs

-| Parameter | Is required | Explanation |
-|--------------------------|-------------|-------------------------------------------------------------------------------------------------------------------------------|
-| input_container_sas | Yes | SAS URL with list and read permissions to the Blob Storage container where the images are stored. |
-| images_required_json_sas | No | SAS URL with list and read permissions to a json file in Blob Storage. The json contains a list, where each item (a string) in the list is the full path to an image from the root of the container. An example of the content of this file: `["Season1/Location1/Camera1/image1.jpg", "Season1/Location1/Camera1/image2.jpg"]`. Only images whose paths are listed here will be processed. |
-| image_path_prefix | No | Only process images whose full path starts with `image_path_prefix`. Note that any image paths specified in `images_required_json_sas` will need to be the full path from the root of the container, regardless of `image_path_prefix`. |
-| first_n | No | Only process the first `first_n` images. Order of images is not guaranteed, but is likely to be alphabetical. Set this to a small number to avoid taking time to fully list all images in the blob (about 15 minutes for 1 million images) if you just want to try this API. |
-| sample_n (not yet implemented) | No | Randomly sample `sample_n` images to process. |
+| Parameter | Is required | Type | Explanation |
+|--------------------------|-------------|-------|-------------------------------------------------------------------------------------------------------------------------------|
+| input_container_sas | Yes | string | SAS URL with list and read permissions to the Blob Storage container where the images are stored. |
+| images_requested_json_sas | No | string | SAS URL with list and read permissions to a json file in Blob Storage. The json contains a list, where each item (a string) in the list is the full path to an image from the root of the container. An example of the content of this file: `["Season1/Location1/Camera1/image1.jpg", "Season1/Location1/Camera1/image2.jpg"]`. Only images whose paths are listed here will be processed. |
+| image_path_prefix | No | string | Only process images whose full path starts with `image_path_prefix` (case-_sensitive_). Note that any image paths specified in `images_requested_json_sas` will need to be the full path from the root of the container, regardless of whether `image_path_prefix` is provided. |
+| first_n | No | int | Only process the first `first_n` images. Order of images is not guaranteed, but is likely to be alphabetical. Set this to a small number to avoid taking time to fully list all images in the blob (about 15 minutes for 1 million images) if you just want to try this API. |
+| sample_n | No | int | Randomly select `sample_n` images to process. |


- We assume that all images you would like to process in this batch are uploaded to a container in Azure Blob Storage.
- Only images with file names ending in '.jpg' or '.jpeg' (case insensitive) will be processed, so please make sure the file names are compliant before you upload them to the container (you cannot rename a blob without copying it entirely once it is in Blob Storage).
- The path to the images in blob storage cannot contain commas (this would confuse the output CSV).

- By default we process all such images in the specified container. You can choose to process only a subset of them by specifying the other input parameters; the images will be filtered accordingly, in this order:
- `images_requested_json_sas`
- `image_path_prefix`
- `first_n`
-- `sample_n` (not yet implemented)
+- `sample_n`

- For example, if you specified both `images_requested_json_sas` and `first_n`, only images that are in your provided list at `images_requested_json_sas` will be considered, and then we process the `first_n` of those.

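The filtering order described above can be sketched in Python. This is an illustrative mock-up only — the function and parameter names are hypothetical, not the API's actual implementation:

```python
import random

def filter_image_paths(all_paths, images_requested=None, image_path_prefix=None,
                       first_n=None, sample_n=None):
    """Apply the four optional filters in the documented order (hypothetical helper)."""
    paths = list(all_paths)
    # 1. images_requested_json_sas: keep only explicitly listed full paths
    if images_requested is not None:
        requested = set(images_requested)
        paths = [p for p in paths if p in requested]
    # 2. image_path_prefix: case-sensitive match from the container root
    if image_path_prefix is not None:
        paths = [p for p in paths if p.startswith(image_path_prefix)]
    # 3. first_n: truncate; listing order is roughly alphabetical
    if first_n is not None:
        paths = paths[:first_n]
    # 4. sample_n: random subset of whatever survived the earlier filters
    if sample_n is not None:
        paths = sorted(random.sample(paths, min(sample_n, len(paths))))
    return paths
```

For instance, combining `image_path_prefix="Season1"` with `first_n=1` keeps only the first `Season1` image, reflecting the `images_requested_json_sas` → `image_path_prefix` → `first_n` → `sample_n` precedence.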
@@ -151,12 +152,23 @@ The second column is the confidence value of the most confident detection on the
The third column contains details of the detections so you can visualize them. It is a stringified json of a list of lists, representing the detections made on that image. Each detection list has the coordinates of the bounding box surrounding the detection, followed by its confidence:

```
-[ymin, xmin, ymax, xmax, confidence]
+[ymin, xmin, ymax, xmax, confidence, (class)]
```

where `(xmin, ymin)` is the upper-left corner of the detection bounding box. The coordinates are relative to the height and width of the image.

-When the detector model detects no animal, the confidence is shown as 0.0 (not confident that there is an animal) and the detection column is an empty list.
+An integer `class` comes after `confidence` in versions of the API that use MegaDetector version 3 or later. The `class` label corresponds to the following:
+
+```
+1: animal
+2: person
+4: vehicle
+```
+
+Note that the `vehicle` class (available in MegaDetector version 4 or later) is number 4. Class number 3 (group) was not included in training and should be ignored (as should any other class label not listed here) if it shows up in the results.
+
+When the detector model detects no animal (or person or vehicle), the confidence is shown as 0.0 (not confident that there is an object of interest) and the detection column is an empty list.
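A hedged sketch of consuming this output format — parsing the stringified JSON column and converting one box to pixel coordinates. The helper names here are illustrative, not part of the API:

```python
import json

def parse_detections(cell):
    """Parse the detections CSV column; an empty cell means no confident detections."""
    return json.loads(cell) if cell else []

def to_pixel_box(detection, image_width, image_height):
    """Convert relative [ymin, xmin, ymax, xmax, ...] to pixel (xmin, ymin, xmax, ymax)."""
    ymin, xmin, ymax, xmax = detection[:4]
    return (round(xmin * image_width), round(ymin * image_height),
            round(xmax * image_width), round(ymax * image_height))

for det in parse_detections('[[0.1, 0.2, 0.5, 0.6, 0.97, 1]]'):
    confidence = det[4]
    category = det[5] if len(det) > 5 else None  # class id present for MegaDetector v3+
    print(to_pixel_box(det, 1920, 1080), confidence, category)
```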


## Post-processing tools

6 changes: 4 additions & 2 deletions api/detector_batch_processing/api/Dockerfile
@@ -3,8 +3,10 @@ FROM ai4eregistry.azurecr.io/1.0-base-py-ubuntu16.04:latest
RUN echo "source activate ai4e_py_api" >> ~/.bashrc \
&& conda install -c conda-forge -n ai4e_py_api numpy pandas

+RUN pip install --upgrade pip
+
# Azure blob packages should already be installed in the base image. Just need to install Azure ML SDK
-RUN pip install --upgrade azureml-sdk
+RUN pip install azureml-sdk==1.0.33

# Note: supervisor.conf reflects the location and name of your api code.
# If the default (./my_api/runserver.py) is renamed, you must change supervisor.conf
@@ -35,7 +37,7 @@ ENV SERVICE_OWNER=AI4E_Test \
SERVICE_MODEL_FRAMEOWRK_VERSION=3.6.6 \
SERVICE_MODEL_VERSION=1.0

-ENV API_PREFIX=/v1/camera-trap/detection-batch
+ENV API_PREFIX=/v2/camera-trap/detection-batch

ENV AZUREML_PASSWORD=

10 changes: 5 additions & 5 deletions api/detector_batch_processing/api/README.md
@@ -36,7 +36,7 @@ You can do this at the command line (where Azure CLI is installed) of the VM whe
```
az acr login --name ai4eregistry
```
-You need to have the subscription where this registry is set as the default subscription.
+You need to have the subscription where this registry is set as the default subscription, and you may need to use `sudo` with this command.


### Step 3. Build the Docker container
@@ -45,7 +45,7 @@ Now you're all set to build the container.
Navigate to the current directory (`detector_batch_processing/api`) where the `Dockerfile` is.

```
-docker build . -t name.azurecr.io/camera-trap-detection-sync:2
+docker build . -t name.azurecr.io/camera-trap-detection-batch-v3:1
```

You can supply your own tag (`-t` option) and build number. You may need to use `sudo` with this command.
@@ -55,7 +55,7 @@ You can supply your own tag (`-t` option) and build number. You may need to use
To launch the service, in a `tmux` session, issue:

```
-docker run -p 6000:80 name.azurecr.io/camera-trap-detection-batch:2 |& tee -a camera-trap-api-async-log/log20190415.txt
+docker run -p 6000:80 name.azurecr.io/camera-trap-detection-batch-v3:1 |& tee -a camera-trap-api-async-log/log20190415.txt
```

Substitute the tag of the image you built in the last step (or that of a pre-built one), the port you'd like to expose the API at (6000 above), and specify the location to store the log messages (printed to console too).
@@ -65,6 +65,6 @@ You may need to use `sudo` with this command.

## Work items

-- [ ] Rename `aml_config_scripts` to `aml_scripts` now that the cluster config file is no longer used
+- [x] Rename `aml_config_scripts` to `aml_scripts` now that the cluster config file is no longer used

-- [ ] Make use of Key Vault to access crendentials
+- [ ] Make use of Key Vault to access credentials
---
@@ -1,4 +1,4 @@
-print('score.py, beginning - NEW')
+print('score.py, beginning - megadetector_v3')

import argparse
import csv
---
@@ -124,10 +124,11 @@ def generate_detections_batch(self, images, image_ids, batch_size, detection_thr
boxes, scores, classes = b_box[i], b_score[i], b_class[i]
detections_cur_image = [] # will be empty for an image with no confident detections

-for b, s in zip(boxes, scores):
+for b, s, c in zip(boxes, scores, classes):
    if s > detection_threshold:
        li = TFDetector.convert_numpy_floats(b)
        li.append(float(s))
+       li.append(int(c))
detections_cur_image.append(li)

detections.append(detections_cur_image)
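The per-detection thresholding in this hunk can be illustrated standalone, with plain lists in place of the detector's numpy outputs (a sketch, not the repository's code):

```python
def filter_detections(boxes, scores, classes, detection_threshold):
    # Empty result for an image with no confident detections
    detections_cur_image = []
    for b, s, c in zip(boxes, scores, classes):
        if s > detection_threshold:
            li = [float(x) for x in b]  # [ymin, xmin, ymax, xmax]
            li.append(float(s))         # confidence
            li.append(int(c))           # class id (1 animal, 2 person, 4 vehicle)
            detections_cur_image.append(li)
    return detections_cur_image
```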
14 changes: 7 additions & 7 deletions api/detector_batch_processing/api/orchestrator_api/api_config.py
@@ -1,15 +1,15 @@
# version of the detector model in use
-MODEL_VERSION = 'models/object_detection/faster_rcnn_inception_resnet_v2_atrous/megadetector'
+MODEL_VERSION = 'models/object_detection/faster_rcnn_inception_resnet_v2_atrous/megadetector_v3/step_686872'

# name of the container in the internal storage account to store user facing files:
# image list, detection results and failed images list.
-INTERNAL_CONTAINER = 'async-api-v2'
+INTERNAL_CONTAINER = 'async-api-v3-2'

# name of the container in the internal storage account to store outputs of each AML job
-AML_CONTAINER = 'aml-out'
+AML_CONTAINER = 'aml-out-2'

# how often does the checking thread wake up to check if all jobs are done
-MONITOR_PERIOD_MINUTES = 30
+MONITOR_PERIOD_MINUTES = 15

# if this number of times the thread wakes up to check is exceeded, stop the monitoring thread
MAX_MONITOR_CYCLES = 14 * 48 # 2 weeks, 30-minute interval
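A quick sanity check on the two settings above: with the period halved to 15 minutes but the cycle count left at `14 * 48`, the total monitored window works out to one week, not the two weeks the inline comment describes (worth double-checking if two weeks is still the intent):

```python
MONITOR_PERIOD_MINUTES = 15   # new value in this diff
MAX_MONITOR_CYCLES = 14 * 48  # unchanged; comment still says "2 weeks, 30-minute interval"

total_days = MAX_MONITOR_CYCLES * MONITOR_PERIOD_MINUTES / (60 * 24)
print(total_days)  # 7.0
```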
@@ -31,12 +31,12 @@
'subscription_id': '74d91980-e5b4-4fd9-adb6-263b8f90ec5b',
'workspace_region': 'eastus',
'resource_group': 'camera_trap_api_rg',
-'workspace_name': 'camera_trap_aml_workspace',
+'workspace_name': 'camera_trap_aml_workspace_2',
'aml_compute_name': 'camera-trap-com',

-'model_name': 'megadetector',
+'model_name': 'megadetector_v3_tf19',

-'source_dir': '/app/orchestrator_api/aml_config_scripts',
+'source_dir': '/app/orchestrator_api/aml_scripts',
'script_name': 'score.py',

'param_batch_size': 8,
---
@@ -1,9 +1,11 @@
import copy
import io
import os
import pickle
from collections import defaultdict
from datetime import datetime, timedelta

import azureml.core
import pandas as pd
from azure.storage.blob import BlockBlobService, BlobPermissions
from azureml.core import Workspace, Experiment
@@ -19,6 +21,9 @@
import api_config
from sas_blob_utils import SasBlob

print('Version of AML: {}'.format(azureml.core.__version__))


# Service principle authentication for AML
svc_pr_password = os.environ.get('AZUREML_PASSWORD')
svc_pr = ServicePrincipalAuthentication(
@@ -97,6 +102,7 @@ def __init__(self, request_id, input_container_sas, internal_datastore):

batch_score_step = PythonScriptStep(aml_config['script_name'],
source_directory=aml_config['source_dir'],
hash_paths= ['.'], # include all contents of source_directory
name='batch_scoring',
arguments=['--job_id', param_job_id,
'--model_name', aml_config['model_name'],
@@ -272,7 +278,7 @@ def _generate_urls_for_outputs(self):
sas = self.internal_storage_service.generate_blob_shared_access_signature(
self.internal_container, blob_path, permission=BlobPermissions.READ, expiry=expiry
)
-url = self.internal_storage_service.make_blob_url('async-api-v2', blob_path, sas_token=sas)
+url = self.internal_storage_service.make_blob_url(self.internal_container, blob_path, sas_token=sas)
urls[output] = url
return urls
except Exception as e:
39 changes: 31 additions & 8 deletions api/detector_batch_processing/api/orchestrator_api/runserver.py
@@ -7,6 +7,7 @@
import os
import time
from datetime import datetime
from random import shuffle

from ai4e_app_insights import AppInsights
from ai4e_app_insights_wrapper import AI4EAppInsights
@@ -23,6 +24,7 @@
print('Creating application')

api_prefix = os.getenv('API_PREFIX')
print('API prefix: ', api_prefix)
app = Flask(__name__)
api = Api(app)

@@ -105,7 +107,7 @@ def _request_detections(**kwargs):

input_container_sas = body['input_container_sas']
images_requested_json_sas = body.get('images_requested_json_sas', None)
-image_path_prefix = body.get('image_path_prefix', '')
+image_path_prefix = body.get('image_path_prefix', None)

first_n = body.get('first_n', None)
first_n = int(first_n) if first_n else None
@@ -120,30 +122,48 @@
print('runserver.py, running - listing all images to process.')

# list all images to process
blob_prefix = None if image_path_prefix is None else image_path_prefix
image_paths = SasBlob.list_blobs_in_container(api_config.MAX_NUMBER_IMAGES_ACCEPTED,
sas_uri=input_container_sas,
-blob_prefix=image_path_prefix, blob_suffix='.jpg')
+blob_prefix=blob_prefix, blob_suffix='.jpg')
else:
print('runserver.py, running - using provided list of images.')
image_paths_text = SasBlob.download_blob_to_text(images_requested_json_sas)
image_paths = json.loads(image_paths_text)
print('runserver.py, length of image_paths provided by the user: {}'.format(len(image_paths)))

image_paths = [i for i in image_paths if str(i).lower().endswith(api_config.ACCEPTED_IMAGE_FILE_ENDINGS)]
-print('runserver.py, length of image_paths provided by the user, after filtering to jpg: {}'.format(len(image_paths)))
+print('runserver.py, length of image_paths provided by the user, after filtering to jpg: {}'.format(
+    len(image_paths)))

if image_path_prefix is not None:
image_paths =[i for i in image_paths if str(i).startswith(image_path_prefix)]
print('runserver.py, length of image_paths provided by the user, after filtering for image_path_prefix: {}'.format(
len(image_paths)))

res = orchestrator.spot_check_blob_paths_exist(image_paths, input_container_sas)
if res is not None:
-raise LookupError('failed - path {} provided in list of images to process does not exist in the container pointed to by data_container_sas.'.format(res))
+raise LookupError('path {} provided in list of images to process does not exist in the container pointed to by data_container_sas.'.format(res))

# apply the first_n and sample_n filters
if first_n is not None:
-assert first_n > 0, 'parameter first_n is zero.'
+assert first_n > 0, 'parameter first_n is 0.'
image_paths = image_paths[:first_n] # will not error if first_n > total number of images

-# TODO implement sample_n - need to check that sample_n <= len(image_paths)
+if sample_n is not None:
assert sample_n > 0, 'parameter sample_n is 0.'
if sample_n > len(image_paths):
raise ValueError('parameter sample_n specifies more images than available (after filtering by other provided params).')

# we sample by just shuffling the image paths and take the first sample_n images
print('First path before shuffling:', image_paths[0])
shuffle(image_paths)
print('First path after shuffling:', image_paths[0])
image_paths = image_paths[:sample_n]
image_paths = sorted(image_paths)

num_images = len(image_paths)
-print('runserver.py, num_images: {}'.format(num_images))
+print('runserver.py, num_images after applying all filters: {}'.format(num_images))
if num_images < 1:
api_task_manager.UpdateTaskStatus(request_id, 'completed - zero images found in container or in provided list of images after filtering with the provided parameters.')
return
@@ -173,7 +193,7 @@ def _request_detections(**kwargs):
begin, end = job_index * num_images_per_job, (job_index + 1) * num_images_per_job
job_id = 'request{}_jobindex{}_total{}'.format(request_id, job_index, num_jobs)
list_jobs[job_id] = { 'begin': begin, 'end': end }
# TODO send list_jobs_submitted in a pickle to intermediate storage as a record / for restarting the monitoring thread

list_jobs_submitted = aml_compute.submit_jobs(request_id, list_jobs, api_task_manager, num_images)
api_task_manager.UpdateTaskStatus(request_id,
'running - all {} images submitted to cluster for processing.'.format(num_images))
@@ -186,6 +206,7 @@

try:
aml_monitor = orchestrator.AMLMonitor(request_id, list_jobs_submitted)

# start another thread to monitor the jobs and consolidate the results when they finish
ai4e_wrapper.wrap_async_endpoint(_monitor_detections_request, 'post:_monitor_detections_request',
request_id=request_id,
@@ -205,6 +226,8 @@
max_num_checks = api_config.MAX_MONITOR_CYCLES
num_checks = 0

print('Monitoring thread with _monitor_detections_request started.')

# time.sleep() blocks the current thread only
while True:
time.sleep(api_config.MONITOR_PERIOD_MINUTES * 60)
---
@@ -181,7 +181,7 @@ def list_blobs_in_container(max_number_to_list, sas_uri=None, datastore=None,
sas_uri: Azure blob storage SAS token
datastore: dict with fields account_name (of the Blob storage account), account_key and container_name
blob_prefix: Optional, a string as the prefix to blob names/paths to filter the results to those
-    with this prefix
+    with this prefix. Case-sensitive!
blob_suffix: Optional, an all lower case string or a tuple of strings, to filter the results to
those with this/these suffix(s).
The blob names will be lowercased first before comparing with the suffix(es).
