Merge branch 'users/dan/timelapse_integration' into users/dan/batch_script_output_format
agentmorris committed Jun 6, 2019
2 parents f47c1ab + 1c2f731 commit 7e5d585
Showing 8 changed files with 552 additions and 1 deletion.
7 changes: 6 additions & 1 deletion api/README.md
@@ -5,7 +5,7 @@ We package useful components developed in the Camera Traps project into APIs and

## Detector

Our animal detection model ([MegaDetector](https://github.com/Microsoft/CameraTraps#megadetector)) trained on camera trap images from a variety of ecosystems is exposed through two APIs, one for real-time applications or small batches of test images (synchronous API), and one for processing large collections of images (batch processing API). These APIs can be adapted to deploy any algorithms or models - see our tutorial in the [AI for Earth API Framework](https://github.com/Microsoft/AIforEarth-API-Development) repo.
Our animal detection model ([MegaDetector](https://github.com/Microsoft/CameraTraps#megadetector)) trained on camera trap images from a variety of ecosystems is exposed through two APIs, one for real-time applications or small batches of test images (synchronous API), and one for processing large collections of images (batch processing API). These APIs can be adapted to deploy any algorithms or models – see our tutorial in the [AI for Earth API Framework](https://github.com/Microsoft/AIforEarth-API-Development) repo.


### Synchronous API
@@ -22,3 +22,8 @@ This API runs the detector on up to 2 million images in one request using [Azure
Upcoming improvements:
- [ ] Adapt `runserver.py` to use the newest version of the AI4E API Framework
- [ ] More checks on the input container and image list SAS keys


## Integration with other tools

The “integration” folder contains guidelines and postprocessing scripts for using the output of our API in other applications.
Binary file added api/integration/MLDebugTemplate.tdb
Binary file not shown.
Binary file added api/integration/images/tl_boxes.jpg
Binary file added api/integration/images/tl_confidence.jpg
Binary file added api/integration/images/tl_template.jpg
200 changes: 200 additions & 0 deletions api/integration/prepare_api_output_for_timelapse.py
@@ -0,0 +1,200 @@
#
# prepare_api_output_for_timelapse.py
#
# Takes output from the batch API and does some conversions to prepare
# it for use in Timelapse.
#
# Specifically:
#
# * Optionally removes the class field from each bounding box
# * Optionally does query-based subsetting of rows
# * Optionally does a search and replace on filenames
# * Replaces backslashes with forward slashes
# * Renames "detections" to "predicted_boxes"
#
# Note that "relative" paths as interpreted by Timelapse aren't strictly relative as
# of 6/5/2019. If your project is in:
#
# c:\myproject
#
# ...and your .tdb file is:
#
# c:\myproject\blah.tdb
#
# ...and you have an image at:
#
# c:\myproject\imagefolder1\img.jpg
#
# The .csv that Timelapse sees should refer to this as:
#
# myproject/imagefolder1/img.jpg
#
# ...*not* as:
#
# imagefolder1/img.jpg
#
# Hence all the search/replace functionality in this script. It's straightforward
# once you know about it, but it's easy to forget. This will be fixed in an
# upcoming release.
#

#%% Constants and imports

# Python standard
import csv
import os

# pip-installable
from tqdm import tqdm

# AI4E repos, expected to be available on the path
from api.batch_processing.load_api_results import load_api_results
import matlab_porting_tools as mpt


#%% Helper classes

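class TimelapsePrepOptions:

    # Only process rows matching this query (if not None); this is processed
    # after applying os.path.normpath to filenames.
    query = None

    # If not None, replace the query token with this
    replacement = None

    # If not None, prepend matching filenames with this
    prepend = None

    # If True, keep only the first five fields of each detection (drops the class label)
    removeClassLabel = False

    # If not None, only load this many rows from the input file
    nRows = None

    # Name of the temporary column used to mark the rows we want to keep
    temporaryMatchColumn = '_bMatch'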


#%% Helper functions

def process_row(row,options):

    if options.removeClassLabel:

        # Keep only the first five fields of each detection, dropping the class label
        detections = row['detections']
        for iDetection,detection in enumerate(detections):
            detections[iDetection] = detection[0:5]

    # If there's no query, we're just pre-pending
    if options.query is None:

        row[options.temporaryMatchColumn] = True
        if options.prepend is not None:
            row['image_path'] = options.prepend + row['image_path']

    else:

        # Match against the normalized path, since options.query is normalized too
        fn = os.path.normpath(row['image_path'])
        if options.query in fn:

            row[options.temporaryMatchColumn] = True

            # Replace first, then prepend, so that neither step overwrites the other
            if options.replacement is not None:
                fn = fn.replace(options.query,options.replacement)

            if options.prepend is not None:
                fn = options.prepend + fn

            row['image_path'] = fn

    return row


#%% Main function

def prepare_api_output_for_timelapse(inputFilename,outputFilename,options=None):

    if options is None:
        options = TimelapsePrepOptions()

    if options.query is not None:
        options.query = os.path.normpath(options.query)

    detectionResults = load_api_results(inputFilename,nrows=options.nRows)
    nRowsLoaded = len(detectionResults)

    # Create a temporary column we'll use to mark the rows we want to keep
    detectionResults[options.temporaryMatchColumn] = False

    # This is the main loop over rows
    tqdm.pandas()
    detectionResults = detectionResults.progress_apply(lambda x: process_row(x,options), axis=1)

    print('Finished main loop, post-processing output')

    # Trim to matching rows; astype(bool) guards against an object-typed column after apply()
    detectionResults = detectionResults.loc[detectionResults[options.temporaryMatchColumn].astype(bool)]
    print('Trimmed to {} matching rows (from {})'.format(len(detectionResults),nRowsLoaded))

    detectionResults = detectionResults.drop(columns=options.temporaryMatchColumn)

    # Timelapse legacy issue; we used to call this column 'predicted_boxes'
    detectionResults.rename(columns={'detections':'predicted_boxes'},inplace=True)

    # Literal (non-regex) replacement of backslashes with forward slashes
    detectionResults['image_path'] = detectionResults['image_path'].str.replace('\\','/',regex=False)

    # Write output
    # write_api_results(detectionResults,outputFilename)
    detectionResults.to_csv(outputFilename,index=False,quoting=csv.QUOTE_MINIMAL)

    return detectionResults


#%% Interactive driver

if False:

    #%%

    inputFilename = r"D:\temp\demo_images\snapshot_serengeti\detections.csv"
    outputFilename = mpt.insert_before_extension(inputFilename,'for_timelapse')

    options = TimelapsePrepOptions()
    options.prepend = ''
    options.replacement = 'snapshot_serengeti'
    options.query = r'd:\temp\demo_images\snapshot_serengeti'
    options.nRows = None
    options.removeClassLabel = True

    detectionResults = prepare_api_output_for_timelapse(inputFilename,outputFilename,options)
    print('Done, found {} matches'.format(len(detectionResults)))


#%% Command-line driver (** outdated **)

import argparse

# Copy all fields from a Namespace (i.e., the output from parse_args) to an object.
#
# Skips fields starting with _. Does not check existence in the target object.
def argsToObject(args, obj):

    for n, v in vars(args).items():
        if not n.startswith('_'):
            # print('Setting {} to {}'.format(n,v))
            setattr(obj, n, v)

def main():

    parser = argparse.ArgumentParser()
    parser.add_argument('inputFile')
    parser.add_argument('outputFile')
    parser.add_argument('--query', action='store', type=str, default=None)
    parser.add_argument('--prepend', action='store', type=str, default=None)
    parser.add_argument('--replacement', action='store', type=str, default=None)
    args = parser.parse_args()

    # Convert to an options object
    options = TimelapsePrepOptions()
    argsToObject(args,options)

    # The query is already carried by the options object
    prepare_api_output_for_timelapse(args.inputFile,args.outputFile,options)

if __name__ == '__main__':

    main()
77 changes: 77 additions & 0 deletions api/integration/timelapse.md
@@ -0,0 +1,77 @@
# Overview

[Timelapse](http://saul.cpsc.ucalgary.ca/timelapse/) is an open-source tool for annotating camera trap images. We have worked with the Timelapse developer to integrate the output of our API into Timelapse, so a user can:

- Select or sort images based on whether they contain people or animals
- View bounding boxes during image annotation (which can speed up review)

This page contains instructions about how to load our API output into Timelapse. It assumes familiarity with Timelapse, most importantly with the concept of Timelapse templates.


# Download the ML-enabled version of Timelapse

This feature is not in the stable release of Timelapse yet; you can download it from (obfuscated URL) or, if you’re feeling ambitious, you can build from source on the [machinelearning-experimental](https://github.com/saulgreenberg/Timelapse/tree/machinelearning-experimental) branch of the Timelapse repo.


# Prepare your Timelapse template

Using the Timelapse template editor, add two fields to your template (which presumably already contains lots of other things specific to your project):

- <i>Confidence</i> (of type &ldquo;note&rdquo;, i.e., string)
- <i>BoundingBoxes</i> (of type &ldquo;note&rdquo;, i.e., string)

<img src="images/tl_template.jpg">

These fields will be used internally by Timelapse to store the results you load from our API.

A sample template containing these fields is available [here](MLDebugTemplate.tdb).


# Create your Timelapse database

...exactly the way you would for any other Timelapse project. Specifically, put your .tdb file in the root directory of your project, load it with file &rarr; load template, then let it load all the images (this can take a couple of hours if you have millions of images). This should create your database (.ddb file).
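Where you put the .tdb file matters for the next steps: the header comment in `prepare_api_output_for_timelapse.py` notes that, as of 6/5/2019, Timelapse expects image paths in the .csv to begin with the name of the project folder (the folder holding the .tdb file) rather than being relative to it. The minimal sketch below computes that form from an absolute Windows path; `timelapse_path` is a hypothetical helper, not part of the script or of Timelapse.

```python
from pathlib import PureWindowsPath


def timelapse_path(image_path: str, project_root: str) -> str:
    """Hypothetical helper: express image_path the way Timelapse expected as of 6/5/2019.

    The result starts with the project folder name and uses forward slashes.
    """
    image = PureWindowsPath(image_path)
    root = PureWindowsPath(project_root)

    # Raises ValueError if the image is not inside the project folder
    image.relative_to(root)

    # Relative to the project folder's *parent*, so the folder name is kept
    return image.relative_to(root.parent).as_posix()
```

For example, `timelapse_path(r'c:\myproject\imagefolder1\img.jpg', r'c:\myproject')` returns `myproject/imagefolder1/img.jpg`, matching the example in the script's header comment. `PureWindowsPath` is used so the Windows-style paths parse the same way on any OS.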


# Prepare API output for Timelapse

This is a temporary step, used only while we're reconciling the output format expected by Timelapse with the output format currently produced by our API.

Use the script [prepare_api_output_for_timelapse.py](prepare_api_output_for_timelapse.py). Because this is temporary, I&rsquo;m not going to document it here, but the script is reasonably well-commented.
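The script itself depends on pandas, `tqdm`, and helpers from this repo (`load_api_results`). For a rough idea of the per-row steps listed in its header comment (query-based subsetting, search and replace, prepending, forward slashes, renaming `detections` to `predicted_boxes`, optionally dropping the class label), here is a stdlib-only sketch. It assumes each input row is a dict with an `image_path` string and a `detections` list of lists, uses `ntpath` so Windows paths normalize the same way on any OS (the script uses `os.path.normpath`), and serializes boxes with `json.dumps`. The function names are hypothetical, and the exact box format Timelapse accepts is not confirmed here.

```python
import csv
import io
import json
import ntpath


def prepare_rows(rows, query=None, replacement=None, prepend=None,
                 remove_class_label=False):
    """Hypothetical stdlib sketch of the script's per-row conversion."""
    if query is not None:
        query = ntpath.normpath(query)
    out = []
    for row in rows:
        path = ntpath.normpath(row['image_path'])
        if query is not None:
            if query not in path:
                continue  # keep only rows whose path contains the query
            if replacement is not None:
                path = path.replace(query, replacement)
        if prepend is not None:
            path = prepend + path
        detections = row['detections']
        if remove_class_label:
            # Keep the first five fields of each detection, as the script does
            detections = [d[0:5] for d in detections]
        out.append({
            'image_path': path.replace('\\', '/'),
            # Legacy column name expected by Timelapse
            'predicted_boxes': json.dumps(detections),
        })
    return out


def rows_to_csv(rows):
    """Write converted rows to a CSV string with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=['image_path', 'predicted_boxes'])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Replacing before prepending mirrors the order used in `process_row`, so setting both options keeps both effects.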


# Load ML results into Timelapse

Click recognition &rarr; import recognition data, and point it to the Timelapse-ready .csv file. It doesn&rsquo;t matter where this file is, though it&rsquo;s probably cleanest to put it in the same directory as your template/database.
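Since a wrong path prefix is an easy mistake to make, it may be worth checking the .csv before a multi-hour import. A minimal sketch, assuming the file has an `image_path` column of forward-slash paths that begin with the project folder name (the form described in the script's header comment); `missing_images` is a hypothetical helper, not part of Timelapse or the script:

```python
import csv
import os


def missing_images(csv_path, project_root):
    """Hypothetical pre-import check: list image_path values not found on disk.

    Paths are resolved against the parent of project_root, since they are
    assumed to start with the project folder name.
    """
    base = os.path.dirname(os.path.normpath(project_root))
    missing = []
    with open(csv_path, newline='') as f:
        for row in csv.DictReader(f):
            rel = row['image_path']
            full = os.path.normpath(os.path.join(base, *rel.split('/')))
            if not os.path.isfile(full):
                missing.append(rel)
    return missing
```

An empty list suggests the paths line up with your project folder; a path such as `imagefolder1/img.jpg` (missing the project folder name) shows up as missing.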

This step can also take a few hours if you have lots of images.


# Do useful stuff with your ML results!

Now that you&rsquo;ve loaded ML results, there are two major differences in your Timelapse workflow. First, and most obviously, there are bounding boxes around animals:

<img src="images/tl_boxes.jpg">

<br/>This is fun; we love both animals and bounding boxes. But far more important is the fact that you can select images based on whether they contain animals. We recommend the following workflow:

## Confidence level selection

Find the confidence threshold that you&rsquo;re comfortable using to discard images, by choosing select &rarr; custom selection &rarr; confidence < [some number]. 0.6 is a decent starting point. Note that you need to type 0.6, rather than .6, i.e., <i>numbers other than 1.0 need to include a leading zero</i>.
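To get a feel for how many images each selection would return before trying thresholds in Timelapse, you could partition the API output yourself. This sketch assumes that an image&rsquo;s confidence is the highest confidence among its boxes, that confidence is the fifth field of each box (the script keeps fields `[0:5]` when dropping the class label), and that images without boxes count as 0.0. These are assumptions about how Timelapse fills its Confidence field, not documented behavior; `split_by_confidence` is a hypothetical helper.

```python
def split_by_confidence(predicted_boxes_by_image, threshold=0.6):
    """Hypothetical helper: partition images into (below, at_or_above) a threshold.

    predicted_boxes_by_image maps an image path to its list of boxes.
    """
    below, at_or_above = [], []
    for image_path, boxes in predicted_boxes_by_image.items():
        # Assumed: image confidence = max box confidence (index 4), 0.0 if no boxes
        confidence = max((box[4] for box in boxes), default=0.0)
        if confidence < threshold:
            below.append(image_path)  # the "confidence < threshold" selection
        else:
            at_or_above.append(image_path)  # the "confidence >= threshold" selection
    return below, at_or_above
```

The first list corresponds to the images you skim with &ldquo;play forward quickly&rdquo;; the second corresponds to the confidence >= [your threshold] selection used for labeling.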

<img src="images/tl_confidence.jpg">

<br/>Now you should only be seeing images with no animals... if you see animals, something is amiss. You can use the &ldquo;play forward quickly&rdquo; button to very rapidly assess whether there are animals hiding here. If you&rsquo;re feeling comfortable...

## Labeling

Change the selection to confidence >= [your threshold]. Now you should be seeing mostly images with animals, though you probably set that threshold low enough that you&rsquo;re still seeing <i>some</i> empty images. At this point, go about your normal Timelapse business, without wasting all that time on empty images!


# In the works...

Right now animals and people are treated as one entity; we hope to allow selection separately based on animals, people, or both.




