This script generates dense-captioned videos by merging the frame-level dense captions produced by DenseCap, by Justin Johnson, Andrej Karpathy, and Li Fei-Fei at the Stanford Vision Lab. More info about densecap here.
This script was used to produce the following densecap videos:
Merging captions across frame sequences is a bit hacky and very inefficient, but it does a decent job. It would be better if the RNN captioned frame sequences itself, but no such code exists as of now.
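To give a rough idea of what merging frame-level captions can look like, here is a minimal sketch. It is not the code in this repository, and it makes several assumptions: each frame is a list of `(box, caption)` pairs, boxes are `(x, y, w, h)` tuples, and regions are linked across consecutive frames by box overlap (IoU), with each linked group taking its most frequent caption. The names `iou`, `merge_captions`, and `iou_threshold` are hypothetical.

```python
from collections import Counter


def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = min(ax + aw, bx + bw) - max(ax, bx)
    ih = min(ay + ah, by + bh) - max(ay, by)
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    return inter / (aw * ah + bw * bh - inter)


def merge_captions(frames, iou_threshold=0.5):
    """Smooth per-frame captions by voting within overlapping box tracks.

    frames: list (one entry per frame) of lists of (box, caption) pairs.
    Returns the same structure, with each caption replaced by the most
    frequent caption of the track that the box was assigned to.
    """
    tracks = []      # each track: last box, last frame index, caption votes
    assignment = []  # assignment[t][d] = track index of detection d in frame t

    for t, detections in enumerate(frames):
        frame_assignment = []
        for box, caption in detections:
            best, best_iou = None, iou_threshold
            for k, track in enumerate(tracks):
                # Only tracks last updated in the previous frame can be
                # extended; this also stops two detections in one frame
                # from joining the same track.
                if track["last_frame"] != t - 1:
                    continue
                overlap = iou(box, track["last_box"])
                if overlap >= best_iou:
                    best, best_iou = k, overlap
            if best is None:
                tracks.append({"votes": Counter()})
                best = len(tracks) - 1
            tracks[best]["last_box"] = box
            tracks[best]["last_frame"] = t
            tracks[best]["votes"][caption] += 1
            frame_assignment.append(best)
        assignment.append(frame_assignment)

    # Second pass: relabel every detection with its track's majority caption.
    merged = []
    for detections, frame_assignment in zip(frames, assignment):
        merged.append([
            (box, tracks[k]["votes"].most_common(1)[0][0])
            for (box, _), k in zip(detections, frame_assignment)
        ])
    return merged
```

With this sketch, a region whose caption flickers for a single frame (say, "a dog on grass" briefly becoming "a cat on grass") would be relabeled with the majority caption of its track. The greedy frame-to-frame matching is a simplification chosen for brevity and does not handle regions that disappear and reappear.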
I will update this README shortly with instructions for using this code. The code is a bit messy right now but should be self-explanatory. I plan to replace it with either a command-line tool or an IPython notebook, along with instructions, in the next few days.