A full-text search for YouTube subtitles and video metadata with a command line interface.
- video title, description, keywords and subtitles (also called closed captions/CC or transcript)
- in the scope of one or multiple videos, a playlist, channel or user
- supporting multiple terms and multi-word phrases (combining them via boolean OR; i.e. logical either/or)
- matching phrases spanning multiple captions
- ignoring the case of the search terms
- a list of search results with highlighted matches
- including time-stamped video links to the matched part of the video
- as a text or HTML file if you need it
- searchable video metadata and subtitles in all available languages
- videos in playlists, channels or user accounts for a configurable time
- in your local user profile, i.e.
- %AppData%\Roaming on Windows
- ~/.config on Linux and macOS
- until you explicitly clear it
- so that subsequent searches on the same scope can be done offline and are way faster than the first one
- no installation except for .NET Core 3.1
- no YouTube login
- few resources, as this project was partly an exercise in the newer async features of .NET Core 3.0 and C# 8 for concurrent, non-blocking operations
- YoutubeExplode licensed under LGPL 3 for doing a better job at getting the relevant data off of YouTube's public web API than YouTube's own Data API v3 is able to do at the time of writing. And for not requiring a clunky app registration and user authorization for every bit of data on top of that. A real game-changer!
- including AngleSharp licensed under MIT
- CommandLineParser licensed under MIT for elegantly parsing and validating command line arguments and generating help text
- subtitle download in any common, reusable format (although that could probably be added quite easily)
- fuzzy search. Only exact matches are returned.
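To illustrate how case-insensitive, OR-combined matching of phrases that span multiple captions can work, here is a minimal Python sketch. It is not SubTubular's actual implementation (which is written in C#); the function name `find_matches` and the `(start_seconds, text)` caption shape are assumptions made for this example only.

```python
from bisect import bisect_right

def find_matches(captions, terms):
    """captions: list of (start_seconds, text) in playback order (assumed shape).
    terms: list of words or phrases, combined via OR.
    Returns a sorted list of (start_seconds, term), one entry per hit."""
    # Lower-case each caption, join them with single spaces and remember the
    # character offset at which each caption starts in the joined text.
    offsets, parts, pos = [], [], 0
    for _, text in captions:
        lowered = text.lower()
        offsets.append(pos)
        parts.append(lowered)
        pos += len(lowered) + 1  # +1 for the joining space
    haystack = " ".join(parts)

    hits = []
    for term in terms:
        needle = term.lower()
        if not needle:
            continue
        start = haystack.find(needle)
        while start != -1:
            # Map the match offset back to the caption in which it starts.
            index = bisect_right(offsets, start) - 1
            hits.append((captions[index][0], term))
            start = haystack.find(needle, start + 1)
    return sorted(hits)
```

Because the captions are joined before searching, a phrase like "name of a physicist" is found even if "name" ends one caption and "of a physicist" begins the next. The hit is attributed to the caption in which the match starts.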
All search commands share the following parameters:
-f, --for | Required. What to search for. Quote "multi-word phrases" and "separate,multiple terms,by comma". |
-m, --html | If set, outputs the highlighted search result in an HTML file including hyperlinks for easy navigation. |
-o, --out | Writes the search results to a file, the format of which - depending on the 'html' flag - is either text or HTML including hyperlinks for easy navigation. Supply EITHER the FULL FILE PATH (any existing file will be overwritten), a FOLDER PATH to output files into - auto-named according to your search parameters - OR OMIT while setting the 'html' flag to have auto-named files written to the 'out' folder of SubTubular's AppData directory. |
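As a rough sketch of how a comma-separated `--for` value could be turned into a list of search terms, consider the following Python helper. The name `parse_terms` is hypothetical, and dropping empty and duplicate terms is an assumption of this example, not documented behavior of SubTubular.

```python
def parse_terms(for_value):
    """Split a --for value such as "free speech,censorship" into phrases."""
    terms = []
    seen = set()
    for raw in for_value.split(","):
        # Trim the term and collapse runs of whitespace inside it.
        term = " ".join(raw.split())
        key = term.lower()
        # Assumption: empty terms and case-insensitive duplicates are skipped.
        if term and key not in seen:
            seen.add(key)
            terms.append(term)
    return terms
```

Quoting the whole value on the command line keeps the shell from splitting multi-word phrases at the spaces, so the tool receives the complete comma-separated string.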
All search commands searching a playlist containing multiple videos (including search-user and search-channel) support the following parameters in addition to the common search parameters:
-t, --top | (Default: 50) The number of videos to return from the top of the playlist. The special Uploads playlist of a channel or user is sorted latest uploaded first, but custom playlists may be sorted differently. |
-h, --cachehours | (Default: 24) The maximum age of a playlist cache in hours before it is considered stale and the videos in it are refreshed. |
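The staleness rule behind `--cachehours` can be pictured with a small Python sketch. The function name `is_stale` and the `loaded_at` timestamp are assumptions for illustration; whether SubTubular treats a cache exactly at the age limit as stale is not stated here, so the `>=` comparison is a guess.

```python
from datetime import datetime, timedelta, timezone

def is_stale(loaded_at, cache_hours, now=None):
    """Return True if a playlist cache loaded at `loaded_at` should be refreshed.

    loaded_at: timezone-aware datetime of the last refresh (assumed field).
    cache_hours: the value passed via -h / --cachehours.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    # Assumption: a cache is stale once its age reaches the limit,
    # so a limit of 0 always forces a refresh.
    return now - loaded_at >= timedelta(hours=cache_hours)
```

Under this reading, passing `-h 0` effectively disables the playlist cache, because every cached playlist is at least zero hours old.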
Searches the {videos} {for} the specified terms. Supports the common search parameters.
value(s) at pos. 0 | Required. The space-separated YouTube video IDs and/or URLs. |
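Accepting both IDs and URLs means the video ID has to be extracted from whatever was passed in. The following Python sketch shows one way to do that. The function name `video_id`, the list of accepted hosts and the 11-character ID pattern are assumptions of this example; SubTubular relies on YoutubeExplode for this and may accept more URL shapes.

```python
import re
from urllib.parse import urlparse, parse_qs

# Assumption: video IDs are 11 characters from this alphabet.
VIDEO_ID = re.compile(r"[A-Za-z0-9_-]{11}")

# Assumption: only these hosts are recognized in this sketch.
WATCH_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}

def video_id(value):
    """Return the video ID for a bare ID or a known URL shape, else None."""
    if VIDEO_ID.fullmatch(value):
        return value
    url = urlparse(value)
    host = (url.hostname or "").lower()
    if host == "youtu.be":
        candidate = url.path.lstrip("/")  # https://youtu.be/<id>
    elif host in WATCH_HOSTS:
        candidate = parse_qs(url.query).get("v", [""])[0]  # ...watch?v=<id>
    else:
        candidate = ""
    return candidate if VIDEO_ID.fullmatch(candidate) else None
```

With this helper, `egeCYaIe21Y`, `https://www.youtube.com/watch?v=egeCYaIe21Y` and `https://youtu.be/egeCYaIe21Y?t=1051` all resolve to the same ID.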
Searches the {top} n videos from the {playlist} {for} the specified terms. Supports the common playlist search parameters.
value at pos. 0 | Required. The playlist ID or URL. |
Searches the {top} n videos from the Uploads playlist of the {channel} {for} the specified terms. Supports the common playlist search parameters.
value at pos. 0 | Required. The channel ID or URL. |
Searches the {top} n videos from the Uploads playlist of the {user}'s channel {for} the specified terms. Supports the common playlist search parameters.
value at pos. 0 | Required. The user name or URL. |
Clears cached user, channel, playlist and video info.
Do not use this software with the intent of infringing on any creator's freedom of speech or any viewer's freedom of choice.
Specifically, you may not use this software or its output to target content for flagging, banning or demonetizing.
Those to whom this limitation applies should feel encouraged to explore the origins of their right to censor third-party conversation and come back another day with better intentions <3
Scott Adams mentioned this psychological phenomenon named after a physicist the other day. Or did he say physician? What was its name again?
> SubTubular search-videos https://www.youtube.com/watch?v=egeCYaIe21Y https://www.youtube.com/watch?v=gDrFdxWNk8c --for physician,physicist
or short
> SubTubular search-videos egeCYaIe21Y gDrFdxWNk8c -f physician,physicist
gives you
15/08/2020 15:34 https://youtu.be/egeCYaIe21Y
  English (auto-generated)
    17:31 gail mann was the name of a physicist https://youtu.be/egeCYaIe21Y?t=1051
(turns out, it was the Gell-Mann Amnesia effect)
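The result above pairs a `17:31` time stamp with a link ending in `?t=1051`. The relation between the two is simple arithmetic, sketched here in Python. The function name `timestamped_link` is hypothetical, and the `mm:ss` label format for videos longer than an hour is an assumption.

```python
def timestamped_link(video_id, start_seconds):
    """Return a (label, url) pair pointing at a position within a video."""
    total = int(start_seconds)
    minutes, seconds = divmod(total, 60)
    # Assumption: minutes simply keep counting past 59 for long videos.
    label = f"{minutes:02d}:{seconds:02d}"
    url = f"https://youtu.be/{video_id}?t={total}"
    return label, url
```

For the caption above, 17 minutes and 31 seconds add up to 17 * 60 + 31 = 1051 seconds, which is the value of the `t` query parameter.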
I might have gazed into the abyss for a little too long and now I need a deep breath, some unclenching and a refresher on the importance of free speech. I know StyxHexenhammer has a lot to say on the matter - if I can dig it out of the gardening content and occult literature.
> SubTubular search-channel https://www.youtube.com/channel/UC0rZoXAD5lxgBHMsjrGwWWQ --for "free speech,censorship,cancel culture,cancelculture,freespeech" --top 500
or short
> SubTubular search-channel UC0rZoXAD5lxgBHMsjrGwWWQ -f "free speech,censorship,cancel culture,cancelculture,freespeech" -t 500
Note that title, description and keywords are matched as well as subtitles.
08/10/2020 07:58 https://youtu.be/xoZOMpoeots
  in description: #Qanon #Censorship
  in keywords: censorship, tech censorship, #censorship
  English (auto-generated)
    03:58 in extreme free speech which means https://youtu.be/xoZOMpoeots?t=238
    04:00 free speech i'm an extremist when it https://youtu.be/xoZOMpoeots?t=240

06/10/2020 08:42 https://youtu.be/8TysuANlPic
  in title: Cancel Culture Comes for the CEO of the Babylon Bee
  in keywords: cancel culture, #cancelculture
  English (auto-generated)
    01:07 why is it that cancel culture would come https://youtu.be/8TysuANlPic?t=67
    06:31 and cancel culture is something that's https://youtu.be/8TysuANlPic?t=391
    06:50 cancel culture because it reminds them https://youtu.be/8TysuANlPic?t=410
    08:35 with censorship whether government https://youtu.be/8TysuANlPic?t=515
    08:57 cancel culture it's something that gets https://youtu.be/8TysuANlPic?t=537
I have here a pile of rocks that needs grinding. Also, the Middle East could do with some peace. Let's make a supercut of Jörg Sprave's laugh. And while we're at it, let me show you its features:
> SubTubular search-user https://www.youtube.com/user/JoergSprave --for "haha,let me show you its features" --top 100 --cachehours 0 #disable cache to make sure I get the freshest laughs
or short
> SubTubular search-user JoergSprave -f "haha,let me show you its features" -t 100 -h 0
thankfully at any given time will yield something like
18/07/2020 16:52 https://youtu.be/WOFNUPH2hUY
  English (auto-generated)
    01:50 cutter like a mini pizza cutter hahaha I https://youtu.be/WOFNUPH2hUY?t=110
    24:02 hahahaha so it may be a lot of things https://youtu.be/WOFNUPH2hUY?t=1442

13/07/2020 16:40 https://youtu.be/52miCqsi7lo
  English (auto-generated)
    37:38 upper band haha https://youtu.be/52miCqsi7lo?t=2258

11/07/2020 12:18 https://youtu.be/nyze8uJovdo
  English (auto-generated)
    00:21 let me show you its features I know I https://youtu.be/nyze8uJovdo?t=21

21/06/2020 21:03 https://youtu.be/BF_OuEba3a4
  English (auto-generated)
    00:39 boat let me show you its features https://youtu.be/BF_OuEba3a4?t=39
    24:31 hahaha victory and now of course coconut https://youtu.be/BF_OuEba3a4?t=1471
    28:19 hahaha bye bye well the week is setting https://youtu.be/BF_OuEba3a4?t=1699
    39:18 hahaha and it is also clear that Odin https://youtu.be/BF_OuEba3a4?t=2358
If you can't seem to find what you're looking for, here are some things to keep in mind:
- Make sure the videos you search have subtitles. Not all do. Or at least not immediately. Allow for some time before the auto-generated subtitles of newly-uploaded videos are available.
- Keep your multi-word phrases short. Only exact matches are returned - so the longer and more complex your query, the less likely it is to match anything.
- Omit punctuation (dots and commas). As of writing this, the auto-generated subtitles are not structured into sentences.
- Don't overestimate YouTube's speech recognition algorithm (yet). Auto-generated subtitles don't always make sense, semantically speaking. Similar sounding words will be misunderstood, especially for speakers with poor pronunciation, high throughput, an accent or simply due to background noise. A statement about defense could for example easily be misunderstood as being about a fence, because the first syllable is often de-emphasized - something a human mind does not struggle with, reading a lot of meaning out of the context of a statement.
- You'll find that the speech recognition algorithm will replace
- inaudible words with ? and
- swear words with [ __ ] .
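If you copy a phrase from written text, the advice about punctuation can be applied mechanically before searching. The following Python sketch shows the idea. The helper `normalize_phrase` is hypothetical and not part of SubTubular, and the set of characters it removes is an assumption.

```python
import re

def normalize_phrase(phrase):
    """Strip sentence punctuation from a single phrase before searching."""
    # Assumption: these characters rarely appear in auto-generated captions.
    # Apostrophes are kept because captions contain words like "i'm".
    without_punctuation = re.sub(r"[.,;:!?]", " ", phrase)
    # Collapse the whitespace left behind by removed characters.
    return " ".join(without_punctuation.split())
```

Apply it to each phrase separately rather than to the whole `--for` value, since commas in that value separate the search terms.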
Feel free to contribute your own best practices in the issues.