CDDB is the automated system that your computer uses to list track names, artist info, and cover art for CDs that you pop into your drive. We want to do something similar for videos and subtitles.
We’ve been toying with the following idea: you’re watching a video and you notice that Miro’s “subtitles” button is glowing. This means there are subtitles available in a language that you speak; clicking the button pops the subtitles over your video (holding the button lists all the available languages and subtitle versions).
In our scenario, the subtitles wouldn’t necessarily be served from a single centralized server, or even from the same location as the videos themselves. Miro (or your preferred video player) would automatically search many different subtitle repositories and find subs for everything from individual YouTube videos to episodes of Democracy Now!
The search for subtitles would be based on a number of criteria: a hash from the video file, the video title, the originating URI of the file, and so forth. We’d prioritize the data and make an educated guess—it wouldn’t be perfect, but we think it could work pretty reliably. Of course, however we do this, it will be in a totally open and decentralized way, not just a centralized service — we want every video player (and even a Firefox extension) to be able to automatically find/display subtitles for things you’re watching.
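To make the "prioritize the data and make an educated guess" step a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `VideoInfo` and `SubtitleCandidate` records, the weights, and the threshold are hypothetical and not part of Miro or any existing protocol. It only shows the general shape: a hash match counts most, the originating URI next, and the title least.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical weights: an exact file-hash match is the strongest signal,
# the originating URI is next, and a title match is the weakest.
WEIGHTS = {"file_hash": 0.6, "source_uri": 0.3, "title": 0.1}


@dataclass
class VideoInfo:
    """What the player knows about the video being watched (illustrative)."""
    file_hash: Optional[str] = None
    source_uri: Optional[str] = None
    title: Optional[str] = None


@dataclass
class SubtitleCandidate:
    """One subtitle record returned by some repository (illustrative)."""
    language: str
    file_hash: Optional[str] = None
    source_uri: Optional[str] = None
    title: Optional[str] = None


def _normalize(title: str) -> str:
    # Compare titles case-insensitively and ignore repeated whitespace.
    return " ".join(title.lower().split())


def score(video: VideoInfo, cand: SubtitleCandidate) -> float:
    """Add up the weights of every criterion on which the two records agree."""
    total = 0.0
    if video.file_hash and video.file_hash == cand.file_hash:
        total += WEIGHTS["file_hash"]
    if video.source_uri and video.source_uri == cand.source_uri:
        total += WEIGHTS["source_uri"]
    if video.title and cand.title and _normalize(video.title) == _normalize(cand.title):
        total += WEIGHTS["title"]
    return total


def best_match(video: VideoInfo, candidates: List[SubtitleCandidate],
               language: str, threshold: float = 0.1) -> Optional[SubtitleCandidate]:
    """Return the highest-scoring candidate in the given language, or None."""
    in_language = [c for c in candidates if c.language == language]
    ranked = sorted(in_language, key=lambda c: score(video, c), reverse=True)
    if ranked and score(video, ranked[0]) >= threshold:
        return ranked[0]
    return None
```

In this sketch the candidates could come from any number of repositories; the player would merge the results into one list and light up the button whenever `best_match` returns something for a language you speak.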
So back to reality—right this second, Miro has admittedly poor support for basic subtitles. We’re fixing that for Miro 2.1, and once that’s done we can work on the interesting stuff…
I’ve been doing some research on this idea and want to double check with all of you readers, in order to make sure I’m not missing any good distributed subtitle systems or protocols that are currently out there.
The thing I’ve found that seems closest is: http://opensubtitles.org/.
OpenSubtitles Advantages:
- They have a basic API/protocol for doing hash and title based searching for video subtitles
- They already have some users, but I didn’t research too closely how much traction they have overall.
- They’re the closest thing (I can find) that does what we want to do.
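For anyone wondering what a "hash from the video file" might look like in practice, here is a rough Python sketch. It is an assumption on my part, not a confirmed description of their protocol: as I understand it, OpenSubtitles-style hashes combine the file size with a 64-bit checksum of the first and last 64 KiB of the file, which is cheap to compute even for very large videos. The function name `size_and_edges_hash` is made up for this example.

```python
import os
import struct

CHUNK_SIZE = 64 * 1024            # bytes read from each end of the file
MASK_64 = 0xFFFFFFFFFFFFFFFF      # keep the running sum within 64 bits


def size_and_edges_hash(path: str) -> str:
    """Hash a file from its size plus its first and last 64 KiB.

    Assumed scheme: start from the file size, then add every
    little-endian 64-bit word of both chunks, wrapping at 2**64.
    """
    size = os.path.getsize(path)
    if size < 2 * CHUNK_SIZE:
        raise ValueError("file is too small for this hashing scheme")

    total = size
    with open(path, "rb") as f:
        for offset in (0, size - CHUNK_SIZE):
            f.seek(offset)
            chunk = f.read(CHUNK_SIZE)
            for (word,) in struct.iter_unpack("<Q", chunk):
                total = (total + word) & MASK_64

    # 16 lowercase hex digits, zero-padded.
    return f"{total:016x}"
```

The appeal of this kind of hash is that it never reads the whole file, so a player can compute it the moment a video opens. The trade-off is that it only identifies one exact encoding of a video, which is why title and URI matching would still be needed as fallbacks.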
Potential OpenSubtitles Disadvantages:
- I didn’t find an open source implementation of the server.
- It is a centralized service, and their server is the core.
- You must register your application (useragent) before their API will work w/ your app.
If I’m wrong on any of the above, please feel free to correct me (for the record, I haven’t reached out to them yet, but will definitely be doing so; it’s always easier to work together on stuff like this). If my assumptions are correct, then they’re about 50% of the way to where we want to ultimately go (here’s the OpenSubtitles Dev FAQ for anyone who is interested). Does anyone know of someone else closer?
Also of possible interest for this project is URIplay. They’re developing a protocol for retrieving metadata for audiovisual media, based on the URI. The URIplay FAQ puts them squarely in line with our goals for a completely decentralized, openly documented, and open source system.
Feedback and advice are much appreciated, as we’re not trying to reinvent any wheels here. We believe a system like what we’re describing could be revolutionary. If you’ve got any input or would like to be involved in designing such a system, please leave a comment or contact me directly: dean at pculture.org.
Note: We aren’t tackling the issue of creating transcripts/subtitles here. People are already making subtitles, and we can improve that once people have a simple way to find/display available subtitles.