Miro

Miro Internet TV Blog

Doing Open Subtitles Like an Open CDDB

April 15th, 2009 by Dean Jansen

CDDB is the automated system that your computer uses to list track names, artist info, and cover art for CDs that you pop into your drive. We want to do something similar for videos and subtitles.

We’ve been toying with the following idea: you’re watching a video and you notice that Miro’s “subtitles” button is glowing. This means there are subtitles available in a language that you speak; clicking the button pops the subtitles over your video (holding the button displays all the available languages and subtitle versions).

In our scenario, the subtitles wouldn’t necessarily be served from a single centralized server, or even from the same location as the videos themselves. Miro (or your preferred video player) would automatically search many different subtitle repositories and find subs for everything from individual YouTube videos to episodes of Democracy Now!
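
A minimal sketch of that kind of lookup, assuming each repository can be modeled as a function that takes a description of the video and returns a list of subtitle records. The repository functions, record fields, and example URLs are hypothetical stand-ins for illustration, not any existing service's API.

```python
from typing import Callable, Dict, Iterable, List, Set

# Hypothetical record shape: {"language": "en", "url": "http://..."}
Subtitle = Dict[str, str]
Repository = Callable[[Dict[str, str]], List[Subtitle]]


def find_subtitles(video: Dict[str, str],
                   repositories: Iterable[Repository],
                   languages: Set[str]) -> List[Subtitle]:
    """Ask every repository and keep results in the viewer's languages."""
    found: List[Subtitle] = []
    seen_urls: Set[str] = set()
    for repo in repositories:
        try:
            results = repo(video)
        except Exception:
            # One unreachable repository should not break the whole search.
            continue
        for sub in results:
            if sub["language"] in languages and sub["url"] not in seen_urls:
                seen_urls.add(sub["url"])
                found.append(sub)
    return found


# Three made-up repositories: one healthy, one offline, one overlapping.
def repo_a(video):
    return [{"language": "en", "url": "http://a.example/1.srt"},
            {"language": "pl", "url": "http://a.example/2.srt"}]


def repo_b(video):
    raise ConnectionError("repository offline")


def repo_c(video):
    return [{"language": "en", "url": "http://a.example/1.srt"},
            {"language": "es", "url": "http://c.example/3.srt"}]


subs = find_subtitles({"title": "Some video"},
                      [repo_a, repo_b, repo_c],
                      {"en", "es"})
```

Because no single repository is required to answer, a player built this way would keep working when any one source is down, which is the point of not depending on a central server.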

The search for subtitles would be based on a number of criteria: a hash from the video file, the video title, the originating URI of the file, and so forth. We’d prioritize the data and make an educated guess—it wouldn’t be perfect, but we think it could work pretty reliably. Of course, however we do this, it will be in a totally open and decentralized way, not just a centralized service — we want every video player (and even a Firefox extension) to be able to automatically find/display subtitles for things you’re watching.
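
One way to picture the "prioritize and make an educated guess" step is a simple weighted score. This is only a sketch under assumptions: the weights, the field names (`hash`, `uri`, `title`), the threshold, and the choice of a whole-file SHA-1 as the hash are all illustrative, not a description of how Miro or any subtitle service actually works.

```python
import hashlib

# Assumed weights: stronger evidence of a match gets a higher score.
WEIGHTS = {"hash": 1.0, "uri": 0.6, "title": 0.3}


def file_hash(path, chunk_size=64 * 1024):
    """Return the SHA-1 hex digest of a file, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def score(video, candidate):
    """Sum the weights of every criterion on which video and candidate agree."""
    total = 0.0
    for key, weight in WEIGHTS.items():
        ours, theirs = video.get(key), candidate.get(key)
        if not ours or not theirs:
            continue
        if key == "title":
            # Titles are compared loosely: ignore case and outer whitespace.
            ours, theirs = ours.strip().lower(), theirs.strip().lower()
        if ours == theirs:
            total += weight
    return total


def best_guess(video, candidates, threshold=0.3):
    """Return the highest-scoring candidate, or None if nothing is plausible."""
    ranked = sorted(candidates, key=lambda c: score(video, c), reverse=True)
    if ranked and score(video, ranked[0]) >= threshold:
        return ranked[0]
    return None


video = {"hash": "abc123",
         "uri": "http://example.org/show.ogv",
         "title": "Democracy Now!"}
by_title = {"id": "t", "title": " democracy now! "}
by_hash = {"id": "h", "hash": "abc123"}
unrelated = {"id": "n", "uri": "http://example.org/other.ogv"}

match = best_guess(video, [by_title, unrelated, by_hash])
```

A hash match outranks a title match here because two different files can share a title, while a matching hash almost certainly means the same file. A title-only match still clears the threshold, which reflects the "not perfect, but pretty reliable" trade-off.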

So back to reality—right this second, Miro has admittedly poor support for basic subtitles. We’re fixing that for Miro 2.1, and once that’s done we can work on the interesting stuff…

I’ve been doing some research on this idea and want to double check with all of you readers, in order to make sure I’m not missing any good distributed subtitle systems or protocols that are currently out there.

The thing I’ve found that seems closest is: http://opensubtitles.org/.

OpenSubtitles Advantages:

  • They have a basic API/protocol for hash- and title-based searching for video subtitles
  • They already have some users, but I didn’t research too closely how much traction they have overall.
  • They’re the closest thing (I can find) that does what we want to do.

Potential OpenSubtitles Disadvantages:

  • I didn’t find an open source implementation of the server.
  • It is a centralized service, and their server is the core.
  • You must register your application (useragent) before their API will work w/ your app.

If I’m wrong on any of the above, please feel free to correct me (for the record, I haven’t reached out to them yet, but I will definitely be doing so; it’s always easier to work together on stuff like this). If my assumptions are correct, then they’re about 50% of the way to where we ultimately want to go (here’s the OpenSubtitles Dev FAQ for anyone who is interested). Does anyone know of someone else who is closer?

Also of possible interest for this project is URIplay. They’re developing a protocol for retrieving metadata for audiovisual media, based on the URI. The URIplay FAQ puts them squarely in line with our goals for a completely decentralized, openly documented, and open source system.

Feedback and advice are much appreciated, as we’re not trying to reinvent any wheels here. We believe a system like what we’re describing could be revolutionary. If you’ve got any input or would like to be involved in designing such a system, please leave a comment or contact me directly: dean at pculture.org.

Note: We aren’t tackling the issue of creating transcripts/subtitles here. People are already making subtitles, and we can improve that once people have a simple way to find/display available subtitles.

16 Responses to “Doing Open Subtitles Like an Open CDDB”

  1. Lachlan says:

    The Transmission network, http://transmission.cc , has a subtitling working group, this is their research page:

    http://wiki.transmission.cc/index.php/Subtitles_r…

  2. Johan says:

    In practice you don't have to register a client for the service to work, but that's what they want you to do. Speak to the admins, admin@opensubtitles.org, or make a forum post at OpenSubtitles.org. The admins are really nice and I think they would be really, really excited about this kind of thing. So I bet they would be pretty flexible when it comes to making adjustments etc.

    PS. I'm the developer of Undertext, a client for downloading subtitles from OpenSubtitles.org

  3. Johan says:

    A quick tip: The XMLRPC Debugger, http://gggeek.raprap.it/debugger/

    Might come in handy if you want to try the OpenSubtitles.org service.

  4. DeanJansen says:

    @Lachlan, thanks — will check that out.

    @Johan, good to know that the OS.org admins are nice — I'll definitely be getting in touch with them.

  5. [...] Doing Open Subtitles Like an Open CDDB « Miro – Internet TV Blog [...]

  6. harc says:

    yeah, the guy(s) behind opensubtitles seem to be thinking “in the same direction” and based on my experience with the originator of that site very open.

    outfitting miro with their subtitle database would be a major step forward, that could only be matched by finally making the space bar stop or start playback….

  7. [...] The Miro team wants to do something similar to CDDB and FreeDB, but for video subtitles. In a post on the project's blog, Dean Jansen describes the team's idea as follows: you are watching a [...]

  8. mickfuzz says:

    We had a really good discussion on this just over a year ago in Amsterdam at the Transmission gathering there.

    There is no really clear summary of what we worked out would be a good decentralised way of doing it.

    Here are some notes
    http://wiki.transmission.cc/index.php/Metadata_wo…

    Ideally you would have subtitles included in video feeds which could be read by Miro. This gets away from the centralisation and extent doesn't it?

    Have a look at our atom feed that would do that. See page 8.
    http://wiki.transmission.cc/upload/b/b4/Tx_metada…

    But if they were not present in a feed then some kind of media hash could be made for the file and cross-referenced against a database of files. Which looks like what opensubtitles.org is doing. Would they be into the idea of decentralising the process?

  9. Rar Player says:

    Here is my Firefox add-on for opensubtitles:
    https://addons.mozilla.org/pl/firefox/addon/6982

    Other services are:
    http://www.napiprojekt.pl – only Polish and English subs (not very open; my add-on only displays links to movie descriptions using its hash).
    napisy.info – Polish (and sometimes another language).
    GOM Player has an option to register subtitles, but I don’t use it and I don’t think it works.

    I wrote Opensubtitles Ffdshow
    http://ds6.ovh.org/ffdshow.html
    If Miro uses Windows codecs, just install it and you have opensubtitles support.
    But it is better to install Dziobas Rar Player; this is the better solution: http://ds6.ovh.org
    Dziobas Rar Player is the best ;-)

  11. Great stuff. Nice to read some well written posts. A long way between them.

  13. Diego Gomez says:

    but, miro can use ffdshow to solve the subtitles problem?

  14. whatwouldmattdo says:

    Media Player Classic does this (auto sub downloading) based on hash and file size, I think. I think it's open source too.

  15. Kris says:

    Sounds like an awesome idea. I am already using subdownloader to download subtitles automatically using opensubtitles.org but a decentralized solution directly in the player is way better!
