Miro Internet TV Blog

HD Detection Algorithm — Help Us Build It!

September 8th, 2008 by Dean Jansen

We’re working on a pretty cool system for detecting HD video, which has presented us with a small math challenge. It’s nothing too crazy, but we thought it’d be fun to see if anyone was up for helping us figure it out.

If you’d like to help with this algorithm, drop a comment here and/or contact me at: dean [at] pculture dot org.

Scanning for HD Video

Jason (our programming intern) built a scanner that will tell us some stuff about each video file, the most important things being: dimensions (WxH), video bitrate (kbps), and codec. The idea is to end up with a single number that gives us an idea of how relatively badass a video will look.

Resolution

The resolution is pretty straightforward, and I assume that adding width and height together is a good way to end up with a single number that can be compared across resolutions.

Codec & Video Bitrate

The codec/bitrate intersection is where things start to get a little crazier; different codecs have video bitrate sweet spots. When you nail it, you’re getting the best picture quality possible — go higher, and you’re wasting bits — go lower, and you’re going to have a degraded image. So the key here is to give more props to a video with a higher bitrate, until it reaches its optimal level, and then you just leave that number stable.
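In code, that saturating rule is just a clamp. Here is a minimal sketch; the 0-to-1 scaling is an illustrative choice, not something settled in the post:

```python
def bitrate_term(bitrate_kbps, optimal_kbps):
    """Grow with bitrate up to the sweet spot, then stay flat.

    Returns a value in [0, 1]; 1.0 means at or above the optimal rate.
    """
    return min(bitrate_kbps, optimal_kbps) / optimal_kbps
```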

Codec vs Codec

Finally, not all codecs are created equal, so we want to adjust the above modifier against a baseline codec. H.264 gives the best-looking picture at the lowest possible bitrate, so we’ll choose it as the base. Then we can create a ratio for the other codecs; for example, MPEG-2 needs double the bitrate of H.264 to achieve the same image quality. (The sketch after the bitrate lists below folds this ratio in.)

Comparing the Data

It’d be great to have some help mixing these numbers together in a way that gives us a final value, allowing us to compare channels across codecs, resolutions, and bitrates.

Below are some arbitrary numbers I worked out for optimal bitrates and codec-to-codec ratios, followed by a rough sketch of how they might combine. Two disclaimers here: I’m neither a math person nor a codec guru… if you see holes in my thinking or feel like my codec comparisons are off, definitely say so in the comments.

Optimal Bitrates for H.264

1920×1080 = 10000 kbps

1280×720 = 5000 kbps

640×480 = 1800 kbps

400×300 = 600 kbps

Optimal Bitrate Ratio for Other Codecs

optimal VP6, XviD, and DivX bitrates should be 1.5x higher

optimal Theora bitrates should be 1.7x higher

optimal MPEG-2 bitrates should be 2x higher

optimal MPEG-1 bitrates should be 3.5x higher

optimal MJPEG (or MJPG) bitrates should be 5x higher
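Here is one rough way the pieces above might mix into a single number. The two tables copy the lists above verbatim; the nearest-resolution scaling, the fallback ratio for unknown codecs, and the final width-plus-height weighting are assumptions for illustration, not a settled formula:

```python
# The optimal-bitrate and codec-ratio tables from the post.
OPTIMAL_H264_KBPS = {  # (width, height) -> optimal H.264 bitrate in kbps
    (1920, 1080): 10000,
    (1280, 720): 5000,
    (640, 480): 1800,
    (400, 300): 600,
}

CODEC_RATIO = {  # how much more bitrate a codec needs vs. H.264
    'h264': 1.0,
    'vp6': 1.5, 'xvid': 1.5, 'divx': 1.5,
    'theora': 1.7,
    'mpeg2': 2.0,
    'mpeg1': 3.5,
    'mjpeg': 5.0, 'mjpg': 5.0,
}

def optimal_kbps(width, height, codec):
    """Estimate the optimal bitrate for a frame size and codec by scaling
    the closest listed resolution by pixel count (an assumption)."""
    pixels = width * height
    (rw, rh), ref_kbps = min(
        OPTIMAL_H264_KBPS.items(),
        key=lambda item: abs(item[0][0] * item[0][1] - pixels))
    scale = pixels / (rw * rh)
    return ref_kbps * scale * CODEC_RATIO.get(codec, 2.0)  # 2.0 is a guess

def badass_score(width, height, bitrate_kbps, codec):
    """Resolution term (width + height, per the post) times the
    saturating bitrate term."""
    opt = optimal_kbps(width, height, codec)
    return (width + height) * (min(bitrate_kbps, opt) / opt)

# Example: a 1280x720 Theora file at 6000 kbps.
print(badass_score(1280, 720, 6000, 'theora'))  # ~1412
```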

17 Responses to “HD Detection Algorithm — Help Us Build It!”

  1. Tommy says:

    I am also neither a math person nor a codec guru, but I would be willing to give it a shot if you still need help.

  2. Dean Jansen says:

    Hi Tommy,

    I think the comments are probably a good place to start working… then others can join in too. We can take the discussion to email or a list or something, if it gets out of hand here.

  3. Andre says:

    The biggest flaw in this theory is that many people uprez SD footage to HD resolutions/bitrates, which doesn’t look good.

  4. Dean Jansen says:

    It’s true that we can’t detect interlacing and uprezzing; the HD detection won’t be flawless. That said, we’d like to have software help us out as much as possible when it comes to weeding out low resolution/bitrate video.

  5. Adam says:

    You could use a Bayes’ Theorem approach, like the ones used in spam filters. It can be quite useful, telling you the probability that something is an HD video given the criteria you have. It’s powerful because you can calculate this probability from the probability that an HD video would have that set of specs, the probability that a non-HD video would have that set of specs, and the percentage of videos that are HD. A database of already-identified HD and non-HD videos and their specs would not be difficult for you guys to generate, and that’s how you’d get the data you need to plug in.
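    A toy version of what Adam describes, assuming a labeled set of videos already exists; the function names, the coarse feature bucketing, and the crude smoothing are all invented for illustration:

    ```python
    from collections import Counter

    def bucket(width, height, kbps, codec):
        """Reduce a video's specs to a coarse feature tuple (made-up bins)."""
        return (height >= 720, kbps >= 4000, codec)

    def train(labeled):
        """labeled: iterable of (specs, is_hd); specs feeds bucket()."""
        counts = {True: Counter(), False: Counter()}
        totals = Counter()
        for specs, is_hd in labeled:
            counts[is_hd][bucket(*specs)] += 1
            totals[is_hd] += 1
        return counts, totals

    def p_hd(specs, counts, totals, smooth=1.0):
        """P(HD | specs) by Bayes' rule, with add-one style smoothing."""
        f = bucket(*specs)
        like_hd = (counts[True][f] + smooth) / (totals[True] + smooth)
        like_sd = (counts[False][f] + smooth) / (totals[False] + smooth)
        prior = totals[True] / (totals[True] + totals[False])
        num = like_hd * prior
        return num / (num + like_sd * (1.0 - prior))
    ```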

  6. Dean Jansen says:

    Adam,

    If I understand you correctly, we’d pick a crop of videos and say “yeah, HD” or “no, not HD” and then we would compare new videos to the attributes of the samples we rated?

    That seems like it may be more complex than what I suggest in the post — what are the advantages?

  7. MDoggyDog says:

    I think there are two related but distinct concepts here. The first is a question of format, “High-Definition or not”, which is pretty well defined these days. The second is a question of quality, “badass, or just ass”, which isn’t so well defined.

    I wouldn’t use the term “HD” unless I were talking about format, since HD has a commonly understood technical meaning which is now also in widespread public/consumer use (e.g. “what I intend to display on my badass HD video projector”). So, if you’re going to identify videos/channels as High-Definition, I suggest you stick to criteria based on frame size and rate, in line with the ATSC standard: HD is 720p or better, i.e. a vertical resolution of at least 720 pixels and at least 24 frames per second. Content with lesser resolution is “SD” (Standard Definition) for stuff at least 480i (480 vertical pixels, 24 frames per second), and “Not Even SD” for everything else. This all jibes with common usage/understanding: SD is what we all watch on DVDs. (Some tweaking needs to be done to be fair to non-standard aspect ratios, but you get the idea.)
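    MDoggyDog’s format test is simple enough to state directly; this sketch just encodes the thresholds above (the function name is mine):

    ```python
    def format_class(vertical_pixels, fps):
        """Classify by format only, per the ATSC-style thresholds above."""
        if vertical_pixels >= 720 and fps >= 24:
            return 'HD'
        if vertical_pixels >= 480 and fps >= 24:
            return 'SD'
        return 'Not Even SD'
    ```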

    This leads us to the second question: how to estimate quality? That pretty much boils down to bitrate, perhaps with a codec-dependent fudge factor thrown in. Frame size and frame rate both end up factored into the bitrate, so you don’t have to work those in again. It’s all about the amount of information delivered to the eyeball, in bits per second (and the codecs are just trying to be clever about picking and choosing only the bits that the eyeball will actually notice).

    I suggest that the reference/baseline should be “broadcast quality”, i.e. what someone sees when they turn on their (digital) TV set today. Why? Because it is ubiquitous: everyone knows (or soon will) what off-the-air DTV looks like. (And broadcast TV is still held to very high technical standards.) If broadcast quality is the baseline, then one can classify Miro content as “Broadcast Quality HD” or “Broadcast Quality SD” (or the third choice, “Crappy Sixth Generation Youtube Video Recorded at Night on a Cell Phone with a Low Battery”).

    This implies that the baseline codec should be MPEG-2. I’m curious how you came up with your “Optimal Bitrates for H.264”, since, after including your 2x fudge factor between MPEG-2 and H.264, the rates roughly jibe with those used in ATSC:

    720p, 1080i – ~11 Mb/s
    480i – ~4 Mb/s

    What is the source for the “Optimal Bitrate Ratio for Other Codecs”?

    PS: “When you nail it, you’re getting the best picture quality possible – go higher, and you’re wasting bits” – it’s not a waste of bits to the guy who wants to do a quality remix of the content. If you go higher, you will get better fidelity. Your eyeballs might not notice it now, but four generations of editing later, they will.

    PPS: As Andre pointed out, all of this theory does of course rely on a pristine source.

  8. Adam says:

    Dean,

    Yes, using the Bayesian approach I suggested would require a crop of videos to compare against. If you can’t get or easily generate a database of files’ important attributes and whether they’re HD (e.g. because they come from a pre-identified HD channel or non-HD channel), it may not be the best approach. (Having estimates of the percentages themselves would work too, though I doubt you would stumble upon statistics like that on the web.)

    Some advantages: it can continue to “learn” from user or moderator feedback. Say an HD video type with specs you didn’t think of comes around: you don’t need to re-code an algorithm. Instead, you mark a video (or a few videos) with those specs as HD and put that info in the database. Similarly, if something gets mis-identified as HD, that error can be made less likely by continuing to add stats along with a “high/standard def” label to your database as time goes on. Since you said you want the software to help you in weeding out low-resolution video, a system that progressively takes more and more load off of humans seems like a good fit.

    Another advantage, touched on in the previous paragraph, is that you don’t have to rely on textbook definitions of what an HD video looks like.

    I really think the spam filter example demonstrates well how the system works, except that spammers keep trying to trick Bayesian filters by obfuscating the things that would identify a message as spam. In this usage, there seems to be no reason for video encoders to try to fool the system.

    It may or may not be a good fit. I’m getting a degree in stats and this was the first thing that came to mind.

  9. ben says:

    No algorithm is going to be perfect. In addition to the issues above, you also have the issue of how hard the movie is to compress in the first place. I won’t get into why my friend chose to screen a film that consisted of more than an hour of a single frame; however, I would think 1 kbps would produce fine quality there with any codec. Obviously that’s an extreme example, but the same thing happens on a smaller scale with more normal content. There’s also the issue of how good the encoder was: different encoders can use the same bitrate and codec and produce different quality.

    So I say, just pick some numbers and try to tweak them to work well, and don’t worry about things like “optimal”.

  10. Svein says:

    A system like this should draw a clear distinction between technical delivery format and quality.

    Counting pixels, bitrate, and codec is not too difficult. This can mostly be read out of the file’s metadata, and if you build some kind of algorithm that gives each of these criteria a weight, you end up with a number telling you how nice this video COULD HAVE BEEN.

    The emphasis matters, because this number will not be able to say anything about the actual quality of the video. That all depends on where the video comes from. Was it originally shot with an HD camera, or is it a 50 kb/s streaming video that has been uprezzed?

    With this in mind, the first-mentioned algorithm is totally meaningless on its own. People will soon start doctoring their bad videos just to get higher automatic ratings or better exposure in the system.

    The only way to get around this and give the numbers meaning is to employ a system that can actually say something about the quality of the video. And it is possible. A system like this has, as far as I know, been developed in Norway. I will get back with more details as soon as I have them. I think it was developed to monitor the quality of broadcast video.

  11. Francisco says:

    I suggest multiplying the width by the height, then dividing by the bitrate (pixels per kbps). In your H.264 examples this number varies from about 170 to 207.
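    For reference, computing that ratio from the optimal-bitrate table in the post:

    ```python
    # Francisco's pixels-per-kbps ratio for the post's H.264 table.
    for (w, h), kbps in [((1920, 1080), 10000), ((1280, 720), 5000),
                         ((640, 480), 1800), ((400, 300), 600)]:
        print((w, h), round(w * h / kbps))  # 207, 184, 171, 200
    ```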

  12. mike says:

    For many users the important thing will be bitrate, because when choosing video it’s important to know how long the file will take to download and how much space it will consume.

  13. Caliga says:

    I also think that you must keep format and quality separate.
    By the way, there is a reason why the size of screens is always given as a diagonal: it captures the overall size in a single number, regardless of the aspect ratio (c² = a² + b²).
    So, instead of adding width and height, you would rather take the diagonal.
    However, if the resulting value should be presented to users, it may be better to use the area, in megapixels.
    Following MDoggyDog, I’d suggest the following classification (a small sketch follows below):
    HD: diagonal ≥ 1468 px, area ≥ 0.9 MP ( 1280×720 = 921,600 px; sqrt(1280² + 720²) ≈ 1468 )
    SD: diagonal ≥ 800 px, area ≥ 0.3 MP ( 640×480 = 307,200 px; sqrt(640² + 480²) = 800 )
    LD: the rest…
    Personally I would not include the frame rate, although it may be in the “official” definitions.
    First, you’ll probably not get a lot of HD stuff that skimps on frame rate, and second, if you did get that stuff you might not notice. Unless it’s an action movie… and for that, even 24 fps is hardly enough.
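    A quick sketch of that diagonal-based classification, using the corrected numbers above (the function name is mine):

    ```python
    import math

    def size_class(width, height):
        """Caliga's HD/SD/LD classes by diagonal length in pixels."""
        diagonal = math.hypot(width, height)  # sqrt(w**2 + h**2)
        if diagonal >= 1468:   # the 1280x720 diagonal
            return 'HD'
        if diagonal >= 800:    # the 640x480 diagonal
            return 'SD'
        return 'LD'
    ```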

    Now, quality…
    For the first step I’d go the pragmatic way:
    if somebody creates a video in HD resolution, he will rarely squeeze it into low-bitrate files intentionally.
    The standard shaky-badlight-badcontrast-manyartifacts-youtube video will be LD anyway.

    For anything more, I’d say you need either lots of computing time or lots of superbrains, maybe both.
    Checking several videos in a batch, especially, may be time-consuming.
    However, here are three ideas to perform the actual check (a sketch of the third follows the list):
    1. artifact search:
    search for 8×8 px blocks that are single-colored and surrounded by a rather different color (the bigger the difference, the bigger the artifactiness :)
    I assume the blocks will be aligned to 8 px boundaries, like in JPEG. Maybe motion compensation moves them?
    2. edge detection:
    both bad compression and up-scaling remove hard edges.
    3. entropy testing:
    as entropy is the opposite of compressibility, there is not a whole lot of entropy in over-compressed videos.
    Again, the 8×8 pixel blocks need to be checked; this time we check how many different colors are in one block.
    Thinking about it, this is the same as artifact search… the worst artifact is a block with an entropy of zero.
    I remember an article that showed a picture of basic patterns (wavelets?) that are composed into more complex patterns.
    The most basic pattern is empty; after it come patterns with one vertical or horizontal gradient (then more complex waves).
    If you look at youtube videos, you may notice exactly those types of artifacts (single-colored blocks, or two-colored gradients).
    If I remember correctly, those patterns are usually combined into more complex ones, but at low bitrates you only get one basic pattern.
    So maybe you would have to look for those patterns.
    You can also use the concept of frequency, which is just another way to look at entropy: low frequency means not a lot of change, and high frequencies are cut off by compression.
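    A rough sketch of that block-entropy/artifact idea, using per-block variance as a cheap stand-in for entropy (the 8×8 grid comes from the comment; the flatness threshold is a guess):

    ```python
    import numpy as np

    def flat_block_fraction(gray_frame):
        """Fraction of 8x8 blocks that are (nearly) one flat color.

        gray_frame: 2D numpy array of luma values whose dimensions are
        divisible by 8. A high return value suggests over-compression.
        """
        h, w = gray_frame.shape
        blocks = gray_frame.reshape(h // 8, 8, w // 8, 8).swapaxes(1, 2)
        variances = blocks.reshape(-1, 64).var(axis=1)
        return float((variances < 1.0).mean())  # threshold is a guess
    ```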

    The algorithms for those things are no secret (frequency, entropy, edge detection, artifact detection).
    Knowledge of those things is used not only in video codecs but also in image editing, as in GIMP.
    So a look at the GIMP toolkit or ImageMagick source code may be enlightening (or hooking up with the devs there).

    No matter which algorithm is used, it can only detect missing details if there *were* details.
    So you cannot just grab the first frame of the video, as it may be just black, or blue with white text on it.
    Also, keyframes are not good candidates to look for bad quality in, I guess.

  14. bab says:

    Hi,
    Nice challenge.
    I would also consider the entropy calculation a good path to investigate for checking the inner quality of a video.
    I have, by the way, two other directions to propose:
    Frequency transformation (Fourier): check the spectrum of the image itself (this should be able to detect some artifacts and upscaled images). The more of the spectrum that is used, the better the image should be.
    Upscaling detection: after downscaling the image, compute a new one by different upscaling methods and check how distant it is from the original image (a sketch follows below).

    Cheers
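    A sketch of bab’s upscaling test, assuming Pillow and NumPy are available; the scale factor, the resampling filter, and the function name are arbitrary choices:

    ```python
    import numpy as np
    from PIL import Image

    def upscale_suspicion(frame, factor=2):
        """Shrink the frame, blow it back up, and measure what was lost.

        frame: a PIL Image. A near-zero mean difference suggests the
        source carried no real detail at its nominal resolution.
        """
        w, h = frame.size
        small = frame.resize((w // factor, h // factor), Image.BILINEAR)
        restored = small.resize((w, h), Image.BILINEAR)
        a = np.asarray(frame, dtype=float)
        b = np.asarray(restored, dtype=float)
        return float(np.abs(a - b).mean())
    ```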

  15. Pentek says:

    You say “The idea is to end up with a single number that gives us an idea of how relatively badass a video will look.”

    Now, I think the art of coming up with a single number based on such values is called “fuzzy logic”, and there is quite a bit of literature on it. You might want to take a look at that.

  16. Joey says:

    Maybe I’m oversimplifying this (I probably am, seeing as I know just enough about resolution and codecs to royally screw this up), but if HD resolution is a question of data over time, then couldn’t there just be a bps threshold, like there is in music? Once a video passes a certain ratio in a certain codec, it could be declared HD content.

  17. Anthony says:

    I think the comment from Svein is barking up the right tree. His suggestion is the most complicated but if you’re going to do something, why not do it right?

    You can come up with a score based just on the numbers, but at the end of the day it doesn’t mean a whole lot, unfortunately. Somebody who encodes videos with an “overkill” bitrate setting would get a higher score by consuming more data, which means longer downloads and more file storage, but doesn’t mean better video. (I.e., I can take a 320×240 video from a webcam in horrid lighting, then resize it to 1080p and encode it at a 10 Mbps bitrate… do I now magically have Blu-ray-quality HD video on my hands? Of course not!)

    Likewise, motion and solid areas of color will produce greatly varying video quality even at the same resolution and bitrate. For example: low motion (say, a slide show) will look great because the contents of the screen do not change often, compared to a fast-paced, high-motion clip; and a cartoon will look better than “real life” video at the same bitrate because of its larger areas of solid color. Codecs look for these things and handle slower/solid areas much more efficiently by design.

    I think the first step is something that actually quantifies the “look” of the video… that’s the hard part… but I guess it can stop there, because that’s the goal here. To take it one step further, though, you could combine it with the actual bitrate numbers and come up with an “efficiency” score. Say that first step ranks a video from 1 to 10. Work that score into a formula with the bitrate. If somebody pulls off a “10” at 1000 kbps and somebody else pulls off a “10” at only 400 kbps, the latter is more efficient: the end user gets top-quality video on a quicker download. Ignore codecs altogether, because some codecs that may be lower-tech on paper might actually pull off better subjective quality depending on the content of the video. The final efficiency score would be really useful to a viewer, to know they’re getting high-quality video and making the best use of their bandwidth and disk space to get it!
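    Anthony’s efficiency idea reduces to quality per unit bitrate, taking the hard “look of the video” score as a given:

    ```python
    def efficiency(quality_score, bitrate_kbps):
        """Quality (say, 1-10 from some perceptual metric) per kbps."""
        return quality_score / bitrate_kbps

    # A "10" at 400 kbps beats a "10" at 1000 kbps:
    print(efficiency(10, 400) > efficiency(10, 1000))  # True
    ```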
