It’s not just file size either. Video has several competing dimensions, and improving one tends to require compromising on the others (there’s a rough sketch of the tradeoff after the list):
Resolution
Frame rate
Quality
Bit rate (file size)
Encoding complexity
Decoding complexity (which affects battery life of mobile devices viewing the content)
Robustness for dropped or corrupted data
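To make the encoding-complexity vs. file-size tradeoff concrete, here’s a minimal sketch, assuming ffmpeg with libx264 is installed and an input.mp4 is sitting in the working directory: same quality target (CRF), different presets, where slower presets burn more CPU to shave off bits.

```python
# Minimal sketch: same quality target (CRF), different x264 presets.
# Assumes ffmpeg with libx264 is installed and input.mp4 exists.
import os
import subprocess
import time

def encode(src: str, dst: str, preset: str, crf: int = 23) -> tuple[float, int]:
    """Encode src with the given x264 preset; return (seconds taken, output size in bytes)."""
    start = time.time()
    subprocess.run(
        ["ffmpeg", "-y", "-i", src, "-c:v", "libx264",
         "-preset", preset, "-crf", str(crf), "-an", dst],
        check=True, capture_output=True,
    )
    return time.time() - start, os.path.getsize(dst)

if __name__ == "__main__":
    for preset in ["ultrafast", "medium", "veryslow"]:
        secs, size = encode("input.mp4", f"out_{preset}.mp4", preset)
        print(f"{preset:>9}: {secs:6.1f}s  {size / 1e6:6.1f} MB")
```

On most clips you’d expect veryslow to take several times longer than ultrafast while producing a noticeably smaller file at the same CRF.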
Over time, the standards improve, but newer codecs generally rely on specialized hardware for efficient decoding, which makes decoding complexity trickier when you’re serving a lot of people on a lot of different hardware.
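As a hypothetical illustration of that hardware problem, here’s a toy per-client codec picker; the preference order reflects the general point that newer codecs compress better, but the names and API are made up for the sketch, not any particular player’s actual interface.

```python
# Toy per-client codec selection (illustrative only).
from dataclasses import dataclass

# Rough preference order: newer codecs compress better but need newer hardware
# to decode cheaply.
CODEC_PREFERENCE = ["av1", "hevc", "h264"]

@dataclass
class Client:
    hw_decoders: set[str]   # codecs this device can decode in hardware

def pick_codec(client: Client) -> str:
    # Prefer the most efficient codec the device can decode in hardware;
    # otherwise fall back to h264, which nearly everything handles in hardware,
    # rather than draining the battery with software decode.
    for codec in CODEC_PREFERENCE:
        if codec in client.hw_decoders:
            return codec
    return "h264"

print(pick_codec(Client(hw_decoders={"h264", "hevc"})))  # -> hevc
print(pick_codec(Client(hw_decoders={"h264"})))          # -> h264
```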
Netflix, for example, serves a relatively small catalog of very large files to many, many people on demand. That means they benefit from high encoding complexity even if it only shaves off a little file size: spending a few extra hours encoding a movie so it’s 10 MB smaller is worth it if 10 million people watch that movie, as that’s 100 terabytes of traffic saved.
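Spelling out the arithmetic behind that claim (same illustrative numbers as above, not real Netflix data):

```python
# Back-of-the-envelope for the numbers above.
saved_per_copy_mb = 10                              # extra encoding effort shaves ~10 MB per stream
viewers = 10_000_000                                # people who watch that title
saved_tb = saved_per_copy_mb * viewers / 1_000_000  # MB -> TB (decimal units)
print(f"~{saved_tb:.0f} TB of traffic saved")       # ~100 TB
```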
But YouTube, Facebook, and the other services with a lot of user-submitted video are ingesting hundreds of hours of content every minute and chopping each upload into something like 5 different resolution/quality levels.
Then YouTube has a shitload of processes for deciding which video gets which treatment. A random upload of a kid’s birthday party might get a few hundred views at most, so YouTube cares less about its file size and more about saving that computational cost up front. But if a video hits 1,000 views within a few minutes, it’s probably on the cusp of going viral, and it might be worth re-encoding with the slower, high-cost encodings that save space and bandwidth.
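A toy sketch of that kind of policy; the threshold and tier names are invented for illustration and aren’t YouTube’s actual rules.

```python
# Toy "which treatment does this upload get" policy (illustrative only).
def choose_encoding_tier(views_last_hour: int, already_high_tier: bool) -> str:
    if already_high_tier:
        return "keep"                        # already got the expensive treatment
    if views_last_hour >= 1000:
        # Likely going viral: worth spending CPU on slow, high-efficiency
        # encodes across several resolutions to save bandwidth later.
        return "re-encode-high-efficiency"
    # Long tail: a fast, cheap encode is good enough; the bandwidth saved
    # wouldn't pay back the extra compute.
    return "fast-encode-only"

print(choose_encoding_tier(views_last_hour=50, already_high_tier=False))    # fast-encode-only
print(choose_encoding_tier(views_last_hour=4200, already_high_tier=False))  # re-encode-high-efficiency
```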
If a service isn’t operating at that scale, it doesn’t need that kind of complexity. But its videos will load a bit slower, use a little more battery and bandwidth to watch, be more prone to skipping/distortion, etc.
Video is hard. User-submitted video is harder. Especially at scale.
Great analysis