YouTube stats

Thomas Germain at the BBC writes, in How a computer that ‘drunk dials’ videos is exposing YouTube’s secrets:

[…] For all practical purposes, one of the most powerful communication systems ever created – a tool that provides a third of the world’s population with information and ideas – is operating in the dark.

In part that’s because there’s no easy way to get a random sampling of videos, according to Ethan Zuckerman, director of the Initiative for Digital Public Infrastructure at the University of Massachusetts at Amherst in the US.

The headline is a bit clickbait, but the content is more insightful. What they call “drunk dials” is more like “war dialing”: trying numbers one by one. Given the odds of hitting an actual video with a random number, 1 in 1.87 billion, it seems that the algorithm YouTube uses to generate video IDs really is random.
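To make the idea concrete, here is a minimal sketch of how a hit rate from random guesses turns into an estimate of how many items exist. It is a toy simulation, not the researchers' actual tooling: the ID space size, the number of valid IDs, the number of guesses and the function name estimate_population are all made up for illustration, and nothing here talks to YouTube.

```python
import random


def estimate_population(is_valid, space_size, guesses, rng):
    """Probe random IDs and scale the hit rate up to the whole ID space."""
    hits = sum(1 for _ in range(guesses) if is_valid(rng.randrange(space_size)))
    hit_rate = hits / guesses
    return hit_rate * space_size, hits


# Toy numbers, chosen only so the simulation runs quickly (assumptions).
rng = random.Random(42)
SPACE_SIZE = 1_000_000   # size of the hypothetical ID space
TRUE_COUNT = 5_000       # number of IDs that really exist in the toy model

# Valid IDs are scattered uniformly at random over the space.
valid_ids = set(rng.sample(range(SPACE_SIZE), TRUE_COUNT))

estimate, hits = estimate_population(
    valid_ids.__contains__, SPACE_SIZE, guesses=200_000, rng=rng
)
print(f"hits: {hits}, estimated population: {estimate:.0f}, true: {TRUE_COUNT}")
```

The estimate only works if valid IDs are spread uniformly over the space, which is why the randomness of the ID generation matters. The hits themselves also form a random sample of the existing items.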

The original problem was to get a random sample of videos in order to collect statistics that YouTube no longer publishes. The result is interesting too: it shows that videos that get a lot of interaction (views / comments) are just a statistical anomaly compared with the number of videos uploaded.