Wednesday, November 14, 2012

Statistics - Who Pays to Gather the Numbers?

Recently, Nate Silver became a darling of political media. He not only predicted who would win the Presidential election, he even predicted each individual state's vote within 2%, and who would win every single Senate race except North Dakota's - 32 for 33.

This has led to sour grapes among other pollsters, in particular Gallup, a national poll Nate Silver briefly called out as being the worst of the 50 or so he tracks. In Gallup's rebuke of Nate Silver, they make an interesting complaint: They spend the money to gather polling data, and if they wanted to improve their polls, they would have to spend more. Yet they are not the darling of the polling industry - nor, for that matter, is the most accurate of the 50 polls Nate tracked. Instead, Nate is. And what did Nate or his 538 organization spend on polling? $0.

So, Gallup gripes, perhaps Gallup should get out of polling and get into the business Nate is in. It's much cheaper, and might get you a lot more readers - that, after all, is where the money's at. Unfortunately, Gallup ends with the typical gunshot into the air that businesses losing to the internet fire off: Maybe the government should pay for our business with taxes, because we can't figure out how to make money off of it. Some may recall the music industry proposed an internet tax when they realized they couldn't get $14 for a physical CD anymore. Both ideas were equally stupid.

But this is not the first time an issue like this has come up, where one group assembles original data at great expense, and another, seeing it online, gathers it for free and makes use of it without paying a dime. Google's entire business model is doing just that, for which they make billions a year. In a very clear-cut case, a little company, "Mocality," put together a website and incentives program to essentially build a Yelp for Kenya. Prior to this initiative, essentially 0 Kenyan businesses were online. Now with the work of some strapping Kenyan IT personnel and over $100,000 in payments to Kenyans who contributed to this online database (a LOT of money in Kenya!), they have a significant number of Kenyan businesses in their database.

So Google, looking to also have Kenyan businesses in their database, sees that and scoops it up without paying a dime. This pissed off Mocality. And you can't blame them - what they spent $100k in payments to users, and more in website development and fine-tuning, to build, Google is walking away with for free.

Yet if you look at this from any other perspective, the information is online. Too bad, right? If you didn't want people to have access to it, you shouldn't've put it online - right? If we take this to be true - and like it or not, it is true insofar as if you post information of value, someone will most certainly take it for free regardless of your views on things - then what incentive is there to gather this sort of original data? Essentially, what is the business model that allows you to spend the time, effort, and money on gathering data that really deserves to be online, without just looking like an idiot for helping a lower-cost competitor get rolling?

For the time being this situation does appear unjust. But perhaps once the best business model to capitalize on these efforts is clear, the injustice will feel irrelevant. So what might that business model be?

When Google created a search that was substantially better than other searches, one problem was that those search results could easily be stolen. And one (perhaps ironic) response they took to that theft was to add behavior monitoring to their software that watches for what appears to be someone trawling their results to steal them. So one element is to have not just data, but a lot of data - too much for any typical user to ever view all of in one go - so that if someone tries, they show up as an obvious outlier in your usage data and you can catch them before they get everything. Another is to do a good job of writing that monitoring and blocking software.

One more is to continuously gather that data so that even if someone steals the data in small sips, they're always too far behind you, and too out of date, to be a legitimate competitor.
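To make the usage-monitoring idea a little more concrete, here is a minimal sketch in Python. Everything in it is my own assumption for illustration, not how Google or anyone else actually does it: the function name `flag_heavy_readers`, the `(client_id, record_id)` log format, and the `multiplier` and `min_records` thresholds are all hypothetical. It simply counts how many distinct records each client has viewed and flags anyone far above what a typical user looks at.

```python
from collections import defaultdict
from statistics import median


def flag_heavy_readers(access_log, multiplier=20, min_records=50):
    """Return the set of client ids that look like bulk scrapers.

    access_log: iterable of (client_id, record_id) pairs (hypothetical format).
    multiplier, min_records: illustrative thresholds, not tuned values.
    """
    # Track the distinct records each client has viewed.
    seen = defaultdict(set)
    for client_id, record_id in access_log:
        seen[client_id].add(record_id)

    counts = {client: len(records) for client, records in seen.items()}
    if not counts:
        return set()

    # A "typical" user is the median; the floor keeps tiny logs from
    # flagging ordinary users just because the median is very small.
    typical = median(counts.values())
    threshold = max(min_records, multiplier * typical)

    return {client for client, n in counts.items() if n > threshold}
```

A real system would also need to look at time windows and at clients spreading their requests across many accounts or addresses, which this sketch ignores.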

It appears Gallup's (almost) observation is correct: You need to provide both original data and analysis to make money. People read or use your product more for the promised results you can give them than for the raw data. In a sense, providing raw data is a much lower-value service - one that begs other businesses to come along, analyze it, and deliver more promising answers with it. Making a promise to provide specific insights, recommendations or other actionable information can be worth quite a bit of money. Gathering data, if that's all you intend to do, may be a better fit for keeping in a private database you charge for access to, with some kind of downstream "per viewer" fee for what your customers build on top of it. That would be a better economy of ideas in its purest form, but sadly it is a business model ripe for theft.

Finally, any business looking to do original data collection is going to make mistakes in its business model, so it had best learn how to make news - after all, in Mocality's case, all it took was public embarrassment on places like Slashdot and Reddit (2 news sources Google employees read frequently) to resolve their Google problem. That won't prevent other competitors from swooping in on their data - but it may buy them some time to refine their business model so that Mocality isn't crushed when would-be competitors try.