As Ranker’s Chief Data Scientist, I’ve been thinking a lot lately about how much a given opinion dataset is worth. I’m not talking about the value of a specific dataset for answering a specific question, as that varies wildly depending on the question. Rather, I’d like to consider broad datasets/categories of data that promise to satisfy the world’s thirst for opinion answers. The existence of sites like Quora and Yahoo Answers, as well as the shift by many search engines from providing links to pages to directly answering questions, highlights the need for such data, as does the growing demand for opinion queries. The future of services like Siri, Cortana, and Google Now is one where questions about what to buy for one’s wife, where to eat, and what to watch on TV are answered directly, and to do that well, one needs the data to answer those questions. Are the world’s data collection methodologies up to the task?
One reason I ask is that there seems to be a misconception that large amounts of data can answer anything. I’m a huge believer in reform in academia, but one thing my traditional, peer-review-oriented academic training has given me is an appreciation that this is not true. Knowing which universities have more follows, likes, or mentions isn’t going to tell you which one has the best reputation. Still, there certainly are advantages to scale, as well as depth. The math behind both psychometrics and crowdsourcing tells me that no one dataset is likely to have the ultimate answers: all data has error, and aggregating across that error, which is Nate Silver’s claim to fame, almost always produces the best answer. So as I consider the datasets below, the true answer as to which you should use is “all of the above”. That being said, I think it is helpful (at least for organizing my thinking) to consider the specific dimensions on which each dataset does best.
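The point about aggregating across error can be illustrated with a small simulation. This is only a sketch under stated assumptions: the "true" opinion value, the number of sources, and the noise level are invented for illustration, and each source's error is assumed to be independent and unbiased. Real datasets often share biases, and shared bias does not average away.

```python
import random
import statistics


def simulate(true_value=0.6, n_sources=5, noise_sd=0.1, trials=2000, seed=0):
    """Compare the error of one noisy source with the error of the pooled mean.

    All parameters are hypothetical; errors are assumed independent and unbiased.
    """
    rng = random.Random(seed)
    single_errors = []
    pooled_errors = []
    for _ in range(trials):
        # Each source reports the true value plus its own random error.
        estimates = [true_value + rng.gauss(0, noise_sd) for _ in range(n_sources)]
        single_errors.append(abs(estimates[0] - true_value))
        pooled_errors.append(abs(statistics.fmean(estimates) - true_value))
    # Mean absolute error for a single source and for the average of all sources.
    return statistics.fmean(single_errors), statistics.fmean(pooled_errors)


single, pooled = simulate()
print(f"mean abs error, one source: {single:.3f}")
print(f"mean abs error, pooled:     {pooled:.3f}")
```

Under these assumptions the pooled error shrinks roughly with the square root of the number of sources, which is one way to read the “all of the above” advice.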
Below I consider prominent datasets along four dimensions: sampling, scale, breadth, and measurement. Sampling refers to how diverse and representative the set of users answering a question is. Note that this isn’t essential in all cases (drug research that has life/death implications is almost always done on samples that are extremely limited/biased), and perfect sampling is almost impossible these days, such that even the best political polls rely on mathematical models to “fix” the results. Such modeling requires Scale, which is important in that it helps one find non-linear patterns in data and prevents spurious conclusions from being reached. Related to that is Breadth, as large datasets also tend to answer a larger number of questions. Anyone can spend a great deal of money on the definitive study of a single question, but that won’t help with the many niche questions that exist (e.g. what is the best Barry Manilow song that I can play to woo my date? What new TV show would my daughter like, given that she loves Elmo?). Measurement might be the most important dimension of them all, as one can’t answer questions that one doesn’t ask well.
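One common form of the model-based “fix” for imperfect sampling is post-stratification: reweighting each group's answers to match its known share of the population. The sketch below is an illustration only, not any particular pollster's method; the group labels, sample responses, and population shares are all made up.

```python
def poststratify(responses, population_share):
    """Weight each group's mean answer by that group's share of the population.

    responses: list of (group, answer) pairs, with answer coded 0 or 1.
    population_share: dict mapping group -> share of the population (sums to 1).
    """
    totals = {}
    counts = {}
    for group, answer in responses:
        totals[group] = totals.get(group, 0) + answer
        counts[group] = counts.get(group, 0) + 1
    missing = set(population_share) - set(counts)
    if missing:
        # A group with no respondents cannot be reweighted.
        raise ValueError(f"no responses for groups: {sorted(missing)}")
    return sum(
        share * totals[group] / counts[group]
        for group, share in population_share.items()
    )


# Hypothetical sample that over-represents "young" respondents (80 of 100).
sample = (
    [("young", 1)] * 60 + [("young", 0)] * 20
    + [("old", 1)] * 5 + [("old", 0)] * 15
)
shares = {"young": 0.5, "old": 0.5}  # assumed true population split

raw = sum(answer for _, answer in sample) / len(sample)
weighted = poststratify(sample, shares)
print(f"raw: {raw:.2f}, post-stratified: {weighted:.2f}")
```

In this invented sample the raw approval of 65% drops to 50% once each group counts in proportion to its assumed population share. The correction only works when each group has enough respondents to estimate its mean, which is where scale comes in.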
How do the most prominent datasets in the world fare along these dimensions?
Twitter – Sampling: C, Scale: B+, Breadth: A, Measurement: C
Twitter is great for breadth, which can be thought of not only in terms of the things talked about, which are infinite on Twitter, but also in terms of the range of emotions (e.g. the difference between awesome and cool can potentially be parsed). There is also a lot of scale. Unfortunately, Twitter users are hardly representative, as people who tweet are a specific group. Measurement is very hard on Twitter as well, as there is very little context to a tweet, so one can’t tell if something is really popular or just highly promoted. In addition, natural language will always have ambiguity, especially in 140 characters (e.g. consider how many interpretations there are for a sentence like “we saw her duck”).
Facebook – Sampling: B, Scale: A, Breadth: B, Measurement: D
Facebook is ubiquitous and reaches a far more diverse audience than Twitter. People provide data on all sorts of things about all sorts of topics too. I bought their stock because I think their data is absolutely great, and I still think so. Still, the ambiguity of a “like” (combined with the haphazard and ambiguous nature of relying on status updates) means that there will always be questions (e.g. how hated is something? what do I think of a company’s individual products? is this movie overrated?) that can’t be answered with Facebook.
Behavioral Targeting – Sampling: B-, Scale: A, Breadth: C, Measurement: D
Companies like Doubleclick (owned by Google) and Gravity track your web behavior and attempt to interpret information about you based on what you do online. They can therefore infer relationships between almost anything on the web (e.g. Mad Men interests) based on web pages having common visitors. Yet the use of vague terms like “interest” highlights the fact that these relationships are highly suspect. Anyone who has looked up what these companies think they know about them can clearly see that the error rates are fairly high, which makes sense when you consider the diverse reasons we all have for visiting any website. This type of data has proven utility for improving ad response across large groups, where the law of large numbers means that some benefit will come from using this data. But I wouldn’t want to rely on it to truly understand public opinion.
Market Research – Sampling: B, Scale: D, Breadth: D, Measurement: A
Market research companies like Nielsen and GfK spend a lot of money to ask the right questions of the right people. Measurement is clearly a strength, as market research companies can provide context and nuance to responses as needed, asking about specific opinions about specific items in the context of other similar items. Yet, given that only ~10% of people will answer surveys when called, even the best sampling that money can buy will be imperfect. Moreover, these methods do not scale, given the cost, and can only cover questions that clients will pay for, so there is no way that such methods can power the diverse queries that will go to tomorrow’s answer engines.
Ranker – Sampling: B-, Scale: B-, Breadth: B+, Measurement: A
I work at Ranker largely because I believe in the power of our platform to uniquely answer questions, even if we don’t have the scale of larger sites like Twitter and Facebook… yet (we are growing and are among the top 200 websites now, per Quantcast). Our sample is imperfect, as are all samples, including those of pollsters like Gallup, but it is generally representative of the internet given that we get lots of traffic from search, so we can model our audience in the same ways that companies like YouGov and Google Consumer Surveys do. The strength of our platform is in our ability to answer a broad range of specific questions explicitly and with the context of alternative choices using the list format. Users can specifically say whether they agree (or disagree) that Breaking Bad is great for binge watching, that Kanye West is a douchebag, that being smart is an important life goal, or that intelligence is a turn-on, while also considering other options that they may not have considered.
In summary, no dataset gets “A”s across the board, and if I were running a company like Procter & Gamble and needed to understand public opinion, I would use all of these methods and triangulate among them, as there is something to be uniquely learned from each. That being said, I agree with Nate Silver’s suggestion to Put Data Fidelity First, and am excited that Ranker continues to collect precise, explicit answers to increasingly diverse questions (e.g. the best Doritos flavors). We are the only company that combines the precision of market research with the scale of internet polling methods, and so I’m hopeful that, as our traffic continues to grow, the value of our opinion data will grow with it.
- Ravi Iyer
ps. I welcome disagreement and thoughtful discussion as I’m certain I have something to learn from others here and that there are things I could be missing.