Harnessing Web Information Sources to Predict Events

Search engines and social media are two ubiquitous modes of accessing Web information. They dictate what information people view, influencing their thoughts and beliefs and potentially shaping their opinions about news and facts. Network effects propagate these beliefs, often magnifying them, and disseminate information so rapidly that little time is left for fact-checking. This gives rise to a problem: it is not only information that spreads, but also misinformation. The far-reaching impacts include misinformed prognostications that can lead to further ill effects. Considerable attention has been given to the “cleanup” of the Web, with the common purpose of holding statements made online accountable. However, the size and growth of the Web make it challenging to characterize Web information or to separate facts from lies, so people's thoughts and actions can be devoid of truth. In this dissertation, we address the problem with methods based on our thesis that Web information sources can be harnessed to synthesize accurate predictions of events in an attempt to arrive at the truth. Instead of validating the provenance of every piece of information, a process that can be recursive and quickly become intractable, we adopt an approach that embraces noise in information and relies on the wisdom of crowds to derive accurate predictions from data. Toward our goal, we first characterize a particular bias of Web search engine results: the degree to which differences across engines' rankings correlate with features of the ranked content, including point of view and advertisements. We develop PAWS (Platform for Analyzing Web Search engines) to study Google and Bing, and we find no evidence that the engines emphasize results expressing positive orientation toward the engine company’s products. We do find that they emphasize particular news sites and that they favor pages containing their own company’s advertisements over those containing competitors’.
Next, we use sports predictions from Twitter crowds to study methods for predicting game outcomes. We show that the wisdom of crowds and machine learning can lead to accurate predictions for certain games, and that features pertaining to the crowds can be leveraged for prediction. We test similar approaches on Earnings Per Share and Revenue predictions from the financial prediction platform Estimize, and show that our methods have potential applicability across domains for deriving the truth through predictions.
