Data Leverage: A Framework for Empowering the Public to Mitigate Harms of Artificial Intelligence


Many computing technologies are useful primarily because of data created by people, intentionally in some cases and unintentionally in others. For instance, search engines, recommender systems, classifiers, and language models all depend on digital records of things people have said, done, typed, clicked, and experienced. In practice, creating this data involves the participation of, or surveillance of, members of the public. One way to view this relationship is that computing technologies rely on data labor, and that people who perform data labor have a potential source of leverage, which we might call data leverage, over the operators of computing technologies, i.e., technology companies and other large organizations. Identifying new ways to empower the public is important in light of growing concerns that advances in computing, especially in artificial intelligence, machine learning, and statistics, will contribute to inequality in power and wealth and create negative impacts along other dimensions. In this thesis, we describe work that has sought to understand and support data leverage along several fronts: by measuring the value of data to existing systems and platforms, by estimating the potential impact of data leverage actions, and by developing frameworks and tools that support data leverage.
