Including protected variables can make algorithmic decision-making more fair
A recent paper co-authored by JON KLEINBERG, JENS LUDWIG, SENDHIL MULLAINATHAN, and ASHESH RAMBACHAN addresses algorithmic bias, countering the "large literature that tries to 'blind' the algorithm to race to avoid exacerbating existing unfairness in society":
"This perspective about how to promote algorithmic fairness, while intuitive, is misleading and in fact may do more harm than good. We develop a simple conceptual framework that models how a social planner who cares about equity should form predictions from data that may have potential racial biases. Our primary result is exceedingly simple, yet often overlooked: a preference for fairness should not change the choice of estimator. Equity preferences can change how the estimated prediction function is used (such as setting a different threshold for different groups) but the estimated prediction function itself should not change. Absent legal constraints, one should include variables such as gender and race for fairness reasons.
Our argument collects together and builds on existing insights to contribute to how we should think about algorithmic fairness.… We empirically illustrate this point for the case of using predictions of college success to make admissions decisions. Using nationally representative data on college students, we underline how the inclusion of a protected variable—race in our application—not only improves predicted GPAs of admitted students (efficiency), but also can improve outcomes such as the fraction of admitted students who are black (equity).
Across a wide range of estimation approaches, objective functions, and definitions of fairness, the strategy of blinding the algorithm to race inadvertently detracts from fairness."
Read the full paper here.
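The distinction the authors draw, between how a prediction function is estimated and how it is then used, can be made concrete with a small sketch. Everything below is an illustrative assumption rather than the paper's data or method: the applicants are synthetic, the outcome model is plain least squares, and the names `make_synthetic_applicants`, `fit_ols`, and `admit` are hypothetical. The idea is to fit one predictor without the group variable and one with it, then apply the equity preference only at the admission step, by reserving a minimum number of places for the protected group (equivalent to a group-specific threshold).

```python
import numpy as np

def make_synthetic_applicants(n=2000, seed=0):
    """Hypothetical applicants; not the nationally representative data used in the paper."""
    rng = np.random.default_rng(seed)
    group = rng.integers(0, 2, size=n)        # 1 = protected group (illustrative label)
    ability = rng.normal(0.0, 1.0, size=n)
    # Assumption for illustration: the observed score understates ability for group 1.
    score = ability - 0.5 * group + rng.normal(0.0, 0.5, size=n)
    outcome = 3.0 + 0.4 * ability + rng.normal(0.0, 0.3, size=n)   # e.g. a later GPA
    return score, group, outcome

def fit_ols(features, outcome):
    """Least-squares fit with an intercept; returns in-sample predictions."""
    X = np.column_stack([np.ones(len(outcome))] + list(features))
    beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    return X @ beta

def admit(predicted, group, capacity, min_protected):
    """Admit `capacity` applicants, at least `min_protected` of them from group 1.

    The predictions are left untouched; the equity preference only changes
    how they are used, which amounts to a lower cutoff for group 1 if needed.
    """
    order = np.argsort(-predicted)            # best predicted outcome first
    protected = [i for i in order if group[i] == 1][:min_protected]
    taken = set(protected)
    rest = [i for i in order if i not in taken][: capacity - len(protected)]
    return np.array(protected + rest)

score, group, outcome = make_synthetic_applicants()
pred_blind = fit_ols([score], outcome)                        # group variable withheld
pred_aware = fit_ols([score, group, score * group], outcome)  # group variable included

for name, pred in [("blind", pred_blind), ("aware", pred_aware)]:
    chosen = admit(pred, group, capacity=400, min_protected=160)
    print(name,
          "mean outcome of admitted:", round(float(outcome[chosen].mean()), 3),
          "share from group 1:", round(float(group[chosen].mean()), 3))
```

Because the race-aware regression nests the blind one, its in-sample fit can never be worse. Whether it also improves the outcomes of the admitted class depends on the data, which is the question the paper takes up empirically.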
- Another paper by Kleinberg, Lakkaraju, Leskovec, Ludwig, and Mullainathan from 2017 compares algorithmic decisions to human decisions regarding bail, and finds that they are often at odds. "The goal of our paper is not to identify new ways of optimizing the machine learning algorithm. Instead our goal is to understand whether these machine learning predictions can help us understand and improve judges’ decisions." Link.
- A recidivism risk assessment tool that was slated to be introduced in Pennsylvania this summer has been delayed after pushback from community and legal groups. Link to an excellent post by Suresh Venkatasubramanian, who was commissioned by the ACLU to review the tool. And link to news coverage of the public hearings surrounding the tool.
- Tangentially related to automated decision systems, but highly relevant for the broader digital ethics conversation, Friday's Supreme Court ruling in Carpenter v. United States held that accessing cell-site location records from a telecom provider was "a search within the meaning of the Fourth Amendment, for which the government would generally have to obtain a warrant." Link to discussion of the decision on SCOTUS Blog.
Administrative data and social science research
In a 2011 NATIONAL SCIENCE FOUNDATION white paper, DAVID CARD, RAJ CHETTY, MARTIN FELDSTEIN, and EMMANUEL SAEZ discuss the value of administrative data for social science research:
"Governments create comprehensive micro-economic files to aid in the administration of their tax and benefit programs.… A rich archive of information covering most aspects of socio-economic behavior from birth to death, including education, earnings, income, workplace and living place, family composition, health and retirement, is recorded in administrative data."
The authors note a dearth of access to United States administrative data when compared to several European countries, and argue for a reinvigorated access infrastructure to foster forward-thinking policy-minded work in the social sciences.
"We emphasize that direct access to micro-data is critical for success. Based on experiences from other countries and pilot initiatives, we believe that five conditions must be satisfied to make a data access program sustainable and efficient:
(a) fair and open competition for data access based on scientific merit
(b) sufficient bandwidth to accommodate a large number of projects simultaneously
(c) inclusion of younger scholars and students in the research teams that can access the data
(d) direct access to de-identified micro data through local statistical offices or remote connections
(e) systematic electronic monitoring to allow immediate disclosure of statistical results."
Link to the paper.
- An extremely detailed and enlightening 2017 chapter on the use and potential of government administrative data for federal statistics in the United States, from a NAS panel publication on innovations in federal statistics. Link.
- In a blog post from September 2017, researcher Chantale Tippett discusses the access problem, highlighting a handful of systematic difficulties and offering some examples of ad hoc access strategies that have led to breakthrough research (including Chetty's). "Data access considerations are not solely technical in nature—that is, that they can be created and broken down by power dynamics and relationships. The need for political clout and/or connections, as well as long time frames to access data, are frequently ill-suited to individual project timelines or budgets." Link. (The post was written in the context of a report for the European Commission on data mining and policy making, available here.)
- Card et al. cite Denmark as a country with a well-developed, centralized administrative data bank for research purposes. Link to Statistics Denmark's website. And link to a presentation paper on the collection, linkage, and use of administrative data in Nordic National Statistical Offices.
- For context, a recent essay in the Economist on rapidly declining response rates to household surveys across rich countries. Link. We were tipped off to this essay by historian Adam Tooze, who tweeted: "The rise of administrative data as an alternative to surveys and census is a recent trend, but not without historical precedent," and cited his own work on the politics of admin data in Germany from the Kaiserreich to the Nazis. Link to that book.
- Martin Wolf in the Financial Times with a piece on the productivity slowdown. "When I look at the weighty presence in the modern economy of labor-intensive service sectors, such as health, education and care of children and the elderly, I conclude that the technological transformation will be slow. If I am wrong, it will be disruptive. At the moment, however, we have the worst of both worlds: significant disruption but near stagnation in average incomes." Link.
- Related to the above, a 2017 VoxEU post by Nicholas Bloom, Chad Jones, John Van Reenen, and Michael Webb looks at the costs of maintaining innovation. "We show that the costs of extracting ideas have increased sharply over time.... In other words, the innovation bang for the R&D buck (or 'research productivity') has declined... low productivity growth in the economy is a direct consequence of research effort failing to increase fast enough to offset declining research productivity." Link. For more on R&D trends, see this recent newsletter.
- Italian tobacco monopolists in the 19th century. Link.
- A crucial new article in Stanford Law Review by Rebecca Wexler examines the introduction of intellectual property claims in the criminal justice system and "offers the first wide-ranging account of trade secret evidence in criminal cases." Link.
- How economic history has infiltrated economics. By Claude Diebolt and Michael Haupert. Link.
- In These Times magazine has published a set of columns taking the still-very-active Job Guarantee Debates off of Twitter threads and into paragraph form. Rohan Grey and Raúl Carrillo of the National Jobs for All Coalition write in favor, and Matt Bruenig writes a rebuttal. Link.
- New NBER paper co-authored by Avner Greif on the Industrial Revolution and the Great Divergence. "A market-size-only theory of industrialization cannot explain why England developed nearly two centuries before China.... Once we incorporate the incentives of factor suppliers' organizations such as craft guilds, industrialization no longer depends on market size, but on spatial competition between the guilds' jurisdictions." Link.
- An excellent thread from Dave Guarino, director of Code for America's GetCalFresh food stamps enrollment initiative, on complexity in public policy, with a focus on SNAP. "Public programs are changed over time: new laws are passed, new rules promulgated, new edge cases and problems found that require changes. But almost all these changes are ADDITIVE: making the program more complex to accommodate an outlier." Link.
- A paper co-authored by CEPR fellow Klaus Desmet on language diversity and public goods provision. The paper "predicts that a country’s provision of public goods (i) decreases in its overall linguistic fractionalization, and (ii) either increases or decreases in how much individuals locally learn about other groups." Link.
- Historical exposure to violent colonial health campaigns conducted by the French military in Cameroon and former French Equatorial Africa predicts reduced trust, demand, and uptake for healthcare. Link.
- Examining the short- and long-run effects of resources on economic outcomes between states. Link.
- At LPE Blog, Frank Pasquale with a piece on data nationalization and China's Social Credit System, and "reputational economies" more broadly. Link to the post. Link to an ACLU post summarizing some elements of China's SCS.