↳ Data

July 21st, 2018

↳ Data

High Noon


History of risk assessment, and some proposed alternate methods 

A 2002 paper by ERIC SILVER and LISA L. MILLER on actuarial risk assessment tools provides a history of statistical prediction in the criminal justice context, and issues cautions now central to the contemporary algorithmic fairness conversations:  

"Much as automobile insurance policies determine risk levels based on the shared characteristics of drivers of similar age, sex, and driving history, actuarial risk assessment tools for predicting violence or recidivism use aggregate data to estimate the likelihood that certain strata of the population will commit a violent or criminal act. 

To the extent that actuarial risk assessment helps reduce violence and recidivism, it does so not by altering offenders and the environments that produced them but by separating them from the perceived law-abiding populations. Actuarial risk assessment facilitates the development of policies that intervene in the lives of citizens with little or no narrative of purpose beyond incapacitation. The adoption of risk assessment tools may signal the abandonment of a centuries-long project of using rationality, science, and the state to improve upon the social and economic progress of individuals and society."

Link popup: yes to the paper.

A more recent paper presented at FAT* in 2018 and co-authored by CHELSEA BARABAS, KARTHIK DINAKAR, JOICHI ITO, MADARS VIRZA, and JONATHAN ZITTRAIN makes several arguments reminiscent of Silver and Miller's work. They argue in favor of causal inference framework for risk assessments aimed at working on the question "what interventions work":

"We argue that a core ethical debate surrounding the use of regression in risk assessments is not simply one of bias or accuracy. Rather, it's one of purpose.… Data-driven tools provide an immense opportunity for us to pursue goals of fair punishment and future crime prevention. But this requires us to move away from merely tacking on intervenable variables to risk covariates for predictive models, and towards the use of empirically-grounded tools to help understand and respond to the underlying drivers of crime, both individually and systemically."

Link popup: yes to the paper. 

  • In his 2007 book Against Prediction popup: yes, lawyer and theorist Bernard Harcourt provided detailed accounts and critiques of the use of actuarial methods throughout the criminal legal system. In place of prediction, Harcourt proposes a conceptual and practical alternative: randomization. From a 2005 paper on the same topic: "Instead of embracing the actuarial turn in criminal law, we should rather celebrate the virtues of the random: randomization, it turns out, is the only way to achieve a carceral population that reflects the offending population. As a form of random sampling, randomization in policing has significant positive value: it reinforces the central moral intuition in the criminal law that similarly situated individuals should have the same likelihood of being apprehended if they offend—regardless of race, ethnicity, gender or class." Link popup: yes to the paper. (And link popup: yes to another paper of Harcourt's in the Federal Sentencing Reporter, "Risk as a Proxy for Race.") 
  • A recent paper by Megan Stevenson assesses risk assessment tools: "Despite extensive and heated rhetoric, there is virtually no evidence on how use of this 'evidence-based' tool affects key outcomes such as incarceration rates, crime, or racial disparities. The research discussing what 'should' happen as a result of risk assessment is hypothetical and largely ignores the complexities of implementation. This Article is one of the first studies to document the impacts of risk assessment in practice." Link popup: yes
  • A compelling piece of esoterica cited in Harcourt's book: a doctoral thesis by Deborah Rachel Coen on the "probabilistic turn" in 19th century imperial Austria. Link popup: yes.
⤷ Full Article

July 14th, 2018

Traveling Light


Considerations on data sharing and data markets 

CHARLES I. JONES and CHRISTOPHER TONETTI contribute to the “new but rapidly-growing field” known as the economics of data:

“We are particularly interested in how different property rights for data determine its use in the economy, and thus affect output, privacy, and consumer welfare. The starting point for our analysis is the observation that data is nonrival. That is, at a technological level, data is not depleted through use. Most goods in economics are rival: if a person consumes a kilogram of rice or an hour of an accountant’s time, some resource with a positive opportunity cost is used up. In contrast, existing data can be used by any number of firms or people simultaneously, without being diminished. Consider a collection of a million labeled images, the human genome, the U.S. Census, or the data generated by 10,000 cars driving 10,000 miles. Any number of firms, people, or machine learning algorithms can use this data simultaneously without reducing the amount of data available to anyone else. The key finding in our paper is that policies related to data have important economic consequences.”

After modeling a few different data-ownership possibilities, the authors conclude, “Our analysis suggests that giving the data property rights to consumers can lead to allocations that are close to optimal.” Link to the paper popup: yes.

  • Jones and Tonetti cite an influential 2015 paper by Alessandro Acquisti, Curtis R. Taylor, and Liad Wagman on “The Economics of Privacy”: “In digital economies, consumers' ability to make informed decisions about their privacy is severely hindered, because consumers are often in a position of imperfect or asymmetric information regarding when their data is collected, for what purposes, and with what consequences.” Link popup: yes.
  • For more on data populi, Ben Tarnoff has a general-interest overview in Logic Magazine, including mention of the data dividend and a comparison to the Alaska Permanent Fund. Tarnoff uses the oil industry as an analogy throughout: “In the oil industry, companies often sign ‘production sharing agreements’ (PSAs) with governments. The government hires the company as a contractor to explore, develop, and produce the oil, but retains ownership of the oil itself. The company bears the cost and risk of the venture, and in exchange receives a portion of the revenue. The rest goes to the government. Production sharing agreements are particularly useful for governments that don’t have the machinery or expertise to exploit a resource themselves.” Link popup: yes.
⤷ Full Article

June 23rd, 2018

Yielding Stone


Including protected variables can make algorithmic decision-making more fair 

A recent paper co-authored by JON KLEINBERG, JENS LUDWIG, SENDHIL MULLAINATHAN, and ASHESH RAMBACHAN addresses algorithmic bias, countering the "large literature that tries to 'blind' the algorithm to race to avoid exacerbating existing unfairness in society":  

"This perspective about how to promote algorithmic fairness, while intuitive, is misleading and in fact may do more harm than good. We develop a simple conceptual framework that models how a social planner who cares about equity should form predictions from data that may have potential racial biases. Our primary result is exceedingly simple, yet often overlooked: a preference for fairness should not change the choice of estimator. Equity preferences can change how the estimated prediction function is used (such as setting a different threshold for different groups) but the estimated prediction function itself should not change. Absent legal constraints, one should include variables such as gender and race for fairness reasons.

Our argument collects together and builds on existing insights to contribute to how we should think about algorithmic fairness.… We empirically illustrate this point for the case of using predictions of college success to make admissions decisions. Using nationally representative data on college students, we underline how the inclusion of a protected variable—race in our application—not only improves predicted GPAs of admitted students (efficiency), but also can improve outcomes such as the fraction of admitted students who are black (equity).

Across a wide range of estimation approaches, objective functions, and definitions of fairness, the strategy of blinding the algorithm to race inadvertently detracts from fairness."

Read the full paper here popup: yes.

⤷ Full Article

May 12th, 2018

Lay of the Land


A new paper on the labor effects of cash transfers

SARAH BAIRD, DAVID MCKENZIE, and BERK OZLER of the WORLD BANK review a variety of cash transfer studies, both governmental and non-governmental, in low- and middle-income countries. Cash transfers aren’t shown to have the negative effects on work that some fear:

"The basic economic model of labor supply has a very clear prediction of what should be expected when an adult receives an unexpected cash windfall: they should work less and earn less. This intuition underlies concerns that many types of cash transfers, ranging from government benefits to migrant remittances, will undermine work ethics and make recipients lazy.

Overall, cash transfers that are made without an explicit employment focus (such as conditional and unconditional cash transfers and remittances) tend to result in little to no change in adult labor. The main exceptions are transfers to the elderly and some refugees, who reduce work. In contrast, transfers made for job search assistance or business start-up tend to increase adult labor supply and earnings, with the likely main channels being the alleviation of liquidity and risk constraints."

Link popup: yes to the working paper. Table 2—which covers the channels through which cash impacts labor, is especially worth a read—as many studies on cash transfers don’t go into this level of detail.

  • A study on a large-scale unconditional cash transfer in Iran: "With the exception of youth, who have weak ties to the labor market, we find no evidence that cash transfers reduced labor supply, while service sector workers appear to have increased their hours of work, perhaps because some used transfers to expand their business." Link popup: yes.
  • Continuing the analysis of Hauschofer and Schapiro’s controversial results popup: yes from a cash study transfer in Kenya, Josh Rosenberg at GiveDirectly has, at the end of his overview, some thoughtful questions for continuing research: "Is our cost-effectiveness model using a reasonable framework for estimating recipients’ standard of living over time?… GiveDirectly provides large, one-time transfers whereas many government cash transfers provide smaller ongoing support to poor families. How should we apply new literature on other kinds of cash programs to our estimates of the effects of GiveDirectly?" Link popup: yes.
⤷ Full Article

April 21st, 2018



"Digital goods have created large gains in well-being that are missed by conventional measures of GDP and productivity"

A new paper by ERIK BRYNJOLFSSON et al. suggests using massive online choice experiments as a method to find the true impact of digital goods on well-being. The background section gives an example of the impact that is currently unmeasured:

“... [in] a number of sectors ... physical goods and services are being substituted with digital goods and services. An apropos example of such a transition good is an encyclopedia. Since the 2000s, people have increasingly flocked to Wikipedia to get information about a wide variety of topics updated in real time by volunteers. In 2012, Encyclopedia Britannica, which had been one of the most popular encyclopedias, ceased printing books after 244 years (Pepitone 2012). Wikipedia has over 60 times as many articles as Britannica had, and its accuracy has been found to be on par with Britannica (Giles 2005). Far more people use Wikipedia than ever used Britannica—demand and well-being have presumably increased substantially. But while the revenues from Britannica sales were counted in GDP statistics, Wikipedia has virtually no revenues and therefore doesn’t contribute anything to GDP other than a few minimal costs for running servers and related activities and some voluntary contributions to cover these costs…For such transition goods, consumer surplus increases as free access spurs demand, but revenue decreases as prices become zero. Hence GDP and consumer welfare actually move in opposite directions.”

One finding of note: “50% of the Facebook users in our sample would give up all access to Facebook for one month if we paid them about $50 or more.” Link to paper on NBER here popup: yes. A free draft is available here popup: yes.

⤷ Full Article