Sorry, there is no answer: Musings on data reference

Written by: Hailey Mooney

Primary Source: Digital Scholarship Collaborative Sandbox

These days, librarians receive fewer “ready reference” questions, meaning those straightforward questions that are easily answered by consulting a reference book (or the electronic equivalent). The statistics questions that come my way are usually the ones where Google is not going to readily provide an authoritative answer. We tend to think that there must be data for everything now. The reality is that data are not always available: they may never have been collected or published, they might be too expensive, or they might be in an unwieldy format. There is no Census survey that reports on the number of mentally ill people driving cars in Ingham County.

As the resident specialist for numeric data and statistics, I find that some FAQs have become routine, namely those about Census data. American FactFinder is not an intuitive database, and the intricacies of the American Community Survey 1-, 3-, and 5-year samples versus the Decennial Census are not obvious to new users. Census classifications and terminology can raise the difficulty level further still. Race and ethnicity are particularly difficult. For starters, the way the race question has been asked has changed over time, making historical comparisons tricky (I’ve included a favorite table of mine below, showing this). In the current iteration, people can choose any number of affiliations, giving data users options for a single race alone or in combination. Whether something is considered to be “race and ethnicity” or “ancestry” is another matter. A student wanted to create a table showing population demographics for Arabs, Chinese, and Nepalese. It turned out that Chinese and Nepalese are subdivisions of (Asian) Race, whereas Arab is classified under Ancestry.

Table 1. Racial Categories: The measurement of race in the Census

Year Categories
1790 White, Black (slave). Indians not taxed (not counted)
1850 Color added as a classification: White, Black (free and slave), Mulatto (free and slave). Indians not taxed (not counted)
1870 White, Black, Mulatto, Chinese, Indian
1890 White, Black, Mulatto, Quadroon, Octoroon, Chinese, Japanese, Indian
1910 White, Black, Mulatto, Chinese, Japanese, Indian, Other
1930 White, Black, Mexican, Indian, Chinese, Japanese, Filipino, Hindu, Korean, Other (written out in full)
1960 Switch to Self-enumeration. White, Black, American Indian, Japanese, Chinese, Filipino, Hawaiian, Aleut, Eskimo, Other.
1970 White, Black, American Indian, Japanese, Chinese, Filipino, Hawaiian, Korean, Other. Hispanic Origin.
1977 OMB Statistical Directive Policy. Revisions to the Standards for the Classification of Federal Data on Race and Ethnicity. Five-category scheme for race: (1) American Indian or Alaska Native, (2) Asian, (3) Black or African American, (4) Native Hawaiian or Other Pacific Islander, (5) White. Ethnicity: Hispanic or Latino, Not Hispanic or Latino.
2000 Ability to select multiple races.
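
To make the “alone” versus “alone or in combination” distinction concrete, here is a minimal sketch in Python. The five responses are invented for illustration and the `tally` helper is hypothetical; this is not how the Census Bureau tabulates its data, only a toy model of the two ways of counting.

```python
from collections import Counter

# Hypothetical responses: each person may select one or more race categories.
responses = [
    {"Asian"},
    {"White"},
    {"Asian", "White"},
    {"Black or African American"},
    {"Asian", "Black or African American", "White"},
]

def tally(responses):
    """Return (alone, alone_or_in_combination) counts per category."""
    alone = Counter()
    in_combination = Counter()
    for selected in responses:
        if len(selected) == 1:
            # Only single-category responses count toward "alone".
            alone.update(selected)
        # Every selected category counts toward "alone or in combination".
        in_combination.update(selected)
    return alone, in_combination

alone, combo = tally(responses)
for category in sorted(combo):
    print(f"{category}: alone={alone[category]}, "
          f"alone or in combination={combo[category]}")
```

In this toy data the “alone or in combination” counts sum to 8 even though there are only 5 people, which is why those figures cannot simply be added together or treated as shares of the total population.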

Providing an accurate quantitative measurement can be difficult for many topics. Take, for example, historical lynching statistics. What’s the chance that all lynchings were actually reported? Incidentally, this is a case where a reference book (Historical Statistics of the United States) does much better than Google. Crime statistics are certainly problematic:

“The statistics of crime are known as the most unreliable and difficult of all statistics. First, the laws which define crimes change. Second, the number of crimes actually committed cannot possibly be enumerated. This is true of many of the major crimes and even more true of the minor crimes. Third, any record of crimes, such as arrests, convictions, or commitments to prison, can be used as an index of crimes committed only on the assumption that this index maintains a constant ratio of crimes committed. The assumption is a large one, for the recorded crimes are affected by police policies, court policies, and public opinion.”
(Source: Sutherland, E. (1947). Principles of criminology (4th ed.). Chicago: J.B. Lippincott. As cited in Mosher, C. J., Miethe, T. D., & Phillips, D. M. (2002). The mismeasure of crime. Thousand Oaks, CA: Sage Publications.)
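
Sutherland’s third point, the “constant ratio” assumption, can be illustrated with a little arithmetic. The sketch below uses made-up numbers (the `committed` and `recorded_per_100` figures are assumptions, not real crime data): the number of crimes committed stays flat while the share that gets recorded rises.

```python
# Hypothetical figures: crimes actually committed stay flat over three
# periods, while the number recorded per 100 committed rises.
committed = [1000, 1000, 1000]
recorded_per_100 = [30, 40, 50]

# Recorded crimes are what an index such as arrests would show.
recorded = [c * r // 100 for c, r in zip(committed, recorded_per_100)]

def pct_change(series):
    """Percent change from the first to the last value in the series."""
    return (series[-1] - series[0]) / series[0] * 100

print("recorded:", recorded)
print("change in crimes committed: %.1f%%" % pct_change(committed))
print("change in crimes recorded:  %.1f%%" % pct_change(recorded))
```

With these invented numbers the recorded series climbs by about two thirds while the underlying count does not move at all, which is the kind of distortion that shifting police or court policies could introduce.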

Statistics on religion are also tricky:

“To represent the religious history of America statistically and geographically is to generalize dangerously, to court disaster openly.”
(Source: Gaustad, E. S., Barlow, P. L., & Dishno, R. W. (2001). New historical atlas of religion in America. New York: Oxford University Press.)

One of my favorite historical Census publications is the Census of Religious Bodies (conducted 1906-1936). It was discontinued because, you know, separation of church and state. Public Law 94-521 prohibits the Census Bureau from asking mandatory questions about religious affiliation.

Interestingly, another case where the US government refrains from supplying a definitive answer is our national literacy rate. You would think that asking for the US literacy rate is a straightforward question with a straightforward answer, but you would be wrong. The United Nations Educational, Scientific and Cultural Organization (UNESCO) aggregates literacy rates for the countries of the world. The US is conspicuously absent from their tables. We don’t report. The National Center for Education Statistics (NCES) has conducted national literacy surveys, most recently the 2003 National Assessment of Adult Literacy, but they aren’t about to trumpet a simple overall literacy rate statistic (see NCES Fast Facts). Instead, they offer a nuanced description of measures for prose, document, and quantitative literacy. Back in 1992, though, NCES did report adult illiteracy rates from 1870 to 1979, drawn from early Censuses and the Current Population Survey (see Table 6).

The best questions are always the ones where there is no simple answer!

Hailey Mooney

Hailey Mooney is Data Services Coordinator and Social Sciences Librarian at the Michigan State University Libraries. She is the liaison to programs in Human Development & Family Studies, Social Work, and Sociology. Hailey works to develop and provide data management services at the MSU Libraries, and is also the social sciences data librarian. Her current research interests are in the area of data information literacy and changing scholarly communication norms. She has also published on data citation behavior. Hailey has a BA in Sociology from the University of Michigan and an MLIS from Wayne State University.