How we built the Henry A. Wallace Police Crime Database

The Database Website

The purpose of the Henry A. Wallace Police Crime Database is to provide information on police crime (i.e., crime committed by sworn law enforcement officers with the general powers of arrest) that is not otherwise publicly available. No government agency in the United States collects, aggregates, or disseminates data on police crime. The database will be updated and additional cases added periodically, subject to available funding.


Data for the Henry A. Wallace Police Crime Database were collected as part of a larger project, begun in 2004, that was designed to locate cases in which sworn law enforcement officers had been arrested for any type of criminal offense(s). Data were derived from published news articles using the Google News search engine and its Google Alerts email update service. Google Alerts searches were conducted using 48 search terms developed by Stinson (2009). The Google Alerts email update service sent a message each time one of the automated daily searches identified a news article in the Google News search engine that matched any of the designated search terms. Each automated alert contained a link to the URL of the matching news article. The articles were located, examined for relevancy, printed, logged, and then scanned, indexed, and archived in a digital imaging database for subsequent coding and content analyses. Data sources are triangulated to ensure validity and reliability.

Coding and Content Analysis

Content analyses were conducted to code the cases in terms of (a) arrested officer, (b) employing nonfederal law enforcement agency, (c) each of the charged criminal offenses, (d) victim characteristics, (e) organizational adverse employment outcomes, and (f) criminal case dispositions. Each of the charged criminal offenses was coded using the data collection guidelines of the National Incident-Based Reporting System (NIBRS) as the coding protocol for each criminal offense category (see U.S. Department of Justice, 2000). Fifty-seven criminal offenses are included in the NIBRS, consisting of 46 incident-based criminal offenses in one of 22 crime categories as well as 11 additional arrest-based minor criminal offense categories. For each police crime arrest case, every offense charged was recorded on the coding instrument, as was the most serious offense charged. The most serious offense charged was determined using the Uniform Crime Report’s (UCR) crime seriousness hierarchy (see U.S. Department of Justice, 2004). An additional eight offenses were added following an earlier pilot study (see Stinson, 2009) because police officers who were arrested often were charged with criminal offenses not included in the NIBRS (e.g., online solicitation of a child, indecent exposure, official misconduct / official oppression / violation of oath, vehicular hit-and-run, perjury / false reports / false statements, criminal deprivation of civil rights).
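
The selection of the most serious offense charged can be illustrated with a short sketch. It is a minimal example under stated assumptions, not the project's coding instrument: the ranking table covers only seven UCR Part I offense labels, the rule that unranked offenses sort last is an assumption made here for illustration, and the names SERIOUSNESS_RANK and most_serious_offense are hypothetical.

```python
# Illustrative ranking only (lower number = more serious). The labels follow
# the order of UCR Part I offenses; how the project ranks the remaining
# offenses is not described in the text, so unranked offenses sort last here
# purely by assumption.
SERIOUSNESS_RANK = {
    "criminal homicide": 1,
    "forcible rape": 2,
    "robbery": 3,
    "aggravated assault": 4,
    "burglary": 5,
    "larceny-theft": 6,
    "motor vehicle theft": 7,
}


def most_serious_offense(charges):
    """Return the charged offense with the lowest (most serious) rank."""
    if not charges:
        raise ValueError("an arrest case needs at least one charged offense")
    unranked = len(SERIOUSNESS_RANK) + 1
    # min() keeps the first offense listed when two offenses share a rank.
    return min(charges, key=lambda offense: SERIOUSNESS_RANK.get(offense, unranked))


# Every charge is kept for the case; one of them is flagged as most serious.
case_charges = ["burglary", "aggravated assault", "official misconduct"]
print(most_serious_offense(case_charges))
```

In this sketch the full list of charges stays attached to the case, mirroring the practice of recording every offense charged alongside the single most serious one.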

The primary unit of analysis in this study is the criminal arrest case. One of the primary issues in coding was differentiating between arrest cases with multiple victims and officers who were arrested on multiple occasions within the study years 2005-2018. The remainder of this paragraph presents hypothetical situations to demonstrate the unit of analysis in this study. Assume, for example, that an officer was arrested for assaulting his wife. That is coded as one arrest case (arrest case #1). If the same officer was again arrested a week later for violating an order of protection (arrest case #2) that was issued by a court judge following the officer’s first arrest, the second arrest was treated as a separate case in this study. If that same officer was arrested a few months later for drunk driving, that too was recorded as a new arrest case (arrest case #3). The officer was suspended from his employment immediately following his arrest for DUI. For the purposes of this hypothetical, assume that the same officer was acquitted at trial in all three of those arrest cases (that is, arrest case #1, arrest case #2, and arrest case #3) and returned to duty as a police officer. Two years later, assume that the same officer was arrested for sex crimes involving a 14-year-old victim (arrest case #4) and a 15-year-old victim (arrest case #5). Further assume that the officer was convicted in the case involving the 14-year-old victim (arrest case #4), and the charges were dismissed by the prosecutor in the criminal case involving the 15-year-old victim (arrest case #5). Following the officer’s conviction (in arrest case #4), the officer was fired from the police department. By coding each arrest case separately, the criminal case dispositions in each case as well as the adverse employment actions attached to each arrest case can be documented for analysis.
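
The hypothetical above can be expressed as a small data structure in which each row is one arrest case rather than one officer. This is a sketch for illustration only: the ArrestCase class, its field names, and the identifier "officer-A" are hypothetical and do not reflect the project's actual database schema.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ArrestCase:
    """One row per arrest case; an officer may appear in many rows."""
    case_no: int
    officer_id: str                      # hypothetical identifier
    offense: str
    disposition: str
    employment_action: Optional[str] = None


# The five arrest cases from the hypothetical, all for the same officer.
cases = [
    ArrestCase(1, "officer-A", "assault of spouse", "acquitted"),
    ArrestCase(2, "officer-A", "violation of order of protection", "acquitted"),
    ArrestCase(3, "officer-A", "drunk driving", "acquitted", "suspended"),
    ArrestCase(4, "officer-A", "sex crime, 14-year-old victim", "convicted", "fired"),
    ArrestCase(5, "officer-A", "sex crime, 15-year-old victim", "dismissed"),
]

# Counting by case and by officer gives different totals, which is the point
# of treating the arrest case as the unit of analysis.
cases_per_officer = Counter(case.officer_id for case in cases)
dispositions = Counter(case.disposition for case in cases)

print(len(cases), "arrest cases for", len(cases_per_officer), "officer")
print(dict(dispositions))
```

Because dispositions and employment actions are stored on the case rather than on the officer, the acquittals, the conviction, and the dismissal can each be analyzed separately.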

Cases were also coded on Stinson’s (2009) typology of police crime, which posits that most crime committed by police officers is alcohol-related, drug-related, sex-related, violence-related, and/or profit-motivated. The types of police crime are not mutually exclusive categories. Rather, each type of police crime is coded as a dichotomous variable because crimes committed by officers often involve more than one type of police crime. In a case where an officer was arrested and charged with the forcible rape of a female motorist during a traffic stop, for example, the case would be coded in this study as both sex-related and violence-related.
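
Coding the typology as a set of dichotomous variables, rather than as a single category, might look like the following sketch. The variable names and the code_typology helper are hypothetical; only the five types themselves come from the typology described above.

```python
# The five types of police crime, each stored as its own 0/1 variable.
POLICE_CRIME_TYPES = (
    "alcohol_related",
    "drug_related",
    "sex_related",
    "violence_related",
    "profit_motivated",
)


def code_typology(applicable_types):
    """Return one dichotomous (0/1) variable per type of police crime."""
    unknown = set(applicable_types) - set(POLICE_CRIME_TYPES)
    if unknown:
        raise ValueError(f"unknown type(s): {sorted(unknown)}")
    return {name: int(name in applicable_types) for name in POLICE_CRIME_TYPES}


# The traffic-stop example: one case, two types coded as present.
coded = code_typology({"sex_related", "violence_related"})
print(coded)
```

A single categorical variable would force a choice between sex-related and violence-related in this example; separate 0/1 variables avoid that loss of information.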

Secondary data were employed from the Census of State and Local Law Enforcement Agencies (CSLLEA) to ascertain demographic data including the number of full-time sworn personnel and part-time sworn personnel employed by each agency where arrested officers served. There are 145 agencies included in this study that were not listed in the 2008 wave of the CSLLEA. County (and independent city) five-digit FIPS identifier numbers were used to verify the geographic location of arrested officers’ employing law enforcement agencies, as well as for use as a key variable to merge other data sources into the project’s master database and data set. The U.S. Department of Agriculture’s (2003) nine-point county-level urban to rural continuum scale was used to measure rurality. Population data from the U.S. Census Bureau’s decennial census in years 2000 and 2010 were utilized for county, independent city, and state populations.
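
Using the five-digit FIPS identifier as a merge key can be sketched as follows. This is an illustration under assumptions, not the project's merge procedure: the agency names, the population figures, and the continuum codes are placeholder values, and normalize_fips is a hypothetical helper. The one substantive point it shows is that FIPS codes should be handled as zero-padded text, since a code read as a number loses its leading zero.

```python
def normalize_fips(value):
    """Return a five-character, zero-padded county FIPS string."""
    text = str(value).strip()
    if not text.isdigit() or len(text) > 5:
        raise ValueError(f"not a county FIPS code: {value!r}")
    return text.zfill(5)


# Placeholder agency records; the first FIPS code lost its leading zero
# because it was stored as a number.
agencies = [
    {"agency": "Agency A", "fips": 6037},
    {"agency": "Agency B", "fips": "39173"},
    {"agency": "Agency C", "fips": "99999"},   # no county-level match
]

# Placeholder county-level data keyed by FIPS (values are not real figures).
county_data = {
    "06037": {"rural_urban_code": 1, "population": 1000},
    "39173": {"rural_urban_code": 2, "population": 500},
}

no_match = {"rural_urban_code": None, "population": None}
merged = []
for row in agencies:
    key = normalize_fips(row["fips"])
    # Keep every agency; unmatched agencies get empty county fields.
    merged.append({**row, "fips": key, **county_data.get(key, no_match)})

for row in merged:
    print(row)
```

Keeping unmatched agencies in the merged output, rather than dropping them, makes it easy to review records whose geographic location still needs to be verified.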


Analytic procedures were undertaken to ensure reliability of the data for the 2005-2011 arrest cases. An additional coder was employed to independently code a random sample of five percent of the total number of cases in the study. Intercoder reliability was assessed by calculating the Krippendorff’s alpha coefficient across 195 variables of interest in this study on a random sample (n = 290, 4.3%) of the cases in the study (N = 6,724) (see Hayes & Krippendorff, 2007). Krippendorff’s alpha is often recognized as the standard reliability statistic for content analysis research (Riffe, Lacy, & Fico, 2005). The Krippendorff’s alpha coefficient (Krippendorff’s α = .9153) is strong across the variables (see Krippendorff, 2013). The overall level of simple percentage of agreement between coders across all of the variables (97.7%) also established a degree of reliability well above what is generally considered acceptable in content analysis research (see Riffe et al., 2005). Testing is currently underway to establish intercoder reliability for the coded data from the 2012-2018 arrest cases.
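
The two reliability statistics reported above can be illustrated with a short sketch. It assumes the simplest setting only: two coders, a single nominal variable, and no missing values. The ten codes in the example are made up, and the function names are hypothetical; the project's own calculations covered 195 variables and are not reproduced here. For two coders, alpha is computed as 1 minus the ratio of observed to expected disagreement, using a coincidence matrix in which each coded unit contributes both ordered pairs of values.

```python
from collections import Counter
from itertools import permutations


def percent_agreement(coder_1, coder_2):
    """Share of units on which both coders recorded the same value."""
    matches = sum(a == b for a, b in zip(coder_1, coder_2))
    return matches / len(coder_1)


def krippendorff_alpha_nominal(coder_1, coder_2):
    """Krippendorff's alpha for two coders, nominal data, no missing values."""
    if len(coder_1) != len(coder_2) or not coder_1:
        raise ValueError("both coders must code the same non-empty set of units")
    # Coincidence matrix: each unit contributes both ordered value pairs.
    coincidences = Counter()
    for a, b in zip(coder_1, coder_2):
        coincidences[(a, b)] += 1
        coincidences[(b, a)] += 1
    # Marginal totals per category and the total number of pairable values.
    totals = Counter()
    for (c, _k), count in coincidences.items():
        totals[c] += count
    n = sum(totals.values())
    observed = sum(count for (c, k), count in coincidences.items() if c != k)
    expected = sum(totals[c] * totals[k] for c, k in permutations(totals, 2))
    if expected == 0:
        raise ValueError("alpha is undefined when only one category is used")
    return 1 - (n - 1) * observed / expected


# Made-up codes for one dichotomous variable across ten cases.
coder_1 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
coder_2 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]

print(f"percent agreement: {percent_agreement(coder_1, coder_2):.1%}")
print(f"alpha (nominal):   {krippendorff_alpha_nominal(coder_1, coder_2):.4f}")
print(f"sample share:      {290 / 6724:.1%}")
```

Alpha is lower than simple percent agreement in this example because it discounts the agreement that would be expected by chance given how often each value was used.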

Research Project Database

This project has identified and analyzed an unprecedented amount of data on the arrests of nonfederal sworn law enforcement officers in the United States. The research data collection methodology was designed by Stinson (2009) and allows for the aggregation of information on the phenomenon of police crime that would not have otherwise been possible (Payne, 2013). It would also be very difficult to process and code the content of the vast amount of raw data analyzed in the current project without sophisticated database resources. The project utilizes OnBase, an enterprise content management (ECM) object-relational database system by Hyland Software. Originally, the research team utilized OnBase solely to archive digital image files of all the paper-based news articles, court records, and coding sheets that were collected and analyzed by Stinson and colleagues as part of their research on police crime. Stinson has since enhanced the project database with additional electronic files, including video files and audio files, and with the ability to search the full text of all the digital images contained in the project’s digital imaging database.

Several database components were added to the project ECM database to support the current study, including an object-relational database that allows for data to be organized in tables (a relational database component), as well as seamless integration with the digital imaging and video files (an object-oriented database component), among others. Coding of content was completed with a customized PC-based coding instrument using the Unicom Intelligence (formerly IBM/SPSS Data Collection Author and Interviewer) software application. The coding instrument system pulls case-specific data from the relational database into the on-screen coding instrument for each case to be coded, thus reducing coder duplication of effort and the potential for coding errors. The data from the completed coding instrument for each case are converted to an SPSS data file for subsequent statistical analyses, and data files are also converted to an electronic coding sheet in Microsoft Word (a facsimile of our paper-based coding instrument) and indexed in OnBase with other case-specific electronic files.

Stinson and colleagues continue to collect data on the criminal arrests of sworn law enforcement officers at state and local law enforcement agencies across the United States at a rate of approximately 1,100 new arrest cases involving approximately 800-950 sworn law enforcement officers arrested annually. The research project is continuing as a longitudinal trend study of police crime in the United States.

References


Payne, B. K. (2013). White-collar crime: The essentials. Thousand Oaks, CA: Sage.


Riffe, D., Lacy, S., & Fico, F. G. (2005). Analyzing media messages: Using quantitative content analysis in research (2nd ed.). Mahwah, NJ: Lawrence Erlbaum Associates.


Stinson, P. M. (2009). Police crime: A newsmaking criminology study of sworn law enforcement officers arrested, 2005-2007. Indiana University of Pennsylvania, Indiana, PA. Retrieved from


U.S. Department of Agriculture. (2003). Measuring rurality: Rural-urban continuum codes. [Computer file]. Washington, DC: U.S. Department of Agriculture, Economic Research Service. Retrieved from


U.S. Department of Justice. (2000). National Incident-Based Reporting System: Data collection guidelines. Washington, DC: U.S. Department of Justice, Federal Bureau of Investigation, Criminal Justice Information Services Division. Retrieved from


U.S. Department of Justice. (2004). Uniform crime reporting handbook. Washington, DC: U.S. Department of Justice, Federal Bureau of Investigation. Retrieved from


U.S. Department of Justice. (2008). Census of state and local law enforcement agencies (CSLLEA), 2008 [ICPSR27681-v1 data set]. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2011-08-03. doi:10.3886/ICPSR27681.v1.