PROTOCOL: Police stops to reduce crime: A systematic review


| The problem, condition, or issue
The use of pedestrian stops has been one of the most common yet controversial proactive strategies in modern policing (Weisburd & Majmundar, 2018). The pedestrian stop (also known as stop and frisk, Terry stops, street pops, stop and search, street stops, etc.) is often defined as the process by which "…officers stop, and potentially question and search, people in the communities they are patrolling" (Lachman et al., 2012, p. 1). These tactics have been staples in policing for generations, but they gained legitimacy with the landmark U.S. Supreme Court decision in Terry v. Ohio (1968), which allows police officers discretion to conduct an investigatory stop of a person given reasonable suspicion that the person has committed a crime or is in the process of committing a crime, and discretion to frisk (or pat down) the person given reasonable suspicion that they are carrying a weapon (see Jones-Brown et al., 2010).
Often termed "stop, question, and frisk (SQF)" (Rosenfeld & Fornango, 2014, p. 96), evidence suggests that many U.S. police departments began using pedestrian stops widely as a proactive policing strategy in the 1990s and early 2000s (Gelman et al., 2007; White & Fradella, 2016). In New York City alone, recorded SQFs increased from 160,851 in 2003 to 685,000 in 2011 (Weisburd et al., 2016), and similar increases have been noted in other U.S. cities such as Philadelphia and Los Angeles (Jones-Brown et al., 2010; Saul, 2016). Police "stop and search" (McCandless et al., 2016, p. 2) powers have also been noted in the UK, where targeted pedestrian stops have been used as a strategy to reduce knife crime (Tiratelli et al., 2018), and in other European countries such as Bulgaria, Hungary, and Spain, often for the purpose of conducting identity checks related to criminal investigations (Miller et al., 2008). In this context, pedestrian stops have been used as primary components in a number of different proactive policing interventions, including crackdowns (Sherman, 1990), efforts to reduce illegal gun carrying (Koper & Mayo-Wilson, 2006), directed patrol interventions (Ratcliffe et al., 2011), and hot spots policing interventions (Braga, Turchan, et al., 2019).
While advocates have considered pedestrian stops to be a contributing factor to decreasing levels of crime in American cities (Baker & Goldstein, 2012), critics have pointed to the low success rates (i.e., low proportions of stops that lead to arrest or weapon seizure) and racial disparity associated with these strategies as evidence that such tactics represent an illegal and unjust use of police power (Fagan & Davies, 2000; Gelman et al., 2007; Rosenfeld & Fornango, 2014). Racial and ethnic profiling has also been a concern on an international level, with researchers noting racially disparate stop rates in several European countries, without clear evidence that these strategies have produced meaningful crime reductions (McCandless et al., 2016; Miller et al., 2008; Tiratelli et al., 2018).
Despite such challenges, practitioners still view pedestrian stops as an important element of proactive crime prevention efforts (D'Onfrio, 2019; Terkel, 2013), making an understanding of their effects on crime and the community increasingly important. We propose a systematic review of the effects of pedestrian stops as a strategy for reducing crime, including examination of the degree of geographic focus of the programs examined and of possible moderators such as linkage with other crime prevention approaches. In addition, this review seeks to examine the effects of pedestrian stops on the people and the communities (micro, meso, or macro) within which these strategies are employed.

| The intervention
Pedestrian stops generally involve the police-initiated stop of a person (or group of people) on the street for the purpose of investigation and/or questioning (Lachman et al., 2012). In most cases, the officer must have reasonable suspicion that a person is involved in criminal activity for a stop to occur, and based on the level of suspicion, a frisk or search of that person may be conducted. However, in certain contexts stops may be conducted without suspicion, or the threshold for reasonable suspicion may vary. In the UK, the Criminal Justice and Public Order Act of 1994 permits suspicion-less stops in high-risk areas with approval from an authorizing officer (Lennon, 2013, 2015). Police officers in the UK and other European countries are also permitted to conduct suspicion-less stops of people in authorized areas as a proactive counter-terrorism measure (Lennon, 2013). Similarly, the U.S. Supreme Court has ruled that the amount of crime in a given area can be used as a factor in an officer's determination of reasonable suspicion (Gelman et al., 2007; Illinois v. Wardlow). Thus, it is important to note that while pedestrian stops are often reactive in nature, in that they require prior indication of suspicious behavior or criminal activity, they may also be used proactively. In this regard, police stops are often employed as key elements of policing strategies that "…have as one of their goals the prevention or reduction of crime and disorder" (Weisburd & Majmundar, 2018, p. 1). Thus, pedestrian stops are used both to respond to observed or reported criminal behavior and to deter and prevent future criminal behavior.
Pedestrian stops may be employed as distinct policing interventions or used as components of larger interventions such as short-term police crackdowns (Sherman, 1990), directed patrol presence (McGarrell et al., 2001; Ratcliffe et al., 2011), and hot spots policing. Additionally, while pedestrian stops have been primarily implemented as a tactic to reduce violent and/or weapon-related crime (Koper & Mayo-Wilson, 2006; Ratcliffe et al., 2011), evidence has suggested that they have also been used to target other crime/disorder problems, for example, drug-related crime (Geller & Fagan, 2010; Levine & Small, 2008). The current review will include any study that meets our methodological quality threshold (see later) in which different "regimes" of pedestrian stops are compared. This can be as a result of a policing intervention that employs pedestrian stops as a primary component, or as a result of controlled comparisons between areas that evidence varying levels of pedestrian stops, regardless of what (if any) specific crime/disorder outcome is being targeted. However, interventions employing pedestrian stops, but where such stops are not a primary programmatic intervention or the primary focus of the comparison, will be excluded.

| How the intervention might work
It has often been argued that offenders weigh the potential costs and benefits associated with a criminal act, and as such, are deterred from crime in situations in which the potential costs outweigh the potential benefits (Beccaria, 1986; Bentham, 1988; Durlauf & Nagin, 2011; Nagin, 2013). Pedestrian stops may deter crime by increasing these perceived costs, and likewise the perceived certainty of apprehension among potential offenders (Lachman et al., 2012).
In other words, people who have been personally stopped by the police may alter their behavior or avoid the area in which the stop occurred to mitigate their risk, while people who become vicariously aware of the pedestrian stop intervention may pre-emptively do the same (Rosenfeld & Fornango, 2014). Furthermore, if pedestrian stops result in the seizure of weapons and other items that are often used to commit crime, they may reduce crime simply through the incapacitation achieved by preventing access to the tools needed to commit criminal acts. It is also possible that pedestrian stop strategies deter crime merely through increasing police presence in high-crime areas. In this context, the increase in deterrence is not necessarily related to the strategy itself, but rather to the increased presence of police in an area.
It is key in any policing program to examine the impacts of specific policing strategies on both the people targeted and the communities where they are applied. Advocates of pedestrian stops focus on the benefits of reduced crime in the community (D'Onfrio, 2019; Terkel, 2013). Critics of the approach argue that pedestrian stops may negatively affect community attitudes toward the police by increasing involuntary contact with community members in ways that are perceived as unfair and unlawful (Miller & D'Souza, 2015; Tyler et al., 2014). The negative effects of these stops may also stem from their disproportionate application to specific groups of people. For example, like other proactive crime prevention strategies that deploy police resources in high-crime areas, pedestrian stops are often concentrated in disadvantaged minority communities (Fagan & Davies, 2000; Levine & Small, 2008). Given this concentration and the potential for officers to act on implicit biases that define racial minorities as safety threats (Epp et al., 2014; Holmes, 2000), there is concern that pedestrian stop strategies have disparate impacts on minority populations (Fagan & Davies, 2000; Ferrandino, 2015). For instance, research has suggested that in New York City, Blacks are over six times more likely to be stopped by police than Whites, and that the rate of success during these stops (operationalized as the rate of drug/weapon seizures or arrests) is often < 3% for seizures and < 7% for arrests (see Geller & Fagan, 2010; Gelman et al., 2007; Jones-Brown et al., 2010). Thus, the vast majority of police stops appear to be conducted against disadvantaged populations that are not committing an arrestable offense, carrying weapons, or carrying contraband.
As a result of these factors, police-initiated stops may reduce feelings of police legitimacy among the people stopped or the communities in which they are found. In turn, there is evidence that pedestrian stops may lead to negative mental health effects on those stopped, such as anxiety and trauma, because of the stress and fear engendered among populations such as young black males who are often more likely to be stopped in these programs. Pedestrian stops may also be conducted in a rough manner, potentially leading to use of force that results in physical injury, or even death (see Miller et al., 2017), to the person stopped (Brunson & Weitzer, 2009; Levine & Small, 2008).
Beyond the proximate damage caused, if these experiences happen in large numbers, vicarious knowledge of these incidents may then lead to undesirable effects on community perceptions of the police in the long run (Miller & D'Souza, 2015).
Conversely, the effect of pedestrian stops on feelings of police legitimacy may be dependent on the manner in which officers conduct stops and the purposes of the stop intervention. Stops that are conducted in a procedurally just manner, with attention paid to citizen participation, respect, neutrality, and trustworthy motives, may have the potential to increase positive police-citizen contact and increase feelings of police legitimacy (Mazerolle et al., 2013; Tyler, 2004). Similarly, if the programmatic goal of a stop intervention aligns with the priorities of the community (e.g., gun seizures or drug crime reduction), it may further improve the perceived responsiveness of the police to community concerns.

| Why it is important to do the review
Proactive policing tactics play an important role in crime prevention (Skogan & Frydl, 2004; Telep & Weisburd, 2012; Weisburd & Eck, 2004; Weisburd & Majmundar, 2018). However, the effects of proactive interventions vary greatly by the type of intervention and the manner in which the intervention is applied. Some tactics raise critical questions about the impacts of policing on the communities that they serve (Braga, Brunson, et al., 2019; Tyler et al., 2014).
Police have long felt that pedestrian stops can have an important general and specific deterrent value in preventing crime. Research evidence supporting this view began to develop in the 1990s with evaluations of police crackdowns (Sherman, 1990). There is evidence that many cities across the United States were using pedestrian stops as a key crime prevention tool (Gelman et al., 2007; White & Fradella, 2016), and indeed the use of pedestrian stops has often correlated with decreasing crime in major U.S. cities. But a rigorous assessment of crime prevention outcomes of pedestrian stops has not been developed to date. A key contribution of our review will be to identify whether pedestrian stops reduce crime, and if so to identify the magnitude of that impact.
As noted earlier, a number of scholars have documented negative impacts of pedestrian stops on citizens and communities.
In recent years, pedestrian stop tactics have come under increased legal scrutiny. For example, a federal district court ruling in Floyd v. City of New York (2013) found the New York City Police Department's use of SQF unconstitutional on the basis of racial disparity. Similar lawsuits have been brought against other U.S. police departments during the past decade (American Civil Liberties Union, 2010), and the perceived abuse of stop and search powers has led to riots and legal challenges in several European countries as well (Lennon & Murray, 2018; Murray et al., 2020).
Due to these concerns, pedestrian stop tactics have become extremely controversial. Perhaps as a result of this controversy, recent years have seen the use of such stops decrease substantially in major cities such as New York and Philadelphia (McNeil, 2020; Weisburd et al., 2016), and in European countries such as England and Scotland (Lennon & Murray, 2018; Tiratelli et al., 2018). There has even been a growing call among many to do away with pedestrian stop tactics entirely (see Baker & Goldstein, 2012). Yet, existing reviews have often failed to find evidence of negative impacts on community evaluations of the police, though negative effects on people who are stopped have a stronger evidence base (e.g., see Weisburd & Majmundar, 2018). Thus, it is increasingly important to determine if pedestrian stops have a crime prevention impact, and whether they produce negative consequences for the people and communities affected by them. To date, no review has systematically assessed these outcomes or simultaneously considered them alongside each other. Such a review is critical for informed crime prevention policy that weighs all potential costs and benefits.

| OBJECTIVES
Given that pedestrian stop tactics have garnered controversy and concern over their potential effects on crime (see Macdonald et al., 2016; Rosenfeld & Fornango, 2014; Weisburd et al., 2016), community (see Baker & Goldstein, 2012; Gelman et al., 2007; Miller & D'Souza, 2015; Tyler et al., 2014) and health-related outcomes, the main objective of this review is to examine the impact of pedestrian stops across each of these areas. Specifically, this review will synthesize the effect of police-initiated pedestrian stops on five discrete outcome groupings:
• Crime and disorder;
• Violence in police-citizen encounters;
• Police misbehavior;
• Community outcomes including fear of crime and perceptions of police (legitimacy, trust, satisfaction, and effectiveness); and
• Health-related outcomes.
Our second objective is to examine whether the effects of police-initiated pedestrian stops vary according to potential moderating factors. Assuming data are sufficient, we will assess whether effect sizes vary significantly based on (a) research design (see Weisburd et al., 2001); (b) country in which the intervention took place; (c) size of the geographic area at which the intervention was applied (e.g., micro versus macro-place); and (d) crime type of focus (e.g., violent versus drug crime). Where data are sufficient, and given the concerns related to the concentration of pedestrian stops among racial minorities (Fagan & Davies, 2000), we will also examine racial composition as a moderating factor. Ultimately, this review seeks to provide a comprehensive account of the effects of pedestrian stop tactics for policing agencies and city governments, allowing them to consider both the effects on crime and the community when deciding whether to pursue such tactics.
For studies to be considered eligible for this review, the evaluation must include a target area or group where pedestrian stops are heightened (or reduced) and a comparison area which reflects reduced use of pedestrian stops (or heightened use). We suspect that most studies will involve proactive police interventions in target areas where pedestrian stops are encouraged as a crime prevention tool, and where the control areas will receive standard policing interventions. At the same time, we will include studies which compare areas (such as beats or precincts), or cities, in which there are "natural" variations in the use of pedestrian stops. Such variation could be the result of variation in policies, or long-term historical patterns in policing.
Randomized and quasi-experimental research designs will be eligible for inclusion in the review (Campbell & Stanley, 1966; Cook & Campbell, 1979; Shadish et al., 2002). This inclusion threshold is adapted from the inclusion criterion in the Global Policing Database protocol (Higginson et al., 2015, pp. 47-48), which will be the primary search source for this review. We will include studies with "unmatched" control groups; for example, studies that compared a target area or group to the rest of the jurisdiction or group. As such, we will not restrict our review to quasi-experiments with statistically matched control groups, though we will distinguish between statistically matched and "unmatched" groups in a moderator analysis.

| Types of participants
Given our interest in examining the impacts of pedestrian stops on crime, officer behavior, the community, and health outcomes, this review will include the following populations:
• Law enforcement officers (including any particular race, ethnicity, gender, etc.)
• Citizens (including citizens who are the subjects of pedestrian searches; and including any race, ethnicity, gender, etc.)
• Places (may be micro places such as street segments, clusters of addresses, police beats; meso-places such as neighborhoods and communities; or macro-places such as entire jurisdictions, police districts, etc.).

| Types of interventions
Studies that report on an evaluation of a proactive policing program in which the primary component of a policing intervention is pedestrian stops will be eligible for this review. This might include a general approach or policy throughout a policing jurisdiction or a coordinated program with the goal of reducing crime (though a comparison area would need to be identified to meet our inclusion criteria). In other words, pedestrian stops must be employed as a key element of a policing intervention to be considered for inclusion, but we will not exclude studies based on the a priori intent of that intervention. We will also include studies that compare different "regimes" of pedestrian stops absent a particular program or intervention. For example, jurisdictions or areas such as police precincts may be compared based on different levels of pedestrian stops that are the result of long term patterns of policing behavior.
The length, dosage, and goal of these stops may vary by study, but all will include increases in police-initiated stops relative to standard policing or the absence of a police intervention. This review will not be limited to pedestrian stop interventions targeting specific types of crime or disorder (e.g., weapon and drug-related crime), and will not be limited to any specific type of overarching policing tactic (e.g., hot spots policing, crackdowns, directed patrol, etc.). This review will exclude any studies employing pedestrian stops as a secondary component to other policing tactics (as the effects of the stop component would be difficult to isolate from the other components of the policing intervention).

| Types of outcome measures
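If data are sufficient, our outcome measures will be divided into five primary groups with corresponding subgroups, as listed below:
• Crime and disorder
• Violence in police-citizen encounters
• Police misbehavior
• Community outcomes (fear of crime; perceptions of police legitimacy, trust, satisfaction, and effectiveness)
• Health (physical health, mental health, mortality).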
Crime and disorder-related outcomes may be measured in a variety of ways (see Addington, 2010), including official crime measures (e.g., incident and arrest data, calls for service data, crime rates), unofficial crime measures (e.g., crimes reported by civilians), and systematic social observations of crime. We also note that crime and disorder categories may include (but are not limited to) property, drug, and violent offenses.
Incidents of violence in police-citizen encounters will likely be measured through police use-of-force reports. We will attempt to be as discrete as possible, including capturing use-of-force that results from suspect resistance and varying levels of force when possible. We also note that this outcome is not necessarily a measure of unjustified use-of-force, and thus is distinguished from officer misbehavior. Officer misbehavior, rather, will be measured through formal citizen complaints or community surveys reporting on police abuse or violence. We also anticipate community surveys to be the primary method of data collection for studies measuring community outcomes of fear of crime and perceptions of police. While we do not anticipate many studies providing health-related outcomes, those that do will likely utilize public surveys with people who have been subjected to pedestrian stops or community members living in areas experiencing a large quantity of pedestrian stops (see Geller et al., 2014; Sewell & Jefferson, 2016). It may also be possible for these studies to include official health outcomes, such as mental health and physical injury measured from hospital data, or other official data sources. For community and health-related outcomes, subgroup information will be captured when possible (i.e., race, gender, socioeconomic status).

| Duration of follow-up
We will not restrict eligibility to any particular follow-up period. We expect to find studies with varying follow-up periods, including those that measure the effects of pedestrian stops concurrently with the intervention and those that use a dedicated post-intervention period.
If there is notable variation in follow-up periods across studies we will categorize and synthesize studies by length of follow-up period, to include short follow-up (<6 months), medium follow-up (6 months to 1 year), and long follow-up (>1 year).
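The follow-up bands above can be expressed as a simple coding rule. The sketch below is illustrative only: the function name `follow_up_category` is hypothetical, follow-up length is assumed to be recorded in months, and the boundary handling (exactly 6 or 12 months counted as "medium") is our reading of the stated cut-offs rather than a rule given in the protocol.

```python
def follow_up_category(months: float) -> str:
    """Assign a follow-up length (in months) to one of three bands.

    Assumed boundaries: short is < 6 months, medium is 6 to 12 months
    inclusive, long is > 12 months.
    """
    if months < 0:
        raise ValueError("follow-up length cannot be negative")
    if months < 6:
        return "short"
    if months <= 12:
        return "medium"
    return "long"


if __name__ == "__main__":
    # Hypothetical follow-up lengths, for illustration only.
    for length in (3, 6, 12, 18):
        print(length, follow_up_category(length))
```

A rule of this kind would let each measurement time-point be coded once and then grouped for separate synthesis.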

| Types of settings
To be eligible for inclusion, the intervention must be targeted at a geographic area. In some cases this will be specific streets; in others, larger geographic units such as police beats and districts, or even the entire jurisdiction. Eligible studies may come from any geographic region and any racial, ethnic, or demographic makeup that otherwise meets our eligibility criteria. Eligible studies will also not be restricted to any written language and will be included regardless of publication status. We will use Google Translate to conduct title and abstract screening for any non-English language studies, and we will also use Google Translate for the main text of any non-English language articles that require full-text review. Should this approach not produce unequivocal decision-making, we will contact study authors to verify study eligibility and/or obtain missing data. We will include studies circulated between 1970 and 2018 in this review.

| Electronic searches
The search will be led by the Global Policing Database (GPD) research team at the University of Queensland (Elizabeth Eggins and Lorraine Mazerolle) and Queensland University of Technology (Angela Higginson).
The University of Queensland is home to the GPD (see http://www.gpd.uq.edu.au), which will serve as the main search location for this review.
The GPD is a web-based and searchable database designed to capture all published and unpublished experimental and quasi-experimental evaluations of policing interventions conducted since 1950. There are no restrictions on the type of policing technique, type of outcome measure or language of the research (Higginson et al., 2015). The GPD is compiled using systematic search and screening techniques, which are reported in Higginson et al. (2015) and summarized in Appendices A and B. Broadly, the GPD search protocol includes an extensive range of search locations to ensure that both published and unpublished research is captured across criminology and allied disciplines.
To capture studies, we will use terms related to pedestrian stops to search the GPD corpus of full-text documents that have been screened as reporting on a quantitative impact evaluation of a policing intervention. Specifically, we will use the following terms to search the title and abstract fields of the corpus of documents published between January 1970 and December 2019:

| Searching other sources
As the GPD search process discussed above isolates experimental and quasi-experimental impact evaluations of policing interventions captured from academic and gray literature sources, we will also use additional strategies to supplement the GPD search. First, we will supplement the GPD search with additional sources from Japan, Korea, the Middle East, and Europe by consulting subject guides through the Duke University Library. Second, similar to recent reviews using the GPD (Hinkle et al., 2020; Lum et al., 2020; Mazerolle et al., 2020), we will perform hand searches in leading criminology journals for the 12 months prior to the December 2019 search date to identify any recent studies not yet indexed in the GPD and other databases. Third, we will conduct forward citation searches using Google Scholar and reference harvesting on all studies deemed eligible for inclusion as well as existing reviews on related topics (e.g., Braga, Turchan, et al., 2019; Koper & Mayo-Wilson, 2012).
Finally, after completing all searches, we will e-mail our list of eligible studies to leading policing scholars knowledgeable in the area of pedestrian stop policing tactics. This list of scholars will be comprised of the lead authors of the studies deemed eligible for inclusion in this review as well as general policing experts identified by the research team.
This may help to identify studies that the above searches missed, as these experts may be able to identify studies missing from our list, particularly studies in the gray literature such as dissertations and smaller research reports.

| Description of methods used in primary research
The studies included in this review will use variations of a treatment It is likely, however, that the unit of analysis or original data collection in eligible studies will vary based on the outcome grouping.
We anticipate studies evaluating crime and community outcomes to use places as the unit of analysis (ranging from street segments or addresses to neighborhoods, police districts, or even cities), while studies evaluating officer behavior or health-related outcomes may use people as the unit of analysis. We anticipate that the majority of studies will be quasi-experimental in nature, often selecting comparison areas or groups in post hoc fashion. A smaller number of studies may use randomization to assign areas to treatment or proactively match treatment and control groups/areas. Some studies may include multiple follow-up periods.

| Criteria for determination of independent findings
Our unit of analysis will be the research study, such that each study will only be included once for each discrete outcome grouping.
Where multiple reports concern the same underlying intervention, we will code all reports and choose the one deemed most complete from which to retrieve data (though information may be extracted from multiple reports when necessary or to provide more comprehensive coding). In cases where there are multiple jurisdictions included in a multisite study, we will code each of the sites as a separate study. Where a study or multiple reports concerning the same intervention contain multiple measurement time-points, each will be coded and synthesized separately based on length of follow-up period (e.g., short, medium, and long). Additionally, studies using clustered designs (i.e., assignment to conditions at higher-level units than those being analyzed) often contain correlations between the units within a cluster. To account for the effects of this correlation on the standard error of the estimates, we will use the method described by Fu et al. (2013). To do so, however, the intra-class correlation coefficient is needed, and where studies fail to report this, we will conduct sensitivity analyses at various ICC values (see Armstrong et al., 2018).
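To illustrate the kind of ICC sensitivity analysis described above, the following sketch applies the widely used design-effect correction, in which the standard error is inflated by the square root of 1 + (m - 1) × ICC for average cluster size m. This is an assumption made for illustration; it is not necessarily identical to the procedure in Fu et al. (2013). The function names, the example standard error, the cluster size, and the grid of ICC values are all hypothetical.

```python
import math


def design_effect(avg_cluster_size: float, icc: float) -> float:
    """Standard design effect: 1 + (m - 1) * ICC."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("ICC must lie between 0 and 1")
    if avg_cluster_size < 1:
        raise ValueError("average cluster size must be at least 1")
    return 1.0 + (avg_cluster_size - 1.0) * icc


def adjusted_se(se: float, avg_cluster_size: float, icc: float) -> float:
    """Inflate an unadjusted standard error by sqrt(design effect)."""
    return se * math.sqrt(design_effect(avg_cluster_size, icc))


def icc_sensitivity(se, avg_cluster_size, icc_values=(0.01, 0.05, 0.10, 0.20)):
    """Return {ICC: adjusted SE} over a grid of assumed ICC values."""
    return {icc: adjusted_se(se, avg_cluster_size, icc) for icc in icc_values}


if __name__ == "__main__":
    # Hypothetical study: unadjusted SE of 0.10, 21 units per cluster.
    for icc, se_adj in icc_sensitivity(0.10, 21).items():
        print(f"ICC={icc:.2f}  adjusted SE={se_adj:.4f}")
```

Reporting the pooled estimate at each assumed ICC shows how far conclusions depend on the unreported clustering.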
In cases where a study reports multiple outcomes from within the same outcome category (e.g., crime/disorder, officer behavior, community outcomes, etc.) and the same target group/area, we plan to calculate composite effect sizes with standard errors that are adjusted based on the dependence between the individual outcomes (see Borenstein et al., 2009). For instance, a study may measure the effects of pedestrian stops on both violent and property crime.
In these cases, we will code all outcomes identified by the authors and provide the average effect sizes for that outcome category. The aggregation of effect sizes will be conducted using the method proposed by Borenstein et al. (2009), which corrects the variance estimate of the composite effect size based on the strength of the correlation between dependent outcomes. Where the correlations between outcomes cannot be determined we will contact study authors or estimate the correlation between outcomes at a range of plausible values. If very few studies report multiple outcomes from within the same outcome category, then we will conduct both a composite effect size analysis as well as an analysis of the most appropriate effect sizes based on an a priori decision-rule. If possible, we will examine outcomes by crime type, and if a study measures the same specific outcome using two forms of data (e.g., calls for service and incident data measuring violent crime) we will average these effects to ensure that each study is only represented once.
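The composite effect size described above can be sketched as follows. The variance of the mean of m correlated outcomes follows the formula in Borenstein et al. (2009); the assumption that a single correlation r applies to every pair of outcomes, the function name, and the example values are ours and purely illustrative.

```python
import math
from typing import Sequence, Tuple

def composite_effect(effects: Sequence[float],
                     variances: Sequence[float],
                     r: float) -> Tuple[float, float]:
    """Mean of m dependent effect sizes and the variance of that mean.

    Var(mean) = (1/m)^2 * (sum_i V_i + sum_{i != j} r * sqrt(V_i * V_j)),
    assuming a common correlation r between every pair of outcomes.
    """
    m = len(effects)
    mean = sum(effects) / m
    total = sum(variances)
    for i in range(m):
        for j in range(m):
            if i != j:
                total += r * math.sqrt(variances[i] * variances[j])
    return mean, total / m ** 2

# Sensitivity of the composite variance to the assumed correlation.
for r in (0.0, 0.5, 1.0):
    mean, var = composite_effect([0.2, 0.4], [0.04, 0.04], r)
    print(f"r={r:.1f}: composite={mean:.2f}, variance={var:.3f}")
```

When the correlation cannot be obtained from the authors, running the function across a range of plausible r values gives the sensitivity analysis mentioned in the text.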
For studies using multiple sites within the same jurisdiction, we will treat the results as multiple outcomes that will be averaged across sites. For example, if a study reports on the effects of pedestrian stops on crime in multiple target areas in one jurisdiction, we will treat this as one study with subunits.

| Selection of studies
The search results will first be screened on title and abstract for potential relevance to police stops. To begin, all screeners will review the same subset of 25 titles/abstracts to establish reliability. These 25 titles/abstracts will be examined by a third author, and where discrepancies exist, all study authors will discuss the discrepancies and provide guidance for further eligibility screening. The rest of the results will then be divided among screeners for abstract/title review and a random sample of 5% of each screener's exclusions will be checked by a third author for quality assurance. Cross-checking will continue until rates of incorrect exclusions fall below 5% of all screenings, or the third author may consider rescreening all exclusions if a screener's decisions appear unreliable. If further questions regarding the eligibility of any particular titles/abstracts are raised, all study authors will discuss and determine the eligibility of the study in question at this initial stage.
All studies that are deemed potentially eligible after title/abstract review will then receive full-text review. Before beginning full-text review all screeners will once again review the same subset of 25 studies to establish reliability. The eligibility determinations for these 25 studies will be compared among screeners by a third author, and where discrepancies exist, all study authors will once again discuss and provide guidance for further screening. Any subsequent studies that generate uncertainty among screeners will receive full-text review from the review authors to determine final eligibility, and a random sample of 5% of each screener's exclusions will be checked by a third author.
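The 5% cross-check of exclusions could be operationalized along the following lines. This is a minimal sketch under our own assumptions: the function names, the fixed random seed, and the reading of the stopping rule (continue while the share of incorrect exclusions among checked records is at or above 5%) are hypothetical and not taken from the protocol's actual workflow.

```python
import math
import random
from typing import Iterable, List, Optional

def sample_exclusions(excluded_ids: Iterable[int],
                      fraction: float = 0.05,
                      seed: Optional[int] = None) -> List[int]:
    """Draw a simple random sample (without replacement) of a screener's exclusions."""
    ids = list(excluded_ids)
    if not ids:
        return []
    k = max(1, math.ceil(len(ids) * fraction))
    return random.Random(seed).sample(ids, k)

def needs_more_checking(n_checked: int, n_incorrect: int,
                        threshold: float = 0.05) -> bool:
    """True while the incorrect-exclusion rate has not fallen below the threshold."""
    if n_checked == 0:
        return True
    return (n_incorrect / n_checked) >= threshold

# Example: 200 excluded records from one screener (hypothetical record IDs).
to_check = sample_exclusions(range(200), seed=1)
print(len(to_check), needs_more_checking(n_checked=len(to_check), n_incorrect=0))
```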

| Data extraction and management
Once the final sample of eligible studies is determined, all studies will be double coded by Kevin Petersen (an author of this protocol) and another research assistant at George Mason University using the coding sheet in Appendix C. A third author will compile the coding for all studies and then check for consistency between both coders, and any discrepancies will be discussed among the entire review team.
Where discrepancies in coding occur during the remainder of the coding process, all review authors will discuss these discrepancies and come to a final coding decision. All review authors will monitor study coding and have biweekly calls to discuss coding progress.
All eligible studies will be coded (see coding protocol attached in Appendix C) on a variety of criteria (and details related to them), including: (a) reference information (title, authors, publication, etc.); (b) nature and description of site selection, group, targeted outcome, and so forth.

| Assessment of risk of bias in included studies
Studies deemed eligible for inclusion in the meta-analysis will be assessed for risk of bias using the Cochrane randomized and non-randomized risk of bias tools, based on study design (Higgins et al., 2019). Coding items consistent with these tools will allow for the categorization of randomized experiments into low risk, some concerns, or high-risk classifications, and quasi-experimental designs into low risk, moderate risk, serious risk, or critical risk classifications. We anticipate that the risk of bias tools may need to be adapted to better capture the issues arising in our studies, particularly those that utilize geographic areas as units of analysis.
We plan to modify these tools during the course of the review; however, these modifications will be dictated by the issues that are identified as salient upon review of our eligible studies. Risk of bias ratings across eligible studies will be presented in tabular and narrative format and sensitivity analyses may be performed depending on the level of bias noted in our eligible studies.

| Measures of treatment effect
For eligible place-based studies containing the requisite data, effect sizes will be calculated using a log relative incident rate ratio (RIRR). We anticipate that many of our studies will include place-based count data.
While prior meta-analyses using this form of data have often calculated effect sizes using an odds ratio (OR) that is then converted to a standardized mean difference effect size (Cohen's d; see Braga, Turchan, et al., 2019), Wilson (2021) suggests that Cohen's d effect sizes are not necessarily comparable across studies when computed using place-based count data, and that Cohen's d values calculated using the above conversion are not comparable to those calculated directly through conventional means (see Braga & Weisburd, 2020; Hinkle et al., 2020, for evidence of these problems in meta-analyses). The log RIRR approach, when exponentiated, produces an easily interpretable percent change for the treatment group relative to the control group and has been used in a recent Campbell systematic review of problem-oriented policing, which analyzed studies with similar forms of data to those we anticipate in this review (see Hinkle et al., 2020). The general equation for the log RIRR is represented below (see Wilson, 2021):

$$\ln(\mathrm{RIRR}) = \ln\left(\frac{X_{11}\,X_{00}}{X_{10}\,X_{01}}\right)$$

with the first subscript representing treatment (1) or control (0) grouping, and the second subscript representing pre (0) or post (1) intervention periods. Wilson (2021) also describes an adjustment for over-dispersion, in which $\bar{X}_k$ is the average count for treatment and control area across both pre and postintervention time periods, $S_k$ is the standard deviation for each average count, and $n_k$ is the number of counts (contributing to the mean) for both treatment and control groups across pre and postintervention periods. Experience in prior reviews (e.g., see Braga & Weisburd, 2020; Hinkle et al., 2020) suggests that the data for estimating over-dispersion are likely not available in many studies. Because of this we will use the method proposed by Farrington et al. (2007) to estimate confidence intervals for the produced statistics. However, we will conduct a sensitivity analysis of our overall findings by calculating the Farrington et al. and Wilson estimates for the overall mean effects for studies where the data allow and comparing the summary findings. If a sufficient number of studies are available we will also apply the mean correction using the Wilson approach to all studies and compare the overall findings using both methods.
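To make the log RIRR concrete, the sketch below computes it from four counts, using the subscript convention above (first index: treatment = 1, control = 0; second index: pre = 0, post = 1), and converts it to a percent change. The variance shown is the simple Poisson-based sum of reciprocal counts scaled by an over-dispersion multiplier that defaults to 1.0. That multiplier is a placeholder assumption and is not the specific Farrington et al. (2007) or Wilson (2021) estimator; the function names and counts are hypothetical.

```python
import math
from typing import Tuple

def log_rirr(x11: float, x10: float, x01: float, x00: float,
             overdispersion: float = 1.0) -> Tuple[float, float]:
    """Log relative incident rate ratio and an approximate variance.

    x11 = treatment post, x10 = treatment pre,
    x01 = control post,   x00 = control pre.
    `overdispersion` is an assumed multiplier (1.0 = plain Poisson variance).
    """
    effect = math.log((x11 * x00) / (x10 * x01))
    variance = overdispersion * (1 / x11 + 1 / x10 + 1 / x01 + 1 / x00)
    return effect, variance

def percent_change(log_effect: float) -> float:
    """Exponentiate a log RIRR into a percent change relative to control."""
    return (math.exp(log_effect) - 1.0) * 100.0

# Hypothetical counts: treatment area falls from 100 to 80 incidents,
# control area stays at 100.
effect, variance = log_rirr(x11=80, x10=100, x01=100, x00=100)
half_width = 1.96 * math.sqrt(variance)
print(f"log RIRR={effect:.3f}, percent change={percent_change(effect):.1f}%")
print(f"95% CI for RIRR: {math.exp(effect - half_width):.3f} to {math.exp(effect + half_width):.3f}")
```

In this example the treatment area shows a 20% reduction relative to the control area; raising the assumed over-dispersion multiplier widens the interval without moving the point estimate.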
For studies providing binary outcomes or prevalence data (e.g., the presence/absence of an injury or condition), effect sizes will be calculated using a risk ratio, which largely follows the equation for the RIRR provided above and should allow for easier comparison of these effects with those of place-based count outcomes (Wilson, 2021). Effect sizes for individual-level outcomes measured on Likert scales, such as citizen surveys, will be calculated using Cohen's d.

| Dealing with missing data
Where otherwise eligible studies lack the requisite data to calculate effect sizes, make determinations on risk of bias, or complete other salient coding areas, we will contact the study authors in an attempt to obtain the necessary data. If this is unsuccessful, studies will be excluded from the meta-analysis if effect sizes cannot be calculated. However, these studies will still be noted and included in our narrative descriptions to avoid the introduction of bias from lack of reporting.

| Assessment of heterogeneity
We will examine the presence of heterogeneity in effect sizes across studies using the Q statistic. Additionally, we will examine the extent of such heterogeneity using I² and τ² statistics. Heterogeneity will also be explored using the variables listed in the subgroup analysis section.
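For reference, the three heterogeneity statistics named above can be computed from study-level effect sizes and variances as sketched below. The τ² shown is the method-of-moments (DerSimonian-Laird) estimate; the function name and example data are hypothetical.

```python
from typing import Sequence, Tuple

def heterogeneity(effects: Sequence[float],
                  variances: Sequence[float]) -> Tuple[float, float, float]:
    """Return (Q, I-squared in percent, tau-squared by method of moments)."""
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    # Fixed-effect (inverse-variance) mean used as the reference point for Q.
    mean_fe = sum(w * y for w, y in zip(weights, effects)) / sum_w
    q = sum(w * (y - mean_fe) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # Scaling constant for the method-of-moments estimator.
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, i2, tau2

# Three hypothetical studies with equal sampling variances.
q, i2, tau2 = heterogeneity([0.0, 0.2, 0.4], [0.01, 0.01, 0.01])
print(f"Q={q:.2f}, I2={i2:.1f}%, tau2={tau2:.3f}")
```

Q is compared against a chi-square distribution with k - 1 degrees of freedom, I² expresses the share of observed variation attributable to between-study differences, and τ² is the estimated between-study variance on the effect size scale.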

| Assessment of reporting biases
We will use traditional methods to test for the sensitivity of the findings to publication/reporting bias in the experimental and quasi-experimental studies. These methods will include a comparison of the mean effect size for published and unpublished studies, a trim-and-fill analysis, and a funnel plot.

| Data synthesis
Data permitting, effect sizes for conceptually analogous studies will be combined using inverse-variance weighted meta-analysis. Given that our studies will likely be conducted in various settings, with various forms of pedestrian stop interventions, and with various types of samples and corresponding populations, we will assume a random effects model.
In other words, we acknowledge that the true effect of pedestrian stops may differ based on the unit of analysis/intervention and the specifics of the strategy, and thus believe that a mixed effects model is most suitable for analyzing the effectiveness of pedestrian stop interventions. Data synthesis will be developed using Comprehensive Meta Analysis software, https://www.meta-analysis.com, and the method-of-moments random effects estimator will be used.
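The inverse-variance random effects pooling described above can be sketched as follows. The between-study variance τ² is taken as an input (for example, a method-of-moments estimate), and the 1.96 multiplier assumes a normal-approximation 95% confidence interval. This is a generic illustration with hypothetical names and data, not a reproduction of the Comprehensive Meta Analysis software's internals.

```python
import math
from typing import Sequence, Tuple

def random_effects_mean(effects: Sequence[float],
                        variances: Sequence[float],
                        tau2: float) -> Tuple[float, float, Tuple[float, float]]:
    """Inverse-variance weighted mean under a random effects model.

    Each study's weight is 1 / (within-study variance + tau2).
    Returns (pooled mean, standard error, 95% confidence interval).
    """
    weights = [1.0 / (v + tau2) for v in variances]
    sum_w = sum(weights)
    mean = sum(w * y for w, y in zip(weights, effects)) / sum_w
    se = math.sqrt(1.0 / sum_w)
    return mean, se, (mean - 1.96 * se, mean + 1.96 * se)

# Three hypothetical studies with tau2 supplied by the analyst.
mean, se, ci = random_effects_mean([0.0, 0.2, 0.4], [0.01, 0.01, 0.01], tau2=0.03)
print(f"pooled={mean:.3f}, SE={se:.3f}, 95% CI=({ci[0]:.3f}, {ci[1]:.3f})")
```

Setting `tau2=0.0` reduces the calculation to the fixed-effect model, which makes the consequence of the random effects assumption easy to inspect.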

| Subgroup analysis and investigation of heterogeneity
As noted above, we hope to examine factors moderating the effect of pedestrian stops on our outcome measures. Though it is difficult to know what moderator analyses will be possible given the data, at minimum we hope to assess the differential effects of pedestrian stops by research design, location where the intervention took place, size of the geographic area at which the intervention was applied (e.g., micro versus macro-place), crime type of focus (e.g., violent versus drug crime), and racial or ethnic composition of the communities in the study or the persons stopped. Additionally, to the extent possible, we will conduct a moderator analysis based on the intent of the intervention. In other words, stop interventions that were a key component of an intervention with a priori intent to deter or prevent crime will be analysed separately from other studies, such as those in which intent cannot be determined or those with posthoc analysis of differing stop policies or numbers of stops conducted. If our search identifies enough data to assess these effects we will use the analog to the ANOVA method of moderator analysis (see Lipsey & Wilson, 2001) for categorical moderator variables and meta-analytic regression analysis for continuous moderator variables or analyses involving multiple moderators. Depending on our results, we may wish to conduct additional posthoc moderator analyses. If so, we will clearly delineate a priori analyses from those that were determined posthoc.
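The analog to the ANOVA for a categorical moderator partitions heterogeneity into a between-groups component, Q_between, which is compared against a chi-square distribution with (number of groups - 1) degrees of freedom. The sketch below uses fixed-effect inverse-variance weights for simplicity; a mixed effects variant would add τ² to each study's variance before weighting. The function name, group labels, and data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

def q_between(effects: Sequence[float],
              variances: Sequence[float],
              groups: Sequence[Hashable]) -> Tuple[float, int, Dict[Hashable, float]]:
    """Return (Q_between, degrees of freedom, weighted mean effect per group)."""
    sums: Dict[Hashable, List[float]] = defaultdict(lambda: [0.0, 0.0])
    for y, v, g in zip(effects, variances, groups):
        w = 1.0 / v
        sums[g][0] += w        # running sum of weights in this group
        sums[g][1] += w * y    # running sum of weighted effects in this group
    total_w = sum(s[0] for s in sums.values())
    grand_mean = sum(s[1] for s in sums.values()) / total_w
    group_means = {g: s[1] / s[0] for g, s in sums.items()}
    qb = sum(s[0] * (group_means[g] - grand_mean) ** 2 for g, s in sums.items())
    return qb, len(sums) - 1, group_means

# Hypothetical moderator: size of the geographic unit.
qb, df, means = q_between(
    effects=[0.1, 0.1, 0.3, 0.3],
    variances=[0.01, 0.01, 0.01, 0.01],
    groups=["micro", "micro", "macro", "macro"],
)
print(f"Q_between={qb:.2f} on {df} df; group means={means}")
```

With one degree of freedom, a Q_between above the chi-square critical value of 3.84 would indicate a difference between the two groups at the 0.05 level.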

| Sensitivity analysis
In addition to including sensitivity analyses using smallest, largest, and combined effect sizes, publication biases, and various ICC levels (if necessary), we will also conduct sensitivity analyses excluding studies with significant risk of bias.

| Treatment of qualitative research
We do not plan to include qualitative research in this review.  Coordinating Group and so will not be involved in the editorial or formal approval process for this protocol or the subsequent review.

PRELIMINARY TIMEFRAME
The review process will adhere to the following schedule:
• Submission of protocol: Fall 2020
• Completion of GPD search: Spring 2021
• Coding of eligible studies: Spring 2021 to Summer 2021
• Submission of completed report: Winter 2021

PLANS FOR UPDATING THE REVIEW
The authors expect to update the review every 5-10 years depending on a sense of trends for experimental and quasi-experimental evaluations and available funding and support.

The free-text search terms for the GPD are provided in Table A1 and are grouped by substantive (i.e., some form of policing) and evaluation terminology. Although the search strategy may vary slightly across search locations, it follows a number of general rules:
• Search terms are combined into search strings using Boolean operators "AND" and "OR." Specifically, terms within each category are combined with "OR," and categories will be combined with "AND." For example: (police OR policing OR "law#enforcement") AND (analy* OR ANCOVA OR ANOVA OR …).
• Compound terms (e.g., law enforcement) are considered single terms in search strings by using quotation marks (i.e., "law*enforcement") to ensure that the database searches for the entire term rather than separate words.
• Wild cards and truncation codes are used for search terms with multiple iterations from a stem word (e.g., evaluation, evaluate) or spelling variations (e.g., evaluat* or randomi#e).
• If a database has a controlled vocabulary term that is equivalent to "POLICE," the term is combined in a search string that includes both the policing and evaluation free-text search terms. This approach ensures that the search retrieves documents that do not use policing terms in the title/abstract but have been indexed as being related to policing in the database.
• For search locations with limited search functionality, a broad search that uses only the policing free-text terms is implemented.
• Multidisciplinary database searches are limited to relevant disciplines (e.g., include social sciences but exclude physical sciences).
• Search results are refined to exclude specific types of documents that are not suitable for systematic reviews (e.g., newspapers, front/back matter, book reviews).
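The first two rules above (OR within a category, AND between categories, quotation marks around compound terms) can be sketched as a small helper. The function name and the short term lists are illustrative only; the full GPD term lists are those in Table A1, and database-specific wildcard symbols are deliberately left to the caller.

```python
from typing import List, Sequence

def build_search_string(categories: Sequence[Sequence[str]]) -> str:
    """Join terms with OR inside each category and join categories with AND.

    Terms containing a space are wrapped in quotation marks so that the
    database treats them as a single phrase.
    """
    groups: List[str] = []
    for terms in categories:
        quoted = [f'"{t}"' if " " in t else t for t in terms]
        groups.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(groups)

# Abbreviated, illustrative term lists.
policing_terms = ["police", "policing", "law enforcement"]
evaluation_terms = ["evaluat*", "ANCOVA", "ANOVA"]
print(build_search_string([policing_terms, evaluation_terms]))
```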

Search Locations
To reduce publication and discipline bias, the GPD search strategy adopts an international scope and involves searching for literature across a number of disciplines (e.g., criminology, law, political science, public health, sociology, social science and social work). The search captures a comprehensive range of published (i.e., journal articles, book chapters, books) and unpublished literature (e.g., working papers, governmental reports, technical reports, conference proceedings, dissertations) by implementing a search strategy across bibliographic/academic, gray literature, and dissertation databases or repositories.
There is substantial overlap in content coverage between many of the databases. Therefore, the Optimal Searching of Indexing Databases (OSID) computer program (Neville & Higginson, 2014) has been used to analyse the content crossover for all databases that have accessible content coverage lists. OSID analyses the content coverage and creates a search location solution that provides the most comprehensive coverage via the least number of databases. Another advantage of using OSID when designing a search strategy is the reduction in the number of duplicates that would need to be removed prior to the screening phase. Databases with >10 unique titles are searched in full, whereas databases with ≤10 unique titles are searched only for the unique titles and any non-serial content (e.g., reports, conference proceedings). Where a modified search of a database would be more labor-intensive than a full search and export of results, a full search of the database is conducted. The final search locations and solutions are reported in Table A2.

Types of interventions
Each document must contain an impact evaluation of a policing intervention. Policing interventions are defined as some kind of a strategy, program, technique, approach, activity, campaign, training, directive, or funding/organizational change that involves police in some way (other agencies or organizations can be involved). Police involvement is broadly defined as:
• Police initiation, development or leadership
• Police are recipients of the intervention or the intervention is related, focused or targeted to police practices
• Delivery or implementation of the intervention by police.

Types of study designs
The GPD includes quantitative impact evaluations of policing interventions that utilize randomized experimental (e.g., RCTs) or quasi-experimental evaluation designs with a valid comparison group that does not receive the intervention. The GPD includes designs where the comparison group receives "business-as-usual" policing, no intervention or an alternative intervention (treatment-treatment designs).
The specific list of research designs included in the GPD is as follows:
• Systematic reviews with or without meta-analyses
• Cross-over designs
The GPD excludes single group designs with pre- and postintervention measures as these designs are highly subject to bias and threats to internal validity.

Systematic Screening
To establish eligibility, records captured by the GPD search progress through a series of systematic stages which are summarized in Figure A1, with additional detail provided in the following subsections.
All research staff working on the GPD undergo standardized training before beginning work within any of the stages detailed below. Staff then complete short training simulations to enable an assessment of their understanding of the GPD protocols and highlight any areas for additional training. In addition, random samples of each staff member's work are regularly cross-checked to ensure adherence to protocols. Disagreements about screening decisions between staff are mediated by either the project manager or GPD chief investigators.
FIGURE A1 GPD systematic compilation process

Title and abstract screening
After removing duplicates, the titles and abstracts of records captured by the GPD systematic search are screened by trained research staff to identify potentially eligible research that satisfies the following criteria:
• Document is dated between 1950 and present
• Document is unique (i.e., not a duplicate)
• Document is about police or policing
• Document is an eligible document type (e.g., not a book review)
Records are excluded if the answer to any one of the criteria is unambiguously "No," and are classified as potentially eligible otherwise. Records classified as potentially eligible at the title and abstract screening stage progress to full-text document retrieval and screening stages.
Full-text eligibility screening