Sixty years of second language aptitude research: A systematic quantitative literature review

Second language (L2) aptitude has been broadly defined as the rate and ease of initially acquiring a second language. Historically, L2 aptitude has been understood as a stable trait that predetermined L2 achievement, regardless of individual learners’ efforts to acquire an L2. This traditional view of L2 aptitude as fixed and stable has led to it being a relatively neglected area of research within second language acquisition (SLA) studies. The little research that was in fact conducted was diagnostic in nature, and mostly used tests such as the Modern Language Aptitude Test (MLAT) to select potentially gifted L2 learners. Given that six decades have passed since the publication of the MLAT, now is a good time to revisit the literature and investigate whether L2 aptitude continues to be viewed as an individual difference of little interest to SLA research. While summative literature reviews of L2 aptitude research have been written, few systematic reviews exist. This article conducts a systematic quantitative literature review (SQLR) to provide a principled, comprehensive and reproducible synthesis of research into L2 aptitude published over the last 60 years (1959–2019). In this SQLR, close to one


| INTRODUCTION
Second language aptitude (henceforth, L2 aptitude) has been broadly defined as the rate and ease of initially acquiring a second language (Carroll, 1981). Paradoxically, while many scholars consider aptitude to be one of the main predictive individual variables in second language acquisition (SLA), it has attracted less interest than other variables (e.g., motivation; see Dörnyei, 2010), as shown by the dearth of empirical studies published until recently (Wen et al., 2017). This situation may stem from the assumption that aptitude is an immutable 'talent' or 'gift' (Mitchell et al., 2019) that learners either have or lack, with little room for enhancement. A second factor explaining the limited interest in aptitude research has been its perceived link with Audiolingualism (Skehan, 1998), which led scholars to question its relevance to current language teaching methodologies.
While new perspectives have emerged since Carroll and Sapon (1959) published the Modern Language Aptitude Test (MLAT), most narrative reviews of L2 aptitude (e.g., Spolsky, 1995; Wen et al., 2017) still focus on traditional questions, such as whether aptitude predicts language learning success, whether it is modular or monolithic in nature, or whether it is amenable to experience. But new questions are now emerging, setting new research agendas (cf. Dörnyei, 2010; Robinson, 2012; Sawyer & Ranta, 2001; Skehan, 1998, 2002; Wen et al., 2019). In terms of systematic reviews of the field, three meta-analyses have been conducted (Li, 2015, 2016, 2017); they make valuable contributions but are limited to correlational data and thus cannot give a comprehensive overview of the field. In addition, Li (2019) provides a critical overview of the field, combining the results of the meta-analyses with a narrative review.
The 60th anniversary of the publication of the MLAT is thus a timely occasion to revisit the literature on L2 aptitude, both to interrogate the claim that research into second language aptitude has been neglected and to investigate whether the motivations for conducting aptitude testing have shifted over the years. These analyses allow us to identify current trends and emerging questions in L2 aptitude research.
To address these issues, we conducted a systematic quantitative literature review (SQLR; Pickering & Byrne, 2014) of empirical research on L2 aptitude. In this review, we establish what topics and issues have been researched, who has undertaken this research, where it was published, where the studies were conducted, and what types of measures and methods were used. This allowed us to identify emerging trends and future directions for research. This article thus makes a critical contribution to the field by mapping the current state of the literature on L2 aptitude. It will be of relevance both to researchers new to the field, who can quickly gain an understanding of its main research questions, and to researchers already in the field, who may find new insights to inform their own work.
The article is organised as follows. First, we introduce the concept of L2 aptitude. This is followed by a description of the methodology used to design and conduct the systematic review so as to ensure its objectivity and reproducibility. We then present the results of the SQLR, summarising the general characteristics of L2 aptitude research, and discuss the findings in the context of current L2 aptitude research. We argue that L2 aptitude research is not 'dead' but has continued over the past 60 years, albeit sparsely at times. Importantly, the data also show that the understanding of aptitude has become more nuanced and thus deserves further study.

| Second language aptitude
L2 aptitude is an umbrella term that broadly refers to a 'talent' for language learning. In L2 aptitude research, this talent is defined as the ease and rate at which the L2 is initially acquired (Carroll, 1981). Based on this definition, the term 'aptitude' appears to refer to a real-world phenomenon, but fails to describe it in any objective or meaningful way. Definitions of L2 aptitude in the literature (cf. Dörnyei, 2010; Robinson, 2012; Sawyer & Ranta, 2001; Skehan, 1998, 2002) suffer from a lack of clarity, with no conceptual consensus emerging (Rogers et al., 2017). Most definitions of L2 aptitude are based on Carroll's work with the MLAT (Carroll & Sapon, 1959) or on other instruments that seek to measure this 'talent' for language learning (see Granena, 2013, for the LLAMA test; Grigorenko et al., 2000, for CANAL-FT 1; and Linck et al., 2013, for Hi-LAB 2). Indeed, the only point of agreement among researchers appears to be that L2 aptitude is what L2 aptitude tests measure (Dörnyei, 2010; Singleton, 2017). In other words, theories of L2 aptitude are derived from the power of these tests to predict L2 achievement. As a result, scholars agree that further research is needed to develop a better understanding of the concept of L2 aptitude (Singleton, 2017; Wen et al., 2017).
Much of L2 aptitude research has focused on measurement as a way of screening learners to determine their (un)suitability for foreign language instruction (Spolsky, 1995). Yet, while L2 aptitude may not be readily defined, tests such as the MLAT have reliably predicted L2 achievement for over 60 years, and the MLAT is arguably still the benchmark test of L2 aptitude (Sasaki, 2012). Indeed, L2 aptitude has been identified, along with L2 motivation, as one of the two best predictors of L2 achievement (Dörnyei, 2010; Wen et al., 2017). The MLAT's benchmark status stems primarily from its high and consistent levels of validity (r = 0.4-0.6) and reliability (r = 0.55-0.92; Sasaki, 2012) when applied to methods of instruction beyond Audiolingualism, as well as to both formal and naturalistic contexts of acquisition (Sawyer & Ranta, 2001).
Once the MLAT was fully developed and standardised, Carroll (1962) reverse-engineered a post-hoc formulation of L2 aptitude through factor analysis of the test data, delimiting the construct to a subset of cognitive and perceptual linguistic abilities that lead to faster and easier language learning. This analysis resulted in his four-factor construct of L2 aptitude (see Table 1), which remains highly influential today (Skehan, 2002; Wen et al., 2017).
Factor analyses of L2 aptitude measures clearly showed the construct to be componential rather than unitary (Carroll, 1981), although early research tended to use aggregated L2 aptitude test scores. Further research on the components of L2 aptitude has shown that L2 aptitude can be broadly categorised into phonological abilities, language-analytic abilities (comprising inductive language learning and grammatical sensitivity) and memory abilities (Sasaki, 1993a, 1993b; Skehan, 2002).
In summary, L2 aptitude is a complex construct that subsumes various abilities implicated in L2 learning. These abilities are most readily assessed through tests that purport to measure the construct, although MLAT-like tests lack a rigorous theoretical basis. L2 aptitude is thus best understood as an umbrella term covering various sub-components.

| METHODOLOGY
A systematic quantitative literature review (SQLR) was performed following the well-established guidelines set out in Pickering and Byrne (2014). An SQLR aims to (1) provide a comprehensive mapping of a field of inquiry; (2) provide an explicit and reproducible method for identifying and selecting literature; (3) summarise the field at the 'big picture' level; (4) extend beyond correlational data while offering a quantitative view of the field; and (5) uncover broader generalisations and limitations of the field.
To be included in the current review, each source had to meet all four of the following criteria: (1) be a journal article or a PhD thesis 3; (2) pertain to empirical research that included some aspect of L2 aptitude; (3) employ one of the established and readily available instruments for measuring L2 aptitude 4; and (4) be published in English. 5 Original research papers and theses presenting empirical studies were obtained from the following scholarly electronic databases: ERIC, JSTOR, ProQuest, SAGE, Web of Science and Wiley. 6 Searches were conducted periodically across all the databases between May 7, 2016 and August 24, 2020.
For each database, the same keyword search was employed (see Table 2). Keywords from column 1, which identify L2-aptitude-specific measures and tests, were combined with keywords from column 2, which delimit the field of research to second language learning. The keyword phrases in Table 2 are written exactly as they were used in the search. For instance, the first search for each database combined keywords from column 1 with keywords from column 2 as follows: ("modern language aptitude test" OR "MLAT") AND ("second language learning" OR "second language acquisition" OR "foreign language learning").
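The search-string construction described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' tooling: the function names are hypothetical, the column-1 terms are assumed to be ORed together and then ANDed with the ORed column-2 terms, and only the keywords quoted in the example are used, because the full contents of Table 2 are not reproduced here.

```python
def or_group(terms):
    """Quote each keyword phrase and join the phrases with OR inside parentheses."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def build_query(column1_terms, column2_terms):
    """AND a column-1 (aptitude measure) group with a column-2 (field) group."""
    return f"{or_group(column1_terms)} AND {or_group(column2_terms)}"


# Illustrative keywords: only the terms quoted in the example search are used;
# the remaining rows of Table 2 are not reproduced here.
mlat_terms = ["modern language aptitude test", "MLAT"]
field_terms = [
    "second language learning",
    "second language acquisition",
    "foreign language learning",
]

query = build_query(mlat_terms, field_terms)
print(query)
```

Running the sketch prints the same query string as the example in the text; further searches would substitute another row of column-1 keywords while keeping the column-2 group fixed.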
Based on the guidelines in Pickering and Byrne (2014), the following data categories were created: publication details, geographical information, research methods used, participant demographic information, and variables measured.Data were extracted from each paper according to the categories and definitions detailed in Appendix A.

TABLE 1 Carroll's (1962, pp. 129-130) four factors of L2 aptitude

Phonetic coding: the ability to meaningfully store auditory information for access at a later time
Grammatical sensitivity: the ability to "handle grammar" and discern the functions of words in various contexts
Rote memory for foreign language materials: the ability and capacity to memorise a number of associations from the input materials, similar but independent to phonetic coding, encompassing more than just phonetic information
Inductive language learning: the ability to infer linguistic forms, rules, and patterns from new linguistic content [...] with a minimum of supervision or guidance

| Study selection
As Figure 1 shows, the initial screening of the keyword database searches yielded 646 unique sources, which were assessed for eligibility. Of these 646 sources, 76 were excluded for being non-empirical (e.g., theoretical reviews, book reviews), book chapters, or unrelated. Of the remaining 570 sources, 477 were excluded for not using an established and readily available test.
In this SQLR, the term 'source' refers to a publication, either a journal article or a thesis, and the term 'study' refers to research where data were collected from a group of participants and used for analysis.A source can report on more than one study, with each study being recorded separately in the review data set, as long as the participants and/or the data differ between the studies.This situation arose where replications or follow-up studies had been conducted.VanPatten et al. (2013), for instance, reported on four related studies, each with a variation in research design, e.g. a different target grammatical construction, and with a different sample of participants.In total, the review covers 93 sources and 111 studies (for a complete list of references included in the review, see Appendix B).
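The selection counts can be checked with simple arithmetic. The Python sketch below reproduces the flow from the 646 screened sources to the 93 included sources, together with the resulting average number of studies per source; the variable names are our own illustrative labels, not part of the review protocol.

```python
# Counts reported for the study-selection process (see Figure 1).
identified_sources = 646
excluded_non_empirical_chapters_unrelated = 76
excluded_no_established_test = 477
reported_studies = 111

eligible_after_first_screen = identified_sources - excluded_non_empirical_chapters_unrelated
included_sources = eligible_after_first_screen - excluded_no_established_test

# A source may report several studies, so studies outnumber sources.
studies_per_source = reported_studies / included_sources

print(eligible_after_first_screen)   # 570
print(included_sources)              # 93
print(round(studies_per_source, 2))  # 1.19
```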

| RESULTS
Following Pickering and Byrne's (2014) guidelines, the results of the review are structured as follows.We first consider the publication details, with a focus on journals, types of publication, and authors most prevalent in L2 aptitude research.We then explore the geographical spread of L2 aptitude research, looking at the institutional affiliation of the researchers conducting the studies as well as the sites of data collection.Demographic information about participants in L2 aptitude studies is considered, followed by a discussion of the research methods used in the original studies.Finally, we summarise key trends, challenges and future research directions.

| Publication details of L2 aptitude research
As Table 3 shows, The Modern Language Journal and Language Learning together account for 36.8% of all journal articles on L2 aptitude published over the period of the review. Overall, the top six journals publishing L2 aptitude research account for 73.7% of the journal articles in the review (i.e., excluding the 17 dissertations). The remaining 15 journals have each published only one or two studies.
As Figure 2 shows, The Modern Language Journal has also been the most consistent publisher of L2 aptitude research over the period under consideration.From 2013, however, Language Learning and Studies in Second Language Acquisition were the most active journals publishing research on L2 aptitude.
Figure 3 indicates that the publication of journal articles from the 1960s to 1990 was sporadic, and only three dissertations were published during that period. During the 1990s, however, dissertations became an important source of research in the field, while a low number of articles continued to appear in SLA journals. This pattern has since changed, with a comparatively sharp increase in journal articles published since 2012 and a relative decrease in the number of dissertations.

TABLE 3 Journal/publisher, number of studies and percentage of studies (%)

FIGURE 1 Overview of the exclusion/inclusion process
Table 4 shows that the 14 most-published authors account for 36.5% of authorship, while 102 of the 117 remaining authors have only one publication to their name. Note that for the purposes of our SQLR, authorship is assigned individually to each contributing author of a paper. Consequently, Abrahamsson and Hyltenstam (2008), for instance, counts twice: one publication for Abrahamsson and one for Hyltenstam. Eight of the top 14 authors were involved in Sparks and Ganschow's research agenda, which resulted in a number of multi-authored articles (see the references in Appendix B). While this counting method may appear to skew the picture, Table 4 clearly shows that 63.5% of contributions to the field come from researchers who have individually published just one or two empirical research papers on L2 aptitude. The result is a widely dispersed authorship that makes it difficult to summarise the field meaningfully.
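The authorship-counting rule described above, in which every contributing author receives one credit per paper, can be sketched in Python. This is an illustrative reconstruction of that stated rule only; apart from the Abrahamsson and Hyltenstam (2008) record, the author names and papers are hypothetical placeholders.

```python
from collections import Counter


def count_authorship(papers):
    """Credit one publication to each contributing author of every paper."""
    credits = Counter()
    for authors in papers:
        # set() ensures an author is credited at most once per paper
        credits.update(set(authors))
    return credits


# Only the first record comes from the text; the others are hypothetical.
papers = [
    ["Abrahamsson", "Hyltenstam"],  # Abrahamsson & Hyltenstam (2008)
    ["Author A"],
    ["Author A", "Author B"],
]

credits = count_authorship(papers)
single_paper_authors = [name for name, n in credits.items() if n == 1]

print(credits["Abrahamsson"], credits["Hyltenstam"])  # 1 1
print(sum(credits.values()))      # 5 authorship credits from 3 papers
print(len(single_paper_authors))  # 3
```

Because credits are summed per author, a multi-authored paper contributes several credits, which is why a productive research group can raise the share held by the most-published authors.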

| Geographical scope
The geographical scope of L2 aptitude research is summarised in two ways: a count of the countries where authors' research institutes were located and a count of the countries where data were collected for each study. Table 5 shows the ordered counts of these two summaries side by side. Clearly, the USA (and North America more broadly) dominates L2 aptitude research, both as the location of research institutes and as a site of data collection. Specifically, authors of L2 aptitude research were affiliated with research institutes in the USA 149 times out of a total of 208, or 71.6%. The next most represented country, the United Kingdom, accounted for only 12, or 5.8%. A similar situation is evident for the countries where data were collected. Data for L2 aptitude research were collected from the USA 70 out of 116 times, or 61.9%. The next country from which data were most often collected, Japan, accounted for 13, or 11.5%.
Clearly, most of what is known about L2 aptitude comes from the US context. L2 aptitude research in the sources reviewed is heavily biased towards language learning in the USA, with the vast majority of [...]. Table 5 shows a slightly wider geographical range of countries from which data were collected than the institutional affiliations would suggest.

| Participants
The SQLR data show considerable heterogeneity in participants' profiles across studies. The average ages of participants ranged from 6.9 years (Sparks et al., 2009) to 44.1 years (Sheffield, 1993). Participants were only slightly more likely to be female (55.8%) than male (44.2%). While the most frequently reported educational level of participants was university (50.0%), around a third of participants had high school education only (35.2%), and a small fraction (0.8%; Dąbrowska, 2018) had no formal education. Aside from English (48.4%), the most common L1s spoken by participants were Japanese (8.2%), Chinese (7.5%) and Spanish (5.7%), with a total of 31 L1s recorded across all studies (see Supporting Information S3). The most common L2s being learned by participants were Spanish (21.0%), French (15.6%), English (15.1%) and German (9.7%), with a total of 37 L2s recorded across all studies (see Supporting Information S4).
Plainly, the data collected from these participants are not representative of the general population. This is unsurprising, given the contexts and motivations that historically drove the use of L2 aptitude tests. Further research into L2 aptitude could broaden the demographic profile of participants, as well as improve reporting practices for demographic information.

| Research methods
The research methods of L2 aptitude testing reflect its psychometric nature.Typically, studies are quantitative (93.7% of studies) rather than qualitative (1.8%) or mixed (4.5%).The orientation towards quantitative studies may be a result of the initial motivation of research in this area, which was diagnostic.
There is a tendency in L2 aptitude studies towards longitudinal (54.1%) rather than cross-sectional (40.5%) or mixed (5.4%) designs, although the two orientations are relatively balanced. The categories, however, are not clear-cut, and much depends on how the term 'longitudinal' is defined (see Appendix A). 7 The most typical studies in this category follow a design in which an initial L2 aptitude test is administered, an educational intervention is conducted, and correlations between aptitude test scores and final L2 achievement scores are established (Wen et al., 2017).
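The final, correlational step of this typical design can be illustrated with a Pearson correlation between intake aptitude scores and end-of-intervention achievement scores. The Python sketch below uses invented scores for five hypothetical learners; it shows the kind of coefficient such studies report (cf. the validity range of r = 0.4-0.6 cited for the MLAT), and is not a reconstruction of any reviewed study's analysis.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two paired score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired score lists of length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the means
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a variable has no variance")
    return cov / sqrt(ss_x * ss_y)


# Hypothetical scores for five learners: aptitude at intake,
# achievement after the instructional intervention.
aptitude = [42, 55, 61, 48, 70]
achievement = [58, 64, 75, 60, 80]

r = pearson_r(aptitude, achievement)
print(f"r = {r:.2f}")
```

With only five invented learners the coefficient is far higher than typical published values; real studies work with larger samples and report significance tests alongside r.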
Although studies in L2 aptitude tend to be observational (58.6%) rather than experimental (41.4%), the number of experimental studies is on the rise, as shown by Figure 4.
Studies of L2 aptitude have differed in their focus, the measures used for determining L2 aptitude, as well as in the dependent and independent variables taken into account.These differences will be discussed in the following subsections.We first consider the foci of L2 aptitude research as well as the measures used, before turning to consider the dependent and independent variables used in L2 aptitude research.Our analysis includes summary statistics and a summary of the trends over time for these areas in L2 aptitude research.

| Foci and measures of L2 aptitude research
In approximately two-thirds of the studies included in the SQLR, L2 aptitude has been used as an explanatory variable for L2 acquisition.As indicated in Table 7, only 30.6% of the studies have explicitly investigated the nature of L2 aptitude itself and/or of its components (e.g., Cox et al., 2019;Doughty, 2019;Granena, 2016;Li et al., 2019).
Figure 5 shows a shift in focus of L2 aptitude research included in this review.Studies only began to include L2 aptitude as an explanatory variable in the 1990s, with a marked increase in these types of studies occurring over the last decade.In contrast, research into L2 aptitude itself has been occurring since the late 1960s, although the number of studies with this focus has also increased since the 1990s.This trend in the focus of L2 aptitude-related research fits Dörnyei's (2010) observation that the 1960s to the 1990s was a time of research into L2 aptitude test development, with the period from the 1990s onwards, particularly the last decade, being marked by renewed interest in a range of research streams (see Wen et al., 2017).Evidently, research into L2 aptitude has shifted from a diagnostic/predictive focus to one of explanation.
Table 8 shows the usage of the different L2 aptitude measures, with a measure being counted each time it was used for analysis in a study. Some studies analysed the relationship between overall scores on an L2 aptitude battery and overall scores on an L2 achievement test, and also analysed individual sub-tests of L2 aptitude against specific aspects of L2 achievement. In such cases, the study was counted twice in Table 8. For example, in Doughty (2019), overall L2 aptitude test scores (MLAT, Hi-LAB) were analysed against successful completion of L2 courses, while sub-tests were analysed as predictors of L2 skills, e.g., speaking and reading. In total, 50 unique measures of L2 aptitude were used in 228 instances (for the full table, see Supporting Information S5).
Table 8 highlights two aspects of these data: the dominance of the MLAT tests as a measure of L2 aptitude, and the use of overall versus specific measures of L2 aptitude. L2 aptitude research has been dominated by the MLAT, which accounts for 58.8% of all L2 aptitude measures used in the reviewed research. 8 The LLAMA tests are the second most used measure of L2 aptitude, accounting for 19.7% of all measures.
Since 2010, a marked rise in the use of the LLAMA tests to measure L2 aptitude is apparent (see Figure 6). Importantly, however, this increase does not coincide with a decrease in the use of the MLAT tests. Also worth noting is the increased usage over the last decade of other measures of L2 aptitude, e.g., CANAL-FT (Grigorenko et al., 2000), PLAB (Pimsleur, 1966) and Hi-LAB (Linck et al., 2013).
The data on the measures of L2 aptitude also show an interesting trend in the use of composite scores (that is, a total score that is the sum of all individual tests in a test battery) versus individual test scores. We refer to these types of L2 aptitude measures as 'whole' and 'specific', respectively. Although the MLAT Long, a 'whole' measure of L2 aptitude, is the most commonly used single measure, specific measures of L2 aptitude have dominated the field overall (see Table 9). While many studies included both whole and specific measures of L2 aptitude (e.g., Cummins & Gulutsan, 1975; Doughty, 2019; Sheffield, 1993; Winke, 2013), the prevalence of specific measures suggests that research has, on the whole, operationalised L2 aptitude as a componential construct (Wen et al., 2017). Figure 7 shows the trends in the use of whole compared with specific measures of L2 aptitude. Again, the trend over the last decade is towards asking more specific questions about L2 aptitude, with increasing use of specific measures.
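The distinction between 'whole' and 'specific' measures, and the counting rule described for Table 8, can be sketched in Python. The sub-test labels are the five parts of the MLAT, used here only as labels; all scores and study records are invented, and summing raw part scores follows the definition of a composite given above rather than any official MLAT scoring procedure.

```python
from collections import Counter

# Invented raw scores for one hypothetical learner; the keys are the
# five parts of the MLAT, used here purely as labels.
subtest_scores = {
    "Number Learning": 30,
    "Phonetic Script": 25,
    "Spelling Clues": 32,
    "Words in Sentences": 28,
    "Paired Associates": 20,
}


def whole_score(scores):
    """'Whole' measure: composite score = sum of all sub-test scores in the battery."""
    return sum(scores.values())


def specific_scores(scores, subtests):
    """'Specific' measures: individual sub-test scores kept for separate analysis."""
    return {name: scores[name] for name in subtests}


print(whole_score(subtest_scores))  # 135
print(specific_scores(subtest_scores, ["Words in Sentences"]))

# Tallying measure use: a study analysing both whole and specific scores
# is counted once for each kind (hypothetical study records).
measure_uses = [("Study 1", "whole"), ("Study 1", "specific"), ("Study 2", "specific")]
tally = Counter(kind for _, kind in measure_uses)
print(tally)
```

Keeping the sub-test scores separate is what allows a study to relate, say, a grammatical-sensitivity sub-test to a specific grammatical outcome instead of relating one aggregate score to overall achievement.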

| Dependent and independent variables used in L2 aptitude research
Table 10 shows the different types of dependent variables used in the studies included in the review (see Appendix A for definitions of each type of dependent variable). Overall, the four most common dependent variables in L2 aptitude research account for 78.3% of all dependent variables. As expected, all four relate to measuring L2 achievement at a general or specific level, that is, (i) overall L2 achievement (21.7%), e.g., school grades (Muñoz, 2017); (ii) specific L2 skill (20.9%), e.g., L2 listening (Davies, 1971); (iii) specific L2 achievement (18.3%), e.g., L2 pronunciation and fluency (Saito, 2017); and (iv) specific L2 knowledge (17.4%), e.g., explicit L2 metalinguistic knowledge (Granena, 2014). The remaining 21.7% of dependent variables included in L2 aptitude research are quite varied and have typically been investigated only once. This focus on L2 achievement is to be expected given that L2 aptitude tests were originally designed to predict L2 achievement (Carroll, 1962).
Figure 8 evidences the trends in the use of the four most common dependent variables in L2 aptitude research for the period under review.It clearly indicates that although overall L2 achievement is still being researched consistently, the last decade has seen a rapid rise in research that includes the more specific types of dependent variables.Once again, the data suggest that rather than merely applying L2 aptitude testing to predict overall L2 achievement scores, L2 aptitude research is becoming more nuanced and focusing on more specific questions.
The six independent variables most frequently included in L2 aptitude research make up half (50.1%) of all independent variable instances in the reviewed research (see Table 11). Another 53 independent variables account for the remaining 49.9% (see Supporting Information S6 for a full overview). A small number of independent variables thus accounts for most of the research, while numerous others have been considered in very few studies. The six most frequently investigated independent variables include some that were considered during the initial period of test development, from 1959 to 1990 (Dörnyei, 2010), and some that were proposed subsequently. Notably, although Carroll (1962) discussed intelligence explicitly, and could be said to have treated affective variables (e.g., attitude, anxiety and motivation) as a function of time spent on task, L1 skills and working memory were not identified as relevant factors at the time.
Figure 9 shows the trends in the use of the three most common independent variables in L2 aptitude research over the period of this review, and makes apparent that L1 skills and working memory are more recent developments in L2 aptitude research. Intelligence was the independent variable most consistently included. While L1 skills were most prominent during the 1990s, working memory has become the most researched of the three over the last decade. Thus, the data support the claim that L2 aptitude research has evolved markedly beyond Carroll's original four-factor construct (see Carroll, 1962).

| TRENDS, CHALLENGES AND FUTURE DIRECTIONS
The aim of this systematic review was to survey the field of L2 aptitude research and document where research has been conducted and disseminated, by whom, on what aspects, by which methods, and with what findings. This method allowed us to identify shifts in perspectives, emerging trends, limitations of the findings, and gaps in the literature that merit further investigation. In what follows, we summarise key findings, identify some challenges and suggest areas for future research.

| Research on L2 aptitude is not 'dead'
The SQLR shows that, despite the limited interest that L2 aptitude has attracted in the SLA literature compared with other individual variables, research continued to be published, albeit sporadically, until the 1990s. As seen in Figure 3, that decade saw a resurgence of interest in L2 aptitude research. Interestingly, this resurgence occurred initially in PhD dissertations. More recently, there has been a comparatively sharp increase in the number of journal articles published. Indeed, special issues published since the cut-off date of this review (see Doughty & Mackey, 2021; Li & DeKeyser, 2021) and edited books (e.g., Wen et al., 2019) show that the upward trend in research into L2 aptitude continues. This trend may stem from a renewed interest in L2 aptitude, focused on more nuanced questions and novel methodologies, as shown in this overview.
FIGURE 7 Trends in the use of whole versus specific measures of L2 aptitude, 1959-2019

| Research on L2 aptitude is geographically and demographically limited
The SQLR shows that L2 aptitude research has a comparatively long but geographically and demographically limited history. Geographically, L2 aptitude research is heavily biased towards North America, particularly the USA and, to a lesser degree, Canada (see Table 5). Two explanations are possible: (1) that this review was limited to English-language journals 9; and (2) that, compared with other countries, more academics conduct L2 aptitude research in the USA, typically on data collected from North American participants. Of these participants, the majority are L1 English-speaking high school or university students learning foreign/second languages. Clearly, this data set cannot be considered representative of the wider population of language learners, most of whom are not L1 English speakers learning languages at high school or university. Indeed, the question needs to be asked whether the published research findings on L2 aptitude are generalisable to diverse populations.
TABLE 11 The most frequently included independent variables in L2 aptitude studies

| Reporting practices are patchy
Participants in this research are poorly understood, as the reporting of their background characteristics is inconsistent. If the background characteristics of samples are unknown, true comparisons are not possible, and meta-analyses based on these data would therefore be infeasible. This, in turn, compromises the interpretability of the results. For example, socioeconomic status is considered an important variable in educational outcomes (Reardon et al., 2014), yet only 16.2% of all studies report on this variable. Additionally, fewer than 60% of the studies reported on previous language learning experience, despite findings suggesting that this experience influences L2 aptitude test scores (Rogers et al., 2017). More research is needed to better understand which background factors are relevant to the study of L2 aptitude (Rogers et al., 2016, 2017).
More consistent and detailed reporting of participants' background characteristics would be especially useful if L2 aptitude research expands beyond its current focus on L1 English-speaking, high-school- and university-aged students in North America. Our SQLR also shows that the field is primarily of interest to SLA researchers, as most studies have been published in journals focused on language learning (see Figure 2), which is understandable given the field's original motivation. Moreover, L2 aptitude research appears to be a fractured field, with little consensus and few long-term research agendas. Notable exceptions in the period under review are Sparks, Ganschow and their colleagues.
Curiously, the fracturing of the field is further evidenced by the fact that individual researchers who have published PhD theses on L2 aptitude do not appear to have published subsequent research articles in the field.This suggests that most researchers do not focus on L2 aptitude itself, but rather include L2 aptitude as one of the variables of interest in their studies.

| Research on L2 aptitude is overwhelmingly quantitative
The SQLR makes it clear that, methodologically, L2 aptitude studies are overwhelmingly quantitative, focusing on comparing L2 aptitude scores with other predictors of L2 achievement.The data are thus correlational, examining statistical relationships between aggregated measures of L2 aptitude, L2 achievement, and to some extent intelligence, attitudes and motivation.
Although earlier L2 aptitude research tended to focus on aggregate scores from a battery of tests, more recent research examines specific tests addressing one area of L2 aptitude to explain outcomes in specific areas of L2 learning. Coinciding with this trend is the increasing use of L2 aptitude tests other than the MLAT. The best known of these are the LLAMA tests, which are free and computer-based (Rogers et al., 2017), although a range of other tests also exists. Notably, our review only partially captures the development of new L2 aptitude tests. The most prominent of these is the Hi-LAB (Doughty, 2014), which purports to predict high levels of L2 ultimate attainment with tests measuring various aspects of memory (with a particular focus on working memory), implicit learning, processing speed and auditory perceptual acuity (see also Linck et al., 2013). Significantly, all these new measures remain quantitative, reflecting the psychometric origins of L2 aptitude research.

| Variables investigated
L2 aptitude research has shifted from a predictive to an explanatory focus, a change that coincides with renewed interest in L2 aptitude within SLA studies. Variables investigated more recently go beyond Carroll's (1962) original four-factor construct of L2 aptitude and its traditionally related variables (i.e., intelligence and motivation) to include new and important factors such as L1 skills and working memory (see Figure 9). Recent research indicates that relationships between these variables do exist and merit further investigation (Sparks & Patton, 2013), given their potential theoretical and practical implications. Theoretically, the focus on new variables should lead us to interrogate precisely what L2 aptitude comprises and to reconsider its constituent components, which may necessitate the development of new testing instruments. Practically, reconsidering the components of L2 aptitude might help identify strengths and weaknesses in individual learners' profiles, thus enabling a more personalised approach to language instruction.
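One way a component-level profile could flag individual strengths and weaknesses is sketched below. This is a hypothetical illustration, not a procedure proposed in the reviewed studies: the subtest names, the use of cohort-referenced z-scores and the one-standard-deviation cut-off are all our assumptions.

```python
from statistics import mean, stdev


def aptitude_profile(
    learner: dict[str, float],
    cohort: dict[str, list[float]],
    threshold: float = 1.0,
) -> dict[str, str]:
    """Label each subtest as a relative strength, weakness, or typical.

    Each learner score is converted to a z-score against the cohort's scores
    on the same subtest; |z| >= threshold is flagged (threshold is an assumption).
    """
    labels = {}
    for subtest, score in learner.items():
        scores = cohort[subtest]
        sd = stdev(scores)  # sample SD; needs at least two cohort scores
        z = (score - mean(scores)) / sd if sd > 0 else 0.0
        if z >= threshold:
            labels[subtest] = "strength"
        elif z <= -threshold:
            labels[subtest] = "weakness"
        else:
            labels[subtest] = "typical"
    return labels


if __name__ == "__main__":
    # Hypothetical cohort and learner scores (illustrative only).
    cohort = {"rote_memory": [10, 12, 14, 16, 18], "phonetic_coding": [5, 6, 7, 8, 9]}
    print(aptitude_profile({"rote_memory": 18, "phonetic_coding": 6}, cohort))
    # -> {'rote_memory': 'strength', 'phonetic_coding': 'typical'}
```

A profile of this kind only becomes meaningful once the constituent components of L2 aptitude, and the tests measuring them, are settled.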

| CONCLUDING REMARKS
The aim of this article was to review 60 years of research into L2 aptitude and to examine whether this construct continues to be viewed as an individual difference of little interest to SLA research. To address this question, we conducted a systematic quantitative literature review of the empirical research published in this field between 1959 and 2019.
The SQLR has shown that, contrary to popular perceptions, research into L2 aptitude is in fact alive and growing, in both volume and sophistication. Having initially been understood as a four-factor construct, L2 aptitude is now being interrogated for its potentially multifactorial nature.
Arguably, one issue worthy of further research is the relationship between L2 aptitude and other complex individual variables, such as motivation (Dörnyei, 2010). Another issue that merits investigation is the potential contribution of working memory to L2 aptitude. Indeed, some current research goes so far as to largely conflate working memory with L2 aptitude (Singleton, 2017; Wen, 2016). This in turn raises the intriguing possibility that L2 aptitude could in fact be amenable to training, a prospect that directly contradicts the notion that L2 aptitude is an immutable gift of the fortunate few.
Collection: The collection of data was categorised according to Dörnyei (2007): Quantitative = numerical data representing predetermined logical scales of values, for example, aptitude test; Qualitative = verbal categories of data determined by examining the data after collection; Mixed = where both these types of data were included, for example, testing followed by interviews.

Location of data collection: The country in which the data were collected, as reported in the study. Up to four locations were recorded for one study.

L2s: Languages being studied by the participants. Other options include: Artificial, referring to studies where a modified language system was used, whether complete or not; Not reported, referring to studies that did not detail the second language being learned by participants; None, referring to studies in which participants were not learning a second language.

L2 aptitude focus (L2 aptitude in study): Each study was categorised as being one of three types.

Control variables: Any variables used to limit the sample of participants in the study, control for differences among groups of participants, or serve as a covariate.

Type of independent variable (IV): The name of the explicit independent variable used in the analysis.

Measure of L2 aptitude: The name of the specific instrument(s) used to measure L2 aptitude or L2 aptitude abilities. This includes the standardised and common tests of L2 aptitude, their adaptations (e.g., translations), or instruments explicitly designed to measure L2 aptitude or an L2 aptitude ability.
FIGURE 2 Top journals that published L2 aptitude research, 1959–2019

FIGURE 3 Number of publications on L2 aptitude between 1959 and 2019

TABLE 4 Most active authors publishing L2 aptitude research, 1959–2019 (for the full table of all authors, see Supporting Information S2)

Geographical scope of L2 aptitude research in terms of countries where authors' research institutes were located and countries from which data were collected

Trends in the use of observational versus experimental study designs, 1959–2019

Focus of L2 aptitude in the research

FIGURE 5 Trends in the use of L2 aptitude in the research, 1959–2019

FIGURE 6 Trends in the use of L2 aptitude test measures in L2 aptitude research, 1959–2019

Use of whole versus specific measures of L2 aptitude across all studies

Trends in the use of the top three independent variables in L2 aptitude research, 1959–2019

4.4 | Interest in L2 research is restricted and fractured

and development' = IF qualitative data that reports on the L2 learning experience and/or development of participants is used to categorise participants, with L2 aptitude test scores used to explain/predict these groupings

TABLE 10 Dependent variables in L2 aptitude studies
Dependent variable type | Number of studies | Percentage of studies (%)

FIGURE 8 Trends in the use of dependent variables in L2 aptitude research, 1959–2019

TABLE A1 (Continued)

L2 aptitude focus (L2 aptitude in study), three types:
'Control variable for selecting/comparing participants' = IF L2 aptitude test scores from different treatment/participant groups in the study are analysed for statistically significant differences before the main analysis is carried out, especially if referred to as a covariate between the groups;
'ID variable explaining outcome' = IF L2 aptitude is NOT the focus of the study AND L2 aptitude is a variable of interest to explain the results of the study;
'L2 aptitude itself' = IF the study develops a new instrument for measuring L2 aptitude OR IF the study aims to expand our understanding of L2 aptitude, especially in relation to other predictors of L2 aptitude, for example, working memory, attitude/motivation, etc.

L2 aptitude studies, Design: The level of interaction with participants that the researchers designed into the study, categorised as: experimental = where the study explicitly introduced a specific treatment and then compared these outcomes against a control; observational = where the study collected data from participants' normal course of L2 learning.

Dependent variables: The dependent variable was determined by the statistical test used for the analysis, for example, the response variable in a general linear model. Categories covered: 'overall L2 achievement' = IF L2 achievement is assessed as a whole without a focus on a single specific skill or area of knowledge, especially if L2 test scores are aggregated into one measure; = IF L2 aptitude test scores were used to predict standardised test scores of L1 skills and knowledge, mainly found in US studies of primary and high school children conducted by Sparks, Ganschow, and colleagues;
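The three-way coding of how a study uses L2 aptitude is stated in Table A1 as IF-rules. A minimal sketch of those rules as a decision function follows, assuming each study has already been coded with a few yes/no judgements. The flag names are hypothetical. The order in which the rules are checked, and the `None` result for studies matching no rule, are our assumptions, since the table does not say how overlapping or unmatched cases were resolved.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StudyFlags:
    """Hypothetical yes/no judgements a coder records for one study."""
    develops_new_aptitude_instrument: bool = False
    aims_to_expand_understanding_of_aptitude: bool = False
    aptitude_explains_results: bool = False
    groups_compared_on_aptitude_before_main_analysis: bool = False


def aptitude_role(s: StudyFlags) -> Optional[str]:
    """Map coded flags to one of the three Table A1 categories.

    Rule order is an assumption: a study about aptitude itself is classified
    as such first; otherwise an explanatory use outranks a purely control use.
    Returns None when no rule fires, so the study can be reviewed by hand.
    """
    aptitude_is_focus = (
        s.develops_new_aptitude_instrument
        or s.aims_to_expand_understanding_of_aptitude
    )
    if aptitude_is_focus:
        return "L2 aptitude itself"
    if s.aptitude_explains_results:  # aptitude is NOT the focus here
        return "ID variable explaining outcome"
    if s.groups_compared_on_aptitude_before_main_analysis:
        return "Control variable for selecting/comparing participants"
    return None


if __name__ == "__main__":
    print(aptitude_role(StudyFlags(groups_compared_on_aptitude_before_main_analysis=True)))
    # -> Control variable for selecting/comparing participants
```

Writing the rules out this way makes any precedence between categories explicit, which is one route to the reproducibility an SQLR aims for.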