Interventions for improving executive functions in children with foetal alcohol spectrum disorder (FASD): A systematic review

Abstract Background The consequences for children born with birth defects and developmental disabilities encompassed by foetal alcohol spectrum disorder (FASD) are profound, affecting all areas of social, behavioural and cognitive functioning. Given the strong evidence for a core deficit in executive functioning, underpinned by impaired self‐regulation skills, there has been a growing focus on the development of interventions that enhance or support the development of executive functions (EFs). Objectives The primary objective of this review is to synthesise the evidence for structured psychological interventions that explicitly aim to improve EF in children. The review also sought to ascertain whether the effectiveness of interventions was influenced by characteristics of the intervention, participants or type of EF targeted by the intervention. Search Methods Sixteen databases, 18 grey literature search locations and 9 trial registries were systematically searched to locate eligible studies (up to December 2020). These searches were supplemented with reference harvesting, forward citation searching, hand searches of topic‐relevant journals and contact with experts. Selection Criteria Studies were included in the review if they reported on an impact evaluation of a psychological intervention aiming to improve EF in children 3–16 years who either had confirmed prenatal alcohol exposure or a formal diagnosis falling under the umbrella term of FASDs. Eligible study designs included randomised controlled trials (RCTs) and quasi‐experimental designs with either no treatment, wait list control or an alternative treatment as a comparison condition. Single‐group pre‐post designs were also included. Data Collection and Analysis Standard methodological procedures expected by the Campbell Collaboration were used at all stages of this review.
Standardised mean differences (SMDs) were used to estimate intervention effects, which were combined with random effects meta‐analysis (data permitting). Risk of bias was assessed using the Cochrane Risk of Bias Tool (RoB2) and Cochrane Risk of Bias in Non‐Randomised Studies‐Interventions tool (ROBINS‐I). Main Results The systematic search identified 3820 unique records. After title/abstract and full‐text screening, 11 eligible studies (reported in 21 eligible documents) were deemed eligible, with a combined 253 participants. Of the 11 studies, 6 were RCTs, 1 was a quasi‐experiment and 4 were single‐group pre‐post intervention designs. All studies were rated as having an overall high or serious risk of bias, with some variation across domains for RCTs. For RCT and quasi‐experimental studies, the overall effect of EF interventions on direct and indirect measures of EF generally favoured the experimental condition, but was not statistically significant. There was no difference between intervention and comparison groups on direct measures of auditory attention (k = 3; SMD = 0.06, 95% confidence interval [CI] = −1.06, 1.18), visual attention (k = 2; SMD = 0.90, 95% CI = −1.41, 3.21), cognitive flexibility (k = 2; SMD = 0.23, 95% CI = −0.40, 0.86), attentional inhibition (k = 2; SMD = 0.04, 95% CI = −0.58, 0.65), response inhibition (k = 3; SMD = 0.47, 95% CI = −0.04, 0.99), or verbal working memory (k = 1; d = 0.6827; 95% CI = −0.0196, 1.385). Significant heterogeneity was found across studies on measures of auditory attention and visual attention, but not for measures of cognitive flexibility, attentional inhibition or response inhibition. Available data prohibited further exploration of heterogeneity. 
There was no statistical difference between intervention and comparison groups on indirect measures of global executive functioning (k = 2; SMD = 0.21, 95% CI = −0.40, 0.82), behavioural regulation (k = 2; SMD = 0.18, 95% CI = −0.43, 0.79), or emotional control (k = 3; SMD = 0.01, 95% CI = −0.33, 0.36). Effect sizes were positive and not significant for meta‐cognition (k = 1; SMD = 0.23, 95% CI = −0.72, 1.19), shifting (k = 2; SMD = 0.04, 95% CI = −0.35, 0.43), initiation (k = 1; SMD = 0.04, 95% CI = −0.40, 0.49), monitoring (k = 1; SMD = 0.25, 95% CI = −0.20, 0.70) and organisation of materials (k = 1; SMD = 0.25, 95% CI = −0.19, 0.70). Effect sizes were negative and not statistically different for effortful control (k = 1; SMD = −0.53, 95% CI = −1.50, 0.45), inhibition (k = 2; SMD = −0.08, 95% CI = −0.47, 0.31), working memory (k = 1; SMD = 0.00, 95% CI = −0.45, 0.44), and planning and organisation (k = 1; SMD = −0.10, 95% CI = −0.55, 0.34). No statistically significant heterogeneity was found for any of the syntheses of indirect measures of EF. Based on pre‐post single‐group designs, there was evidence for small to medium sized improvements in EF based on direct measures (cognitive flexibility, verbal working memory and visual working memory) and indirect measures (behavioural regulation, shifting, inhibition and meta‐cognition). However, these results must be interpreted with caution due to high risk of bias. Authors' Conclusions This review found limited and uncertain evidence for the effectiveness of interventions for improving executive functioning in children with FASD across 8 direct and 13 indirect measures of EF. The findings are limited by the small number of high‐quality studies that could be synthesised by meta‐analysis and the very small sample sizes for the included studies.
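As an illustration of how the reported intervals relate to statistical significance, the sketch below back-calculates an approximate standard error, z statistic and two-sided p value from a pooled SMD and its 95% CI. It assumes the interval is symmetric and based on a normal approximation, which the abstract does not state. The function name `z_and_p_from_ci` is hypothetical, and the inputs are the response inhibition estimate reported above (SMD = 0.47, 95% CI = −0.04, 0.99).

```python
from statistics import NormalDist


def z_and_p_from_ci(smd, lower, upper, level=0.95):
    """Back-calculate SE, z and two-sided p from a symmetric CI.

    Assumption (not stated in the review): the CI was computed as
    estimate +/- critical value * SE using the normal distribution.
    """
    std_normal = NormalDist()
    crit = std_normal.inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% CI
    se = (upper - lower) / (2 * crit)           # half-width divided by crit
    z = smd / se
    p = 2 * (1 - std_normal.cdf(abs(z)))        # two-sided p value
    return se, z, p


# Response inhibition estimate as reported in the abstract.
se, z, p = z_and_p_from_ci(0.47, -0.04, 0.99)
print(f"SE = {se:.3f}, z = {z:.2f}, p = {p:.3f}")
```

Under these assumptions the p value for response inhibition falls above .05, which is consistent with an interval that includes zero.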

1 | PLAIN LANGUAGE SUMMARY

1.1 | Limited evidence on interventions to improve executive functions in children affected by prenatal alcohol exposure
This review found limited evidence for the effectiveness of interventions designed to improve executive function (EF) in children with prenatal alcohol exposure (PAE). However, the ability to detect overall effects is severely limited by a lack of large comparison studies in the area.

1.2 | What is this review about?
Foetal alcohol spectrum disorder (FASD) is a preventable, lifelong disability which poses significant cost to individuals, families, and societies. Despite growing evidence exploring the profile of FASD, evidence on effective treatments of the condition is sparse.
Given prevalence estimates of around 5% in the general population, and higher in vulnerable samples, rigorous and comprehensive evaluation of feasible interventions for FASD is a necessity.
The focus of this review is studies which assess the impact of psychological interventions designed to improve EF in children impacted by PAE. Eligible outcomes include any measure of EF as per contemporary popular models of the construct (discussed in detail in the methodology section of the systematic review). Examples include attention, working memory, cognitive flexibility, inhibition, planning and organisation.

1.3 | What is the aim of this review?
This Campbell systematic review examines the impact of interventions designed to improve executive functioning in children with a history of prenatal alcohol exposure. The review summarises evidence from eleven studies, including seven treatment-comparison studies and four single-group studies, with pre-post outcome assessments.

1.4 | What studies are included?
Eleven studies are included, with only seven randomised or high-quality quasi-experimental studies, and four single-group pre-post intervention studies. The randomised and quasi-experimental studies are synthesised using meta-analysis. Of these seven studies, all were carried out in either Canada or the USA.
1.5 | What are the main findings of this review?
Overall, the studies have important methodological weaknesses that temper the review findings, most notably very small sample sizes or incomplete reporting.
While the pattern of results arising from the synthesis generally suggests positive intervention effects, the analyses did not reach statistical significance. There appear to be no statistically significant differences between children with PAE who participated in EF interventions versus those who did not on the following direct measures of EF: auditory attention, visual attention, cognitive flexibility, attentional inhibition, response inhibition, verbal working memory, or planning.
Similarly, there appear to be no statistically significant differences between children with PAE who participated in EF interventions versus those who did not on the following indirect measures of EF: global executive function, behavioural regulation, emotional control, shifting or inhibition.

1.6 | What do the findings of this review mean?
Only a small number of eligible comparison group studies are included in the present analyses (n = 7), and it is likely that relatively small sample sizes hinder detection of effects across outcomes.
The findings of this review therefore illustrate the need for a greater number of high quality comparison studies with larger sample sizes. This will allow for more definitive conclusions to be drawn regarding the overall effectiveness of interventions for EF in children with FASD.

1.7 | How up-to-date is this review?
The review authors searched for studies up to December 2020.
2 | BACKGROUND

2.1 | Description of the condition
Prenatal alcohol exposure (PAE) is associated with profound and lifelong disability. The umbrella term foetal alcohol spectrum disorder (FASD) describes a spectrum of impairments resulting from the deleterious effects of PAE (Chudley et al., 2005). Recently, Canadian and Australian guides for the diagnosis of FASD have been published (Bower et al., 2016; Cook et al., 2016).
While the core aspects of the condition remain unchanged, the Australian and Canadian guides allow two diagnostic categories for FASD: (i) FASD with three sentinel facial features; or (ii) FASD with less than three sentinel facial features and an additional 'At Risk' classification if there is insufficient evidence or uncertainty about the appropriateness of the neurodevelopmental assessment.
While information on prevalence rates remains a significant challenge (Roozen et al., 2016), estimates have been as high as 5% in general population studies in the United States of America (USA). Further, evidence suggests this rate may be higher in subpopulations such as First Nations people (Burd & Moffatt, 1994; Fitzpatrick et al., 2015; Popova et al., 2017) and children involved with the child protection system (Lange et al., 2013). Understanding how to support children with a diagnosis of FASD is particularly important given that the condition has been linked to poor outcomes across a range of developmental domains (Mattson et al., 2019). The condition is linked to a host of poor life outcomes, including increased contact with the justice system, substance misuse (Burd et al., 2010), antisocial and delinquent behaviour, learning disabilities, externalising and aggressive behaviour, as well as a range of other adaptive functioning and mental health problems (Bower, 2016; Kodituwakku, 2009; Rasmussen et al., 2008). Although there are no agreed upon usual treatments following a diagnosis of FASD, much of the clinical literature acknowledges the importance of ensuring families and caregivers have an understanding of how FASD can impact on children's behaviour and cognitions (Reid, 2015).
There is growing evidence that a core deficit underpinning many of these adverse outcomes is impairment in executive functions (EFs) (Khoury et al., 2015; Kodituwakku, 2009). EFs are higher-order mental processes which allow individuals to deploy attention strategically, hold and manipulate goal-relevant information, and consciously enforce goal-directed behaviour (Baggetta & Alexander, 2016; Diamond, 2013). According to Diamond, EFs comprise three core abilities: (i) inhibitory control, the ability to inhibit prepotent responses of attention, behaviour, thoughts and/or emotions in favour of what is appropriate or necessary; (ii) working memory, the ability to hold information in mind and manipulate it in goal-directed ways; and (iii) cognitive flexibility, the ability to solve problems using different perspectives or rules as they arise. Different combinations of these core EFs produce a range of higher-order manifestations, including reasoning, problem-solving, planning and directing attention (Diamond, 2013).
Children with FASD have frequently shown impairment compared to typically developing children across a wide range of both core EFs and higher order manifestations (Rasmussen, 2005).
Deficits have been found on cognitive inhibition, verbal and nonverbal fluency (Schonfeld et al., 2001), use of attentional strategies, planning (Green et al., 2009), visual attention, spatial working memory, behavioural inhibition, the ability to form complex concepts and cognitive flexibility. Research also suggests that EF impairments are often more severe than would be suggested by IQ deficits alone (Connor et al., 2000).
Importantly, EF deficits may underlie many of the poor outcomes for individuals with FASD. For example, FASD is associated with a higher rate of attention deficit and hyperactivity difficulties (Rasmussen et al., 2010), and research has demonstrated that impaired EF may underpin ADHD. Impaired executive functioning has also been linked to a range of other outcomes found in FASD populations, including: autism (Geurts et al., 2004), obesity (Crescioni et al., 2011), lower quality of life (Davis et al., 2010), poorer school readiness (Shaul & Schwartz, 2014), impaired school performance (Visu-Petra et al., 2011), financial problems, criminal behaviour and substance misuse (Moffitt et al., 2011). Consequently, addressing EF deficits in FASD populations may offer an important opportunity to improve both individual and societal outcomes.
Targeting children is particularly important as gains may be made with appropriate early intervention strategies. Indeed, there have been a number of studies evaluating the effectiveness of interventions aimed at improving EFs in children with FASD (e.g., Coles et al., 2015; Leung et al., 2015; Reid et al., 2017). However, a critical gap in the literature is the absence of an updated, comprehensive systematic review and meta-analysis in the area. Given the compromised outcomes associated with EF deficits, and the frequency of EF impairment in children with FASD, a rigorous synthesis of the effectiveness of available interventions offers great value to practitioners, individuals with FASD and their families.

| Description of the intervention
This review focuses on synthesising evaluation evidence for structured psychological interventions that explicitly aim to improve EF in children using either (i) a face-to-face format, (ii) a computerised format, or (iii) both formats. Eligible interventions may be delivered in individual or group format, and may be administered either (i) directly to children (e.g., working memory training) or (ii) to children and caregivers/families (e.g., the GOFaR program).

| Child-centred interventions
A common intervention of this type is specific or targeted EF training. Training interventions are the intentional teaching of skills through repetitive experience aimed at restoring or improving functions and are generally completed in the presence of a qualified practitioner. These interventions can be delivered face-to-face or as computerised training programs (Holmes et al., 2010). Tasks generally aim to improve skills by offering practice of specific abilities unique to EF domains (Holmes et al., 2010). These interventions often involve working through levels of increasing difficulty, allowing for continual optimisation of training impact (Klingberg et al., 2005). Child-centred interventions can also be longer-term programs whereby children work their way through a pre-defined structure over a number of sessions (e.g., the Alert Program).
These programs use similar mechanisms to short-term training, relying on experiential activities to restore or improve functions.

| Parental involvement in EF training
In addition to focusing on the development of EFs in children, some programs provide instruction and support to parents and caregivers. This additional component can sit alongside the EF focus and be provided either concurrently in group or individual formats to parents and caregivers or as separate program components. For example, Coles et al. (2015) supplemented computerised training for children with FASD with a parental workshop on how PAE impacts neurodevelopment. Parents participated in group sessions that were run concurrently with their children's computer training. This aspect of intervention is often designed to improve parents' working knowledge of their child's neurodevelopmental functioning, facilitating the provision of more effective behavioural scaffolding from parent to child.

Scaffolding of behaviours by parents and caregivers has been shown to support the generalisation of skills and provide children with an opportunity for support and repeated practice (Hammond et al., 2012).

| Previous relevant reviews and knowledge-gaps
In recent years, increasing awareness of the prevalence of FASD across the world has sparked new interest in trialling interventions designed to ameliorate the deleterious effects of PAE on functioning.
A number of recent systematic reviews have been published in the areas of FASD prevalence (Roozen et al., 2016), comorbidity (Popova et al., 2016) and to assess the impact of treatment (Peadon et al., 2009; Reid et al., 2015). The most recent review by Reid et al. (2015) provided a synthesis of the effectiveness of treatment interventions for FASD that included an assessment of methodological quality of FASD intervention studies across the lifespan. This narrative review included interventions targeting parenting skills, self-regulation and attentional control, mathematics skills, nonverbal reasoning and social skills.
A limitation of Reid et al.'s (2015) review is the lack of a quantitative synthesis of effect sizes. By providing a meta-analysis in parallel to the review, more precise conclusions can be provided regarding the overall effectiveness of interventions that address a core deficit in children with FASD. The current review will therefore provide both an updated review of interventions aimed at enhancing EFs, and a quantitative synthesis of these studies. A secondary objective of this review is to examine whether the effectiveness of interventions varies by a range of factors, including:
1. Variation in the number, setting, delivery and intensity of program components.
2. Variation in program participants (e.g., gender, age, comorbid diagnoses, level of PAE).
3. Variation in the type of EF targeted by the intervention.
A systematic review protocol has also recently been published by Singal (2018) aiming to review effectiveness of interventions in FASD populations. That review intends to include meta-analysis if possible, but aims to evaluate any outcome pertaining to children's physical and mental health as well as cognitive, behavioural and social skills. Whilst that review will provide a comprehensive and broad examination of the extant evaluation literature, a more focused review in a single domain affected by PAE will provide a concise synthesis of the existing evaluation studies and provide clear directions for policy and practice. By focusing on one domain of functioning (EF) and psychological interventions common in FASD practice and research, the current review seeks to provide a more targeted synthesis of available literature.
Thus, the current review will be the first to provide a comprehensive synthesis of the size of the effect of EF interventions for children with FASD.

| Practice and policy relevance
Globally, there have been a range of initiatives aimed at preventing FASD through increased public awareness of the effects of PAE.
Concurrently, there has been development of clearer diagnostic processes (e.g., Astley & Clarren, 2000; Bower et al., 2016; Chudley et al., 2005). This review supports major initiatives and policy as outlined in key government documents. For example, the Australian Government's Commonwealth Action Plan seeks to 'manage the impact of a diagnosis of FASD on the individual and the family to ensure the child and the family are supported through and after the diagnostic process' (Australian Government, 2014, p. 3). More broadly, this review will provide evidence that will build the capacity of practitioners and policy-makers to make informed choices regarding treatment and pathways of care for FASD children globally.
It is also hoped that the results of this review will aid policy-makers tasked with drafting formal public health policy in the area of the impact of alcohol use on citizen health and welfare. This will ultimately drive better outcomes and improve supports as set out in the various government policy agendas.

This review included randomised controlled trials (RCTs) where participants were randomly allocated to an experimental or comparison treatment condition, and also cluster randomised controlled trials where pre-defined clusters of groups were randomised to the experimental or comparison condition. This review also included quasi-experimental designs where participants were allocated to conditions in a manner other than randomisation (e.g., matched comparison group designs). In all of these designs, the experimental or treatment group refers to eligible participants who took part in the intervention designed to improve executive functioning, and the comparison group refers to those who were allocated to a wait-list control or to receive treatment-as-usual, no treatment or an alternative treatment.

| OBJECTIVES
Due to the burgeoning nature of this area of research, single-group pre-post designs were also included in the review. These designs are, however, subject to concerns regarding bias and are therefore analysed separately from RCTs and quasi-experimental studies (Deeks et al., 2020).

| Types of participants
Eligible participants were children aged 3-16 years who had either (i) a formal diagnosis of FAS, FASD, pFAS, ARND or 'at risk of FASD' using any of the following diagnostic systems: The Institute of Medicine Diagnostic system (Hoyme, 2005), The Washington 4-Digit Code (Astley & Clarren, 2000), The Canadian Guidelines (Chudley et al., 2005) or the Australian Guide (Bower, 2016); or (ii) a classification of FAS based on facial dysmorphology alone; or (iii) confirmed or suspected PAE (light, moderate or heavy dosages). Children with a sole diagnosis of ARBD were excluded, as this condition is not associated with neurological deficits.
The minimum age of 3 years was selected as it is possible to measure EFs at this age (Wiebe et al., 2011). Where the sample included some children who fell outside of the specified age range, the protocol (Betts et al., 2019) specified that study authors would be contacted to request data pertaining only to children within the age range. However, the review did not identify any studies with this issue. In future updates of this review, the procedure specified in the protocol will be followed should this issue occur.
There were no geographical restrictions on study location; however, variability among different countries/cultures that can impact diagnoses was anticipated. As such, the review used clear, formal diagnostic criteria to guide study inclusion. The protocol (Betts et al., 2019) specified that studies that used one of the four formal diagnostic frameworks mentioned above would be synthesised separately from studies where participants were included based solely on confirmed PAE or facial dysmorphology.
The review does not separate analyses in this way due to the small number of included studies and the likelihood that the results would be piecemeal and less meaningful. In future updates of this review, the procedure specified in the protocol will be followed should there be sufficient studies to support this approach.

| Types of interventions
Studies were included if the focus of the intervention was to improve EFs in children with FASD. Interventions needed to be structured psychological interventions that explicitly aimed to improve EFs in children using either (i) a face-to-face format, (ii) a computerised format, or (iii) both. The review included interventions delivered in individual or group format, and/or where the intervention was administered either (i) directly to children (e.g., working memory training) or (ii) to children and caregivers/families (e.g., the GOFaR program). Studies where participants received both psychological interventions and pharmacological interventions concurrently as part of a focal treatment group were excluded.
The protocol for the review (Betts et al., 2019) specified that interventions needed to take place in a health clinic (e.g., medical, psychological clinic), school setting or in a home-therapy setting. This inclusion threshold was relaxed for the review as some studies did not report the intervention setting but met all other inclusion criteria for the review.

Executive functions were defined in line with Diamond (2013) and Miyake et al. (2000) as a set of higher-order cognitive functions which combine to create abilities such as planning, organisation, attentional control and emotional regulation. The definition and categorisation of EFs, and the measures assessing them, tend to vary across the literature. For example, inhibitory control, working memory and cognitive flexibility are considered core EFs. Yet some research does not label outcomes with this exact terminology. There also tends to be variation in the extant literature regarding whether processes are EFs or manifestations of EFs (e.g., reasoning, self-regulation, behavioural regulation, problem-solving, planning and attention). Therefore, this review took a comprehensive approach by including studies where the authors explicitly referred to the outcomes as EFs, explicitly measured core EFs (inhibitory control, working memory, cognitive flexibility) or included measures that some researchers consider to be manifestations of EFs (e.g., reasoning, self-regulation, problem-solving, planning and attention). Where it was unclear if the outcomes reported in a study were EFs, the protocol (Betts et al., 2019) specified that study authors would be contacted to verify the eligibility of the outcomes.
This was not required for any studies captured by the review.
Studies were included if the outcome data was gathered using either standardised direct assessment of EF or through indirect parent/teacher reports of EF. Common standardised measures of EF in children include the following (not exhaustive):
1. Tests of variables of attention (T.O.V.A), a direct neuropsychological assessment that measures attention while screening for ADHD (Leark et al., 1996).
2. NIH toolbox, a direct assessment battery which provides measures of a range of cognitive abilities and EF in children aged 3 years+ (Gershon et al., 2013).
3. NEPSY-II (attention and EF domains), which is a direct standardised assessment consisting of 32 sub-tests for use in neuropsychological assessment with pre-schoolers, children and adolescents (Brooks et al., 2009).
4. Behaviour Rating Inventory of Executive Function (BRIEF-P/BRIEF-2), which is a standardised indirect psychological assessment tool that measures EFs in pre-schoolers (Gioia et al., 2003) and children (Gioia et al., 2003, 2016).
5. Child Behavioural Checklist (CBCL; attention and ADHD subscales), which is a parent- or teacher-completed indirect measure that includes broad competencies, adaptive functions and problems in children (Achenbach & Rescorla, 2001).
After identifying the final corpus of eligible studies, outcomes were initially separated into either direct or indirect measures of EF for separate synthesis. This dichotomy reflects a distinction in the measurement of EF between performance-based, task-related data (e.g., NEPSY tasks; direct measurement) and itemised, self- or other-reported data (e.g., BRIEF questionnaire; indirect measurement). These forms of measurement were separated as there is evidence demonstrating poor convergence between direct and indirect measures of EF across a range of populations (e.g., Toplak et al., 2008) and in children with FASD (Gross et al., 2015; Mohamed et al., 2019). Thus, these measures were separated to ensure that only homogeneous outcomes were synthesised together.
Once separated into direct and indirect measures, outcomes were further categorised into conceptually analogous EFs.
Terminology in the EF literature is often used interchangeably, at times producing unclear distinctions between different function classifications. This is compounded by variation in the names of assessment measures and sub-tests which can measure different components of EFs. To ensure studies were classified and synthesised accurately, the specific tasks assessed for each outcome measure reported in an eligible study were assessed against the conceptual definitions and guidance provided by Diamond (2013) and Miyake et al. (2000). Following this classification, a qualified neuropsychologist was then consulted on the categorisations and various psychometric papers were then used to further refine EF classifications. The final classification of each outcome included in this review is provided (see Table 1).
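The two-step grouping described above (direct versus indirect measurement, then conceptually analogous EF domain), followed by pooling within each group, can be sketched as follows. This is a minimal illustration rather than the review's analysis code: the DerSimonian-Laird estimator is an assumption (the review specifies only a random effects model), the function name `pool_random_effects` is hypothetical, and all study labels, SMDs and variances are made-up values.

```python
import math
from collections import defaultdict


def pool_random_effects(effects, variances):
    """Pool SMDs with a DerSimonian-Laird random-effects model (assumed estimator).

    Returns the pooled SMD, its 95% CI and the between-study variance tau^2.
    """
    k = len(effects)
    w = [1 / v for v in variances]                       # inverse-variance weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # tau^2 is truncated at zero and set to zero when only one study is available
    tau2 = max(0.0, (q - (k - 1)) / c) if k > 1 and c > 0 else 0.0
    w_star = [1 / (v + tau2) for v in variances]         # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se), tau2


# Hypothetical records: (study, measurement type, EF domain, SMD, variance)
records = [
    ("Study A", "direct", "response inhibition", 0.60, 0.09),
    ("Study B", "direct", "response inhibition", 0.30, 0.12),
    ("Study A", "indirect", "emotional control", 0.05, 0.08),
    ("Study C", "indirect", "emotional control", -0.02, 0.10),
]

# Step 1 and 2: group by measurement type, then by EF domain.
groups = defaultdict(list)
for _study, measurement_type, ef_domain, smd, variance in records:
    groups[(measurement_type, ef_domain)].append((smd, variance))

# Step 3: pool only within a group, so direct and indirect measures never mix.
results = {}
for key, values in groups.items():
    effects, variances = zip(*values)
    results[key] = pool_random_effects(list(effects), list(variances))

for (measurement_type, ef_domain), (pooled, ci, tau2) in results.items():
    print(f"{measurement_type} / {ef_domain}: SMD = {pooled:.2f}, "
          f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f}), tau^2 = {tau2:.3f}")
```

Keying each synthesis on the pair (measurement type, EF domain) mirrors the rationale given in the text: outcomes are combined only when they share both the form of measurement and the underlying construct.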
Studies with any length of follow-up were included in the review. To synthesise data from studies with different lengths of follow-up, studies were classified into the following time-frames, which were analysed separately: short-term (0−3 months), medium-term (>3−6 months) and long-term follow-up (>6 months).
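The follow-up bands above translate into a simple classification rule, sketched below. The function name `classify_follow_up` and the use of months as a numeric input are illustrative assumptions; the cut-points are those stated in the text.

```python
def classify_follow_up(months: float) -> str:
    """Assign a follow-up length (in months) to one of the review's time-frames.

    Bands as stated in the text: short (0-3 months), medium (>3-6 months),
    long (>6 months). The function itself is a hypothetical helper.
    """
    if months < 0:
        raise ValueError("follow-up length cannot be negative")
    if months <= 3:
        return "short"
    if months <= 6:
        return "medium"
    return "long"


# Example: outcomes measured immediately post-intervention, at 4 and at 12 months.
for length in (0, 4, 12):
    print(length, classify_follow_up(length))
```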

| Secondary outcomes
No secondary outcomes were included in the review. Children are presented with series of ghosts of different colours in different spatial locations and press a key if two consecutive ghosts are the same colour (duration = 10 min). Errors of omission (not pressing the key when appropriate) are considered measures of inattention. Standard scores are used in scoring and interpretation, whereby higher scores equate to lower levels of inattention.

Attention (visual)
KiTAP (The owls, omission errors) Children are presented visual and auditory stimulus and press a key when they either hear an owl make an 'incorrect' sequence of tones or when the owl on the screen closes its eyes. Errors of omission (not pressing the key when appropriate) are considered measures of inattention. TEA-Ch (Sky Search, DT) Children are presented with a visual array of similar objects (flying spaceships) and identify perceptually similar, but unique types of spaceships. This required discrimination of target objects among similar distractors. Children are also required to circle similar pairs of objects to control for the impact of motor speed.

Attention (visual)
TEA-Ch (creature counting) Children are presented with 'creatures' in a tunnel and required to count them. When they encounter an arrow, they are required to verbalise the direction (up or down) and change the direction of their count. Down arrows signify that children need to count backwards from the previous count. Counting time and number of correct responses are considered indicators of performance for this measure.
Cognitive flexibility NEPSY (inhibition/switching) Children are presented with a page of black arrows/shapes and are required to verbally name the opposite direction/shape (e.g., circle/square) when the stimulus is coloured white or the correct direction/shape when coloured black.
Cognitive flexibility CANTAB (intra/extra dimensional shift task) The task involves presentation of stimuli which vary on either one (initially) or two (later trials) dimensions (e.g., white lines of different shapes, then white lines overlaid on various pink shapes). The rule which determines which stimulus should be selected is altered after six correct responses.
Cognitive flexibility NIH toolbox (dimensional change card sort) Children are presented with sets of cards which vary along two dimensions (colour and shape). In each trial, children are instructed to select a target that corresponds to a specific dimension. Rules for selection are changed seemingly randomly throughout the task.

Cognitive flexibility
KiTAP (Ghost's Ball, commission errors) Children are presented with a series of ghosts of different colours in different spatial locations and press a key if two consecutive ghosts are the same colour. Commission errors (pressing the key when not appropriate) are considered a measure of response inhibition. Standard scores are used in scoring and interpretation, whereby higher scores equate to lower levels of inhibition.

Response inhibition
KiTAP (The owls, commission errors) Children are presented with visual and auditory stimuli and press a key when they either hear an owl make an 'incorrect' sequence of tones or when the owl on the screen closes its eyes. Commission errors (pressing the key when not appropriate) are considered a measure of response inhibition.

Response inhibition
KiTAP (Happy-Sad Ghost, commissions) Children are presented with visual representations of ghosts and required to press a key when ghosts appear frowning, but not when smiling. They are simultaneously required to ignore distracting stimuli in the periphery. ANT-C (incongruent) Children are presented with either a cue (informing the child about the timing, location or both of a target) or a blank screen (depending on condition). Following this, a fish (the target) is presented above or below fixation and is surrounded by identical fish pointing in the same direction (congruent trial) or the opposite direction (incongruent trial) to the target fish. Children are required to hit a button to specify which direction the target is facing. Incongruent trials are considered a measure of response inhibition.

Response inhibition
KiTAP (Dragon's Castle, commissions) Participants must allow blue and green dragons into the castle in alternating colour order by pressing one button for each colour, but the dragons appear in alternating locations, requiring the children to flexibly monitor which button to press. Commission errors occur when children press the button inappropriately and are considered a measure of response inhibition.

Response inhibition
TEA-Ch (Map Mission) Children are presented with a map filled with diverse sets of symbols, words and locations. They are given 60 s to identify specific symbols. The number of targets correctly identified in a 60 s interval is the primary indicator of performance. This task requires children to filter information to detect relevant stimuli and reject irrelevant stimuli.
Attentional inhibition TEA-Ch (Sky Search, attention score) Children are presented with a visual array of similar objects (flying spaceships) and identify perceptually similar, but unique types of spaceships. This requires discrimination of target objects among similar distractors. Children are also required to circle similar pairs of objects to control for the impact of motor speed.
Attentional inhibition NIH toolbox (flanker) Children are required to select the direction of a central target arrow among a group of distractor arrows. On congruent trials, distractors face the same way as the target; on incongruent trials, distractors face the opposite direction to the target.

Response inhibition
TOVA (commission errors) Children are presented with a black screen and given a clicker in their dominant hand. A white square flashes on the screen with a small black square presented in the top or bottom section of the square. Children are required to click the clicker only when the black square appears at the top of the white square. Errors of commission (clicking when the black square appears at the bottom) are considered a measure of response inhibition. Standard scores are used in scoring and interpretation, whereby higher scores equate to better inhibition (i.e., lower impulsivity).

| Electronic searches
The search encompassed January 1972 to December 31, 2020, with specific search dates provided in Supporting Information: Appendix 1.
The following electronic databases were searched, with the database platform provided in parentheses: • Scopus

| Assessment of risk of bias in included studies
The Cochrane Risk of Bias Tool (RoB, Version 2.0; Sterne, 2016a) was used to assess risk of bias for RCTs. Using the signalling questions and guidance provided in the tool, studies were rated as having low risk, some concerns or high risk along the following domains:

(2020) argue that studies with a single pre- and post-intervention measure of eligible outcomes are a particular case of uncontrolled before-after studies generally considered to be at serious or critical risk of bias. The predominant concern with this type of study design is establishing whether changes in outcomes before and after the intervention are due to the intervention or other time-varying confounders (Paulus et al., 2014; Sterne et al., 2020).

(2001) if the standard deviation of the change score was not reported by study authors (see Figure 1, Formula a). In cases where authors did not report the correlation between pre-post scores, existing research was consulted to obtain test-retest reliabilities. Where test-retest reliability could not be identified, a value of 0.5 was used for r.
For studies using a single-group pre- and post-intervention design, the SMD and standard errors were calculated using the formulae specified by Lipsey and Wilson (2001) for calculating standardised mean gain scores, which incorporate the correlation between baseline and post-intervention outcomes (see Figure 1, Formula b). The same procedure described above was used to obtain or estimate the correlation between baseline and post-intervention measures. If studies directly reported effect sizes, these were used in the analyses of the single-group studies.
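As an illustration of the calculation described above, the following is a minimal Python sketch of a standardised mean gain and its standard error. It assumes the commonly cited form of the Lipsey and Wilson (2001) formulae, because Figure 1, Formula b is not reproduced in this section; the function name, argument names and example values are hypothetical and are not taken from any included study.

```python
import math


def standardised_mean_gain(mean_pre, mean_post, sd_pre, sd_post, n, r=0.5):
    """Standardised mean gain and its standard error for a single group.

    Assumed form (after Lipsey & Wilson, 2001):
        ES = (mean_post - mean_pre) / pooled SD
        SE = sqrt(2 * (1 - r) / n + ES**2 / (2 * n))
    where r is the pre-post correlation (0.5 is the fallback value
    mentioned in the text when no test-retest reliability is available).
    """
    pooled_sd = math.sqrt((sd_pre ** 2 + sd_post ** 2) / 2)
    es = (mean_post - mean_pre) / pooled_sd
    se = math.sqrt(2 * (1 - r) / n + es ** 2 / (2 * n))
    return es, se


# Hypothetical example: 20 children, standard scores with SD = 15.
es, se = standardised_mean_gain(90.0, 97.5, 15.0, 15.0, n=20, r=0.5)
print(f"ES = {es:.2f}, SE = {se:.3f}")
```

Under this form, a higher pre-post correlation yields a smaller standard error, which is why the choice of r matters when it has to be estimated.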

| Unit of analysis issues
The unit of analysis for this review is each study, whereby secondary reports of studies were nested under the first published report of the study. We also anticipated that some studies would report repeated post-intervention outcome measures. Where studies reported repeated post-intervention outcome measures, studies were still nested under the primary study, but each time-point was coded separately and categorised into the following time-frames for separate syntheses: short (0-3 months), medium (>3-6 months) and long-term follow-up (>6 months).
In cases where studies included multiple types of experimental groups, those which most closely matched inclusion criteria were included, and others excluded (Deeks et al., 2022). Where studies employ clustering (e.g., cohorts of treatment groups), data for individuals within a cluster are not independent, which can lead to a unit of analysis error if data are analysed at the individual level. When eligible studies contained clustering, the appropriateness of the authors' statistical analysis (e.g., multi-level modelling) was first assessed. If the statistical analysis had not taken the clustering into account, the approach recommended by Higgins et al. (2022; Figure 1, Formula c) was used to adjust the standard error of the effect size before meta-analysis.
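The clustering adjustment can be sketched as follows. This is a hedged illustration that assumes the standard design-effect approach, in which the standard error is inflated by the square root of 1 + (m - 1) × ICC; Figure 1, Formula c is not reproduced in this section, so the exact formula used may differ. The function names, the mean cluster size and the ICC values below are hypothetical.

```python
import math


def design_effect(mean_cluster_size, icc):
    """Design effect for clustered data: 1 + (m - 1) * ICC (assumed form)."""
    return 1 + (mean_cluster_size - 1) * icc


def cluster_adjusted_se(se, mean_cluster_size, icc):
    """Inflate an individual-level standard error to allow for clustering."""
    return se * math.sqrt(design_effect(mean_cluster_size, icc))


# Hypothetical example: unadjusted SE of 0.30, mean cluster size of 5.
for icc in (0.0, 0.05, 0.10):
    print(icc, round(cluster_adjusted_se(0.30, 5, icc), 3))
```

With an ICC of zero the standard error is unchanged, and it grows as the assumed ICC or the cluster size increases.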
Where there were multiple measures of the same construct reported in a study, conceptually similar outcomes were aggregated to form a composite effect size for that specific EF using the approach provided by Borenstein et al. (2009). Specifically, the effect sizes were averaged and an adjusted standard error was calculated, both of which were then used in the relevant meta-analysis (Figure 1, Formula d). Direct (standardised/normed neuropsychological assessments) and indirect (e.g., parent self-report questionnaires) measures of executive function were not combined, even in the case that they measured the same EF. This is due to differences in measurement error between these methods (Gross et al., 2014). Thus, dependent direct and indirect outcome measures were combined and synthesised separately.
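The aggregation of dependent outcomes can be sketched in Python as below. This is a minimal illustration that assumes the composite-variance formula usually attributed to Borenstein et al. (2009) for the mean of correlated effect sizes; Figure 1, Formula d is not reproduced in this section. The correlation between outcomes (r) is an assumption, since this section does not state the value used, and all names and example values are hypothetical.

```python
import math
from itertools import combinations


def composite_effect(effects, variances, r=0.5):
    """Average dependent effect sizes and compute an adjusted standard error.

    Assumed form: var(mean) = (1 / m**2) *
        (sum of variances + 2 * sum over pairs of r * sqrt(V_i * V_j)),
    with a common correlation r between every pair of outcomes.
    """
    m = len(effects)
    mean_es = sum(effects) / m
    covariance_terms = sum(
        2 * r * math.sqrt(v_i * v_j) for v_i, v_j in combinations(variances, 2)
    )
    variance = (sum(variances) + covariance_terms) / m ** 2
    return mean_es, math.sqrt(variance)


# Hypothetical example: two measures of the same EF from one study.
es, se = composite_effect([0.4, 0.6], [0.04, 0.04], r=0.5)
print(f"composite ES = {es:.2f}, SE = {se:.3f}")
```

Treating the outcomes as perfectly correlated (r = 1) gives the most conservative standard error, whereas r = 0 treats them as independent and gives the smallest.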

| Dealing with missing data
The standardised coding form instructed the coder to record when data was missing from study reports to facilitate the location of the missing data. If data was missing from eligible studies, the study authors were contacted to either clarify ambiguous data or obtain data that was completely missing. Where missing data for effect size calculation could not be obtained, the study was excluded from the relevant meta-analyses, but the study was still coded and included in the narrative summary of eligible studies. The results section explicitly reports which data is missing and the discussion sections note the potential impact of missing data on the review findings (Deeks et al., 2022).

| Assessment of heterogeneity
All studies were initially analysed together using a random effects model and heterogeneity was examined using the I² statistic, χ² test and τ² (Deeks et al., 2022) (Cohen, 1988).
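The relationship between these heterogeneity statistics can be illustrated with a short Python sketch. It assumes the DerSimonian-Laird estimator for τ², which is a common default but is not confirmed by this section as the estimator used in the review; function names and the example inputs to `heterogeneity` are hypothetical. The two calls to `i_squared` use χ² values and degrees of freedom reported in the results of this review.

```python
def i_squared(q, df):
    """I-squared (%) from Cochran's Q (the chi-squared statistic) and its df."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100


def heterogeneity(effects, variances):
    """Cochran's Q, df, DerSimonian-Laird tau-squared and I-squared (%)."""
    weights = [1 / v for v in variances]
    total_w = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / total_w
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    c = total_w - sum(w ** 2 for w in weights) / total_w
    tau2 = max(0.0, (q - df) / c)
    return q, df, tau2, i_squared(q, df)


# Values reported in this review: visual attention, then response inhibition.
print(round(i_squared(7.84, 1)))  # 87
print(round(i_squared(1.75, 2)))  # 0
```

The second call shows why I² is reported as 0% when χ² falls below its degrees of freedom: the statistic is truncated at zero.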

| Subgroup analysis and investigation of heterogeneity
The review protocol outlined the intended approach for investigating heterogeneity through specific sub-group analyses (Betts et al., 2019; see Section 3). However, the review located too few studies to conduct these statistical analyses. Where heterogeneity was detected, the results integrate a discussion of the differences between the studies based on study coding across the a priori subgroups. For future updates of this review, the protocol procedures will be used to investigate heterogeneity through sub-group analyses.

| Sensitivity analysis
The review protocol outlined the intended approach for conducting sensitivity analyses to examine how factors in the review decision-making process may have impacted the robustness of the meta-analytic results (e.g., combining vs. not combining RCTs and quasi-experiments, removing studies with high risk of bias from meta-analyses). Only one sensitivity analysis was possible based on the studies located for the review. Sensitivity analyses were used to assess the impact of estimating intra-class correlation coefficients (ICC) for studies with clustering on the results of meta-analyses. Of the randomised and two-group quasi-experimental studies using direct measures of EFs, only one study, Nash (2012), had clustering. The authors did not report the ICC, so we conducted sensitivity analyses as suggested by Deeks et al. (2020), which has been implemented in analogous reviews (Armstrong et al., 2018; Barlow et al., 2016). Barlow et al. (2016) located an ICC of 0.03 for one included group-based psychological intervention study. An ICC of zero was used for meta-analyses, with accompanying sensitivity analyses to assess difference in meta-analysis results for ICCs of 0.02, 0.03 and 0.1. For future updates of this review, all the variables and procedures specified in the protocol will be used for sensitivity analyses (Betts et al., 2019).

For 28 records screened as potentially eligible after title and abstract screening, either the full-text document could not be obtained or the full-text document was insufficient for confirming eligibility for the review. In all cases, attempts were made to locate documents by ordering through institutional libraries or contacting study authors, but these procedures were unsuccessful. These 28 records are included in the 'Studies awaiting classification' reference list. A full text was obtained for the remaining 1568 records, which were then screened for final eligibility.
Of these, 11 were reviews that were harvested for studies (Bertrand, 2009; Hanscom, 2008; Hanson, 2000; Kodituwakku & Kodituwakku, 2011; Landes et al., 2017; McLean, 2014; Ospina et al., 2011; Peadon et al., 2009; Premji et al., 2007; Reid, 2015; Schaffer & Geva, 2016)

only one group of study authors able to provide the data required.
All 11 eligible studies are included in the study summary tables and included studies section below.

| Included studies
A total of 253 participants were included across the 11 included studies (in 21 study reports). The subsections below provide an overview of the included studies with further detail provided in the summary tables for each included study and abbreviated study summaries provided in Table 2.

| Research design and comparison conditions
Six studies were RCTs (Coles et al., 2015; Pei et al., 2011; Vernescu, 2008; Wells et al., 2012), with some variation in randomisation strategies. One study used an alternating sequence of allocation, one study used block randomisation, one study randomised within matched pairs (Vernescu, 2008), one study used yoked pairs (Coles et al., 2015), one study used allocation via a table of even and odd numbers (Wells et al., 2012), and one did not report the method used to achieve randomisation. One study used a quasi-experimental design with matching on participant age and gender, without further specification for how participants were allocated to the comparison or experimental conditions. Four studies used single-group pre-post intervention research designs (Hutchison, 2015; Kerns et al., 2010; Leung et al., 2015; Reid et al., 2017). One study used an ineligible comparison group, and as such was treated as a single-group study in this review, along with a secondary report of the Pei et al. (2011) study which reported on the intervention group only.

For the RCT/quasi-experimental studies, three studies used a no treatment comparison condition (Coles et al., 2015; Loomes et al., 2008; Wells et al., 2012). Two studies used a treatment-as-usual comparison condition, comprising either neuropsychological evaluation and customised community referrals or individualised contact at the same level as the experimental participants, with usual individualisation of academic activities based on child needs and skills (Vernescu, 2008). The remaining studies used a wait list control comparison condition (Pei et al., 2011).

Format: Individual face-to-face parent consultations (weekly, 90 min each) Group face-to-face child skills groups (fortnightly, 90 min each)

| Participant characteristics and recruitment
One included study did not report recruitment information (Vernescu, 2008). Six studies recruited participants via referrals from health practitioners or specialised clinics (Coles et al., 2015;Leung et al., 2015;Loomes et al., 2008;Reid et al., 2017). One study recruited through the Department of Children and Family Services (Wells et al., 2012) and three studies reported recruitment via advertisements/information packages sent home with children from school (Hutchison, 2015;Kerns et al., 2010;Pei et al., 2011).
All studies specified that participants needed to have a diagnosis of, or be identified as having, FASD or confirmed PAE. Five studies required a formal diagnosis related to FASD (Reid et al., 2017; Vernescu, 2008; Wells et al., 2012), with no further specification. Four studies required either a formal diagnosis or (i) a confirmed history of PAE; or (ii) significant alcohol-related physical deformities (Coles et al., 2015); or (iii) receipt of services related to PAE (Hutchison, 2015). Two studies did not explicitly state eligibility criteria; however, all children in the sample had received a formal diagnosis of a FASD-related disorder

family home, school (Hutchison, 2015; Kerns et al., 2010; Pei et al., 2011; Vernescu, 2008) or a combination of home/community/university settings (Reid et al., 2017).

| Implementing practitioners
Implementing practitioners in six of the 11 studies were allied health practitioners, such as psychologists or occupational therapists (Coles et al., 2015; Hutchison, 2015; Reid et al., 2017; Wells et al., 2012). The remaining five studies used teachers and/or trained research staff to implement the intervention (Leung et al., 2015; Loomes et al., 2008; Pei et al., 2011; Vernescu, 2008).

| Duration and intensity
All eligible studies reported the duration of the intervention; mean duration ranged from 11 days to 30 weeks. The number of intervention sessions ranged from 1 to 31, with most studies using a frequency of approximately one session per week.

| Format and nature of the intervention
Five studies delivered the intervention in individual format to children only (Leung et al., 2015; Loomes et al., 2008; Pei et al., 2011; Vernescu, 2008), whereas four studies implemented a mixture of parent/child individual/group sessions (Coles et al., 2015; Reid et al., 2017; Wells et al., 2012). One study did not specify if the intervention was delivered in an individual or group format (Hutchison, 2015). Five studies included a computerised component, with or without supplementary face-to-face EF training (Coles et al., 2015; Pei et al., 2011; Hutchison, 2015; Kerns et al., 2010; Leung et al., 2015). The remaining studies used face-to-face therapeutic or training strategies (Wells et al., 2012), with Reid et al. (2017) supplementing therapy with technology-facilitated activities.
Two studies evaluated the Alert Program  or an adaptation of the Alert program (Wells et al., 2012) designed to improve self-regulation in children. Another study evaluated the GoFAR program, which uses meta-cognitive techniques to teach children planning and organisation skills, individual parent behavioural regulation training and opportunities for supported practice (Coles et al., 2015). Vernescu (2008)

| Fidelity
Regular supervision with study authors or therapists was the most common form of fidelity monitoring (Wells et al., 2012). Two studies used observations of children during the task, recording behavioural evidence of strategy use, and children were subsequently asked about their use of taught strategies (Hutchison, 2015; Loomes et al., 2008)

Leung et al., 2015; Pei et al., 2011; Vernescu, 2008). Table 1 shows how EFs were directly measured, and Table 3 provides an overview of the outcome measures by included study, highlighting how most included studies reported multiple measures of EFs. Please also see Table 4

inhibition (attentional and response; Coles et al., 2015; Hutchison, 2015; Kerns et al., 2010; Reid et al., 2017; Vernescu, 2008), cognitive flexibility (Hutchison, 2015; Leung et al., 2015; Reid et al., 2017; Vernescu, 2008), planning and working memory (Hutchison, 2015; Kerns et al., 2010; Leung et al., 2015).
All studies reported baseline outcome assessment and post-intervention outcome assessment (immediately after to 3 months after the intervention), aside from one study which conducted post-intervention measures approximately 6 months after the completion of the intervention (1 year after baseline).
Three studies also reported follow-up assessments of at least one outcome measure at 5 weeks, 3 months after completing the intervention and 12 months after completion of the intervention (18 months after random assignment).

| Excluded studies
Due to the large number of full-text records screened for eligibility, a comprehensive description of all excluded studies is not included here.
Example interventions that did not explicitly aim to improve EF included: One study evaluated parent training workshops designed to teach behavioural regulation skills to children with FASD but did not include any clear EF outcome measures.
One clinical trial registry record could not be verified in terms of eligibility due to a lack of information available (Coles, 2005).
One study included an evaluation of an eligible EF intervention and eligible outcome measures but used a sample which comprised a combination of children with FASD and Autism Spectrum Disorder  and so was excluded from the review.

| RCTs
All RCT studies were rated as having an overall high risk of bias due to at least one of the five RoB2 domains being rated as 'high' (see domain, all studies were rated as either high or unclear risk of bias. As is often the case in applied psychological intervention research, participants and implementers are likely aware of the condition to which they have been assigned. There was only one report of deviations from the intended intervention, and most authors conducted appropriate analyses to estimate the effect of assignment to conditions (e.g., intention-to-treat vs. per-protocol).
The ratings for the 'Missing outcome data' domain were mixed. Four studies (Coles et al., 2015; Pei et al., 2011; Vernescu, 2008) were rated as having high or unclear risk of bias due to having <95% available data for participants and/or no assessment of the impact of missing data (likely due to small sample sizes). Studies inconsistently reported reasons for missing data, making it difficult to determine in all cases whether the missing data could be related to the true value of the outcome. Two studies were rated as having low risk of bias due to reporting data for all participants randomised (Wells et al., 2012).
In terms of the measurement of outcomes, all but two studies were rated as having low risk of bias due to the use of standardised direct assessments of EF, which are less likely to be influenced by knowledge of the intervention. However, studies inconsistently reported whether outcome assessors were blind to participant assignment. Two studies were rated as having high risk of bias as only indirect parent- or teacher-report measures were used to assess EF and this type of outcome measure may be influenced by knowledge of the intervention (Wells et al., 2012).
All studies were rated as having either high or unclear risk of bias for the 'Selection of the reported result' domain, predominantly because no pre-specified analysis plan or protocol could be located.
Most studies included more than one measurement of an eligible outcome and tended to vary on the completeness of reporting, with some selective reporting based on statistical significance.

| Quasi-experimental comparison group studies
The single eligible quasi-experiment with a comparison group  was rated as having an overall serious risk of bias due to a rating of serious for at least one of the domains for the ROBINS-I tool.
TABLE 4 Functional description of executive functions

Executive function Functional description

Direct

Attentional inhibition
The ability to stop automatic shifts in attention and shift attention back to tasks that are goal-directed. This skill allows children to not become distracted by surroundings when working on a task, or to redirect attention back to the task if they do become distracted.

Response inhibition
The ability to stop automatic behavioural responses in favour of behaviours that are more goal-directed for the situation. This skill allows children to stop themselves from calling out in class, for example.

Cognitive flexibility
The ability to tolerate moving between different contexts and understand how each new situation may include changes to rules and expectations and adapt behaviour or perspective accordingly.

Working memory
The ability to hold and integrate multiple pieces of information in mind for the purpose of solving problems or completing tasks. This skill allows children to complete tasks with multi-step instructions or complete tasks such as mental arithmetic.

Planning
The ability to create and carry out a series of steps in sequence to achieve a goal.

Indirect

Global executive function (GEC)
This is a composite measure created by adding performance across behavioural regulation, meta-cognition and emotional control (all described below).

Behavioural regulation
The ability to understand the need for different behaviours in different situations and use this to stop inappropriate behaviours and emotions in favour of more goal-directed actions.

Meta-cognition
The ability to independently plan and begin tasks, as well as monitor own thinking and progress. This allows children to actively solve problems across a variety of situations.

Emotional control
The ability to understand the usefulness of different emotions across context and actively control which emotions are felt or expressed depending on certain rules.

Shifting
The ability to tolerate moving between different contexts and understand how each new situation may include changes to rules and expectations and adapt behaviour or perspective accordingly.

Inhibition
The ability to stop a behaviour when deemed appropriate. This ability allows children to control their behaviour in different settings (e.g., not calling out in the classroom).

Initiation
The ability of a child to begin activities, generate ideas or problem solve on their own.

Monitoring
The ability of a child to assess their own performance following completion of a task, or assess the impact their actions may have on the feelings of others.

Working memory
The ability to hold and integrate multiple pieces of information in mind for the purpose of solving problems or completing tasks. This skill allows children to complete tasks with multi-step instructions or complete tasks such as mental arithmetic.

Planning/organising
The ability to understand demands of current and future tasks and activities. This skill allows children to predict future events, set goals based on predictions and step through sequential steps to achieve goals.

Organisation of materials
The ability to organise physical materials useful in the pursuit of goals. This refers to things such as a child keeping a neat and tidy workspace or independently cleaning up their belongings.

Attention
The ability to direct conscious awareness towards things that matter. This allows a child to stay focused in class or continue working on a boring task.

Effortful control
The ability to consciously control one's behaviour or emotional response to the environment.
While there were baseline assessments of the outcome measures, other potential baseline confounders or time-varying confounders were either not measured or not accounted for in analyses (e.g., demographics, time of measurements between groups), leading to a rating of serious risk of bias for the confounding domain. Selection of study participants did not appear to be based on participant characteristics observed after the start of intervention (selection domain), suggesting that participants who would have been eligible were included in the study. However, it is unclear whether the start of follow-up and start of the intervention coincide for participants in each group. As a result, this domain was rated as being at moderate risk of bias.
The intervention and comparison conditions were clearly articulated by study authors, presumably before the start of the intervention, leading to a rating of low risk of bias for the classification of interventions domain.
There were also no reported deviations from the intended intervention, leading to a low risk of bias for this domain. Data for all recruited participants appeared to be included in analyses, also leading to a rating of low risk of bias for the missing data domain. Similar to the included RCTs, the measurement of outcomes domain was also rated as having low risk of bias because knowledge of assignment to conditions was unlikely to have affected the standardised direct measures of EF and because the methods of outcome assessment appeared comparable between the intervention and comparison groups. Finally, the selection of reported results domain was rated as having moderate risk of bias because there was no prospectively published protocol or analysis plan for the study; however, there was only one measurement and analysis of the eligible outcome, which was clearly reported in the study report.

| Effects of interventions
Four RCT/quasi-experimental studies with a comparison group reported direct measures of auditory attention (Coles et al., 2015; Pei et al., 2011; Vernescu, 2008). One study (Vernescu, 2008) reported two direct measures of auditory attention, which were combined using the approach detailed in Section 4.3.5. Another study reported measuring auditory attention via the NEPSY and one other potentially eligible measure of auditory attention (KiTAP, subtest unspecified), but no data was reported in any study reports. Efforts to obtain the required data from authors for effect size calculation were unsuccessful, so only three studies were meta-analysed.
The meta-analysis included a total of 62 participants (experimental  the use of different practitioners to implement the intervention or the developmental age of the child participants. Three single-group pre-post studies reported direct measures of auditory attention before and after an eligible intervention (Hutchison, 2015;Kerns et al., 2010;Leung et al., 2015). However, Leung et al. (2015) (N = 20) did not report post-intervention data (NEPSY: Auditory Attention). Hutchison (2015) and Kerns et al. (2010) both reported Cohen's d, but no other data which could be used to combine the effect sizes with meta-analysis (e.g., pre-post means, SD, p-value of d or confidence intervals). Hutchison (2015)  permit meta-analysis, however, author contact details could not be located for Hutchison (2015) and Leung et al. (2015) could not provide the required data, and no response was received from Kerns et al. (2010). While both of these reported effects suggest more positive and larger effects than the two group experimental studies, they must be interpreted with caution due to the tendency for single-group studies to over-estimate true effects and high risk of bias.

| Attention (visual)
Three RCT/quasi-experimental studies with a comparison group directly measured visual attention and reported this as an outcome (Coles et al., 2015; Pei et al., 2011; Vernescu, 2008). Vernescu (2008) measured visual attention directly using KiTAP Ghost's Ball (omission errors) and TEA-Ch Sky Search DT; however, the required data to calculate an effect size for TEA-Ch Sky Search DT was not reported.
Efforts to contact the author for this data were unsuccessful, so only one effect size-rather than an aggregated effect size of the dependent but no data was reported in any study reports. While Coles et al. (2015) directly measured various aspects of attention using the TOVA, the errors of omission scale is considered the most direct assessment of visual attention and so was used for effect size calculation to calculate an effect size for visual attention (see Table 1). Efforts to obtain the required data from authors for effect size calculation were unsuccessful, so only two studies were meta-analysed.
The meta-analysis included a total of 32 participants (experimental n = 16; comparison n = 16) and the overall effect for visual attention across studies was medium in size and positive, but not statistically significant (SMD = 0.90, 95% CI = −1.41, 3.21). This indicates that participants receiving an EF-focused psychological intervention did not differ from comparison participants on direct measures of visual attention. A forest plot of the distribution of effect sizes is presented in Figure 9. There was significant heterogeneity across the two studies (τ² = 2.43; χ² = 7.84, df = 1, p = 0.005; I² = 87%). However, the small number of studies precludes an accurate estimate of the variance component, τ². Potential sources of this heterogeneity (see Table 5) could be the format of the intervention, the use of different practitioners to implement the intervention or the presentation format for the test used to measure the outcome (e.g., KiTAP may be more interesting for children when engaging in the task).

In addition, Pei et al. (2011) reported no data that could be used to calculate effect sizes and attempts to obtain the data were unsuccessful. As such, this study was omitted from meta-analyses.
Vernescu (2008)  indicated that when the standard errors of the single study with clustering  were adjusted, the overall effect size slightly increased, but the confidence intervals still included zero (see Figures 15-17).
One single-group study reported a direct measure of attentional inhibition as an outcome before and after an eligible intervention (N = 10; Kerns et al., 2010). Kerns et al. (2010) reported Cohen's d of 1.07, but no confidence interval or standard error, or data that could be used to calculate these by hand. The lead author was contacted for the missing data, but no response was received. While this suggests a larger effect of the intervention than studies that included a comparison group, the effect must be interpreted with caution due to the tendency for single-group studies to over-estimate true effects and their high risk of bias.

| Inhibition (response)
Three RCT/quasi-experimental studies with a comparison group reported direct measures of response inhibition as an outcome (Coles et al., 2015; Vernescu, 2008). A fourth study potentially measured attentional inhibition; however, the distinction between attentional and response inhibition was not possible based on the description of measures in available reports of the study. In addition, Pei et al. (2011) reported no data that could be used to calculate effect sizes and attempts to obtain the data were unsuccessful. As such, this study was omitted from meta-analyses. For outcomes with sufficient data, the effect sizes were combined within each study before meta-analysis using the approach specified in  Figure 18. Sensitivity analyses indicated that when the standard errors of the single study with clustering were adjusted, the overall effect size slightly increased, with confidence intervals slightly increasing with the ICC value (see Figures 19-21). There was no significant heterogeneity across studies (τ² = 0.00; χ² = 1.75, df = 2, p = 0.42; I² = 0%).
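The kind of clustering adjustment used in these sensitivity analyses can be illustrated with a design-effect correction, in which the standard error of a clustered study is inflated by the square root of 1 + (m − 1) × ICC, where m is the mean cluster size. This is a minimal sketch under stated assumptions: the exact adjustment applied in the review is not described in this section, and the unadjusted standard error and cluster size below are hypothetical placeholders rather than values from any included study. Only the ICC values (0.02, 0.03, 0.1) come from the text.

```python
import math


def design_effect(mean_cluster_size, icc):
    """Design effect for clustered data: 1 + (m - 1) * ICC."""
    return 1.0 + (mean_cluster_size - 1.0) * icc


def cluster_adjusted_se(se, mean_cluster_size, icc):
    """Inflate an unadjusted standard error by the square root of the design effect."""
    return se * math.sqrt(design_effect(mean_cluster_size, icc))


# Hypothetical inputs, for illustration only (not taken from the review).
naive_se = 0.40
mean_cluster_size = 5.0

# ICC values examined in the review's sensitivity analyses.
adjusted = {
    icc: cluster_adjusted_se(naive_se, mean_cluster_size, icc)
    for icc in (0.02, 0.03, 0.1)
}

for icc, se in adjusted.items():
    print(f"ICC = {icc}: adjusted SE = {se:.3f}")
```

Under this correction the standard error, and therefore the width of the confidence interval, grows with the assumed ICC, which matches the pattern described for the sensitivity analyses.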
Four single-group pre-post studies reported direct measures of response inhibition before and after an eligible intervention (Hutchison, 2015; Kerns et al., 2010; Leung et al., 2015; Reid et al., 2017). Hutchison (2015) and Kerns et al. (2010) both reported Cohen's d, but no other data which could be used to combine the effect sizes with meta-analysis (e.g., pre-post means, SD, p-value of d or confidence intervals). The authors were contacted to obtain missing data and permit meta-analysis; however, author contact details could not be located for Hutchison (2015) and no response was received from Kerns et al. (2010).
No single-group studies included any direct measures of planning.

| Working memory (verbal)
Two RCT/quasi-experimental studies with a comparison group reported a direct measure of verbal working memory as an outcome, yet both reported insufficient data to calculate an effect size using RevMan analysis tools (Pei et al., 2011). Three single-group pre-post studies reported direct measures of verbal working memory as an outcome before and after an eligible intervention (N = 39; Hutchison, 2015; Kerns et al., 2010; Leung et al., 2015).
Hutchison (2015) and Kerns et al. (2010) both reported Cohen's d, but no other data which could be used to combine the effect sizes with meta-analysis (e.g., pre-post means, SD, p-value of d or confidence intervals).
Hutchison (2015)  inspection of the figures in the associated thesis report suggests that the direct measure of verbal working memory continued to increase, albeit at a slower rate than between baseline and post-intervention. While these studies collectively suggest variable-sized effects that somewhat align with the meta-analysis of the two-group experimental studies, the effect sizes must be interpreted with caution due to the tendency for single-group studies to over-estimate true effects and their high risk of bias.

| Working memory (visual)
One of the RCT/quasi-experimental studies with a comparison group reported a direct measure of visual working memory as an outcome, yet reported insufficient data to calculate an effect size.
Efforts to obtain the missing data from study authors were unsuccessful.
Three single-group pre-post studies reported direct measures of visual working memory before and after an eligible intervention.
However, Leung et al. (2015) did not report post-intervention data to allow calculation of effect sizes. Hutchison (2015) and Kerns et al. (2010) both reported Cohen's d, but no other data which could be used to combine the effect sizes with meta-analysis (e.g., pre-post means, SD, p-value of d or confidence intervals). While these studies collectively suggest small to medium-sized effects, they must be interpreted with caution due to the tendency for single-group studies to over-estimate true effects and their high risk of bias. A forest plot of the distribution of effect sizes is presented in Figure 26.

| Indirect measures of EF
There was no significant heterogeneity across studies (τ² = 0.00; χ² = 0.18, df = 1, p = 0.68; I² = 0%). Sensitivity analyses indicated that when the standard errors of the single study with clustering were adjusted, the results did not change for an ICC of 0.02, 0.03 or 0.1 (see Figures 27-29). Three single-group pre-post studies reported an indirect measure of meta-cognition using the BRIEF MI composite before and after an eligible intervention (Hutchison, 2015; Leung et al., 2015; Reid et al., 2017). Hutchison (2015) and Leung et al. (2015) had missing data, which precluded combining the studies with meta-analysis.
Authors of both studies were contacted to obtain missing data, however, author contact details could not be located for Hutchison

Reliable Change Index scores reported by the authors suggest that (a) Child 1 'recovered' when using one parent's report and 'improved' for the other parent; and (b) Child 2's BRI was 'unchanged' across both parents.
While the reported results from single-group studies suggest larger effects than the two group experimental studies, the effects must be interpreted with caution due to the tendency for single-group studies to over-estimate true effects and their high risk of bias.
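For readers unfamiliar with the Reliable Change Index mentioned above, the sketch below shows one common formulation (Jacobson and Truax), in which the pre-post difference is divided by the standard error of the difference. Whether the study authors used this exact formulation is an assumption, and all numeric inputs are hypothetical rather than taken from any included study. Labels such as 'recovered' additionally depend on a clinical cut-off, which is not modelled here.

```python
import math


def reliable_change_index(pre, post, sd_baseline, reliability):
    """RCI = (post - pre) / S_diff, with S_diff = sqrt(2) * SD * sqrt(1 - r)."""
    sem = sd_baseline * math.sqrt(1.0 - reliability)  # standard error of measurement
    s_diff = math.sqrt(2.0) * sem                     # standard error of the difference
    return (post - pre) / s_diff


def is_reliable(rci, critical=1.96):
    """Change is conventionally treated as reliable when |RCI| exceeds 1.96."""
    return abs(rci) > critical


# Hypothetical T-scores and test-retest reliability, for illustration only.
rci = reliable_change_index(pre=70.0, post=55.0, sd_baseline=10.0, reliability=0.80)
print(f"RCI = {rci:.2f}, reliable change: {is_reliable(rci)}")
```

A negative index here reflects a drop in scores; on rating scales where higher scores indicate greater difficulty, that corresponds to improvement.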

| Attention
One RCT/quasi-experimental study with a comparison group reported an indirect measure of attention as an outcome, but reported insufficient data to calculate an effect size for either indirect measure of attention (Conners'; Attention Deficit Disorders Evaluation Scale). One single-group pre-post study reported an indirect measure of attention before and after an eligible intervention and also did not report post-intervention data to allow calculation of an effect size for either indirect measure of attention (Conners'; BASC-2). Figure 34. There was no significant heterogeneity across studies (χ² = 0.52, df = 1, p = 0.47; I² = 0%; τ² = 0.00). Sensitivity analyses indicated that when the standard errors of one study with clustering were adjusted, the overall effect size to zero for an ICC of 0.02, 0.03 or 0.1 (see Figures 35-37).

| Emotional control
One single-group pre-post study reported an indirect measure of inhibition before and after an eligible intervention. Hutchison (2015) (N = 9) reported a small and positive Cohen's d of 0.168 for the teacher-report BRIEF. While the reported results from this single-group study differ from the synthesis of the two-group experimental studies, the effect must be interpreted with caution due to the tendency for single-group studies to over-estimate true effects and their high risk of bias. Figure 38. There was no significant heterogeneity across studies (χ² = 0.01, df = 1, p = 0.93; I² = 0%; τ² = 0.00). Sensitivity analyses indicated that when the standard errors of one study with clustering were adjusted, the overall effect size to zero for an ICC of 0.02, 0.03 or 0.1 (see Figures 39-41). One single-group pre-post study also reported the above BRIEF indirect measures of EF before and after an eligible intervention, yet did not report post-intervention data to allow calculation of effect sizes.

Similarly, the review was unable to assess whether the impact of EF-focused interventions varies by a range of important moderating variables (e.g., characteristics of the intervention, characteristics of the participants, type of EF targeted by the intervention). Fourth, most of the included studies were also conducted in North America (k = 10), with only one study conducted elsewhere (Australia). Therefore, the findings may not generalise to other cultures or geographical settings. This may be particularly pertinent in the area of FASD due to the strong cultural influences around patterns of alcohol consumption and differential expectations regarding normative child development across different cultural settings (Wang, 2018).

| Quality of the evidence
The results of this review should be interpreted with utmost caution due to issues with the quality of the located and included studies. Collectively, these issues reduce the certainty and confidence in the review findings.

| Potential biases in the review process
The systematic search for this review included 17 academic databases, 27 potential sources of grey literature, and various supplementary hand search strategies. As such, it is unlikely that any eligible studies were not captured by the review. The systematic search was updated to December 2020 to ensure the review provides a current representation of the state of the field. A potential bias in the review process was the omission of specialised education databases (e.g., ERIC). We aim to include these in future updates of the review.
All data extraction and effect size calculation were completed by two independent reviewers, thereby minimising the potential for any bias in the results of the review. Decision-making regarding syntheses is clearly reported in this review, with sensitivity analyses conducted where possible to examine the impact of that decision-making on the results. Overall, the review adhered to current Campbell Collaboration guidelines for reporting and conduct standards. Therefore, there is limited potential for bias arising from the review process.

| Agreements and disagreements with other studies or reviews
The current review found a lack of high quality evidence assessing the impact of interventions on EF in children diagnosed with FASD.
According to the small number of extant studies, results of meta-analysis suggest that there is limited evidence to support EF-focused psychological interventions for children with FASD. This conclusion tends to differ from the included studies, many of which report favourable conclusions, albeit not consistently based on the full range of outcome data in study reports. Results of the current review echo conclusions of a number of previous reviews in the area, which also emphasise that the ability to draw strong conclusions regarding efficacy is limited by the relative dearth of research on interventions in FASD (Ospina et al., 2011; Peadon et al., 2009; Premji et al., 2007) and call for further research (Bergeson, 2016). One review (Reid et al., 2015) concluded that there is emerging evidence for the effectiveness of interventions in early/middle childhood FASD. Similarly, Bertrand (2009) concluded that several interventions were shown to be effective in improving outcomes in FASD. However, both of these reviews included interventions targeting a broader range of outcomes (e.g., social skills, mathematics, safety skills). Additionally, meta-analysis was not possible due to significant heterogeneity resulting from the broad scope of the review. This highlights the importance of conducting rigorous reviews and syntheses to support dissemination of more precise effect estimates to practitioners, researchers and policy-makers.

Implications for practice
There has been a notable increase in awareness in recent decades regarding the dangers and impacts of PAE. This is evidenced by the growing allocation of public funds across the developed world to understanding the consequences of prenatal exposure to alcohol, and the development of clear diagnostic guidelines and criteria across multiple countries. While much of the research in previous decades focused on the description and prevalence of FASD, greater emphasis is now being placed on interventions and their potential to improve the outcomes of children affected by FASD.
When considering the diffuse neurobiological impacts of PAE, impairment of EFs may be among the most damaging to overall outcomes (Moffitt et al., 2011). EFs develop from an early age and underlie a range of everyday abilities in children and adults (Anderson et al., 2002) and so are an important area to target in early-intervention efforts in the context of FASD. Yet the findings from this review suggest that EF-focused psychological interventions may not be effective for improving EF in children with FASD, at least in the short term.
However, as noted above, these findings need to be interpreted with the utmost caution due to the limited evidence currently available for estimating overall effectiveness. Despite this limited evidence, these findings should not discourage further investigations of potential interventions to support children with FASD. FASD is associated with diffuse neurodevelopmental impairments that vary across individuals (Kodituwakku, 2009). It is possible that the most effective interventions may need to target multiple neurodevelopmental domains rather than focus predominantly on EF. In addition, outcomes may need to be tracked over longer periods of time to account for possible 'sleeper effects'. That is, the true impact of interventions on EF may not be realised until later developmental years, as has been found in other early intervention programs (Bierman et al., 2014). Overall, there is much to be done to ensure that the quality of evaluation evidence keeps pace with the growing ambition to understand how best to treat children impacted by PAE.

Government, Australia. The lead author was also supported by a PhD scholarship included as part of the above grant. Review tasks were distributed as follows:

Elizabeth Eggins is an Editor for the Campbell Crime and Justice Coordinating Group; however, she will not be privy to any internal editorial or Campbell Social Welfare Coordinating Group communications during the peer-review process.

DIFFERENCES BETWEEN PROTOCOL AND REVIEW
The review makes two deviations from the protocol. The protocol stated that studies would be included if the intervention was conducted in schools, hospitals or other clinics. Upon screening potentially eligible studies, it became clear that the intervention setting was not consistently or clearly reported. To provide a comprehensive review, studies that did not clearly specify the setting, but met all other eligibility criteria, were included in the review.
The second deviation was adjusting the full-text screening strategy due to the large number of documents. As independent double screening is not a mandatory MECCIR standard, an approach similar to other reviews published by the Campbell Collaboration was taken (e.g., Mazerolle et al., 2020). Specifically, a random sample of excluded full-texts was reviewed by a second author to assess the consistency and accuracy of screening decisions. A 10% random sample was drawn across the full-text exclusion criteria and only two disagreements were identified. The final inclusion/exclusion decision for these two studies was made via discussion between review authors.

Recruitment: Eligible children were identified by clinicians using medical records upon applying for clinical services at a paediatric neurodevelopmental exposure clinic. Information was provided to parents regarding the program before seeking consent and randomisation to conditions.
Clustering: Unclear
*Note: This study randomised participants to three conditions (GoFAR, FACELAND, control). The second treatment condition was equivalent to the GoFAR intervention, but included a computerised program (FACELAND) which aims to foster children's skills in identifying emotions (self and other). As this is an alternative intervention not aimed at fostering executive function, all comparisons are based on the GoFAR and control group participants.
Participants Eligibility criteria: Children with a clinical diagnosis of FAS, or partial FAS, or significant alcohol-related physical deformities (scores of more than 10 on dysmorphia checklist, Institute of Medicine/Hoyme Revision).
Total referred sample: 20 (10 in both groups)
Participants completing treatment: Exp = 7; Con = 9
Format: Individual face-to-face parenting sessions; individual child computer game sessions; parent-and-child therapy sessions
Description: Five weekly computer game sessions (1 h) where children played GoFAR, which focuses on navigating obstacles with a character to unlock subsequent levels. Children cannot navigate freely, but must plan actions by placing symbols in the environment before executing them ('Focus and Plan' step), then after action ('Act' step), they are required to recall their plan before moving to the next level ('Reflect' step). Length and frequency of sessions not provided in any reports of the study.
Five parenting sessions (1 h) focused on learning and using the FAR methodology and techniques for managing child behaviour. These sessions ran concurrently with the child computer game sessions. Parenting sessions were conducted by clinical psychology graduate students and postdoctoral fellows under the supervision of one study author. Session content: (a) FASD psychoeducation and impact of FASD on arousal behaviour regulation; (b) identifying child's arousal level and how to teach in calm alert state; (c) environmental modifications to promote behavioural regulation; (d) how to promote adaptive living skills and manage negative behaviours; (e) reflection on how to apply FAR process. Parents were provided with homework based on the content of the sessions. Mean time for parent sessions: 56.9 min (SD = 4.3).
Five weekly Behaviour Analogue Therapy parent-child sessions focused on applying learning through activities that used toys as analogues for child's activities at home (e.g., putting doll to bed). The specific activities were individualised for each family. Children were taught application of FAR techniques by the therapist, under observation of parent(s), followed by parent implementation within session and weekly homework to continue application in the home environment. Length and frequency of sessions not provided in any reports of the study.
Integrity monitoring: Child computer game sessions were monitored by staff supervised by the study authors. These staff were either an educational specialist, clinical psychology graduate student or trained undergraduate student. The following data was collected during observation: time spent playing the game, attention to the game, enthusiasm and number of prompts to stay on task. Parent therapy sessions were audiovisually recorded and reviewed by study authors, and therapists recorded whether homework was completed, number of pre-defined therapy goals achieved and parents' understanding of session content.

Outcomes
Measures were administered before randomisation and after completion of the program (or equivalent for comparison group participants). Self-report child disruptive behaviour was also measured at mid-treatment.

Bias
Authors' judgement Support for judgement

Randomisation
High risk Comment: Unclear whether the allocation sequence was random or concealed. Quote: 'Thirty families with children ages 5 to 10 years of age who were prenatally affected by alcohol were recruited and randomly assigned…control group participants served as time-elapsed controls and were yoked to a participant in one of the treatment groups to minimise group differences in time between assessments'.
Comment: The results of baseline differences on outcome measures not directly reported, but inspection of baseline data, reported for participants completing the study protocol, suggest small differences on the TOVA and NEPSY. Quote: 'Comparisons of demographic and family characteristics, birthweight and intellectual skills of the participants yielded no significant group differences' (Coles et al., 2015).

Deviations from intended interventions
High risk Comment: Due to the nature of the intervention and comparison conditions, participants and implementers likely aware of assignment. No information on whether there were deviations from the intended intervention due to the experimental context. Authors conducted 'per-protocol' analysis whereby they excluded participants who did not complete the intended intervention from analyses, and this may have impacted the effect estimates.

Missing outcome data
Unclear risk Comment: Reasons for missing data are reported by study authors and these reasons are likely unrelated to the true value of outcomes. That is, the reason for missing data is due to dropout and decisions by study authors to exclude participants who did not complete the entire intervention from analysis (see 'deviations from intended interventions'). Quote: 'results are based on the 25 participants completing the protocol' and 'advanced statistical modelling of missing data was not performed as a result of the small sample size' (Coles et al., 2015).

Outcome measurement
Low risk Comment: Method of measurement appropriate, measurement of direct outcomes unlikely to change between groups or be influenced by knowledge of intervention received by participants as the assessment methods are standardised. Study authors likely collected outcome data, meaning they were aware of the intervention received by participants, but this was unlikely to have affected the results of direct standardised assessment methods. Indirect self-report measures may be influenced by knowledge of the intervention.

Selection of reported result
Unclear risk Comment: No pre-specified analysis plan or protocol could be located. Multiple eligible outcome measures, with multiple eligible analyses across study reports (appear consistent). For direct outcomes, all available subscales for eligible outcomes appear to be reported. For indirect outcomes, some subscales are not reported (e.g., single subscales for BRIEF and CBCL).

Hutchison (2015)
Methods Research design: Single group with pre- and post-intervention measures
Recruitment: Three schools agreed to recruit children. Information packages and consent forms were sent to guardians of children in the eligible age range who had been identified as having an FASD diagnosis or who had received services in relation to PAE or possible FASD. Guardians who returned consent forms were contacted by the researcher.
Clustering: Participants were enrolled across three schools (school 1 = 5; school 2 = 4; school 3 = 1)
Participants Eligibility criteria: Aged 6-13 years, with a diagnosis of FASD or who had received services related to PAE. Children with a brain injury, other neurodevelopmental disorder diagnosis, or who could not complete baseline outcome measures were excluded from the study.
Description: CQ is a serious video game designed to improve attention and working memory in children.
The game consisted of five levels (mini-games), each of which required a 90% accuracy rate or certain number of correct trials per level to pass and move onto the next level. Each level tested and trained either attention or working memory through interactive video game activities. The entire intervention lasted 12 h in total, distributed across 20-30 min sessions 2-5 times per week, over a period of 8-10 weeks.
Integrity monitoring: Children played the video game under supervision of an Education Assistant (EA) trained in the implementation of the CQ. The lead author shadowed an initial session and modelled strategies to improve participant engagement. The author stayed in regular contact with EAs throughout the study period, to answer questions relating to implementation.

Outcomes
Measures were administered approximately 1 week before intervention commencement and 1 week after intervention completion.

Leung et al. (2015)
Methods Research design: Single group with pre- and post-intervention measures*
Recruitment: Participants were recruited through a FASD clinic at a rehabilitation hospital, following a confirmed history of PAE.
*Note: This study used an ineligible comparison group (children without FASD) and so was treated as a single group study.
Participants Eligibility criteria: Children needed to have a confirmed history of PAE. Children were excluded if they had any of the following conditions: genetic disorders (e.g., Down's syndrome), severe neurodevelopmental disorders (e.g., autism) and/or significant motor/sensory impairments (e.g., cerebral palsy, blindness).
Integrity monitoring: Not formally monitored; however, informal reports from caregivers suggested that participants often took more time to complete the intervention components than specified in the Cogmed© protocol. Reasons recorded included: need for frequent breaks, difficulty sustaining attention and school schedules.

Outcomes
Measures were administered before the intervention, after completion of the program and 5 weeks later.

Eligible outcomes (direct)
Attention (
Integrity monitoring: Throughout the experiment, assessors observed and recorded rehearsal strategies (e.g., whispering, moving lips or repeating the items) and asked children how they remembered the stimuli on the experimental tasks.

Outcomes
Measures were administered before the intervention, immediately after the first intervention session, and then immediately after the second intervention session.

Eligible outcomes (direct)
Working memory Working Memory Test Battery for Children (WMTB-C)
**Insufficient data for effect size calculation and data could not be sourced from study authors.
Comorbidity:
ADHD: Exp = 42%; Con = 85%*
ODD: Exp = 25%; Con = 15%
Language disorder: Exp = 50%; Con = 38%
Anxiety disorder: Exp = 0%; Con = 15%
Sensory disorder: Exp = 8%; Con = 8%
Note. Demographics are for sample completing treatment, as reported in . Sample numbers and demographic details slightly vary across reports of this study. In addition, the Soh et al. (2015) and Nash et al. (2017) reports expand the eligibility criteria so that children with a hospitalised head injury were not included. The Soh et al. (2015) report expands these criteria further to exclude children with a neurological abnormality, a debilitating/chronic medical condition or contraindication to MRI (e.g., braces).

*Significantly different at baseline
Interventions Name of intervention: Alert Program for Self Regulation®
Setting: Hospital clinic
Format: Individual face-to-face child sessions
Description: 12 × 60 min therapy sessions across 12-14 weeks in developmentally appropriate therapy room (e.g., floor mats, large pillows, tent) without distracting visual/auditory stimuli. The Alert program focuses on enhancing child self-regulation by encouraging integration of cognitive processing, using a car engine analogy. The sessions are organised into three sequential stages: (1) identifying and labelling 'engine levels' through learning 'engine words' and 'engine speeds'; (2) experimenting with 'engine speeds' and learning regulation strategies; (3) self-selection of strategies for application outside therapy and review of learning. Before proceeding to subsequent stages, children need to pass 'mile markers', which represent their mastery of concepts. For this study, therapy was delivered by two senior doctoral level clinical psychology students who had completed training by the developers of the Alert program.
Integrity monitoring: A selection of videoed sessions were observed by a clinical psychologist and the implementing therapists received bi-weekly supervision with a senior clinician to ensure treatment was being delivered in a developmentally appropriate way for each child (e.g., use of visual tools, reinforcement strategies).

Outcomes
Measures were administered before treatment and within 2 weeks of completing the treatment program (or equivalent for comparison group participants).

Eligible outcomes (direct)
Response inhibition
Comment: Baseline differences between intervention groups suggest there may have been a problem with randomisation. Quote: 'Groups differed significantly at baseline in ADHD diagnoses and exposure to both alcohol and drugs. The DTC group had significantly more children diagnosed with ADHD…whereas the TXT group had a greater number exposed to both alcohol and drugs… There were no other significant differences between the TXT and DTC groups' (Nash, 2012, p. 29).

Deviations from intended interventions
High risk Comment: Participants likely aware of their assigned intervention, as they were presumably assigned following an informed consent process which detailed both conditions. Study authors appear to have implemented randomisation and also implemented the treatment intervention.
No deviations are reported for earlier study reports Nash et al., , 2017, however the secondary report by Soh et al. (2015) indicates that one participant was re-allocated from the treatment group to the control group due to scheduling issues for the family. The Soh et al. (2015) report also contains an additional four participants compared to the Nash et al. reports and so it is unclear whether this re-allocated participant is in the participants reported by Nash et al. or resides within the additional four participants included in the Soh et al. (2015) report.
The analysis was appropriate to estimate the effect of assignment to the intervention (intention-to-treat/modified intention-to-treat analyses).

Missing outcome data
High risk Comment: Data not available for all randomised participants, results could possibly be biased by missing outcome data, and missingness may depend on its true value. Approximately 85% data available for each group. Loss of 2 of 14 participants in the treatment group and 3 of 15 in the waitlist control group. No analysis methods that correct for bias or sensitivity analyses reported by study authors.
Reasons for missing data in  and  appear to be unrelated to the eligible outcome data (i.e., custody/access issues, lost to follow-up). However, reasons for missing data in Nash et al. (2017) may be related to the Go-No-Go outcome. Specifically two treatment cases were excluded from analyses because they refused to enter the fMRI scanner due to anxiety, whereas, two cases in the control group were excluded from analysis due to a dental implant and excessive motion. This issue likely only affects one outcome, but this cannot be verified as even though the number of participants and missing data is the same across all the Nash et al. reports, the reasons for missing data appear to differ.

levels requiring 90% accuracy to progress. Three research assistants were trained in the role of interventionist, who taught and encouraged children to apply meta-cognitive strategies during game play (e.g., rehearsal to scaffold working memory). The meta-cognitive training followed a sequence of (a) identifying the issue; (b) strategy planning; and (c) reinforcement of the specific strategy.
Integrity monitoring: Interventionists kept record of meta-cognitive strategies that were either taught by themselves or used spontaneously by participants.

Outcomes
Eligible outcomes were measured before and after the intervention or approximately 12 weeks after baseline

Eligible outcomes (direct)
Attention
Study 1: NEPSY (auditory attention)
Study 1: Farm animals game (omission errors, computerised continuous performance task)
Study 2: KiTAP (no exact specification of subtest aside from classification by authors as a measure of attention)
Inhibition
Study 1: NEPSY (exact subtest not specified, but subtests drawn from attention & executive function domain which contains measures of inhibition)
Study 1: Day/night task (paper-based Stroop-like task)
Study 1: Computerised go/no-go (no further specification provided in study reports)
Study 1: Farm animals game (commission errors, computerised continuous performance task)
Study 2: Tasks of executive control
Working memory
Study 1 and 2: Wechsler Intelligence Scale for Children (spatial span)
Study 1 and 2: Working Memory Test Battery for Children (WMTB-C, Digit Recall and Block Recall)
Study 2: Tasks of executive control
Meta-cognition (Makela et al., 2019)
Meta-cognitive strategy checklist (completed during intervention only, no baseline)

Eligible outcomes (indirect)
Study 1: General executive function using BRIEF, parent and teacher report (subscales not specified)
Study 1: Attention using Conners' Rating Scales-Revised, parent and teacher report
Study 2: Attention Deficit Disorders Evaluation Scale (ADDES), parent and teacher report

(Pei et al., 2011, p. 13). This further reduces the bias for direct assessments of EF; however, indirect self-report measures may be influenced by knowledge of the intervention.

Selection of reported result
High

Format: Parent-and-child therapy sessions
Description: Weekly to fortnightly sessions lasting 1-2 h (Family 1 = 17 sessions total; Family 2 = 21 sessions total) with each family. Treatment begins with a holistic assessment of the family's needs, which guides the conceptualisation and tailored treatment approach. The usual PuP model was adapted by adding psychoeducation about the neurobehavioral impacts of FASD and specific self-regulation strategies. Each session began with a review of the prior week's events and then proceeded to (a) discussion, application and problem-solving in relation to self-regulation strategies; and (b) discussion, application and problem-solving in relation to other issues experienced by the family. The implementing psychologist worked with parents to support the implementation of self-regulation strategies between sessions, and also directly with children to teach and encourage the use of self-regulation strategies. Example exercises included age-appropriate mindfulness exercises taught through books, in-situ practice and applications on electronic devices. The standardised Parent Workbook was used to guide strategies to support overall family wellbeing.
Integrity monitoring: Fortnightly supervision with program developer and explicit fidelity checks built into the intervention, asking participants to discuss how strategies had been implemented throughout training.

Outcomes
Measures were administered at baseline, postintervention and follow-up (3 months).
Eligible outcomes (

Description: Twelve daily 30-min sessions during the school day or after school, with the total intervention period spanning 2.5-3.5 weeks. Teachers administered training to children in a small, quiet room in the school. The intervention followed the Pay Attention! protocol, with minor adjustments to some of the materials to suit the participants' age and developmental ability, and used visual and auditory sustained attention tasks only. Example tasks included sorting cards into categories, or listening to auditory stimuli and pressing a buzzer when hearing the target stimuli. Tasks were hierarchically structured so that children only progressed to more difficult tasks once they reached a specific performance threshold for a task (e.g., 90% accuracy on each timed task). Children were not encouraged to practice outside of training sessions and were not taught any specific attention strategies to use during sessions.
Integrity monitoring: Unclear

Outcomes
Measures were administered 1 week before the intervention and within 1 week after completion of the program.

Interventions Name of intervention: Neurocognitive Habilitation Program
Setting: Unclear
Format: Group face-to-face caregiver psychoeducational sessions; group face-to-face child intervention sessions; caregiver-and-child practice sessions
Description: 12-week program comprised of 12 × 75 min sessions delivered by doctoral and master's-level therapists, whereby parent and child sessions ran concurrently and the final 15-30 min was dedicated to parent-child practising of the skills learnt. Caregiver psychoeducational sessions focused on providing information on FASD and skills-based content: (a) for recognising their child's arousal level; (b) strategies to moderate child arousal and behaviour; (c) developmental accommodations for their child. Child sessions incorporated elements of the Alert program (e.g., engine to represent arousal, self-regulation strategies) and other strategies used to treat traumatic brain injuries (e.g., strategies to improve memory). Each session followed a consistent structure: check-in and review, group activity or learning concept, sensory snack and a 'wind down' activity (e.g., art) before joining with their caregivers.
Integrity monitoring: Treatment manual guided intervention and practitioners met to discuss implementation and fidelity issues.

Outcomes
Measures were administered before randomisation and within 7 months from baseline (or equivalent for comparison group participants).

Inhibition
Behavioural

Bias
Authors' judgement
Support for judgement

Baseline differences between intervention groups suggest there were no problems with randomisation. Quote: 'The demographic characteristics of the treatment and control groups, including race and ethnicity as classified by the children's primary caregiver and DCFS, were similar, except that the mean age of the control group was significantly greater than that of the study group' (p. 29).

Deviations from intended interventions
Unclear risk Comment: Participants were likely aware of their assigned intervention, as they were assigned following an informed consent process which detailed both conditions. The intervention was implemented by postgraduate clinicians and it is unclear if these clinicians are also authors of or involved in the evaluation. No deviations from intended intervention reported. The analysis was appropriate to estimate the effect of assignment to the intervention (intention-to-treat/modified intention-to-treat analyses).

Missing outcome data
Low risk Data reported for all randomised participants.

Outcome measurement
High risk Comment: method of measurement appropriate, measurement of direct outcomes unlikely to change between groups or be influenced by knowledge of intervention received by participants as the assessment methods are standardised. Outcome assessors unaware of assignment. Quote: 'Repeated follow-up measures were administered 7 months after enrolment, which was usually 2 to 3 months after treatment concluded, by psychologists blinded to the child's group assignment' (p. 23). However, intervention participants (parents) completed a self-report measure of the outcome eligible for this review and would have known their assigned condition. It could be argued that parents in the treatment group would have expectations or be vigilant to improvements in the eligible outcome (behavioural indicators of executive function) and those in the comparison condition may expect no change.

Characteristics of ongoing studies
Louw (2019)
Study name: A randomised control trial of a custom developed computer game to improve executive functioning in 4- to 6-year-old children exposed to alcohol in utero

SOURCES OF SUPPORT
Internal sources