BACKGROUND: The Geriatric Depression Scale (GDS) is a widely used instrument to assess depression in older adults. The short GDS versions with four (GDS-4) and five (GDS-5) items represent alternatives for depression screening in limited-resource settings. However, their accuracy remains uncertain. OBJECTIVE: To assess the accuracy of the GDS-4 and GDS-5 versions for depression screening in older adults. METHODS: We systematically searched PubMed, PsycINFO, Scopus, and Google Scholar up to May 2020 for studies that assessed the sensitivity and specificity of the GDS-4 and GDS-5 for depression screening in older adults. We meta-analyzed the sensitivity and specificity reported by studies that used the Diagnostic and Statistical Manual of Mental Disorders (DSM) or the International Classification of Diseases-10 (ICD-10) as the reference standard. Study quality was assessed with the QUADAS-2 tool. We performed bivariate random-effects meta-analyses to calculate the pooled sensitivity and specificity with their 95% confidence intervals (95% CI) at each reported common cut-off. For the overall meta-analyses, we evaluated each GDS-4 and GDS-5 version separately at each cut-off; to investigate heterogeneity, we assessed similar GDS versions together at each cut-off. We also assessed the certainty of evidence using the GRADE methodology. RESULTS: Twenty-three studies, with a total of 5048 participants, were included and meta-analyzed, assessing eleven different GDS versions. When all versions were included together, at a cut-off of 2, the GDS-4 had a pooled sensitivity of 0.77 (95% CI: 0.70-0.82) and a pooled specificity of 0.75 (0.68-0.81), while the GDS-5 had a pooled sensitivity of 0.85 (0.80-0.90) and a pooled specificity of 0.75 (0.69-0.81). We found results for more than one GDS-4 version at cut-off points 1, 2, and 3, and for more than one GDS-5 version at cut-off points 1, 2, 3, and 4.
Subgroup differences across versions were mostly significant at the different test thresholds. The accuracy of the different GDS-4 and GDS-5 versions showed high heterogeneity. There was a high risk of bias in the index test domain. In addition, the certainty of the evidence was low or very low for most of the GDS versions. CONCLUSIONS: We found several GDS-4 and GDS-5 versions whose estimates of sensitivity and specificity showed high heterogeneity, mostly with a low or very low certainty of the evidence. Overall, our results indicate the need for more well-designed studies that compare different GDS versions.