ABSTRACT
Current genome-wide association studies (GWAS) rely on genotype imputation to increase statistical power, improve fine-mapping of association signals, and facilitate meta-analyses. Because of the complex demographic history of Latin America and the lack of balanced representation of Native American genomes in current imputation panels, locally relevant disease variants are likely to be missed, limiting the scope and impact of biomedical research in these populations. Better representation of diversity in genomic databases is therefore a scientific imperative. Here, we expand the 1000 Genomes reference panel (1KGP) with 134 Native American genomes (1KGP + NAT) to assess imputation performance in Latin American individuals of mixed ancestry. Our panel increased the number of SNPs above the GWAS quality threshold, thus improving statistical power for association studies in the region. It also increased imputation accuracy, particularly for low-frequency variants segregating in Native American ancestry tracts. The improvement is subtle but consistent across countries and proportional to the number of genomes added from local source populations. To project the potential improvement with a larger number of reference genomes, we performed simulations and found that at least 3,000 Native American genomes are needed to match the imputation performance for variants in European ancestry tracts. This reflects the concerning imbalance of diversity in current reference panels and highlights the contribution of our work to reducing it while complementing efforts to improve global equity in genomic research.
ABSTRACT
The subseafloor marine biosphere may be one of the largest reservoirs of microbial biomass on Earth, and the composition of its microbial inhabitants has recently been the subject of debate, particularly in sediments from the Peru Margin. To further explore the microbial diversity and overall community composition of this environment, we performed a metagenomic analysis of sediments from Ocean Drilling Program Site 1229 on the Peru Margin using whole-genome amplification and pyrosequencing. A total of 61.9 Mb of genetic material was sequenced from sediment horizons at 1, 16, 32, and 50 m below the seafloor. These depths include sediments from both the primarily sulfate-reducing and the methane-generating regions of the sediment column. Many of the annotated genes, including those encoding ribosomal proteins, corresponded to genes from the Chloroflexi and Euryarchaeota. However, analysis of the 16S small-subunit ribosomal genes suggests that Crenarchaeota are the abundant microbial member. Quantitative PCR confirms that uncultivated Crenarchaeota are indeed a major microbial group in these subsurface samples. These findings show that the marine subsurface is a distinct microbial habitat that differs from environments previously studied by metagenomics, especially in the predominance of uncultivated archaeal groups.