RESUMO
Latin America continues to be severely underrepresented in genomics research, and fine-scale genetic histories and complex trait architectures remain hidden owing to insufficient data1. To fill this gap, the Mexican Biobank project genotyped 6,057 individuals from 898 rural and urban localities across all 32 states in Mexico at a resolution of 1.8 million genome-wide markers with linked complex trait and disease information creating a valuable nationwide genotype-phenotype database. Here, using ancestry deconvolution and inference of identity-by-descent segments, we inferred ancestral population sizes across Mesoamerican regions over time, unravelling Indigenous, colonial and postcolonial demographic dynamics2-6. We observed variation in runs of homozygosity among genomic regions with different ancestries reflecting distinct demographic histories and, in turn, different distributions of rare deleterious variants. We conducted genome-wide association studies (GWAS) for 22 complex traits and found that several traits are better predicted using the Mexican Biobank GWAS compared to the UK Biobank GWAS7,8. We identified genetic and environmental factors associating with trait variation, such as the length of the genome in runs of homozygosity as a predictor for body mass index, triglycerides, glucose and height. This study provides insights into the genetic histories of individuals in Mexico and dissects their complex trait architectures, both crucial for making precision and preventive medicine initiatives accessible worldwide.
Assuntos
Bancos de Espécimes Biológicos , Genética Médica , Genoma Humano , Genômica , Hispânico ou Latino , Humanos , Glicemia/genética , Glicemia/metabolismo , Estatura/genética , Índice de Massa Corporal , Interação Gene-Ambiente , Marcadores Genéticos/genética , Estudo de Associação Genômica Ampla , Hispânico ou Latino/classificação , Hispânico ou Latino/genética , Homozigoto , México , Fenótipo , Triglicerídeos/sangue , Triglicerídeos/genética , Reino Unido , Genoma Humano/genéticaRESUMO
A detailed knowledge of gene function in the monarch butterfly is still lacking. Here we generate a genome assembly from a Mexican nonmigratory population and used RNA-seq data from 14 biological samples for gene annotation and to construct an atlas portraying the breadth of gene expression during most of the monarch life cycle. Two thirds of the genes show expression changes, with long noncoding RNAs being particularly finely regulated during adulthood, and male-biased expression being four times more common than female-biased. The two portions of the monarch heterochromosome Z, one ancestral to the Lepidoptera and the other resulting from a chromosomal fusion, display distinct association with sex-biased expression, reflecting sample-dependent incompleteness or absence of dosage compensation in the ancestral but not the novel portion of the Z. This study presents extended genomic and transcriptomic resources that will facilitate a better understanding of the monarch's adaptation to a changing environment.
Assuntos
Borboletas/genética , Mecanismo Genético de Compensação de Dose , Transcriptoma , Animais , Feminino , Genoma , Masculino , RNA Longo não Codificante/fisiologiaRESUMO
Current Genome-Wide Association Studies (GWAS) rely on genotype imputation to increase statistical power, improve fine-mapping of association signals, and facilitate meta-analyses. Due to the complex demographic history of Latin America and the lack of balanced representation of Native American genomes in current imputation panels, the discovery of locally relevant disease variants is likely to be missed, limiting the scope and impact of biomedical research in these populations. Therefore, the necessity of better diversity representation in genomic databases is a scientific imperative. Here, we expand the 1,000 Genomes reference panel (1KGP) with 134 Native American genomes (1KGP + NAT) to assess imputation performance in Latin American individuals of mixed ancestry. Our panel increased the number of SNPs above the GWAS quality threshold, thus improving statistical power for association studies in the region. It also increased imputation accuracy, particularly in low-frequency variants segregating in Native American ancestry tracts. The improvement is subtle but consistent across countries and proportional to the number of genomes added from local source populations. To project the potential improvement with a higher number of reference genomes, we performed simulations and found that at least 3,000 Native American genomes are needed to equal the imputation performance of variants in European ancestry tracts. This reflects the concerning imbalance of diversity in current references and highlights the contribution of our work to reducing it while complementing efforts to improve global equity in genomic research.