ABSTRACT
Cervical cancer (CC) poses a significant burden on individuals in developing regions, exhibits heterogeneous responses to standard chemoradiation therapy, and contributes to substantial mortality. Unraveling host immune dynamics holds promise for innovative therapies and the discovery of clinically relevant biomarkers. We prospectively studied patients with locally advanced CC before treatment, stratifying them as responders (R) or non-responders (NR). R patients had increased tumor-infiltrating lymphocytes (TILs), while NR patients showed elevated PD-1 scores, CD8+ and PD-L2+ TILs, and PD-L1 immune reactivity. NR patients exhibited higher levels of systemic soluble mediators, which correlated with TIL immune markers. R patients demonstrated functional polarization of CD4 T cells (Th1, Th2, Th17, and Treg), while CD8+ T cells and CD68+ macrophages predominated in the NR group. Receiver operating characteristic analysis identified potential CC response predictors, including PD-L1-immunoreactive (IR) area, PD-L2, CD8, FGF-basic, IL-7, IL-8, IL-12p40, IL-15, and TNF-alpha. Dysfunctional TILs and imbalanced immune mediators contribute to therapeutic insufficiency, shedding light on local and systemic immune interplay. Our study informs immunological signatures for treatment prediction and CC prognosis.
Subject(s)
Uterine Cervical Neoplasms , Female , Humans , Uterine Cervical Neoplasms/therapy , B7-H1 Antigen , Prognosis , CD8-Positive T-Lymphocytes , Immunologic Factors , Lymphocytes, Tumor-Infiltrating , Biomarkers, Tumor
ABSTRACT
As whole-genome sequencing (WGS) becomes the gold standard tool for studying population genomics and medical applications, data on diverse non-European and admixed individuals are still scarce. Here, we present a high-coverage WGS dataset of 1,171 highly admixed elderly Brazilians from a census-based cohort, providing over 76 million variants, of which ~2 million are absent from large public databases. WGS enables identification of ~2,000 previously undescribed mobile element insertions, nearly 5 Mb of genomic segments absent from the human genome reference, and over 140 alleles from HLA genes absent from public resources. We reclassify and curate pathogenicity assertions for nearly four hundred variants in genes associated with dominantly inherited Mendelian disorders and calculate the incidence for selected recessive disorders, demonstrating the clinical usefulness of the present study. Finally, we observe that whole-genome and HLA imputation could be significantly improved compared to available datasets, since rare variation represents the largest proportion of input from WGS. These results demonstrate that even smaller sample sizes of underrepresented populations bring relevant data for genomic studies, especially when exploring analyses allowed only by WGS.
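The abstract does not say how the incidence of recessive disorders was calculated. As an illustration only, a common approach assumes Hardy-Weinberg equilibrium: if q is the combined frequency of pathogenic alleles in a gene, the expected incidence of affected individuals is q². The function name and the allele frequencies below are hypothetical and are not taken from the study.

```python
def recessive_incidence(pathogenic_allele_frequencies):
    """Expected incidence of a recessive disorder under Hardy-Weinberg equilibrium.

    Assumption (not from the abstract): all pathogenic alleles of the gene are
    pooled into one combined frequency q, and affected individuals carry two
    pathogenic alleles, so the incidence is q squared.
    """
    q = sum(pathogenic_allele_frequencies)
    return q * q


if __name__ == "__main__":
    # Hypothetical frequencies of two pathogenic alleles in the same gene.
    incidence = recessive_incidence([0.004, 0.001])
    print(f"expected incidence: about 1 in {round(1 / incidence):,}")
```

With these made-up inputs, q = 0.005, so the expected incidence is about 1 in 40,000. Real estimates would also need to account for population structure and inbreeding, which this sketch ignores.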
Subject(s)
Genomics , Metagenomics , Aged , Brazil/epidemiology , Genome, Human/genetics , Genomics/methods , Humans , Polymorphism, Single Nucleotide , Whole Genome Sequencing
ABSTRACT
PDE4B (phosphodiesterase-4B) has an important role in cancer and in the pharmacology of some disorders, such as inflammatory diseases. Remarkably, in Native Americans, PDE4B variants are associated with acute lymphoblastic leukemia (ALL) relapse, as this gene modulates sensitivity to the glucocorticoids used in ALL chemotherapy. PDE4B allele rs6683977.G, associated with genomic regions of Native American origin in US-Hispanics (admixed among Native Americans, Europeans, and Africans), increases ALL relapse risk, contributing to an association between Native American ancestry and ALL relapse that disappeared with an extra phase of chemotherapy. This result suggests that indigenous populations throughout the Americas may have high frequencies of rs6683977.G, but this has never been corroborated. We studied ancestry and PDE4B diversity in 951 healthy individuals from nine Latin American populations. In non-admixed Native American populations, rs6683977.G has frequencies greater than 90% and is in linkage disequilibrium with other ALL-relapse-associated and regulatory variants in intron 7 of PDE4B, forming haplotypes that show their highest worldwide frequencies in Native Americans (>0.82). Our findings inform the discussion on the pertinence of an extra phase of chemotherapy in Native American populations and exemplify how knowledge generated in US-Hispanics is relevant for their even more neglected and vulnerable Native American ancestors along the American continent.
Subject(s)
Cyclic Nucleotide Phosphodiesterases, Type 4 , Neoplasms , Pharmacogenetics , Cyclic Nucleotide Phosphodiesterases, Type 4/genetics , Genetics, Population , Humans , Neoplasms/drug therapy , Neoplasms/genetics , Recurrence , American Indian or Alaska Native
ABSTRACT
The emergence of diverse lineages harboring mutations with functional significance and potentially enhanced transmissibility imposes an increased difficulty on the containment of the SARS-CoV-2 pandemic [...].
Subject(s)
COVID-19/epidemiology , COVID-19/virology , SARS-CoV-2/genetics , Brazil/epidemiology , COVID-19/transmission , Genome, Viral/genetics , Humans , Mutation , Phylogeny , Phylogeography , SARS-CoV-2/classification
ABSTRACT
Cervical cancer (CC) represents a major global health issue, particularly impacting women from resource-constrained regions worldwide. Refractoriness to standard chemoradiotherapy has implicated cancer stem cells as critical coordinators of the biological mechanisms of resistance, contributing to CC recurrence. In this work, we evaluated differential gene expression in cervical cancer stem-like cells (CCSC) as biomarkers related to intrinsic chemoradioresistance in CC. A total of 31 patients with locally advanced CC who were referred to Mário Penna Institute (Belo Horizonte, Brazil) from August 2017 to May 2018 were recruited for the study. Fluorescence-activated cell sorting was used to enrich CD34+/CD45- CCSC from tumor biopsies. Transcriptome profiling was performed using ultra-low-input RNA sequencing, and differentially expressed genes (DEGs) were determined using log2 fold differences and an adjusted p-value < 0.05. The analysis returned 1050 DEGs when comparing the Non-Responder (NR) (n=10) and Responder (R) (n=21) groups to chemoradiotherapy. These included a wide-ranging pattern of underexpressed coding genes in the NR vs. R patients and a panel of lncRNAs and miRNAs with implications for CC tumorigenesis. A panel of biomarkers was selected using the rank-based AUC (Area Under the ROC Curve) and pAUC (partial AUC) measurements for diagnostic sensitivity and specificity. Genes overlapping between the 21 highest AUC and pAUC loci revealed seven genes with a strong capacity for identifying NR vs. R patients (ILF2, RBM22P2, ACO16722.1, AL360175.1 and AC092354.1), of which four also returned significant survival Hazard Ratios. This study identifies DEG signatures that provide potential biomarkers in CC prognosis and treatment outcome, and it identifies potential alternative targets for cancer therapy.
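As a rough illustration of what a rank-based AUC measures, the sketch below computes it as the fraction of (NR, R) patient pairs in which the NR patient has the higher expression value, with ties counted as one half. This is the standard pairwise (Mann-Whitney) formulation and is not necessarily the implementation used in the study. The function name and expression values are hypothetical, and the partial AUC is not covered.

```python
def rank_auc(scores_positive, scores_negative):
    """Rank-based AUC: probability that a randomly chosen positive case
    scores higher than a randomly chosen negative case (ties count 0.5)."""
    wins = 0.0
    for p in scores_positive:
        for n in scores_negative:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))


if __name__ == "__main__":
    # Hypothetical expression values of one gene (not data from the study).
    non_responders = [2.1, 3.4, 1.9]
    responders = [1.0, 2.0, 0.5, 1.2]
    print(round(rank_auc(non_responders, responders), 3))  # 11 of 12 pairs
```

An AUC of 0.5 means the gene does not separate the groups, while values near 0 or 1 indicate strong separation in one direction or the other.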
ABSTRACT
BACKGROUND/OBJECTIVES: Admixed populations are a resource to study the global genetic architecture of complex phenotypes, which is critical, considering that non-European populations are severely underrepresented in genomic studies. Here, we study the genetic architecture of BMI in children, young adults, and elderly individuals from the admixed population of Brazil. SUBJECTS/METHODS: Leveraging admixture in Brazilians, whose chromosomes are mosaics of fragments of Native American, European, and African origins, we used genome-wide data to perform admixture mapping/fine-mapping of body mass index (BMI) in three Brazilian population-based cohorts from the Northeast (Salvador), Southeast (Bambuí), and South (Pelotas). RESULTS: We found significant associations with African-associated alleles in children from Salvador (PALD1 and ZMIZ1 genes) and in young adults from Pelotas (NOD2 and MTUS2 genes). More importantly, in Pelotas, rs114066381, mapped in a potential regulatory region, is significantly associated only in females (p = 2.76e-06). This variant is rare in Europeans but has frequencies of ~3% in West Africa, and it has a strong female-specific effect (95% CI: 2.32-5.65 kg/m² per A allele). We confirmed this sex-specific association and replicated its strong effect for an adjusted fat mass index in the same Pelotas cohort, and for BMI in another Brazilian cohort from São Paulo (Southeast Brazil). A meta-analysis confirmed the significant association. Remarkably, we observed that while the frequency of the rs114066381-A allele ranges from 0.8 to 2.1% in the studied populations, it attains ~9% among women with morbid obesity from Pelotas, São Paulo, and Bambuí. The effect size of rs114066381 is at least five times higher than that of the FTO SNPs rs9939609 and rs1558902, already emblematic for their high effects. CONCLUSIONS: We identified six candidate SNPs associated with BMI. rs114066381 stands out for its high effect, which was replicated, and for its high frequency in women with morbid obesity. We demonstrate how admixed populations are a source of new relevant phenotype-associated genetic variants.
Subject(s)
Body Mass Index , Genetics, Population , Polymorphism, Single Nucleotide , Aged , Aged, 80 and over , Alleles , Brazil , Child , Child, Preschool , Chromosome Mapping , Female , Humans , Male , Middle Aged , Phenotype , Regulatory Sequences, Nucleic Acid , Sex Factors , Young Adult
ABSTRACT
The Transatlantic Slave Trade transported more than 9 million Africans to the Americas between the early 16th and the mid-19th centuries. We performed a genome-wide analysis using 6,267 individuals from 25 populations to infer how different African groups contributed to North American, South American, and Caribbean populations, in the context of geographic and geopolitical factors, and compared genetic data with demographic history records of the Transatlantic Slave Trade. We observed that West-Central Africa and Western Africa-associated ancestry clusters are more prevalent in northern latitudes of the Americas, whereas the South/East Africa-associated ancestry cluster is more prevalent in southern latitudes of the Americas. This pattern results from geographic and geopolitical factors leading to population differentiation. However, there is a substantial decrease in the between-population differentiation of the African gene pool within the Americas, when compared with the regions of origin in Africa, underscoring the importance of historical factors favoring admixture between individuals with different African origins in the New World. This between-population homogenization in the Americas is consistent with the excess of West-Central Africa ancestry (the most prevalent in the Americas) in the United States and Southeast Brazil, with respect to historical-demography expectations. We also inferred that in most of the Americas, intercontinental admixture intensification occurred between 1750 and 1850, which correlates strongly with the peak of arrivals from Africa. This study contributes a population genetics perspective to the ongoing social, cultural, and political debate regarding ancestry, admixture, and the mestizaje process in the Americas.
Subject(s)
Black People/genetics , Enslavement/history , Gene Pool , Genome, Human , Human Migration/history , Africa , Americas , History, 16th Century , History, 17th Century , History, 18th Century , History, 19th Century , Humans , Phylogeography
ABSTRACT
Age-related cognitive decline (ACD) is the gradual decline of cognitive function with age. Most genetic risk factors for ACD have been identified in European populations and there are no reports in admixed Latin American individuals. We performed admixture mapping, genome-wide association analysis (GWAS), and fine-mapping to examine genetic factors associated with 15-year cognitive trajectory in 1,407 Brazilian older adults, comprising 14,956 Mini-Mental State Examination measures. Participants were enrolled as part of the Bambuí-Epigen Cohort Study of Aging. Our admixture mapping analysis identified a genomic region (3p24.2) in which increased Native American ancestry was significantly associated with faster ACD. Fine-mapping of this region identified a single nucleotide polymorphism (SNP), rs142380904 (β = -0.044, SE = 0.01, p = 7.5 × 10⁻⁵), associated with ACD. In addition, our GWAS identified 24 associated SNPs, most in genes previously reported to influence cognitive function. The top six associated SNPs accounted for 18.5% of the ACD variance in our data. Furthermore, our longitudinal study replicated previous GWAS hits for cognitive decline and Alzheimer's disease. Our 15-year longitudinal study identified both ancestry-specific and cosmopolitan genetic variants associated with ACD in Brazilians, highlighting the need for more trans-ancestry genomic studies, especially in underrepresented ethnic groups.
Subject(s)
Aging , Cognitive Dysfunction/genetics , Polymorphism, Single Nucleotide , Age Factors , Aged , Brazil/epidemiology , Cognition , Cognitive Dysfunction/etiology , Cohort Studies , Female , Follow-Up Studies , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Male , Middle Aged
ABSTRACT
EPIGEN-Brazil is one of the largest Latin American initiatives at the interface of human genomics, public health, and computational biology. Here, we present two resources to address two challenges to the global dissemination of precision medicine and the development of the bioinformatics know-how to support it. To address the underrepresentation of non-European individuals in human genome diversity studies, we present the EPIGEN-5M+1KGP imputation panel, the fusion of the public 1000 Genomes Project (1KGP) Phase 3 imputation panel with haplotypes derived from the EPIGEN-5M data set (a product of the genotyping of 4.3 million SNPs in 265 admixed individuals from the EPIGEN-Brazil Initiative). When we imputed a target SNP data set (6487 admixed individuals genotyped for 2.2 million SNPs from the EPIGEN-Brazil project) with the EPIGEN-5M+1KGP panel, we gained 140,452 more SNPs in total than when using the 1KGP Phase 3 panel alone and 788,873 additional high-confidence SNPs (info score ≥ 0.8). Thus, the major effect of the inclusion of the EPIGEN-5M data set in this new imputation panel is not only to gain more SNPs but also to improve the quality of imputation. To address the lack of transparency and reproducibility of bioinformatics protocols, we present a conceptual Scientific Workflow in the form of a website that models the scientific process (by including publications, flowcharts, master scripts, documents, and bioinformatics protocols), making it accessible and interactive. Its applicability is shown in the context of the development of our EPIGEN-5M+1KGP imputation panel. The Scientific Workflow also serves as a repository of bioinformatics resources.
Subject(s)
Genome, Human/genetics , Brazil , Computational Biology/methods , Genomics/methods , Haplotypes/genetics , Humans , Latin America , Polymorphism, Single Nucleotide/genetics , Reproducibility of Results , Software , Workflow
ABSTRACT
The Brazilian population is considered to be highly admixed. The main contributing ancestral populations were European and African, with Amerindians contributing to a lesser extent. The aim of this study was to provide a resource for determining and quantifying individual continental ancestry using the smallest number of SNPs possible, thus allowing for a cost- and time-efficient strategy for genomic ancestry determination. We identified and validated a minimum set of 192 ancestry informative markers (AIMs) for the genetic ancestry determination of Brazilian populations. These markers were selected on the basis of their distribution throughout the human genome and their capacity to be genotyped on widely available commercial platforms. We analyzed genotyping data from 6487 individuals belonging to three Brazilian cohorts. Estimates of individual admixture using this 192-AIM panel were highly correlated with estimates using ~370 000 genome-wide SNPs: 91%, 92%, and 74% for the African, European, and Native American ancestry components, respectively. In addition, the 192 AIMs are well distributed among populations from these ancestral continents, allowing greater freedom in future studies with this panel regarding the choice of reference populations. We also observed that genetic ancestry inferred by AIMs provides association results similar to those obtained using ancestry inferred from genomic data (370 K SNPs) in a simple regression model with genotypes of rs1426654, related to skin pigmentation, as the dependent variable. In conclusion, these markers can be used to identify and accurately quantify ancestry of Latin Americans or US Hispanic/Latino individuals, in particular in the context of fine-mapping strategies that require the quantification of continental ancestry in thousands of individuals.
Subject(s)
Genome, Human , Polymorphism, Single Nucleotide , Population/genetics , American Indian or Alaska Native , Black People , Brazil , Genetic Markers , Humans , Pedigree , Skin Pigmentation/genetics , White People
ABSTRACT
BACKGROUND: Asthma is a chronic disease of the airways and, despite the advances in the knowledge of associated genetic regions in recent years, their mechanisms have yet to be explored. Several genome-wide association studies have been carried out in recent years, but none of these have involved Latin American populations with a high level of miscegenation, as is seen in the Brazilian population. METHODS: A total of 1246 children were recruited from a longitudinal cohort study in Salvador, Brazil. Asthma symptoms were identified in accordance with an International Study of Asthma and Allergies in Childhood (ISAAC) questionnaire. Following quality control, 1,877,526 autosomal SNPs were tested for association with childhood asthma symptoms by logistic regression using an additive genetic model. We complemented the analysis with an estimate of the phenotypic variance explained by common genetic variants. Replications were investigated in independent Mexican and US Latino samples. RESULTS: Two chromosomal regions reached genome-wide significance level for childhood asthma symptoms: the 14q11 region flanking the DAD1 and OXA1L genes (rs1999071, MAF 0.32, OR 1.78, 95% CI 1.45-2.18, p-value 2.83 × 10⁻⁸) and the 15q22 region flanking the FOXB1 gene (rs10519031, MAF 0.04, OR 3.0, 95% CI 2.02-4.49, p-value 6.68 × 10⁻⁸ and rs8029377, MAF 0.03, OR 2.49, 95% CI 1.76-3.53, p-value 2.45 × 10⁻⁷). eQTL analysis suggests that rs1999071 regulates the expression of the OXA1L gene. However, the original findings were not replicated in the Mexican or US Latino samples. CONCLUSIONS: We conclude that the 14q11 and 15q22 regions may be associated with asthma symptoms in childhood.
Subject(s)
Asthma/genetics , Genetic Predisposition to Disease , Genome-Wide Association Study , Child , Child, Preschool , Chromosomes, Human, Pair 14/genetics , Female , Humans , Latin America , Male , Metabolic Networks and Pathways/genetics , Phenotype , Polymorphism, Single Nucleotide/genetics , Principal Component Analysis
ABSTRACT
BACKGROUND: Self-rated health (SRH) has strong predictive value for mortality in different contexts and cultures, but there is inconsistent evidence on ethnoracial disparities in SRH in Latin America, possibly due to the complexity surrounding ethnoracial self-classification. MATERIALS/METHODS: We used 370,539 Single Nucleotide Polymorphisms (SNPs) to examine the association between individual genomic proportions of African, European and Native American ancestry, and ethnoracial self-classification, with baseline and 10-year SRH trajectories in 1,311 community-dwelling older Brazilians. We also examined whether genomic ancestry and ethnoracial self-classification affect the predictive value of SRH for subsequent mortality. RESULTS: European ancestry predominated among participants, followed by African and Native American (median = 84.0%, 9.6% and 5.3%, respectively); the prevalence of Non-White (Mixed and Black) was 39.8%. Persons at higher levels of African and Native American genomic ancestry, and those self-identified as Non-White, were more likely to report poor health than other groups, even after controlling for socioeconomic conditions and an array of self-reported and objective physical health measures. Increased risks for mortality associated with worse SRH trajectories were strong and remarkably similar (hazard ratio ~3) across all genomic ancestry and ethnoracial groups. CONCLUSIONS: Our results demonstrated for the first time that higher levels of African and Native American genomic ancestry (and the inverse for European ancestry) were strongly correlated with worse SRH in a Latin American admixed population. Neither genomic ancestry nor ethnoracial self-classification modified the strong association between baseline SRH or SRH trajectory and subsequent mortality.
Subject(s)
Aging/physiology , Genome, Human , Health Status , Self-Assessment , Brazil/epidemiology , Cohort Studies , Follow-Up Studies , Humans
ABSTRACT
While South Americans are underrepresented in human genomic diversity studies, Brazil has been a classical model for population genetics studies on admixture. We present the results of the EPIGEN Brazil Initiative, the most comprehensive up-to-date genomic analysis of any Latin-American population. A population-based genome-wide analysis of 6,487 individuals was performed in the context of worldwide genomic diversity to elucidate how ancestry, kinship, and inbreeding interact in three populations with different histories from the Northeast (African ancestry: 50%), Southeast, and South (both with European ancestry >70%) of Brazil. We showed that ancestry-positive assortative mating permeated Brazilian history. We traced European ancestry in the Southeast/South to a wider European/Middle Eastern region with respect to the Northeast, where ancestry seems restricted to Iberia. By developing an approximate Bayesian computation framework, we infer more recent European immigration to the Southeast/South than to the Northeast. Also, the observed low Native-American ancestry (6-8%) was mostly introduced in different regions of Brazil soon after the European Conquest. We broadened our understanding of the African diaspora, the major destination of which was Brazil, by revealing that Brazilians display two within-Africa ancestry components: one associated with non-Bantu/western Africans (more evident in the Northeast and African Americans) and one associated with Bantu/eastern Africans (more present in the Southeast/South). Furthermore, the whole-genome analysis of 30 individuals (42-fold deep coverage) shows that continental admixture rather than local post-Columbian history is the main and complex determinant of the individual amount of deleterious genotypes.
Subject(s)
Genetics, Population , Mutation , Black People/genetics , Brazil , Humans , White People/genetics
ABSTRACT
BACKGROUND: Archaeology reports millenary cultural contacts between the Peruvian Coast-Andes and the Amazon Yunga, a rainforest transitional region between the Andes and Lower Amazonia. To clarify the relationships between cultural and biological evolution of these populations, in particular between Amazon Yungas and Andeans, we used DNA-sequence data, a model-based Bayesian approach and several statistical validations to infer a set of demographic parameters. RESULTS: We found that the genetic diversity of the Shimaa (an Amazon Yunga population) is a subset of that of Quechuas from the Central Andes. Using the Isolation-with-Migration population genetics model, we inferred that the Shimaa ancestors were a small subgroup that split less than 5300 years ago (after the development of complex societies) from an ancestral Andean population. After the split, the most plausible scenario compatible with our results is that the ancestors of the Shimaa moved toward the Peruvian Amazon Yunga and incorporated the culture and language of some of their neighbors, but not a substantial amount of their genes. We validated our results using Approximate Bayesian Computation, posterior predictive tests and the analysis of pseudo-observed datasets. CONCLUSIONS: We presented a case study in which model-based Bayesian approaches, combined with necessary statistical validations, shed light on the prehistoric demographic relationship between Andeans and a population from the Amazon Yunga. Our results offer a testable model for the peopling of this large transitional environmental region between the Andes and Lower Amazonia. However, studies on larger samples and involving more populations of these regions are necessary to confirm whether the predominant Andean biological origin of the Shimaa is the rule, and not the exception.
Subject(s)
Genetics, Population , Indians, South American/genetics , Bayes Theorem , Biological Evolution , Genetic Variation , Human Migration , Humans , Molecular Sequence Data , Population Groups , South America
ABSTRACT
BACKGROUND: In bioinformatics, it is important to build extensible and low-maintenance systems that are able to deal with the new tools and data formats that are constantly being developed. The traditional and simplest implementation of pipelines involves hardcoding the execution steps into programs or scripts. This approach can lead to problems when a pipeline is expanding because the incorporation of new tools is often error prone and time consuming. Current approaches to pipeline development such as workflow management systems focus on analysis tasks that are systematically repeated without significant changes in their course of execution, such as genome annotation. However, more dynamism in pipeline composition is necessary when each execution requires a different combination of steps. RESULTS: We propose a graph-based approach to implement extensible and low-maintenance pipelines that is suitable for pipeline applications with multiple functionalities that require different combinations of steps in each execution. Here pipelines are composed automatically by compiling a specialised set of tools on demand, depending on the functionality required, instead of specifying every sequence of tools in advance. We represent the connectivity of pipeline components with a directed graph in which components are the graph edges, their inputs and outputs are the graph nodes, and the paths through the graph are pipelines. To that end, we developed special data structures and a pipeline system algorithm. We demonstrate the applicability of our approach by implementing a format conversion pipeline for the fields of population genetics and genetic epidemiology, but our approach is also helpful in other fields where the use of multiple software tools is necessary to perform comprehensive analyses, such as gene expression and proteomics analyses. The project code, documentation and the Java executables are available under an open source license at http://code.google.com/p/dynamic-pipeline. The system has been tested on Linux and Windows platforms. CONCLUSIONS: Our graph-based approach enables the automatic creation of pipelines by compiling a specialised set of tools on demand, depending on the functionality required. It also allows the implementation of extensible and low-maintenance pipelines and contributes towards consolidating openness and collaboration in bioinformatics systems. It is targeted at pipeline developers and is suited for implementing applications with sequential execution steps and combined functionalities. In the format conversion application, the automatic combination of conversion tools increased both the number of possible conversions available to the user and the extensibility of the system to allow for future updates with new file formats.
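The graph representation described above can be sketched in a few lines. This is a minimal illustration and not the project's Java implementation: tools are modelled as directed edges between data formats (the nodes), and a pipeline is found as a shortest path by breadth-first search. The tool and format names are hypothetical, and breadth-first search is an assumed choice, since the abstract does not specify the path-finding algorithm.

```python
from collections import deque

# Hypothetical converters: (tool name, input format, output format).
# Each tool is an edge; each format is a node.
TOOLS = [
    ("ped2vcf", "PED", "VCF"),
    ("vcf2structure", "VCF", "STRUCTURE"),
    ("ped2phase", "PED", "PHASE"),
    ("structure2arlequin", "STRUCTURE", "ARLEQUIN"),
]


def build_graph(tools):
    """Map each input format to the (tool, output format) edges leaving it."""
    graph = {}
    for name, source, target in tools:
        graph.setdefault(source, []).append((name, target))
    return graph


def compose_pipeline(tools, source, target):
    """Return the shortest list of tool names converting source to target,
    or None when no chain of tools connects the two formats."""
    graph = build_graph(tools)
    queue = deque([(source, [])])
    visited = {source}
    while queue:
        fmt, path = queue.popleft()
        if fmt == target:
            return path
        for name, next_fmt in graph.get(fmt, []):
            if next_fmt not in visited:
                visited.add(next_fmt)
                queue.append((next_fmt, path + [name]))
    return None


if __name__ == "__main__":
    print(compose_pipeline(TOOLS, "PED", "ARLEQUIN"))
    # ['ped2vcf', 'vcf2structure', 'structure2arlequin']
```

Adding a new converter only requires appending one entry to the tool list. Every conversion that becomes reachable through it is then available without editing any hardcoded sequence, which is the extensibility benefit the abstract describes.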
Subject(s)
Computational Biology/methods , Software , Algorithms , Genome , Proteomics , Workflow
ABSTRACT
Large-scale genomics initiatives such as the HapMap project and the 1000 Genomes Project rely on powerful bioinformatics support to assist data production and analysis. In contrast, few bioinformatics platforms oriented to smaller research groups exist to store, handle, share, and integrate data from different sources, as well as to help these scientists perform their analyses efficiently. We developed such a bioinformatics platform, DIVERGENOME, to assist population genetics and genetic epidemiology studies performed by small- to medium-sized research groups. The platform is composed of two integrated components, a relational database (DIVERGENOMEdb), and a set of tools to convert data formats as required by popular software in population genetics and genetic epidemiology (DIVERGENOMEtools). In DIVERGENOMEdb, information on genotypes, polymorphisms, laboratory protocols, individuals, populations, and phenotypes is organized in projects. These can be queried according to permissions. Here, we validated DIVERGENOME through a use case regarding the analysis of SLC2A4 genetic diversity in human populations. DIVERGENOME, with its intuitive Web interface and automatic data loading capability, is easy to use for individuals without a bioinformatics background, allowing complex queries to be run easily and providing straightforward data format conversions (not available in similar platforms). DIVERGENOME is open source, freely available, and can be accessed online (pggenetica.icb.ufmg.br/divergenome) or hosted locally.
Subject(s)
Computational Biology/methods , Molecular Epidemiology , Algorithms , Automation , Brazil , Databases, Genetic , Genetic Variation , Genetics, Population , Genome, Human , Genome-Wide Association Study , Glucose Transporter Type 4/genetics , Humans , Internet , Phenotype , Software
ABSTRACT
Elucidating the pattern of genetic diversity for non-European populations is necessary to make the benefits of human genetics research available to individuals from these groups. In the era of large human genomic initiatives, Native American populations have been neglected, in particular the Quechua, the largest South Amerindian group settled along the Andes. We characterized the genetic diversity of a Quechua population in a global setting, using autosomal noncoding sequences (nine unlinked loci for a total of 16 kb), 351 unlinked SNPs and 678 microsatellites, and tested predictions of the model of the evolution of Native Americans proposed by Tarazona-Santos et al. (Am J Hum Genet 68 (2001) 1485-1496). European admixture is <5% and African ancestry is barely detectable in the studied population. The largest genetic distances were between African and Quechua or Melanesian populations, which is concordant with the African origin of modern humans and the fact that South America was the last part of the world to be peopled. The diversity in the Quechua population is comparable with that of Eurasian populations, and the allele frequency spectrum based on resequencing data does not reflect a reduction in the proportion of rare alleles. Thus, the Quechua population is a large reservoir of common and rare genetic variants of South Amerindians. These results are consistent with and complement our evolutionary model of South Amerindians (Tarazona-Santos et al.: Am J Hum Genet 68 (2001) 1485-1496), proposed based on Y-chromosome data, which predicts high genomic diversity due to the high level of gene flow between Andean populations and their long-term effective population size.