ABSTRACT
Admixed populations arise when two or more ancestral populations interbreed. As a result of this admixture, the genome of admixed populations is defined by tracts of variable size inherited from these parental groups and has particular genetic features that provide valuable information about their demographic history. Diverse methods can be used to derive the ancestry apportionment of admixed individuals, and such inferences can be leveraged for the discovery of genetic loci associated with diseases and traits, therefore having important biomedical implications. In this review article, we summarize the most common methods of global and local genetic ancestry estimation and discuss the use of admixture mapping studies in human diseases.
Subjects
High-Throughput Nucleotide Sequencing/methods, Biomedical Research, Genetic Loci/genetics, Genotype, Humans
ABSTRACT
Despite the genetic resemblance of Canary Islanders to other southern European populations, their geographical isolation and the historical admixture of aborigines (from North Africa) with sub-Saharan Africans and Europeans have shaped a distinctive genetic makeup that likely affects disease susceptibility and health disparities. Based on single nucleotide polymorphism array data and whole genome sequencing (30×), we inferred that the last African admixture took place ~14 generations ago and estimated that up to 34% of the Canary Islander genome is of recent African descent. The length of regions in homozygosis and the ancestry-related mosaic organization of the Canary Islander genome support the view that isolation has been strongest on the two smallest islands. Furthermore, several genomic regions showed significant and large deviations in African or European ancestry and were significantly enriched in genes involved in prevalent diseases in this community, such as diabetes, asthma, and allergy. The most prominent of these regions were located near LCT and the HLA, two well-known targets of selection, at which 40–50% of the Canarian genome is of recent African descent according to our estimates. Putative selective signals were also identified in these regions near the SLC6A11-SLC6A1, KCNMB2, and PCDH20-PCDH9 genes. Taken together, our findings provide solid evidence of a significant recent African admixture, population isolation, and adaptation in this part of Europe, with the favoring of African alleles in some chromosome regions. These findings may have medical implications for populations of recent African ancestry.
Subjects
Black People/genetics, Human Genome, White People/genetics, Genetic Predisposition to Disease, Humans, Islands, Single Nucleotide Polymorphism, Genetic Selection, Spain, Whole Genome Sequencing
ABSTRACT
Methicillin-resistant Staphylococcus aureus (MRSA) is one of the major human pathogens. Its genome can carry numerous resistance genes and virulence factors, some of which are related to the severity of the infection. An observational, descriptive, cross-sectional study was designed to molecularly analyze MRSA isolates that caused invasive infections in Paraguayan children from 2009 to 2013. Ten representative MRSA isolates of the main clonal complex identified were analyzed with short-read paired-end sequencing and assessed for the virulome, resistome, and phylogenetic relationships. All the genetically linked MRSA isolates were recovered from diverse clinical sources, patients, and hospitals, over widely separated periods. The pan-genomic analysis of these clones revealed three major and distinct clonal complexes (CC30, CC5, and CC8), each composed of clones closely related to each other. The CC30 genomes proved to belong to a successful clone, firmly established and disseminated throughout our country, and closely related to other public CC30 genomes from the region and the world. CC5 showed the highest genetic variability, and CC8 carried the complete arginine catabolic mobile element (ACME) and was closely related to USA300-NAE-ACME+, identified as the major cause of CA-MRSA infections in North America. Multiple virulence and resistance genes were identified for the first time in this study, highlighting the complex virulence profiles of MRSA circulating in the country. This study opens a wide range of new possibilities for future projects and trials to improve the existing knowledge on the epidemiology of MRSA circulating in Paraguay.
IMPORTANCE: The increasing prevalence of methicillin-resistant Staphylococcus aureus (MRSA) is a public health problem worldwide. The most frequent MRSA clones identified in Paraguay in previous studies (including community- and hospital-acquired) were the Pediatric (CC5-ST5-IV), the Cordobes-Chilean (CC5-ST5-I), the SouthWest Pacific (CC30-ST30-IV), and the Brazilian (CC8-ST239-III) clones. In this study, we carried out a pan-genomic analysis of the most representative MRSA clones causing invasive infections in Paraguayan children during 2009-2013, such as CC30-ST30-IV, CC5-ST5-IV, and CC8-ST8-IV, to evaluate their genetic diversity and their repertoire of virulence factors and antimicrobial resistance determinants. This revealed multiple virulence and resistance genes, highlighting the complex virulence profiles of MRSA circulating in Paraguay. Our work is the first genomic study of MRSA in Paraguay and will contribute to the development of genomic surveillance in the region and to our understanding of the global epidemiology of this pathogen.
Subjects
Methicillin-Resistant Staphylococcus aureus, Staphylococcal Infections, Humans, Child, Staphylococcal Infections/drug therapy, Phylogeny, Cross-Sectional Studies, Paraguay/epidemiology, Genomics, Virulence Factors/genetics, Clone Cells, Microbial Sensitivity Tests, Anti-Bacterial Agents/therapeutic use
ABSTRACT
In anthropological, medical, and forensic studies, the nonrecombinant region of the human Y chromosome (NRY) enables accurate reconstruction of pedigree relationships and retrieval of ancestral information. Using high-throughput sequencing (HTS) data, we present a benchmarking analysis of command-line tools for NRY haplogroup classification. The evaluation was performed using paired Illumina data from whole-genome sequencing (WGS) and whole-exome sequencing (WES) experiments from 50 unrelated donors. As a validation, we also used paired WGS/WES datasets of 54 individuals from the 1000 Genomes Project. Finally, we evaluated the tools on data from third-generation HTS obtained from a subset of donors and one reference sample. Our results show that WES, despite typically offering less genealogical resolution than WGS, is an effective method for determining the NRY haplogroup. Y-LineageTracker and Yleaf showed the highest accuracy for WGS data, correctly classifying 98% and 96% of the samples, respectively. Yleaf outperformed all benchmarked tools on the WES data, correctly classifying approximately 90% of the samples. Yleaf, Y-LineageTracker, and pathPhynder correctly classified most samples (88%) sequenced with third-generation HTS. Taken together, Yleaf provides the best performance for applications that use both WGS and WES. Overall, our study offers researchers a guide for selecting the most appropriate tool to analyze the NRY region using both second- and third-generation HTS data.
ABSTRACT
The conquest of the Canary Islands by Europeans began in the early 15th century and culminated in 1496 with the surrender of the aborigines. The collapse of the aboriginal population during the conquest and the arrival of settlers caused a drastic change in the demographic composition of the archipelago. To shed light on this historical process, we analyzed 896 mitogenomes of current inhabitants from the seven main islands. Our findings confirm the continuity of aboriginal maternal contributions and the persistence of their genetic footprints in the current population, at even higher levels (>60% on average) than previously reported. Moreover, the age estimates for most autochthonous founder lineages support a first aboriginal arrival to the islands at the beginning of the first millennium. We also reveal for the first time that the main recognizable genetic influences from Europe are from Portuguese and Galicians.
ABSTRACT
Next-generation sequencing (NGS) applications have flourished in the last decade, permitting the identification of cancer driver genes and profoundly expanding the possibilities of genomic studies of cancer, including melanoma. Here we present a technical review of many of the methodological approaches enabled by NGS applications, with a focus on assessing germline and somatic sequence variation. We provide cautionary notes and discuss key technical details involved in library preparation, the most common problems with the samples, and guidance to circumvent them. We also provide an overview of the sequence-based methods for cancer genomics, outlining the pros and cons of targeted sequencing vs. exome or whole-genome sequencing (WGS), the fundamentals of the most common commercial platforms, and a comparison of throughputs and key applications. Details of the steps and the main software involved in the bioinformatics processing of the sequencing results, from preprocessing to variant prioritization and filtering, are also provided in the context of the full spectrum of genetic variation (SNVs, indels, CNVs, structural variation, and gene fusions). Finally, we emphasize selected bioinformatic pipelines for (a) short-read WGS identification of small germline and somatic variants, (b) detection of gene fusions from transcriptomes, and (c) de novo assembly of genomes from long-read WGS data. Overall, we provide comprehensive guidance across the main methodological procedures involved in obtaining sequencing results for the most common short- and long-read NGS platforms, highlighting key applications in melanoma research.
ABSTRACT
Sepsis is a severe systemic inflammatory response to infections that is accompanied by organ dysfunction. Although the ancestral genetic background is a relevant factor for sepsis susceptibility, there is a lack of studies using the genetic singularities of a recently admixed population to identify loci involved in sepsis susceptibility. Here we aimed to discover new sepsis loci by completing the first admixture mapping study of sepsis in Canary Islanders, leveraging their distinctive genetic makeup as a mixture of European and African ancestries. We used a case-control approach and inferred local ancestry blocks from genome-wide data from 113,414 polymorphisms genotyped in 343 patients with sepsis and 410 unrelated controls, all ascertained for grandparental origin in the Canary Islands (Spain). Deviations in local ancestries between cases and controls were tested using logistic regressions, followed by fine-mapping analyses based on imputed genotypes, in silico functional assessments, and gene expression analysis centered on the region of interest. The admixture mapping analysis detected that local European ancestry in a locus spanning 1.2 megabases of chromosome 8p23.1 was associated with sepsis (lowest p = 1.37 × 10⁻⁴; odds ratio [OR] = 0.51; 95% CI = 0.40-0.66). Fine-mapping studies prioritized the variant rs13249564, within intron 1 of the MFHAS1 gene, as associated with sepsis (p = 9.94 × 10⁻⁴; OR = 0.65; 95% CI = 0.50-0.84). Functional and gene expression analyses focused on 8p23.1 allowed us to identify alternative genes with possible biological plausibility, such as defensins, which are well-known effector molecules of innate immunity. By completing the first admixture mapping study of sepsis, our results revealed a new genetic locus (8p23.1) harboring a number of genes with plausible implications in sepsis susceptibility.
ABSTRACT
The current inhabitants of the Canary Islands have a unique genetic makeup in the European diversity landscape due to the existence of African footprints from recent admixture events, especially of North African components (>20%). The underrepresentation of non-Europeans in genetic studies and the sizable North African ancestry, which is nearly absent from all existing catalogs of worldwide genetic diversity, justify the need to develop CIRdb, a population-specific reference catalog of natural genetic variation in the Canary Islanders. Based on array genotyping of the selected unrelated donors and comparisons against available datasets from European, sub-Saharan, and North African populations, we illustrate the intermediate genetic differentiation of Canary Islanders between Europeans and North Africans and the existence of within-population differences that are likely driven by genetic isolation. Here we describe the overall design and the methods that are being implemented to further develop CIRdb. This resource will help to strengthen the implementation of Precision Medicine in this population by helping to increase diversity in genetic studies. Among other benefits, it will improve the ability to fine-map disease genes, simplify the identification of causal variants, and help estimate the prevalence of unattended Mendelian diseases.
Subjects
Black People, Genetic Variation, Northern Africa, Population Genetics, Humans, Spain
ABSTRACT
The mitochondrial genome (mtDNA) is of interest for a range of fields including evolutionary, forensic, and medical genetics. Human mitogenomes can be classified into evolutionarily related haplogroups that provide ancestral information and pedigree relationships. Because of this and the advent of high-throughput sequencing (HTS) technology, there is a diversity of bioinformatic tools for haplogroup classification. We present a benchmarking of the 11 most salient tools for human mtDNA classification using empirical whole-genome (WGS) and whole-exome (WES) short-read sequencing data from 36 unrelated donors. We also assessed the best-performing tool on third-generation long, noisy read WGS data obtained with nanopore technology for a subset of the donors. We found that, for short-read WGS, most of the tools exhibit high accuracy for haplogroup classification irrespective of the input file used for the analysis. However, for short-read WES, Haplocheck and MixEmt were the most accurate tools. Based on the performance shown for WGS and WES, and the accompanying qualitative assessment, Haplocheck stands out as the most complete tool. For third-generation HTS data, we also showed that Haplocheck was able to accurately retrieve mtDNA haplogroups for all samples assessed, although only after following assembly-based approaches (based on either a reference-based assembly or a hybrid de novo assembly). Taken together, our results provide guidance for researchers to select the most suitable tool to conduct mtDNA analyses from HTS data.
Subjects
Computational Biology/methods, Mitochondrial DNA/classification, Benchmarking, Mitochondrial Genome, Haplotypes, Humans, Whole Genome Sequencing
ABSTRACT
Whole-exome sequencing has become a popular technique in research and clinical settings, assisting in disease diagnosis and increasing the understanding of disease pathogenesis. In this study, we aimed to compare common enrichment capture solutions available on the market. Peripheral blood-purified DNA samples were enriched with SureSelectQXT V6 (Agilent) and various Illumina solutions (TruSeq DNA Nano, TruSeq DNA Exome, Nextera DNA Exome, and Illumina DNA Prep with Enrichment) and sequenced on a HiSeq 4000. We found that their percentage of duplicate reads was as much as 2 times higher than the values previously reported for the previous HiSeq series. SureSelectQXT and Illumina DNA Prep with Enrichment showed the best average on-target coverage, which improved when off-target regions were included. At high coverage levels and in shared bases, these two solutions and TruSeq DNA Exome provided three of the best performances. With respect to small variants, SureSelectQXT detected the lowest number in target regions. When off-target regions were considered, its performance equaled that of the other solutions. Our results show SureSelectQXT and Illumina DNA Prep with Enrichment to be the best enrichment capture solutions.
ABSTRACT
The forensic use of X-STRs requires the creation of allele and haplotype frequency databases in the populations where they are going to be used. Recently, an updated Spanish allele and haplotype frequency database for the new 17 X-STR panel has been created; it is the only database available so far for this new multiplex. In order to broaden the forensic applicability of the 17 X-STR panel, 513 individuals from four different populations located on the Atlantic coast of Europe and North-West Africa have been studied: Brittany (France), Ireland, northern Portugal, and Casablanca (Morocco). Allele and haplotype frequency databases, as well as parameters of forensic interest for these populations, are presented. The results showed that the 17 X-STR panel constitutes a highly discriminative tool for forensic identification and kinship testing in the studied populations. Furthermore, we aimed to study whether these populations located on the Atlantic coast actually share similar allele and haplotype frequency distributions, since they have experienced genetic exchanges throughout history. This would allow the creation of larger forensic databases that include several genetically similar populations for use in forensic casework. For this purpose, pairwise FST genetic distances were estimated between the analyzed populations and others from the Atlantic coast previously studied with the 17 X-STR panel or the ten coincident markers included in the decaplex of the GHEP-ISFG. Our results suggest that certain nearby populations located on the European Atlantic coast could have undergone episodes of genetic interchange, as they have not shown statistically significant differentiation from one another. However, the population of Casablanca showed significant differentiation from the majority of the European populations. Likewise, the autochthonous Basque Country and Brittany populations have shown allele frequency distributions that are distinct from each other. Therefore, these findings seem to support the view that independent allele and haplotype frequency databases for each population, rather than a global database, would be more appropriate for forensic purposes.