Results 1 - 8 of 8
1.
Nature ; 538(7624): 207-214, 2016 Oct 13.
Article in English | MEDLINE | ID: mdl-27654914

ABSTRACT

The population history of Aboriginal Australians remains largely uncharacterized. Here we generate high-coverage genomes for 83 Aboriginal Australians (speakers of Pama-Nyungan languages) and 25 Papuans from the New Guinea Highlands. We find that Papuan and Aboriginal Australian ancestors diversified 25-40 thousand years ago (kya), suggesting pre-Holocene population structure in the ancient continent of Sahul (Australia, New Guinea and Tasmania). However, all of the studied Aboriginal Australians descend from a single founding population that differentiated ~10-32 kya. We infer a population expansion in northeast Australia during the Holocene epoch (past 10,000 years) associated with limited gene flow from this region to the rest of Australia, consistent with the spread of the Pama-Nyungan languages. We estimate that Aboriginal Australians and Papuans diverged from Eurasians 51-72 kya, following a single out-of-Africa dispersal, and subsequently admixed with archaic populations. Finally, we report evidence of selection in Aboriginal Australians potentially associated with living in the desert.


Subject(s)
Genome, Human/genetics; Genomics; Native Hawaiian or Other Pacific Islander/genetics; Phylogeny; Racial Groups/genetics; Africa/ethnology; Australia; Datasets as Topic; Desert Climate; Gene Flow; Genetics, Population; History, Ancient; Human Migration/history; Humans; Language; New Guinea; Population Dynamics; Tasmania
2.
Proc Natl Acad Sci U S A ; 114(32): E6498-E6506, 2017 Aug 8.
Article in English | MEDLINE | ID: mdl-28716916

ABSTRACT

Although situated ∼400 km from the east coast of Africa, Madagascar exhibits cultural, linguistic, and genetic traits from both Southeast Asia and Eastern Africa. The settlement history remains contentious; we therefore used a grid-based approach to sample at high resolution the genomic diversity (including maternal lineages, paternal lineages, and genome-wide data) across 257 villages and 2,704 Malagasy individuals. We find a common Bantu and Austronesian descent for all Malagasy individuals with a limited paternal contribution from Europe and the Middle East. Admixture and demographic growth happened recently, suggesting a rapid settlement of Madagascar during the last millennium. However, the distribution of African and Asian ancestry across the island reveals that the admixture was sex biased and happened heterogeneously across Madagascar, suggesting independent colonization of Madagascar from Africa and Asia rather than settlement by an already admixed population. In addition, there are geographic influences on the present genomic diversity, independent of the admixture, showing that a few centuries is sufficient to produce detectable genetic structure in human populations.


Subject(s)
Asian People/genetics; Black People/genetics; Ethnicity/genetics; Genetic Variation; Genome, Human; Genome-Wide Association Study; Aged; Female; Humans; Madagascar/ethnology; Male; Middle Aged
3.
Proc Natl Acad Sci U S A ; 112(8): 2491-6, 2015 Feb 24.
Article in English | MEDLINE | ID: mdl-25675502

ABSTRACT

Heteroplasmy in human mtDNA may play a role in cancer, other diseases, and aging, but patterns of heteroplasmy variation across different tissues have not been thoroughly investigated. Here, we analyzed complete mtDNA genome sequences at ∼3,500× average coverage from each of 12 tissues obtained at autopsy from each of 152 individuals. We identified 4,577 heteroplasmies (with an alternative allele frequency of at least 0.5%) at 393 positions across the mtDNA genome. Surprisingly, different nucleotide positions (nps) exhibit high frequencies of heteroplasmy in different tissues, and, moreover, heteroplasmy is strongly dependent on the specific consensus allele at an np. All of these tissue-related and allele-related heteroplasmies show a significant age-related accumulation, suggesting positive selection for specific alleles at specific positions in specific tissues. We also find a highly significant excess of liver-specific heteroplasmies involving nonsynonymous changes, most of which are predicted to have an impact on protein function. This apparent positive selection for reduced mitochondrial function in the liver may reflect selection to decrease damaging byproducts of liver mitochondrial metabolism (i.e., "survival of the slowest"). Overall, our results provide compelling evidence for positive selection acting on some somatic mtDNA mutations.


Subject(s)
Alleles; DNA, Mitochondrial/genetics; Mutation/genetics; Organ Specificity/genetics; Selection, Genetic; Adolescent; Adult; Aged; Aged, 80 and over; Base Sequence; Child; Child, Preschool; Humans; Infant; Infant, Newborn; Liver/metabolism; Middle Aged; Molecular Sequence Data; Young Adult
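The abstract above sets a 0.5% minimum alternative allele frequency but does not describe how heteroplasmies were called. One common way to separate a true minor allele from sequencing error at a single position is a likelihood-ratio test between a two-allele mixture and an error-only model. The Python sketch below illustrates that idea under stated assumptions: per-base error comes from the Phred quality, errors are spread evenly over the three other bases, and the LLR cutoff (`min_llr=10.0`, natural-log units) and all function names are hypothetical. Only the 0.5% frequency threshold comes from the abstract; this is not the authors' pipeline.

```python
import math
from collections import Counter
from typing import Dict, List


def base_prob(observed: str, true_allele: str, q: int) -> float:
    """P(observed base | true allele) for one read base of Phred quality q.

    Assumption: a sequencing error is equally likely to yield any of the
    three other bases.
    """
    err = 10 ** (-q / 10)
    return 1 - err if observed == true_allele else err / 3


def call_heteroplasmy(bases: str, quals: List[int], min_freq: float = 0.005,
                      min_llr: float = 10.0, max_iter: int = 100) -> Dict[str, object]:
    """Hypothetical minor-allele call at one position from pileup bases and qualities.

    min_freq=0.005 mirrors the 0.5% threshold in the abstract; min_llr is an
    assumed cutoff in natural-log units. Qualities must be >= 1.
    """
    if not bases or len(bases) != len(quals) or min(quals) < 1:
        raise ValueError("need at least one base, one quality per base, qualities >= 1")
    ranked = Counter(bases).most_common()
    major = ranked[0][0]
    if len(ranked) < 2:
        return {"major": major, "minor": None, "freq": 0.0, "llr": 0.0, "called": False}
    minor = ranked[1][0]  # most frequent non-major base
    p_major = [base_prob(b, major, q) for b, q in zip(bases, quals)]
    p_minor = [base_prob(b, minor, q) for b, q in zip(bases, quals)]

    # EM for the minor-allele fraction f, starting from the raw read fraction.
    f = ranked[1][1] / len(bases)
    for _ in range(max_iter):
        new_f = sum(f * pm / ((1 - f) * pj + f * pm)
                    for pj, pm in zip(p_major, p_minor)) / len(bases)
        converged = abs(new_f - f) < 1e-12
        f = new_f
        if converged:
            break

    # Log-likelihood ratio: two-allele mixture at f versus the major allele alone.
    log_l1 = sum(math.log((1 - f) * pj + f * pm) for pj, pm in zip(p_major, p_minor))
    log_l0 = sum(math.log(pj) for pj in p_major)
    llr = log_l1 - log_l0
    return {"major": major, "minor": minor, "freq": f, "llr": llr,
            "called": f >= min_freq and llr >= min_llr}
```

With 100 G reads among 10,000 Q30 reads, the estimated minor-allele fraction comes out just under 1% and the call passes both cutoffs; 3 G reads at the same depth stay far below the 0.5% threshold. The mixture model, rather than the raw G fraction, is what discounts the G reads expected from sequencing error alone.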
4.
BMC Genomics ; 17: 139, 2016 Feb 27.
Article in English | MEDLINE | ID: mdl-26920804

ABSTRACT

BACKGROUND: Minor allele detection in very high coverage sequence data (>1000X) has many applications, such as detecting mtDNA heteroplasmy, somatic mutations in cancer or tumors, and SNP calling in pool sequencing, where reads with low frequency are not necessarily sequencing errors but may instead convey biological information. However, the suitability of common base quality recalibration tools for such applications has not been investigated in detail.

RESULTS: We show that the widely used tool GATK BaseRecalibration has several limitations in minor allele detection. First, GATK IndelRealignment fails to work if the sequence coverage is above a certain level, since it then becomes computationally infeasible. Second, the accuracy of the base quality largely depends on the database of known SNPs used as the control, which limits de novo minor allele detection. Third, GATK reduces the base quality of sequence errors at the cost of reducing scores for true minor alleles. To overcome these limitations, we present a novel approach called SEGREG, which applies segmented regression to control sequences (e.g. phiX174 DNA) spiked into a sequencing run. Based on simulations, SEGREG improves both the accuracy of base quality scores and the detection of minor alleles. We further investigate sequence error and recalibration parameters by applying a Logarithm Likelihood Ratio (LLR) approach to SEGREG-recalibrated base quality scores for phiX174 DNA sequenced to very high coverage, and for mtDNA genome sequences previously analyzed for heteroplasmic variants.

CONCLUSIONS: Our results suggest that SEGREG improves base recalibration without suffering the limitations discussed above, and that the LLR approach benefits from SEGREG in identifying more true minor alleles while avoiding false positives from sequencing error.


Subject(s)
Alleles; Gene Frequency; High-Throughput Nucleotide Sequencing/methods; Sequence Analysis, DNA/methods; Calibration; Computer Simulation; DNA, Mitochondrial/genetics; Genome, Mitochondrial; Humans; Likelihood Functions; Polymorphism, Single Nucleotide; Viral Proteins/genetics
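The abstract above describes SEGREG as segmented regression applied to spiked-in control reads (e.g. phiX174) but gives no model details. As an illustration only, the Python sketch below assumes a two-segment least-squares fit of empirical quality against reported quality, with the breakpoint chosen by exhaustive search. The pseudocount in `empirical_phred`, the `min_points` parameter, and all function names are assumptions, not SEGREG's actual implementation.

```python
import math
from typing import Sequence, Tuple


def empirical_phred(errors: int, total: int) -> float:
    """Empirical Phred quality from mismatches against a known control sequence.

    The +1/+2 pseudocount is an assumption that keeps zero-error bins finite.
    """
    return -10 * math.log10((errors + 1) / (total + 2))


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least squares y = a + b*x; returns (a, b, sum of squared errors)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = my - slope * mx
    sse = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, sse


def segmented_fit(xs: Sequence[float], ys: Sequence[float], min_points: int = 3):
    """Two-segment regression with an exhaustively searched breakpoint.

    Every split of the x-sorted points into a left and a right part (each with
    at least min_points points) gets its own OLS line; the split with the
    smallest total SSE wins. Returns (first x of the right segment,
    (a, b) of the left line, (a, b) of the right line, total SSE).
    """
    pts = sorted(zip(xs, ys))
    if len(pts) < 2 * min_points:
        raise ValueError("need at least 2 * min_points points")
    best = None
    for k in range(min_points, len(pts) - min_points + 1):
        lx, ly = zip(*pts[:k])
        rx, ry = zip(*pts[k:])
        left, right = fit_line(lx, ly), fit_line(rx, ry)
        total = left[2] + right[2]
        if best is None or total < best[0]:
            best = (total, pts[k][0], left[:2], right[:2])
    total, split_x, left_ab, right_ab = best
    return split_x, left_ab, right_ab, total
```

In this sketch the x values would be reported quality scores and the y values the empirical qualities computed from mismatches against the known control sequence; the two fitted lines then map reported scores to recalibrated ones on either side of the breakpoint.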
5.
Hum Genet ; 135(5): 541-553, 2016 May.
Article in English | MEDLINE | ID: mdl-27043341

ABSTRACT

The recent availability of large-scale sequence data for the human Y chromosome has revolutionized analyses of and insights gained from this non-recombining, paternally inherited chromosome. However, the studies to date focus on Eurasian variation, and hence the diversity of early-diverging branches found in Africa has not been adequately documented. Here, we analyze over 900 kb of Y chromosome sequence obtained from 547 individuals from southern African Khoisan- and Bantu-speaking populations, identifying 232 new sequences from basal haplogroups A and B. We identify new clades in the phylogeny, an older age for the root, and substantially older ages for some individual haplogroups. Furthermore, while haplogroup B2a is traditionally associated with the spread of Bantu speakers, we find that it probably also existed in Khoisan groups before the arrival of Bantu speakers. Finally, there is pronounced variation in branch length between major haplogroups; in particular, haplogroups associated with Bantu speakers have significantly longer branches. Technical artifacts cannot explain this branch length variation, which instead likely reflects aspects of the demographic history of Bantu speakers, such as recent population expansion and an older average paternal age. The influence of demographic factors on branch length variation has broader implications both for the human Y phylogeny and for similar analyses of other species.


Subject(s)
Black People/genetics; Chromosomes, Human, Y/genetics; Genetic Variation/genetics; Genetics, Population; Haplotypes/genetics; Africa; Humans; Phylogeny
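The abstract above reports an older age for the root and for some haplogroups but does not name the dating method. For illustration only, a widely used estimator for Y-chromosome sequence data is the ρ statistic; it is assumed here, not confirmed as the authors' method:

```latex
\hat{\rho} = \frac{1}{n}\sum_{i=1}^{n} d_i,
\qquad
\hat{T}_{\mathrm{MRCA}} = \frac{\hat{\rho}}{\mu L}
```

Here d_i is the number of mutations between the node and sampled tip i, n is the number of tips below the node, μ is an assumed mutation rate per base pair per year, and L is the number of base pairs scored (over 900 kb in this study). Because the estimate averages d_i over tips, systematically longer branches in some lineages, such as those the abstract reports for haplogroups associated with Bantu speakers, feed directly into the age estimate.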
6.
Nat Commun ; 14(1): 2734, 2023 May 12.
Article in English | MEDLINE | ID: mdl-37173341

ABSTRACT

Formalin-fixed paraffin-embedded (FFPE) tissues constitute a vast and valuable bank of patient material with clinical history and follow-up data. It is still challenging to obtain single-cell/single-nucleus RNA (sc/snRNA) profiles from FFPE tissues. Here, we develop a droplet-based snRNA sequencing technology (snRandom-seq) for FFPE tissues that captures full-length total RNAs with random primers. Compared with state-of-the-art high-throughput scRNA-seq technologies, snRandom-seq shows a low doublet rate (0.3%) and much higher RNA coverage, and it detects more non-coding RNAs and nascent RNAs. snRandom-seq detects a median of >3000 genes per nucleus and identifies 25 typical cell types. Moreover, we apply snRandom-seq to a clinical FFPE human liver cancer specimen and reveal a subpopulation of nuclei with high proliferative activity. Our method provides a powerful snRNA-seq platform for clinical FFPE specimens and promises wide applications in biomedical research.


Subject(s)
Formaldehyde; Gene Expression Profiling; Humans; Gene Expression Profiling/methods; Paraffin Embedding/methods; Tissue Fixation/methods; Sequence Analysis, RNA/methods; RNA/genetics; High-Throughput Nucleotide Sequencing/methods; RNA, Small Nuclear
7.
BMC Bioinformatics ; 12: 347, 2011 Aug 18.
Article in English | MEDLINE | ID: mdl-21851598

ABSTRACT

BACKGROUND: Comparing biological time series data across different conditions, or different specimens, is a common but still challenging task. Algorithms aligning two time series represent a valuable tool for such comparisons. While many powerful computation tools for time series alignment have been developed, they do not provide significance estimates for time shift measurements.

RESULTS: Here, we present an extended version of the original dynamic time warping (DTW) algorithm that allows us to determine the significance of time shift estimates in time series alignments: the DTW-Significance (DTW-S) algorithm. DTW-S combines important properties of the original algorithm and other published time series alignment tools: it calculates the optimal alignment for each time point of each gene, it uses interpolated time points for time shift estimation, and it does not require alignment of the time-series end points. As a new feature, we implement a simulation procedure based on parameters estimated from real time series data, on a series-by-series basis, allowing us to determine the false positive rate (FPR) and the significance of the estimated time shift values. We assess the performance of our method using simulation data and real expression time series from two published primate brain expression datasets. Our results show that this method can provide accurate and robust time shift estimates for each time point on a gene-by-gene basis. Using these estimates, we are able to uncover novel features of the biological processes underlying human brain development and maturation.

CONCLUSIONS: DTW-S provides a convenient tool for calculating accurate and robust time shift estimates at each time point for each gene, based on time series data. The estimates can be used to uncover novel biological features of the system being studied. DTW-S is freely available as an R package TimeShift at http://www.picb.ac.cn/Comparative/data.html.


Subject(s)
Algorithms; Computer Simulation; Gene Expression Profiling; Prefrontal Cortex/growth & development; Prefrontal Cortex/metabolism; Adolescent; Adult; Aged; Aged, 80 and over; Animals; Cerebellum/growth & development; Cerebellum/metabolism; Child; Child, Preschool; Gene Expression Regulation, Developmental; Humans; Infant; Macaca mulatta/genetics; Macaca mulatta/metabolism; Middle Aged; Pan troglodytes/embryology; Pan troglodytes/metabolism; Primates; Time; Young Adult
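The abstract above lists what DTW-S adds (open end points, interpolated time points, simulation-based significance) but not how the underlying alignment is computed. The Python sketch below shows only the classic dynamic time warping step and a per-time-point shift read off the optimal path. It aligns the end points, uses no interpolation, and computes no significance, so it is a simplified stand-in rather than the TimeShift package; the function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def dtw_path(a: Sequence[float], b: Sequence[float]) -> Tuple[float, List[Tuple[int, int]]]:
    """Classic DTW with |a_i - b_j| as the local cost.

    Returns the total alignment cost and the optimal warping path as
    (index in a, index in b) pairs, from the first points to the last.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    # Backtrack from the end points; on equal cost the diagonal step wins.
    path: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m], path


def time_shifts(path: List[Tuple[int, int]], t_a: Sequence[float],
                t_b: Sequence[float]) -> List[float]:
    """Shift at each point of series a: mean of t_b[j] - t_a[i] over its matched j."""
    matched: Dict[int, List[float]] = {}
    for i, j in path:
        matched.setdefault(i, []).append(t_b[j] - t_a[i])
    return [sum(v) / len(v) for _, v in sorted(matched.items())]
```

For a profile `b` that is `a` delayed by one sampling step, the alignment cost is zero and the interior shifts all come out as +1, meaning each feature of `a` appears one time unit later in `b`. The first and last points show smaller shifts only because this plain DTW forces the end points to align, which is one of the restrictions DTW-S removes.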
8.
J Comput Biol ; 19(6): 766-75, 2012 Jun.
Article in English | MEDLINE | ID: mdl-22697246

ABSTRACT

Bioinformatics analyses frequently yield results in the form of lists of genes sorted by, for example, sequence similarity to a query sequence or degree of differential expression of a gene upon a change of cellular condition. Comparison of such results may depend strongly on the particular scoring system throughout the entire list, although the crucial information resides in which genes are ranked at the top of the list. Here, we propose to reduce the lists to the mere ranking of the genes and to compare only the ranked lists. To this end, we introduce a measure of similarity between ranked lists. Our measure puts particular emphasis on finding the same items near the top of the list, while the genes further down should not have a strong influence. Our approach can be understood as a special version of a two-dimensional Kolmogorov-Smirnov statistic. We present a dynamic programming algorithm for its computation and study the distribution of the similarity values. The performance on simulated and on real biological data is studied in comparison to other available measures. Supplementary Material is available online (www.liebertonline.com/cmb).


Subject(s)
Algorithms; Computational Biology/statistics & numerical data; Gene Expression; Software; Computational Biology/methods; Gene Expression Profiling; Humans; Oligonucleotide Array Sequence Analysis; Statistics, Nonparametric
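The abstract above does not give the formula of its ranked-list similarity measure, only that it resembles a two-dimensional Kolmogorov-Smirnov statistic, is computed by dynamic programming, and emphasizes the top of the lists. The Python sketch below is a hypothetical, simplified statistic in that spirit, not the authors' measure: a 2D prefix-sum table counts the genes shared by every pair of list prefixes, and the score is the largest excess of observed over expected overlap (i*j/N under random ordering). The optional `depth` cutoff, which restricts attention to the top of both lists, and the function name are assumptions.

```python
from typing import Optional, Sequence


def top_overlap_statistic(list_a: Sequence[str], list_b: Sequence[str],
                          depth: Optional[int] = None) -> float:
    """Hypothetical 2D KS-style score for two rankings of the same N genes.

    overlap[i][j] = number of genes in both the top-i of list_a and the top-j
    of list_b, filled by the prefix-sum recurrence
    overlap[i][j] = overlap[i-1][j] + overlap[i][j-1] - overlap[i-1][j-1] + [a_i == b_j].
    Returns the maximum of overlap[i][j] - i*j/N over 1 <= i, j <= depth
    (floored at 0, so unrelated or opposed rankings score 0).
    """
    if len(set(list_a)) != len(list_a) or sorted(list_a) != sorted(list_b):
        raise ValueError("lists must be permutations of the same distinct genes")
    n = len(list_a)
    k = n if depth is None else min(depth, n)
    overlap = [[0] * (k + 1) for _ in range(k + 1)]
    best = 0.0
    for i in range(1, k + 1):
        for j in range(1, k + 1):
            hit = 1 if list_a[i - 1] == list_b[j - 1] else 0
            overlap[i][j] = (overlap[i - 1][j] + overlap[i][j - 1]
                             - overlap[i - 1][j - 1] + hit)
            best = max(best, overlap[i][j] - i * j / n)
    return best
```

For ten genes compared with themselves the score is 2.5, reached at top-5 versus top-5, while a list compared with its own reverse scores 0. The table costs O(depth²) time and memory; the null distribution of the score, which the paper studies, is not modeled here.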