Search | BVS Colombia Search Portal

Reconstructing the history of founder events using genome-wide patterns of allele sharing across individuals.

Tournebize, Rémi; Chu, Gillian; Moorjani, Priya.

PLoS Genet ; 18(6): e1010243, 2022 06.

Article in English | MEDLINE | ID: mdl-35737729

ABSTRACT

Founder events play a critical role in shaping genetic diversity, fitness and disease risk in a population. Yet our understanding of the prevalence and distribution of founder events in humans and other species remains incomplete, as most existing methods require large sample sizes or phased genomes. Thus, we developed ASCEND, which measures the correlation in allele sharing between pairs of individuals across the genome to infer the age and strength of founder events. We show that ASCEND can reliably estimate the parameters of founder events under a range of demographic scenarios. We then apply ASCEND to two species with contrasting evolutionary histories: ~460 worldwide human populations and ~40 modern dog breeds. In humans, we find that over half of the analyzed populations have evidence for recent founder events, associated with geographic isolation, modes of sustenance, or cultural practices such as endogamy. Notably, island populations have lower population sizes than continental groups, and most hunter-gatherer, nomadic and indigenous groups have evidence of recent founder events. Many present-day groups, including Native Americans, Oceanians and South Asians, have experienced more extreme founder events than Ashkenazi Jews, who have high rates of recessive diseases due to their known history of founder events. Using ancient genomes, we show that the strength of founder events differs markedly across geographic regions and time, with three major founder events related to the peopling of the Americas and a trend of decreasing strength of founder events in Europe following the Neolithic transition and steppe migrations. In dogs, we estimate extreme founder events in most breeds that occurred in the last 25 generations, concordant with the establishment of many dog breeds during Victorian times. Our analysis highlights a widespread history of founder events in humans and dogs and elucidates some of the demographic and cultural practices related to these events.

Subject(s)

Asian People , Genetics, Population , Alleles , Animals , Dogs , Ethnicity , Founder Effect , Humans , Population Density

UPP2: fast and accurate alignment of datasets with fragmentary sequences.

Park, Minhyuk; Ivanovic, Stefan; Chu, Gillian; Shen, Chengze; Warnow, Tandy.

Bioinformatics ; 39(1), 2023 01 01.

Article in English | MEDLINE | ID: mdl-36625535

ABSTRACT

MOTIVATION: Multiple sequence alignment (MSA) is a basic step in many bioinformatics pipelines. However, achieving highly accurate alignments on large datasets, especially those with sequence length heterogeneity, is a challenging task. Ultra-large multiple sequence alignment using Phylogeny-aware Profiles (UPP) is a method for MSA estimation that builds an ensemble of Hidden Markov Models (eHMM) to represent an estimated alignment on the full-length sequences in the input, and then adds the remaining sequences into the alignment using selected HMMs in the ensemble. Although UPP provides good accuracy, it is computationally intensive on large datasets. RESULTS: We present UPP2, a direct improvement on UPP. The main advance is a fast technique for selecting HMMs in the ensemble that allows us to achieve the same accuracy as UPP but with greatly reduced runtime. We show that UPP2 produces more accurate alignments compared to leading MSA methods on datasets exhibiting substantial sequence length heterogeneity and is among the most accurate otherwise. AVAILABILITY AND IMPLEMENTATION: https://github.com/gillichu/sepp. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Algorithms , Software , Sequence Alignment , Phylogeny

CNAViz: An interactive webtool for user-guided segmentation of tumor DNA sequencing data.

Lalani, Zubair; Chu, Gillian; Hsu, Silas; Kagawa, Shaw; Xiang, Michael; Zaccaria, Simone; El-Kebir, Mohammed.

PLoS Comput Biol ; 18(10): e1010614, 2022 10.

Article in English | MEDLINE | ID: mdl-36228003

ABSTRACT

Copy-number aberrations (CNAs) are genetic alterations that amplify or delete the number of copies of large genomic segments. Although they are ubiquitous in cancer and, thus, a critical area of current cancer research, CNA identification from DNA sequencing data is challenging because it requires partitioning of the genome into complex segments with the same copy-number states that may not be contiguous. Existing segmentation algorithms address these challenges either by leveraging the local information among neighboring genomic regions, or by globally grouping genomic regions that are affected by similar CNAs across the entire genome. However, both approaches have limitations: overclustering in the case of local segmentation, or the omission of clusters corresponding to focal CNAs in the case of global segmentation. Importantly, inaccurate segmentation will lead to inaccurate identification of CNAs. For this reason, most pan-cancer research studies rely on manual procedures of quality control and anomaly correction. To improve copy-number segmentation, we introduce CNAViz, a web-based tool that enables the user to simultaneously perform local and global segmentation, thus overcoming the limitations of each approach. Using simulated data, we demonstrate that by several metrics, CNAViz allows the user to obtain more accurate segmentation relative to existing local and global segmentation methods. Moreover, we analyze six bulk DNA sequencing samples from three breast cancer patients. By validating with parallel single-cell DNA sequencing data from the same samples, we show that by using CNAViz, our user was able to obtain more accurate segmentation and improved accuracy in downstream copy-number calling.

Subject(s)

Breast Neoplasms , Neoplasms , Humans , Female , DNA Copy Number Variations/genetics , Neoplasms/genetics , Algorithms , Sequence Analysis, DNA , DNA, Neoplasm , Breast Neoplasms/genetics

Evaluation of a Media Training Workshop for Nutrition Students and Trainees in Nova Scotia.

Harvey, Antonia; Chu, Gillian; Lordly, Daphne; Arsenault, Judy Fraser; Conlan, Sue; Laidlaw, Tess; Wadsworth, Laurie A; Grant, Shannan.

Can J Diet Pract Res ; 84(2): 112-118, 2023 06 01.

Article in English | MEDLINE | ID: mdl-36862844

ABSTRACT

Gaps in communication training have been identified in Canadian and international academic and practicum dietetics programs. A workshop was developed to pilot supplementary media training to nutrition students/trainees studying in Nova Scotia. Students, interns, and faculty from two universities participated in the workshop. Data on perceived learning, media knowledge/skill use, and workshop feedback were collected immediately post-workshop using a mixed-form questionnaire. A modified questionnaire was administered eight months post-workshop to obtain information on utility of the perceived acquired knowledge/skills. Closed-ended responses underwent descriptive analysis, while open-ended responses underwent thematic analysis. Twenty-eight participants completed the questionnaire post-workshop, and six completed it at follow-up. All participants rated the workshop positively (7-point Likert scale) and reported learning something new (perceived). Perceived learning emphasized general media knowledge/skills and communication skills. Follow-up data suggested participants had applied perceived media knowledge/skills in message development and media and job interviews. These data suggest that nutrition students/trainees may benefit from supplementary communications and media training and provide a stimulus for ongoing curriculum review and discussion.

Subject(s)

Curriculum , Students , Humans , Nova Scotia , Learning , Surveys and Questionnaires

Maximum Likelihood Inference of Time-scaled Cell Lineage Trees with Mixed-type Missing Data.

Mai, Uyen; Chu, Gillian; Raphael, Benjamin J.

bioRxiv ; 2024 Mar 23.

Article in English | MEDLINE | ID: mdl-38496496

ABSTRACT

Recent dynamic lineage tracing technologies combine CRISPR-based genome editing with single-cell sequencing to track cell divisions during development. A key computational problem in dynamic lineage tracing is to infer a cell lineage tree from the measured CRISPR-induced mutations. Three features of dynamic lineage tracing data distinguish this problem from standard phylogenetic tree inference. First, the CRISPR-editing process modifies a genomic location exactly once. This non-modifiable property is not well described by the time-reversible models commonly used in phylogenetics. Second, as a consequence of non-modifiability, the number of mutations per time unit decreases over time. Third, CRISPR-based genome editing and single-cell sequencing result in high rates of both heritable and non-heritable (dropout) missing data. To model these features, we introduce the Probabilistic Mixed-type Missing (PMM) model. We describe an algorithm, LAML (Lineage Analysis via Maximum Likelihood), to search for the maximum likelihood (ML) tree under the PMM model. LAML combines an Expectation Maximization (EM) algorithm with a heuristic tree search to jointly estimate tree topology, branch lengths and missing data parameters. We derive a closed-form solution for the M-step in the case of no heritable missing data, and a block coordinate ascent approach in the general case which is more efficient than the standard General Time Reversible (GTR) phylogenetic model. On simulated data, LAML infers more accurate tree topologies and branch lengths than existing methods, with greater advantages on datasets with higher ratios of heritable to non-heritable missing data. We show that LAML provides unbiased time-scaled estimates of branch lengths. In contrast, we demonstrate that maximum parsimony methods for lineage tracing data not only underestimate branch lengths, but also yield branch lengths which are not proportional to time, due to the nonlinear decay in the number of mutations on branches further from the root. On lineage tracing data from a mouse model of lung adenocarcinoma, we show that LAML infers phylogenetic distances that are more concordant with gene expression data compared to distances derived from maximum parsimony. The LAML tree topology is more plausible than existing published trees, with fewer total cell migrations between distant metastases and fewer reseeding events where cells migrate back to the primary tumor. Crucially, we identify three distinct time epochs of metastasis progression, which include a burst of metastasis events to various anatomical sites during a single month.
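
The decay in mutation counts mentioned in the abstract above can be illustrated with a toy calculation. This is only a sketch under a simplifying assumption that is not taken from the paper: each target site is edited at a constant rate while it is still unedited and is never modified again afterwards, so the unedited fraction at time t is exp(-rate * t). The function name and the parameter values (30 sites, rate 0.5) are hypothetical and are not part of LAML or the PMM model.

```python
import math


def expected_new_edits(num_sites, rate, start, end):
    """Expected number of sites first edited in the interval [start, end].

    Assumption (illustrative only): every site is edited at constant
    rate `rate` while unedited and cannot change once edited, so the
    probability of still being unedited at time t is exp(-rate * t).
    """
    return num_sites * (math.exp(-rate * start) - math.exp(-rate * end))


# Two branches of equal duration (one time unit) at different depths.
early = expected_new_edits(num_sites=30, rate=0.5, start=0.0, end=1.0)
late = expected_new_edits(num_sites=30, rate=0.5, start=3.0, end=4.0)

print(f"expected new edits near the root:      {early:.2f}")
print(f"expected new edits far from the root:  {late:.2f}")
```

Under this assumption the two branches have the same duration but different expected mutation counts, which is why a raw count of mutations on a branch would not be proportional to elapsed time.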

SCAMPP+FastTree: improving scalability for likelihood-based phylogenetic placement.

Chu, Gillian; Warnow, Tandy.

Bioinform Adv ; 3(1): vbad008, 2023.

Article in English | MEDLINE | ID: mdl-36818728

ABSTRACT

Summary: Phylogenetic placement is the problem of placing 'query' sequences into an existing tree (called a 'backbone tree'). One of the most accurate phylogenetic placement methods to date is the maximum likelihood-based method pplacer, using RAxML to estimate numeric parameters on the backbone tree and then adding the given query sequence to the edge that maximizes the probability that the resulting tree generates the query sequence. Unfortunately, this way of running pplacer fails to return valid outputs on many moderately large backbone trees and so is limited to backbone trees with at most ~10 000 leaves. SCAMPP is a technique to enable pplacer to run on larger backbone trees, which operates by finding a small 'placement subtree' specific to each query sequence, within which the query sequence is placed using pplacer. That approach matched the scalability and accuracy of APPLES-2, the previous most scalable method. Here, we explore a different aspect of pplacer's strategy: the technique used to estimate numeric parameters on the backbone tree. We confirm anecdotal evidence that using FastTree instead of RAxML to estimate numeric parameters on the backbone tree enables pplacer to scale to much larger backbone trees, almost (but not quite) matching the scalability of APPLES-2 and pplacer-SCAMPP. We then evaluate the combination of these two techniques: SCAMPP and the use of FastTree. We show that this combined approach, pplacer-SCAMPP-FastTree, has the same scalability as APPLES-2, improves on the scalability of pplacer-FastTree and achieves better accuracy than the comparably scalable methods. Availability and implementation: https://github.com/gillichu/PLUSplacer-taxtastic. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
