Search | BVS Enfermagem Research Portal

1.

Improving quartet graph construction for scalable and accurate species tree estimation from gene trees.

Han, Yunheng; Molloy, Erin K.

Genome Res ; 33(7): 1042-1052, 2023 07.

Article in English | MEDLINE | ID: mdl-37197990

ABSTRACT

Summary methods are widely used to estimate species trees from genome-scale data. However, they can fail to produce accurate species trees when the input gene trees are highly discordant because of estimation error and biological processes, such as incomplete lineage sorting. Here, we introduce TREE-QMC, a new summary method that offers accuracy and scalability under these challenging scenarios. TREE-QMC builds upon weighted Quartet Max Cut, which takes weighted quartets as input and then constructs a species tree in a divide-and-conquer fashion, at each step forming a graph and seeking its max cut. The wQMC method has been successfully leveraged in the context of species tree estimation by weighting quartets by their frequencies in the gene trees; we improve upon this approach in two ways. First, we address accuracy by normalizing the quartet weights to account for "artificial taxa" introduced during the divide phase so subproblem solutions can be combined during the conquer phase. Second, we address scalability by introducing an algorithm to construct the graph directly from the gene trees; this gives TREE-QMC a time complexity of [Formula: see text], where n is the number of species and k is the number of gene trees, assuming the subproblem decomposition is perfectly balanced. These contributions enable TREE-QMC to be highly competitive in terms of species tree accuracy and empirical runtime with the leading quartet-based methods, even outperforming them on some model conditions explored in our simulation study. We also present the application of these methods to an avian phylogenomics data set.

Subjects

Algorithms, Genome, Phylogeny, Computer Simulation, Genetic Models
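
The abstract of entry 1 describes weighting quartets by their frequencies in the gene trees. A minimal sketch of that weighting step follows. It is not the TREE-QMC algorithm: it is a brute-force count over all four-taxon subsets, which is the kind of explicit enumeration the abstract says TREE-QMC avoids by building its graph directly from the gene trees. The representation of a gene tree as a list of bipartition sides, the function names, and the toy trees are all assumptions made for illustration, and every gene tree is assumed to contain the full taxon set.

```python
from collections import Counter
from itertools import combinations


def quartet_topology(splits, four):
    """Return the topology on `four` taxa displayed by a tree, or None if unresolved.

    `splits` holds one side of each internal bipartition of the tree
    (an assumed, simplified tree representation).
    """
    four = set(four)
    for side in splits:
        inside = four & side
        if len(inside) == 2:
            # The split separates two of the four taxa from the other two.
            return frozenset({frozenset(inside), frozenset(four - inside)})
    return None


def weighted_quartets(gene_trees, taxa):
    """Weight each quartet topology by the number of gene trees displaying it."""
    weights = Counter()
    for four in combinations(sorted(taxa), 4):
        for splits in gene_trees:
            topology = quartet_topology(splits, four)
            if topology is not None:
                weights[topology] += 1
    return weights


# Hypothetical toy input: three gene trees on taxa A-E.
gene_trees = [
    [{"A", "B"}, {"D", "E"}],  # ((A,B),C,(D,E))
    [{"A", "C"}, {"D", "E"}],  # ((A,C),B,(D,E))
    [{"A", "B"}, {"D", "E"}],  # ((A,B),C,(D,E))
]
weights = weighted_quartets(gene_trees, "ABCDE")
ab_cd = frozenset({frozenset("AB"), frozenset("CD")})
ac_bd = frozenset({frozenset("AC"), frozenset("BD")})
print(weights[ab_cd], weights[ac_bd])
```

In this toy input, the topology AB|CD is displayed by two gene trees and AC|BD by one, so a quartet-based summary method would favor grouping A with B.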

2.

Single-cell methylation sequencing data reveal succinct metastatic migration histories and tumor progression models.

Liu, Yuelin; Li, Xuan Cindy; Rashidi Mehrabadi, Farid; Schäffer, Alejandro A; Pratt, Drew; Crawford, David R; Malikic, Salem; Molloy, Erin K; Gopalan, Vishaka; Mount, Stephen M; Ruppin, Eytan; Aldape, Kenneth D; Sahinalp, S Cenk.

Genome Res ; 33(7): 1089-1100, 2023 07.

Article in English | MEDLINE | ID: mdl-37316351

ABSTRACT

Recent studies exploring the impact of methylation in tumor evolution suggest that although the methylation status of many of the CpG sites is preserved across distinct lineages, others are altered as the cancer progresses. Because changes in methylation status of a CpG site may be retained in mitosis, they could be used to infer the progression history of a tumor via single-cell lineage tree reconstruction. In this work, we introduce the first principled distance-based computational method, Sgootr, for inferring a tumor's single-cell methylation lineage tree and for jointly identifying lineage-informative CpG sites that harbor changes in methylation status that are retained along the lineage. We apply Sgootr on single-cell bisulfite-treated whole-genome sequencing data of multiregionally sampled tumor cells from nine metastatic colorectal cancer patients, as well as multiregionally sampled single-cell reduced-representation bisulfite sequencing data from a glioblastoma patient. We show that the tumor lineages constructed reveal a simple model underlying tumor progression and metastatic seeding. A comparison of Sgootr against alternative approaches shows that Sgootr can construct lineage trees with fewer migration events and in greater concordance with the sequential-progression model of tumor evolution, with a running time a fraction of that used in prior studies. Lineage-informative CpG sites identified by Sgootr are in inter-CpG island (CGI) regions, as opposed to intra-CGIs, which have been the main regions of interest in genomic methylation-related analyses.

Subjects

DNA Methylation, Neoplasms, Humans, DNA Methylation/genetics, Sulfites, DNA Sequence Analysis/methods, Genome, Neoplasms/genetics, CpG Islands/genetics

3.

Inferring population structure in biobank-scale genomic data.

Chiu, Alec M; Molloy, Erin K; Tan, Zilong; Talwalkar, Ameet; Sankararaman, Sriram.

Am J Hum Genet ; 109(4): 727-737, 2022 04 07.

Article in English | MEDLINE | ID: mdl-35298920

ABSTRACT

Inferring the structure of human populations from genetic variation data is a key task in population and medical genomic studies. Although a number of methods for population structure inference have been proposed, current methods are impractical to run on biobank-scale genomic datasets containing millions of individuals and genetic variants. We introduce SCOPE, a method for population structure inference that is orders of magnitude faster than existing methods while achieving comparable accuracy. SCOPE infers population structure in about a day on a dataset containing one million individuals and variants as well as on the UK Biobank dataset containing 488,363 individuals and 569,346 variants. Furthermore, SCOPE can leverage allele frequencies from previous studies to improve the interpretability of population structure estimates.

Subjects

Biological Specimen Banks, Population Genetics, Gene Frequency/genetics, Genomics, Humans

4.

Genetic and behavioral differences between above and below ground Culex pipiens bioforms.

Bell, Katherine L; Noreuil, Anna; Molloy, Erin K; Fritz, Megan L.

Heredity (Edinb) ; 132(5): 221-231, 2024 May.

Article in English | MEDLINE | ID: mdl-38424351

ABSTRACT

Efficiency of mosquito-borne disease transmission is dependent upon both the preference and fidelity of mosquitoes as they seek the blood of vertebrate hosts. While mosquitoes select their blood hosts through multi-modal integration of sensory cues, host-seeking is primarily an odor-guided behavior. Differences in mosquito responses to hosts and their odors have been demonstrated to have a genetic component, but the underlying genomic architecture of these responses has yet to be fully resolved. Here, we provide the first characterization of the genomic architecture of host preference in the polymorphic mosquito species, Culex pipiens. The species exists as two morphologically identical bioforms, each with distinct avian and mammalian host preferences. Cx. pipiens females with empirically measured host responses were prepared into reduced representation DNA libraries and sequenced to identify genomic regions associated with host preference. Multiple genomic regions associated with host preference were identified on all 3 Culex chromosomes, and these genomic regions contained clusters of chemosensory genes, as expected based on work in Anopheles gambiae complex mosquitoes and in Aedes aegypti. One odorant receptor and one odorant binding protein gene showed one-to-one orthologous relationships to differentially expressed genes in A. gambiae complex members with divergent host preferences. Overall, our work identifies a distinct set of odorant receptors and odorant binding proteins that may enable Cx. pipiens females to distinguish between their vertebrate blood host species, and opens avenues for future functional studies that could measure the unique contributions of each gene to host preference phenotypes.

Subjects

Culex, Odorant Receptors, Animals, Culex/genetics, Culex/physiology, Female, Odorant Receptors/genetics, Feeding Behavior, Animal Behavior

5.

Assessment of plasmids for relating the 2020 Salmonella enterica serovar Newport onion outbreak to farms implicated by the outbreak investigation.

Commichaux, Seth; Rand, Hugh; Javkar, Kiran; Molloy, Erin K; Pettengill, James B; Pightling, Arthur; Hoffmann, Maria; Pop, Mihai; Jayeola, Victor; Foley, Steven; Luo, Yan.

BMC Genomics ; 24(1): 165, 2023 Apr 04.

Article in English | MEDLINE | ID: mdl-37016310

ABSTRACT

BACKGROUND: The Salmonella enterica serovar Newport red onion outbreak of 2020 was the largest foodborne outbreak of Salmonella in over a decade. The epidemiological investigation suggested two farms as the likely source of contamination. However, single nucleotide polymorphism (SNP) analysis of the whole genome sequencing data showed that none of the Salmonella isolates collected from the farm regions were linked to the clinical isolates, preventing the use of phylogenetics in source identification. Here, we explored an alternative method for analyzing the whole genome sequencing data driven by the hypothesis that if the outbreak strain had come from the farm regions, then the clinical isolates would disproportionately contain plasmids found in isolates from the farm regions due to horizontal transfer. RESULTS: SNP analysis confirmed that the clinical isolates formed a single, nearly-clonal clade with evidence for ancestry in California going back a decade. The clinical clade had a large core genome (4,399 genes) and a large and sparsely distributed accessory genome (2,577 genes, at least 64% on plasmids). At least 20 plasmid types occurred in the clinical clade, more than were found in the literature for Salmonella Newport. A small number of plasmids, 14 from 13 clinical isolates and 17 from 8 farm isolates, were found to be highly similar (> 95% identical), indicating they might be related by horizontal transfer. Phylogenetic analysis was unable to determine the geographic origin, isolation source, or time of transfer of the plasmids, likely due to their promiscuous and transient nature. However, our resampling analysis suggested that observing a similar number and combination of highly similar plasmids in random samples of environmental Salmonella enterica within the NCBI Pathogen Detection database was unlikely, supporting a connection between the outbreak strain and the farms implicated by the epidemiological investigation.
CONCLUSION: Horizontally transferred plasmids provided evidence for a connection between clinical isolates and the farms implicated as the source of the outbreak. Our case study suggests that such analyses might add a new dimension to source tracking investigations, but highlights the need for detailed and accurate metadata, more extensive environmental sampling, and a better understanding of plasmid molecular evolution.

Subjects

Salmonella enterica, Serogroup, Onions/genetics, Farms, Phylogeny, Plasmids/genetics, Disease Outbreaks

6.

Theoretical and Practical Considerations when using Retroelement Insertions to Estimate Species Trees in the Anomaly Zone.

Molloy, Erin K; Gatesy, John; Springer, Mark S.

Syst Biol ; 71(3): 721-740, 2022 04 19.

Article in English | MEDLINE | ID: mdl-34677617

ABSTRACT

A potential shortcoming of concatenation methods for species tree estimation is their failure to account for incomplete lineage sorting. Coalescent methods address this problem but make various assumptions that, if violated, can result in worse performance than concatenation. Given the challenges of analyzing DNA sequences with both concatenation and coalescent methods, retroelement insertions (RIs) have emerged as powerful phylogenomic markers for species tree estimation. Here, we show that two recently proposed quartet-based methods, SDPquartets and ASTRAL_BP, are statistically consistent estimators of the unrooted species tree topology under the coalescent when RIs follow a neutral infinite-sites model of mutation and the expected number of new RIs per generation is constant across the species tree. The accuracy of these (and other) methods for inferring species trees from RIs has yet to be assessed on simulated data sets, where the true species tree topology is known. Therefore, we evaluated eight methods given RIs simulated from four model species trees, all of which have short branches and at least three of which are in the anomaly zone. In our simulation study, ASTRAL_BP and SDPquartets always recovered the correct species tree topology when given a sufficiently large number of RIs, as predicted. A distance-based method (ASTRID_BP) and Dollo parsimony also performed well in recovering the species tree topology. In contrast, unordered, polymorphism, and Camin-Sokal parsimony (as well as an approach based on MDC) typically fail to recover the correct species tree topology in anomaly zone situations with more than four ingroup taxa. Of the methods studied, only ASTRAL_BP automatically estimates internal branch lengths (in coalescent units) and support values (i.e., local posterior probabilities). We examined the accuracy of branch length estimation, finding that estimated lengths were accurate for short branches but upwardly biased otherwise. 
This led us to derive the maximum likelihood (branch length) estimate for when RIs are given as input instead of binary gene trees; this corrected formula produced accurate estimates of branch lengths in our simulation study provided that a sufficiently large number of RIs were given as input. Lastly, we evaluated the impact of data quantity on species tree estimation by repeating the above experiments with input sizes varying from 100 to 100,000 parsimony-informative RIs. We found that, when given just 1000 parsimony-informative RIs as input, ASTRAL_BP successfully reconstructed major clades (i.e., clades separated by branches >0.3 coalescent units) with high support and identified rapid radiations (i.e., shorter connected branches), although not their precise branching order. The local posterior probability was effective for controlling false positive branches in these scenarios. [Coalescence; incomplete lineage sorting; Laurasiatheria; Palaeognathae; parsimony; polymorphism parsimony; retroelement insertions; species trees; transposon.].

Subjects

Palaeognathae, Retroelements, Animals, Computer Simulation, Genetic Models, Phylogeny, Retroelements/genetics

7.

Advancing admixture graph estimation via maximum likelihood network orientation.

Molloy, Erin K; Durvasula, Arun; Sankararaman, Sriram.

Bioinformatics ; 37(Suppl_1): i142-i150, 2021 07 12.

Article in English | MEDLINE | ID: mdl-34252951

ABSTRACT

MOTIVATION: Admixture, the interbreeding between previously distinct populations, is a pervasive force in evolution. The evolutionary history of populations in the presence of admixture can be modeled by augmenting phylogenetic trees with additional nodes that represent admixture events. While enabling a more faithful representation of evolutionary history, admixture graphs present formidable inferential challenges, and there is an increasing need for methods that are accurate, fully automated and computationally efficient. One key challenge arises from the size of the space of admixture graphs. Given that exhaustively evaluating all admixture graphs can be prohibitively expensive, heuristics have been developed to enable efficient search over this space. One heuristic, implemented in the popular method TreeMix, consists of adding edges to a starting tree while optimizing a suitable objective function. RESULTS: Here, we present a demographic model (with one admixed population incident to a leaf) where TreeMix and any other starting-tree-based maximum likelihood heuristic using its likelihood function is guaranteed to get stuck in a local optimum and return an incorrect network topology. To address this issue, we propose a new search strategy that we term maximum likelihood network orientation (MLNO). We augment TreeMix with an exhaustive search for an MLNO, referring to this approach as OrientAGraph. In evaluations including previously published admixture graphs, OrientAGraph outperformed TreeMix on 4/8 models (there are no differences in the other cases). Overall, OrientAGraph found graphs with higher likelihood scores and topological accuracy while remaining computationally efficient. Lastly, our study reveals several directions for improving maximum likelihood admixture graph estimation. AVAILABILITY AND IMPLEMENTATION: OrientAGraph is available on Github (https://github.com/sriramlab/OrientAGraph) under the GNU General Public License v3.0. 
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Algorithms, Software, Humans, Likelihood Functions, Phylogeny, Population Groups

8.

TIPP2: metagenomic taxonomic profiling using phylogenetic markers.

Shah, Nidhi; Molloy, Erin K; Pop, Mihai; Warnow, Tandy.

Bioinformatics ; 37(13): 1839-1845, 2021 Jul 27.

Article in English | MEDLINE | ID: mdl-33471121

ABSTRACT

MOTIVATION: Metagenomics has revolutionized microbiome research by enabling researchers to characterize the composition of complex microbial communities. Taxonomic profiling is one of the critical steps in metagenomic analyses. Marker genes, which are single-copy and universally found across Bacteria and Archaea, can provide accurate estimates of taxon abundances in the sample. RESULTS: We present TIPP2, a marker gene-based abundance profiling method, which combines phylogenetic placement with statistical techniques to control classification precision and recall. TIPP2 includes an updated set of reference packages and several algorithmic improvements over the original TIPP method. We find that TIPP2 provides comparable or better estimates of abundance than other profiling methods (including Bracken, mOTUsv2 and MetaPhlAn2), and strictly dominates other methods when there are under-represented (novel) genomes present in the dataset. AVAILABILITY AND IMPLEMENTATION: The code for our method is freely available in open-source form at https://github.com/smirarab/sepp/blob/tipp2/README.TIPP.md. The code and procedure to create new reference packages for TIPP2 are available at https://github.com/shahnidhi/TIPP_reference_package. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

9.

ASTRAL-Pro: Quartet-Based Species-Tree Inference despite Paralogy.

Zhang, Chao; Scornavacca, Celine; Molloy, Erin K; Mirarab, Siavash.

Mol Biol Evol ; 37(11): 3292-3307, 2020 11 01.

Article in English | MEDLINE | ID: mdl-32886770

ABSTRACT

Phylogenetic inference from genome-wide data (phylogenomics) has revolutionized the study of evolution because it enables accounting for discordance among evolutionary histories across the genome. To this end, summary methods have been developed to allow accurate and scalable inference of species trees from gene trees. However, most of these methods, including the widely used ASTRAL, can only handle single-copy gene trees and do not attempt to model gene duplication and gene loss. As a result, most phylogenomic studies have focused on single-copy genes and have discarded large parts of the data. Here, we first propose a measure of quartet similarity between single-copy and multicopy trees that accounts for orthology and paralogy. We then introduce a method called ASTRAL-Pro (ASTRAL for PaRalogs and Orthologs) to find the species tree that optimizes our quartet similarity measure using dynamic programing. By studying its performance on an extensive collection of simulated data sets and on real data sets, we show that ASTRAL-Pro is more accurate than alternative methods.

Subjects

Genetic Techniques, Phylogeny, Algorithms, Plants/genetics, Yeasts/genetics

10.

FastMulRFS: fast and accurate species tree estimation under generic gene duplication and loss models.

Molloy, Erin K; Warnow, Tandy.

Bioinformatics ; 36(Suppl_1): i57-i65, 2020 07 01.

Article in English | MEDLINE | ID: mdl-32657396

ABSTRACT

MOTIVATION: Species tree estimation is a basic part of biological research but can be challenging because of gene duplication and loss (GDL), which results in genes that can appear more than once in a given genome. All common approaches in phylogenomic studies either reduce available data or are error-prone, and thus, scalable methods that do not discard data and have high accuracy on large heterogeneous datasets are needed. RESULTS: We present FastMulRFS, a polynomial-time method for estimating species trees without knowledge of orthology. We prove that FastMulRFS is statistically consistent under a generic model of GDL when adversarial GDL does not occur. Our extensive simulation study shows that FastMulRFS matches the accuracy of MulRF (which tries to solve the same optimization problem) and has better accuracy than prior methods, including ASTRAL-multi (the only method to date that has been proven statistically consistent under GDL), while being much faster than both methods. AVAILABILITY AND IMPLEMENTATION: FastMulRFS is available on Github (https://github.com/ekmolloy/fastmulrfs). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Algorithms, Gene Duplication, Biometry, Computer Simulation, Phylogeny

11.

Correction to: The performance of coalescent-based species tree estimation methods under models of missing data.

Nute, Michael; Chou, Jed; Molloy, Erin K; Warnow, Tandy.

BMC Genomics ; 21(1): 133, 2020 02 10.

Article in English | MEDLINE | ID: mdl-32039710

ABSTRACT

After publication of [1], the authors were informed by John A. Rhodes of a counterexample to Theorem 11 of [1].

12.

TreeMerge: a new method for improving the scalability of species tree estimation methods.

Molloy, Erin K; Warnow, Tandy.

Bioinformatics ; 35(14): i417-i426, 2019 07 15.

Article in English | MEDLINE | ID: mdl-31510668

ABSTRACT

MOTIVATION: At RECOMB-CG 2018, we presented NJMerge and showed that it could be used within a divide-and-conquer framework to scale computationally intensive methods for species tree estimation to larger datasets. However, NJMerge has two significant limitations: it can fail to return a tree and, when used within the proposed divide-and-conquer framework, has O(n^5) running time for datasets with n species. RESULTS: Here we present a new method called 'TreeMerge' that improves on NJMerge in two ways: it is guaranteed to return a tree and it has dramatically faster running time within the same divide-and-conquer framework, only O(n^2) time. We use a simulation study to evaluate TreeMerge in the context of multi-locus species tree estimation with two leading methods, ASTRAL-III and RAxML. We find that the divide-and-conquer framework using TreeMerge has a minor impact on species tree accuracy, dramatically reduces running time, and enables both ASTRAL-III and RAxML to complete on datasets (that they would otherwise fail on), when given 64 GB of memory and 48 h maximum running time. Thus, TreeMerge is a step toward a larger vision of enabling researchers with limited computational resources to perform large-scale species tree estimation, which we call Phylogenomics for All. AVAILABILITY AND IMPLEMENTATION: TreeMerge is publicly available on Github (http://github.com/ekmolloy/treemerge). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Algorithms, Phylogeny, Computer Simulation, Data Collection

13.

ILS-Aware Analysis of Low-Homoplasy Retroelement Insertions: Inference of Species Trees and Introgression Using Quartets.

Springer, Mark S; Molloy, Erin K; Sloan, Daniel B; Simmons, Mark P; Gatesy, John.

J Hered ; 111(2): 147-168, 2020 04 02.

Article in English | MEDLINE | ID: mdl-31837265

ABSTRACT

DNA sequence alignments have provided the majority of data for inferring phylogenetic relationships with both concatenation and coalescent methods. However, DNA sequences are susceptible to extensive homoplasy, especially for deep divergences in the Tree of Life. Retroelement insertions have emerged as a powerful alternative to sequences for deciphering evolutionary relationships because these data are nearly homoplasy-free. In addition, retroelement insertions satisfy the "no intralocus-recombination" assumption of summary coalescent methods because they are singular events and better approximate neutrality relative to DNA loci commonly sampled in phylogenomic studies. Retroelements have traditionally been analyzed with parsimony, distance, and network methods. Here, we analyze retroelement data sets for vertebrate clades (Placentalia, Laurasiatheria, Balaenopteroidea, Palaeognathae) with 2 ILS-aware methods that operate by extracting, weighting, and then assembling unrooted quartets into a species tree. The first approach constructs a species tree from retroelement bipartitions with ASTRAL, and the second method is based on split-decomposition with parsimony. We also develop a Quartet-Asymmetry test to detect hybridization using retroelements. Both ILS-aware methods recovered the same species-tree topology for each data set. The ASTRAL species trees for Laurasiatheria have consecutive short branch lengths in the anomaly zone whereas Palaeognathae is outside of this zone. For the Balaenopteroidea data set, which includes rorquals (Balaenopteridae) and gray whale (Eschrichtiidae), both ILS-aware methods resolved balaenopterids as paraphyletic. Application of the Quartet-Asymmetry test to this data set detected 19 different quartets of species for which historical introgression may be inferred. Evidence for introgression was not detected in the other data sets.

Subjects

Genetic Speciation, Genetic Models, Retroelements, Vertebrates/genetics, Animals, DNA Transposable Elements, Genetic Hybridization, Phylogeny
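
The abstract of entry 13 mentions a Quartet-Asymmetry test for detecting hybridization from retroelement insertions but does not state the test statistic. The sketch below illustrates only the general idea behind such tests: under incomplete lineage sorting alone, the two minority quartet topologies are expected to be supported equally often, so a strong imbalance between them is a possible signal of introgression. The exact two-sided binomial test, the function names, and the insertion counts are assumptions chosen for illustration and should not be read as the authors' procedure.

```python
from math import comb


def binomial_two_sided_p(k, n):
    """Exact two-sided binomial p-value for k successes in n trials with p = 0.5."""
    if n == 0:
        return 1.0
    probs = [comb(n, i) / 2**n for i in range(n + 1)]
    observed = probs[k]
    # Sum the probabilities of all outcomes at least as unlikely as the observed one.
    return min(1.0, sum(p for p in probs if p <= observed * (1 + 1e-9)))


def quartet_asymmetry_p(counts):
    """Test whether the two minority topologies of a quartet are equally supported.

    `counts` holds the number of insertions supporting each of the three
    possible topologies of one quartet (hypothetical input format).
    """
    _, minor1, minor2 = sorted(counts, reverse=True)
    return binomial_two_sided_p(minor1, minor1 + minor2)


# Hypothetical counts: balanced minorities versus strongly skewed minorities.
print(quartet_asymmetry_p([40, 5, 5]))
print(quartet_asymmetry_p([40, 10, 0]))
```

A small p-value for a quartet indicates that the two minority topologies are not equally supported. When many quartets are tested, as in a full data set, some correction for multiple comparisons would normally be needed.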

14.

To Include or Not to Include: The Impact of Gene Filtering on Species Tree Estimation Methods.

Molloy, Erin K; Warnow, Tandy.

Syst Biol ; 67(2): 285-303, 2018 Mar 01.

Article in English | MEDLINE | ID: mdl-29029338

ABSTRACT

With the increasing availability of whole genome data, many species trees are being constructed from hundreds to thousands of loci. Although concatenation analysis using maximum likelihood is a standard approach for estimating species trees, it does not account for gene tree heterogeneity, which can occur due to many biological processes, such as incomplete lineage sorting. Coalescent species tree estimation methods, many of which are statistically consistent in the presence of incomplete lineage sorting, include Bayesian methods that coestimate the gene trees and the species tree, summary methods that compute the species tree by combining estimated gene trees, and site-based methods that infer the species tree from site patterns in the alignments of different loci. Due to concerns that poor quality loci will reduce the accuracy of estimated species trees, many recent phylogenomic studies have removed or filtered genes on the basis of phylogenetic signal and/or missing data prior to inferring species trees; little is known about the performance of species tree estimation methods when gene filtering is performed. We examine how incomplete lineage sorting, phylogenetic signal of individual loci, and missing data affect the absolute and the relative accuracy of species tree estimation methods and show how these properties affect methods' responses to gene filtering strategies. In particular, summary methods (ASTRAL-II, ASTRID, and MP-EST), a site-based coalescent method (SVDquartets within PAUP*), and an unpartitioned concatenation analysis using maximum likelihood (RAxML) were evaluated on a heterogeneous collection of simulated multilocus data sets, and the following trends were observed. 
Filtering genes based on gene tree estimation error improved the accuracy of the summary methods when levels of incomplete lineage sorting were low to moderate but did not benefit the summary methods under higher levels of incomplete lineage sorting, unless gene tree estimation error was also extremely high (a model condition with few replicates). Neither SVDquartets nor concatenation analysis using RAxML benefited from filtering genes on the basis of gene tree estimation error. Finally, filtering genes based on missing data was either neutral (i.e., did not impact accuracy) or else reduced the accuracy of all five methods. By providing insight into the consequences of gene filtering, we offer recommendations for estimating species trees in the presence of incomplete lineage sorting and reconcile seemingly conflicting observations made in prior studies regarding the impact of gene filtering.

Subjects

Classification/methods, Genetic Speciation, Genetic Models, Phylogeny, Computer Simulation, Genomics, Sequence Analysis

15.

The performance of coalescent-based species tree estimation methods under models of missing data.

Nute, Michael; Chou, Jed; Molloy, Erin K; Warnow, Tandy.

BMC Genomics ; 19(Suppl 5): 286, 2018 May 08.

Article in English | MEDLINE | ID: mdl-29745854

ABSTRACT

BACKGROUND: Estimation of species trees from multiple genes is complicated by processes such as incomplete lineage sorting, gene duplication and loss, and horizontal gene transfer, that result in gene trees that differ from each other and from the species phylogeny. Methods to estimate species trees in the presence of gene tree discord due to incomplete lineage sorting have been developed and proved to be statistically consistent when gene tree discord is due only to incomplete lineage sorting and every gene tree includes the full set of species. RESULTS: We establish statistical consistency of certain coalescent-based species tree estimation methods under some models of taxon deletion from genes. We also evaluate the impact of missing data on four species tree estimation methods (ASTRAL-II, ASTRID, MP-EST, and SVDquartets) using simulated datasets with varying levels of incomplete lineage sorting, gene tree estimation error, and degrees/patterns of missing data. CONCLUSIONS: All the species tree estimation methods improved in accuracy as the number of genes increased and often produced highly accurate species trees even when the amount of missing data was large. These results together indicate that accurate species tree estimation is possible under a variety of conditions, even when there are substantial amounts of missing data.

Subjects

Classification/methods, Genetic Speciation, Genetic Models, Phylogeny, Algorithms, Computer Simulation, Genes, Genomics, Species Specificity

16.

A brief assessment tool for investigating facets of moral judgment from realistic vignettes.

Kruepke, Michael; Molloy, Erin K; Bresin, Konrad; Barbey, Aron K; Verona, Edelyn.

Behav Res Methods ; 50(3): 922-936, 2018 06.

Article in English | MEDLINE | ID: mdl-28646400

ABSTRACT

Humans make moral judgments every day, and research demonstrates that these evaluations are based on a host of related event features (e.g., harm, legality). In order to acquire systematic data on how moral judgments are made, our assessments need to be expanded to include real-life, ecologically valid stimuli that take into account the numerous event features that are known to influence moral judgment. To facilitate this, Knutson et al. (in Social Cognitive and Affective Neuroscience, 5(4), 378-384, 2010) developed vignettes based on real-life episodic memories rated concurrently on key moral features; however, the method is time intensive (~1.4-3.4 h) and the stimuli and ratings require further validation and characterization. The present study addresses these limitations by: (i) validating three short subsets of these vignettes (39 per subset) that are time-efficient (10-25 min per subset) yet representative of the ratings and factor structure of the full set, (ii) norming ratings of moral features in a larger sample (total N = 661, each subset N = ~220 vs. Knutson et al. N = 30), (iii) examining the generalizability of the original factor structure by replicating it in a larger sample across vignette subsets, sex, and political ideology, and (iv) using latent profile analysis to empirically characterize vignette groupings based on event feature ratings profiles and vignette content. This study therefore provides researchers with a core battery of well-characterized and realistic vignettes, concurrently rated on key moral features that can be administered in a brief, time-efficient manner to advance research on the nature of moral judgment.

Subjects

Behavior Rating Scale , Judgment , Morals , Narration , Adult , Female , Humans , Male , Episodic Memory , Middle Aged

17.

Corrigendum to: ASTRAL-Pro: Quartet-Based Species-Tree Inference despite Paralogy.

Zhang, Chao; Scornavacca, Celine; Molloy, Erin K; Mirarab, Siavash.

Mol Biol Evol ; 38(10): 4655, 2021 Sep 27.

Article in English | MEDLINE | ID: mdl-34417619

18.

The influence of spatial resolution and smoothing on the detectability of resting-state and task fMRI.

Molloy, Erin K; Meyerand, Mary E; Birn, Rasmus M.

Neuroimage ; 86: 221-30, 2014 Feb 01.

Article in English | MEDLINE | ID: mdl-24021836

ABSTRACT

Functional MRI blood oxygen level-dependent (BOLD) signal changes can be subtle, motivating the use of imaging parameters and processing strategies that maximize the temporal signal-to-noise ratio (tSNR) and thus the detection power of neuronal activity-induced fluctuations. Previous studies have shown that acquiring data at higher spatial resolutions results in greater percent BOLD signal changes, and furthermore that spatially smoothing higher resolution fMRI data improves tSNR beyond that of data originally acquired at a lower resolution. However, higher resolution images come at the cost of increased acquisition time, and the number of image volumes also influences detectability. The goal of our study is to determine how the detection power of neuronally induced BOLD fluctuations acquired at higher spatial resolutions and then spatially smoothed compares to data acquired at the lower resolutions with the same imaging duration. The number of time points acquired during a given amount of imaging time is a practical consideration given the limited ability of certain populations to lie still in the MRI scanner. We compare acquisitions at three different in-plane spatial resolutions (3.50 × 3.50 mm², 2.33 × 2.33 mm², 1.75 × 1.75 mm²) in terms of their tSNR, contrast-to-noise ratio, and the power to detect both task-related activation and resting-state functional connectivity. The impact of SENSE acceleration, which speeds up acquisition time increasing the number of images collected, is also evaluated. Our results show that after spatially smoothing the data to the same intrinsic resolution, lower resolution acquisitions have a slightly higher detection power of task-activation in some, but not all, brain areas. There were no significant differences in functional connectivity as a function of resolution after smoothing. Similarly, the reduced tSNR of fMRI data acquired with a SENSE factor of 2 is offset by the greater number of images acquired, resulting in few significant differences in detection power of either functional activation or connectivity after spatial smoothing.
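
Editorial illustration (not part of the indexed record): tSNR is conventionally computed per voxel as the temporal mean of the signal divided by its temporal standard deviation. The sketch below assumes a single voxel time series given as a plain list and uses the sample standard deviation; the function name `tsnr` and the example values are hypothetical, and the authors' processing pipeline may differ (for example, detrending before the computation).

```python
from statistics import mean, stdev


def tsnr(series):
    """Temporal SNR of one voxel: temporal mean over temporal standard deviation.

    Assumption: sample standard deviation (n - 1 denominator); some pipelines
    use the population form instead.
    """
    return mean(series) / stdev(series)


# Hypothetical BOLD intensities for one voxel across four volumes.
voxel = [100.0, 102.0, 98.0, 100.0]
print(round(tsnr(voxel), 2))
```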

Subjects

Brain Mapping/methods , Brain/physiology , Evoked Potentials/physiology , Computer-Assisted Image Interpretation/methods , Magnetic Resonance Imaging/methods , Rest/physiology , Task Performance and Analysis , Algorithms , Humans , Image Enhancement/methods , Reproducibility of Results , Sensitivity and Specificity , Spatio-Temporal Analysis

19.

Dollo-CDP: a polynomial-time algorithm for the clade-constrained large Dollo parsimony problem.

Dai, Junyan; Rubel, Tobias; Han, Yunheng; Molloy, Erin K.

Algorithms Mol Biol ; 19(1): 2, 2024 Jan 08.

Article in English | MEDLINE | ID: mdl-38191515

ABSTRACT

The last decade of phylogenetics has seen the development of many methods that leverage constraints plus dynamic programming. The goal of this algorithmic technique is to produce a phylogeny that is optimal with respect to some objective function and that lies within a constrained version of tree space. The popular species tree estimation method ASTRAL, for example, returns a tree that (1) maximizes the quartet score computed with respect to the input gene trees and that (2) draws its branches (bipartitions) from the input constraint set. This technique has yet to be used for parsimony problems where the input are binary characters, sometimes with missing values. Here, we introduce the clade-constrained character parsimony problem and present an algorithm that solves this problem for the Dollo criterion score in [Formula: see text] time, where n is the number of leaves, k is the number of characters, and [Formula: see text] is the set of clades used as constraints. Dollo parsimony, which requires traits/mutations to be gained at most once but allows them to be lost any number of times, is widely used for tumor phylogenetics as well as species phylogenetics, for example analyses of low-homoplasy retroelement insertions across the vertebrate tree of life. This motivated us to implement our algorithm in a software package, called Dollo-CDP, and evaluate its utility for analyzing retroelement insertion presence / absence patterns for bats, birds, toothed whales as well as simulated data. Our results show that Dollo-CDP can improve upon heuristic search from a single starting tree, often recovering a better scoring tree. Moreover, Dollo-CDP scales to data sets with much larger numbers of taxa than branch-and-bound while still having an optimality guarantee, albeit a more restricted one. Lastly, we show that our algorithm for Dollo parsimony can easily be adapted to Camin-Sokal parsimony but not Fitch parsimony.
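
Editorial illustration (not part of the indexed record, and not the Dollo-CDP algorithm itself): the Dollo criterion described above can be scored on a single fixed rooted tree by placing the one allowed gain at the lowest common ancestor of all leaves with state 1 and counting the maximal clades below it in which every leaf has state 0. The sketch assumes a tree given as nested tuples of leaf names, a complete binary character with no missing values, and a score equal to the number of losses; all names are hypothetical.

```python
def count_ones(node, states):
    """Number of leaves below `node` that carry state 1."""
    if isinstance(node, str):
        return states[node]
    return sum(count_ones(child, states) for child in node)


def dollo_losses(tree, states):
    """Losses needed for one binary character on a fixed rooted tree under Dollo."""
    total = count_ones(tree, states)
    if total == 0:
        return 0  # the character is never gained, so nothing is lost

    # Walk down to the lowest node that still contains every 1-leaf;
    # this is where the single gain is placed.
    node = tree
    while not isinstance(node, str):
        holder = next((c for c in node if count_ones(c, states) == total), None)
        if holder is None:
            break
        node = holder

    def losses(n):
        if count_ones(n, states) == 0:
            return 1  # a maximal all-0 clade below the gain: one loss
        if isinstance(n, str):
            return 0  # a leaf that still carries the character
        return sum(losses(c) for c in n)

    return losses(node)


# Hypothetical five-taxon tree and presence/absence pattern.
tree = ((("A", "B"), "C"), ("D", "E"))
pattern = {"A": 1, "B": 0, "C": 1, "D": 0, "E": 0}
print(dollo_losses(tree, pattern))
```

In this example the gain sits on the ancestor of A, B, and C, so only the lineage leading to B requires a loss. The repeated calls to `count_ones` keep the sketch short at the cost of quadratic time.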

20.

Lightweight taxonomic profiling of long-read sequenced metagenomes with Lemur and Magnet.

Sapoval, Nicolae; Liu, Yunxi; Curry, Kristen D; Kille, Bryce; Huang, Wenyu; Kokroko, Natalie; Nute, Michael G; Tyshaieva, Alona; Dilthey, Alexander; Molloy, Erin K; Treangen, Todd J.

bioRxiv ; 2024 Jun 03.

Article in English | MEDLINE | ID: mdl-38895276

ABSTRACT

Taxonomic profiling is a ubiquitous task in the analysis of clinical and environmental microbiomes. The advent of long-read sequencing of microbiomes necessitates the development of new taxonomic profilers tailored to long-read shotgun metagenomic datasets. Here, we introduce Lemur and Magnet, a pair of tools optimized for lightweight and accurate taxonomic profiling from long-read shotgun metagenomic datasets. Lemur is a marker-gene based method that leverages an EM algorithm to reduce false positive calls while preserving true positives; Magnet makes detailed presence/absence calls for bacterial genomes based on whole-genome read mapping. The tools work in sequence: Lemur estimates abundances conservatively, and Magnet operates on the genomes of identified organisms to filter out likely false positive taxa. The result is an increase in precision of as much as 70%, which far exceeds competing methods. By operating only on marker genes, Lemur is a comparatively lightweight software. We demonstrate that it can run in minutes to hours on a laptop with 32 GB of RAM, even for large inputs - a crucial feature given the portability of long-read sequencing machines. Furthermore, the marker gene database used by Lemur is only 4 GB and contains information from over 300,000 RefSeq genomes. The reference is available at https://zenodo.org/records/10802546, and the software is open-source and available at https://github.com/treangenlab/lemur.
