Search | VHL Regional Portal

1.

Isotype-aware inference of B cell clonal lineage trees from single-cell sequencing data.

Weber, Leah L; Reiman, Derek; Roddur, Mrinmoy S; Qi, Yuanyuan; El-Kebir, Mohammed; Khan, Aly A.

Cell Genom ; 4(9): 100637, 2024 Sep 11.

Article in English | MEDLINE | ID: mdl-39208795

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) enables comprehensive characterization of the micro-evolutionary processes of B cells during an adaptive immune response, capturing features of somatic hypermutation (SHM) and class switch recombination (CSR). Existing phylogenetic approaches for reconstructing B cell evolution have primarily focused on the SHM process alone. Here, we present tree inference of B cell clonal lineages (TRIBAL), an algorithm designed to optimally reconstruct the evolutionary history of B cell clonal lineages undergoing both SHM and CSR from scRNA-seq data. Through simulations, we demonstrate that TRIBAL produces more comprehensive and accurate B cell lineage trees compared to existing methods. Using real-world datasets, TRIBAL successfully recapitulates expected biological trends in a model affinity maturation system while reconstructing evolutionary histories with more parsimonious class switching than state-of-the-art methods. Thus, TRIBAL significantly improves B cell lineage tracing, useful for modeling vaccine responses, disease progression, and the identification of therapeutic antibodies.

Subjects

Algorithms; B-Lymphocytes; Cell Lineage; Single-Cell Analysis; B-Lymphocytes/immunology; Single-Cell Analysis/methods; Cell Lineage/genetics; Humans; Phylogeny; Somatic Hypermutation, Immunoglobulin/genetics; Immunoglobulin Class Switching/genetics; Sequence Analysis, RNA/methods

2.

Enforcing Temporal Consistency in Migration History Inference.

Roddur, Mrinmoy Saha; Snir, Sagi; El-Kebir, Mohammed.

J Comput Biol ; 31(5): 396-415, 2024 05.

Article in English | MEDLINE | ID: mdl-38754138

ABSTRACT

In addition to undergoing evolution, members of biological populations may also migrate between locations. Examples include the spread of tumor cells from the primary tumor to distant metastases or the spread of pathogens from one host to another. One may represent migration histories by assigning a location label to each vertex of a given phylogenetic tree such that an edge connecting vertices with distinct locations represents a migration. Some biological populations undergo comigration, a phenomenon where multiple taxa from distinct lineages simultaneously comigrate from one location to another. In this work, we show that a previous problem statement for inferring migration histories that are parsimonious in terms of migrations and comigrations may lead to temporally inconsistent solutions. To remedy this deficiency, we introduce precise definitions of temporal consistency of comigrations in a phylogenetic tree, leading to three successive problems. First, we formulate the temporally consistent comigration problem to check if a set of comigrations is temporally consistent and provide a linear time algorithm for solving this problem. Second, we formulate the parsimonious consistent comigrations (PCC) problem, which aims to find comigrations given a location labeling of a phylogenetic tree. We show that PCC is NP-hard. Third, we formulate the parsimonious consistent comigration history (PCCH) problem, which infers the migration history given a phylogenetic tree and locations of its extant vertices only. We show that PCCH is NP-hard as well. On the positive side, we propose integer linear programming models to solve the PCC and PCCH problems. We demonstrate our algorithms on simulated and real data.
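The vertex-labeling view of a migration history described above can be illustrated with a minimal Python sketch. It only counts migrations (tree edges whose endpoints carry different location labels) and does not address comigrations or temporal consistency. The toy tree, the location names, and the function name `migration_edges` are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (parent, child)


def migration_edges(edges: List[Edge], location: Dict[str, str]) -> List[Edge]:
    """Return the tree edges whose endpoints carry different location labels."""
    return [(u, v) for u, v in edges if location[u] != location[v]]


# Hypothetical toy tree: a primary tumor (P) seeding two metastases (M1, M2).
edges = [("r", "a"), ("r", "b"), ("a", "c"), ("a", "d")]
location = {"r": "P", "a": "P", "b": "M1", "c": "M2", "d": "P"}

migrations = migration_edges(edges, location)
print(migrations)
```

In this toy labeling, two of the four edges cross locations, so the history contains two migrations.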

Subjects

Animal Migration; Cell Movement; Models, Biological; Human Migration; Humans; Animals; Algorithms; Time Factors

3.

DERNA Enables Pareto Optimal RNA Design.

Gu, Xinyu; Qi, Yuanyuan; El-Kebir, Mohammed.

J Comput Biol ; 31(3): 179-196, 2024 03.

Article in English | MEDLINE | ID: mdl-38416637

ABSTRACT

The design of an RNA sequence v that encodes an input target protein sequence w is a crucial aspect of messenger RNA (mRNA) vaccine development. There are an exponential number of possible RNA sequences for a single target protein due to codon degeneracy. These potential RNA sequences can assume various secondary structure conformations, each with distinct minimum free energy (MFE), impacting thermodynamic stability and mRNA half-life. Furthermore, the presence of species-specific codon usage bias, quantified by the codon adaptation index (CAI), plays a vital role in translation efficiency. While earlier studies focused on optimizing either MFE or CAI, recent research has underscored the advantages of simultaneously optimizing both objectives. However, optimizing one objective comes at the expense of the other. In this work, we present the Pareto Optimal RNA Design problem, aiming to identify the set of Pareto optimal solutions for which no alternative solutions exist that exhibit better MFE and CAI values. Our algorithm DEsign RNA (DERNA) uses the weighted sum method to enumerate the Pareto front by optimizing convex combinations of both objectives. We use dynamic programming to solve each convex combination in O(|w|^3) time and O(|w|^2) space. Compared with CDSfold, a previous approach that only optimizes MFE, we show on a benchmark data set that DERNA obtains solutions with identical MFE but superior CAI. Moreover, we show that DERNA matches the performance in terms of solution quality of LinearDesign, a recent approach that similarly seeks to balance MFE and CAI. We conclude by demonstrating our method's potential for mRNA vaccine design for the SARS-CoV-2 spike protein.
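To make the notions of Pareto optimality and the weighted sum method concrete, here is a minimal Python sketch over a small, explicitly enumerated set of candidates. It is not DERNA's dynamic program: the candidate (MFE, CAI) values are invented for illustration, and the weight `lam` and all function names are assumptions. Lower MFE and higher CAI are treated as better.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (mfe, cai): lower MFE is better, higher CAI is better


def dominates(p: Point, q: Point) -> bool:
    """p dominates q if p is no worse in both objectives and differs from q."""
    return p[0] <= q[0] and p[1] >= q[1] and p != q


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep only the points that no other point dominates."""
    return sorted(p for p in points if not any(dominates(q, p) for q in points))


def weighted_sum_best(points: List[Point], lam: float) -> Point:
    """Minimize the convex combination lam * MFE - (1 - lam) * CAI, lam in [0, 1]."""
    return min(points, key=lambda p: lam * p[0] - (1.0 - lam) * p[1])


# Invented candidate designs, for illustration only.
candidates = [(-30.0, 0.70), (-25.0, 0.90), (-28.0, 0.60), (-20.0, 0.95), (-30.0, 0.65)]
front = pareto_front(candidates)
print(front)
print(weighted_sum_best(candidates, 0.5))
```

Each weight yields one Pareto optimal point, so sweeping `lam` over [0, 1] traces part of the front. In general, a weighted sum can only reach points on the convex hull of the front, and in practice the two objectives would need to be put on comparable scales first.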

Subjects

Algorithms; RNA; Spike Glycoprotein, Coronavirus; Humans; RNA/chemistry; RNA, Messenger; Codon

4.

Consensus Tree Under the Ancestor-Descendant Distance is NP-Hard.

Qi, Yuanyuan; El-Kebir, Mohammed.

J Comput Biol ; 31(1): 58-70, 2024 Jan.

Article in English | MEDLINE | ID: mdl-38010616

ABSTRACT

Due to uncertainty in tumor phylogeny inference from sequencing data, many methods infer multiple, equally plausible phylogenies for the same cancer. To summarize the solution space T of tumor phylogenies, consensus tree methods seek a single best representative tree S under a specified pairwise tree distance function. One such distance function is the ancestor-descendant (AD) distance [Formula: see text] , which equals the size of the symmetric difference of the transitive closures of the edge sets [Formula: see text] and [Formula: see text] . Here, we show that finding a consensus tree S for tumor phylogenies T that minimizes the total AD distance [Formula: see text] is NP-hard.
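The verbal definition of the AD distance above (the size of the symmetric difference of the transitive closures of two edge sets) can be sketched in a few lines of Python. The sketch assumes each tree is rooted and given as a list of (parent, child) edges over the same vertex labels; the function names and the toy trees are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # (parent, child)


def ancestor_descendant_pairs(edges: List[Edge]) -> Set[Edge]:
    """Transitive closure of a rooted tree's edge set: all (ancestor, descendant) pairs."""
    parent: Dict[str, str] = {child: par for par, child in edges}
    pairs: Set[Edge] = set()
    for node in parent:
        anc = parent.get(node)
        while anc is not None:  # walk up to the root, recording every ancestor
            pairs.add((anc, node))
            anc = parent.get(anc)
    return pairs


def ad_distance(edges_a: List[Edge], edges_b: List[Edge]) -> int:
    """Size of the symmetric difference of the two ancestor-descendant pair sets."""
    return len(ancestor_descendant_pairs(edges_a) ^ ancestor_descendant_pairs(edges_b))


# Toy example: a chain r -> a -> b versus a star with r as parent of a and b.
chain = [("r", "a"), ("a", "b")]
star = [("r", "a"), ("r", "b")]
print(ad_distance(chain, star))
```

The two toy trees differ only in the pair (a, b), which is an ancestor-descendant pair in the chain but not in the star. Computing one pairwise distance is easy; the hardness result concerns choosing a single tree that minimizes the total distance to all input trees.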

Subjects

Algorithms; Neoplasms; Humans; Consensus; Phylogeny; Uncertainty

5.

TRIBAL: Tree Inference of B cell Clonal Lineages.

Weber, Leah L; Reiman, Derek; Roddur, Mrinmoy S; Qi, Yuanyuan; El-Kebir, Mohammed; Khan, Aly A.

bioRxiv ; 2023 Nov 27.

Article in English | MEDLINE | ID: mdl-38076836

ABSTRACT

B cells are a critical component of the adaptive immune system, responsible for producing antibodies that help protect the body from infections and foreign substances. Single-cell RNA sequencing (scRNA-seq) has allowed for both profiling of B cell receptor (BCR) sequences and gene expression. However, understanding the adaptive and evolutionary mechanisms of B cells in response to specific stimuli remains a significant challenge in the field of immunology. We introduce a new method, TRIBAL, which aims to infer the evolutionary history of clonally related B cells from scRNA-seq data. The key insight of TRIBAL is that inclusion of isotype data into the B cell lineage inference problem is valuable for reducing phylogenetic uncertainty that arises when only considering the receptor sequences. Consequently, the TRIBAL-inferred B cell lineage trees jointly capture the somatic mutations introduced to the B cell receptor during affinity maturation and isotype transitions during class switch recombination. In addition, TRIBAL infers isotype transition probabilities that are valuable for gaining insight into the dynamics of class switching. Via in silico experiments, we demonstrate that TRIBAL infers isotype transition probabilities with the ability to distinguish between direct and sequential switching in a B cell population. This results in more accurate B cell lineage trees and corresponding ancestral sequence and class switch reconstruction compared to competing methods. Using real-world scRNA-seq datasets, we show that TRIBAL recapitulates expected biological trends in a model affinity maturation system. Furthermore, the B cell lineage trees inferred by TRIBAL were equally plausible for the BCR sequences as those inferred by competing methods but yielded lower entropic partitions for the isotypes of the sequenced B cells. Thus, our method holds the potential to further advance our understanding of vaccine responses, disease progression, and the identification of therapeutic antibodies.

6.

Phertilizer: Growing a clonal tree from ultra-low coverage single-cell DNA sequencing of tumors.

Weber, Leah L; Zhang, Chuanyi; Ochoa, Idoia; El-Kebir, Mohammed.

PLoS Comput Biol ; 19(10): e1011544, 2023 10.

Article in English | MEDLINE | ID: mdl-37819942

ABSTRACT

Emerging ultra-low coverage single-cell DNA sequencing (scDNA-seq) technologies have enabled high resolution evolutionary studies of copy number aberrations (CNAs) within tumors. While these sequencing technologies are well suited for identifying CNAs due to the uniformity of sequencing coverage, the sparsity of coverage poses challenges for the study of single-nucleotide variants (SNVs). In order to maximize the utility of increasingly available ultra-low coverage scDNA-seq data and obtain a comprehensive understanding of tumor evolution, it is important to also analyze the evolution of SNVs from the same set of tumor cells. We present Phertilizer, a method to infer a clonal tree from ultra-low coverage scDNA-seq data of a tumor. Based on a probabilistic model, our method recursively partitions the data by identifying key evolutionary events in the history of the tumor. We demonstrate the performance of Phertilizer on simulated data as well as on two real datasets, finding that Phertilizer effectively utilizes the copy-number signal inherent in the data to more accurately uncover clonal structure and genotypes compared to previous methods.

Subjects

Neoplasms; Trees; Humans; DNA Copy Number Variations/genetics; Neoplasms/genetics; Sequence Analysis, DNA; High-Throughput Nucleotide Sequencing/methods; Single-Cell Analysis

7.

OFraMP: a fragment-based tool to facilitate the parametrization of large molecules.

Stroet, Martin; Caron, Bertrand; Engler, Martin S; van der Woning, Jimi; Kauffmann, Aude; van Dijk, Marc; El-Kebir, Mohammed; Visscher, Koen M; Holownia, Josef; Macfarlane, Callum; Bennion, Brian J; Gelpi-Dominguez, Svetlana; Lightstone, Felice C; van der Storm, Tijs; Geerke, Daan P; Mark, Alan E; Klau, Gunnar W.

J Comput Aided Mol Des ; 37(8): 357-371, 2023 08.

Article in English | MEDLINE | ID: mdl-37310542

ABSTRACT

An Online tool for Fragment-based Molecule Parametrization (OFraMP) is described. OFraMP is a web application for assigning atomic interaction parameters to large molecules by matching sub-fragments within the target molecule to equivalent sub-fragments within the Automated Topology Builder (ATB, atb.uq.edu.au) database. OFraMP identifies and compares alternative molecular fragments from the ATB database, which contains over 890,000 pre-parameterized molecules, using a novel hierarchical matching procedure. Atoms are considered within the context of an extended local environment (buffer region), with the degree of similarity between an atom in the target molecule and that in the proposed match controlled by varying the size of the buffer region. Adjacent matching atoms are combined into progressively larger matched sub-structures. The user then selects the most appropriate match. OFraMP also allows users to manually alter interaction parameters and automates the submission of missing substructures to the ATB in order to generate parameters for atoms in environments not represented in the existing database. The utility of OFraMP is illustrated using the anti-cancer agent paclitaxel (ATB ID 35922) and a dendrimer used in organic semiconductor devices.

Subjects

Software; Databases, Factual

8.

Modeling and predicting cancer clonal evolution with reinforcement learning.

Ivanovic, Stefan; El-Kebir, Mohammed.

Genome Res ; 33(7): 1078-1088, 2023 07.

Article in English | MEDLINE | ID: mdl-37344104

ABSTRACT

Cancer results from an evolutionary process that typically yields multiple clones with varying sets of mutations within the same tumor. Accurately modeling this process is key to understanding and predicting cancer evolution. Here, we introduce clone to mutation (CloMu), a flexible and low-parameter tree generative model of cancer evolution. CloMu uses a two-layer neural network trained via reinforcement learning to determine the probability of new mutations based on the existing mutations on a clone. CloMu supports several prediction tasks, including the determination of evolutionary trajectories, tree selection, causality and interchangeability between mutations, and mutation fitness. Importantly, previous methods support only some of these tasks, and many suffer from overfitting on data sets with a large number of mutations. Using simulations, we show that CloMu either matches or outperforms current methods on a wide variety of prediction tasks. In particular, for simulated data with interchangeable mutations, current methods are unable to uncover causal relationships as effectively as CloMu. On breast cancer and leukemia cohorts, we show that CloMu determines similarities and causal relationships between mutations as well as the fitness of mutations. We validate CloMu's inferred mutation fitness values for the leukemia cohort by comparing them to clonal proportion data not used during training, showing high concordance. In summary, CloMu's low-parameter model facilitates a wide range of prediction tasks regarding cancer evolution on increasingly available cohort-level data sets.

Subjects

Leukemia; Neoplasms; Humans; Neoplasms/genetics; Mutation; Clonal Evolution/genetics; Neural Networks, Computer

9.

What are the keys to succeeding as a computational biologist in today's research climate?

Berger, Bonnie; Tian, Dechao; Li, Wei Vivian; El-Kebir, Mohammed; Tomescu, Alexandru I; Singh, Ritambhara; Beerenwinkel, Niko; Li, Yu; Boucher, Christina; Bar-Joseph, Ziv.

Cell Syst ; 13(10): 781-785, 2022 Oct 19.

Article in English | MEDLINE | ID: mdl-36265464

10.

CNAViz: An interactive webtool for user-guided segmentation of tumor DNA sequencing data.

Lalani, Zubair; Chu, Gillian; Hsu, Silas; Kagawa, Shaw; Xiang, Michael; Zaccaria, Simone; El-Kebir, Mohammed.

PLoS Comput Biol ; 18(10): e1010614, 2022 10.

Article in English | MEDLINE | ID: mdl-36228003

ABSTRACT

Copy-number aberrations (CNAs) are genetic alterations that amplify or delete the number of copies of large genomic segments. Although they are ubiquitous in cancer and, thus, a critical area of current cancer research, CNA identification from DNA sequencing data is challenging because it requires partitioning of the genome into complex segments with the same copy-number states that may not be contiguous. Existing segmentation algorithms address these challenges either by leveraging the local information among neighboring genomic regions, or by globally grouping genomic regions that are affected by similar CNAs across the entire genome. However, both approaches have limitations: overclustering in the case of local segmentation, or the omission of clusters corresponding to focal CNAs in the case of global segmentation. Importantly, inaccurate segmentation will lead to inaccurate identification of CNAs. For this reason, most pan-cancer research studies rely on manual procedures of quality control and anomaly correction. To improve copy-number segmentation, we introduce CNAViz, a web-based tool that enables the user to simultaneously perform local and global segmentation, thus overcoming the limitations of each approach. Using simulated data, we demonstrate that by several metrics, CNAViz allows the user to obtain more accurate segmentation relative to existing local and global segmentation methods. Moreover, we analyze six bulk DNA sequencing samples from three breast cancer patients. By validating with parallel single-cell DNA sequencing data from the same samples, we show that by using CNAViz, our user was able to obtain more accurate segmentation and improved accuracy in downstream copy-number calling.

Subjects

Breast Neoplasms; Neoplasms; Humans; Female; DNA Copy Number Variations/genetics; Neoplasms/genetics; Algorithms; Sequence Analysis, DNA; DNA, Neoplasm; Breast Neoplasms/genetics

11.

Accurate Identification of Transcription Regulatory Sequences and Genes in Coronaviruses.

Zhang, Chuanyi; Sashittal, Palash; Xiang, Michael; Zhang, Yichi; Kazi, Ayesha; El-Kebir, Mohammed.

Mol Biol Evol ; 39(7)2022 07 02.

Article in English | MEDLINE | ID: mdl-35700225

ABSTRACT

Transcription regulatory sequences (TRSs), which occur upstream of structural and accessory genes as well as the 5' end of a coronavirus genome, play a critical role in discontinuous transcription in coronaviruses. We introduce two problems collectively aimed at identifying these regulatory sequences as well as their associated genes. First, we formulate the TRS Identification problem of identifying TRS sites in a coronavirus genome sequence with prescribed gene locations. We introduce CORSID-A, an algorithm that solves this problem to optimality in polynomial time. We demonstrate that CORSID-A outperforms existing motif-based methods in identifying TRS sites in coronaviruses. Second, we demonstrate for the first time how TRS sites can be leveraged to identify gene locations in the coronavirus genome. To that end, we formulate the TRS and Gene Identification problem of simultaneously identifying TRS sites and gene locations in unannotated coronavirus genomes. We introduce CORSID to solve this problem, which includes a web-based visualization tool to explore the space of near-optimal solutions. We show that CORSID outperforms state-of-the-art gene finding methods in coronavirus genomes. Furthermore, we demonstrate that CORSID enables de novo identification of TRS sites and genes in previously unannotated coronavirus genomes. CORSID is the first method to perform accurate and simultaneous identification of TRS sites and genes in coronavirus genomes without the use of any prior information.

Subjects

Coronavirus Infections; Coronavirus; Coronavirus/genetics; Coronavirus Infections/genetics; Humans; RNA, Messenger/genetics; RNA, Viral/genetics; Transcription, Genetic

12.

Parsimonious Clone Tree Integration in cancer.

Sashittal, Palash; Zaccaria, Simone; El-Kebir, Mohammed.

Algorithms Mol Biol ; 17(1): 3, 2022 Mar 14.

Article in English | MEDLINE | ID: mdl-35282838

ABSTRACT

BACKGROUND: Every tumor is composed of heterogeneous clones, each corresponding to a distinct subpopulation of cells that accumulated different types of somatic mutations, ranging from single-nucleotide variants (SNVs) to copy-number aberrations (CNAs). As the analysis of this intra-tumor heterogeneity has important clinical applications, several computational methods have been introduced to identify clones from DNA sequencing data. However, due to technological and methodological limitations, current analyses are restricted to identifying tumor clones only based on either SNVs or CNAs, preventing a comprehensive characterization of a tumor's clonal composition. RESULTS: To overcome these challenges, we formulate the identification of clones in terms of both SNVs and CNAs as an integration problem while accounting for uncertainty in the input SNV and CNA proportions. We thus characterize the computational complexity of this problem and we introduce PACTION (PArsimonious Clone Tree integratION), an algorithm that solves the problem using a mixed integer linear programming formulation. On simulated data, we show that tumor clones can be identified reliably, especially when further taking into account the ancestral relationships that can be inferred from the input SNVs and CNAs. On 49 tumor samples from 10 prostate cancer patients, our integration approach provides a higher resolution view of tumor evolution than previous studies. CONCLUSION: PACTION is an accurate and fast method that reconstructs clonal architecture of cancer tumors by integrating SNV and CNA clones inferred using existing methods.

13.

Design of SARS-CoV-2 Variant-Specific PCR Assays Considering Regional and Temporal Characteristics.

Oh, Chamteut; Sashittal, Palash; Zhou, Aijia; Wang, Leyi; El-Kebir, Mohammed; Nguyen, Thanh H.

Appl Environ Microbiol ; 88(7): e0228921, 2022 04 12.

Article in English | MEDLINE | ID: mdl-35285246

ABSTRACT

Monitoring the prevalence of SARS-CoV-2 variants is necessary to make informed public health decisions during the COVID-19 pandemic. PCR assays have received global attention, facilitating a rapid understanding of variant dynamics because they are more accessible and scalable than genome sequencing. However, as PCR assays target only a few mutations, their accuracy could be reduced when these mutations are not exclusive to the target variants. Here we introduce PRIMES, an algorithm that evaluates the sensitivity and specificity of SARS-CoV-2 variant-specific PCR assays across different geographical regions by incorporating sequences deposited in the GISAID database. Using PRIMES, we determined that the accuracy of several PCR assays decreased when applied beyond the geographic scope of the study in which the assays were developed. Subsequently, we used this tool to design Alpha and Delta variant-specific PCR assays for samples from Illinois, USA. In silico analysis using PRIMES determined the sensitivity/specificity in Illinois to be 0.99/0.99 for the Alpha variant-specific PCR assay and 0.98/1.00 for the Delta variant-specific PCR assay. We applied these two variant-specific PCR assays to six local sewage samples and determined the dominant SARS-CoV-2 variant of either the wild type, the Alpha variant, or the Delta variant. Using next-generation sequencing (NGS) of the spike (S) gene amplicons of the Delta variant-dominant samples, we found six mutations exclusive to the Delta variant (S:T19R, S:Δ156/157, S:L452R, S:T478K, S:P681R, and S:D950N). The consistency between the variant-specific PCR assays and the NGS results supports the applicability of PRIMES. IMPORTANCE: Monitoring the introduction and prevalence of variants of concern (VOCs) and variants of interest (VOIs) in a community can help the local authorities make informed public health decisions. PCR assays can be designed to keep track of SARS-CoV-2 variants by measuring unique mutation markers that are exclusive to the target variants. However, the mutation markers may not be exclusive to the target variants because of regional and temporal differences in variant dynamics. We introduce PRIMES, an algorithm that enables the design of reliable PCR assays for variant detection. Because PCR is more accessible, scalable, and robust for sewage samples than sequencing technology, our findings will contribute to improving global SARS-CoV-2 variant surveillance.
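The sensitivity and specificity figures quoted above follow the standard definitions: sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). The Python sketch below shows how such values could be computed from per-sequence calls. It is not the PRIMES implementation; the inputs are invented, and the function name and the boolean encoding (assay predicted positive, sequence truly belongs to the target variant) are assumptions.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    predicted: Sequence[bool], actual: Sequence[bool]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for paired predicted/actual labels.

    Assumes at least one actual positive and one actual negative,
    otherwise a denominator would be zero.
    """
    tp = sum(p and a for p, a in zip(predicted, actual))
    fn = sum((not p) and a for p, a in zip(predicted, actual))
    tn = sum((not p) and (not a) for p, a in zip(predicted, actual))
    fp = sum(p and (not a) for p, a in zip(predicted, actual))
    return tp / (tp + fn), tn / (tn + fp)


# Invented example: eight sequences, four of which belong to the target variant.
actual = [True, True, True, True, False, False, False, False]
predicted = [True, True, True, False, False, False, False, True]

sens, spec = sensitivity_specificity(predicted, actual)
print(sens, spec)
```

Restricting the input sequences to one region or time window before calling the function mirrors the abstract's point that an assay's accuracy depends on where and when it is applied.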

Subjects

COVID-19; SARS-CoV-2; COVID-19/diagnosis; COVID-19/epidemiology; Humans; Mutation; Pandemics; Polymerase Chain Reaction; SARS-CoV-2/genetics; Sewage

14.

Emerging Topics in Cancer Evolution.

El-Kebir, Mohammed; Morris, Quaid; Oesper, Layla; Sahinalp, S Cenk.

Pac Symp Biocomput ; 27: 397-401, 2022.

Article in English | MEDLINE | ID: mdl-34890166

ABSTRACT

Cancer results from an evolutionary process that yields a heterogeneous tumor with distinct subpopulations and varying sets of somatic mutations. This perspective discusses computational methods to infer models of evolutionary processes in cancer that aim to improve our understanding of tumorigenesis and ultimately enhance current clinical practice.

Subjects

Computational Biology; Neoplasms; Humans; Mutation; Neoplasms/genetics

15.

Jumper enables discontinuous transcript assembly in coronaviruses.

Sashittal, Palash; Zhang, Chuanyi; Peng, Jian; El-Kebir, Mohammed.

Nat Commun ; 12(1): 6728, 2021 11 18.

Article in English | MEDLINE | ID: mdl-34795232

ABSTRACT

Genes in SARS-CoV-2 and other viruses in the order Nidovirales are expressed by a process of discontinuous transcription, which is distinct from alternative splicing in eukaryotes and is mediated by the viral RNA-dependent RNA polymerase. Here, we introduce the DISCONTINUOUS TRANSCRIPT ASSEMBLY problem of finding transcripts and their abundances given an alignment of paired-end short reads under a maximum likelihood model that accounts for varying transcript lengths. We show, using simulations, that our method, JUMPER, outperforms existing methods for classical transcript assembly. On short-read data of SARS-CoV-1, SARS-CoV-2 and MERS-CoV samples, we find that JUMPER not only identifies canonical transcripts that are part of the reference transcriptome, but also predicts expression of non-canonical transcripts that are supported by subsequent orthogonal analyses. Moreover, application of JUMPER on samples with and without treatment reveals viral drug response at the transcript level. As such, JUMPER enables detailed analyses of Nidovirales transcriptomes under varying conditions.

Subjects

COVID-19/genetics; SARS-CoV-2/genetics; Alternative Splicing/genetics; Gene Expression Profiling; Humans; Transcriptome/genetics

16.

DeCiFering the elusive cancer cell fraction in tumor heterogeneity and evolution.

Satas, Gryte; Zaccaria, Simone; El-Kebir, Mohammed; Raphael, Benjamin J.

Cell Syst ; 12(10): 1004-1018.e10, 2021 10 20.

Article in English | MEDLINE | ID: mdl-34416171

ABSTRACT

The cancer cell fraction (CCF), or proportion of cancerous cells in a tumor containing a single-nucleotide variant (SNV), is a fundamental statistic used to quantify tumor heterogeneity and evolution. Existing CCF estimation methods from bulk DNA sequencing data assume that every cell with an SNV contains the same number of copies of the SNV. This assumption is unrealistic in tumors with copy-number aberrations that alter SNV multiplicities. Furthermore, the CCF does not account for SNV losses due to copy-number aberrations, confounding downstream phylogenetic analyses. We introduce DeCiFer, an algorithm that overcomes these limitations by clustering SNVs using a novel statistic, the descendant cell fraction (DCF). The DCF quantifies both the prevalence of an SNV at the present time and its past evolutionary history using an evolutionary model that allows mutation losses. We show that DeCiFer yields more parsimonious reconstructions of tumor evolution than previously reported for 49 prostate cancer samples.

Subjects

Neoplasms; Polymorphism, Single Nucleotide; Algorithms; Humans; Male; Neoplasms/genetics; Neoplasms/pathology; Phylogeny; Polymorphism, Single Nucleotide/genetics; Sequence Analysis, DNA

17.

doubletD: detecting doublets in single-cell DNA sequencing data.

Weber, Leah L; Sashittal, Palash; El-Kebir, Mohammed.

Bioinformatics ; 37(Suppl_1): i214-i221, 2021 07 12.

Article in English | MEDLINE | ID: mdl-34252961

ABSTRACT

MOTIVATION: While single-cell DNA sequencing (scDNA-seq) has enabled the study of intratumor heterogeneity at an unprecedented resolution, current technologies are error-prone and often result in doublets where two or more cells are mistaken for a single cell. Not only do doublets confound downstream analyses, but the increase in doublet rate is also a major bottleneck preventing higher throughput with current single-cell technologies. Although doublet detection and removal are standard practice in scRNA-seq data analysis, options for scDNA-seq data are limited. Current methods attempt to detect doublets while also performing complex downstream analysis tasks, leading to decreased efficiency and/or performance. RESULTS: We present doubletD, the first standalone method for detecting doublets in scDNA-seq data. Underlying our method is a simple maximum likelihood approach with a closed-form solution. We demonstrate the performance of doubletD on simulated data as well as real datasets, outperforming current methods for downstream analysis of scDNA-seq data that jointly infer doublets as well as standalone approaches for doublet detection in scRNA-seq data. Incorporating doubletD in scDNA-seq analysis pipelines will reduce complexity and lead to more accurate results. AVAILABILITY AND IMPLEMENTATION: https://github.com/elkebir-group/doubletD. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Single-Cell Analysis; Software; Gene Expression Profiling; Likelihood Functions; Sequence Analysis, DNA; Sequence Analysis, RNA

18.

Distinguishing linear and branched evolution given single-cell DNA sequencing data of tumors.

Weber, Leah L; El-Kebir, Mohammed.

Algorithms Mol Biol ; 16(1): 14, 2021 Jul 06.

Article in English | MEDLINE | ID: mdl-34229713

ABSTRACT

BACKGROUND: Cancer arises from an evolutionary process where somatic mutations give rise to clonal expansions. Reconstructing this evolutionary process is useful for treatment decision-making as well as understanding evolutionary patterns across patients and cancer types. In particular, classifying a tumor's evolutionary process as either linear or branched and understanding what cancer types and which patients have each of these trajectories could provide useful insights for both clinicians and researchers. While comprehensive cancer phylogeny inference from single-cell DNA sequencing data is challenging due to limitations with current sequencing technology and the complexity of the resulting problem, current data might provide sufficient signal to accurately classify a tumor's evolutionary history as either linear or branched. RESULTS: We introduce the Linear Perfect Phylogeny Flipping (LPPF) problem as a means of testing two alternative hypotheses for the pattern of evolution, which we prove to be NP-hard. We develop Phyolin, which uses constraint programming to solve the LPPF problem. Through both in silico experiments and real data application, we demonstrate the performance of our method, outperforming a competing machine learning approach. CONCLUSION: Phyolin is an accurate, easy to use and fast method for classifying an evolutionary trajectory as linear or branched given a tumor's single-cell DNA sequencing data.
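As a rough illustration of the linear versus branched distinction, the Python sketch below checks an error-free binary cell-by-mutation matrix. It relies on the assumption that, under a perfect phylogeny, evolution is linear exactly when the sets of cells carrying each mutation are nested, so that the mutations form a single chain. This is not Phyolin and does not solve the LPPF problem, which additionally allows entries of a noisy matrix to be flipped; the function name and the toy matrices are hypothetical.

```python
from typing import FrozenSet, List


def is_linear(matrix: List[List[int]]) -> bool:
    """True if the mutation columns' cell sets are totally ordered by inclusion."""
    n_mut = len(matrix[0]) if matrix else 0
    cols: List[FrozenSet[int]] = [
        frozenset(i for i, row in enumerate(matrix) if row[j]) for j in range(n_mut)
    ]
    # Sort from the most to the least widespread mutation, then check nesting.
    cols.sort(key=len, reverse=True)
    return all(cols[k + 1] <= cols[k] for k in range(len(cols) - 1))


# Rows are cells, columns are mutations (1 = mutation present).
linear = [[1, 0, 0], [1, 1, 0], [1, 1, 1]]
branched = [[1, 1, 0], [1, 0, 1]]

print(is_linear(linear), is_linear(branched))
```

In the second matrix, mutations 2 and 3 occur in disjoint sets of cells, which cannot be placed on a single chain and therefore indicates branching.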

19.

Transcriptional Profiling of Porcine HCC Xenografts Provides Insights Into Tumor Cell Microenvironment Signaling.

Patel, Shovik S; Sandur, Amitha; El-Kebir, Mohammed; Gaba, Ron C; Schook, Lawrence B; Schachtschneider, Kyle M.

Front Genet ; 12: 657330, 2021.

Article in English | MEDLINE | ID: mdl-33995488

ABSTRACT

Hepatocellular carcinoma (HCC) is the second leading cause of cancer-related death worldwide, representing the most common form of liver cancer. As HCC incidence and mortality continue to increase, there is a growing need for improved translational animal models to bridge the gap between basic HCC research and clinical practice to improve early detection and treatment strategies for this deadly disease. Recently, the Oncopig cancer model, a novel transgenic swine model that recapitulates human cancer through Cre recombinase-induced expression of KRAS G12D and TP53 R167H driver mutations, has been validated as a large animal translational model for human HCC. Due to the similarities in size, anatomy, physiology, immunology, genetics, and epigenetics between pigs and humans, the Oncopig has the potential to improve translation of novel diagnostic and therapeutic modalities into clinical practice. Recent studies have demonstrated the importance of tumor cells in shaping their surrounding microenvironment into one that is more proliferative, invasive, and metastatic; however, little is known about the impact of microenvironment signaling on HCC tumor biology and differential gene expression between HCC tumors and their tumor microenvironment (TME). In this study, transcriptional profiling was performed on Oncopig HCC xenograft tumors (n = 3) produced via subcutaneous injection of Oncopig HCC cells into severe combined immunodeficiency (SCID) mice. To differentiate between gene expression in the tumor and surrounding tumor microenvironment, RNA-seq reads originating from porcine (HCC tumor) and murine (microenvironment) cells were bioinformatically separated using Xenome. Principal component analysis (PCA) demonstrated clustering by group based on the expression of orthologous genes. Genes contributing to each principal component were extracted and subjected to functional analysis to identify alterations in pathway signaling between HCC cells and the microenvironment. Altered expression of genes associated with hepatic fibrosis deposition, immune response, and neoangiogenesis was observed. The results of this study provide insights into the interplay between HCC and microenvironment signaling in vivo, improving our understanding of how HCC tumor cells and the surrounding tumor microenvironment interact and of the impact on HCC development and progression.

20.

Moss enables high sensitivity single-nucleotide variant calling from multiple bulk DNA tumor samples.

Zhang, Chuanyi; El-Kebir, Mohammed; Ochoa, Idoia.

Nat Commun ; 12(1): 2204, 2021 04 13.

Article in English | MEDLINE | ID: mdl-33850139

ABSTRACT

Intra-tumor heterogeneity renders the identification of somatic single-nucleotide variants (SNVs) a challenging problem. In particular, low-frequency SNVs are hard to distinguish from sequencing artifacts. While the increasing availability of multi-sample tumor DNA sequencing data holds the potential for more accurate variant calling, there is a lack of high-sensitivity multi-sample SNV callers that utilize these data. Here we report Moss, a method to identify low-frequency SNVs that recur in multiple sequencing samples from the same tumor. Moss provides any existing single-sample SNV caller the ability to support multiple samples with little additional time overhead. We demonstrate that Moss improves recall while maintaining high precision in a simulated dataset. On multi-sample hepatocellular carcinoma, acute myeloid leukemia and colorectal cancer datasets, Moss identifies new low-frequency variants that meet manual review criteria and are consistent with the tumor's mutational signature profile. In addition, Moss detects the presence of variants in more samples of the same tumor than reported by the single-sample caller. Moss' improved sensitivity in SNV calling will enable more detailed downstream analyses in cancer genomics.

Subjects

Neoplasm DNA/genetics , Liver Neoplasms/genetics , Nucleotides , Algorithms , Hepatocellular Carcinoma , Colorectal Neoplasms/genetics , Gene Frequency , Genomics/methods , Humans , Acute Myeloid Leukemia/genetics , Mutation , Single Nucleotide Polymorphism
