
NestedBD: Bayesian inference of phylogenetic trees from single-cell copy number profiles under a birth-death model.

Liu, Yushu; Edrisi, Mohammadamin; Yan, Zhi; Ogilvie, Huw A; Nakhleh, Luay.

Algorithms Mol Biol ; 19(1): 18, 2024 Apr 29.

Article in English | MEDLINE | ID: mdl-38685065

ABSTRACT

Copy number aberrations (CNAs) are ubiquitous in many types of cancer. Inferring CNAs from cancer genomic data could help shed light on the initiation, progression, and potential treatment of cancer. While such data have traditionally been available via "bulk sequencing," the more recently introduced techniques for single-cell DNA sequencing (scDNAseq) provide the type of data that makes CNA inference possible at single-cell resolution. We introduce a new birth-death evolutionary model of CNAs and a Bayesian method, NestedBD, for the inference of evolutionary trees (topologies and branch lengths with relative mutation rates) from single-cell data. We evaluated NestedBD's performance using simulated data sets, benchmarking its accuracy against traditional phylogenetic tools as well as state-of-the-art methods. The results show that NestedBD infers more accurate topologies and branch lengths, and that the birth-death model can improve the accuracy of copy number estimation. When applied to biological data sets, NestedBD infers plausible evolutionary histories of two colorectal cancer samples. NestedBD is available at https://github.com/Androstane/NestedBD.
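
The abstract above does not spell out the birth-death model, so the following is only a minimal sketch of the general idea, not NestedBD's implementation. It assumes a linear birth-death process on the copy number of a single genomic bin, in which each existing copy is gained at rate `birth_rate` and lost at rate `death_rate`, and a copy number of zero is absorbing. The function name and all parameters are hypothetical.

```python
import random


def simulate_copy_number(start, branch_length, birth_rate, death_rate, rng):
    """Simulate one bin's copy number along a branch (Gillespie algorithm).

    Assumption: a linear birth-death process, so the total event rate is
    proportional to the current number of copies. This is an illustration,
    not the model used by NestedBD.
    """
    copies, elapsed = start, 0.0
    while copies > 0:  # zero copies is absorbing: nothing left to duplicate
        total_rate = copies * (birth_rate + death_rate)
        if total_rate == 0:
            break  # no events can occur
        elapsed += rng.expovariate(total_rate)  # waiting time to next event
        if elapsed > branch_length:
            break  # next event would fall past the end of the branch
        # Choose a gain or a loss in proportion to the two rates.
        if rng.random() < birth_rate / (birth_rate + death_rate):
            copies += 1
        else:
            copies -= 1
    return copies


if __name__ == "__main__":
    rng = random.Random(42)
    # Evolve a diploid bin (2 copies) along ten independent branches.
    print([simulate_copy_number(2, 1.0, 0.5, 0.5, rng) for _ in range(10)])
```

In a Bayesian setting, a model of this kind would supply the probability of moving from one copy number to another along a branch; the simulation here only shows the kind of trajectories such a process generates.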

Accurate integration of single-cell DNA and RNA for analyzing intratumor heterogeneity using MaCroDNA.

Edrisi, Mohammadamin; Huang, Xiru; Ogilvie, Huw A; Nakhleh, Luay.

Nat Commun ; 14(1): 8262, 2023 Dec 13.

Article in English | MEDLINE | ID: mdl-38092737

ABSTRACT

Cancers develop and progress as mutations accumulate, and with the advent of single-cell DNA and RNA sequencing, researchers can observe these mutations and their transcriptomic effects and predict proteomic changes with remarkable temporal and spatial precision. However, to connect genomic mutations with their transcriptomic and proteomic consequences, cells with either only DNA data or only RNA data must be mapped to a common domain. For this purpose, we present MaCroDNA, a method that uses maximum weighted bipartite matching of per-gene read counts from single-cell DNA and RNA-seq data. Using ground truth information from colorectal cancer data, we demonstrate the advantage of MaCroDNA over existing methods in accuracy and speed. Exemplifying the utility of single-cell data integration in cancer research, we suggest, based on results derived using MaCroDNA, that genomic mutations of large effect size increasingly contribute to differential expression between cells as Barrett's esophagus progresses to esophageal cancer, reaffirming the findings of previous studies.

Subject(s)

Adenocarcinoma , Barrett Esophagus , Esophageal Neoplasms , Humans , Adenocarcinoma/genetics , RNA/genetics , Proteomics , Barrett Esophagus/genetics , Esophageal Neoplasms/pathology , DNA
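
The matching step named in the MaCroDNA abstract can be illustrated with a toy example. This is a sketch under stated assumptions, not the MaCroDNA implementation: the edge weight is taken to be the Pearson correlation between per-gene values of an RNA cell and a DNA cell (the abstract does not say how weights are defined), and the maximum-weight matching is found by exhaustive search, which is exact but only practical for a handful of cells. The names `pearson` and `match_cells` are hypothetical.

```python
from itertools import permutations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def match_cells(dna, rna):
    """Assign each RNA cell to a distinct DNA cell, maximizing total weight.

    Assumptions: weights are Pearson correlations, and len(rna) <= len(dna).
    Exhaustive search over assignments; a polynomial-time assignment solver
    would be needed for realistic numbers of cells.
    """
    weights = [[pearson(r, d) for d in dna] for r in rna]
    best = max(
        permutations(range(len(dna)), len(rna)),
        key=lambda assignment: sum(weights[i][j] for i, j in enumerate(assignment)),
    )
    return list(best)


if __name__ == "__main__":
    # Rows are cells, columns are genes (made-up values for illustration).
    dna = [[2, 2, 4, 2], [2, 3, 2, 1], [4, 2, 2, 2]]
    rna = [[40, 12, 11, 9], [10, 11, 25, 9], [9, 20, 10, 3]]
    print(match_cells(dna, rna))  # index of the DNA cell matched to each RNA cell
```

Each RNA cell is paired with exactly one DNA cell and no DNA cell is used twice, which is what distinguishes a matching from simply picking the best-correlated partner for each cell independently.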

Phylovar: toward scalable phylogeny-aware inference of single-nucleotide variations from single-cell DNA sequencing data.

Edrisi, Mohammadamin; Valecha, Monica V; Chowdary, Sunkara B V; Robledo, Sergio; Ogilvie, Huw A; Posada, David; Zafar, Hamim; Nakhleh, Luay.

Bioinformatics ; 38(Suppl 1): i195-i202, 2022 06 24.

Article in English | MEDLINE | ID: mdl-35758771

ABSTRACT

MOTIVATION: Single-nucleotide variants (SNVs) are the most common variations in the human genome. Recently developed methods for SNV detection from single-cell DNA sequencing data, such as SCIΦ and scVILP, leverage the evolutionary history of the cells to overcome the technical errors associated with single-cell sequencing protocols. Despite being accurate, these methods are not scalable to the extensive genomic breadth of single-cell whole-genome (scWGS) and whole-exome sequencing (scWES) data. RESULTS: Here, we report on a new scalable method, Phylovar, which extends the phylogeny-guided variant calling approach to sequencing datasets containing millions of loci. Through benchmarking on simulated datasets under different settings, we show that Phylovar outperforms SCIΦ in terms of running time while being more accurate than Monovar (which is not phylogeny-aware) in terms of SNV detection. Furthermore, we applied Phylovar to two real biological datasets: an scWES triple-negative breast cancer dataset consisting of 32 cells and 3375 loci, and an scWGS dataset of neuron cells from a normal human brain containing 16 cells and approximately 2.5 million loci. For the cancer data, Phylovar detected somatic SNVs with high or moderate functional impact that were also supported by a bulk sequencing dataset; for the neuron dataset, Phylovar identified 5745 SNVs with non-synonymous effects, some of which were associated with neurodegenerative diseases. AVAILABILITY AND IMPLEMENTATION: Phylovar is implemented in Python and is publicly available at https://github.com/NakhlehLab/Phylovar.

Subject(s)

High-Throughput Nucleotide Sequencing , Nucleotides , Genome, Human , High-Throughput Nucleotide Sequencing/methods , Humans , Phylogeny , Sequence Analysis, DNA

Current progress and open challenges for applying deep learning across the biosciences.

Sapoval, Nicolae; Aghazadeh, Amirali; Nute, Michael G; Antunes, Dinler A; Balaji, Advait; Baraniuk, Richard; Barberan, C J; Dannenfelser, Ruth; Dun, Chen; Edrisi, Mohammadamin; Elworth, R A Leo; Kille, Bryce; Kyrillidis, Anastasios; Nakhleh, Luay; Wolfe, Cameron R; Yan, Zhi; Yao, Vicky; Treangen, Todd J.

Nat Commun ; 13(1): 1728, 2022 04 01.

Article in English | MEDLINE | ID: mdl-35365602

ABSTRACT

Deep Learning (DL) has recently enabled unprecedented advances in one of the grand challenges in computational biology: the half-century-old problem of protein structure prediction. In this paper we discuss recent advances, limitations, and future perspectives of DL on five broad areas: protein structure prediction, protein function prediction, genome engineering, systems biology and data integration, and phylogenetic inference. We discuss each application area and cover the main bottlenecks of DL approaches, such as training data, problem scope, and the ability to leverage existing DL architectures in new contexts. To conclude, we provide a summary of the subject-specific and general challenges for DL across the biosciences.

Subject(s)

Deep Learning , Computational Biology , Phylogeny , Proteins , Systems Biology

Methods for copy number aberration detection from single-cell DNA-sequencing data.

Mallory, Xian F; Edrisi, Mohammadamin; Navin, Nicholas; Nakhleh, Luay.

Genome Biol ; 21(1): 208, 2020 08 17.

Article in English | MEDLINE | ID: mdl-32807205

ABSTRACT

Copy number aberrations (CNAs), which are pathogenic copy number variations (CNVs), play an important role in the initiation and progression of cancer. Single-cell DNA-sequencing (scDNAseq) technologies produce data that is ideal for inferring CNAs. In this review, we survey eight methods that have been developed for detecting CNAs in scDNAseq data, and categorize them according to the steps of a seven-step pipeline that they employ. Furthermore, we review models and methods for evolutionary analyses of CNAs from scDNAseq data and highlight advances and future research directions for computational methods for CNA detection from scDNAseq data.

Subject(s)

Base Sequence , Computational Biology/methods , DNA Copy Number Variations , Sequence Analysis, DNA/methods , Chromosome Aberrations , DNA , High-Throughput Nucleotide Sequencing , Humans , Neoplasms/genetics

Assessing the performance of methods for copy number aberration detection from single-cell DNA sequencing data.

Mallory, Xian F; Edrisi, Mohammadamin; Navin, Nicholas; Nakhleh, Luay.

PLoS Comput Biol ; 16(7): e1008012, 2020 07.

Article in English | MEDLINE | ID: mdl-32658894

ABSTRACT

Single-cell DNA sequencing technologies are enabling the study of mutations and their evolutionary trajectories in cancer. Somatic copy number aberrations (CNAs) have been implicated in the development and progression of various types of cancer. A wide array of methods for CNA detection has been either developed specifically for or adapted to single-cell DNA sequencing data. Understanding the strengths and limitations that are unique to each of these methods is very important for obtaining accurate copy number profiles from single-cell DNA sequencing data. We benchmarked three widely used methods (Ginkgo, HMMcopy, and CopyNumber) on simulated as well as real datasets. To facilitate this, we developed a novel simulator of single-cell genome evolution in the presence of CNAs. Furthermore, to assess performance on empirical data where the ground truth is unknown, we introduce a phylogeny-based measure for identifying potentially erroneous inferences. While single-cell DNA sequencing is very promising for elucidating and understanding CNAs, our findings show that even the best existing method does not exceed 80% accuracy. New methods that significantly improve upon the accuracy of these three methods are needed. Furthermore, with the large datasets being generated, the methods must be computationally efficient.

Subject(s)

DNA Copy Number Variations , Genome, Human , Sequence Analysis, DNA/methods , Single-Cell Analysis/methods , Algorithms , Chromosome Aberrations , Computational Biology , Computer Simulation , Gene Dosage , Humans , Mutation , Neoplasms/genetics , Ploidies , Poisson Distribution , ROC Curve , Reproducibility of Results , Software
