Results 1 - 20 of 47
1.
J Comput Biol ; 31(5): 396-415, 2024 05.
Article in English | MEDLINE | ID: mdl-38754138

ABSTRACT

In addition to undergoing evolution, members of biological populations may also migrate between locations. Examples include the spread of tumor cells from the primary tumor to distant metastases or the spread of pathogens from one host to another. One may represent migration histories by assigning a location label to each vertex of a given phylogenetic tree such that an edge connecting vertices with distinct locations represents a migration. Some biological populations undergo comigration, a phenomenon where multiple taxa from distinct lineages simultaneously comigrate from one location to another. In this work, we show that a previous problem statement for inferring migration histories that are parsimonious in terms of migrations and comigrations may lead to temporally inconsistent solutions. To remedy this deficiency, we introduce precise definitions of temporal consistency of comigrations in a phylogenetic tree, leading to three successive problems. First, we formulate the temporally consistent comigration problem to check if a set of comigrations is temporally consistent and provide a linear time algorithm for solving this problem. Second, we formulate the parsimonious consistent comigrations (PCC) problem, which aims to find comigrations given a location labeling of a phylogenetic tree. We show that PCC is NP-hard. Third, we formulate the parsimonious consistent comigration history (PCCH) problem, which infers the migration history given a phylogenetic tree and locations of its extant vertices only. We show that PCCH is NP-hard as well. On the positive side, we propose integer linear programming models to solve the PCC and PCCH problems. We demonstrate our algorithms on simulated and real data.
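
The labeling idea in this abstract (an edge whose endpoints have distinct locations is a migration) can be illustrated with a minimal Python sketch. The tree encoding (a child-to-parent dictionary), the function name `count_migrations`, and the toy labels are all hypothetical and are not taken from the paper; comigrations and temporal consistency are not modeled here.

```python
def count_migrations(parent, location):
    """Count tree edges whose two endpoints carry different location labels.

    parent:   dict mapping each vertex to its parent (None for the root).
    location: dict mapping each vertex to its location label.
    """
    return sum(
        1
        for child, par in parent.items()
        if par is not None and location[child] != location[par]
    )


# Hypothetical toy tree: root with children a and b; a has children c and d.
parent = {"root": None, "a": "root", "b": "root", "c": "a", "d": "a"}
location = {
    "root": "primary",
    "a": "primary",
    "b": "lung",
    "c": "liver",
    "d": "primary",
}

migrations = count_migrations(parent, location)
print(migrations)
```

In this toy labeling only the edges root-b and a-c connect distinct locations, so they are the edges counted as migrations.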


Subject(s)
Animal Migration , Cell Movement , Biological Models , Human Migration , Humans , Animals , Algorithms , Time Factors
2.
J Comput Biol ; 31(3): 179-196, 2024 03.
Article in English | MEDLINE | ID: mdl-38416637

ABSTRACT

The design of an RNA sequence v that encodes an input target protein sequence w is a crucial aspect of messenger RNA (mRNA) vaccine development. There are an exponential number of possible RNA sequences for a single target protein due to codon degeneracy. These potential RNA sequences can assume various secondary structure conformations, each with distinct minimum free energy (MFE), impacting thermodynamic stability and mRNA half-life. Furthermore, the presence of species-specific codon usage bias, quantified by the codon adaptation index (CAI), plays a vital role in translation efficiency. While earlier studies focused on optimizing either MFE or CAI, recent research has underscored the advantages of simultaneously optimizing both objectives. However, optimizing one objective comes at the expense of the other. In this work, we present the Pareto Optimal RNA Design problem, aiming to identify the set of Pareto optimal solutions for which no alternative solutions exist that exhibit better MFE and CAI values. Our algorithm DEsign RNA (DERNA) uses the weighted sum method to enumerate the Pareto front by optimizing convex combinations of both objectives. We use dynamic programming to solve each convex combination in O(|w|³) time and O(|w|²) space. Compared with CDSfold, a previous approach that only optimizes MFE, we show on a benchmark data set that DERNA obtains solutions with identical MFE but superior CAI. Moreover, we show that DERNA matches the performance in terms of solution quality of LinearDesign, a recent approach that similarly seeks to balance MFE and CAI. We conclude by demonstrating our method's potential for mRNA vaccine design for the SARS-CoV-2 spike protein.
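
To make the notions of Pareto optimality and the weighted sum method concrete, here is a small Python sketch over an explicit list of candidates. It is not DERNA's dynamic program: the candidate tuples, the function names, and the exact form of the scalarized objective (lam * MFE - (1 - lam) * CAI) are assumptions chosen for illustration, with lower MFE and higher CAI treated as better.

```python
def pareto_front(candidates):
    """Return names of candidates that no other candidate dominates.

    Each candidate is (name, mfe, cai); lower MFE and higher CAI are better.
    """
    front = []
    for name, mfe, cai in candidates:
        dominated = any(
            (m <= mfe and c >= cai) and (m < mfe or c > cai)
            for _, m, c in candidates
        )
        if not dominated:
            front.append(name)
    return front


def weighted_sum_best(candidates, lam):
    """Pick the candidate minimizing lam * MFE - (1 - lam) * CAI, 0 <= lam <= 1."""
    return min(candidates, key=lambda x: lam * x[1] - (1 - lam) * x[2])[0]


# Hypothetical (name, MFE in kcal/mol, CAI) values, not taken from the paper.
candidates = [
    ("s1", -30.0, 0.70),
    ("s2", -25.0, 0.90),
    ("s3", -20.0, 0.80),  # dominated by s2 (worse MFE and worse CAI)
    ("s4", -10.0, 0.95),
]

front = pareto_front(candidates)
best_mfe_only = weighted_sum_best(candidates, 1.0)
best_cai_only = weighted_sum_best(candidates, 0.0)
print(front, best_mfe_only, best_cai_only)
```

Sweeping `lam` between 0 and 1 visits different points of the front; in general a weighted sum can only reach Pareto optimal points on the convex hull of the objective space.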


Subject(s)
Algorithms , RNA , Coronavirus Spike Glycoprotein , Humans , RNA/chemistry , Messenger RNA , Codon
3.
J Comput Biol ; 31(1): 58-70, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38010616

ABSTRACT

Due to uncertainty in tumor phylogeny inference from sequencing data, many methods infer multiple, equally plausible phylogenies for the same cancer. To summarize the solution space T of tumor phylogenies, consensus tree methods seek a single best representative tree S under a specified pairwise tree distance function. One such distance function is the ancestor-descendant (AD) distance [Formula: see text] , which equals the size of the symmetric difference of the transitive closures of the edge sets [Formula: see text] and [Formula: see text] . Here, we show that finding a consensus tree S for tumor phylogenies T that minimizes the total AD distance [Formula: see text] is NP-hard.
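
The AD distance described in this abstract (size of the symmetric difference of the transitive closures of two edge sets) can be sketched in a few lines of Python. The edge encoding as (parent, child) pairs, the function names, and the two toy trees are assumptions made for illustration only.

```python
def ancestor_descendant_pairs(edges):
    """Transitive closure of a rooted tree's edges as (ancestor, descendant) pairs.

    edges: iterable of (parent, child) pairs describing a rooted tree.
    """
    parent = {child: par for par, child in edges}
    pairs = set()
    for child in parent:
        node = child
        # Walk up to the root, recording every ancestor of `child`.
        while node in parent:
            node = parent[node]
            pairs.add((node, child))
    return pairs


def ad_distance(edges_a, edges_b):
    """Size of the symmetric difference of the two ancestor-descendant sets."""
    return len(ancestor_descendant_pairs(edges_a) ^ ancestor_descendant_pairs(edges_b))


# Two hypothetical trees on the same vertices r, x, y, z.
tree_a = [("r", "x"), ("x", "y"), ("x", "z")]
tree_b = [("r", "x"), ("r", "y"), ("y", "z")]

distance = ad_distance(tree_a, tree_b)
print(distance)
```

Here the pairs (x, y) and (x, z) occur only in the first tree and (y, z) only in the second, which is what the distance counts.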


Subject(s)
Algorithms , Neoplasms , Humans , Consensus , Phylogeny , Uncertainty
4.
bioRxiv ; 2023 Nov 27.
Article in English | MEDLINE | ID: mdl-38076836

ABSTRACT

B cells are a critical component of the adaptive immune system, responsible for producing antibodies that help protect the body from infections and foreign substances. Single cell RNA-sequencing (scRNA-seq) has allowed for both profiling of B cell receptor (BCR) sequences and gene expression. However, understanding the adaptive and evolutionary mechanisms of B cells in response to specific stimuli remains a significant challenge in the field of immunology. We introduce a new method, TRIBAL, which aims to infer the evolutionary history of clonally related B cells from scRNA-seq data. The key insight of TRIBAL is that inclusion of isotype data into the B cell lineage inference problem is valuable for reducing phylogenetic uncertainty that arises when only considering the receptor sequences. Consequently, the TRIBAL inferred B cell lineage trees jointly capture the somatic mutations introduced to the B cell receptor during affinity maturation and isotype transitions during class switch recombination. In addition, TRIBAL infers isotype transition probabilities that are valuable for gaining insight into the dynamics of class switching. Via in silico experiments, we demonstrate that TRIBAL infers isotype transition probabilities with the ability to distinguish between direct versus sequential switching in a B cell population. This results in more accurate B cell lineage trees and corresponding ancestral sequence and class switch reconstruction compared to competing methods. Using real-world scRNA-seq datasets, we show that TRIBAL recapitulates expected biological trends in a model affinity maturation system. Furthermore, the B cell lineage trees inferred by TRIBAL were equally plausible for the BCR sequences as those inferred by competing methods but yielded lower entropic partitions for the isotypes of the sequenced B cells. Thus, our method holds the potential to further advance our understanding of vaccine responses, disease progression, and the identification of therapeutic antibodies.

5.
PLoS Comput Biol ; 19(10): e1011544, 2023 10.
Article in English | MEDLINE | ID: mdl-37819942

ABSTRACT

Emerging ultra-low coverage single-cell DNA sequencing (scDNA-seq) technologies have enabled high resolution evolutionary studies of copy number aberrations (CNAs) within tumors. While these sequencing technologies are well suited for identifying CNAs due to the uniformity of sequencing coverage, the sparsity of coverage poses challenges for the study of single-nucleotide variants (SNVs). In order to maximize the utility of increasingly available ultra-low coverage scDNA-seq data and obtain a comprehensive understanding of tumor evolution, it is important to also analyze the evolution of SNVs from the same set of tumor cells. We present Phertilizer, a method to infer a clonal tree from ultra-low coverage scDNA-seq data of a tumor. Based on a probabilistic model, our method recursively partitions the data by identifying key evolutionary events in the history of the tumor. We demonstrate the performance of Phertilizer on simulated data as well as on two real datasets, finding that Phertilizer effectively utilizes the copy-number signal inherent in the data to more accurately uncover clonal structure and genotypes compared to previous methods.


Subject(s)
Neoplasms , Trees , Humans , DNA Copy Number Variations/genetics , Neoplasms/genetics , DNA Sequence Analysis , High-Throughput Nucleotide Sequencing/methods , Single-Cell Analysis
6.
Genome Res ; 33(7): 1078-1088, 2023 07.
Article in English | MEDLINE | ID: mdl-37344104

ABSTRACT

Cancer results from an evolutionary process that typically yields multiple clones with varying sets of mutations within the same tumor. Accurately modeling this process is key to understanding and predicting cancer evolution. Here, we introduce clone to mutation (CloMu), a flexible and low-parameter tree generative model of cancer evolution. CloMu uses a two-layer neural network trained via reinforcement learning to determine the probability of new mutations based on the existing mutations on a clone. CloMu supports several prediction tasks, including the determination of evolutionary trajectories, tree selection, causality and interchangeability between mutations, and mutation fitness. Importantly, previous methods support only some of these tasks, and many suffer from overfitting on data sets with a large number of mutations. Using simulations, we show that CloMu either matches or outperforms current methods on a wide variety of prediction tasks. In particular, for simulated data with interchangeable mutations, current methods are unable to uncover causal relationships as effectively as CloMu. On breast cancer and leukemia cohorts, we show that CloMu determines similarities and causal relationships between mutations as well as the fitness of mutations. We validate CloMu's inferred mutation fitness values for the leukemia cohort by comparing them to clonal proportion data not used during training, showing high concordance. In summary, CloMu's low-parameter model facilitates a wide range of prediction tasks regarding cancer evolution on increasingly available cohort-level data sets.


Subject(s)
Leukemia , Neoplasms , Humans , Neoplasms/genetics , Mutation , Clonal Evolution/genetics , Computer Neural Networks
7.
J Comput Aided Mol Des ; 37(8): 357-371, 2023 08.
Article in English | MEDLINE | ID: mdl-37310542

ABSTRACT

An Online tool for Fragment-based Molecule Parametrization (OFraMP) is described. OFraMP is a web application for assigning atomic interaction parameters to large molecules by matching sub-fragments within the target molecule to equivalent sub-fragments within the Automated Topology Builder (ATB, atb.uq.edu.au) database. OFraMP identifies and compares alternative molecular fragments from the ATB database, which contains over 890,000 pre-parameterized molecules, using a novel hierarchical matching procedure. Atoms are considered within the context of an extended local environment (buffer region) with the degree of similarity between an atom in the target molecule and that in the proposed match controlled by varying the size of the buffer region. Adjacent matching atoms are combined into progressively larger matched sub-structures. The user then selects the most appropriate match. OFraMP also allows users to manually alter interaction parameters and automates the submission of missing substructures to the ATB in order to generate parameters for atoms in environments not represented in the existing database. The utility of OFraMP is illustrated using the anti-cancer agent paclitaxel and a dendrimer used in organic semiconductor devices. OFraMP applied to paclitaxel (ATB ID 35922).


Subject(s)
Software , Factual Databases
8.
Materials (Basel) ; 16(7)2023 Mar 23.
Article in English | MEDLINE | ID: mdl-37048859

ABSTRACT

The main objective of this study was to create a mathematical tool that could be used with experimental data to predict the rheological flow behavior of functionalized xanthan gum according to the types of chemical groups grafted onto its backbone. Different rheological and physicochemical analyses were applied to assess six derivatives synthesized via the etherification of xanthan gum by hydrophobic benzylation with benzyl chloride and carboxymethylation with monochloroacetic acid at three reagent/polymer ratios R equal to 2, 4, and 6. Results from the FTIR study verified that xanthan gum had been modified. The degree of substitution (DS) values, varying between 0.2 and 2.9 for carboxymethylxanthan gum derivatives, were found to be higher than those of hydrophobically modified benzyl xanthan gum, for which the DS ranged from 0.5 to 1. The molecular weights of both types of derivatives were found to be lower than that of xanthan gum, decreasing further as the DS increased. However, the benzyl xanthan gum derivatives presented higher molecular weights, varying between 1,373,146 g/mol and 1,262,227 g/mol, than the carboxymethylxanthan gum derivatives (1,326,722-1,015,544 g/mol). A shear-thinning behavior was observed in the derivatives, and the derivatives' viscosity was found to decrease with increasing DS. The second objective of this research was to create an ANN model to predict one of the rheological properties (the apparent viscosity). The significance of the ANN model (R² = 0.99998 and MSE = 5.95 × 10⁻³) was validated by comparing experimental results with the predicted ones. The results showed that the model was an efficient tool for predicting rheological flow behavior.
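
For readers unfamiliar with the two fit metrics quoted for the ANN model, the following Python sketch computes the coefficient of determination and the mean squared error from measured and predicted values. The numbers are made up for illustration and are unrelated to the study's viscosity data; the function names are likewise hypothetical.

```python
def mse(y_true, y_pred):
    """Mean squared error between measured and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical measured vs. predicted values (not data from the study).
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]

print(mse(measured, predicted), r_squared(measured, predicted))
```

An R² close to 1 together with a small MSE, as reported in the abstract, indicates that predictions track the measurements closely on the data used for the comparison.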

9.
Micromachines (Basel) ; 14(3)2023 Mar 06.
Article in English | MEDLINE | ID: mdl-36985017

ABSTRACT

This work aimed to formulate xanthan gum microspheres for the encapsulation of metformin hydrochloride, according to the process of ionotropic gelation. The obtained microparticles, based on various fractions of xanthan gum (0.5-1.25), were subjected to different physico-chemical tests and a drug release study. Microspheres with an average size varying between 110.96 µm and 208.27 µm were obtained. Encapsulation efficiency reached 93.11% at a 1.25% biopolymer concentration. The swelling study showed a swelling rate reaching 29.8% in the gastric medium (pH 1.2) and 360% in the intestinal medium (pH 6.8). The drug release studies showed complete metformin hydrochloride release from the beads, especially those prepared from xanthan gum at the concentration of 1.25%, in intestinal medium at 90.00% after 6 h. However, limited and insignificant drug release was observed within the gastric medium (32.50%). The dissolution profiles showed sustained release kinetics.

11.
PLoS Comput Biol ; 18(10): e1010614, 2022 10.
Article in English | MEDLINE | ID: mdl-36228003

ABSTRACT

Copy-number aberrations (CNAs) are genetic alterations that amplify or delete the number of copies of large genomic segments. Although they are ubiquitous in cancer and, thus, a critical area of current cancer research, CNA identification from DNA sequencing data is challenging because it requires partitioning of the genome into complex segments with the same copy-number states that may not be contiguous. Existing segmentation algorithms address these challenges either by leveraging the local information among neighboring genomic regions, or by globally grouping genomic regions that are affected by similar CNAs across the entire genome. However, both approaches have limitations: overclustering in the case of local segmentation, or the omission of clusters corresponding to focal CNAs in the case of global segmentation. Importantly, inaccurate segmentation will lead to inaccurate identification of CNAs. For this reason, most pan-cancer research studies rely on manual procedures of quality control and anomaly correction. To improve copy-number segmentation, we introduce CNAViz, a web-based tool that enables the user to simultaneously perform local and global segmentation, thus overcoming the limitations of each approach. Using simulated data, we demonstrate that by several metrics, CNAViz allows the user to obtain more accurate segmentation relative to existing local and global segmentation methods. Moreover, we analyze six bulk DNA sequencing samples from three breast cancer patients. By validating with parallel single-cell DNA sequencing data from the same samples, we show that by using CNAViz, our user was able to obtain more accurate segmentation and improved accuracy in downstream copy-number calling.


Subject(s)
Breast Neoplasms , Neoplasms , Humans , Female , DNA Copy Number Variations/genetics , Neoplasms/genetics , Algorithms , DNA Sequence Analysis , Neoplasm DNA , Breast Neoplasms/genetics
12.
Mol Biol Evol ; 39(7)2022 07 02.
Article in English | MEDLINE | ID: mdl-35700225

ABSTRACT

Transcription regulatory sequences (TRSs), which occur upstream of structural and accessory genes as well as the 5' end of a coronavirus genome, play a critical role in discontinuous transcription in coronaviruses. We introduce two problems collectively aimed at identifying these regulatory sequences as well as their associated genes. First, we formulate the TRS Identification problem of identifying TRS sites in a coronavirus genome sequence with prescribed gene locations. We introduce CORSID-A, an algorithm that solves this problem to optimality in polynomial time. We demonstrate that CORSID-A outperforms existing motif-based methods in identifying TRS sites in coronaviruses. Second, we demonstrate for the first time how TRS sites can be leveraged to identify gene locations in the coronavirus genome. To that end, we formulate the TRS and Gene Identification problem of simultaneously identifying TRS sites and gene locations in unannotated coronavirus genomes. We introduce CORSID to solve this problem, which includes a web-based visualization tool to explore the space of near-optimal solutions. We show that CORSID outperforms state-of-the-art gene finding methods in coronavirus genomes. Furthermore, we demonstrate that CORSID enables de novo identification of TRS sites and genes in previously unannotated coronavirus genomes. CORSID is the first method to perform accurate and simultaneous identification of TRS sites and genes in coronavirus genomes without the use of any prior information.


Subject(s)
Coronavirus Infections , Coronavirus , Coronavirus/genetics , Coronavirus Infections/genetics , Humans , Messenger RNA/genetics , Viral RNA/genetics , Genetic Transcription
13.
Algorithms Mol Biol ; 17(1): 3, 2022 Mar 14.
Article in English | MEDLINE | ID: mdl-35282838

ABSTRACT

BACKGROUND: Every tumor is composed of heterogeneous clones, each corresponding to a distinct subpopulation of cells that accumulated different types of somatic mutations, ranging from single-nucleotide variants (SNVs) to copy-number aberrations (CNAs). As the analysis of this intra-tumor heterogeneity has important clinical applications, several computational methods have been introduced to identify clones from DNA sequencing data. However, due to technological and methodological limitations, current analyses are restricted to identifying tumor clones only based on either SNVs or CNAs, preventing a comprehensive characterization of a tumor's clonal composition. RESULTS: To overcome these challenges, we formulate the identification of clones in terms of both SNVs and CNAs as an integration problem while accounting for uncertainty in the input SNV and CNA proportions. We thus characterize the computational complexity of this problem and we introduce PACTION (PArsimonious Clone Tree integratION), an algorithm that solves the problem using a mixed integer linear programming formulation. On simulated data, we show that tumor clones can be identified reliably, especially when further taking into account the ancestral relationships that can be inferred from the input SNVs and CNAs. On 49 tumor samples from 10 prostate cancer patients, our integration approach provides a higher resolution view of tumor evolution than previous studies. CONCLUSION: PACTION is an accurate and fast method that reconstructs clonal architecture of cancer tumors by integrating SNV and CNA clones inferred using existing methods.

14.
Appl Environ Microbiol ; 88(7): e0228921, 2022 04 12.
Article in English | MEDLINE | ID: mdl-35285246

ABSTRACT

Monitoring the prevalence of SARS-CoV-2 variants is necessary to make informed public health decisions during the COVID-19 pandemic. PCR assays have received global attention, facilitating a rapid understanding of variant dynamics because they are more accessible and scalable than genome sequencing. However, as PCR assays target only a few mutations, their accuracy could be reduced when these mutations are not exclusive to the target variants. Here we introduce PRIMES, an algorithm that evaluates the sensitivity and specificity of SARS-CoV-2 variant-specific PCR assays across different geographical regions by incorporating sequences deposited in the GISAID database. Using PRIMES, we determined that the accuracy of several PCR assays decreased when applied beyond the geographic scope of the study in which the assays were developed. Subsequently, we used this tool to design Alpha and Delta variant-specific PCR assays for samples from Illinois, USA. In silico analysis using PRIMES determined the sensitivity/specificity to be 0.99/0.99 for the Alpha variant-specific PCR assay and 0.98/1.00 for the Delta variant-specific PCR assay in Illinois, respectively. We applied these two variant-specific PCR assays to six local sewage samples and determined the dominant SARS-CoV-2 variant of either the wild type, the Alpha variant, or the Delta variant. Using next-generation sequencing (NGS) of the spike (S) gene amplicons of the Delta variant-dominant samples, we found six mutations exclusive to the Delta variant (S:T19R, S:Δ156/157, S:L452R, S:T478K, S:P681R, and S:D950N). The consistency between the variant-specific PCR assays and the NGS results supports the applicability of PRIMES. IMPORTANCE Monitoring the introduction and prevalence of variants of concern (VOCs) and variants of interest (VOIs) in a community can help the local authorities make informed public health decisions. PCR assays can be designed to keep track of SARS-CoV-2 variants by measuring unique mutation markers that are exclusive to the target variants. However, the mutation markers may not be exclusive to the target variants because of regional and temporal differences in variant dynamics. We introduce PRIMES, an algorithm that enables the design of reliable PCR assays for variant detection. Because PCR is more accessible, scalable, and robust for sewage samples than sequencing technology, our findings will contribute to improving global SARS-CoV-2 variant surveillance.
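
As a reminder of what the sensitivity/specificity figures measure, the sketch below computes both from per-sequence truth labels and predicted assay outcomes. This is only the standard definition applied to made-up booleans; it is not the PRIMES algorithm, and the function name and inputs are hypothetical.

```python
def sensitivity_specificity(is_target, assay_positive):
    """Return (sensitivity, specificity) for an assay over a set of sequences.

    is_target:      list of bools, True if the sequence belongs to the target variant.
    assay_positive: list of bools, True if the assay is predicted to detect the sequence.
    """
    pairs = list(zip(is_target, assay_positive))
    tp = sum(1 for t, p in pairs if t and p)          # target, detected
    fn = sum(1 for t, p in pairs if t and not p)      # target, missed
    tn = sum(1 for t, p in pairs if not t and not p)  # non-target, not detected
    fp = sum(1 for t, p in pairs if not t and p)      # non-target, detected
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical labels for eight sequences (not GISAID data).
is_target = [True, True, True, False, False, False, False, False]
assay_positive = [True, True, False, True, False, False, False, False]

sens, spec = sensitivity_specificity(is_target, assay_positive)
print(sens, spec)
```

Restricting the input sequences to one region, as the abstract describes for Illinois, would change both values whenever the marker mutations are not exclusive to the target variant in that region.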


Subject(s)
COVID-19 , SARS-CoV-2 , COVID-19/diagnosis , COVID-19/epidemiology , Humans , Mutation , Pandemics , Polymerase Chain Reaction , SARS-CoV-2/genetics , Sewage
15.
Pac Symp Biocomput ; 27: 397-401, 2022.
Article in English | MEDLINE | ID: mdl-34890166

ABSTRACT

Cancer results from an evolutionary process that yields a heterogeneous tumor with distinct subpopulations and varying sets of somatic mutations. This perspective discusses computational methods to infer models of evolutionary processes in cancer that aim to improve our understanding of tumorigenesis and ultimately enhance current clinical practice.


Subject(s)
Computational Biology , Neoplasms , Humans , Mutation , Neoplasms/genetics
16.
Environ Sci Pollut Res Int ; 29(8): 12237-12248, 2022 Feb.
Article in English | MEDLINE | ID: mdl-34562219

ABSTRACT

Inorganic pollutants in surface waters are identified through quality assessment systems. The most harmful pollutants, including pesticides, persistent organic pollutants, pharmaceuticals, personal care products, and heavy metals, remain dangerous to the environment because of their widespread use. Chromate has the highest concentration among the metals found in industrial wastewater. This work evaluates the application of the spinel p-CoAl2O4 as a photocatalyst prepared by the nitrate synthesis process to reduce Cr(VI), a hazardous metal for the environment. The photocatalyst was characterized using thermal analysis (TG), X-ray diffraction, UV-diffuse reflectance spectroscopy, scanning electron microscopy, X-ray fluorescence, Fourier transform infrared spectroscopy, electrical conductivity, and photoelectrochemistry. The results showed that the photoreduction of Cr(VI) to Cr(III) is more efficient at pH = 3.6 (77%) than at higher pH values up to 8 (7%). Moreover, the effect of the hetero-system CoAl2O4/ZnO on photocatalytic efficiency was investigated. The photocatalytic activity increases up to 99% with a total catalyst dosage of 1 g L⁻¹ of the hetero-system CoAl2O4/ZnO at a ratio of 75%/25%. This result is better than that obtained with CoAl2O4 or ZnO alone. The improvement in Cr(VI) photoreduction activity was attributed to better separation of the photogenerated electron-hole pairs on the CoAl2O4/ZnO surfaces. Finally, the Lagergren pseudo-first-order and the Langmuir-Hinshelwood models fit the experimental kinetics well.


Subject(s)
Zinc Oxide , Aluminum Oxide , Catalysis , Chromium , Cobalt , Magnesium Oxide , Fourier Transform Infrared Spectroscopy
17.
Nat Commun ; 12(1): 6728, 2021 11 18.
Article in English | MEDLINE | ID: mdl-34795232

ABSTRACT

Genes in SARS-CoV-2 and other viruses in the order Nidovirales are expressed by a process of discontinuous transcription, which is distinct from alternative splicing in eukaryotes and is mediated by the viral RNA-dependent RNA polymerase. Here, we introduce the DISCONTINUOUS TRANSCRIPT ASSEMBLY problem of finding transcripts and their abundances given an alignment of paired-end short reads under a maximum likelihood model that accounts for varying transcript lengths. We show, using simulations, that our method, JUMPER, outperforms existing methods for classical transcript assembly. On short-read data of SARS-CoV-1, SARS-CoV-2 and MERS-CoV samples, we find that JUMPER not only identifies canonical transcripts that are part of the reference transcriptome, but also predicts expression of non-canonical transcripts that are supported by subsequent orthogonal analyses. Moreover, application of JUMPER on samples with and without treatment reveals viral drug response at the transcript level. As such, JUMPER enables detailed analyses of Nidovirales transcriptomes under varying conditions.


Subject(s)
COVID-19/genetics , SARS-CoV-2/genetics , Alternative Splicing/genetics , Gene Expression Profiling , Humans , Transcriptome/genetics
18.
Cell Syst ; 12(10): 1004-1018.e10, 2021 10 20.
Article in English | MEDLINE | ID: mdl-34416171

ABSTRACT

The cancer cell fraction (CCF), or proportion of cancerous cells in a tumor containing a single-nucleotide variant (SNV), is a fundamental statistic used to quantify tumor heterogeneity and evolution. Existing CCF estimation methods from bulk DNA sequencing data assume that every cell with an SNV contains the same number of copies of the SNV. This assumption is unrealistic in tumors with copy-number aberrations that alter SNV multiplicities. Furthermore, the CCF does not account for SNV losses due to copy-number aberrations, confounding downstream phylogenetic analyses. We introduce DeCiFer, an algorithm that overcomes these limitations by clustering SNVs using a novel statistic, the descendant cell fraction (DCF). The DCF quantifies both the prevalence of an SNV at the present time and its past evolutionary history using an evolutionary model that allows mutation losses. We show that DeCiFer yields more parsimonious reconstructions of tumor evolution than previously reported for 49 prostate cancer samples.


Subject(s)
Neoplasms , Single Nucleotide Polymorphism , Algorithms , Humans , Male , Neoplasms/genetics , Neoplasms/pathology , Phylogeny , Single Nucleotide Polymorphism/genetics , DNA Sequence Analysis
19.
Algorithms Mol Biol ; 16(1): 14, 2021 Jul 06.
Article in English | MEDLINE | ID: mdl-34229713

ABSTRACT

BACKGROUND: Cancer arises from an evolutionary process where somatic mutations give rise to clonal expansions. Reconstructing this evolutionary process is useful for treatment decision-making as well as understanding evolutionary patterns across patients and cancer types. In particular, classifying a tumor's evolutionary process as either linear or branched and understanding what cancer types and which patients have each of these trajectories could provide useful insights for both clinicians and researchers. While comprehensive cancer phylogeny inference from single-cell DNA sequencing data is challenging due to limitations with current sequencing technology and the complexity of the resulting problem, current data might provide sufficient signal to accurately classify a tumor's evolutionary history as either linear or branched. RESULTS: We introduce the Linear Perfect Phylogeny Flipping (LPPF) problem as a means of testing two alternative hypotheses for the pattern of evolution, which we prove to be NP-hard. We develop Phyolin, which uses constraint programming to solve the LPPF problem. Through both in silico experiments and real data application, we demonstrate the performance of our method, outperforming a competing machine learning approach. CONCLUSION: Phyolin is an accurate, easy to use and fast method for classifying an evolutionary trajectory as linear or branched given a tumor's single-cell DNA sequencing data.
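
The distinction between linear and branched evolution can be illustrated on an idealized, error-free binary matrix: in a linear history all mutations lie on a single path, so the sets of cells carrying each mutation are nested. The Python sketch below tests only that nesting condition. It is not Phyolin and does not handle the sequencing errors that the LPPF problem addresses by flipping entries; the matrix layout and function name are assumptions.

```python
def is_linear(matrix):
    """Check whether an error-free cell-by-mutation matrix fits a linear phylogeny.

    matrix[i][j] == 1 if cell i carries mutation j. The history is linear
    exactly when the cell sets of the mutations form a chain under inclusion.
    """
    n_mut = len(matrix[0])
    cell_sets = [
        frozenset(i for i, row in enumerate(matrix) if row[j])
        for j in range(n_mut)
    ]
    # Sort from largest to smallest set; a chain needs each set to contain the next.
    ordered = sorted(cell_sets, key=len, reverse=True)
    return all(small <= big for big, small in zip(ordered, ordered[1:]))


# Hypothetical matrices: rows are cells, columns are mutations.
linear_example = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
]
branched_example = [
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 0],
]

print(is_linear(linear_example), is_linear(branched_example))
```

In the second matrix, mutations 1 and 2 occur in disjoint sets of cells, so they must sit on different branches.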

20.
Bioinformatics ; 37(Suppl_1): i214-i221, 2021 07 12.
Article in English | MEDLINE | ID: mdl-34252961

ABSTRACT

MOTIVATION: While single-cell DNA sequencing (scDNA-seq) has enabled the study of intratumor heterogeneity at an unprecedented resolution, current technologies are error-prone and often result in doublets where two or more cells are mistaken for a single cell. Not only do doublets confound downstream analyses, but the increase in doublet rate is also a major bottleneck preventing higher throughput with current single-cell technologies. Although doublet detection and removal are standard practice in scRNA-seq data analysis, options for scDNA-seq data are limited. Current methods attempt to detect doublets while also performing complex downstream analysis tasks, leading to decreased efficiency and/or performance. RESULTS: We present doubletD, the first standalone method for detecting doublets in scDNA-seq data. Underlying our method is a simple maximum likelihood approach with a closed-form solution. We demonstrate the performance of doubletD on simulated data as well as real datasets, outperforming current methods for downstream analysis of scDNA-seq data that jointly infer doublets as well as standalone approaches for doublet detection in scRNA-seq data. Incorporating doubletD in scDNA-seq analysis pipelines will reduce complexity and lead to more accurate results. AVAILABILITY AND IMPLEMENTATION: https://github.com/elkebir-group/doubletD. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Single-Cell Analysis , Software , Gene Expression Profiling , Likelihood Functions , DNA Sequence Analysis , RNA Sequence Analysis