Results 1-8 of 8
1.
Methods ; 126: 18-28, 2017 Aug 15.
Article in English | MEDLINE | ID: mdl-28651966

ABSTRACT

RNA-binding proteins recognize RNA sequences and structures, but there is currently no systematic and accurate method to derive large (>12-base) motifs de novo that reflect a combination of intrinsic preference to both sequence and structure. To address this absence, we introduce RNAcompete-S, which couples a single-step competitive binding reaction with an excess of random RNA 40-mers to a custom computational pipeline for interrogation of the bound RNA sequences and derivation of SSMs (Sequence and Structure Models). RNAcompete-S confirms that HuR, QKI, and SRSF1 prefer binding sites that are single stranded, and recapitulates known 8-10 bp sequence and structure preferences for Vts1p and RBMY. We also derive an 18-base long SSM for Drosophila SLBP, which to our knowledge has not been previously determined by selections from pure random sequence, and accurately discriminates human replication-dependent histone mRNAs. Thus, RNAcompete-S enables accurate identification of large, intrinsic sequence-structure specificities with a uniform assay.
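RNAcompete-S derives SSMs that model sequence and structure jointly, and the abstract does not detail its pipeline. As a much simpler illustration of the underlying idea (comparing what the protein pulled down against the input pool), the sketch below computes a per-k-mer log2 enrichment of bound reads over background reads. It ignores RNA structure entirely; the function names and the pseudocount are assumptions for illustration, not part of the published method.

```python
from collections import Counter
import math


def kmer_counts(seqs, k):
    """Count every overlapping k-mer across a list of sequences."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def kmer_enrichment(bound, background, k, pseudocount=1.0):
    """Log2 ratio of each k-mer's frequency in bound vs. background reads.

    A pseudocount (an illustrative assumption) keeps k-mers that are absent
    from one pool from producing infinite scores.
    """
    b = kmer_counts(bound, k)
    g = kmer_counts(background, k)
    kmers = set(b) | set(g)
    b_total = sum(b.values()) + pseudocount * len(kmers)
    g_total = sum(g.values()) + pseudocount * len(kmers)
    return {
        km: math.log2(((b[km] + pseudocount) / b_total)
                      / ((g[km] + pseudocount) / g_total))
        for km in kmers
    }


scores = kmer_enrichment(["UUUAUUU", "AUUUAUU"], ["GCGCGCG", "ACGUACG"], k=3)
```

Positive scores mark k-mers over-represented among bound reads; a real motif-finding pipeline would also have to account for RNA secondary structure, which is what distinguishes SSMs from plain k-mer models.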


Subject(s)
Base Sequence/genetics , High-Throughput Nucleotide Sequencing/methods , RNA-Binding Proteins/genetics , Humans , RNA-Binding Proteins/chemistry , Sequence Analysis, RNA/methods
2.
BMC Bioinformatics ; 15: 35, 2014 Feb 01.
Article in English | MEDLINE | ID: mdl-24484323

ABSTRACT

BACKGROUND: High-throughput sequencing allows the detection and quantification of frequencies of somatic single nucleotide variants (SNV) in heterogeneous tumor cell populations. In some cases, the evolutionary history and population frequency of the subclonal lineages of tumor cells present in the sample can be reconstructed from these SNV frequency measurements. But automated methods to do this reconstruction are not available and the conditions under which reconstruction is possible have not been described. RESULTS: We describe the conditions under which the evolutionary history can be uniquely reconstructed from SNV frequencies from single or multiple samples from the tumor population and we introduce a new statistical model, PhyloSub, that infers the phylogeny and genotype of the major subclonal lineages represented in the population of cancer cells. It uses a Bayesian nonparametric prior over trees that groups SNVs into major subclonal lineages and automatically estimates the number of lineages and their ancestry. We sample from the joint posterior distribution over trees to identify evolutionary histories and cell population frequencies that have the highest probability of generating the observed SNV frequency data. When multiple phylogenies are consistent with a given set of SNV frequencies, PhyloSub represents the uncertainty in the tumor phylogeny using a "partial order plot". Experiments on a simulated dataset and two real datasets comprising tumor samples from acute myeloid leukemia and chronic lymphocytic leukemia patients demonstrate that PhyloSub can infer both linear (or chain) and branching lineages and its inferences are in good agreement with ground truth, where it is available. CONCLUSIONS: PhyloSub can be applied to frequencies of any "binary" somatic mutation, including SNVs as well as small insertions and deletions. The PhyloSub and partial order plot software is available from https://github.com/morrislab/phylosub/.
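The abstract does not state PhyloSub's reconstruction conditions explicitly. A constraint widely used in subclonal reconstruction under the infinite sites assumption is the "sum rule": a lineage's cellular frequency must be at least the sum of its children's frequencies. The sketch below checks a candidate tree against that rule as an illustration only; `satisfies_sum_rule` is a hypothetical helper, and the inputs are assumed to already be cellular frequencies (for a heterozygous SNV in a diploid region, roughly twice the variant allele frequency).

```python
def satisfies_sum_rule(parent, freq, tol=1e-9):
    """Check that every node's frequency is >= the sum of its children's.

    parent: dict mapping node -> parent node (the root maps to None)
    freq:   dict mapping node -> cellular frequency in [0, 1]
    tol:    slack for floating-point rounding
    """
    children_sum = {node: 0.0 for node in freq}
    for node, par in parent.items():
        if par is not None:
            children_sum[par] += freq[node]
    return all(freq[n] + tol >= children_sum[n] for n in freq)


freqs = {"A": 0.9, "B": 0.5, "C": 0.3}
branching = {"A": None, "B": "A", "C": "A"}  # B and C are siblings under A
chain = {"A": None, "B": "A", "C": "B"}      # A -> B -> C
```

With these frequencies both the branching tree and the chain pass the check, which is the kind of ambiguity that a partial order plot is meant to summarize.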


Subject(s)
Clonal Evolution/genetics , Computational Biology/methods , Neoplasms/genetics , Polymorphism, Single Nucleotide/genetics , Algorithms , Bayes Theorem , Cytological Techniques , Evolution, Molecular , Genotype , Humans , Mutation , Neoplasms/classification , Phylogeny , Software
3.
Neural Comput ; 24(9): 2473-507, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22594829

ABSTRACT

The role of inhibition is investigated in a multiclass support vector machine formalism inspired by the brain structure of insects. The so-called mushroom bodies have a set of output neurons, or classification functions, that compete with each other to encode a particular input. Strongly active output neurons depress or inhibit the remaining outputs without knowing which is correct or incorrect. Accordingly, we propose to use a classification function that embodies unselective inhibition and train it in the large margin classifier framework. Inhibition leads to more robust classifiers in the sense that they perform better on larger areas of appropriate hyperparameters when assessed with leave-one-out strategies. We also show that the classifier with inhibition is a tight bound to probabilistic exponential models and is Bayes consistent for 3-class problems. These properties make this approach useful for data sets with a limited number of labeled examples. For larger data sets, there is no significant comparative advantage to other multiclass SVM approaches.


Subject(s)
Algorithms , Artificial Intelligence , Inhibition, Psychological , Pattern Recognition, Automated , Animals , Brain/anatomy & histology , Classification/methods , Humans , Models, Statistical , Stochastic Processes , Synapses/physiology
4.
Am J Med ; 130(5): 601.e17-601.e22, 2017 May.
Article in English | MEDLINE | ID: mdl-28065773

ABSTRACT

BACKGROUND: A small proportion of patients account for a high proportion of healthcare use. Accurate preemptive identification may facilitate tailored intervention. We sought to determine whether machine learning techniques using text from a family practice electronic medical record can be used to predict future high emergency department use and total costs by patients who are not yet high emergency department users or high cost to the healthcare system. METHODS: Text from fields of the cumulative patient profile within an electronic medical record of 43,111 patients was indexed. Separate training and validation cohorts were created. After processing, 11,905 words were used to fit a logistic regression model. The primary outcomes of interest in the 12 months after prediction were 3 or more emergency department visits and being in the top 5% in healthcare expenditures. Outcomes were assessed through linkage to administrative databases housed at the Institute for Clinical Evaluative Sciences. RESULTS: In the model to predict frequent emergency department visits, after excluding patients who were high emergency department users in the previous year, the area under the receiver operating characteristic curve was 0.71. By using the same methodology, the model to predict the top 5% in total system costs had an area under the receiver operating characteristic curve of 0.76. CONCLUSIONS: Machine learning techniques can be applied to analyze free text contained in electronic medical records. This dataset is more predictive of patients who will generate future high costs than future emergency department visits. It remains to be seen whether these predictions can be used to reduce costs by early interventions in this cohort of patients.
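The results are reported as areas under the ROC curve (0.71 and 0.76). AUC has a direct reading: the probability that a randomly chosen patient who had the outcome receives a higher predicted score than a randomly chosen patient who did not, with ties counted as one half. The sketch below computes it that way (the Mann-Whitney formulation); `roc_auc` is a hypothetical helper, not code from the study.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney pairwise comparison.

    scores: predicted risk for each patient
    labels: 1 if the patient had the outcome, 0 otherwise
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


auc = roc_auc([0.9, 0.8, 0.4, 0.3, 0.2], [1, 0, 1, 0, 0])
```

The pairwise loop is quadratic and meant only to make the definition concrete; for tens of thousands of patients a rank-based computation is preferable.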


Subject(s)
Data Mining , Electronic Health Records , Emergency Service, Hospital/economics , Emergency Service, Hospital/statistics & numerical data , Hospital Costs , Logistic Models , Algorithms , Humans , ROC Curve
5.
Pac Symp Biocomput ; : 20-31, 2015.
Article in English | MEDLINE | ID: mdl-25592565

ABSTRACT

Statistical machine learning methods, especially nonparametric Bayesian methods, have become increasingly popular to infer clonal population structure of tumors. Here we describe the treeCRP, an extension of the Chinese restaurant process (CRP), a popular construction used in nonparametric mixture models, to infer the phylogeny and genotype of major subclonal lineages represented in the population of cancer cells. We also propose new split-merge updates tailored to the subclonal reconstruction problem that improve the mixing time of Markov chains. In comparisons with the tree-structured stick-breaking (TSSB) prior used in PhyloSub, we demonstrate superior mixing and running time using the treeCRP with our new split-merge procedures. We also show that given the same number of samples, TSSB and treeCRP have similar ability to recover the subclonal structure of a tumor…
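For readers unfamiliar with the construction the treeCRP extends, the sketch below samples from the ordinary (flat) Chinese restaurant process: each new customer joins an existing table with probability proportional to its occupancy, or opens a new table with probability proportional to the concentration parameter `alpha`. It does not implement the treeCRP or the split-merge updates; `sample_crp` is a hypothetical name.

```python
import random


def sample_crp(n, alpha, rng=None):
    """Draw table assignments for n customers from a Chinese restaurant process.

    Returns (assignments, table_sizes); tables are numbered in order of creation.
    """
    if rng is None:
        rng = random.Random()
    assignments = []
    table_sizes = []
    for _ in range(n):
        # Existing tables weighted by occupancy; the last slot is a new table.
        weights = table_sizes + [alpha]
        table = rng.choices(range(len(weights)), weights=weights)[0]
        if table == len(table_sizes):
            table_sizes.append(1)      # open a new table (new cluster)
        else:
            table_sizes[table] += 1    # join an existing table
        assignments.append(table)
    return assignments, table_sizes


assignments, sizes = sample_crp(50, 1.0, random.Random(0))
```

Larger `alpha` values produce more tables on average, which is how the prior controls the number of clusters (here, subclonal lineages) without fixing it in advance.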


Subject(s)
Neoplasms/pathology , Algorithms , Bayes Theorem , Computational Biology , Computer Simulation , Gene Frequency , Genotype , Humans , Leukemia, Lymphocytic, Chronic, B-Cell/genetics , Leukemia, Lymphocytic, Chronic, B-Cell/pathology , Likelihood Functions , Machine Learning , Models, Biological , Models, Statistical , Mutation , Neoplasms/genetics , Neoplastic Stem Cells/pathology , Phylogeny , Statistics, Nonparametric
6.
Genome Biol ; 16: 35, 2015 Feb 13.
Article in English | MEDLINE | ID: mdl-25786235

ABSTRACT

Tumors often contain multiple subpopulations of cancerous cells defined by distinct somatic mutations. We describe a new method, PhyloWGS, which can be applied to whole-genome sequencing data from one or more tumor samples to reconstruct complete genotypes of these subpopulations based on variant allele frequencies (VAFs) of point mutations and population frequencies of structural variations. We introduce a principled phylogenic correction for VAFs in loci affected by copy number alterations and we show that this correction greatly improves subclonal reconstruction compared to existing methods. PhyloWGS is free, open-source software, available at https://github.com/morrislab/phylowgs.
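To see why copy-number alterations distort VAFs, consider a deliberately simplified model (an assumption for illustration, not PhyloWGS's actual correction, which also accounts for how SNVs and CNAs are ordered in the phylogeny): a fraction `phi` of cells carry the SNV, those cells have total copy number `c_t` at the locus with `m` mutant copies, and all other cells are diploid. Then the expected VAF is phi·m / (phi·c_t + 2(1 − phi)), and the expression can be inverted to recover `phi`. Both helper names are hypothetical.

```python
def expected_vaf(phi, c_t, m):
    """Expected variant allele frequency under the simplified model.

    phi: fraction of cells carrying the SNV
    c_t: total copy number of the locus in those cells
    m:   number of mutant copies in those cells
    All other cells are assumed diploid (2 copies, none mutant).
    """
    return phi * m / (phi * c_t + 2.0 * (1.0 - phi))


def prevalence_from_vaf(vaf, c_t, m):
    """Invert expected_vaf for phi under the same assumptions."""
    return 2.0 * vaf / (m - vaf * (c_t - 2.0))


diploid = expected_vaf(0.6, c_t=2, m=1)      # phi / 2
amplified = expected_vaf(0.6, c_t=4, m=2)
```

For `phi = 0.6` the expected VAF is 0.3 in a diploid region but 0.375 when the mutated cells carry four copies, two of them mutant; the naive estimate 2 × VAF = 0.75 would then overstate the prevalence, while inverting the full expression recovers 0.6.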


Subject(s)
Genome, Human , High-Throughput Nucleotide Sequencing/methods , Neoplasms/genetics , Phylogeny , Algorithms , Clone Cells , Cluster Analysis , Computer Simulation , DNA Copy Number Variations , Gene Frequency , Genetic Heterogeneity , Humans , Mutation , Reference Standards
7.
Pac Symp Biocomput ; : 388-99, 2014.
Article in English | MEDLINE | ID: mdl-24297564

ABSTRACT

Label propagation methods are extremely well-suited for a variety of biomedical prediction tasks based on network data. However, these algorithms cannot be used to integrate feature-based data sources with networks. We propose an efficient learning algorithm to integrate these two types of heterogeneous data sources to perform binary prediction tasks on node features (e.g., gene prioritization, disease gene prediction). Our method, LMGraph, consists of two steps. In the first step, we extract a small set of "network features" from the nodes of networks that represent connectivity with labeled nodes in the prediction tasks. In the second step, we apply a simple weighting scheme in conjunction with linear classifiers to combine these network features with other feature data. This two-step procedure allows us to (i) learn highly scalable and computationally efficient linear classifiers and (ii) seamlessly combine feature-based data sources with networks. Our method is much faster than label propagation, which is already known to be computationally efficient on large-scale prediction problems. Experiments on multiple functional interaction networks from three species (mouse, fly, C. elegans) with tens of thousands of nodes and hundreds of binary prediction tasks demonstrate the efficacy of our method.
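The abstract describes the network features only as connectivity with labeled nodes and does not specify the weighting scheme. The sketch below assumes one plausible reading: for each node, its edge weight to positive and to negative training nodes, normalized by its total edge weight, concatenated with other per-node features after scaling by a single weight. `network_features`, `combine_features`, and the scalar `w_net` are hypothetical, and the scaling is a placeholder rather than LMGraph's actual weighting.

```python
def network_features(adj, positives, negatives):
    """Per-node connectivity to labeled nodes.

    adj: dict node -> dict neighbor -> edge weight (undirected graph)
    positives, negatives: sets of labeled training nodes
    Returns dict node -> (fraction of edge weight to positives,
                          fraction of edge weight to negatives).
    """
    feats = {}
    for node, nbrs in adj.items():
        total = sum(nbrs.values()) or 1.0  # avoid division by zero for isolated nodes
        # Self-loops are skipped so a node's own label never leaks into its features.
        to_pos = sum(w for nb, w in nbrs.items() if nb in positives and nb != node)
        to_neg = sum(w for nb, w in nbrs.items() if nb in negatives and nb != node)
        feats[node] = (to_pos / total, to_neg / total)
    return feats


def combine_features(net_feats, other_feats, w_net=1.0):
    """Concatenate scaled network features with other per-node features."""
    return {n: [w_net * x for x in net_feats[n]] + list(other_feats[n])
            for n in net_feats}


adj = {"a": {"b": 1.0, "c": 1.0}, "b": {"a": 1.0}, "c": {"a": 1.0}}
feats = network_features(adj, positives={"b"}, negatives={"c"})
combined = combine_features(feats, {"a": [3.0], "b": [1.0], "c": [2.0]}, w_net=2.0)
```

Because each node ends up with a short fixed-length vector, the combined features can be passed to any standard linear classifier, which is where the scalability argued in the abstract comes from.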


Subject(s)
Algorithms , Gene Regulatory Networks , Animals , Artificial Intelligence , Caenorhabditis elegans/genetics , Computational Biology , Data Mining/statistics & numerical data , Databases, Genetic/statistics & numerical data , Drosophila melanogaster/genetics , Gene Ontology/statistics & numerical data , Mice , Models, Genetic
8.
Proc Int Conf Mach Learn ; 2012: 703-710, 2012.
Article in English | MEDLINE | ID: mdl-25285328

ABSTRACT

In many real-world applications of machine learning classifiers, it is essential to predict the probability of an example belonging to a particular class. This paper proposes a simple technique for predicting probabilities based on optimizing a ranking loss, followed by isotonic regression. This semi-parametric technique offers both good ranking and regression performance, and models a richer set of probability distributions than statistical workhorses such as logistic regression. We provide experimental results that show the effectiveness of this technique on real-world applications of probability prediction.
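The second step described here, isotonic regression, fits a non-decreasing map from ranking scores to observed labels. A minimal sketch of the standard pool-adjacent-violators (PAV) algorithm follows, assuming the ranking model has already scored the examples and their 0/1 labels have been sorted by ascending score; the ranking-loss step itself is not shown, and `isotonic_regression` is a hypothetical name.

```python
def isotonic_regression(y, w=None):
    """Pool Adjacent Violators: weighted least-squares non-decreasing fit to y.

    y: labels (e.g. 0/1) ordered by ascending ranking score
    w: optional per-example weights (default 1.0 each)
    """
    if w is None:
        w = [1.0] * len(y)
    # Each block is [weighted mean, total weight, number of points].
    blocks = []
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / wt, wt, n1 + n2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


probs = isotonic_regression([0, 1, 0, 0, 1, 1])
```

The fitted values form a step function of the ranking score; a new example's probability can be read off at the position of its score among the sorted training scores.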
