Results 1 - 14 of 14
1.
J Digit Imaging ; 23(5): 554-61, 2010 Oct.
Article in English | MEDLINE | ID: mdl-19760292

ABSTRACT

The purpose of our study is to identify and quantify the association between high breast mass density and breast malignancy using inductive logic programming (ILP) and conditional probabilities, and validate this association in an independent dataset. We ran our ILP algorithm on 62,219 mammographic abnormalities. We set the Aleph ILP system to generate 10,000 rules per malignant finding with a recall >5% and precision >25%. Aleph reported the best rule for each malignant finding. A total of 80 unique rules were learned. A radiologist reviewed all rules and identified potentially interesting rules. High breast mass density appeared in 24% of the learned rules. We confirmed each interesting rule by calculating the probability of malignancy given each mammographic descriptor. High mass density was the fifth highest ranked predictor. To validate the association between mass density and malignancy in an independent dataset, we collected data from 180 consecutive breast biopsies performed between 2005 and 2007. We created a logistic model with benign or malignant outcome as the dependent variable while controlling for potentially confounding factors. We calculated odds ratios based on dichotomized variables. In our logistic regression model, the independent predictors high breast mass density (OR 6.6, CI 2.5-17.6), irregular mass shape (OR 10.0, CI 3.4-29.5), spiculated mass margin (OR 20.4, CI 1.9-222.8), and subject age (β = 0.09, p < 0.0001) significantly predicted malignancy. Both ILP and conditional probabilities show that high breast mass density is an important adjunct predictor of malignancy, and this association is confirmed in an independent dataset of prospectively collected mammographic findings.
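
The two quantities this abstract relies on, the probability of malignancy given a descriptor and an odds ratio with its confidence interval, can be illustrated with a minimal Python sketch. It assumes a simple 2x2 table and a Woolf (log-scale) interval; the study's odds ratios come from a logistic model that adjusts for confounders, so this unadjusted version is only an approximation of the idea. The function names and the counts are hypothetical and are not taken from the study.

```python
import math


def conditional_probability(malignant_with, total_with):
    """P(malignant | descriptor present), estimated from raw counts."""
    return malignant_with / total_with


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Woolf (log-scale) confidence interval.

    a: descriptor present, malignant    b: descriptor present, benign
    c: descriptor absent, malignant     d: descriptor absent, benign
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) for a 2x2 table.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypothetical counts for illustration only (not data from the study).
p_malignant = conditional_probability(malignant_with=30, total_with=50)
odds_ratio, lower, upper = odds_ratio_ci(a=30, b=20, c=25, d=105)
```

A wide interval, such as the one reported for spiculated margin, typically reflects a small count in at least one cell of the table.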


Subjects
Algorithms , Breast Neoplasms/diagnostic imaging , Densitometry/methods , Computer-Assisted Radiographic Image Interpretation/methods , Biopsy , Breast/pathology , Breast Neoplasms/pathology , Female , Humans , Logistic Models , Mammography , Predictive Value of Tests , Prospective Studies , Registries
2.
Bioinformatics ; 23(21): 2851-8, 2007 Nov 01.
Article in English | MEDLINE | ID: mdl-17933855

ABSTRACT

MOTIVATION: One bottleneck in high-throughput protein crystallography is interpreting an electron-density map, that is, fitting a molecular model to the 3D picture crystallography produces. Previously, we developed ACMI (Automatic Crystallographic Map Interpreter), an algorithm that uses a probabilistic model to infer an accurate protein backbone layout. Here, we use a sampling method known as particle filtering to produce a set of all-atom protein models. We use the output of ACMI to guide the particle filter's sampling, producing an accurate, physically feasible set of structures. RESULTS: We test our algorithm on 10 poor-quality experimental density maps. We show that particle filtering produces accurate all-atom models, resulting in fewer chains, lower sidechain RMS error and reduced R factor, compared to simply placing the best-matching sidechains on ACMI's trace. We show that our approach produces a more accurate model than three leading methods--Textal, Resolve and ARP/WARP--in terms of main chain completeness, sidechain identification and crystallographic R factor. AVAILABILITY: Source code and experimental density maps available at http://ftp.cs.wisc.edu/machine-learning/shavlik-group/programs/acmi/
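
As a rough illustration of particle filtering in general, the following Python sketch runs a bootstrap particle filter on a one-dimensional random walk observed with Gaussian noise. It is not ACMI's sampler, which builds all-atom protein models guided by ACMI's output; the state model, noise levels, and function name here are assumptions chosen only to show the propagate, weight, and resample loop.

```python
import math
import random


def particle_filter(observations, n_particles=500, process_sd=1.0,
                    obs_sd=1.0, seed=0):
    """Bootstrap particle filter for a 1D random walk with noisy observations.

    Returns the weighted-mean state estimate after each observation.
    """
    rng = random.Random(seed)
    # Assumed broad prior over the initial state.
    particles = [rng.gauss(0.0, 5.0) for _ in range(n_particles)]
    estimates = []
    for y in observations:
        # Propagate: move each particle according to the process model.
        particles = [p + rng.gauss(0.0, process_sd) for p in particles]
        # Weight: Gaussian likelihood of the observation for each particle.
        weights = [math.exp(-0.5 * ((y - p) / obs_sd) ** 2) for p in particles]
        total = sum(weights)
        if total == 0.0:
            # Degenerate case: fall back to uniform weights.
            weights = [1.0] * n_particles
            total = float(n_particles)
        weights = [w / total for w in weights]
        estimates.append(sum(w * p for w, p in zip(weights, particles)))
        # Resample: draw particles in proportion to their weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates


estimates = particle_filter([1.0, 2.0, 3.0, 4.0, 5.0])
```

The resampling step is what concentrates particles on plausible states; in a structure-building setting the analogous step would discard partial models that fit the density poorly.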


Subjects
Photon Absorptiometry/methods , Algorithms , X-Ray Crystallography/methods , Chemical Models , Molecular Models , Proteins/chemistry , Proteins/ultrastructure , Computer Simulation , Filtration/methods , Statistical Models , Particle Size , Protein Conformation
3.
Bioinformatics ; 22(14): e81-9, 2006 Jul 15.
Article in English | MEDLINE | ID: mdl-16873525

ABSTRACT

One particularly time-consuming step in protein crystallography is interpreting the electron density map; that is, fitting a complete molecular model of the protein into a 3D image of the protein produced by the crystallographic process. In poor-quality electron density maps, the interpretation may require a significant amount of a crystallographer's time. Our work investigates automating the time-consuming initial backbone trace in poor-quality density maps. We describe ACMI (Automatic Crystallographic Map Interpreter), which uses a probabilistic model known as a Markov field to represent the protein. Residues of the protein are modeled as nodes in a graph, while edges model pairwise structural interactions. Modeling the protein in this manner allows the model to be flexible, considering an almost infinite number of possible conformations, while rejecting any that are physically impossible. Using an efficient algorithm for approximate inference--belief propagation--allows the most probable trace of the protein's backbone through the density map to be determined. We test ACMI on a set of ten protein density maps (at 2.5 to 4.0 Å resolution), and compare our results to alternative approaches. At these resolutions, ACMI offers a more accurate backbone trace than current approaches.
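
The message-passing idea behind belief propagation can be sketched on a toy problem. The Python example below runs sum-product belief propagation on a small chain-structured Markov field with discrete states, where the result is exact. ACMI's graph also includes edges between non-adjacent residues, so there the same style of message passing is only approximate; the potentials, state space, and function name below are hypothetical.

```python
def chain_marginals(unary, pairwise):
    """Sum-product belief propagation on a chain-structured Markov field.

    unary[i][s]    : potential of node i being in state s
    pairwise[s][t] : potential shared by every pair of neighbouring nodes
    Returns the normalized marginal distribution of each node.
    """
    n = len(unary)
    k = len(unary[0])
    # fwd[i][t]: message arriving at node i from its left neighbour.
    fwd = [[1.0] * k for _ in range(n)]
    for i in range(1, n):
        for t in range(k):
            fwd[i][t] = sum(fwd[i - 1][s] * unary[i - 1][s] * pairwise[s][t]
                            for s in range(k))
    # bwd[i][s]: message arriving at node i from its right neighbour.
    bwd = [[1.0] * k for _ in range(n)]
    for i in range(n - 2, -1, -1):
        for s in range(k):
            bwd[i][s] = sum(bwd[i + 1][t] * unary[i + 1][t] * pairwise[s][t]
                            for t in range(k))
    marginals = []
    for i in range(n):
        # Belief = local evidence times the two incoming messages.
        belief = [fwd[i][s] * unary[i][s] * bwd[i][s] for s in range(k)]
        z = sum(belief)
        marginals.append([b / z for b in belief])
    return marginals


# Hypothetical three-node chain with two states per node.
unary = [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]]
pairwise = [[2.0, 1.0], [1.0, 2.0]]  # neighbours prefer to agree
marginals = chain_marginals(unary, pairwise)
```

In this toy setting each node's state stands in for a candidate residue location, and the pairwise potential plays the role of a structural constraint between neighbouring residues.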


Subjects
X-Ray Crystallography/methods , Electron Probe Microanalysis/methods , Chemical Models , Molecular Models , Proteins/chemistry , Proteins/ultrastructure , Protein Sequence Analysis/methods , Algorithms , Amino Acid Sequence , Computer Simulation , Statistical Data Interpretation , Statistical Models , Molecular Sequence Data
4.
Article in English | MEDLINE | ID: mdl-26306246

ABSTRACT

While the use of machine learning methods in clinical decision support has great potential for improving patient care, acquiring standardized, complete, and sufficient training data presents a major challenge for methods relying exclusively on machine learning techniques. Domain experts possess knowledge that can address these challenges and guide model development. We present Advice-Based-Learning (ABLe), a framework for incorporating expert clinical knowledge into machine learning models, and show results for an example task: estimating the probability of malignancy following a non-definitive breast core needle biopsy. By applying ABLe to this task, we demonstrate a statistically significant improvement in specificity (24.0% with p=0.004) without missing a single malignancy.

5.
Article in English | MEDLINE | ID: mdl-26158123

ABSTRACT

Machine learning is continually being applied to a growing set of fields, including the social sciences, business, and medicine. Some fields present problems that are not easily addressed using standard machine learning approaches and, in particular, there is growing interest in differential prediction. In this type of task we are interested in producing a classifier that specifically characterizes a subgroup of interest by maximizing the difference in predictive performance for some outcome between subgroups in a population. We discuss adapting maximum margin classifiers for differential prediction. We first introduce multiple approaches that do not affect the key properties of maximum margin classifiers, but which also do not directly attempt to optimize a standard measure of differential prediction. We next propose a model that directly optimizes a standard measure in this field, the uplift measure. We evaluate our models on real data from two medical applications and show excellent results.

6.
Article in English | MEDLINE | ID: mdl-26158122

ABSTRACT

We introduce Score As You Lift (SAYL), a novel Statistical Relational Learning (SRL) algorithm, and apply it to an important task in the diagnosis of breast cancer. SAYL combines SRL with the marketing concept of uplift modeling, uses the area under the uplift curve to direct clause construction and final theory evaluation, integrates rule learning and probability assignment, and conditions the addition of each new theory rule to existing ones. Breast cancer, the most common type of cancer among women, is categorized into two subtypes: an earlier in situ stage where cancer cells are still confined, and a subsequent invasive stage. Currently older women with in situ cancer are treated to prevent cancer progression, regardless of the fact that treatment may generate undesirable side-effects, and the woman may die of other causes. Younger women tend to have more aggressive cancers, while older women tend to have more indolent tumors. Therefore older women whose in situ tumors show significant dissimilarity with in situ cancer in younger women are less likely to progress, and can thus be considered for watchful waiting. Motivated by this important problem, this work makes two main contributions. First, we present the first multi-relational uplift modeling system, and introduce, implement and evaluate a novel method to guide search in an SRL framework. Second, we compare our algorithm to previous approaches, and demonstrate that the system can indeed obtain differential rules of interest to an expert on real data, while significantly improving the data uplift.
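
The area under the uplift curve can be sketched in Python under one common formulation, which is an assumption here rather than a statement of SAYL's exact definition: the lift of a group at fraction p is the number of positives among its top-scoring fraction p, and uplift is the lift of the subgroup of interest minus the lift of the comparison group. All names and data below are hypothetical.

```python
def lift_curve(scores, labels, fractions):
    """Positives found among the top-scoring fraction p, for each p."""
    ranked = [y for _, y in sorted(zip(scores, labels),
                                   key=lambda pair: pair[0], reverse=True)]
    n = len(ranked)
    return [sum(ranked[:int(round(p * n))]) for p in fractions]


def uplift_curve(target_scores, target_labels,
                 control_scores, control_labels, fractions):
    """Difference between the target-group and control-group lift curves."""
    target = lift_curve(target_scores, target_labels, fractions)
    control = lift_curve(control_scores, control_labels, fractions)
    return [t - c for t, c in zip(target, control)]


def area_under_curve(xs, ys):
    """Trapezoidal area under a piecewise-linear curve."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]))


# Hypothetical scores and outcomes for two groups of four cases each.
fractions = [0.0, 0.25, 0.5, 0.75, 1.0]
uplift = uplift_curve([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0],
                      [0.9, 0.8, 0.3, 0.1], [0, 1, 0, 1], fractions)
auu = area_under_curve(fractions, uplift)
```

A model that ranks positives highly in the target group but not in the comparison group yields a larger area, which is the behavior a differential rule learner would reward.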

7.
Healthcom ; 2013(15th): 283-285, 2013 Oct 09.
Article in English | MEDLINE | ID: mdl-26501132

ABSTRACT

When mammography reveals a suspicious finding, a core needle biopsy is usually recommended. In 5% to 15% of these cases, the biopsy diagnosis is non-definitive and a more invasive surgical excisional biopsy is recommended to confirm a diagnosis. The majority of these cases will ultimately be proven benign. The use of excisional biopsy for diagnosis negatively impacts patient quality of life and increases costs to the healthcare system. In this work, we employ a multi-relational machine learning approach to predict when a patient with a non-definitive core needle biopsy diagnosis need not undergo an excisional biopsy procedure because the risk of malignancy is low.

8.
AMIA Annu Symp Proc ; 2013: 876-85, 2013.
Article in English | MEDLINE | ID: mdl-24551380

ABSTRACT

Several recent genome-wide association studies have identified genetic variants associated with breast cancer. However, how much these genetic variants may help advance breast cancer risk prediction based on other clinical features, like mammographic findings, is unknown. We conducted a retrospective case-control study, collecting mammographic findings and high-frequency/low-penetrance genetic variants from an existing personalized medicine data repository. A Bayesian network was developed using Tree Augmented Naive Bayes (TAN) by training on the mammographic findings, with and without the 22 genetic variants collected. We analyzed the predictive performance using the area under the ROC curve, and found that the genetic variants significantly improved breast cancer risk prediction on mammograms. We also identified the interaction effect between the genetic variants and collected mammographic findings in an attempt to link genotype to mammographic phenotype to better understand disease patterns, mechanisms, and/or natural history.


Subjects
Bayes Theorem , Breast Neoplasms/genetics , Mammography , Risk Assessment/methods , Breast Neoplasms/diagnostic imaging , Case-Control Studies , Female , Genetic Predisposition to Disease , Genotype , Humans , Computer Neural Networks , Single Nucleotide Polymorphism , ROC Curve
9.
J Bioinform Comput Biol ; 10(1): 1240009, 2012 Feb.
Article in English | MEDLINE | ID: mdl-22809310

ABSTRACT

Protein X-ray crystallography--the most popular method for determining protein structures--remains a laborious process requiring a great deal of manual crystallographer effort to interpret low-quality protein images. Automating this process is critical in creating a high-throughput protein-structure determination pipeline. Previously, our group developed ACMI, a probabilistic framework for producing protein-structure models from electron-density maps produced via X-ray crystallography. ACMI uses a Markov Random Field to model the three-dimensional (3D) location of each non-hydrogen atom in a protein. Calculating the best structure in this model is intractable, so ACMI uses approximate inference methods to estimate the optimal structure. While previous results have shown ACMI to be the state-of-the-art method on this task, its approximate inference algorithm remains computationally expensive and susceptible to errors. In this work, we develop Probabilistic Ensembles in ACMI (PEA), a framework for leveraging multiple, independent runs of approximate inference to produce estimates of protein structures. Our results show statistically significant improvements in the accuracy of inference resulting in more complete and accurate protein structures. In addition, PEA provides a general framework for advanced approximate inference methods in complex problem domains.


Subjects
Algorithms , Proteins/chemistry , Computer Simulation , X-Ray Crystallography , Probability , Protein Conformation
10.
Cancer ; 116(14): 3310-21, 2010 Jul 15.
Article in English | MEDLINE | ID: mdl-20564067

ABSTRACT

BACKGROUND: Discriminating malignant breast lesions from benign ones and accurately predicting the risk of breast cancer for individual patients are crucial to successful clinical decisions. In the past, several artificial neural network (ANN) models have been developed for breast cancer-risk prediction. All studies have reported discrimination performance, but not one has assessed calibration, which is an equivalently important measure for accurate risk prediction. In this study, the authors have evaluated whether an artificial neural network (ANN) trained on a large prospectively collected dataset of consecutive mammography findings can discriminate between benign and malignant disease and accurately predict the probability of breast cancer for individual patients. METHODS: Our dataset consisted of 62,219 consecutively collected mammography findings matched with the Wisconsin State Cancer Reporting System. The authors built a 3-layer feedforward ANN with 1000 hidden-layer nodes. The authors trained and tested their ANN by using 10-fold cross-validation to predict the risk of breast cancer. The authors used the area under the receiver-operating characteristic curve (AUC), sensitivity, and specificity to evaluate discriminative performance of the radiologists and their ANN. The authors assessed the accuracy of risk prediction (ie, calibration) of their ANN by using the Hosmer-Lemeshow (H-L) goodness-of-fit test. RESULTS: Their ANN demonstrated superior discrimination (AUC, 0.965) compared with the radiologists (AUC, 0.939; P<.001). The authors' ANN was also well calibrated as shown by an H-L goodness of fit P-value of .13. CONCLUSIONS: The authors' ANN can effectively discriminate malignant abnormalities from benign ones and accurately predict the risk of breast cancer for individual abnormalities.
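
Discrimination and calibration measure different things, and a short Python sketch can make the distinction concrete. The AUC below is computed as the probability that a randomly chosen malignant case scores higher than a randomly chosen benign one; the Hosmer-Lemeshow statistic compares observed and expected event counts within risk-ordered groups. Only the chi-square statistic is computed, not its P-value, and the grouping scheme and function names are assumptions rather than the authors' implementation.

```python
def auc(scores, labels):
    """AUC as P(positive outscores negative); ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def hosmer_lemeshow_statistic(probs, labels, groups=10):
    """H-L chi-square statistic over risk-ordered groups of similar size."""
    # Sort by predicted risk only, so tied risks keep their input order.
    ranked = sorted(zip(probs, labels), key=lambda pair: pair[0])
    n = len(ranked)
    stat = 0.0
    for g in range(groups):
        chunk = ranked[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        expected = sum(p for p, _ in chunk)   # expected number of events
        observed = sum(y for _, y in chunk)   # observed number of events
        mean_p = expected / size
        if 0.0 < mean_p < 1.0:
            stat += (observed - expected) ** 2 / (size * mean_p * (1.0 - mean_p))
    return stat


# Hypothetical predictions: perfectly ranked, but not perfectly calibrated.
example_auc = auc([0.8, 0.8, 0.2, 0.2], [1, 1, 0, 0])
example_hl = hosmer_lemeshow_statistic([0.2, 0.2, 0.8, 0.8], [0, 0, 1, 1], groups=2)
```

A model can rank cases perfectly and still report risks that are systematically too high or too low, which is why the abstract treats calibration as a separate requirement.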


Subjects
Breast Neoplasms/diagnostic imaging , Computer Neural Networks , Aged , Calibration , Computer-Assisted Decision Making , Computer-Assisted Diagnosis , Psychological Discrimination , Female , Humans , Computer-Assisted Image Processing , Mammography , Middle Aged , Risk Assessment , Sensitivity and Specificity
11.
Int J Data Min Bioinform ; 3(2): 205-27, 2009.
Article in English | MEDLINE | ID: mdl-19517990

ABSTRACT

Several methods for automatically constructing a protein model from an electron-density map require searching for many small protein-fragment templates in the density. We propose to use the spherical-harmonic decomposition of the template and the map's density to speed this matching. Unlike other template-matching approaches, this allows us to eliminate large portions of the map unlikely to match any templates. We train several first-pass filters for this elimination task. We show our new template-matching method improves accuracy and reduces running time, compared to previous approaches. Finally, we extend our method to produce a structural-homology detection algorithm using electron density.


Subjects
X-Ray Crystallography/methods , Molecular Models , Proteins/chemistry , Software , Algorithms , Protein Databases , Probability , Protein Conformation
12.
Article in English | MEDLINE | ID: mdl-23765123

ABSTRACT

Breast cancer is the leading cause of cancer mortality in women between the ages of 15 and 54. During mammography screening, radiologists use a strict lexicon (BI-RADS) to describe and report their findings. Mammography records are then stored in a well-defined database format (NMD). Lately, researchers have applied data mining and machine learning techniques to these databases. They successfully built breast cancer classifiers that can help in early detection of malignancy. However, the validity of these models depends on the quality of the underlying databases. Unfortunately, most databases suffer from inconsistencies, missing data, inter-observer variability and inappropriate term usage. In addition, many databases are not compliant with the NMD format and/or solely consist of text reports. BI-RADS feature extraction from free text and consistency checks between recorded predictive variables and text reports are crucial to addressing this problem. We describe a general scheme for concept information retrieval from free text given a lexicon, and present a BI-RADS features extraction algorithm for clinical data mining. It consists of a syntax analyzer, a concept finder and a negation detector. The syntax analyzer preprocesses the input into individual sentences. The concept finder uses a semantic grammar based on the BI-RADS lexicon and the experts' input. It parses sentences detecting BI-RADS concepts. Once a concept is located, a lexical scanner checks for negation. Our method can handle multiple latent concepts within the text, filtering out ultrasound concepts. On our dataset, our algorithm achieves 97.7% precision, 95.5% recall and an F1-score of 0.97. It outperforms manual feature extraction at the 5% statistical significance level.
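
The three-stage pipeline described above (sentence splitting, concept finding, negation detection) can be sketched in a few lines of Python. The lexicon subset, the negation cues, and the function names are hypothetical stand-ins; the paper's semantic grammar is considerably richer. The last function checks that the reported precision and recall are consistent with the reported F1-score.

```python
import re

# Hypothetical lexicon subset mapping surface terms to concept names.
LEXICON = {
    "mass": "mass",
    "calcification": "calcification",
    "calcifications": "calcification",
    "architectural distortion": "architectural_distortion",
}
# Hypothetical negation cues checked in the text preceding a concept.
NEGATION_CUES = ("no", "without", "negative for")


def split_sentences(text):
    """Syntax-analyzer stand-in: split the report into sentences."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_concepts(text):
    """Return (concept, is_present) pairs found in the report text."""
    found = []
    for sentence in split_sentences(text):
        lowered = sentence.lower()
        for term, concept in LEXICON.items():
            match = re.search(r"\b" + re.escape(term) + r"\b", lowered)
            if not match:
                continue
            # Negation detector: look for a cue before the matched term.
            prefix = lowered[:match.start()]
            negated = any(re.search(r"\b" + re.escape(cue) + r"\b", prefix)
                          for cue in NEGATION_CUES)
            found.append((concept, not negated))
    return found


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


concepts = extract_concepts("There is a spiculated mass. No calcifications are seen.")
reported_f1 = round(f1_score(0.977, 0.955), 2)  # 0.97, matching the abstract
```

Scoping negation to the sentence containing the concept is the design choice that keeps "No calcifications are seen" from negating the mass in the preceding sentence.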

13.
Article in English | MEDLINE | ID: mdl-16448001

ABSTRACT

Current methods for interpreting oligonucleotide-based SNP-detection microarrays, SNP chips, are based on statistics and require extensive parameter tuning as well as extremely high-resolution images of the chip being processed. We present a method, based on a simple data-classification technique called nearest-neighbors that, on haploid organisms, produces results comparable to the published results of the leading statistical methods and requires very little in the way of parameter tuning. Furthermore, it can interpret SNP chips using lower-resolution scanners of the type more typically used in current microarray experiments. Along with our algorithm, we present the results of a SNP-detection experiment where, when independently applying this algorithm to six identical SARS SNP chips, we correctly identify all 24 SNPs in a particular strain of the SARS virus, with between 6 and 13 false positives across the six experiments.
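
A nearest-neighbors classifier needs very little tuning, which is the property this abstract emphasizes. The Python sketch below is a generic k-nearest-neighbors vote over feature vectors; treating each probe position as a vector of intensities labelled with a base call is an assumption made for illustration, and the data and function name are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Label a query point by majority vote among its k nearest neighbours."""
    nearest = sorted(zip(train_points, train_labels),
                     key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical two-feature intensity vectors with known base calls.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
labels = ["A", "A", "A", "C", "C", "C"]
call = knn_predict(points, labels, query=(0.2, 0.1))
```

The only parameter is k, which contrasts with the extensive parameter tuning the abstract attributes to the statistical methods.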


Subjects
Algorithms , Chromosome Mapping/methods , DNA Mutational Analysis/methods , Oligonucleotide Array Sequence Analysis/methods , Single Nucleotide Polymorphism/genetics , Sequence Alignment/methods , DNA Sequence Analysis/methods , Automated Pattern Recognition/methods
14.
Bioinformatics ; 19(10): 1227-35, 2003 Jul 01.
Article in English | MEDLINE | ID: mdl-12835266

ABSTRACT

MOTIVATION: In order to understand transcription regulation in a given prokaryotic genome, it is critical to identify operons, the fundamental units of transcription, in such species. While there are a growing number of organisms whose sequence and gene coordinates are known, by and large their operons are not known. RESULTS: We present a probabilistic approach to predicting operons using Bayesian networks. Our approach exploits diverse evidence sources such as sequence and expression data. We evaluate our approach on the Escherichia coli K-12 genome where our results indicate we are able to identify over 78% of its operons at a 10% false positive rate. Also, empirical evaluation using a reduced set of data sources suggests that our approach may have significant value for organisms whose evidence sources are not as rich as those of E. coli. AVAILABILITY: Our E. coli K-12 operon predictions are available at http://www.biostat.wisc.edu/gene-regulation.


Subjects
Algorithms , Bayes Theorem , Escherichia coli/genetics , Gene Expression Profiling/methods , Bacterial Gene Expression Regulation/genetics , Operon/genetics , Sequence Alignment/methods , DNA Sequence Analysis/methods , Bacterial Genome , Genetic Promoter Regions/genetics , Reproducibility of Results , Sensitivity and Specificity