Search | BVS Enfermagem Research Portal

1.

Coupled VAE: Improved Accuracy and Robustness of a Variational Autoencoder.

Cao, Shichen; Li, Jingjing; Nelson, Kenric P; Kon, Mark A.

Entropy (Basel) ; 24(3), 2022 Mar 18.

Article in English | MEDLINE | ID: mdl-35327933

ABSTRACT

We present a coupled variational autoencoder (VAE) method, which improves the accuracy and robustness of the model representation of handwritten numeral images. The improvement is measured both by increasing the likelihood of the reconstructed images and by reducing the divergence between the posterior and a prior latent distribution. The new method weighs outlier samples with a higher penalty by generalizing the original evidence lower bound function using a coupled entropy function based on the principles of nonlinear statistical coupling. We evaluated the performance of the coupled VAE model using the Modified National Institute of Standards and Technology (MNIST) dataset and its corrupted modification C-MNIST. Histograms of the likelihood that the reconstruction matches the original image show that the coupled VAE improves the reconstruction, and this improvement is more substantial when seeded with corrupted images. All five corruptions evaluated showed improvement. For instance, with the Gaussian corruption seed the accuracy improves by a factor of 10^14 (from 10^-57.2 to 10^-42.9) and robustness improves by a factor of 10^22 (from 10^-109.2 to 10^-87.0). Furthermore, the divergence between the posterior and prior distribution of the latent distribution is reduced. Thus, in contrast to the β-VAE design, the coupled VAE algorithm improves model representation, rather than trading off the performance of the reconstruction and latent distribution divergence.
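The coupled VAE generalizes the standard evidence lower bound (ELBO). For reference only, here is a minimal sketch of the standard, uncoupled ELBO terms, assuming binary pixels with a Bernoulli decoder and a diagonal-Gaussian encoder with a standard-normal prior; the coupled-entropy generalization from the paper is not reproduced. The last helper shows that the reported gains are simply differences of log10 likelihoods.

```python
import math

def gaussian_kl(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))

def bernoulli_log_likelihood(x, x_hat, eps=1e-12):
    """log p(x | z) for binary pixels x given decoder probabilities x_hat."""
    total = 0.0
    for xi, pi in zip(x, x_hat):
        pi = min(max(pi, eps), 1.0 - eps)  # clip to avoid log(0)
        total += xi * math.log(pi) + (1 - xi) * math.log(1.0 - pi)
    return total

def elbo(x, x_hat, mu, logvar):
    """Standard (uncoupled) ELBO: reconstruction log-likelihood minus the KL term."""
    return bernoulli_log_likelihood(x, x_hat) - gaussian_kl(mu, logvar)

def log10_gain(before, after):
    """Orders of magnitude gained when a likelihood moves from 10**before to 10**after."""
    return after - before

print(round(log10_gain(-57.2, -42.9), 1))   # 14.3 (accuracy, Gaussian corruption)
print(round(log10_gain(-109.2, -87.0), 1))  # 22.2 (robustness, Gaussian corruption)
```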

2.

Optimizing decision tree structures for spectral histopathology (SHP).

Mu, Xinying; Remiszewski, Stan; Kon, Mark; Ergin, Aysegül; Diem, Max.

Analyst ; 143(24): 5935-5939, 2018 Dec 03.

Article in English | MEDLINE | ID: mdl-30406772

ABSTRACT

This paper reviews methods to arrive at optimum decision tree or label tree structures to analyze large SHP datasets. Supervised methods of analysis can utilize either sequential or (flat) multi-classifiers, depending on the variance in the data and on the number of spectral classes to be distinguished. For a small number of spectral classes, multi-classifiers have been used in the past, but for the analysis of datasets containing large numbers (∼20) of disease or tissue types, mixed decision tree structures were found to be advantageous. In these mixed structures, discrimination into classes and subclasses is achieved via hierarchical decision/label tree structures.
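A hierarchical label tree classifies a spectrum by a sequence of decisions, first into a broad class and then into a subclass. The sketch below shows only that control flow. The per-node classifier is a nearest-centroid rule chosen for brevity (the review discusses other supervised classifiers), and the class names and two-dimensional "spectra" are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List

def nearest_centroid(x, centroids):
    """Return the label whose centroid is closest (squared Euclidean) to x."""
    return min(centroids, key=lambda lbl: sum((a - b) ** 2 for a, b in zip(x, centroids[lbl])))

@dataclass
class Node:
    # Internal nodes hold one centroid per child; leaves have no children.
    label: str
    centroids: Dict[str, List[float]] = field(default_factory=dict)
    children: Dict[str, "Node"] = field(default_factory=dict)

    def classify(self, x) -> List[str]:
        """Descend the tree and return the path of decisions from root to leaf."""
        if not self.children:
            return [self.label]
        choice = nearest_centroid(x, self.centroids)
        return [self.label] + self.children[choice].classify(x)

# Hypothetical two-level tree: tissue class first, then cancer subtype.
tree = Node(
    "root",
    centroids={"normal": [0.0, 0.0], "cancer": [1.0, 1.0]},
    children={
        "normal": Node("normal"),
        "cancer": Node(
            "cancer",
            centroids={"squamous": [1.5, 0.5], "adeno": [0.5, 1.5]},
            children={"squamous": Node("squamous"), "adeno": Node("adeno")},
        ),
    },
)
print(tree.classify([1.4, 0.8]))  # ['root', 'cancer', 'squamous']
```

Each node only has to separate a few classes, which is the motivation the abstract gives for tree structures when the number of classes grows large.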

Subjects

Decision Trees , Pathology/methods , Algorithms , Breast Neoplasms/classification , Humans , Lung Neoplasms/classification

3.

Multimodal Learning and Intelligent Prediction of Symptom Development in Individual Parkinson's Patients.

Przybyszewski, Andrzej W; Kon, Mark; Szlufik, Stanislaw; Szymanski, Artur; Habela, Piotr; Koziorowski, Dariusz M.

Sensors (Basel) ; 16(9), 2016 Sep 14.

Article in English | MEDLINE | ID: mdl-27649187

ABSTRACT

We still do not know how the brain and its computations are affected by nerve cell deaths and their compensatory learning processes, as these develop in neurodegenerative diseases (ND). Compensatory learning processes are ND symptoms usually observed at a point when the disease has already affected large parts of the brain. We can register symptoms of ND such as motor and/or mental disorders (dementias) and even provide symptomatic relief, though the structural effects of these are in most cases not yet understood. It is very important to obtain early diagnosis, which can provide several years in which we can monitor and partly compensate for the disease's symptoms, with the help of various therapies. In the case of Parkinson's disease (PD), in addition to classical neurological tests, measurements of eye movements are diagnostic. We have performed measurements of latency, amplitude, and duration in reflexive saccades (RS) of PD patients. We have compared the results of our measurement-based diagnoses with standard neurological ones. The purpose of our work was to classify how condition attributes predict the neurologist's diagnosis. For n = 10 patients, the patient age and parameters based on RS gave a global accuracy in predictions of neurological symptoms in individual patients of about 80%. Further, by adding three attributes partly related to patient 'well-being' scores, our prediction accuracies increased to 90%. Our predictive algorithms use rough set theory, which we have compared with other classifiers such as Naïve Bayes, Decision Trees/Tables, and Random Forests (implemented in KNIME/WEKA). We have demonstrated that RS are powerful biomarkers for assessment of symptom progression in PD.
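Rough set theory, used for the predictive algorithms above, groups objects that are indistinguishable on the chosen condition attributes and then approximates a decision class from below (classes certainly inside it) and from above (classes possibly inside it). A minimal sketch follows; the patient ids, the discretized attribute values, and the "severe" decision class are hypothetical, and the rule induction used in the study is not reproduced.

```python
from collections import defaultdict

def indiscernibility_classes(table, attributes):
    """Group object ids whose values agree on every chosen condition attribute."""
    classes = defaultdict(set)
    for obj_id, row in table.items():
        key = tuple(row[a] for a in attributes)
        classes[key].add(obj_id)
    return list(classes.values())

def approximations(table, attributes, target):
    """Lower and upper approximations of the object set `target`."""
    lower, upper = set(), set()
    for cls in indiscernibility_classes(table, attributes):
        if cls <= target:
            lower |= cls  # class lies entirely inside target: certainly in it
        if cls & target:
            upper |= cls  # class overlaps target: possibly in it
    return lower, upper

# Hypothetical discretized saccade latency and age group for four patients.
patients = {
    "p1": {"latency": "high", "age": "old"},
    "p2": {"latency": "high", "age": "old"},
    "p3": {"latency": "low", "age": "old"},
    "p4": {"latency": "low", "age": "young"},
}
severe = {"p1", "p3"}
lower, upper = approximations(patients, ["latency", "age"], severe)
print(sorted(lower), sorted(upper))  # ['p3'] ['p1', 'p2', 'p3']
```

Here p1 and p2 share identical attributes but differ in outcome, so they fall in the boundary region; adding attributes (as the authors did with 'well-being' scores) can split such classes and sharpen the approximation.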

Subjects

Machine Learning , Parkinson Disease/diagnosis , Algorithms , Female , Humans , Male , Middle Aged , Saccades/physiology

4.

Classification of malignant and benign tumors of the lung by infrared spectral histopathology (SHP).

Akalin, Ali; Mu, Xinying; Kon, Mark A; Ergin, Aysegül; Remiszewski, Stan H; Thompson, Clay M; Raz, Dan J; Diem, Max; Bird, Benjamin; Miljkovic, Milos.

Lab Invest ; 95(4): 406-21, 2015 Apr.

Article in English | MEDLINE | ID: mdl-25664390

ABSTRACT

We report results of a study utilizing a novel tissue classification method, based on label-free spectral techniques, for the classification of lung cancer histopathological samples on a tissue microarray. The spectral diagnostic method allows reproducible and objective classification of unstained tissue sections. This is accomplished by acquiring infrared data sets containing thousands of spectra, each collected from tissue pixels ∼6 µm on edge; these pixel spectra contain an encoded snapshot of the entire biochemical composition of the pixel area. The hyperspectral data sets are subsequently decoded by methods of multivariate analysis that reveal changes in the biochemical composition between tissue types, and between various stages and states of disease. In this study, a detailed comparison between classical and spectral histopathology is presented, suggesting that spectral histopathology can achieve levels of diagnostic accuracy that are comparable to that of multipanel immunohistochemistry.
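Before multivariate analysis, a hyperspectral image (rows × columns × wavenumbers) is usually unfolded into a list of pixel spectra. The sketch below shows that unfolding plus unit-norm vector normalization, a common way to compensate for overall intensity differences such as varying sample thickness. It assumes the cube is a nested list; the study's actual preprocessing chain (for example, scattering corrections) is not shown.

```python
import math

def unfold_cube(cube):
    """Turn a rows x cols x wavenumbers cube into (row, col, spectrum) records."""
    return [(r, c, list(spectrum))
            for r, row in enumerate(cube)
            for c, spectrum in enumerate(row)]

def vector_normalize(spectrum):
    """Scale a spectrum to unit Euclidean norm; all-zero spectra are returned unchanged."""
    norm = math.sqrt(sum(v * v for v in spectrum))
    return [v / norm for v in spectrum] if norm > 0 else list(spectrum)

# A toy 1 x 2 pixel image with two wavenumber channels.
cube = [[[3.0, 4.0], [0.0, 0.0]]]
records = unfold_cube(cube)
normalized = [vector_normalize(s) for _, _, s in records]
```

After this step every pixel is a point in wavenumber space, so standard multivariate methods (clustering, PCA, supervised classifiers) can be applied pixel by pixel.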

Subjects

Histological Techniques/methods , Lung Neoplasms/classification , Lung Neoplasms/pathology , Spectrophotometry, Infrared/methods , Tissue Array Analysis/methods , Humans , Multivariate Analysis

5.

Statistical analysis of a lung cancer spectral histopathology (SHP) data set.

Mu, Xinying; Kon, Mark; Ergin, Aysegül; Remiszewski, Stan; Akalin, Ali; Thompson, Clay M; Diem, Max.

Analyst ; 140(7): 2449-64, 2015 Apr 07.

Article in English | MEDLINE | ID: mdl-25664623

ABSTRACT

We report results on a statistical analysis of an infrared spectral dataset comprising a total of 388 lung biopsies from 374 patients. The method of correlating classical and spectral results and analyzing the resulting data has been referred to as spectral histopathology (SHP) in the past. Here, we show that standard bio-statistical procedures, such as strict separation of training and blinded test sets, result in a balanced accuracy of better than 95% for the distinction of normal, necrotic and cancerous tissues, and better than 90% balanced accuracy for the classification of small cell, squamous cell and adenocarcinomas. Preliminary results indicate that further sub-classification of adenocarcinomas should be feasible with similar accuracy once sufficiently large datasets have been collected.
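Balanced accuracy, the figure of merit quoted above, is the mean of per-class recall, so a rare class (for example, necrotic tissue) counts as much as a common one. A minimal sketch with hypothetical labels:

```python
from collections import Counter

def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall: each class counts equally, regardless of its size."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)

y_true = ["normal"] * 8 + ["necrotic"] * 2
y_pred = ["normal"] * 8 + ["normal", "necrotic"]
plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain, balanced_accuracy(y_true, y_pred))  # 0.9 0.75
```

Plain accuracy looks high here because the majority class dominates; balanced accuracy exposes the 50% recall on the minority class.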

Subjects

Data Interpretation, Statistical , Lung Neoplasms/diagnosis , Lung Neoplasms/pathology , Algorithms , Artificial Intelligence , Humans , Spectrophotometry, Infrared

6.

Uncertainty quantification of receptor ligand binding sites prediction.

Chen, Nanjie; Yu, Dongliang; Beglov, Dmitri; Kon, Mark; Castrillon-Candas, Julio Enrique.

ArXiv ; 2024 Jan 23.

Article in English | MEDLINE | ID: mdl-38344224

ABSTRACT

Recent advancements in protein docking site prediction have highlighted the limitations of traditional rigid docking algorithms, like PIPER, which often neglect critical stochastic elements such as solvent-induced fluctuations. These oversights can lead to inaccuracies in identifying viable docking sites due to the complexity of high-dimensional, stochastic energy manifolds with low regularity. To address this issue, our research introduces a novel model where the molecular shapes of ligands and receptors are represented using multivariate Karhunen-Loève (KL) expansions. This method effectively captures the stochastic nature of energy manifolds, allowing for a more accurate representation of molecular interactions. Developed as a plugin for PIPER, our scientific computing software enhances the platform, delivering robust uncertainty measures for the energy manifolds of ranked binding sites. Our results demonstrate that top-ranked binding sites, characterized by lower uncertainty in the stochastic energy manifold, align closely with actual docking sites. Conversely, sites with higher uncertainty correlate with less optimal docking positions. This distinction not only validates our approach but also sets a new standard in protein docking predictions, offering substantial implications for future molecular interaction research and drug development.
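The abstract represents molecular shapes with multivariate Karhunen-Loève expansions. As a one-dimensional illustration of what a truncated KL expansion is, the sketch below uses Brownian motion on [0, 1], chosen only because its eigenpairs are known in closed form: λ_k = 1/((k − 1/2)²π²) and φ_k(t) = √2 sin((k − 1/2)πt). It is not the paper's molecular-shape model or its PIPER plugin.

```python
import math
import random

def bm_kl_pairs(n_terms):
    """First n_terms eigenpairs (lambda_k, phi_k) of the covariance min(s, t) on [0, 1]."""
    pairs = []
    for k in range(1, n_terms + 1):
        w = (k - 0.5) * math.pi
        # Bind w as a default argument so each phi_k keeps its own frequency.
        pairs.append((1.0 / (w * w), lambda t, w=w: math.sqrt(2.0) * math.sin(w * t)))
    return pairs

def kl_path(ts, pairs, rng):
    """One realization on the grid ts; the random coefficients Y_k ~ N(0, 1) are drawn once."""
    ys = [rng.gauss(0.0, 1.0) for _ in pairs]
    return [sum(math.sqrt(lam) * phi(t) * y for (lam, phi), y in zip(pairs, ys)) for t in ts]

def kl_variance(t, pairs):
    """Variance at t retained by the truncation: sum_k lambda_k * phi_k(t)**2."""
    return sum(lam * phi(t) ** 2 for lam, phi in pairs)

rng = random.Random(42)
ts = [i / 10 for i in range(11)]
path = kl_path(ts, bm_kl_pairs(200), rng)
```

The retained variance approaches the exact value Var W(t) = t as terms are added, which is the sense in which a few leading KL modes capture most of a random field's uncertainty.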

7.

Biomedical informatics for computer-aided decision support systems: a survey.

Belle, Ashwin; Kon, Mark A; Najarian, Kayvan.

ScientificWorldJournal ; 2013: 769639, 2013.

Article in English | MEDLINE | ID: mdl-23431259

ABSTRACT

The volumes of current patient data as well as their complexity make clinical decision making more challenging than ever for physicians and other caregivers. This situation calls for the use of biomedical informatics methods to process data and form recommendations and/or predictions to assist such decision makers. The design, implementation, and use of biomedical informatics systems in the form of computer-aided decision support have become essential and widespread over the last two decades. This paper provides a brief review of such systems, their application protocols and methodologies, and the future challenges and directions they suggest.

Subjects

Decision Making, Computer-Assisted , Decision Support Systems, Clinical , Medical Informatics/methods , Artificial Intelligence , Biomedical Technology , Computational Biology/methods , Computational Biology/trends , Data Collection , Decision Support Techniques , Dentistry/methods , Emergency Medicine , Humans , Image Processing, Computer-Assisted , Intensive Care Units , Neoplasms/therapy , Radiology/methods

8.

Infrared spectral histopathology (SHP): a novel diagnostic tool for the accurate classification of lung cancer.

Bird, Benjamin; Miljkovic, Miloš; Remiszewski, Stan; Akalin, Ali; Kon, Mark; Diem, Max.

Lab Invest ; 92(9): 1358-73, 2012 Sep.

Article in English | MEDLINE | ID: mdl-22751349

ABSTRACT

We report results of a study utilizing a recently developed tissue diagnostic method, based on label-free spectral techniques, for the classification of lung cancer histopathological samples from a tissue microarray. The spectral diagnostic method allows reproducible and objective diagnosis of unstained tissue sections. This is accomplished by acquiring infrared hyperspectral data sets containing thousands of spectra, each collected from tissue pixels about 6 µm on edge; these pixel spectra contain an encoded snapshot of the entire biochemical composition of the pixel area. The hyperspectral data sets are subsequently decoded by methods of multivariate analysis, which reveal changes in the biochemical composition between tissue types, and between various stages and states of disease. In this study, a detailed comparison between classical and spectral histopathology (SHP) is presented, which suggests SHP can achieve levels of diagnostic accuracy that are comparable to that of multi-panel immunohistochemistry.

Subjects

Lung Neoplasms/diagnosis , Spectrophotometry, Infrared/methods , Humans , Lung Neoplasms/classification

9.

Top scoring pairs for feature selection in machine learning and applications to cancer outcome prediction.

Shi, Ping; Ray, Surajit; Zhu, Qifu; Kon, Mark A.

BMC Bioinformatics ; 12: 375, 2011 Sep 23.

Article in English | MEDLINE | ID: mdl-21939564

ABSTRACT

BACKGROUND: The widely used k top scoring pair (k-TSP) algorithm is a simple yet powerful parameter-free classifier. It owes its success in many cancer microarray datasets to an effective feature selection algorithm that is based on relative expression ordering of gene pairs. However, its general robustness does not extend to some difficult datasets, such as those involving cancer outcome prediction, which may be due to the relatively simple voting scheme used by the classifier. We believe that the performance can be enhanced by separating its effective feature selection component and combining it with a powerful classifier such as the support vector machine (SVM). More generally, the top scoring pairs generated by the k-TSP ranking algorithm can be used as a dimensionally reduced subspace for other machine learning classifiers. RESULTS: We developed an approach integrating the k-TSP ranking algorithm (TSP) with other machine learning methods, allowing combination of the computationally efficient, multivariate feature ranking of k-TSP with multivariate classifiers such as SVM. We evaluated this hybrid scheme (k-TSP+SVM) in a range of simulated datasets with known data structures. Compared with other feature selection methods, such as a univariate method similar to Fisher's discriminant criterion (Fisher) or a recursive feature elimination embedded in SVM (RFE), TSP becomes increasingly more effective than the other two methods as the informative genes become progressively more correlated, which is demonstrated both in terms of classification performance and the ability to recover true informative genes. We also applied this hybrid scheme to four cancer prognosis datasets, in which k-TSP+SVM outperforms the k-TSP classifier in all datasets and achieves either comparable or superior performance to that using SVM alone. In concurrence with what is observed in simulation, TSP appears to be a better feature selector than Fisher and RFE in some of the cancer datasets. CONCLUSIONS: The k-TSP ranking algorithm can be used as a computationally efficient, multivariate filter method for feature selection in machine learning. SVM in combination with the k-TSP ranking algorithm outperforms k-TSP and SVM alone in simulated datasets and in some cancer prognosis datasets. Simulation studies suggest that, as a feature selector, it is better tuned to certain data characteristics, i.e., correlations among informative genes, which is potentially interesting as an alternative feature ranking method in pathway analysis.
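The TSP ranking scores each gene pair (i, j) by how differently the event "gene i is expressed below gene j" occurs in the two classes: Δ_ij = |P(X_i < X_j | class 0) − P(X_i < X_j | class 1)|. A minimal sketch of that score plus a greedy choice of k gene-disjoint pairs follows. The toy expression matrix is hypothetical, and the secondary tie-breaking score of the original k-TSP method is omitted. The genes in the chosen pairs could then be passed to an SVM, as in the hybrid scheme described above.

```python
from itertools import combinations

def tsp_scores(X, y):
    """Delta_ij = |P(X_i < X_j | class 0) - P(X_i < X_j | class 1)| for every gene pair.
    X: list of samples (lists of expression values); y: 0/1 class labels."""
    groups = {c: [x for x, lab in zip(X, y) if lab == c] for c in (0, 1)}
    scores = {}
    for i, j in combinations(range(len(X[0])), 2):
        p = [sum(x[i] < x[j] for x in groups[c]) / len(groups[c]) for c in (0, 1)]
        scores[(i, j)] = abs(p[0] - p[1])
    return scores

def top_disjoint_pairs(scores, k):
    """Greedily keep the k best-scoring pairs that share no gene."""
    chosen, used = [], set()
    for (i, j), _ in sorted(scores.items(), key=lambda kv: -kv[1]):
        if i not in used and j not in used:
            chosen.append((i, j))
            used.update((i, j))
            if len(chosen) == k:
                break
    return chosen

X = [[1, 2, 0], [1, 3, 5], [3, 1, 0], [4, 2, 5]]  # 4 samples x 3 genes (toy data)
y = [0, 0, 1, 1]
scores = tsp_scores(X, y)
print(top_disjoint_pairs(scores, 1))  # [(0, 1)]
```

Because the score depends only on within-sample orderings, it is invariant to monotone normalizations of each sample, one reason rank-based pair features are attractive for microarray data.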

Subjects

Algorithms , Artificial Intelligence , Neoplasms/drug therapy , Neoplasms/genetics , Humans , Neoplasms/metabolism , Neoplasms/radiotherapy , Prognosis , Software , Support Vector Machine

10.

BowSaw: Inferring Higher-Order Trait Interactions Associated With Complex Biological Phenotypes.

DiMucci, Demetrius; Kon, Mark; Segrè, Daniel.

Front Mol Biosci ; 8: 663532, 2021.

Article in English | MEDLINE | ID: mdl-34222331

ABSTRACT

Machine learning is helping the interpretation of biological complexity by enabling the inference and classification of cellular, organismal and ecological phenotypes based on large datasets, e.g., from genomic, transcriptomic and metagenomic analyses. A number of available algorithms can help search these datasets to uncover patterns associated with specific traits, including disease-related attributes. While, in many instances, treating an algorithm as a black box is sufficient, it is interesting to pursue an enhanced understanding of how system variables end up contributing to a specific output, as an avenue toward new mechanistic insight. Here we address this challenge through a suite of algorithms, named BowSaw, which takes advantage of the structure of a trained random forest algorithm to identify combinations of variables ("rules") frequently used for classification. We first apply BowSaw to a simulated dataset and show that the algorithm can accurately recover the sets of variables used to generate the phenotypes through complex Boolean rules, even under challenging noise levels. We next apply our method to data from the integrative Human Microbiome Project and find previously unreported high-order combinations of microbial taxa putatively associated with Crohn's disease. By leveraging the structure of trees within a random forest, BowSaw provides a new way of using decision trees to generate testable biological hypotheses.
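The core idea of mining a trained forest for variable combinations can be illustrated by counting which sets of features co-occur on root-to-leaf decision paths. The sketch below assumes the paths have already been extracted from the trees (that step is not shown) and uses hypothetical taxon names; BowSaw's actual rule scoring is more elaborate than this frequency count.

```python
from collections import Counter
from itertools import combinations

def frequent_feature_sets(paths, size=2):
    """Count how often each set of `size` distinct features appears together on a path."""
    counts = Counter()
    for path in paths:
        for combo in combinations(sorted(set(path)), size):
            counts[combo] += 1
    return counts

# Hypothetical features used along three decision paths.
paths = [
    ["taxonA", "taxonB", "taxonC"],
    ["taxonB", "taxonA"],
    ["taxonC", "taxonD"],
]
counts = frequent_feature_sets(paths)
print(counts.most_common(1))  # [(('taxonA', 'taxonB'), 2)]
```

Sorting each path's features makes a combination's key independent of the order in which the tree tested the variables.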

11.

Classification of malignant and benign tumors of the lung by infrared spectral histopathology (SHP).

Akalin, Ali; Mu, Xinying; Kon, Mark A; Ergin, Aysegül; Remiszewski, Stan H; Thompson, Clay M; Raz, Dan J; Diem, Max.

Lab Invest ; 95(6): 697, 2015 Jun.

Article in English | MEDLINE | ID: mdl-26012705

12.

Analytic regularity and stochastic collocation of high-dimensional Newton iterates.

Castrillón-Candás, Julio E; Kon, Mark.

Adv Comput Math ; 46(3), 2020 Mar.

Article in English | MEDLINE | ID: mdl-32377059

ABSTRACT

In this paper we introduce concepts from uncertainty quantification (UQ) and numerical analysis for the efficient evaluation of stochastic high-dimensional Newton iterates. In particular, we develop complex analytic regularity theory of the solution with respect to the random variables. This justifies the application of sparse grids for the computation of statistical measures. Convergence rates are derived and are shown to be subexponential or algebraic with respect to the number of realizations of random perturbations. Due to the accuracy of the method, sparse grids are well suited for computing low-probability events with high confidence. We apply our method to the power flow problem. Numerical experiments on the non-trivial 39-bus New England power system model with large stochastic loads are consistent with the theoretical convergence rates. Moreover, compared to the Monte Carlo method, our approach is at least 10^11 times faster for the same accuracy.
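Stochastic collocation evaluates the Newton solve only at quadrature nodes in the random variables and combines the results with quadrature weights. The paper treats high-dimensional inputs with sparse grids; in one dimension this reduces to an ordinary quadrature rule, so the sketch below uses a 3-node Gauss-Hermite rule for a single standard-normal parameter Y. The equation x² = 2 + 0.1Y is a hypothetical stand-in for the power flow equations.

```python
import math

# 3-node Gauss-Hermite rule for E[g(Y)], Y ~ N(0, 1); exact for polynomials of degree <= 5.
NODES = (-math.sqrt(3.0), 0.0, math.sqrt(3.0))
WEIGHTS = (1.0 / 6.0, 2.0 / 3.0, 1.0 / 6.0)

def newton(f, df, x0, tol=1e-12, max_iter=50):
    """Scalar Newton iteration x <- x - f(x) / f'(x)."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)
        x -= step
        if abs(step) < tol:
            break
    return x

def collocate(g):
    """Collocation estimate of E[g(Y)]: evaluate g only at the quadrature nodes."""
    return sum(w * g(y) for y, w in zip(NODES, WEIGHTS))

def solution(y):
    """Positive root of the hypothetical random equation x**2 - (2 + 0.1*y) = 0."""
    a = 2.0 + 0.1 * y
    return newton(lambda x: x * x - a, lambda x: 2.0 * x, x0=1.0)

mean_x = collocate(solution)  # three Newton solves instead of many Monte Carlo samples
print(mean_x)
```

When the solution is analytic in the random parameters, as the paper's regularity theory establishes for its setting, such rules converge much faster than Monte Carlo sampling.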

13.

Identifying factors associated with opioid cessation in a biracial sample using machine learning.

Cox, Jiayi W; Sherva, Richard M; Lunetta, Kathryn L; Saitz, Richard; Kon, Mark; Kranzler, Henry R; Gelernter, Joel; Farrer, Lindsay A.

Explor Med ; 1(1): 27-41, 2020.

Article in English | MEDLINE | ID: mdl-33554217

ABSTRACT

AIM: Racial disparities in opioid use disorder (OUD) management exist; however, there is limited research on factors that influence opioid cessation in different population groups. METHODS: We employed multiple machine learning prediction algorithms (least absolute shrinkage and selection operator, random forest, deep neural network, and support vector machine) to assess factors associated with ceasing opioid use in a sample of 1,192 African Americans (AAs) and 2,557 individuals of European ancestry (EAs) who met Diagnostic and Statistical Manual of Mental Disorders, 5th Edition criteria for OUD. Values for nearly 4,000 variables reflecting demographics, alcohol and other drug use, general health, non-drug use behaviors, and diagnoses for other psychiatric disorders were obtained for each participant from the Semi-Structured Assessment for Drug Dependence and Alcoholism, a detailed semi-structured interview. RESULTS: Support vector machine models performed marginally better on average than other machine learning methods, with maximum prediction accuracies of 75.4% in AAs and 79.4% in EAs. Subsequent stepwise regression considered the 83 most highly ranked variables across all methods and models and identified less recent cocaine use (AAs: odds ratio (OR) = 1.82, P = 9.19 × 10^-5; EAs: OR = 1.91, P = 3.30 × 10^-15), shorter duration of opioid use (AAs: OR = 0.55, P = 5.78 × 10^-6; EAs: OR = 0.69, P = 3.01 × 10^-7), and older age (AAs: OR = 2.44, P = 1.41 × 10^-12; EAs: OR = 2.00, P = 5.74 × 10^-9) as the strongest independent predictors of opioid cessation in both AAs and EAs. Attending self-help groups for OUD was also an independent predictor (P < 0.05) in both population groups, while less gambling severity (OR = 0.80, P = 3.32 × 10^-2) was specific to AAs, and post-traumatic stress disorder recovery (OR = 1.93, P = 7.88 × 10^-5), recent antisocial behaviors (OR = 0.64, P = 2.69 × 10^-3), and atheism (OR = 1.45, P = 1.34 × 10^-2) were specific to EAs. Factors related to drug use comprised about half of the significant independent predictors in both AAs and EAs, with other predictors related to non-drug use behaviors, psychiatric disorders, overall health, and demographics. CONCLUSIONS: These proof-of-concept findings provide avenues for hypothesis-driven analysis and will lead to further research on strategies to improve OUD management in EAs and AAs.
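The results above are reported as odds ratios (ORs). For orientation, the sketch below computes a crude OR with a Wald 95% confidence interval from a 2 × 2 table, and converts a logistic-regression coefficient β to an OR via exp(β). The counts are hypothetical, and the adjusted ORs in the study presumably come from multivariable regression rather than from a single 2 × 2 table.

```python
import math

def odds_ratio(a, b, c, d, z=1.96):
    """Crude odds ratio and Wald confidence interval for a 2x2 table (all counts > 0).
    a: exposed & ceased, b: exposed & not ceased,
    c: unexposed & ceased, d: unexposed & not ceased."""
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi

def or_from_logit(beta):
    """A logistic-regression coefficient beta corresponds to OR = exp(beta)."""
    return math.exp(beta)

print(odds_ratio(20, 10, 10, 20))  # hypothetical counts; OR = 4.0
```

An OR above 1 (for example, older age) means higher odds of cessation; an OR below 1 (for example, longer opioid use) means lower odds.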

14.

A new phylogenetic diversity measure generalizing the Shannon index and its application to phyllostomid bats.

Allen, Benjamin; Kon, Mark; Bar-Yam, Yaneer.

Am Nat ; 174(2): 236-43, 2009 Aug.

Article in English | MEDLINE | ID: mdl-19548837

ABSTRACT

Protecting biodiversity involves preserving the maximum number and abundance of species while giving special attention to species with unique genetic or morphological characteristics. In balancing different priorities, conservation policymakers may consider quantitative measures that compare diversity across ecological communities. To serve this purpose, a measure should increase or decrease with changes in community composition in a way that reflects what is valued, including species richness, evenness, and distinctness. However, counterintuitively, studies have shown that established indices, including those that emphasize average interspecies phylogenetic distance, may increase with the elimination of species. We introduce a new diversity index, the phylogenetic entropy, which generalizes in a natural way the Shannon index to incorporate species relatedness. Phylogenetic entropy favors communities in which highly distinct species are more abundant, but it does not advocate decreasing any species proportion below a community structure-dependent threshold. We contrast the behavior of multiple indices on a community of phyllostomid bats in the Selva Lacandona. The optimal genus distribution for phylogenetic entropy populates all genera in a linear relationship to their total phylogenetic distance to other genera. Two other indices favor eliminating 12 out of the 23 genera.
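Phylogenetic entropy, as we understand the definition, weights a Shannon-type term for every branch b of the tree: H_p = −Σ_b ℓ(b) p(b) ln p(b), where ℓ(b) is the branch length and p(b) the proportion of individuals descending from b. A minimal sketch, with a hypothetical tree encoded as (length, set of descendant species) pairs:

```python
import math

def phylogenetic_entropy(branches, abundance):
    """H_p = -sum_b length(b) * p(b) * ln p(b), where p(b) is the share of
    individuals descending from branch b.
    branches: list of (length, set_of_species_below); abundance: species -> proportion."""
    h = 0.0
    for length, species in branches:
        p = sum(abundance[s] for s in species)
        if p > 0:
            h -= length * p * math.log(p)
    return h

def shannon(abundance):
    return -sum(p * math.log(p) for p in abundance.values() if p > 0)

# Star tree with unit terminal branches: H_p reduces to the Shannon index.
star = [(1.0, {"A"}), (1.0, {"B"}), (1.0, {"C"})]
abund = {"A": 0.5, "B": 0.3, "C": 0.2}

# Hypothetical tree where A and B share a unit-length internal branch.
tree = [(1.0, {"A"}), (1.0, {"B"}), (1.0, {"A", "B"}), (2.0, {"C"})]
print(phylogenetic_entropy(tree, {"A": 0.25, "B": 0.25, "C": 0.5}))
```

The star-tree case shows the "natural generalization" claimed above: when all species are equally and maximally distinct, the index is exactly Shannon entropy.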

Subjects

Biodiversity , Chiroptera/genetics , Phylogeny , Animals , Classification/methods , Conservation of Natural Resources , Mexico , Population Dynamics

15.

Bioinformatics and biomedical informatics.

Najarian, Kayvan; Deriche, Rachid; Kon, Mark A; Hirata, Nina S T.

ScientificWorldJournal ; 2013: 591976, 2013.

Article in English | MEDLINE | ID: mdl-23818827

Subjects

Chromosome Mapping/trends , Computational Biology/methods , Computational Biology/trends , Gene Expression Profiling/trends , Oligonucleotide Array Sequence Analysis/trends

16.

Machine Learning Reveals Missing Edges and Putative Interaction Mechanisms in Microbial Ecosystem Networks.

DiMucci, Demetrius; Kon, Mark; Segrè, Daniel.

mSystems ; 3(5), 2018.

Article in English | MEDLINE | ID: mdl-30417106

ABSTRACT

Microbes affect each other's growth in multiple, often elusive, ways. The ensuing interdependencies form complex networks, believed to reflect taxonomic composition as well as community-level functional properties and dynamics. The elucidation of these networks is often pursued by measuring pairwise interactions in coculture experiments. However, the combinatorial complexity precludes an exhaustive experimental analysis of pairwise interactions, even for moderately sized microbial communities. Here, we used a machine learning random forest approach to address this challenge. In particular, we show how partial knowledge of a microbial interaction network, combined with trait-level representations of individual microbial species, can provide accurate inference of missing edges in the network and putative mechanisms underlying the interactions. We applied our algorithm to three case studies: an experimentally mapped network of interactions between auxotrophic Escherichia coli strains, a community of soil microbes, and a large in silico network of metabolic interdependencies between 100 human gut-associated bacteria. For this last case, 5% of the network was sufficient to predict the remaining 95% with 80% accuracy, and the mechanistic hypotheses produced by the algorithm accurately reflected known metabolic exchanges. Our approach, broadly applicable to any microbial or other ecological network, may drive the discovery of new interactions and new molecular mechanisms, both for therapeutic interventions involving natural communities and for the rational design of synthetic consortia. IMPORTANCE: Different organisms in a microbial community may drastically affect each other's growth phenotypes, significantly affecting the community dynamics, with important implications for human and environmental health. Novel culturing methods and the decreasing costs of sequencing will gradually enable high-throughput measurements of pairwise interactions in systematic coculturing studies. However, a thorough characterization of all interactions that occur within a microbial community is greatly limited both by the combinatorial complexity of possible assortments and by the limited biological insight that interaction measurements typically provide without laborious specific follow-ups. Here, we show how a simple and flexible formal representation of microbial pairs can be used for the classification of interactions via machine learning. The approach we propose predicts with high accuracy the outcome of yet-to-be performed experiments and generates testable hypotheses about the mechanisms of specific interactions.
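The key representational step described above is to encode an ordered pair of species (a acting on b) by combining their trait vectors, so that known edges become labeled training examples. The sketch below concatenates binary trait vectors and, purely to stay self-contained, swaps the paper's random forest for a 1-nearest-neighbour rule with Hamming distance. The species, traits, and interaction labels are all hypothetical.

```python
def pair_features(traits_a, traits_b):
    """Represent the ordered pair (a acts on b) by concatenating their trait vectors."""
    return tuple(traits_a) + tuple(traits_b)

def hamming(u, v):
    return sum(x != y for x, y in zip(u, v))

def predict_edge(pair, known_edges, traits):
    """1-nearest-neighbour stand-in for the random forest: copy the label of the
    most similar known pair. known_edges: {(a, b): label}; traits: species -> 0/1 vector."""
    query = pair_features(traits[pair[0]], traits[pair[1]])
    best = min(known_edges,
               key=lambda e: hamming(query, pair_features(traits[e[0]], traits[e[1]])))
    return known_edges[best]

# Hypothetical traits: (secretes metabolite M, requires metabolite M).
traits = {"s1": (1, 0), "s2": (0, 1), "s3": (1, 0), "s4": (0, 1)}
known = {("s1", "s2"): "positive", ("s2", "s1"): "none"}
print(predict_edge(("s3", "s4"), known, traits))  # positive
```

Because features are defined per trait, the matching traits also hint at a mechanism (here, a secreted metabolite that the partner requires), mirroring the mechanistic hypotheses the authors extract.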

17.

Integrating genomic data to predict transcription factor binding.

Holloway, Dustin T; Kon, Mark; DeLisi, Charles.

Genome Inform ; 16(1): 83-94, 2005.

Article in English | MEDLINE | ID: mdl-16362910

ABSTRACT

Transcription factor binding sites (TFBS) in gene promoter regions are often predicted by using position-specific scoring matrices (PSSMs), which summarize sequence patterns of experimentally determined TF binding sites. Although PSSMs are more reliable than simple consensus string matching in predicting a true binding site, they generally result in high numbers of false-positive hits. This study attempts to reduce the number of false-positive matches and generate new predictions by integrating various types of genomic data by two methods: a Bayesian allocation procedure and support vector machine classification. Several methods are explored to strengthen the prediction of a true TFBS in the Saccharomyces cerevisiae genome: binding site degeneracy, binding site conservation, phylogenetic profiling, TF binding site clustering, gene expression profiles, GO functional annotation, and k-mer counts in promoter regions. Binding site degeneracy (or redundancy) refers to the number of times a particular transcription factor's binding motif is discovered in the upstream region of a gene. Phylogenetic conservation takes into account the number of orthologous upstream regions in other genomes that contain a particular binding site. Phylogenetic profiling refers to the presence or absence of a gene across a large set of genomes. Binding site clusters are statistically significant clusters of TF binding sites detected by the algorithm ClusterBuster. Gene expression takes into account the idea that when the expression profiles of a transcription factor and a potential target gene are correlated, it is more likely that the gene is a genuine target. Also, genes with highly correlated expression profiles are often regulated by the same TF(s). The GO annotation data takes advantage of the idea that common transcription targets often have related functions. Finally, the distribution of the counts of all k-mers of length 4, 5, and 6 in a gene's promoter region was examined as a means to predict TF binding. In each case the data are compared to known true positives taken from ChIP-chip data, Transfac, and the Saccharomyces Genome Database. First, degeneracy, conservation, expression, and binding site clusters were examined independently and in combination via Bayesian allocation. Then, binding sites were predicted with a support vector machine (SVM) using each method alone and all methods in combination. The SVM works best when all genomic data are combined, but it can also identify which methods contribute the most to accurate classification. On average, a support vector machine can classify binding sites with high sensitivity and an accuracy of almost 80%.
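The PSSM baseline mentioned at the start of this abstract scores every window of a promoter sequence with position-specific log-odds values. A minimal sketch follows, assuming a uniform background (0.25 per base), a pseudocount of 1, and hypothetical aligned sites; the study's data integration (Bayesian allocation, SVM) is not reproduced.

```python
import math

def pssm_from_sites(sites, pseudocount=1.0, background=0.25):
    """Log2-odds position-specific scoring matrix from equal-length aligned sites."""
    matrix = []
    for pos in range(len(sites[0])):
        column = [s[pos] for s in sites]
        total = len(column) + 4 * pseudocount
        matrix.append({b: math.log2((column.count(b) + pseudocount) / total / background)
                       for b in "ACGT"})
    return matrix

def scan(sequence, matrix):
    """Score every window of an uppercase ACGT sequence; return (best_score, best_start)."""
    w = len(matrix)
    return max((sum(matrix[i][sequence[start + i]] for i in range(w)), start)
               for start in range(len(sequence) - w + 1))

sites = ["TGACTC", "TGACTC", "TGAGTC"]  # hypothetical aligned binding sites
pssm = pssm_from_sites(sites)
best_score, best_start = scan("AAAATGACTCAAAA", pssm)
print(best_start)  # 4
```

Any window scoring above a chosen threshold counts as a hit; because short motifs match by chance throughout a genome, this is where the high false-positive rate described above comes from.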

Subjects

Genome, Fungal , Saccharomyces cerevisiae/genetics , Transcription Factors/metabolism , Algorithms , Base Sequence , Bayes Theorem , Binding Sites , Chromatin Immunoprecipitation , Cluster Analysis , Computational Biology , Evolution, Molecular , Gene Expression Profiling , Gene Expression Regulation, Fungal , Genes, Fungal , Phylogeny , Promoter Regions, Genetic , Protein Binding , Transcription Factors/genetics

18.

A method for generating new datasets based on copy number for cancer analysis.

Kim, Shinuk; Kon, Mark; Kang, Hyunsik.

Biomed Res Int ; 2015: 467514, 2015.

Article in English | MEDLINE | ID: mdl-25949998

ABSTRACT

New data sources for the analysis of cancer data are rapidly supplementing the large number of gene-expression markers used for current methods of analysis. Significant among these new sources are copy number variation (CNV) datasets, which typically enumerate several hundred thousand CNVs distributed throughout the genome. Several useful algorithms allow systems-level analyses of such datasets. However, these rich data sources have not yet been analyzed as deeply as gene-expression data. To address this issue, the extensive toolsets used for analyzing expression data in cancerous and noncancerous tissue (e.g., gene set enrichment analysis and phenotype prediction) could be redirected to extract a great deal of predictive information from CNV data, in particular those derived from cancers. Here we present a software package capable of preprocessing standard Agilent copy number datasets into a form to which essentially all expression analysis tools can be applied. We illustrate the use of this toolset in predicting the survival time of patients with ovarian cancer or glioblastoma multiforme and also provide an analysis of gene- and pathway-level deletions in these two types of cancer.
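Turning segment-level copy-number calls into a gene-by-sample matrix is what lets expression-style tools run on CNV data. The sketch below shows one simple, hypothetical mapping rule (each gene takes the value of the segment it overlaps most, using half-open coordinates); it is not the algorithm of the authors' software package, and the segments, genes, and values are invented for illustration.

```python
def gene_level_copy_number(segments, genes):
    """Assign each gene the value of the segment that overlaps it most.
    segments: list of (chrom, start, end, value); genes: {name: (chrom, start, end)}.
    Coordinates are half-open; genes with no overlapping segment get None."""
    result = {}
    for name, (g_chrom, g_start, g_end) in genes.items():
        best_value, best_overlap = None, 0
        for chrom, start, end, value in segments:
            if chrom != g_chrom:
                continue
            overlap = min(end, g_end) - max(start, g_start)
            if overlap > best_overlap:
                best_value, best_overlap = value, overlap
        result[name] = best_value
    return result

segments = [("chr1", 0, 1000, -0.8), ("chr1", 1000, 5000, 0.1)]
genes = {"GENE_A": ("chr1", 200, 900), "GENE_B": ("chr1", 900, 3000), "GENE_C": ("chr2", 0, 100)}
print(gene_level_copy_number(segments, genes))
```

Once every sample is mapped onto the same gene list, the result has the shape of an expression matrix and can feed gene set enrichment or phenotype prediction unchanged.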

Subjects

DNA Copy Number Variations/genetics , Databases, Genetic , Glioblastoma/genetics , Ovarian Neoplasms/genetics , Software , Algorithms , Datasets as Topic , Female , Genome, Human , Humans

19.

Quantification of regurgitant fraction in mitral regurgitation by cardiovascular magnetic resonance: comparison of techniques.

Kon, Mark W S; Myerson, Saul G; Moat, Neil E; Pennell, Dudley J.

J Heart Valve Dis ; 13(4): 600-7, 2004 Jul.

Article in English | MEDLINE | ID: mdl-15311866

ABSTRACT

BACKGROUND AND AIM OF THE STUDY: Cardiovascular magnetic resonance (CMR) assessment of mitral regurgitant volume from the subtraction of the right ventricular stroke volume (RVSV) from left ventricular stroke volume (LVSV) has commonly been performed using volumetric techniques. This is sensitive to errors in RVSV visualization and regurgitation of other heart valves, and therefore subtracting aortic flow volume from LVSV may be preferable. The study aim was to compare both techniques in a single CMR examination. METHODS: Twenty-eight patients with isolated mitral regurgitation underwent left ventricular (LV) and right ventricular (RV) volumetry and aortic flow volume measurements. Mitral regurgitant fraction (RF) was calculated as either RF(VOL) = [LVSV - RVSV] or RF(FLOW) = [LVSV - aortic flow volume], both expressed as a fraction of LVSV. The agreement of the measurements was assessed as a measure of robustness in clinical practice. RESULTS: There was good agreement between aortic and pulmonary flow (mean ± SD difference -0.8 ± 8.1 ml), and aortic flow volume and RVSV by volumetry (mean difference -2.6 ± 11.8 ml). Intra- and interobserver variability (SD) of aortic flow volume (±6.6 ml and ±5.3 ml) was superior to that of the RVSV (±8.5 ml and ±12 ml). The intra- and interobserver variability (SD) of RF(FLOW) was lower (±4.8% and ±7.7%) than that of RF(VOL) (±6.7% and ±8.8%). CONCLUSION: The RF(FLOW) technique maximized intra- and interobserver agreement, and is the optimal CMR technique to quantify mitral regurgitation. RF(FLOW) also has the advantage of allowing correction for aortic regurgitation when it is present, and is potentially independent of the effects of tricuspid and pulmonary regurgitation.
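The two regurgitant-fraction definitions in the METHODS section translate directly into code. The volumes below are hypothetical and serve only to show that the two techniques can disagree when RVSV and aortic flow differ.

```python
def regurgitant_fractions(lvsv, rvsv, aortic_flow):
    """Mitral regurgitant fraction by the two CMR techniques, each as a fraction of LVSV.
    All volumes in ml."""
    rf_vol = (lvsv - rvsv) / lvsv          # RF(VOL): volumetric technique
    rf_flow = (lvsv - aortic_flow) / lvsv  # RF(FLOW): aortic flow technique
    return rf_vol, rf_flow

rf_vol, rf_flow = regurgitant_fractions(lvsv=120.0, rvsv=75.0, aortic_flow=72.0)
print(rf_vol, rf_flow)  # 0.375 0.4
```

A 3 ml difference between RVSV and aortic flow shifts the estimate by 2.5 percentage points here, which is why the lower measurement variability of aortic flow matters for RF(FLOW).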

Subjects

Magnetic Resonance Imaging, Cine, Mitral Valve Insufficiency/diagnostic imaging, Mitral Valve Insufficiency/physiopathology, Adult, Aged, Aortic Valve Insufficiency/diagnostic imaging, Aortic Valve Insufficiency/epidemiology, Aortic Valve Insufficiency/physiopathology, Blood Flow Velocity/physiology, Female, Heart Ventricles/diagnostic imaging, Heart Ventricles/physiopathology, Humans, Male, Middle Aged, Mitral Valve Insufficiency/epidemiology, Observer Variation, Pulmonary Valve Insufficiency/diagnostic imaging, Pulmonary Valve Insufficiency/epidemiology, Pulmonary Valve Insufficiency/physiopathology, Radiography, Statistics as Topic, Stroke Volume/physiology

20.

Cancer survival classification using integrated data sets and intermediate information.

Kim, Shinuk; Park, Taesung; Kon, Mark.

Artif Intell Med ; 62(1): 23-31, 2014 Sep.

Article in English | MEDLINE | ID: mdl-24997860

ABSTRACT

OBJECTIVE: Although numerous studies related to cancer survival have been published, increasing the prediction accuracy of survival classes remains a challenge. Integrating different data sets, such as microRNA (miRNA) and mRNA expression profiles, might increase the accuracy of survival class prediction. We therefore proposed a machine learning (ML) approach to integrate different data sets and developed a novel method based on feature selection with the Cox proportional hazards regression model (FSCOX) to improve the prediction of cancer survival time. METHODS: FSCOX provides intermediate survival information, which is usually discarded when survival is separated into 2 groups (short- and long-term), and allows survival analysis to be performed. We used an ML-based protocol for feature selection, integrating information from miRNA and mRNA expression profiles at the feature level. To predict survival phenotypes, we used the following classifiers: first, the existing ML methods support vector machine (SVM) and random forest (RF); second, a new median-based classifier using FSCOX (FSCOX_median); and third, an SVM classifier using FSCOX (FSCOX_SVM). We compared these methods using 3 types of cancer tissue data sets: (i) miRNA expression, (ii) mRNA expression, and (iii) combined miRNA and mRNA expression. The latter data set included features selected either from the combined miRNA/mRNA profile or independently from the miRNA and mRNA profiles (independent feature selection, IFS). RESULTS: In the ovarian data set, a balanced set of 22 short-term and 22 long-term survivors, the accuracy of survival classification using the combined miRNA/mRNA profiles with IFS was 75% using RF, 86.36% using SVM, 84.09% using FSCOX_median, and 88.64% using FSCOX_SVM. These accuracies are higher than those using miRNA alone (70.45%, RF; 75%, SVM; 75%, FSCOX_median; and 75%, FSCOX_SVM) or mRNA alone (65.91%, RF; 63.64%, SVM; 72.73%, FSCOX_median; and 70.45%, FSCOX_SVM).
Similarly, in the glioblastoma multiforme data set, the accuracy using the combined miRNA/mRNA profiles with IFS was 75.51% (RF), 87.76% (SVM), 85.71% (FSCOX_median), and 85.71% (FSCOX_SVM). These results are higher than those obtained using miRNA or mRNA expression alone. In addition, we predicted 16 hsa-miR-23b and hsa-miR-27b target genes in the ovarian cancer data sets, obtained by SVM-based feature selection through integration of sequence information and gene expression profiles. CONCLUSION: Among the approaches used, the integrated miRNA and mRNA data set yielded better results than the individual data sets. The best performance was achieved with the FSCOX_SVM method with independent feature selection, which uses intermediate survival information between short-term and long-term survival times and combines the 2 different data sets. The results obtained using the combined data set suggest that there are some strong interactions between miRNA and mRNA features that are not detectable in the individual analyses.
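Because the ovarian set contains 44 samples (22 short-term and 22 long-term survivors), each reported accuracy should correspond to a whole number of correctly classified samples, and one misclassification shifts accuracy by 100/44 ≈ 2.27 percentage points. The sketch below recovers those implied counts. The helper name is hypothetical, and it assumes the reported accuracies were rounded to two decimals (75% read as 75.00).

```python
N_OVARIAN = 44  # 22 short-term + 22 long-term survivors, as stated in the abstract

# Accuracies (%) reported for the combined miRNA/mRNA profiles with IFS.
COMBINED_IFS = {"RF": 75.00, "SVM": 86.36, "FSCOX_median": 84.09, "FSCOX_SVM": 88.64}


def implied_correct(accuracy_pct: float, n: int) -> int:
    """Return the whole count k for which 100*k/n rounds to accuracy_pct (2 decimals).

    Raises ValueError if no whole count out of n is consistent with the value.
    """
    k = round(accuracy_pct / 100 * n)
    if abs(100 * k / n - accuracy_pct) > 0.005 + 1e-9:
        raise ValueError(f"{accuracy_pct}% is not k/{n} for any whole k")
    return k


if __name__ == "__main__":
    print(f"one sample = {100 / N_OVARIAN:.2f} percentage points")
    for method, pct in COMBINED_IFS.items():
        print(f"{method}: {implied_correct(pct, N_OVARIAN)}/{N_OVARIAN} correct")
```

Read this way, FSCOX_SVM (39/44) and SVM (38/44) on the combined ovarian profiles differ by a single correctly classified sample.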
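The difference between selecting features from the combined miRNA/mRNA profile and selecting them independently from each profile (IFS) can be illustrated with a toy ranking. This is a hypothetical sketch, not the authors' ML-based protocol: the abstract does not give the scoring rule, so each feature is simply assumed to carry a precomputed relevance score, and all feature names and scores below are invented.

```python
def top_k(scores: dict[str, float], k: int) -> list[str]:
    """Names of the k highest-scoring features (ties keep insertion order)."""
    return sorted(scores, key=scores.__getitem__, reverse=True)[:k]


def combined_selection(mirna: dict[str, float], mrna: dict[str, float], k: int) -> list[str]:
    """Rank miRNA and mRNA features together and keep the top k overall."""
    if mirna.keys() & mrna.keys():
        raise ValueError("feature names must be unique across profiles")
    return top_k({**mirna, **mrna}, k)


def independent_selection(
    mirna: dict[str, float], mrna: dict[str, float], k_mirna: int, k_mrna: int
) -> list[str]:
    """IFS-style: keep the top features of each profile separately, then concatenate."""
    return top_k(mirna, k_mirna) + top_k(mrna, k_mrna)


if __name__ == "__main__":
    # Hypothetical relevance scores; names are placeholders, not real features.
    mirna = {"miR-A": 0.40, "miR-B": 0.20, "miR-C": 0.10}
    mrna = {"GENE1": 0.90, "GENE2": 0.80, "GENE3": 0.50}
    print(combined_selection(mirna, mrna, 4))
    print(independent_selection(mirna, mrna, 2, 2))
```

When one profile's scores are systematically larger, as in this toy example, the combined ranking keeps mostly that profile's features, whereas per-profile selection guarantees that both profiles are represented.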

Subjects

Artificial Intelligence, Brain Neoplasms/mortality, Glioblastoma/mortality, Ovarian Neoplasms/mortality, Algorithms, Brain Neoplasms/genetics, Brain Neoplasms/metabolism, Datasets as Topic, Female, Glioblastoma/genetics, Glioblastoma/metabolism, Humans, MicroRNAs/metabolism, Ovarian Neoplasms/genetics, Ovarian Neoplasms/metabolism, Proportional Hazards Models, RNA, Messenger/metabolism, Sensitivity and Specificity, Survival Rate
