1.
Bioinformatics ; 37(23): 4366-4374, 2021 12 07.
Article in English | MEDLINE | ID: mdl-34247234

ABSTRACT

MOTIVATION: X-ray crystallography was used to produce nearly 90% of protein structures. These efforts were supported by numerous sequence-based tools that accurately predict crystallizable proteins. However, protein structures vary widely in their quality, typically measured with resolution and R-free. This impacts the ability to use these structures for some applications, including rational drug design and molecular docking, and motivates the development of methods that accurately predict structure quality from sequence. RESULTS: We introduce XRRpred, the first predictor of the resolution and R-free values from protein sequences. XRRpred relies on original sequence profiles, hand-crafted features, empirically selected and parametrized regressors and modern resampling techniques. Using an independent test dataset, we show that XRRpred provides accurate predictions of resolution and R-free. We demonstrate that XRRpred's predictions correctly model the relationship between the resolution and R-free and reproduce structure quality relations between structural classes of proteins. We also show that XRRpred significantly outperforms indirect alternative ways to predict the structure quality that include predictors of crystallization propensity and an alignment-based approach. XRRpred is available as a convenient webserver that allows batch predictions and offers informative visualization of the results. AVAILABILITY AND IMPLEMENTATION: http://biomine.cs.vcu.edu/servers/XRRPred/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Proteins , Molecular Docking Simulation , Proteins/chemistry , Amino Acid Sequence , X-Ray Crystallography , Crystallization
2.
J Biomed Inform ; 109: 103527, 2020 09.
Article in English | MEDLINE | ID: mdl-32777484

ABSTRACT

PURPOSE: To present a Machine Learning pipeline for automatically relabeling anatomical structure sets in the Digital Imaging and Communications in Medicine (DICOM) format to a standard nomenclature that will enable data abstraction for research and quality improvement. METHODS: DICOM structure sets from approximately 1200 lung and prostate cancer patients across 40 treatment centers were used to build predictive models to automate the relabeling of clinically specified structure labels to standardized labels as defined by the American Association of Physicists in Medicine's (AAPM) Task Group 263 (TG-263). Volumetric bitmaps were created based on the delineated volumes and were combined with associated bony anatomy data to build feature vectors. Feature reduction was performed with singular value decomposition and the resulting vectors were used for predicting the label of each structure using five different classifier algorithms on the Apache Spark platform with 5-fold cross-validation. Undersampling methods were used to deal with the underlying class imbalance that hindered the performance of classifiers. Experiments were performed on both a curated version of the data, which included only annotated structures, and the non-curated data that included all structures from the original treatment plans. RESULTS: Random Forest provided the highest accuracies with F1 scores of 98.77 for lung and 95.06 for prostate on the curated data sets. Scores were lower with 95.67 for lung and 90.22 for prostate on the non-curated data sets, highlighting some of the challenges of classifying real clinical data. Including bony anatomy data and pooling information from all structures for the same patient both increased accuracies. Undersampling with k-Means clustering for class balancing improved classifier accuracy only in some cases, but in all experiments it significantly reduced run time compared to random undersampling.
CONCLUSION: This work shows that structure sets can be relabeled using our approach with accuracies over 95% for many structure types when presented with curated data. Although accuracies dropped when using the full non-curated data sets, some structure types were still correctly labeled over 90% of the time. With similar results obtained on an external test data set, we can infer that the proposed models are likely to work on other clinical data sets.


Subjects
Algorithms , Machine Learning , Cluster Analysis , Humans , Male
3.
IEEE Trans Neural Netw Learn Syst ; 34(9): 6390-6404, 2023 Sep.
Article in English | MEDLINE | ID: mdl-35085094

ABSTRACT

Despite over two decades of progress, imbalanced data is still considered a significant challenge for contemporary machine learning models. Modern advances in deep learning have further magnified the importance of the imbalanced data problem, especially when learning from images. Therefore, there is a need for an oversampling method that is specifically tailored to deep learning models, can work on raw images while preserving their properties, and is capable of generating high-quality, artificial images that can enhance minority classes and balance the training set. We propose DeepSMOTE, a novel oversampling algorithm for deep learning models that leverages the properties of the successful synthetic minority oversampling technique (SMOTE) algorithm. It is simple, yet effective in its design. It consists of three major components: 1) an encoder/decoder framework; 2) SMOTE-based oversampling; and 3) a dedicated loss function that is enhanced with a penalty term. An important advantage of DeepSMOTE over generative adversarial network (GAN)-based oversampling is that DeepSMOTE does not require a discriminator, and it generates high-quality artificial images that are both information-rich and suitable for visual inspection. DeepSMOTE code is publicly available at https://github.com/dd1github/DeepSMOTE.
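
The SMOTE-based oversampling component named in this abstract builds on the classic SMOTE interpolation step, which can be sketched as follows. This is a minimal illustration of plain SMOTE on small vectors, not the authors' DeepSMOTE code: the function name `smote_sample`, its parameters, and the toy data are assumptions made for illustration, and treating the vectors as encoded minority-class samples is likewise an assumption, since the abstract does not say where in the encoder/decoder framework the interpolation is applied.

```python
import random

def smote_sample(minority, k=2, n_new=4, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    point and one of its k nearest minority neighbours (classic SMOTE)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        # The new point lies on the segment between base and neighbour.
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Hypothetical minority-class vectors (corners of the unit square).
latent_minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_sample(latent_minority)
```

Because every synthetic point is a convex combination of two existing minority points, the generated samples stay inside the region spanned by the minority class.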

4.
Mach Learn ; : 1-36, 2022 Jun 02.
Article in English | MEDLINE | ID: mdl-35668720

ABSTRACT

Continuous learning from streaming data is among the most challenging topics in contemporary machine learning. In this domain, learning algorithms must not only be able to handle massive volumes of rapidly arriving data, but also adapt themselves to potential emerging changes. The evolving nature of data streams is known as concept drift. While there is a plethora of methods designed for detecting its occurrence, all of them assume that the drift is connected with underlying changes in the source of data. However, one must consider the possibility of a malicious injection of false data that simulates a concept drift. This adversarial setting assumes a poisoning attack that may be conducted in order to damage the underlying classification system by forcing an adaptation to false data. Existing drift detectors are not capable of differentiating between real and adversarial concept drift. In this paper, we propose a framework for robust concept drift detection in the presence of adversarial and poisoning attacks. We introduce a taxonomy for two types of adversarial concept drifts, as well as a robust trainable drift detector. It is based on the augmented restricted Boltzmann machine with improved gradient computation and energy function. We also introduce Relative Loss of Robustness, a novel measure for evaluating the performance of concept drift detectors under poisoning attacks. Extensive computational experiments, conducted on both fully and sparsely labeled data streams, prove the high robustness and efficacy of the proposed drift detection framework in adversarial scenarios.

5.
IEEE Trans Neural Netw Learn Syst ; 31(8): 2818-2831, 2020 Aug.
Article in English | MEDLINE | ID: mdl-31247563

ABSTRACT

Learning from imbalanced data is among the most popular topics in contemporary machine learning. However, the vast majority of attention in this field is given to binary problems, while their much more difficult multiclass counterparts are relatively unexplored. Handling data sets with multiple skewed classes poses various challenges and calls for a better understanding of the relationship among classes. In this paper, we propose multiclass radial-based oversampling (MC-RBO), a novel data-sampling algorithm dedicated to multiclass problems. The main novelty of our method lies in using potential functions for generating artificial instances. We take into account information coming from all of the classes, contrary to existing multiclass oversampling approaches that use only minority class characteristics. The process of artificial instance generation is guided by exploring areas where the value of the mutual class distribution is very small. This way, we ensure a smart oversampling procedure that can cope with difficult data distributions and alleviate the shortcomings of existing methods. The usefulness of the MC-RBO algorithm is evaluated on the basis of an extensive experimental study and backed up with a thorough statistical analysis. The obtained results show that by taking into account information coming from all of the classes and conducting a smart oversampling, we can significantly improve the process of learning from multiclass imbalanced data.

6.
Artif Intell Med ; 65(3): 219-27, 2015 Nov.
Article in English | MEDLINE | ID: mdl-26319694

ABSTRACT

OBJECTIVES: Early recognition of breast cancer, the most commonly diagnosed form of cancer in women, is of crucial importance, given that it leads to significantly improved chances of survival. Medical thermography, which uses an infrared camera for thermal imaging, has been demonstrated as a particularly useful technique for early diagnosis, because it detects smaller tumors than the standard modality of mammography. METHODS AND MATERIAL: In this paper, we analyse breast thermograms by extracting features describing bilateral symmetries between the two breast areas, and present a classification system for decision making. Clearly, the costs associated with missing a cancer case are much higher than those for mislabelling a benign case. At the same time, datasets contain significantly fewer malignant cases than benign ones. Standard classification approaches fail to consider either of these aspects. In this paper, we introduce a hybrid cost-sensitive classifier ensemble to address this challenging problem. Our approach entails a pool of cost-sensitive decision trees which assign a higher misclassification cost to the malignant class, thereby boosting its recognition rate. A genetic algorithm is employed for simultaneous feature selection and classifier fusion. As an optimisation criterion, we use a combination of misclassification cost and diversity to achieve both a high sensitivity and a heterogeneous ensemble. Furthermore, we prune our ensemble by discarding classifiers that contribute minimally to the decision making. RESULTS: For a challenging dataset of about 150 thermograms, our approach achieves an excellent sensitivity of 83.10%, while maintaining a high specificity of 89.44%. This not only signifies improved recognition of malignant cases, it also statistically outperforms other state-of-the-art algorithms designed for imbalanced classification, and hence provides an effective approach for analysing breast thermograms. 
CONCLUSIONS: Our proposed hybrid cost-sensitive ensemble can facilitate a highly accurate early diagnosis of breast cancer based on thermogram features. It overcomes the difficulties posed by the imbalanced distribution of patients in the two analysed groups.


Subjects
Breast Neoplasms/diagnosis , Decision Trees , Computer-Assisted Diagnosis/methods , Thermography/economics , Thermography/methods , Algorithms , Cost-Benefit Analysis , False Negative Reactions , False Positive Reactions , Female , Humans , Sensitivity and Specificity
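
The idea of assigning a higher misclassification cost to the malignant class, described in the abstract of this entry, can be illustrated with a generic minimum-expected-cost decision rule, together with the sensitivity and specificity measures the abstract reports. This is a sketch under stated assumptions, not the authors' ensemble: the function names, the cost values (a missed cancer costing 5 times a false alarm), and the toy labels are hypothetical.

```python
def min_cost_label(p_malignant, cost_fn=5.0, cost_fp=1.0):
    """Choose the label with the lower expected misclassification cost.
    cost_fn: cost of missing a malignant case (hypothetical value).
    cost_fp: cost of flagging a benign case (hypothetical value)."""
    expected_cost_if_benign = p_malignant * cost_fn
    expected_cost_if_malignant = (1.0 - p_malignant) * cost_fp
    if expected_cost_if_malignant < expected_cost_if_benign:
        return "malignant"
    return "benign"

def sensitivity_specificity(y_true, y_pred, positive="malignant"):
    """Return (sensitivity, specificity) for a binary labelling."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    return tp / (tp + fn), tn / (tn + fp)

# Toy example: two malignant and three benign cases.
y_true = ["malignant", "malignant", "benign", "benign", "benign"]
y_pred = ["malignant", "benign", "benign", "benign", "malignant"]
sens, spec = sensitivity_specificity(y_true, y_pred)
```

With these hypothetical costs, a case is labelled malignant once its estimated malignancy probability exceeds 1/6, which trades some specificity for higher sensitivity.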
7.
Int J Neural Syst ; 24(3): 1430007, 2014 May.
Article in English | MEDLINE | ID: mdl-24552506

ABSTRACT

Currently, methods of combined classification are the focus of intense research. A properly designed group of combined classifiers exploiting knowledge gathered in a pool of elementary classifiers can successfully outperform a single classifier. There are two essential issues to consider when creating combined classifiers: how to establish the most comprehensive pool and how to design a fusion model that allows for taking full advantage of the collected knowledge. In this work, we address these issues and propose AdaSS+, a training algorithm dedicated to the compound classifier system that effectively exploits local specialization of the elementary classifiers. The training procedure consists of two phases. The first phase detects the classifier competencies and adjusts the respective fusion parameters. The second phase boosts classification accuracy by elevating the degree of local specialization. The quality of the proposed algorithm is evaluated on the basis of a wide range of computer experiments that show that AdaSS+ can outperform the original method and several reference classifiers.


Subjects
Artificial Intelligence , Theoretical Models , Automated Pattern Recognition , Algorithms , Computer Simulation , Humans
8.
Health Informatics J ; 19(1): 3-15, 2013 Mar.
Article in English | MEDLINE | ID: mdl-23486822

ABSTRACT

Pattern recognition and machine learning methods provide an attractive approach for building decision support systems. Classification trees are frequently used algorithms for such tasks owing to their intuitive structure and effectiveness. It has been shown that for complex medical data, combining a number of base classifiers improves their overall accuracy. Classification tree ensembles have a certain number of free parameters to set, which can significantly affect their performance. In recent years such ensembles have often been used by practitioners without a mathematical background (e.g. physicians), who may be unaware of how to obtain the optimal settings. It is therefore difficult for them to choose satisfactory settings, while in most cases the default parameters proposed for them are not necessarily the most efficient. The aim of this article is to ascertain which types of combined tree classifiers give the best performance for medical decision support and which parameters should be chosen for them. A set of rules for end-users on how to tune their ensembles is proposed.


Subjects
Algorithms , Benchmarking/methods , Clinical Decision Support Systems/standards , Decision Trees , Organizational Efficiency , Artificial Intelligence , Humans , Knowledge Bases , Visual Pattern Recognition , User-Computer Interface
9.
Article in English | MEDLINE | ID: mdl-24111386

ABSTRACT

Thermal infrared imaging has been shown to be useful for diagnosing breast cancer, since it is able to detect small tumors and hence can lead to earlier diagnosis. In this paper, we present a computer-aided diagnosis approach for analysing breast thermograms. We extract image features that describe bilateral differences of the breast regions in the thermogram, and then feed these features to an ensemble classifier. For the classification, we present an extension to the Under-Sampling Balanced Ensemble (USBE) algorithm. USBE addresses the problem of imbalanced class distribution that is common in medical decision making by training different classifiers on different subspaces, where each subspace is created so as to resemble a balanced classification problem. To combine the individual classifiers, we use a neural fuser based on discriminants and apply a classifier selection procedure based on a pairwise double-fault diversity measure to discard irrelevant and similar classifiers. We demonstrate that our approach works well, and that it statistically outperforms various other ensemble approaches including the original USBE algorithm.


Subjects
Breast Neoplasms/diagnosis , Computer-Assisted Diagnosis/methods , Thermography , Algorithms , Breast Neoplasms/classification , Factual Databases , Discriminant Analysis , Female , Humans , Normal Distribution , Sensitivity and Specificity , Support Vector Machine
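
The pairwise double-fault diversity measure mentioned in the abstract of this entry is commonly defined as the fraction of samples that two classifiers both misclassify; lower values indicate a more complementary pair. The sketch below shows that standard definition only. The function name and the toy predictions are assumptions, and how the authors use the measure inside their classifier selection procedure is not described in the abstract.

```python
def double_fault(pred_a, pred_b, y_true):
    """Fraction of samples misclassified by BOTH classifiers.
    Lower values mean the two classifiers fail on different samples."""
    both_wrong = sum(
        1 for a, b, y in zip(pred_a, pred_b, y_true) if a != y and b != y
    )
    return both_wrong / len(y_true)

# Toy ground truth and predictions of three hypothetical classifiers.
y_true = [1, 0, 1, 0, 1]
clf_a = [1, 0, 0, 1, 1]  # wrong on samples 2 and 3
clf_b = [1, 1, 0, 0, 1]  # wrong on samples 1 and 2
clf_c = [0, 0, 1, 0, 0]  # wrong on samples 0 and 4

df_ab = double_fault(clf_a, clf_b, y_true)  # the pair shares one error
df_ac = double_fault(clf_a, clf_c, y_true)  # the pair shares no errors
```

Under this measure the pair (a, c) is more diverse than (a, b), so a selection step that discards similar classifiers would prefer to keep a and c together.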
10.
Article in English | MEDLINE | ID: mdl-24110975

ABSTRACT

Various connective tissue diseases lead to morphological alterations of blood capillaries. Consequently, observation of the capillaries at the finger nailfold - nailfold capillaroscopy (NC) - is a standard method for diagnosing diseases such as scleroderma or Raynaud's phenomenon. This is typically performed through manual inspection by an expert, who assigns one of the established NC scleroderma patterns (early, active, and late). In this paper, we present an automated method of analysing nailfold capillaroscopy images and categorising them into NC patterns. For this purpose, we extract a carefully chosen set of texture features from the images and employ an ensemble classification approach to arrive at decisions for each captured finger, which are then aggregated to form a diagnosis for the patient. Experimental results on a set of 60 NC images from 16 subjects demonstrate the accuracy and usefulness of our presented approach.


Subjects
Capillaries/pathology , Connective Tissue Diseases/pathology , Microscopic Angioscopy/methods , Nails/blood supply , Automated Pattern Recognition/methods , Case-Control Studies , Computer-Assisted Diagnosis/methods , Humans , Computer-Assisted Image Processing/methods , Raynaud Disease/pathology