Results 1 - 6 of 6
1.
J Chem Inf Model ; 60(12): 5995-6006, 2020 Dec 28.
Article in English | MEDLINE | ID: mdl-33140954

ABSTRACT

Semi-supervised learning has proved effective at exploiting extensive unlabeled data to reduce the need for large amounts of labeled data and to improve model performance. Despite its tremendous potential, semi-supervised learning has yet to be implemented in the field of drug discovery. Empirical testing of drugs and their classification is costly and time-consuming. In contrast, predicting therapeutic applications of drugs from their structural formulas using semi-supervised learning would reduce costs and time significantly. Herein, we employ a new multicontrastive-based semi-supervised learning algorithm, MultiCon, for classifying drugs into 12 categories, according to therapeutic applications, on the basis of image analyses of their structural formulas. By rational use of data balancing, online augmentations of the drug image data during training, and the combined use of multicontrastive loss with consistency regularization, MultiCon achieves better class prediction accuracies than state-of-the-art machine learning methods across a variety of existing semi-supervised learning benchmarks. In particular, it performs exceptionally well with a limited number of labeled examples. For instance, with just 5000 labeled drugs in a PubChem (D3) data set, MultiCon achieved a class prediction accuracy of 97.74%.


Subjects
Pharmaceutical Preparations , Supervised Machine Learning , Algorithms
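
The abstract above names two loss terms, a contrastive loss and consistency regularization, without giving their formulas. The following is a minimal sketch of generic versions of both ideas, not MultiCon's actual losses: the function names, the squared-difference consistency term, the pairwise margin form, and the default `margin` are all assumptions made for illustration.

```python
import math

def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def consistency_loss(logits_a, logits_b):
    """Mean squared difference between predictions on two augmented views.

    Assumed form: the model should predict the same class distribution
    for two augmentations of the same unlabeled image.
    """
    p, q = softmax(logits_a), softmax(logits_b)
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q)) / len(p)

def contrastive_loss(emb_a, emb_b, same_class, margin=1.0):
    """Pairwise margin loss (assumed form, not the paper's multicontrastive loss).

    Same-class embeddings are pulled together; different-class embeddings
    are pushed at least `margin` apart.
    """
    dist = math.dist(emb_a, emb_b)
    if same_class:
        return dist ** 2
    return max(0.0, margin - dist) ** 2
```

In a training loop, terms like these would typically be added to the supervised loss on the labeled examples with weighting factors; the abstract does not say how MultiCon combines them.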
2.
Phys Eng Sci Med ; 45(1): 31-42, 2022 Mar.
Article in English | MEDLINE | ID: mdl-34780042

ABSTRACT

COVID-19 is an infectious disease that has adversely affected public health and the economy across the world. On account of the highly infectious nature of the disease, rapid automated diagnosis of COVID-19 is urgently needed. A few recent findings suggest that chest X-rays and CT scans can be used by machine learning for the diagnosis of COVID-19. Herein, we employed semi-supervised learning (SSL) approaches to detect COVID-19 cases accurately by analyzing digital chest X-rays and CT scans. On a relatively small COVID-19 radiography dataset, which contains only 219 COVID-19 positive images, 1341 normal images, and 1345 viral pneumonia images, our algorithm, COVIDCon, which takes advantage of data augmentation, consistency regularization, and multicontrastive learning, attains 97.07% average class prediction accuracy with 1000 labeled images, which is 7.65% better than the next best SSL method, virtual adversarial training. COVIDCon performs even better on a larger COVID-19 CT scan dataset that contains 82,767 images. It achieved an excellent accuracy of 99.13% at 20,000 labels, which is 6.45% better than the next best approach, pseudo-labeling. COVIDCon outperforms other state-of-the-art algorithms at every label count that we investigated. These results demonstrate COVIDCon as the benchmark SSL algorithm for potential diagnosis of COVID-19 from chest X-rays and CT scans. Furthermore, COVIDCon performs exceptionally well in identifying COVID-19 positive cases from a completely unseen repository with a confirmed COVID-19 case history. COVIDCon may provide a fast, accurate, and reliable method for screening COVID-19 patients.


Subjects
COVID-19 , Deep Learning , COVID-19/diagnostic imaging , Humans , SARS-CoV-2 , Supervised Machine Learning , Tomography, X-Ray Computed/methods , X-Rays
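
Pseudo-labeling, which the abstract above cites as a baseline SSL method, can be illustrated with a short sketch. This shows the general technique only, not COVIDCon and not the specific baseline the authors ran: the function name, the 0.95 confidence threshold, and the example probabilities are assumptions.

```python
def pseudo_label(probabilities, threshold=0.95):
    """Return (sample_index, class_index) pairs for confident predictions.

    `probabilities` holds one list of class probabilities per unlabeled
    image. Only samples whose top probability reaches `threshold` receive
    a pseudo-label; the rest stay unlabeled.
    """
    selected = []
    for index, probs in enumerate(probabilities):
        confidence = max(probs)
        if confidence >= threshold:
            selected.append((index, probs.index(confidence)))
    return selected

# Hypothetical model outputs for three classes:
# 0 = COVID-19, 1 = normal, 2 = viral pneumonia.
unlabeled_predictions = [
    [0.97, 0.02, 0.01],  # confident: pseudo-labeled as class 0
    [0.40, 0.35, 0.25],  # uncertain: left unlabeled
    [0.01, 0.03, 0.96],  # confident: pseudo-labeled as class 2
]
confident = pseudo_label(unlabeled_predictions)
```

The selected samples would then be added to the labeled pool for the next training round; a higher threshold trades coverage for label quality.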
3.
Commun Med (Lond) ; 2: 134, 2022.
Article in English | MEDLINE | ID: mdl-36317054

ABSTRACT

Background: The intensity of transmission of Aedes-borne viruses is heterogeneous, and multiple factors can contribute to variation at small spatial scales. Illuminating drivers of heterogeneity in prevalence over time and space would provide information for public health authorities. The objective of this study is to detect spatiotemporal clusters and determine the risk factors of three major Aedes-borne diseases in Mexico, those caused by Chikungunya virus (CHIKV), Dengue virus (DENV), and Zika virus (ZIKV). Methods: We present an integrated analysis of Aedes-borne diseases (ABDs), the local climate, and the socio-demographic profiles of 2469 municipalities in Mexico. We used SaTScan to detect spatial clusters, and the Pearson correlation coefficient, Randomized Dependence Coefficient, and SHapley Additive exPlanations to analyze the influence of socio-demographic and climatic factors on the prevalence of ABDs. We also compared six machine learning techniques (XGBoost, decision tree, Support Vector Machine with Radial Basis Function kernel, K nearest neighbors, random forest, and neural network) to predict risk factors of ABD clusters. Results: DENV is the most prevalent of the three diseases throughout Mexico, with nearly 60.6% of the municipalities reporting DENV cases. For some spatiotemporal clusters, the influence of socio-economic attributes is larger than the influence of climate attributes for predicting the prevalence of ABDs. XGBoost performs the best in terms of precision-measure for ABD prevalence. Conclusions: Both socio-demographic and climatic factors influence ABD transmission in different regions of Mexico. Future studies should build predictive models supporting early warning systems to anticipate the time and location of ABD outbreaks, determine the stand-alone influence of individual risk factors, and establish causal mechanisms.
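
Of the association measures listed in the Methods above, the Pearson correlation coefficient is the simplest to show concretely. The sketch below computes it from its definition; the function name and the rainfall and prevalence values are hypothetical and are not taken from the study.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical municipality-level values, for illustration only.
rainfall_mm = [120.0, 300.0, 450.0, 80.0, 510.0]
cases_per_100k = [14.0, 40.0, 51.0, 9.0, 66.0]
r = pearson(rainfall_mm, cases_per_100k)
```

Pearson's coefficient captures only linear association, which may be why the abstract pairs it with the Randomized Dependence Coefficient and SHAP values.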

4.
Travel Med Infect Dis ; 49: 102360, 2022.
Article in English | MEDLINE | ID: mdl-35644475

ABSTRACT

Surveillance is a critical component of any dengue prevention and control program. There is an increasing effort to use drones in mosquito control surveillance. Because drones are novel, data are scarce on the impact and community acceptance of their use to collect health-related data. The use of drones raises concerns about the protection of human privacy. Here, we show how willingness to be trained and acceptance of drone use in tech-savvy communities can help further discussions in mosquito surveillance. A cross-sectional study was conducted in Malaysia, Mexico, and Turkey to assess knowledge of diseases caused by Aedes mosquitoes, perceptions about drone use for data collection, and acceptance of drones for Aedes mosquito surveillance around homes. Compared with people living in Turkey, Mexicans had 14.3 times the odds (p < 0.0001) and Malaysians had 4.0 times the odds (p = 0.7030) of being willing to download a mosquito surveillance app. Compared with urban dwellers, rural dwellers had 1.56 times the odds of being willing to be trained. There is widespread community support for drone use in mosquito surveillance, and this community buy-in suggests a potential for success in mosquito surveillance using drones. A successful surveillance and community engagement system may be used to monitor a variety of mosquito spp. Future research should include qualitative interview data to add context to these findings.


Subjects
Aedes , Dengue , Animals , Cross-Sectional Studies , Dengue/epidemiology , Dengue/prevention & control , Humans , Malaysia , Mexico , Turkey , Unmanned Aerial Devices
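
The comparisons in the abstract above are reported as odds ratios. As a reminder of what "times the odds" means, here is a minimal sketch that computes an unadjusted odds ratio from a 2x2 table. The abstract does not state how its odds ratios were estimated, so this is not the study's procedure, and the counts below are invented for illustration.

```python
def odds_ratio(group_yes, group_no, reference_yes, reference_no):
    """Unadjusted odds ratio of a 'yes' outcome in a group versus a reference.

    Odds are yes/no within each group; the ratio compares the two odds.
    """
    if min(group_no, reference_yes, reference_no) <= 0:
        raise ValueError("counts in the denominators must be positive")
    group_odds = group_yes / group_no
    reference_odds = reference_yes / reference_no
    return group_odds / reference_odds

# Hypothetical counts: 30 of 40 respondents willing in one group,
# 20 of 40 willing in the reference group.
example = odds_ratio(group_yes=30, group_no=10, reference_yes=20, reference_no=20)
```

An odds ratio above 1 means the outcome is more likely in the group than in the reference, but it is not the same as a ratio of probabilities: here the probabilities are 0.75 and 0.50, while the odds ratio is 3.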
5.
BMC Bioinformatics ; 8: 299, 2007 Aug 14.
Article in English | MEDLINE | ID: mdl-17697349

ABSTRACT

BACKGROUND: Large-scale sequencing of entire genomes has ushered in a new age in biology. One of the next grand challenges is to dissect the cellular networks consisting of many individual functional modules. Defining co-expression networks without ambiguity based on genome-wide microarray data is difficult, and current methods are neither robust nor consistent across data sets. This is particularly problematic for little-understood organisms, since not much existing biological knowledge can be exploited for determining the threshold to differentiate true correlation from random noise. Random matrix theory (RMT), which has been widely and successfully used in physics, is a powerful approach to distinguish system-specific, non-random properties embedded in complex systems from random noise. Here, we have hypothesized that the universal predictions of RMT are also applicable to biological systems and that the correlation threshold can be determined by characterizing the correlation matrix of microarray profiles using random matrix theory. RESULTS: Application of random matrix theory to microarray data of S. oneidensis, E. coli, yeast, A. thaliana, Drosophila, mouse and human indicates that there is a sharp transition in the nearest neighbour spacing distribution (NNSD) of the correlation matrix after gradually removing certain elements inside the matrix. Testing on an in silico modular model has demonstrated that this transition can be used to determine the correlation threshold for revealing modular co-expression networks. The co-expression network derived from yeast cell cycling microarray data is supported by gene annotation. The topological properties of the resulting co-expression network agree well with the general properties of biological networks. Computational evaluations have shown that the RMT approach is sensitive and robust. Furthermore, evaluation on sampled expression data of an in silico modular gene system has shown that under-sampled expressions do not affect the recovery of the gene co-expression network. Moreover, the cellular roles of 215 functionally unknown genes from yeast, E. coli and S. oneidensis are predicted by the gene co-expression networks using the guilt-by-association principle, many of which are supported by existing information or our experimental verification, further demonstrating the reliability of this approach for gene function prediction. CONCLUSION: Our rigorous analysis of gene expression microarray profiles using RMT has shown that the transition of the NNSD of the correlation matrix of microarray profiles provides a profound theoretical criterion to determine the correlation threshold for identifying gene co-expression networks.


Subjects
Algorithms , Gene Expression Profiling/methods , Models, Biological , Multigene Family/physiology , Oligonucleotide Array Sequence Analysis/methods , Proteome/metabolism , Signal Transduction/physiology , Computer Simulation
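
Once a correlation threshold is chosen, turning a correlation matrix into co-expression modules is a simple graph operation. The sketch below shows only that final step and takes the threshold as a hand-supplied input; it does not implement the RMT/NNSD procedure the abstract uses to select the threshold. The function name and the toy matrix are assumptions.

```python
def coexpression_modules(corr, threshold):
    """Group genes into modules by keeping links with |correlation| >= threshold.

    `corr` is a square, symmetric matrix (list of lists). Modules are the
    connected components of the thresholded network, as sorted index lists.
    """
    n = len(corr)
    neighbors = {
        i: [j for j in range(n) if j != i and abs(corr[i][j]) >= threshold]
        for i in range(n)
    }
    seen, modules = set(), []
    for start in range(n):
        if start in seen:
            continue
        # Depth-first search collects every gene reachable from `start`.
        stack, module = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            module.append(node)
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        modules.append(sorted(module))
    return modules

# Toy correlation matrix for four hypothetical genes.
toy_corr = [
    [1.00, 0.90, 0.10, 0.20],
    [0.90, 1.00, 0.15, 0.10],
    [0.10, 0.15, 1.00, 0.85],
    [0.20, 0.10, 0.85, 1.00],
]
modules = coexpression_modules(toy_corr, threshold=0.8)
```

With a threshold of 0.8 the toy genes split into two modules, while a very low threshold merges everything into one, which is why a principled way of choosing the threshold matters.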
6.
Bioinformatics ; 20(16): 2605-17, 2004 Nov 01.
Article in English | MEDLINE | ID: mdl-15130935

ABSTRACT

MOTIVATION: The increasing use of microarray technologies is generating large amounts of data that must be processed in order to extract useful and rational fundamental patterns of gene expression. Hierarchical clustering is one method used to analyze gene expression data, but traditional hierarchical clustering algorithms suffer from several drawbacks (e.g. fixed topology structure; mis-clustered data which cannot be reevaluated). In this paper, we introduce a new hierarchical clustering algorithm that overcomes some of these drawbacks. RESULT: We propose a new tree-structured self-organizing neural network, called the dynamically growing self-organizing tree (DGSOT) algorithm, for hierarchical clustering. The DGSOT constructs a hierarchy from top to bottom by division. At each hierarchical level, the DGSOT optimizes the number of clusters, from which the proper hierarchical structure of the underlying dataset can be found. In addition, we propose a new cluster validation criterion based on the geometric property of the Voronoi partition of the dataset in order to find the proper number of clusters at each hierarchical level. This criterion uses the Minimum Spanning Tree (MST) concept of graph theory and is computationally inexpensive for large datasets. A K-level up distribution (KLD) mechanism, which increases the scope of data distribution in the hierarchy construction, was used to improve the clustering accuracy. The KLD mechanism allows the data misclustered in the early stages to be reevaluated at a later stage and increases the accuracy of the final clustering result. The clustering result of the DGSOT is easily displayed as a dendrogram for visualization. Based on a yeast cell cycle microarray expression dataset, we found that our algorithm extracts gene expression patterns at different levels. Furthermore, the biological functionality enrichment in the clusters is considerably high and the hierarchical structure of the clusters is more reasonable.
AVAILABILITY: DGSOT is available upon request from the authors.


Subjects
Algorithms , Artificial Intelligence , Cell Cycle Proteins/genetics , Cluster Analysis , Gene Expression Profiling/methods , Oligonucleotide Array Sequence Analysis/methods , Saccharomyces cerevisiae Proteins/genetics
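
The abstract above says the cluster validation criterion relies on the Minimum Spanning Tree concept but does not define the criterion itself. The sketch below therefore shows only the MST building block, using Prim's algorithm on Euclidean points; it is not the DGSOT criterion, and the function name and sample points are assumptions.

```python
import math

def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph over `points`.

    Returns a list of (i, j, length) edges, where i is already in the tree
    and j is the point being attached.
    """
    n = len(points)
    if n == 0:
        return []
    # best[j] = (shortest known distance from the tree to j, tree endpoint)
    best = {j: (math.dist(points[0], points[j]), 0) for j in range(1, n)}
    edges = []
    while best:
        j = min(best, key=lambda k: best[k][0])  # closest point outside the tree
        length, i = best.pop(j)
        edges.append((i, j, length))
        # The newly attached point may offer shorter links to remaining points.
        for k in best:
            d = math.dist(points[j], points[k])
            if d < best[k][0]:
                best[k] = (d, j)
    return edges

# Two hypothetical, well-separated groups of points.
sample_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
mst_edges = minimum_spanning_tree(sample_points)
total_length = sum(length for _, _, length in mst_edges)
longest_edge = max(length for _, _, length in mst_edges)
```

In this toy example the single long MST edge is the one bridging the two groups, which gives an intuition for why MST edge lengths can carry information about cluster separation.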