Results 1 - 20 of 21
1.
Brief Bioinform ; 22(1): 474-484, 2021 01 18.
Article in English | MEDLINE | ID: mdl-31885044

ABSTRACT

BACKGROUND: With the rapid development of biotechnology and information technology, publicly available data in chemistry and biology are growing explosively. The wealth of information in these resources needs to be extracted and transformed into useful knowledge by data mining methods. However, a main computational challenge is how to effectively represent or encode the molecular objects under investigation, such as chemicals, proteins, DNAs and even complicated interactions, when data mining methods are employed. To further explore these complicated data, an integrated toolkit that represents different types of molecular objects and supports various data mining algorithms is urgently needed. RESULTS: We developed a freely available R/CRAN package, called BioMedR, for molecular representations of chemicals, proteins, DNAs and pairwise samples of their interactions. The current version of BioMedR can calculate 293 molecular descriptors and 13 kinds of molecular fingerprints for small molecules, 9920 protein descriptors based on protein sequences and six types of generalized scale-based descriptors for proteochemometric modeling, more than 6000 DNA descriptors from nucleotide sequences and six types of interaction descriptors using three different combining strategies. Moreover, the package implements five similarity calculation methods and four clustering algorithms as well as several auxiliary tools, with the aim of building an integrated analysis pipeline for data acquisition, data checking, descriptor calculation and data modeling. CONCLUSION: BioMedR provides a comprehensive and uniform R package that links different representations of molecular objects with each other and will benefit cheminformatics/bioinformatics and other biomedical users. It is available at: https://CRAN.R-project.org/package=BioMedR and https://github.com/wind22zhu/BioMedR/.


Subjects
Computational Biology/methods; Database Management Systems; Data Management/methods; Databases, Chemical; Databases, Genetic; Humans
2.
Brief Bioinform ; 22(4)2021 07 20.
Article in English | MEDLINE | ID: mdl-33313673

ABSTRACT

Although a wide variety of machine learning (ML) algorithms have been utilized to learn quantitative structure-activity relationships (QSARs), there is no agreed single best algorithm for QSAR learning. Therefore, a comprehensive understanding of the performance characteristics of popular ML algorithms used in QSAR learning is highly desirable. In this study, five linear algorithms [linear function Gaussian process regression (linear-GPR), linear function support vector machine (linear-SVM), partial least squares regression (PLSR), multiple linear regression (MLR) and principal component regression (PCR)], three analogizers [radial basis function support vector machine (rbf-SVM), K-nearest neighbor (KNN) and radial basis function Gaussian process regression (rbf-GPR)], six symbolists [extreme gradient boosting (XGBoost), Cubist, random forest (RF), multivariate adaptive regression splines (MARS), gradient boosting machine (GBM), and classification and regression tree (CART)] and two connectionists [principal component analysis artificial neural network (pca-ANN) and deep neural network (DNN)] were employed to learn regression-based QSAR models for 14 public data sets comprising nine physicochemical properties and five toxicity endpoints. The results show that rbf-SVM, rbf-GPR, XGBoost and DNN generally perform better than the other algorithms. The overall performance of the algorithms can be ranked from best to worst as follows: rbf-SVM > XGBoost > rbf-GPR > Cubist > GBM > DNN > RF > pca-ANN > MARS > linear-GPR ≈ KNN > linear-SVM ≈ PLSR > CART ≈ PCR ≈ MLR. In terms of prediction accuracy and computational efficiency, SVM and XGBoost are recommended for regression learning on small data sets, and XGBoost is an excellent choice for large data sets. We then investigated the performance of ensemble models that integrate the predictions of multiple ML algorithms. The results show that ensembles of two or three algorithms from different categories can indeed improve on the predictions of the best individual ML algorithms.
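
The idea of integrating predictions from several algorithms can be illustrated with a minimal sketch. It assumes simple unweighted averaging, which the abstract does not confirm as the integration scheme used in the study; the function names and all numbers below are hypothetical and chosen so that the errors of the two models cancel.

```python
from math import sqrt


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def average_ensemble(*prediction_lists):
    """Average the predictions of several models, sample by sample."""
    return [sum(preds) / len(preds) for preds in zip(*prediction_lists)]


# Hypothetical values for illustration only (not data from the study).
y_true = [1.0, 2.0, 3.0, 4.0]
model_a = [1.2, 2.2, 3.2, 4.2]  # a model that consistently overshoots
model_b = [0.8, 1.8, 2.8, 3.8]  # a model that consistently undershoots

ensemble = average_ensemble(model_a, model_b)
print(rmse(y_true, model_a), rmse(y_true, model_b), rmse(y_true, ensemble))
```

Averaging helps most when the individual models make errors in different directions, which is one plausible reading of why combining algorithms from different categories was reported to work.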


Subjects
Models, Biological; Neural Networks, Computer; Support Vector Machine; Animals; Cyprinidae; Daphnia; Tetrahymena pyriformis
3.
J Clin Pharm Ther ; 45(2): 318-323, 2020 Apr.
Article in English | MEDLINE | ID: mdl-31721244

ABSTRACT

WHAT IS KNOWN AND OBJECTIVE: Personalized treatment with tacrolimus has remained a challenge. The present study aimed to evaluate the potential of an integrative approach to predict individual tacrolimus concentrations and dosages based on endogenous CYP3A4 phenotype, CYP3A5 genotype and clinical variables. METHODS: A random forest (RF) algorithm that incorporated an endogenous CYP3A4 phenotype (assessed by the urinary ratio of 6β-hydroxycortisol and 6β-hydroxycortisone to cortisol and cortisone), CYP3A5*3 genotype and other clinical determinants of tacrolimus disposition was applied to 182 medically stable renal transplant recipients. RESULTS AND DISCUSSION: The results suggested that endogenous CYP3A4 phenotype was the most important determinant of tacrolimus concentrations and dose requirements. RF models provided high goodness of fit (R²) of 0.92 and 0.95 for the prediction of tacrolimus trough concentrations and dosages, respectively, as well as high predictability (Q²) of 0.63 and 0.70, respectively. Significant correlations existed between experimental and predicted data. WHAT IS NEW AND CONCLUSION: In summary, endogenous CYP3A4 phenotype is a critical biomarker for the determination of tacrolimus disposition. This predictive RF approach, based on the CYP3A4 biomarker in combination with CYP3A5*3 genotype and other clinical variables, can be used for predicting tacrolimus concentrations and dosages, and may serve as a useful tool in individualized tacrolimus dosing.
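
The difference between goodness of fit (R²) and predictability (Q²) can be shown with a short sketch. It assumes the common convention that Q² is the same statistic as R² computed on cross-validated predictions; the abstract does not state the exact definition or validation scheme used. A one-variable least-squares line stands in for the random forest, leave-one-out is an assumed validation scheme, and the data are hypothetical.

```python
def r_squared(y_true, y_pred):
    """1 - SS_res / SS_tot, with SS_tot taken around the mean of y_true."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def leave_one_out_predictions(xs, ys):
    """Predict each sample with a model fitted on all the other samples."""
    preds = []
    for i in range(len(xs)):
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        preds.append(slope * xs[i] + intercept)
    return preds


# Hypothetical data for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.1, 1.9, 3.2, 3.8, 5.1, 5.9]

slope, intercept = fit_line(xs, ys)
r2 = r_squared(ys, [slope * x + intercept for x in xs])  # fit on all data
q2 = r_squared(ys, leave_one_out_predictions(xs, ys))    # held-out predictions
print(r2, q2)
```

Because each held-out sample is predicted by a model that never saw it, Q² is the more conservative figure, which is consistent with the gap between the R² and Q² values reported above.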


Subjects
Cytochrome P-450 CYP3A/genetics; Immunosuppressive Agents/administration & dosage; Kidney Transplantation; Tacrolimus/administration & dosage; Adult; Algorithms; Asian People; Biomarkers/metabolism; Female; Genotype; Humans; Immunosuppressive Agents/pharmacokinetics; Male; Middle Aged; Phenotype; Tacrolimus/pharmacokinetics
4.
Kidney Blood Press Res ; 42(6): 1045-1052, 2017.
Article in English | MEDLINE | ID: mdl-29197864

ABSTRACT

BACKGROUND/AIMS: Renal biopsy is the gold standard to determine the pathologic type of primary nephrotic syndrome, which is critical for diagnosis, choice of treatment and evaluation of prognosis. However, in some cases, renal biopsy cannot be performed. METHODS: To explore the possibility of predicting the histology type of primary nephrotic syndrome without the need for biopsy, we trained and validated a machine learning algorithm using data from 222 patients with biopsy-confirmed primary nephrotic syndrome treated at our hospital between May 2008 and January 2016. The model was then tested prospectively on another sample of 63 patients with biopsy-confirmed primary nephrotic syndrome. RESULTS: Overall accuracy of prediction from the retrospective set of 222 patients was 62.2% across all types of nephrotic syndrome. The accuracy of model prediction for the prospectively collected dataset of 63 patients was 61.9%. The algorithm identified 17 of 33 variables as contributing strongly to type of renal pathology. CONCLUSION: To our knowledge, this is the first such application of machine learning to predict the pathologic type of primary nephrotic syndrome, which may be clinically useful by itself as well as helpful for guiding future efforts at machine learning-based prediction in other disease contexts.


Subjects
Machine Learning; Nephrotic Syndrome/diagnosis; Adult; Algorithms; Biopsy; Female; Humans; Male; Nephrotic Syndrome/pathology; Predictive Value of Tests; Prognosis
5.
Bioinformatics ; 31(11): 1857-9, 2015 Jun 01.
Article in English | MEDLINE | ID: mdl-25619996

ABSTRACT

UNLABELLED: Amino acid sequence-derived structural and physicochemical descriptors are extensively utilized for research on the structural, functional, expression and interaction profiles of proteins and peptides. We developed protr, a comprehensive R package for generating various numerical representation schemes of proteins and peptides from amino acid sequences. The package calculates eight descriptor groups composed of 22 types of commonly used descriptors that include about 22,700 descriptor values. It allows users to select amino acid properties from the AAindex database, and to use self-defined properties to construct customized descriptors. For proteochemometric modeling, it calculates six types of scales-based descriptors derived by various dimensionality reduction methods. The protr package also integrates similarity score computation, derived by protein sequence alignment and Gene Ontology semantic similarity measures, within a list of proteins, and calculates profile-based protein features based on position-specific scoring matrices. We also developed ProtrWeb, a user-friendly web server for calculating the descriptors presented in the protr package. AVAILABILITY AND IMPLEMENTATION: The protr package is freely available from CRAN: http://cran.r-project.org/package=protr. ProtrWeb is freely available at http://protrweb.scbdd.com/.
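
As a concrete example of a sequence-derived descriptor, the sketch below computes amino acid composition, the fraction of each of the 20 standard residues in a sequence. It is written in Python for illustration and is not the protr API (protr is an R package); the function name and the choice to reject non-standard residues are assumptions made here.

```python
from collections import Counter

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return a dict mapping each standard residue to its fraction in the sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence is empty")
    counts = Counter(seq)
    unknown = set(counts) - set(AMINO_ACIDS)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return {aa: counts[aa] / len(seq) for aa in AMINO_ACIDS}


# Hypothetical four-residue peptide, for illustration only.
composition = amino_acid_composition("MKKA")
print(composition["K"], composition["M"], composition["W"])
```

The result is a fixed-length vector of 20 values regardless of sequence length, which is what makes descriptors of this kind usable as input to standard machine learning models.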


Subjects
Peptides/chemistry; Proteins/chemistry; Sequence Analysis, Protein/methods; Software; Amino Acids/chemistry; Internet; Position-Specific Scoring Matrices; Protein Conformation; Sequence Alignment
6.
J Chem Inf Model ; 56(4): 763-73, 2016 04 25.
Article in English | MEDLINE | ID: mdl-27018227

ABSTRACT

The Caco-2 cell monolayer model is a popular in vitro surrogate for predicting the human intestinal permeability of a drug, owing to its morphological and functional similarity to human enterocytes. A quantitative structure-property relationship (QSPR) study was carried out to predict the Caco-2 cell permeability of a large data set consisting of 1272 compounds. Four different methods, namely multivariate linear regression (MLR), partial least-squares (PLS), support vector machine (SVM) regression and Boosting, were employed to build prediction models with 30 molecular descriptors selected by the nondominated sorting genetic algorithm-II (NSGA-II). The best Boosting model achieved R² = 0.97, RMSEF = 0.12, Q² = 0.83 and RMSECV = 0.31 for the training set, and RT² = 0.81 and RMSET = 0.31 for the test set. A series of validation methods were used to assess the robustness and predictive ability of our model according to the OECD principles and then to define its applicability domain. Compared with previously reported QSAR/QSPR models of Caco-2 cell permeability, our model has advantages in data set size and prediction accuracy. Finally, we found that the polar volume, the hydrogen bond donors, the surface area and some other descriptors influence Caco-2 permeability. These results suggest that the proposed model is a good tool for predicting the permeability of drug candidates and for performing virtual screening in the early stages of drug development.


Subjects
Absorption, Physicochemical; Drug Discovery/methods; Models, Molecular; Biological Availability; Caco-2 Cells; Humans; Molecular Conformation; Permeability; Quantitative Structure-Activity Relationship
7.
J Comput Aided Mol Des ; 30(5): 413-24, 2016 05.
Article in English | MEDLINE | ID: mdl-27167132

ABSTRACT

Drug-target interactions (DTIs) are central to current drug discovery processes and public health fields. Analyzing the DTI profile of a drug helps to infer drug indications, adverse drug reactions, drug-drug interactions, and drug modes of action. Therefore, it is highly important to predict the DTI profiles of drugs reliably and quickly on a genome-scale level. Here, we develop the TargetNet server, which can make real-time DTI predictions based only on molecular structures, following the spirit of multi-target SAR methodology. Naïve Bayes models together with various molecular fingerprints were employed to construct prediction models. Ensemble learning from these fingerprints was also provided to improve the prediction ability. When the user submits a molecule, the server predicts the activity of the user's molecule across 623 human proteins using the established high-quality SAR models, thus generating a DTI profile that can be used as a feature vector of chemicals for a wide range of applications. The 623 SAR models related to 623 human proteins were strictly evaluated and validated by several model validation strategies, resulting in AUC scores of 75-100%. We applied the generated DTI profiles to successfully predict potential targets, toxicity classification, drug-drug interactions, and drug modes of action, which demonstrated the wide application value of DTI profiling. The TargetNet webserver is built on the Django framework in Python, and is freely accessible at http://targetnet.scbdd.com.
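
To make the combination of Naïve Bayes and molecular fingerprints concrete, here is a minimal sketch of a Bernoulli naïve Bayes classifier over binary fingerprint vectors. It is an illustration under stated assumptions, not the TargetNet implementation: the Bernoulli variant, the Laplace smoothing parameter `alpha`, the function names, and the toy 4-bit fingerprints are all hypothetical.

```python
from math import log


def train_bernoulli_nb(fingerprints, labels, alpha=1.0):
    """Return {class: (log_prior, per-bit P(bit is on | class))}, Laplace-smoothed."""
    n_bits = len(fingerprints[0])
    model = {}
    for cls in set(labels):
        rows = [fp for fp, y in zip(fingerprints, labels) if y == cls]
        log_prior = log(len(rows) / len(labels))
        p_on = [
            (sum(fp[j] for fp in rows) + alpha) / (len(rows) + 2 * alpha)
            for j in range(n_bits)
        ]
        model[cls] = (log_prior, p_on)
    return model


def predict(model, fingerprint):
    """Return the class with the highest log-posterior for one fingerprint."""
    def score(cls):
        log_prior, p_on = model[cls]
        return log_prior + sum(
            log(p) if bit else log(1.0 - p) for bit, p in zip(fingerprint, p_on)
        )
    return max(model, key=score)


# Toy 4-bit fingerprints (hypothetical); label 1 = active, 0 = inactive.
train_fps = [
    [1, 1, 0, 0], [1, 0, 0, 0], [1, 1, 0, 1],
    [0, 0, 1, 1], [0, 0, 1, 0], [0, 1, 1, 0],
]
train_labels = [1, 1, 1, 0, 0, 0]

model = train_bernoulli_nb(train_fps, train_labels)
print(predict(model, [1, 1, 0, 0]), predict(model, [0, 0, 1, 1]))
```

In a multi-target setting like the one described, one such model would be trained per protein, and the vector of per-protein predictions for a molecule would form its DTI profile.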


Subjects
Drug Discovery; Pharmaceutical Preparations/chemistry; Protein Binding; Proteins/chemistry; Algorithms; Bayes Theorem; Drug Interactions; Drug-Related Side Effects and Adverse Reactions; Humans; Internet; Models, Theoretical; Pharmaceutical Preparations/metabolism; Proteins/metabolism; Software
8.
IEEE Trans Vis Comput Graph ; 30(1): 295-305, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37878445

ABSTRACT

Generative text-to-image models have gained great popularity among the public for their powerful capability to generate high-quality images based on natural language prompts. However, developing effective prompts for desired images can be challenging due to the complexity and ambiguity of natural language. This research proposes PromptMagician, a visual analysis system that helps users explore the image results and refine the input prompts. The backbone of our system is a prompt recommendation model that takes user prompts as input, retrieves similar prompt-image pairs from DiffusionDB, and identifies special (important and relevant) prompt keywords. To facilitate interactive prompt refinement, PromptMagician introduces a multi-level visualization for the cross-modal embedding of the retrieved images and recommended keywords, and supports users in specifying multiple criteria for personalized exploration. Two usage scenarios, a user study, and expert interviews demonstrate the effectiveness and usability of our system, suggesting it facilitates prompt engineering and improves the creativity support of the generative text-to-image model.

9.
IEEE Trans Vis Comput Graph ; 30(1): 1369-1379, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37878449

ABSTRACT

The transfer function is crucial for direct volume rendering (DVR) to create an informative visual representation of volumetric data. However, manually adjusting the transfer function to achieve the desired DVR result can be time-consuming and unintuitive. In this paper, we propose Differentiable Design Galleries, an image-based transfer function design approach to help users explore the design space of transfer functions by taking advantage of the recent advances in deep learning and differentiable rendering. Specifically, we leverage neural rendering to learn a latent design space, which is a continuous manifold representing various types of implicit transfer functions. We further provide a set of interactive tools to support intuitive query, navigation, and modification to obtain the target design, which is represented as a neural-rendered design exemplar. The explicit transfer function can be reconstructed from the target design with a differentiable direct volume renderer. Experimental results on real volumetric data demonstrate the effectiveness of our method.

10.
IEEE Trans Vis Comput Graph ; 30(1): 142-152, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37871057

ABSTRACT

The visualization of streaming high-dimensional data often needs to consider the speed in dimensionality reduction algorithms, the quality of visualized data patterns, and the stability of view graphs that usually change over time with new data. Existing methods of streaming high-dimensional data visualization primarily line up essential modules in a serial manner and often face challenges in satisfying all these design considerations. In this research, we propose a novel parallel framework for streaming high-dimensional data visualization to achieve high data processing speed, high quality in data patterns, and good stability in visual presentations. This framework arranges all essential modules in parallel to mitigate the delays caused by module waiting in serial setups. In addition, to facilitate the parallel pipeline, we redesign these modules with a parametric non-linear embedding method for new data embedding, an incremental learning method for online embedding function updating, and a hybrid strategy for optimized embedding updating. We also improve the coordination mechanism among these modules. Our experiments show that our method has advantages in embedding speed, quality, and stability over other existing methods to visualize streaming high-dimensional data.

11.
IEEE Trans Vis Comput Graph ; 30(1): 573-583, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37878443

ABSTRACT

Quantum computing is a rapidly evolving field that enables exponential speed-up over classical algorithms. At the heart of this revolutionary technology are quantum circuits, which serve as vital tools for implementing, analyzing, and optimizing quantum algorithms. Recent advancements in quantum computing and the increasing capability of quantum devices have led to the development of more complex quantum circuits. However, traditional quantum circuit diagrams suffer from scalability and readability issues, which limit the efficiency of analysis and optimization processes. In this research, we propose a novel visualization approach for large-scale quantum circuits by adopting semantic analysis to facilitate the comprehension of quantum circuits. We first exploit meta-data and semantic information extracted from the underlying code of quantum circuits to create component segmentations and pattern abstractions, allowing for easier wrangling of massive circuit diagrams. We then develop Quantivine, an interactive system for exploring and understanding quantum circuits. A series of novel circuit visualizations is designed to uncover contextual details such as qubit provenance, parallelism, and entanglement. The effectiveness of Quantivine is demonstrated through two usage scenarios of quantum circuits with up to 100 qubits and a formal user evaluation with quantum experts. A free copy of this paper and all supplemental materials are available at https://osf.io/2m9yh/?view_only=0aa1618c97244f5093cd7ce15f1431f9.

12.
IEEE Trans Vis Comput Graph ; 27(2): 1666-1676, 2021 Feb.
Article in English | MEDLINE | ID: mdl-33275582

ABSTRACT

Efficient layout of large-scale graphs remains a challenging problem: the force-directed and dimensionality reduction-based methods suffer from high overhead for graph distance and gradient computation. In this paper, we present a new graph layout algorithm, called DRGraph, that enhances the nonlinear dimensionality reduction process with three schemes: approximating graph distances by means of a sparse distance matrix, estimating the gradient by using the negative sampling technique, and accelerating the optimization process through a multi-level layout scheme. DRGraph achieves a linear complexity for the computation and memory consumption, and scales up to large-scale graphs with millions of nodes. Experimental results and comparisons with state-of-the-art graph layout methods demonstrate that DRGraph can generate visually comparable layouts with a faster running time and a lower memory requirement.

13.
IEEE Trans Vis Comput Graph ; 27(2): 1655-1665, 2021 Feb.
Article in English | MEDLINE | ID: mdl-33104510

ABSTRACT

We design and evaluate a novel layout fine-tuning technique for node-link diagrams that facilitates exemplar-based adjustment of a group of substructures in batching mode. The key idea is to transfer user modifications on a local substructure to other substructures in the entire graph that are topologically similar to the exemplar. We first precompute a canonical representation for each substructure with node embedding techniques and then use it for on-the-fly substructure retrieval. We design and develop a light-weight interactive system to enable intuitive adjustment, modification transfer, and visual graph exploration. We also report some results of quantitative comparisons, three case studies, and a within-participant user study.

14.
IEEE Trans Vis Comput Graph ; 26(1): 1256-1266, 2020 01.
Article in English | MEDLINE | ID: mdl-31443013

ABSTRACT

Visual querying is essential for interactively exploring massive trajectory data. However, data uncertainty makes it very difficult to fulfill advanced analytics requirements. On the one hand, much of the underlying data does not contain accurate geographic coordinates; e.g., the positions of a mobile phone refer only to the regions (i.e., mobile cell stations) in which it resides, instead of accurate GPS coordinates. On the other hand, domain experts and general users prefer a natural way, such as a natural language sentence, to access and analyze massive movement data. In this paper, we propose a visual analytics approach that can extract spatial-temporal constraints from a textual sentence and support an effective query method over uncertain mobile trajectory data. It is built upon encoding massive, spatially uncertain trajectories by the semantic information of the POIs and regions covered by them, and then storing the trajectory documents in a text database with an effective indexing scheme. The visual interface facilitates query condition specification, situation-aware visualization, and semantic exploration of large trajectory data. Usage scenarios on real-world human mobility datasets demonstrate the effectiveness of our approach.


Subjects
Computer Graphics; Movement/physiology; Natural Language Processing; Algorithms; Databases, Factual; Humans; Semantics; Uncertainty; User-Computer Interface
15.
J BUON ; 24(2): 585-590, 2019.
Article in English | MEDLINE | ID: mdl-31128010

ABSTRACT

PURPOSE: To investigate the correlations of coagulation indexes and inflammatory changes with the prognosis of lung cancer (LC) patients complicated with thromboembolic (TE) disease. METHODS: A total of 84 LC patients complicated with TE disease admitted to hospital from January 2010 to January 2016 were enrolled in this study and their clinical data were retrospectively analyzed. A 2-year post-treatment follow-up was carried out. According to the prognosis, patients were divided into a dead group (n=25) and an alive group (n=59). The coagulation indexes and inflammatory factor levels before low-molecular weight heparin (LMWH) treatment and on the 1st, 3rd, and 7th day after treatment were compared between the two groups. Their relations with the prognosis of patients were analyzed using the Pearson method. RESULTS: No statistically significant difference was found in the prothrombin time (PT), levels of fibrinogen (FIB), D-dimer (D-D), interleukin-6 (IL-6) and procalcitonin (PCT), and activated partial thromboplastin time (APTT) before treatment between the two groups (p>0.05). The PT and levels of FIB, D-D, IL-6, and PCT on the 1st, 3rd, and 7th day after treatment were significantly increased in the dead group compared to those in the alive group, while the APTT was remarkably shortened. Moreover, the PT was gradually prolonged and FIB, D-D, IL-6 and PCT levels were increased in the dead group, but the APTT was gradually shortened over time (p<0.05). The poor prognosis of LC patients complicated with TE disease was positively correlated with PT, FIB, D-D, IL-6 and PCT, but negatively correlated with APTT (p<0.05). CONCLUSION: The poor prognosis of LC complicated with TE disease correlates positively with PT, FIB, D-D, IL-6 and PCT, and negatively with APTT, which may serve as a prognostic reference in diagnosis and treatment.


Subjects
Inflammation/epidemiology; Lung Neoplasms/epidemiology; Prognosis; Thromboembolism/epidemiology; Adult; Aged; Aged, 80 and over; Blood Coagulation Tests; Female; Humans; Inflammation/complications; Inflammation/pathology; Interleukin-6/metabolism; Lung Neoplasms/complications; Lung Neoplasms/metabolism; Lung Neoplasms/pathology; Male; Middle Aged; Platelet Count/methods; Prothrombin Time; Thromboembolism/complications; Thromboembolism/metabolism; Thromboembolism/pathology
16.
Acta Biochim Pol ; 55(2): 241-9, 2008.
Article in English | MEDLINE | ID: mdl-18560604

ABSTRACT

In this study, we report the cloning and characteristics of an adiponectin-like receptor gene from Bombyx mori (BmAdipoR) with a highly conserved deduced amino-acid sequence and a structure similar to the human adiponectin receptor (AdipoR). Structural analysis of the translated cDNA suggested that it encodes a membrane protein with seven transmembrane domains. BmAdipoR was found to be expressed in multiple tissues and highly expressed in Malpighian tubules, fat body and testis. The BmNPV (Bombyx mori nucleopolyhedrovirus) bacmid system combined with confocal microscopy revealed that BmAdipoR was targeted to the cell membrane. We also found that infection with BmNPV had no effect on BmAdipoR mRNA quantity in the midgut of a susceptible Bombyx mori strain (306) at 48 h, but BmAdipoR mRNA quantity increased significantly at 72 h. We concluded that the BmAdipoR gene encodes a membrane protein ubiquitously expressed in Bombyx mori tissues and that its expression is altered by BmNPV infection.


Subjects
Bombyx/genetics; Genes, Insect; Insect Proteins/genetics; Receptors, Adiponectin/genetics; Amino Acid Sequence; Animals; Base Sequence; Bombyx/metabolism; Cell Line; Cloning, Molecular; DNA Primers/genetics; DNA, Complementary/genetics; Female; Humans; Insect Proteins/chemistry; Insect Proteins/metabolism; Male; Malpighian Tubules/metabolism; Membrane Proteins/chemistry; Membrane Proteins/genetics; Membrane Proteins/metabolism; Models, Molecular; Molecular Sequence Data; Phylogeny; RNA, Messenger/genetics; RNA, Messenger/metabolism; Receptors, Adiponectin/chemistry; Receptors, Adiponectin/metabolism; Recombinant Fusion Proteins/genetics; Recombinant Fusion Proteins/metabolism; Sequence Homology, Amino Acid; Species Specificity; Tissue Distribution
17.
IEEE Trans Vis Comput Graph ; 24(9): 2636-2648, 2018 09.
Article in English | MEDLINE | ID: mdl-28976317

ABSTRACT

Urban data is massive, heterogeneous, and spatio-temporal, posing a substantial challenge for visualization and analysis. In this paper, we design and implement a novel visual analytics approach, Visual Analyzer for Urban Data (VAUD), that supports the visualization, querying, and exploration of urban data. Our approach allows for cross-domain correlation from multiple data sources by leveraging spatial-temporal and social inter-connectedness features. Through our approach, the analyst is able to select, filter, aggregate across multiple data sources and extract information that would be hidden to a single data subset. To illustrate the effectiveness of our approach, we provide case studies on a real urban dataset that contains the cyber-, physical-, and social- information of 14 million citizens over 22 days.

18.
Sci Rep ; 8(1): 10231, 2018 07 06.
Article in English | MEDLINE | ID: mdl-29980727

ABSTRACT

Effective treatment of lupus nephritis and assessment of patient prognosis depend on accurate pathological classification and careful use of acute and chronic pathological indices. Renal biopsy provides the most reliable predictive power. However, clinicians still need auxiliary tools under certain circumstances. Comprehensive statistical analysis of clinical indices may be an effective support and supplement for biopsy. In this study, 173 patients with lupus nephritis were classified based on histology and scored on acute and chronic indices. These results were compared against machine learning predictions involving multiple linear regression and random forest analysis. For three-class random forest analysis, total classification accuracy was 51.3% (class II 53.7%, class III&IV 56.2%, class V 40.1%). For two-class random forest analysis, class II accuracy reached 56.2%; class III&IV 63.7%; class V 61%. Additionally, machine learning identified the important variables for each class prediction. Multiple linear regression predicted the index of chronic pathology (CI) (Q² = 0.746, R² = 0.771) and the acute index (AI) (Q² = 0.516, R² = 0.576), and each variable's importance was calculated in the AI and CI models. Machine learning showed potential for the assessment of lupus nephritis.


Subjects
Lupus Nephritis/classification; Lupus Nephritis/pathology; Machine Learning; Models, Statistical; Adult; Biopsy; Female; Humans; Lupus Nephritis/surgery; Male; Prognosis; Proteinuria/epidemiology; Retrospective Studies
19.
J Cheminform ; 9(1): 27, 2017 May 04.
Article in English | MEDLINE | ID: mdl-29086046

ABSTRACT

BACKGROUND: In recent years, predictive models based on machine learning techniques have proven to be feasible and effective in drug discovery. However, to develop such a model, researchers usually have to combine multiple tools and go through several different steps (e.g., the RDKit or ChemoPy package for molecular descriptor calculation, ChemAxon Standardizer for structure preprocessing, the scikit-learn package for model building, and the ggplot2 package for statistical analysis and visualization). In addition, these jobs may require strong programming skills, which poses severe challenges for users without advanced training in computer programming. Therefore, an online pipelining platform that integrates a number of selected tools is a valuable and efficient solution that can meet the needs of related researchers. RESULTS: This work presents a web-based pipelining platform, called ChemSAR, for generating SAR classification models of small molecules. The capabilities of ChemSAR include the validation and standardization of chemical structure representation, the computation of 783 1D/2D molecular descriptors and ten types of widely used fingerprints for small molecules, filtering methods for feature selection, the generation of predictive models via a step-by-step job submission process, model interpretation in terms of feature importance and tree visualization, as well as a helpful report generation system. The results can be visualized as high-quality plots and downloaded as local files. CONCLUSION: ChemSAR provides an integrated web-based platform for generating SAR classification models that will benefit cheminformatics and other biomedical users. It is freely available at: http://chemsar.scbdd.com.

20.
J Cheminform ; 8: 34, 2016.
Article in English | MEDLINE | ID: mdl-27330567

ABSTRACT

BACKGROUND: Growing evidence from network biology indicates that most cellular components exert their functions through interactions with other cellular components, such as proteins, DNAs, RNAs and small molecules. The rapidly increasing amount of publicly available data in biology and chemistry enables researchers to revisit interaction problems by systematic integration and analysis of heterogeneous data. Some tools have been developed to represent these components. However, they have limitations and focus only on the analysis of either small molecules, proteins or DNAs/RNAs. To the best of our knowledge, there is still a lack of freely available, easy-to-use and integrated platforms for generating molecular descriptors of DNAs/RNAs, proteins, small molecules and their interactions. RESULTS: Herein, we developed a comprehensive molecular representation platform, called BioTriangle, to emphasize the integration of cheminformatics and bioinformatics into a molecular informatics platform for computational biology study. It contains a feature-rich toolkit for the characterization of various biological molecules and complex interaction samples, including chemicals, proteins, DNAs/RNAs and even their interactions. With BioTriangle, users can conveniently run a full pipeline from obtaining molecular data through molecular representation to constructing machine learning models. CONCLUSION: BioTriangle provides a user-friendly interface to conveniently calculate various features of biological molecules and complex interaction samples. The computing tasks can be submitted and performed simply in a browser without any sophisticated installation and configuration process. BioTriangle is freely available at http://biotriangle.scbdd.com. Graphical abstract: An overview of BioTriangle, a platform for generating various molecular representations for chemicals, proteins, DNAs/RNAs and their interactions.
