Results 1 - 20 of 20
1.
Bioinformatics ; 37(2): 243-249, 2021 04 19.
Article in English | MEDLINE | ID: mdl-32722774

ABSTRACT

MOTIVATION: Drugs and diseases play a central role in many areas of biomedical research and healthcare. Aggregating knowledge about these entities across a broader range of domains and languages is critical for information extraction (IE) applications. To facilitate text mining methods for analysis and comparison of patients' health conditions and adverse drug reactions reported on the Internet with traditional sources such as drug labels, we present a new corpus of Russian language health reviews. RESULTS: The Russian Drug Reaction Corpus (RuDReC) is a new partially annotated corpus of consumer reviews in Russian about pharmaceutical products for the detection of health-related named entities and the effectiveness of pharmaceutical products. The corpus consists of two parts, a raw one and a labeled one. The raw part includes 1.4 million health-related user-generated texts collected from various Internet sources, including social media. The labeled part contains 500 consumer reviews about drug therapy with drug- and disease-related information. Sentence-level labels indicate the presence or absence of health-related issues. Sentences that contain such issues are additionally labeled at the expression level to identify fine-grained subtypes such as drug classes and drug forms, drug indications and drug reactions. Further, we present a baseline model for named entity recognition (NER) and multilabel sentence classification tasks on this corpus. Our RuDR-BERT model achieved a macro F1 score of 74.85% in the NER task. For the sentence classification task, our model achieves a macro F1 score of 68.82%, a gain of 7.47% over the score of a BERT model trained on Russian data. AVAILABILITY AND IMPLEMENTATION: We make the RuDReC corpus and pretrained weights of domain-specific BERT models freely available at https://github.com/cimm-kzn/RuDReC. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
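
The macro F1 score reported in this abstract is the unweighted mean of per-class F1 scores. The following is a minimal sketch of that calculation, not the authors' evaluation code: the function name `macro_f1` and the label names in the usage note are hypothetical, and it scores flat label sequences, whereas NER evaluation is often done at the entity level.

```python
def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over two aligned label sequences.

    Illustrative sketch only; `gold` and `pred` are equal-length lists of labels.
    """
    labels = sorted(set(gold) | set(pred))
    if not labels:
        return 0.0
    scores = []
    for label in labels:
        pairs = list(zip(gold, pred))
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

For example, with the hypothetical labels `gold = ["ADR", "DRUG", "ADR", "DRUG"]` and `pred = ["ADR", "ADR", "ADR", "DRUG"]`, the per-class F1 scores are 0.8 and 2/3, and `macro_f1` returns their mean.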


Subjects
Drug-Related Side Effects and Adverse Reactions; Pharmaceutical Preparations; Data Mining; Humans; Language; Russia
2.
Phys Chem Chem Phys ; 24(42): 25853-25863, 2022 Nov 02.
Article in English | MEDLINE | ID: mdl-36279016

ABSTRACT

Electronic wave function calculation is a fundamental task of computational quantum chemistry. Knowledge of the wave function parameters allows one to compute physical and chemical properties of molecules and materials. Unfortunately, it is infeasible to compute the wave functions analytically even for simple molecules. Classical quantum chemistry approaches such as the Hartree-Fock method or density functional theory (DFT) can compute an approximation of the wave function but are very computationally expensive. One way to lower the computational complexity is to use machine learning models that can provide sufficiently good approximations at a much lower computational cost. In this work we: (1) introduce a new curated large-scale dataset of electronic structures of drug-like molecules, (2) establish a novel benchmark for the estimation of molecular properties in the multi-molecule setting, and (3) evaluate a wide range of methods with this benchmark. We show that the accuracy of recently developed machine learning models deteriorates significantly when switching from the single-molecule to the multi-molecule setting. We also show that these models lack generalization over different chemistry classes. In addition, we provide experimental evidence that larger datasets lead to better ML models in the field of quantum chemistry.

3.
Bioinformatics ; 36(10): 3215-3224, 2020 05 01.
Article in English | MEDLINE | ID: mdl-32049317

ABSTRACT

MOTIVATION: Imaging mass spectrometry (imaging MS) is a prominent technique for capturing distributions of molecules in tissue sections. Various computational methods for imaging MS rely on quantifying spatial correlations between ion images, referred to as co-localization. However, no comprehensive evaluation of co-localization measures has ever been performed; this leads to arbitrary choices and hinders method development. RESULTS: We present ColocML, a machine learning approach addressing this gap. With the help of 42 imaging MS experts from nine laboratories, we created a gold standard of 2210 pairs of ion images ranked by their co-localization. We evaluated existing co-localization measures and developed novel measures using term frequency-inverse document frequency and deep neural networks. The semi-supervised deep learning Pi model and the cosine score applied after median thresholding performed the best (Spearman 0.797 and 0.794 with expert rankings, respectively). We illustrate these measures by inferring co-localization properties of 10 273 molecules from 3685 public METASPACE datasets. AVAILABILITY AND IMPLEMENTATION: https://github.com/metaspace2020/coloc. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
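
One of the two best-performing measures named in this abstract is the cosine score applied after median thresholding. The sketch below shows one plausible reading of that measure and is not taken from the ColocML code: it assumes each ion image is a flat list of pixel intensities, and it assumes that median thresholding means setting intensities below the image median to zero. The function names are hypothetical.

```python
import math
from statistics import median


def median_threshold(image):
    """Zero out pixels below the image median (assumed interpretation)."""
    cutoff = median(image)
    return [value if value >= cutoff else 0.0 for value in image]


def cosine_score(a, b):
    """Cosine similarity between two equal-length intensity vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def colocalization_score(image_a, image_b):
    """Cosine score computed on median-thresholded ion images."""
    return cosine_score(median_threshold(image_a), median_threshold(image_b))
```

Under these assumptions, two identical images score 1, and two images whose above-median pixels do not overlap score 0.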


Subjects
Machine Learning; Neural Networks, Computer; Mass Spectrometry; Software; Supervised Machine Learning
4.
Mol Syst Biol ; 16(10): e9474, 2020 10.
Article in English | MEDLINE | ID: mdl-33022142

ABSTRACT

The advent of single-cell methods is paving the way for an in-depth understanding of the cell cycle with unprecedented detail. Due to its ramifications in nearly all biological processes, the evaluation of cell cycle progression is critical for an exhaustive cellular characterization. Here, we present DeepCycle, a deep learning method for estimating a cell cycle trajectory from unsegmented single-cell microscopy images, relying exclusively on the brightfield and nuclei-specific fluorescent signals. DeepCycle was evaluated on 2.6 million single-cell microscopy images of MDCKII cells with the fluorescent FUCCI2 system. DeepCycle provided a latent representation of cell images revealing a continuous and closed trajectory of the cell cycle. Further, we validated the DeepCycle trajectories by showing their nearly perfect correlation with real time measured from live-cell imaging of cells undergoing an entire cell cycle. This is the first model able to resolve the closed cell cycle trajectory, including cell division, solely based on unsegmented microscopy data from adherent cell cultures.


Subjects
Cell Cycle; Image Processing, Computer-Assisted/methods; Single-Cell Analysis/methods; Time-Lapse Imaging/methods; Animals; Cell Line; Dogs; Microscopy, Fluorescence; Neural Networks, Computer
5.
Nat Methods ; 14(1): 57-60, 2017 01.
Article in English | MEDLINE | ID: mdl-27842059

ABSTRACT

High-mass-resolution imaging mass spectrometry promises to localize hundreds of metabolites in tissues, cell cultures, and agar plates with cellular resolution, but it is hampered by the lack of bioinformatics tools for automated metabolite identification. We report pySM, a framework for false discovery rate (FDR)-controlled metabolite annotation at the level of the molecular sum formula, for high-mass-resolution imaging mass spectrometry (https://github.com/alexandrovteam/pySM). We introduce a metabolite-signal match score and a target-decoy FDR estimate for spatial metabolomics.
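
The target-decoy idea mentioned in this abstract can be illustrated with a generic estimate: at a given score threshold, the false discovery rate is approximated by the number of decoy annotations passing the threshold divided by the number of target annotations passing it. This is a simplified sketch and does not reproduce the pySM procedure; the function name and the score lists are hypothetical, and it assumes equally sized target and decoy sets.

```python
def target_decoy_fdr(target_scores, decoy_scores, threshold):
    """Estimate FDR at `threshold` as (#decoys passing) / (#targets passing).

    Generic illustration; assumes target and decoy sets of equal size.
    """
    n_targets = sum(1 for score in target_scores if score >= threshold)
    n_decoys = sum(1 for score in decoy_scores if score >= threshold)
    if n_targets == 0:
        return 0.0  # nothing is reported, so there are no false discoveries
    return min(1.0, n_decoys / n_targets)
```

With hypothetical target scores `[0.9, 0.8, 0.7, 0.2]`, decoy scores `[0.75, 0.3, 0.1, 0.05]`, and a threshold of 0.7, three targets and one decoy pass, giving an estimated FDR of 1/3.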


Subjects
Brain/metabolism; Computational Biology/methods; Mass Spectrometry/methods; Metabolome; Metabolomics/methods; Molecular Imaging/methods; Software; Animals; Brain/cytology; Chromatography, Liquid; False Positive Reactions; Female; Mice; Mice, Inbred C57BL
6.
Mol Pharm ; 15(10): 4378-4385, 2018 10 01.
Article in English | MEDLINE | ID: mdl-29473756

ABSTRACT

Convolutional neural networks (CNNs) have been successfully used to handle three-dimensional data and are a natural match for data with spatial structure such as 3D molecular structures. However, a direct 3D representation of a molecule with atoms localized at voxels is too sparse, which leads to poor performance of the CNNs. In this work, we present a novel approach where atoms are extended to fill other nearby voxels with a transformation based on the wave transform. Experimenting on 4.5 million molecules from the Zinc database, we show that our proposed representation leads to better performance of CNN-based autoencoders than either the voxel-based representation or the previously used Gaussian blur of atoms. We then successfully apply the new representation to classification tasks such as MACCS fingerprint prediction.


Subjects
Neural Networks, Computer; Animals; Databases, Factual; Humans
7.
J Biomed Inform ; 84: 93-102, 2018 08.
Article in English | MEDLINE | ID: mdl-29906585

ABSTRACT

Text mining of scientific libraries and social media has already proven itself as a reliable tool for drug repurposing and hypothesis generation. The task of mapping a disease mention to a concept in a controlled vocabulary, typically to the standard thesaurus in the Unified Medical Language System (UMLS), is known as medical concept normalization. This task is challenging due to the differences in the use of medical terminology between health care professionals and social media texts coming from the lay public. To bridge this gap, we use sequence learning with recurrent neural networks and semantic representation of one- or multi-word expressions: we develop end-to-end architectures directly tailored to the task, including bidirectional Long Short-Term Memory, Gated Recurrent Units with an attention mechanism, and additional semantic similarity features based on UMLS. Our evaluation against a standard benchmark shows that recurrent neural networks improve results over an effective baseline for classification based on convolutional neural networks. A qualitative examination of mentions discovered in a dataset of user reviews collected from popular online health information platforms as well as a quantitative evaluation both show improvements in the semantic representation of health-related expressions in social media.


Subjects
Data Mining/methods; Medical Informatics/methods; Natural Language Processing; Neural Networks, Computer; Social Media; Unified Medical Language System; Linguistics; Pharmaceutical Preparations; Probability; Semantics; Social Networking
8.
Mol Pharm ; 14(9): 3098-3104, 2017 09 05.
Article in English | MEDLINE | ID: mdl-28703000

ABSTRACT

Deep generative adversarial networks (GANs) are an emerging technology in drug discovery and biomarker development. In our recent work, we demonstrated a proof of concept of implementing a deep generative adversarial autoencoder (AAE) to identify new molecular fingerprints with predefined anticancer properties. Another popular generative model is the variational autoencoder (VAE), which is based on deep neural architectures. In this work, we developed an advanced AAE model for molecular feature extraction problems and demonstrated its advantages compared to VAE in terms of (a) adjustability in generating molecular fingerprints; (b) capacity for processing very large molecular data sets; and (c) efficiency in unsupervised pretraining for regression models. Our results suggest that the proposed AAE model significantly enhances the capacity and efficiency of developing new molecules with specific anticancer properties using deep generative models.


Subjects
Models, Theoretical; Artificial Intelligence; Computer Simulation; Concept Formation; Learning; Neural Networks, Computer
9.
Appl Environ Microbiol ; 81(24): 8265-76, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26386059

ABSTRACT

Hadal ecosystems are found at a depth of 6,000 m below sea level and below, occupying less than 1% of the total area of the ocean. The microbial communities and metabolic potential in these ecosystems are largely uncharacterized. Here, we present four single amplified genomes (SAGs) obtained from 8,219 m below the sea surface within the hadal ecosystem of the Puerto Rico Trench (PRT). These SAGs are derived from members of deep-sea clades, including the Thaumarchaeota and SAR11 clade, and two are related to previously isolated piezophilic (high-pressure-adapted) microorganisms. In order to identify genes that might play a role in adaptation to deep-sea environments, comparative analyses were performed with genomes from closely related shallow-water microbes. The archaeal SAG possesses genes associated with mixotrophy, including lipoylation and the glycine cleavage pathway. The SAR11 SAG encodes glycolytic enzymes previously reported to be missing from this abundant and cosmopolitan group. The other SAGs, which are related to piezophilic isolates, possess genes that may supplement energy demands through the oxidation of hydrogen or the reduction of nitrous oxide. We found evidence for potential trench-specific gene distributions, as several SAG genes were observed only in a PRT metagenome and not in shallower deep-sea metagenomes. These results illustrate new ecotype features that might perform important roles in the adaptation of microorganisms to life in hadal environments.


Subjects
Archaea/classification; Archaea/genetics; Genome, Archaeal/genetics; Metagenome/genetics; Seawater/microbiology; Acclimatization; Archaea/isolation & purification; Base Sequence; DNA, Archaeal/genetics; Ecosystem; Energy Metabolism/physiology; Fatty Acids/metabolism; Lipids/biosynthesis; Molecular Sequence Data; Oceans and Seas; Puerto Rico; RNA, Ribosomal, 16S/genetics; Sequence Analysis, DNA; Sulfur/metabolism; Water Microbiology
10.
BMC Genomics ; 14 Suppl 1: S7, 2013.
Article in English | MEDLINE | ID: mdl-23368723

ABSTRACT

Error correction of sequenced reads remains a difficult task, especially in single-cell sequencing projects with extremely non-uniform coverage. While existing error correction tools designed for standard (multi-cell) sequencing data usually come up short in single-cell sequencing projects, algorithms actually used for single-cell error correction have so far been very simplistic. We introduce several novel algorithms based on Hamming graphs and Bayesian subclustering in our new error correction tool BAYESHAMMER. While BAYESHAMMER was designed for single-cell sequencing, we demonstrate that it also improves on existing error correction tools for multi-cell sequencing data while working much faster on real-life datasets. We benchmark BAYESHAMMER on both k-mer counts and actual assembly results with the SPADES genome assembler.
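
A Hamming graph, as mentioned in this abstract, can be pictured as a graph whose vertices are k-mers and whose edges join k-mers that differ in only a few positions; its connected components group k-mers that may be erroneous copies of one another. The sketch below illustrates only that grouping step with a naive all-pairs comparison. It is not the BAYESHAMMER implementation, it omits the Bayesian subclustering, and the function names and the `max_distance` parameter are hypothetical.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def hamming_components(kmers, max_distance=1):
    """Connected components of the graph joining k-mers within `max_distance`.

    Naive O(n^2) illustration using union-find; real tools need faster indexing.
    """
    kmers = sorted(set(kmers))
    parent = {kmer: kmer for kmer in kmers}

    def find(kmer):
        # Follow parent links to the root, compressing the path on the way.
        while parent[kmer] != kmer:
            parent[kmer] = parent[parent[kmer]]
            kmer = parent[kmer]
        return kmer

    for a, b in combinations(kmers, 2):
        if hamming(a, b) <= max_distance:
            parent[find(a)] = find(b)

    groups = {}
    for kmer in kmers:
        groups.setdefault(find(kmer), []).append(kmer)
    return sorted(groups.values())
```

For the hypothetical input `["ACGT", "ACGA", "TTTT", "TTTA", "GGGG"]` with `max_distance=1`, the k-mers fall into three components: `["ACGA", "ACGT"]`, `["GGGG"]`, and `["TTTA", "TTTT"]`.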


Subjects
Algorithms; Sequence Analysis, DNA; Bayes Theorem; Cluster Analysis; Escherichia coli/genetics; Single-Cell Analysis
11.
Anal Chem ; 85(23): 11189-95, 2013 Dec 03.
Article in English | MEDLINE | ID: mdl-24180335

ABSTRACT

Imaging mass spectrometry (imaging MS) has emerged in the past decade as a label-free, spatially resolved, and multipurpose bioanalytical technique for direct analysis of biological samples from animal tissue, plant tissue, biofilms, and polymer films. Imaging MS has been successfully incorporated into many biomedical pipelines where it is usually applied in the so-called untargeted mode, capturing spatial localization of a multitude of ions from a wide mass range. An imaging MS data set usually comprises thousands of spectra and tens to hundreds of thousands of mass-to-charge (m/z) images and can be as large as several gigabytes. Unsupervised analysis of an imaging MS data set aims at finding hidden structures in the data with no a priori information used and is often exploited as the first step of imaging MS data analysis. We propose a novel, easy-to-use and easy-to-implement approach to answer one of the key questions of unsupervised analysis of imaging MS data: what do all m/z images look like? The key idea of the approach is to cluster all m/z images according to their spatial similarity so that each cluster contains spatially similar m/z images. We propose a visualization of both spatial and spectral information obtained using clustering that provides an easy way to understand what all m/z images look like. We evaluated the proposed approach on matrix-assisted laser desorption ionization imaging MS data sets of a rat brain coronal section and human larynx carcinoma and discussed several scenarios of data analysis.


Subjects
Brain/anatomy & histology; Laryngeal Neoplasms/diagnosis; Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization/methods; Statistics as Topic/methods; Animals; Brain/pathology; Cluster Analysis; Databases, Factual; Humans; Mass Spectrometry/methods; Rats
12.
PeerJ Comput Sci ; 8: e865, 2022.
Article in English | MEDLINE | ID: mdl-35494794

ABSTRACT

Depth estimation has been an essential task for many computer vision applications, especially in autonomous driving, where safety is paramount. Depth can be estimated not only with traditional supervised learning but also via a self-supervised approach that relies on camera motion and does not require ground truth depth maps. Recently, major improvements have been introduced to make self-supervised depth prediction more precise. However, most existing approaches still focus on single-frame depth estimation, even in the self-supervised setting. Since most methods can operate with frame sequences, we believe that the quality of current models can be significantly improved with the help of information about previous frames. In this work, we study different ways of integrating recurrent blocks and attention mechanisms into a common self-supervised depth estimation pipeline. We propose a set of modifications that utilize temporal information from previous frames and provide new neural network architectures for monocular depth estimation in a self-supervised manner. Our experiments on the KITTI dataset show that the proposed modifications can be an effective tool for exploiting temporal information in a depth prediction pipeline.

13.
Front Big Data ; 5: 931206, 2022.
Article in English | MEDLINE | ID: mdl-35993029

ABSTRACT

Human personality traits are key drivers behind our decision making, influencing our lives on a daily basis. Inference of personality traits, such as the Myers-Briggs personality type, as well as an understanding of dependencies between personality traits and user behavior on various social media platforms, is of crucial importance to modern research and industry applications such as recommender systems. The emergence of diverse and cross-purpose social media avenues makes it possible to perform user personality profiling automatically and efficiently based on data represented across multiple data modalities. However, research efforts on personality profiling from multi-source multi-modal social media data are relatively sparse; the impact of different social network data on profiling performance and of personality traits on applications such as recommender systems is yet to be evaluated. Furthermore, large-scale datasets are also lacking in the research community. To fill these gaps, in this work we develop a novel multi-view fusion framework, PERS, that infers Myers-Briggs personality type indicators. We evaluate the results not just across data modalities but also across different social networks, and also evaluate the impact of inferred personality traits on recommender systems. Our experimental results demonstrate that PERS is able to learn from multi-view data for personality profiling by efficiently leveraging highly varied data from diverse social multimedia sources. Furthermore, we demonstrate that inferred personality traits can be beneficial to other industry applications. Among other results, we show that people tend to reveal multiple facets of their personality in different social media avenues. We also release a social multimedia dataset in order to facilitate further research in this direction.

14.
Front Chem ; 9: 800133, 2021.
Article in English | MEDLINE | ID: mdl-35004615

ABSTRACT

We present a computational workflow based on quantum chemical calculations and generative models based on deep neural networks for the discovery of novel materials. We apply the developed workflow to search for molecules suitable for the fusion of triplet-triplet excitations (triplet-triplet fusion, TTF) in blue OLED devices. By applying generative machine learning models, we have been able to pinpoint the most promising regions of the chemical space for further exploration. Another neural network based on graph convolutions was trained to predict excitation energies; with this network, we estimate the alignment of energy levels and filter molecules before running time-consuming quantum chemical calculations. We present a comprehensive computational evaluation of several generative models, choosing a modification of the Junction Tree VAE (JT-VAE) as the best one in this application. The proposed approach can be useful for computer-aided design of materials with energy level alignment favorable for efficient energy transfer, triplet harvesting, and exciton fusion processes, which are crucial for the development of the next generation OLED materials.

16.
Front Pharmacol ; 11: 269, 2020.
Article in English | MEDLINE | ID: mdl-32362822

ABSTRACT

Gene expression profiles are useful for assessing the efficacy and side effects of drugs. In this paper, we propose a new generative model that infers drug molecules that could induce a desired change in gene expression. Our model, the Bidirectional Adversarial Autoencoder, explicitly separates cellular processes captured in gene expression changes into two feature sets: those related and unrelated to the drug incubation. The model uses related features to produce a drug hypothesis. We have validated our model on the LINCS L1000 dataset by generating molecular structures in the SMILES format for the desired transcriptional response. In the experiments, we have shown that the proposed model can generate novel molecular structures that could induce a given gene expression change or predict a gene expression difference after incubation of a given molecular structure. The code of the model is available at https://github.com/insilicomedicine/BiAAE.

17.
Front Pharmacol ; 11: 565644, 2020.
Article in English | MEDLINE | ID: mdl-33390943

ABSTRACT

Generative models are becoming a tool of choice for exploring the molecular space. These models learn on a large training dataset and produce novel molecular structures with similar properties. Generated structures can be utilized for virtual screening or training semi-supervised predictive models in the downstream tasks. While there are plenty of generative models, it is unclear how to compare and rank them. In this work, we introduce a benchmarking platform called Molecular Sets (MOSES) to standardize training and comparison of molecular generative models. MOSES provides training and testing datasets, and a set of metrics to evaluate the quality and diversity of generated structures. We have implemented and compared several molecular generation models and suggest using our results as reference points for further advancements in generative chemistry research. The platform and source code are available at https://github.com/molecularsets/moses.
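
The abstract does not list the individual metrics, so the following sketch only illustrates two simple quantities of the kind commonly used to assess generated structures: the fraction of generated structures that are distinct, and the fraction of distinct structures absent from the training set. It does not use or reproduce the MOSES API. The function names are hypothetical, and structures are compared as plain strings, which assumes they have already been written in one canonical form.

```python
def uniqueness(generated):
    """Fraction of generated structures that are distinct."""
    if not generated:
        return 0.0
    return len(set(generated)) / len(generated)


def novelty(generated, training):
    """Fraction of distinct generated structures not present in the training set."""
    distinct = set(generated)
    if not distinct:
        return 0.0
    return len(distinct - set(training)) / len(distinct)
```

For example, with the hypothetical SMILES strings `["CCO", "CCO", "c1ccccc1", "CCN"]` as generated output and `["CCO"]` as training data, `uniqueness` gives 0.75 and `novelty` gives 2/3.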

18.
Dentomaxillofac Radiol ; 48(4): 20180051, 2019 May.
Article in English | MEDLINE | ID: mdl-30835551

ABSTRACT

OBJECTIVES: Analysis of dental radiographs is an important part of the diagnostic process in daily clinical practice. Interpretation by an expert includes teeth detection and numbering. In this project, a novel solution based on convolutional neural networks (CNNs) is proposed that performs this task automatically for panoramic radiographs. METHODS: A data set of 1352 randomly chosen panoramic radiographs of adults was used to train the system. The CNN-based architectures for both teeth detection and numbering tasks were analyzed. The teeth detection module processes the radiograph to define the boundaries of each tooth. It is based on the state-of-the-art Faster R-CNN architecture. The teeth numbering module classifies detected teeth images according to the FDI notation. It utilizes the classical VGG-16 CNN together with the heuristic algorithm to improve results according to the rules for spatial arrangement of teeth. A separate testing set of 222 images was used to evaluate the performance of the system and to compare it to the expert level. RESULTS: For the teeth detection task, the system achieves the following performance metrics: a sensitivity of 0.9941 and a precision of 0.9945. For teeth numbering, its sensitivity is 0.9800 and specificity is 0.9994. Experts detect teeth with a sensitivity of 0.9980 and a precision of 0.9998. Their sensitivity for tooth numbering is 0.9893 and specificity is 0.9997. The detailed error analysis showed that the developed software system makes errors caused by similar factors as those for experts. CONCLUSIONS: The performance of the proposed computer-aided diagnosis solution is comparable to the level of experts. Based on these findings, the method has the potential for practical application and further evaluation for automated dental radiograph analysis. Computer-aided teeth detection and numbering simplifies the process of filling out digital dental charts. Automation could help to save time and improve the completeness of electronic dental records.
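
The sensitivity, precision, and specificity figures in this abstract follow the standard definitions based on counts of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN). The sketch below states those definitions; the function names are hypothetical, and the counts in the usage note are invented for illustration and are not the study's data.

```python
def sensitivity(tp, fn):
    """Share of actual positives that were found: TP / (TP + FN)."""
    return tp / (tp + fn) if tp + fn else 0.0


def precision(tp, fp):
    """Share of reported positives that are correct: TP / (TP + FP)."""
    return tp / (tp + fp) if tp + fp else 0.0


def specificity(tn, fp):
    """Share of actual negatives that were correctly rejected: TN / (TN + FP)."""
    return tn / (tn + fp) if tn + fp else 0.0
```

For instance, a hypothetical detector that finds 90 of 100 teeth and reports 5 spurious boxes has `sensitivity(90, 10)` of 0.9 and `precision(90, 5)` of 90/95.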


Subjects
Neural Networks, Computer; Radiography, Panoramic; Tooth; Adult; Algorithms; Diagnosis, Computer-Assisted; Humans; Tooth/diagnostic imaging
19.
J Healthc Eng ; 2017: 9451342, 2017.
Article in English | MEDLINE | ID: mdl-29177027

ABSTRACT

Adverse drug reactions (ADRs) are an essential part of the analysis of drug use, measuring drug use benefits, and making policy decisions. Traditional channels for identifying ADRs are reliable but very slow and only produce a small amount of data. Text reviews, either on specialized web sites or in general-purpose social networks, may lead to a data source of unprecedented size, but identifying ADRs in free-form text is a challenging natural language processing problem. In this work, we propose a novel model for this problem, uniting recurrent neural architectures and conditional random fields. We evaluate our model with a comprehensive experimental study, showing improvements over state-of-the-art methods of ADR extraction.


Subjects
Drug-Related Side Effects and Adverse Reactions; Information Storage and Retrieval/methods; Neural Networks, Computer; Adult; Algorithms; Female; Humans; Male; Middle Aged; Natural Language Processing; Pharmacovigilance; Young Adult
20.
J Comput Biol ; 19(5): 455-77, 2012 May.
Article in English | MEDLINE | ID: mdl-22506599

ABSTRACT

The lion's share of bacteria in various environments cannot be cloned in the laboratory and thus cannot be sequenced using existing technologies. A major goal of single-cell genomics is to complement gene-centric metagenomic data with whole-genome assemblies of uncultivated organisms. Assembly of single-cell data is challenging because of highly non-uniform read coverage as well as elevated levels of sequencing errors and chimeric reads. We describe SPAdes, a new assembler for both single-cell and standard (multicell) assembly, and demonstrate that it improves on the recently released E+V-SC assembler (specialized for single-cell data) and on popular assemblers Velvet and SoapDeNovo (for multicell data). SPAdes generates single-cell assemblies, providing information about genomes of uncultivatable bacteria that vastly exceeds what may be obtained via traditional metagenomics studies. SPAdes is available online ( http://bioinf.spbau.ru/spades ). It is distributed as open source software.


Subjects
Algorithms; Bacteria/genetics; Genome, Bacterial; Metagenomics/methods; Single-Cell Analysis/methods; Sequence Analysis, DNA/methods