1.
J Chem Inf Model ; 64(7): 2539-2553, 2024 Apr 08.
Article in English | MEDLINE | ID: mdl-38185877

ABSTRACT

A central problem in drug discovery is to identify the interactions between drug-like compounds and protein targets. Over the past few decades, various quantitative structure-activity relationship (QSAR) and proteo-chemometric (PCM) approaches have been developed to model and predict these interactions. While QSAR approaches solely utilize representations of the drug compound, PCM methods incorporate both representations of the protein target and the drug compound, enabling them to achieve above-chance predictive accuracy on previously unseen protein targets. Both QSAR and PCM approaches have recently been improved by machine learning and deep neural networks, which allow the development of drug-target interaction prediction models from measurement data. However, deep neural networks typically require large amounts of training data and cannot robustly adapt to new tasks, such as predicting interactions for unseen protein targets at inference time. In this work, we propose to use HyperNetworks to efficiently transfer information between tasks during inference and thus to accurately predict drug-target interactions on unseen protein targets. Our HyperPCM method reaches state-of-the-art performance compared to previous methods on multiple well-known benchmarks, including Davis, DUD-E, and a ChEMBL-derived data set, and particularly excels at zero-shot inference involving unseen protein targets. Our method, as well as reproducible data preparation, is available at https://github.com/ml-jku/hyper-dti.


Subjects
Machine Learning; Neural Networks, Computer; Proteins; Drug Development; Drug Discovery
2.
Anesth Analg ; 138(3): 645-654, 2024 Mar 01.
Article in English | MEDLINE | ID: mdl-38364244

ABSTRACT

BACKGROUND: Transfusion of packed red blood cells (pRBCs) is still associated with risks. This study aims to determine whether renal function deterioration in the context of individual transfusions in individual patients can be predicted using machine learning. Recipient and donor characteristics linked to increased risk are identified. METHODS: This study was registered at ClinicalTrials.gov (NCT05466370) and was conducted after local ethics committee approval. We evaluated 3366 transfusion episodes from a university hospital between October 31, 2016, and August 31, 2020. Random forest models were tuned and trained via the Python auto-sklearn package to predict acute kidney injury (AKI). The models included recipients' and donors' demographic parameters and laboratory values, donor questionnaire results, and the age of the pRBCs. Bootstrapping on the test dataset was used to calculate the means and standard deviations of various performance metrics. RESULTS: AKI as defined by a modified Kidney Disease Improving Global Outcomes (KDIGO) criterion developed after 17.4% of transfusion episodes (base rate). AKI could be predicted with an area under the curve of the receiver operating characteristic (AUC-ROC) of 0.73 ± 0.02. The negative (NPV) and positive (PPV) predictive values were 0.90 ± 0.02 and 0.32 ± 0.03, respectively. Feature importance and relative risk analyses revealed that donor features were far less important than recipient features for predicting posttransfusion AKI. CONCLUSIONS: Surprisingly, only the recipients' characteristics played a decisive role in AKI prediction. Based on this result, we speculate that the selection of a specific pRBC may have less influence than recipient characteristics.
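
The bootstrapping step mentioned in the abstract above (resampling the test set to obtain a mean and standard deviation for a metric) might be sketched as follows. This is a minimal illustration and not the study's code: the function names, the number of resamples, and the toy labels and scores are all hypothetical.

```python
import random
import statistics


def auc_roc(labels, scores):
    """AUC as the probability that a random positive case is scored
    higher than a random negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute AUC")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc(labels, scores, n_resamples=200, seed=0):
    """Resample the test set with replacement and return the mean and
    standard deviation of the AUC over the resamples."""
    if len(set(labels)) < 2:
        raise ValueError("both classes are required to compute AUC")
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    while len(aucs) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # this resample lost one class; draw again
        aucs.append(auc_roc(ys, [scores[i] for i in idx]))
    return statistics.mean(aucs), statistics.stdev(aucs)


# Hypothetical test-set labels (1 = AKI) and model scores.
toy_labels = [0, 0, 0, 0, 1, 0, 1, 1, 0, 1]
toy_scores = [0.1, 0.3, 0.2, 0.6, 0.5, 0.4, 0.9, 0.7, 0.2, 0.8]
mean_auc, sd_auc = bootstrap_auc(toy_labels, toy_scores)
print(f"AUC-ROC: {mean_auc:.2f} +/- {sd_auc:.2f}")
```

The same loop can report any other metric (for example NPV or PPV) by swapping the function applied to each resample.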


Subjects
Acute Kidney Injury; Kidney; Humans; Acute Kidney Injury/diagnosis; Acute Kidney Injury/etiology; Acute Kidney Injury/therapy; Blood Transfusion; Retrospective Studies; Risk Assessment/methods; ROC Curve
3.
Water Resour Res ; 59(6): e2022WR033918, 2023 Jun.
Article in English | MEDLINE | ID: mdl-38440056

ABSTRACT

Building accurate rainfall-runoff models is an integral part of hydrological science and practice. The variety of modeling goals and applications has led to a large suite of evaluation metrics for these models. Yet, hydrologists still put considerable trust in visual judgment, although it is unclear whether such judgment agrees or disagrees with existing quantitative metrics. In this study, we tasked 622 experts to compare and judge more than 14,000 pairs of hydrographs from 13 different models. Our results show that expert opinion broadly agrees with quantitative metrics and results in a clear preference for a Machine Learning model over traditional hydrological models. The expert opinions are, however, subject to significant amounts of inconsistency. Nevertheless, where experts agree, we can predict their opinion purely from quantitative metrics, which indicates that the metrics sufficiently encode human preferences in a small set of numbers. While there remains room for improvement of quantitative metrics, we suggest that the hydrologic community should reinforce their benchmarking efforts and put more trust in these metrics.

4.
J Chem Inf Model ; 62(9): 2111-2120, 2022 May 09.
Article in English | MEDLINE | ID: mdl-35034452

ABSTRACT

Finding synthesis routes for molecules of interest is essential in the discovery of new drugs and materials. To find such routes, computer-assisted synthesis planning (CASP) methods are employed, which rely on a single-step model of chemical reactivity. In this study, we introduce a template-based single-step retrosynthesis model based on Modern Hopfield Networks, which learn an encoding of both molecules and reaction templates in order to predict the relevance of templates for a given molecule. The template representation allows generalization across different reactions and significantly improves the performance of template relevance prediction, especially for templates with few or zero training examples. With inference speed up to orders of magnitude faster than baseline methods, we improve or match the state-of-the-art performance for top-k exact match accuracy for k ≥ 3 in the retrosynthesis benchmark USPTO-50k. Code to reproduce the results is available at github.com/ml-jku/mhn-react.
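
As a rough illustration of the template-relevance idea in the abstract above, the sketch below scores one molecule encoding against a set of template encodings with a softmax over scaled dot products, which is the association step used in Modern Hopfield Networks. The vectors, the `beta` value, and the function name are hypothetical. The learned encoders and the authors' implementation (see the repository linked in the abstract) are not reproduced here.

```python
import math


def template_relevance(mol_vec, template_vecs, beta=1.0):
    """Softmax over scaled dot products between a molecule encoding
    and each template encoding; returns one relevance per template."""
    scores = [beta * sum(m * t for m, t in zip(mol_vec, tv))
              for tv in template_vecs]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical 2-dimensional encodings, for illustration only.
molecule = [1.0, 0.0]
templates = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
relevance = template_relevance(molecule, templates, beta=2.0)
ranked = sorted(range(len(templates)), key=lambda i: -relevance[i])
print(ranked)  # template indices, most relevant first
```

Because relevance depends only on the template encoding, a template with few or no training examples can still receive a high score if its encoding lies close to that of the query molecule.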

5.
J Med Syst ; 46(5): 23, 2022 Mar 29.
Article in English | MEDLINE | ID: mdl-35348909

ABSTRACT

Many previous studies claim to have developed machine learning models that diagnose COVID-19 from blood tests. However, we hypothesize that changes in the underlying distribution of the data, so-called domain shifts, affect the predictive performance and reliability and are a reason for the failure of such machine learning models in clinical application. Domain shifts can be caused, e.g., by changes in the disease prevalence (spreading or tested population), by refined RT-PCR testing procedures (way of taking samples, laboratory procedures), or by virus mutations. Therefore, machine learning models for diagnosing COVID-19 or other diseases may not be reliable and degrade in performance over time. We investigate whether domain shifts are present in COVID-19 datasets and how they affect machine learning methods. We further set out to estimate the mortality risk based on routinely acquired blood tests in a hospital setting throughout pandemics and under domain shifts. We reveal domain shifts by evaluating the models on a large-scale dataset with different assessment strategies, such as temporal validation. We present the novel finding that domain shifts strongly affect machine learning models for COVID-19 diagnosis and deteriorate their predictive performance and credibility. Therefore, frequent re-training and re-assessment are indispensable for robust models enabling clinical utility.
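
Temporal validation, one of the assessment strategies named in the abstract above, can be sketched as a split by collection date rather than a random split. The record layout, field names, and cutoff date below are hypothetical and serve only to show the idea.

```python
from datetime import date


def temporal_split(records, cutoff):
    """Train on samples collected before `cutoff` and evaluate on
    samples collected on or after it, so the test set lies strictly
    in the 'future' relative to the training data."""
    train = [r for r in records if r["date"] < cutoff]
    test = [r for r in records if r["date"] >= cutoff]
    return train, test


# Hypothetical blood-test records with a collection date and a label.
records = [
    {"date": date(2020, 3, 10), "label": 1},
    {"date": date(2020, 5, 2), "label": 0},
    {"date": date(2020, 9, 21), "label": 0},
    {"date": date(2020, 12, 1), "label": 1},
    {"date": date(2021, 1, 15), "label": 0},
]
train, test = temporal_split(records, cutoff=date(2020, 10, 1))
print(len(train), len(test))
```

A model that scores well under a random split but poorly under this split is a sign of the domain shift the abstract describes.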


Subjects
COVID-19; COVID-19/diagnosis; COVID-19 Testing; Hematologic Tests; Humans; Machine Learning; Reproducibility of Results
6.
Mod Pathol ; 34(5): 895-903, 2021 May.
Article in English | MEDLINE | ID: mdl-33184470

ABSTRACT

Recent advances in artificial intelligence, particularly in the field of deep learning, have enabled researchers to create compelling algorithms for medical image analysis. Histological slides of basal cell carcinomas (BCCs), the most frequent skin tumor, are accessed by pathologists on a daily basis and are therefore well suited for automated prescreening by neural networks for the identification of cancerous regions and swift tumor classification. In this proof-of-concept study, we implemented an accurate and intuitively interpretable artificial neural network (ANN) for the detection of BCCs in histological whole-slide images (WSIs). Furthermore, we identified and compared differences in the diagnostic histological features and recognition patterns relevant for machine learning algorithms vs. expert pathologists. An attention-ANN was trained with WSIs of BCCs to identify tumor regions (n = 820). The diagnosis-relevant regions used by the ANN were compared to regions of interest for pathologists, detected by eye-tracking techniques. This ANN accurately identified BCC tumor regions on images of histologic slides (area under the ROC curve: 0.993, 95% CI: 0.990-0.995; sensitivity: 0.965, 95% CI: 0.951-0.979; specificity: 0.910, 95% CI: 0.859-0.960). The ANN implicitly calculated a weight matrix, indicating the regions of a histological image that are important for the prediction of the network. Interestingly, compared to pathologists' eye-tracking results, machine learning algorithms rely on significantly different recognition patterns for tumor identification (p < 10⁻⁴). To conclude, we found, using the example of BCC WSIs, that histopathological images can be efficiently and interpretably analyzed by state-of-the-art machine learning techniques. Neural networks and machine learning algorithms can potentially enhance diagnostic precision in digital pathology and uncover hitherto unused classification patterns.


Subjects
Carcinoma, Basal Cell/pathology; Machine Learning; Neural Networks, Computer; Pathologists; Skin Neoplasms/pathology; Skin/pathology; Algorithms; Humans
7.
Transfusion ; 60(9): 1977-1986, 2020 Sep.
Article in English | MEDLINE | ID: mdl-32596877

ABSTRACT

BACKGROUND: The ability to predict transfusions arising during hospital admission might enable economized blood supply management and might furthermore increase patient safety by ensuring a sufficient stock of red blood cells (RBCs) for a specific patient. We therefore investigated the precision of four different machine learning-based prediction algorithms to predict transfusion, massive transfusion, and the number of transfusions in patients admitted to a hospital. STUDY DESIGN AND METHODS: This was a retrospective, observational study in three adult tertiary care hospitals in Western Australia between January 2008 and June 2017. Primary outcome measures for the classification tasks were the area under the curve for the receiver operating characteristics curve, the F1 score, and the average precision of the four machine learning algorithms used: neural networks (NNs), logistic regression (LR), random forests (RFs), and gradient boosting (GB) trees. RESULTS: Using our four predictive models, transfusion of at least 1 unit of RBCs could be predicted rather accurately (sensitivity for NN, LR, RF, and GB: 0.898, 0.894, 0.584, and 0.872, respectively; specificity: 0.958, 0.966, 0.964, 0.965). Using the four methods for prediction of massive transfusion was less successful (sensitivity for NN, LR, RF, and GB: 0.780, 0.721, 0.002, and 0.797, respectively; specificity: 0.994, 0.995, 0.993, 0.995). As a consequence, prediction of the total number of packed RBCs transfused was also rather inaccurate. CONCLUSION: This study demonstrates that the necessity for intrahospital transfusion can be forecasted reliably; however, the amount of RBC units transfused during a hospital stay is more difficult to predict.
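
The sensitivity and specificity figures reported in the abstract above follow the standard definitions, which the sketch below computes from true and predicted labels. The toy data are hypothetical and are chosen to show how a classifier that almost never predicts a rare event can pair a near-zero sensitivity with a very high specificity, a pattern similar to the random forest result for massive transfusion.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, where
    1 marks the event (e.g. transfusion) and 0 marks no event."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical cohort: 20 events among 1000 admissions, and a model
# that never predicts the event.
y_true = [1] * 20 + [0] * 980
y_pred = [0] * 1000
sens, spec = sensitivity_specificity(y_true, y_pred)
print(sens, spec)
```

This is why specificity alone says little about a model for a rare outcome, and why the abstract reports both values side by side.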


Subjects
Decision Making, Computer-Assisted; Hospitalization; Machine Learning; Adult; Blood Transfusion; Female; Humans; Male; Predictive Value of Tests; Retrospective Studies; Western Australia
8.
Bioinformatics ; 34(9): 1538-1546, 2018 May 01.
Article in English | MEDLINE | ID: mdl-29253077

ABSTRACT

Motivation: While drug combination therapies are a well-established concept in cancer treatment, identifying novel synergistic combinations is challenging due to the size of combinatorial space. However, computational approaches have emerged as a time- and cost-efficient way to prioritize combinations to test, based on recently available large-scale combination screening data. Recently, Deep Learning has had an impact in many research areas by achieving new state-of-the-art model performance. However, Deep Learning has not yet been applied to drug synergy prediction, which is the approach we present here, termed DeepSynergy. DeepSynergy uses chemical and genomic information as input information, a normalization strategy to account for input data heterogeneity, and conical layers to model drug synergies. Results: DeepSynergy was compared to other machine learning methods such as Gradient Boosting Machines, Random Forests, Support Vector Machines and Elastic Nets on the largest publicly available synergy dataset with respect to mean squared error. DeepSynergy significantly outperformed the other methods with an improvement of 7.2% over the second best method at the prediction of novel drug combinations within the space of explored drugs and cell lines. At this task, the mean Pearson correlation coefficient between the measured and the predicted values of DeepSynergy was 0.73. Applying DeepSynergy for classification of these novel drug combinations resulted in a high predictive performance of an AUC of 0.90. Furthermore, we found that all compared methods exhibit low predictive performance when extrapolating to unexplored drugs or cell lines, which we suggest is due to limitations in the size and diversity of the dataset. We envision that DeepSynergy could be a valuable tool for selecting novel synergistic drug combinations. Availability and implementation: DeepSynergy is available via www.bioinf.jku.at/software/DeepSynergy. Contact: klambauer@bioinf.jku.at. 
Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Antineoplastic Combined Chemotherapy Protocols/therapeutic use; Computational Biology/methods; Deep Learning; Gene Expression Profiling/methods; Neoplasms/drug therapy; Software; Antineoplastic Combined Chemotherapy Protocols/pharmacology; Cell Line, Tumor; Gene Expression Regulation, Neoplastic; Humans; Neoplasms/genetics; Support Vector Machine
9.
J Chem Inf Model ; 59(3): 1163-1171, 2019 Mar 25.
Article in English | MEDLINE | ID: mdl-30840449

ABSTRACT

Predicting the outcome of biological assays based on high-throughput imaging data is a highly promising task in drug discovery since it can tremendously increase hit rates and suggest novel chemical scaffolds. However, end-to-end learning with convolutional neural networks (CNNs) has not been assessed for the task of biological assay prediction despite the success of these networks at visual recognition. We compared several CNNs trained directly on high-throughput imaging data to a) CNNs trained on cell-centric crops and to b) the current state-of-the-art: fully connected networks trained on precalculated morphological cell features. The comparison was performed on the Cell Painting data set, the largest publicly available data set of microscopic images of cells with approximately 30,000 compound treatments. We found that CNNs perform significantly better at predicting the outcome of assays than fully connected networks operating on precomputed morphological features of cells. Surprisingly, the best performing method could predict 32% of the 209 biological assays at high predictive performance (AUC > 0.9) indicating that the cell morphology changes contain a large amount of information about compound activities. Our results suggest that many biological assays could be replaced by high-throughput imaging together with convolutional neural networks and that the costly cell segmentation and feature extraction step can be replaced by convolutional neural networks.


Subjects
Biological Assay; Computational Biology/methods; Microscopy; Neural Networks, Computer; Image Processing, Computer-Assisted
10.
Drug Discov Today Technol ; 32-33: 55-63, 2019 Dec.
Article in English | MEDLINE | ID: mdl-33386095

ABSTRACT

There has been a wave of generative models for molecules triggered by advances in the field of Deep Learning. These generative models are often used to optimize chemical compounds towards particular properties or a desired biological activity. The evaluation of generative models remains challenging and suggested performance metrics or scoring functions often do not cover all relevant aspects of drug design projects. In this work, we highlight some unintended failure modes in molecular generation and optimization and how these evade detection by current performance metrics.


Subjects
Drug Discovery; Models, Molecular; Humans
11.
Bioinformatics ; 33(14): i59-i66, 2017 Jul 15.
Article in English | MEDLINE | ID: mdl-28881961

ABSTRACT

MOTIVATION: Biclustering has become a major tool for analyzing large datasets given as a matrix of samples times features and has been successfully applied in life sciences and e-commerce for drug design and recommender systems, respectively. Factor Analysis for Bicluster Acquisition (FABIA), one of the most successful biclustering methods, is a generative model that represents each bicluster by two sparse membership vectors: one for the samples and one for the features. However, FABIA is restricted to about 20 code units because of the high computational complexity of computing the posterior. Furthermore, code units are sometimes insufficiently decorrelated and sample membership is difficult to determine. We propose to use the recently introduced unsupervised Deep Learning approach Rectified Factor Networks (RFNs) to overcome the drawbacks of existing biclustering methods. RFNs efficiently construct very sparse, non-linear, high-dimensional representations of the input via their posterior means. RFN learning is a generalized alternating minimization algorithm based on the posterior regularization method which enforces non-negative and normalized posterior means. Each code unit represents a bicluster, where samples for which the code unit is active belong to the bicluster and features that have activating weights to the code unit belong to the bicluster. RESULTS: On 400 benchmark datasets and on three gene expression datasets with known clusters, RFN outperformed 13 other biclustering methods including FABIA. On data of the 1000 Genomes Project, RFN could identify DNA segments which indicate that interbreeding with other hominins had already started before the ancestors of modern humans left Africa. AVAILABILITY AND IMPLEMENTATION: https://github.com/bioinf-jku/librfn. CONTACT: djork-arne.clevert@bayer.com or hochreit@bioinf.jku.at.


Subjects
Computational Biology/methods; Unsupervised Machine Learning; Evolution, Molecular; Gene Expression Profiling/methods; Genome, Human; Genomics/methods; Humans; Oligonucleotide Array Sequence Analysis/methods
12.
J Chem Inf Model ; 58(9): 1736-1741, 2018 Sep 24.
Article in English | MEDLINE | ID: mdl-30118593

ABSTRACT

The new wave of successful generative models in machine learning has increased the interest in deep learning driven de novo drug design. However, method comparison is difficult because of various flaws of the currently employed evaluation metrics. We propose an evaluation metric for generative models called Fréchet ChemNet distance (FCD). The advantage of the FCD over previous metrics is that it can detect whether generated molecules are diverse and have similar chemical and biological properties as real molecules.


Subjects
Deep Learning; Drug Discovery; Computer Simulation; Databases, Factual; Models, Molecular; Software
13.
Hum Mutat ; 38(7): 889-897, 2017 Jul.
Article in English | MEDLINE | ID: mdl-28449315

ABSTRACT

Targeted next-generation-sequencing (NGS) panels have largely replaced Sanger sequencing in clinical diagnostics. They allow for the detection of copy-number variations (CNVs) in addition to single-nucleotide variants and small insertions/deletions. However, existing computational CNV detection methods have shortcomings regarding accuracy, quality control (QC), incidental findings, and user-friendliness. We developed panelcn.MOPS, a novel pipeline for detecting CNVs in targeted NGS panel data. Using data from 180 samples, we compared panelcn.MOPS with five state-of-the-art methods. With panelcn.MOPS leading the field, most methods achieved comparably high accuracy. panelcn.MOPS reliably detected CNVs ranging in size from part of a region of interest (ROI), to whole genes, which may comprise all ROIs investigated in a given sample. The latter is enabled by analyzing reads from all ROIs of the panel, but presenting results exclusively for user-selected genes, thus avoiding incidental findings. Additionally, panelcn.MOPS offers QC criteria not only for samples, but also for individual ROIs within a sample, which increases the confidence in called CNVs. panelcn.MOPS is freely available both as R package and standalone software with graphical user interface that is easy to use for clinical geneticists without any programming experience. panelcn.MOPS combines high sensitivity and specificity with user-friendliness rendering it highly suitable for routine clinical diagnostics.


Subjects
Computational Biology; DNA Copy Number Variations; Databases, Genetic; Algorithms; Computer Graphics; Gene Library; High-Throughput Nucleotide Sequencing; Humans; Quality Control; Reproducibility of Results; Sensitivity and Specificity; Sequence Analysis, DNA; Software; User-Computer Interface
14.
Bioinformatics ; 31(15): 2574-6, 2015 Aug 01.
Article in English | MEDLINE | ID: mdl-25812745

ABSTRACT

KeBABS provides a powerful, flexible and easy to use framework for Kernel-Based Analysis of Biological Sequences in R. It includes efficient implementations of the most important sequence kernels, also including variants that allow for taking sequence annotations and positional information into account. KeBABS seamlessly integrates three common support vector machine (SVM) implementations with a unified interface. It allows for hyperparameter selection by cross validation, nested cross validation and also features grouped cross validation. The biological interpretation of SVM models is supported by (1) the computation of weights of sequence patterns and (2) prediction profiles that highlight the contributions of individual sequence positions or sections.
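
To give a sense of what a sequence kernel computes, the sketch below implements a basic spectrum kernel: each sequence is represented by its k-mer counts, and the kernel value is the (optionally normalized) dot product of those count vectors. This is a generic Python illustration and not the KeBABS API, which is an R package. The function names and the default `k` are hypothetical, and the abstract does not say which kernels the package treats as most important.

```python
import math
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping substring of length k in `seq`."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a, b, k=3, normalize=True):
    """Dot product of k-mer count vectors; with normalize=True the
    value is scaled to [0, 1] (cosine similarity of the counts)."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    dot = sum(ca[m] * cb[m] for m in ca.keys() & cb.keys())
    if not normalize:
        return dot
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


print(spectrum_kernel("ACGTACGT", "ACGTTCGT", k=3))
```

The shared k-mers that contribute most to a kernel value are the kind of sequence patterns whose weights an SVM model can expose for interpretation, as the abstract describes.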


Subjects
HLA-A2 Antigen/metabolism; Models, Theoretical; Peptide Fragments/metabolism; Sequence Analysis, Protein/methods; Software; Support Vector Machine; Algorithms; Artificial Intelligence; Computer Simulation; Humans
15.
Bioinformatics ; 31(24): 3997-9, 2015 Dec 15.
Article in English | MEDLINE | ID: mdl-26315911

ABSTRACT

Although the R platform and the add-on packages of the Bioconductor project are widely used in bioinformatics, the standard task of multiple sequence alignment has been neglected so far. The msa package, for the first time, provides a unified R interface to the popular multiple sequence alignment algorithms ClustalW, ClustalOmega and MUSCLE. The package requires no additional software and runs on all major platforms. Moreover, the msa package provides an R interface to the powerful LaTeX package TeXshade, which allows for flexible and customizable plotting of multiple sequence alignments. AVAILABILITY AND IMPLEMENTATION: msa is available via the Bioconductor project: http://bioconductor.org/packages/release/bioc/html/msa.html. Further information and the R code of the example presented in this paper are available at http://www.bioinf.jku.at/software/msa/.


Subjects
Sequence Alignment/methods; Software; Algorithms; Animals
16.
Bioinformatics ; 31(20): 3392-4, 2015 Oct 15.
Article in English | MEDLINE | ID: mdl-26088801

ABSTRACT

We have developed Rchemcpp, a web service that identifies structurally similar compounds (structural analogs) in large-scale molecule databases. The service allows compounds to be queried in the widely used ChEMBL, DrugBank and the Connectivity Map databases. Rchemcpp utilizes the best performing similarity functions, i.e. molecule kernels, as measures for structural similarity. Molecule kernels have shown superior performance over other similarity measures and are currently excelling at machine learning challenges. To considerably reduce computational time, and thereby make it feasible as a web service, a novel efficient prefiltering strategy has been developed, which maintains the sensitivity of the method. By exploiting information contained in public databases, the web service facilitates many applications crucial for the drug development process, such as prioritizing compounds after screening or reducing adverse side effects during late phases. Rchemcpp was used in the DeepTox pipeline that has won the Tox21 Data Challenge and is frequently used by researchers in pharmaceutical companies. AVAILABILITY AND IMPLEMENTATION: The web service and the R package are freely available via http://shiny.bioinf.jku.at/Analoging/ and via Bioconductor. CONTACT: hochreit@bioinf.jku.at SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Databases, Chemical; Drug Discovery; Software; Gene Expression/drug effects; Internet; Machine Learning
17.
Nucleic Acids Res ; 41(22): e202, 2013 Dec.
Article in English | MEDLINE | ID: mdl-24174545

ABSTRACT

Identity by descent (IBD) can be reliably detected for long shared DNA segments, which are found in related individuals. However, many studies contain cohorts of unrelated individuals that share only short IBD segments. New sequencing technologies facilitate identification of short IBD segments through rare variants, which convey more information on IBD than common variants. Current IBD detection methods, however, are not designed to use rare variants for the detection of short IBD segments. Short IBD segments reveal genetic structures at high resolution. Therefore, they can help to improve imputation and phasing, to increase genotyping accuracy for low-coverage sequencing and to increase the power of association studies. Since short IBD segments are further assumed to be old, they can shed light on the evolutionary history of humans. We propose HapFABIA, a computational method that applies biclustering to identify very short IBD segments characterized by rare variants. HapFABIA is designed to detect short IBD segments in genotype data that were obtained from next-generation sequencing, but can also be applied to DNA microarray data. Especially in next-generation sequencing data, HapFABIA exploits rare variants for IBD detection. HapFABIA significantly outperformed competing algorithms at detecting short IBD segments on artificial and simulated data with rare variants. HapFABIA identified 160 588 different short IBD segments characterized by rare variants with a median length of 23 kb (mean 24 kb) in data for chromosome 1 of the 1000 Genomes Project. These short IBD segments contain 752 000 single nucleotide variants (SNVs), which account for 39% of the rare variants and 23.5% of all variants. The vast majority (152 000 IBD segments) are shared by Africans, while only 19 000 and 11 000 are shared by Europeans and Asians, respectively. IBD segments that match the Denisova or the Neandertal genome are found significantly more often in Asians and Europeans but also, in some cases exclusively, in Africans. The lengths of IBD segments and their sharing between continental populations indicate that many short IBD segments from chromosome 1 existed before humans migrated out of Africa. Thus, rare variants that tag these short IBD segments predate human migration from Africa. The software package HapFABIA is available from Bioconductor. All data sets, result files and programs for data simulation, preprocessing and evaluation are supplied at http://www.bioinf.jku.at/research/short-IBD.


Subjects
Genetic Variation; High-Throughput Nucleotide Sequencing/methods; Sequence Analysis, DNA/methods; Genomics; Genotyping Techniques; Humans; Inheritance Patterns
18.
Nucleic Acids Res ; 41(21): e198, 2013 Nov.
Article in English | MEDLINE | ID: mdl-24049071

ABSTRACT

Detection of differential expression in RNA-Seq data is currently limited to studies in which two or more sample conditions are known a priori. However, these biological conditions are typically unknown in cohort, cross-sectional and nonrandomized controlled studies such as the HapMap, the ENCODE or the 1000 Genomes project. We present DEXUS for detecting differential expression in RNA-Seq data for which the sample conditions are unknown. DEXUS models read counts as a finite mixture of negative binomial distributions in which each mixture component corresponds to a condition. A transcript is considered differentially expressed if modeling of its read counts requires more than one condition. DEXUS decomposes read count variation into variation due to noise and variation due to differential expression. Evidence of differential expression is measured by the informative/noninformative (I/NI) value, which allows differentially expressed transcripts to be extracted at a desired specificity (significance level) or sensitivity (power). DEXUS performed excellently in identifying differentially expressed transcripts in data with unknown conditions. On 2400 simulated data sets, I/NI value thresholds of 0.025, 0.05 and 0.1 yielded average specificities of 92, 97 and 99% at sensitivities of 76, 61 and 38%, respectively. On real-world data sets, DEXUS was able to detect differentially expressed transcripts related to sex, species, tissue, structural variants or quantitative trait loci. The DEXUS R package is publicly available from Bioconductor and the scripts for all experiments are available at http://www.bioinf.jku.at/software/dexus/.
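
The modeling idea in the abstract above, read counts as a finite mixture of negative binomial distributions with one component per condition, can be illustrated with the sketch below. It evaluates the mixture and the posterior probability that a given count came from each component. The mean/size parameterization, the parameter values, and the function names are assumptions made for illustration. The DEXUS estimation procedure and the I/NI value are not implemented here.

```python
import math


def nb_pmf(k, mu, r):
    """Negative binomial probability of count k, parameterized by
    mean mu > 0 and size (inverse dispersion) r > 0."""
    log_p = (math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
             + r * math.log(r / (r + mu))
             + k * math.log(mu / (r + mu)))
    return math.exp(log_p)


def mixture_pmf(k, components):
    """components: list of (weight, mu, r); weights sum to 1."""
    return sum(w * nb_pmf(k, mu, r) for w, mu, r in components)


def responsibilities(k, components):
    """Posterior probability of each component given read count k."""
    joint = [w * nb_pmf(k, mu, r) for w, mu, r in components]
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical transcript with two conditions: a low-expression
# component (mean 10) and a high-expression component (mean 100).
components = [(0.7, 10.0, 5.0), (0.3, 100.0, 5.0)]
for count in (5, 120):
    print(count, responsibilities(count, components))
```

In this picture, a transcript whose counts are explained well by a single component shows no evidence of differential expression, while one that needs two or more components does.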


Subjects
Gene Expression Profiling/methods; High-Throughput Nucleotide Sequencing/methods; Sequence Analysis, RNA/methods; Software; Animals; HapMap Project; Humans; Liver/metabolism; Macaca mulatta; Pan troglodytes; Plant Leaves/genetics; Plant Leaves/metabolism; Zea mays/genetics; Zea mays/metabolism
19.
BMC Bioinformatics ; 15 Suppl 6: S4, 2014.
Article in English | MEDLINE | ID: mdl-25078951

ABSTRACT

BACKGROUND: Cluster analysis is widely used to discover patterns in multi-dimensional data. Clustered heatmaps are the standard technique for visualizing one-way and two-way clustering results. In clustered heatmaps, rows and/or columns are reordered, resulting in a representation that shows the clusters as contiguous blocks. However, for biclustering results, where clusters can overlap, it is not possible to reorder the matrix in this way without duplicating rows and/or columns. RESULTS: We present Furby, an interactive visualization technique for analyzing biclustering results. Our contribution is twofold. First, the technique provides an overview of a biclustering result, showing the actual data that forms the individual clusters together with the information which rows and columns they share. Second, for fuzzy clustering results, the proposed technique additionally enables analysts to interactively set the thresholds that transform the fuzzy (soft) clustering into hard clusters that can then be investigated using heatmaps or bar charts. Changes in the membership value thresholds are immediately reflected in the visualization. We demonstrate the value of Furby by loading biclustering results applied to a multi-tissue dataset into the visualization. CONCLUSIONS: The proposed tool allows analysts to assess the overall quality of a biclustering result. Based on this high-level overview, analysts can then interactively explore the individual biclusters in detail. This novel way of handling fuzzy clustering results also supports analysts in finding the optimal thresholds that lead to the best clusters.
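
The thresholding step described in the abstract above, turning fuzzy (soft) memberships into hard clusters, can be sketched in a few lines. The data layout, names, and membership values are hypothetical and are not taken from Furby itself. The sketch only shows that raising the threshold shrinks each cluster and that, unlike in one-way clustering, an item may remain in several biclusters at once.

```python
def harden(memberships, threshold):
    """memberships maps a bicluster name to {item: membership degree}.
    Returns, per bicluster, the set of items whose degree reaches
    the threshold."""
    return {
        name: {item for item, degree in degrees.items()
               if degree >= threshold}
        for name, degrees in memberships.items()
    }


# Hypothetical fuzzy row memberships for two overlapping biclusters.
memberships = {
    "bc1": {"geneA": 0.9, "geneB": 0.4, "geneC": 0.1},
    "bc2": {"geneB": 0.8, "geneC": 0.7, "geneD": 0.2},
}
for threshold in (0.3, 0.75):
    print(threshold, harden(memberships, threshold))
```

An interactive tool recomputes this mapping whenever the analyst moves the threshold, which is why the change can be reflected in the display immediately.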


Subjects
Cluster Analysis; Computational Biology/instrumentation; Algorithms; Data Mining; Internet; Oligonucleotide Array Sequence Analysis