Results 1 - 20 of 71
1.
Bioinformatics ; 40(Supplement_1): i501-i510, 2024 Jun 28.
Article in English | MEDLINE | ID: mdl-38940158

ABSTRACT

MOTIVATION: In many biomedical applications, we are confronted with paired groups of samples, such as treated versus control. The aim is to detect discriminating features, i.e. biomarkers, based on high-dimensional (omics-) data. This problem can be phrased more generally as a two-sample problem requiring statistical significance testing to establish differences, and interpretations to identify distinguishing features. The multivariate maximum mean discrepancy (MMD) test quantifies group-level differences, whereas statistically significantly associated features are usually found by univariate feature selection. Currently, few general-purpose methods simultaneously perform multivariate feature selection and two-sample testing. RESULTS: We introduce a sparse, interpretable, and optimized MMD test (SpInOpt-MMD) that enables two-sample testing and feature selection in the same experiment. SpInOpt-MMD is a versatile method and we demonstrate its application to a variety of synthetic and real-world data types including images, gene expression measurements, and text data. SpInOpt-MMD is effective in identifying relevant features in small sample sizes and outperforms other feature selection methods such as SHapley Additive exPlanations and univariate association analysis in several experiments. AVAILABILITY AND IMPLEMENTATION: The code and links to our public data are available at https://github.com/BorgwardtLab/spinoptmmd.


Subjects
Biomarkers , Humans , Algorithms , Computational Biology/methods
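
The MMD two-sample test mentioned in the abstract above can be illustrated with a minimal sketch. This is not the SpInOpt-MMD method itself (it has no sparsity or feature weighting); it is a plain biased MMD estimate with an RBF kernel and a permutation p-value. The function names, the kernel choice, and the `gamma` parameter are assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(a, b, gamma):
    # Pairwise squared Euclidean distances between rows of a and rows of b.
    sq = (a**2).sum(1)[:, None] + (b**2).sum(1)[None, :] - 2.0 * a @ b.T
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq, 0.0))

def mmd2_biased(x, y, gamma=1.0):
    # Biased estimate of the squared MMD: mean(Kxx) + mean(Kyy) - 2 * mean(Kxy).
    return (rbf_kernel(x, x, gamma).mean()
            + rbf_kernel(y, y, gamma).mean()
            - 2.0 * rbf_kernel(x, y, gamma).mean())

def mmd_permutation_test(x, y, gamma=1.0, n_perm=200, seed=0):
    # Hypothetical helper: p-value from shuffling the group labels.
    rng = np.random.default_rng(seed)
    observed = mmd2_biased(x, y, gamma)
    pooled = np.vstack([x, y])
    n = len(x)
    count = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        stat = mmd2_biased(pooled[idx[:n]], pooled[idx[n:]], gamma)
        count += int(stat >= observed)
    # Add-one correction keeps the p-value strictly positive.
    return observed, (count + 1) / (n_perm + 1)
```

In this sketch a small p-value indicates a group-level difference only; identifying which features drive it is the part that the abstract attributes to SpInOpt-MMD.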
2.
Bioinformatics ; 40(Supplement_1): i247-i256, 2024 Jun 28.
Article in English | MEDLINE | ID: mdl-38940165

ABSTRACT

MOTIVATION: Acute kidney injury (AKI) is a syndrome that affects a large fraction of all critically ill patients, and early diagnosis to receive adequate treatment is as imperative as it is challenging to make early. Consequently, machine learning approaches have been developed to predict AKI ahead of time. However, the prevalence of AKI is often underestimated in state-of-the-art approaches, as they rely on an AKI event annotation solely based on creatinine, ignoring urine output. We construct and evaluate early warning systems for AKI in a multi-disciplinary ICU setting, using the complete KDIGO definition of AKI. We propose several variants of gradient-boosted decision tree (GBDT)-based models, including a novel time-stacking based approach. A state-of-the-art LSTM-based model previously proposed for AKI prediction is used as a comparison, which was not specifically evaluated in ICU settings yet. RESULTS: We find that optimal performance is achieved by using GBDT with the time-based stacking technique (AUPRC = 65.7%, compared with the LSTM-based model's AUPRC = 62.6%), which is motivated by the high relevance of time since ICU admission for this task. Both models show mildly reduced performance in the limited training data setting, perform fairly across different subcohorts, and exhibit no issues in gender transfer. Following the official KDIGO definition substantially increases the number of annotated AKI events. In our study GBDTs outperform LSTM models for AKI prediction. Generally, we find that both model types are robust in a variety of challenging settings arising for ICU data. AVAILABILITY AND IMPLEMENTATION: The code to reproduce the findings of our manuscript can be found at: https://github.com/ratschlab/AKI-EWS.


Subjects
Acute Kidney Injury , Intensive Care Units , Humans , Machine Learning , Male , Female , Decision Trees , Aged , Middle Aged
3.
Bioinformatics ; 39(Suppl 1): i523-i533, 2023 06 30.
Article in English | MEDLINE | ID: mdl-37387173

ABSTRACT

MOTIVATION: Complex phenotypes, such as many common diseases and morphological traits, are controlled by multiple genetic factors, namely genetic mutations and genes, and are influenced by environmental conditions. Deciphering the genetics underlying such traits requires a systemic approach, where many different genetic factors and their interactions are considered simultaneously. Many association mapping techniques available nowadays follow this reasoning, but have some severe limitations. In particular, they require binary encodings for the genetic markers, forcing the user to decide beforehand whether to use, e.g. a recessive or a dominant encoding. Moreover, most methods cannot include any biological prior or are limited to testing only lower-order interactions among genes for association with the phenotype, potentially missing a large number of marker combinations. RESULTS: We propose HOGImine, a novel algorithm that expands the class of discoverable genetic meta-markers by considering higher-order interactions of genes and by allowing multiple encodings for the genetic variants. Our experimental evaluation shows that the algorithm has a substantially higher statistical power compared to previous methods, allowing it to discover genetic mutations statistically associated with the phenotype at hand that could not be found before. Our method can exploit prior biological knowledge on gene interactions, such as protein-protein interaction networks, genetic pathways, and protein complexes, to restrict its search space. Since computing higher-order gene interactions poses a high computational burden, we also develop a more efficient search strategy and support computation to make our approach applicable in practice, leading to substantial runtime improvements compared to state-of-the-art methods. AVAILABILITY AND IMPLEMENTATION: Code and data are available at https://github.com/BorgwardtLab/HOGImine.


Subjects
Algorithms , Mutation , Phenotype , Protein Interaction Maps , Protein Interaction Mapping , Genome-Wide Association Study
4.
Bioinformatics ; 39(1)2023 01 01.
Article in English | MEDLINE | ID: mdl-36610707

ABSTRACT

SUMMARY: In many modern bioinformatics applications, such as statistical genetics, or single-cell analysis, one frequently encounters datasets which are orders of magnitude too large for conventional in-memory analysis. To tackle this challenge, we introduce SIMBSIG (SIMmilarity Batched Search Integrated GPU), a highly scalable Python package which provides a scikit-learn-like interface for out-of-core, GPU-enabled similarity searches, principal component analysis and clustering. Due to the PyTorch backend, it is highly modular and particularly tailored to many data types with a particular focus on biobank data analysis. AVAILABILITY AND IMPLEMENTATION: SIMBSIG is freely available from PyPI and its source code and documentation can be found on GitHub (https://github.com/BorgwardtLab/simbsig) under a BSD-3 license.


Subjects
Biological Specimen Banks , Software , Computational Biology , Documentation , Cluster Analysis
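
The out-of-core similarity search described in the abstract above rests on a simple idea: scan the data in batches and keep only a running top-k per query. The sketch below shows that idea in plain NumPy. It does not use or reproduce the SIMBSIG API; `batched_knn` and its parameters are hypothetical names, and in a real out-of-core setting the batches would be read from disk rather than held in memory.

```python
import numpy as np

def batched_knn(queries, data_batches, k):
    """Return (indices, distances) of the k nearest data rows for each query.

    data_batches is any iterable of 2-D arrays; only one batch is
    processed at a time, so the full dataset never has to fit in memory.
    """
    n_q = queries.shape[0]
    best_d = np.full((n_q, k), np.inf)            # running squared distances
    best_i = np.full((n_q, k), -1, dtype=np.int64)  # running global row indices
    offset = 0
    for batch in data_batches:
        # Squared Euclidean distance from every query to every row in the batch.
        d = ((queries[:, None, :] - batch[None, :, :]) ** 2).sum(-1)
        idx = np.arange(offset, offset + batch.shape[0])
        # Merge the batch candidates with the current best and keep the top k.
        cand_d = np.hstack([best_d, d])
        cand_i = np.hstack([best_i, np.broadcast_to(idx, d.shape)])
        order = np.argsort(cand_d, axis=1)[:, :k]
        best_d = np.take_along_axis(cand_d, order, axis=1)
        best_i = np.take_along_axis(cand_i, order, axis=1)
        offset += batch.shape[0]
    return best_i, np.sqrt(best_d)
```

Memory use in this sketch scales with the batch size and k rather than with the dataset size, which is the property that makes batched search attractive for biobank-scale data.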
5.
Bioinformatics ; 39(6)2023 06 01.
Article in English | MEDLINE | ID: mdl-37285313

ABSTRACT

MOTIVATION: While the search for associations between genetic markers and complex traits has led to the discovery of tens of thousands of trait-related genetic variants, the vast majority of these only explain a small fraction of the observed phenotypic variation. One possible strategy to overcome this while leveraging biological prior is to aggregate the effects of several genetic markers and to test entire genes, pathways or (sub)networks of genes for association to a phenotype. The latter, network-based genome-wide association studies, in particular suffer from a vast search space and an inherent multiple testing problem. As a consequence, current approaches are either based on greedy feature selection, thereby risking that they miss relevant associations, or neglect doing a multiple testing correction, which can lead to an abundance of false positive findings. RESULTS: To address the shortcomings of current approaches of network-based genome-wide association studies, we propose networkGWAS, a computationally efficient and statistically sound approach to network-based genome-wide association studies using mixed models and neighborhood aggregation. It allows for population structure correction and for well-calibrated P-values, which are obtained through circular and degree-preserving network permutations. networkGWAS successfully detects known associations on diverse synthetic phenotypes, as well as known and novel genes in phenotypes from Saccharomyces cerevisiae and Homo sapiens. It thereby enables the systematic combination of gene-based genome-wide association studies with biological network information. AVAILABILITY AND IMPLEMENTATION: https://github.com/BorgwardtLab/networkGWAS.git.


Subjects
Genome-Wide Association Study , Population Groups , Humans , Genetic Markers , Phenotype , Polymorphism, Single Nucleotide
6.
Bioinformatics ; 39(12)2023 12 01.
Article in English | MEDLINE | ID: mdl-38001023

ABSTRACT

MOTIVATION: Large-scale clinical proteomics datasets of infectious pathogens, combined with antimicrobial resistance outcomes, have recently opened the door for machine learning models which aim to improve clinical treatment by predicting resistance early. However, existing prediction frameworks typically train a separate model for each antimicrobial and species in order to predict a pathogen's resistance outcome, resulting in missed opportunities for chemical knowledge transfer and generalizability. RESULTS: We demonstrate the effectiveness of multimodal learning over proteomic and chemical features by exploring two clinically relevant tasks for our proposed deep learning models: drug recommendation and generalized resistance prediction. By adopting this multi-view representation of the pathogenic samples and leveraging the scale of the available datasets, our models outperformed the previous single-drug and single-species predictive models by statistically significant margins. We extensively validated the multi-drug setting, highlighting the challenges in generalizing beyond the training data distribution, and quantitatively demonstrate how suitable representations of antimicrobial drugs constitute a crucial tool in the development of clinically relevant predictive models. AVAILABILITY AND IMPLEMENTATION: The code used to produce the results presented in this article is available at https://github.com/BorgwardtLab/MultimodalAMR.


Subjects
Anti-Bacterial Agents , Proteomics , Drug Resistance, Bacterial , Machine Learning
7.
Bioinformatics ; 39(6)2023 06 01.
Article in English | MEDLINE | ID: mdl-37220903

ABSTRACT

MOTIVATION: Developing new crop varieties with superior performance is highly important to ensure robust and sustainable global food security. The speed of variety development is limited by long field cycles and advanced generation selections in plant breeding programs. While methods to predict yield from genotype or phenotype data have been proposed, improved performance and integrated models are needed. RESULTS: We propose a machine learning model that leverages both genotype and phenotype measurements by fusing genetic variants with multiple data sources collected by unmanned aerial systems. We use a deep multiple instance learning framework with an attention mechanism that sheds light on the importance given to each input during prediction, enhancing interpretability. Our model reaches 0.754 ± 0.024 Pearson correlation coefficient when predicting yield in similar environmental conditions; a 34.8% improvement over the genotype-only linear baseline (0.559 ± 0.050). We further predict yield on new lines in an unseen environment using only genotypes, obtaining a prediction accuracy of 0.386 ± 0.010, a 13.5% improvement over the linear baseline. Our multi-modal deep learning architecture efficiently accounts for plant health and environment, distilling the genetic contribution and providing excellent predictions. Yield prediction algorithms leveraging phenotypic observations during training therefore promise to improve breeding programs, ultimately speeding up delivery of improved varieties. AVAILABILITY AND IMPLEMENTATION: Available at https://github.com/BorgwardtLab/PheGeMIL (code) and https://doi.org/doi:10.5061/dryad.kprr4xh5p (data).


Subjects
Deep Learning , Phenomics , Triticum/genetics , Plant Breeding/methods , Selection, Genetic , Phenotype , Genotype , Genomics/methods , Edible Grain/genetics
8.
Brief Bioinform ; 22(2): 1515-1530, 2021 03 22.
Article in English | MEDLINE | ID: mdl-33169146

ABSTRACT

Recent advancements in experimental high-throughput technologies have expanded the availability and quantity of molecular data in biology. Given the importance of interactions in biological processes, such as the interactions between proteins or the bonds within a chemical compound, this data is often represented in the form of a biological network. The rise of this data has created a need for new computational tools to analyze networks. One major trend in the field is to use deep learning for this goal and, more specifically, to use methods that work with networks, the so-called graph neural networks (GNNs). In this article, we describe biological networks and review the principles and underlying algorithms of GNNs. We then discuss domains in bioinformatics in which graph neural networks are frequently being applied at the moment, such as protein function prediction, protein-protein interaction prediction and in silico drug discovery and development. Finally, we highlight application areas such as gene regulatory networks and disease diagnosis where deep learning is emerging as a new tool to answer classic questions like gene interaction prediction and automatic disease prediction from data.


Subjects
Computational Biology/methods , Deep Learning , Neural Networks, Computer , Algorithms , Drug Discovery , Gene Regulatory Networks , Humans
9.
Bioinformatics ; 38(Suppl 1): i101-i108, 2022 06 24.
Article in English | MEDLINE | ID: mdl-35758775

ABSTRACT

MOTIVATION: Sepsis is a leading cause of death and disability in children globally, accounting for ∼3 million childhood deaths per year. In pediatric sepsis patients, the multiple organ dysfunction syndrome (MODS) is considered a significant risk factor for adverse clinical outcomes characterized by high mortality and morbidity in the pediatric intensive care unit. The rapidly growing availability of electronic health records (EHRs) has allowed researchers to develop data-driven approaches such as machine learning in healthcare, with great success. However, effective machine learning models that accurately predict, at an early stage, the recovery of pediatric sepsis patients from MODS to a mild state, and thus assist clinicians in the decision-making process, are still lacking. RESULTS: This study develops a machine learning-based approach to predict the recovery from MODS to zero or single organ dysfunction 1 week in advance in the Swiss Pediatric Sepsis Study (SPSS) cohort of children with blood-culture confirmed bacteremia. Our model achieves internal validation performance on the SPSS cohort with an area under the receiver operating characteristic curve (AUROC) of 79.1% and area under the precision-recall curve (AUPRC) of 73.6%, and it was also externally validated on another cohort of pediatric sepsis patients collected in the USA, yielding an AUROC of 76.4% and AUPRC of 72.4%. These results indicate that our model has the potential to be integrated into EHR systems and contribute to patient assessment and triage in pediatric sepsis patient care. AVAILABILITY AND IMPLEMENTATION: Code available at https://github.com/BorgwardtLab/MODS-recovery. The data underlying this article are not publicly available, to protect the privacy of individuals who participated in the study. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Multiple Organ Failure , Sepsis , Child , Cohort Studies , Humans , Intensive Care Units, Pediatric , Multiple Organ Failure/diagnosis , Multiple Organ Failure/etiology , ROC Curve , Sepsis/complications , Sepsis/diagnosis
10.
Bioinformatics ; 37(1): 57-65, 2021 04 09.
Article in English | MEDLINE | ID: mdl-32573681

ABSTRACT

MOTIVATION: Correlating genetic loci with a disease phenotype is a common approach to improve our understanding of the genetics underlying complex diseases. Standard analyses mostly ignore two aspects, namely genetic heterogeneity and interactions between loci. Genetic heterogeneity, the phenomenon that genetic variants at different loci lead to the same phenotype, promises to increase statistical power by aggregating low-signal variants. Incorporating interactions between loci results in a computational and statistical bottleneck due to the vast amount of candidate interactions. RESULTS: We propose a novel method SiNIMin that addresses these two aspects by finding pairs of interacting genes that are, upon combination, associated with a phenotype of interest under a model of genetic heterogeneity. We guide the interaction search using biological prior knowledge in the form of protein-protein interaction networks. Our method controls type I error and outperforms state-of-the-art methods with respect to statistical power. Additionally, we find novel associations for multiple Arabidopsis thaliana phenotypes, and, with an adapted variant of SiNIMin, for a study of rare variants in migraine patients. AVAILABILITY AND IMPLEMENTATION: Code available at https://github.com/BorgwardtLab/SiNIMin. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Genetic Heterogeneity , Protein Interaction Maps , Genetic Loci , Humans , Phenotype , Software
11.
Nucleic Acids Res ; 48(D1): D1063-D1068, 2020 01 08.
Article in English | MEDLINE | ID: mdl-31642487

ABSTRACT

Genome-wide association studies (GWAS) are integral for studying genotype-phenotype relationships and gaining a deeper understanding of the genetic architecture underlying trait variation. A plethora of genetic associations between distinct loci and various traits have been successfully discovered and published for the model plant Arabidopsis thaliana. This success and the free availability of full genomes and phenotypic data for more than 1,000 different natural inbred lines led to the development of several data repositories. AraPheno (https://arapheno.1001genomes.org) serves as a central repository of population-scale phenotypes in A. thaliana, while the AraGWAS Catalog (https://aragwas.1001genomes.org) provides a publicly available, manually curated and standardized collection of marker-trait associations for all available phenotypes from AraPheno. In this major update, we introduce the next generation of both platforms, including new data, features and tools. We included novel results on associations between knockout-mutations and all AraPheno traits. Furthermore, AraPheno has been extended to display RNA-Seq data for hundreds of accessions, providing expression information for over 28 000 genes for these accessions. All data, including the imputed genotype matrix used for GWAS, are easily downloadable via the respective databases.


Subjects
Arabidopsis/genetics , Computational Biology , Databases, Genetic , Genome, Plant , Genome-Wide Association Study , Phenotype , Computational Biology/methods , Gene Knockout Techniques , Genome-Wide Association Study/methods , Genotype , Mutation , Quantitative Trait Loci , Quantitative Trait, Heritable , Sequence Analysis, RNA , Web Browser
12.
Bioinformatics ; 36(Suppl_1): i508-i515, 2020 07 01.
Article in English | MEDLINE | ID: mdl-32657361

ABSTRACT

MOTIVATION: Gaining a comprehensive understanding of the genetics underlying cancer development and progression is a central goal of biomedical research. Its accomplishment promises key mechanistic, diagnostic and therapeutic insights. One major step in this direction is the identification of genes that drive the emergence of tumors upon mutation. Recent advances in the field of computational biology have shown the potential of combining genetic summary statistics that represent the mutational burden in genes with biological networks, such as protein-protein interaction networks, to identify cancer driver genes. Those approaches superimpose the summary statistics on the nodes in the network, followed by an unsupervised propagation of the node scores through the network. However, this unsupervised setting does not leverage any knowledge on well-established cancer genes, a potentially valuable resource to improve the identification of novel cancer drivers. RESULTS: We develop a novel node embedding that enables classification of cancer driver genes in a supervised setting. The embedding combines a representation of the mutation score distribution in a node's local neighborhood with network propagation. We leverage the knowledge of well-established cancer driver genes to define a positive class, resulting in a partially labeled dataset, and develop a cross-validation scheme to enable supervised prediction. The proposed node embedding followed by a supervised classification improves the predictive performance compared with baseline methods and yields a set of promising genes that constitute candidates for further biological validation. AVAILABILITY AND IMPLEMENTATION: Code available at https://github.com/BorgwardtLab/MoProEmbeddings. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Computational Biology , Neoplasms , Humans , Mutation , Neoplasms/genetics , Oncogenes , Protein Interaction Maps
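
The unsupervised baseline described in the abstract above (placing gene-level scores on network nodes and propagating them through the network) can be sketched as a random walk with restart. This is a generic illustration, not the supervised node embedding the authors propose; the function name, the `restart` parameter, and the column normalization are assumptions.

```python
import numpy as np

def propagate_scores(adjacency, scores, restart=0.5, tol=1e-10, max_iter=1000):
    """Random walk with restart over an undirected network (hypothetical helper)."""
    adjacency = np.asarray(adjacency, dtype=float)
    col_sums = adjacency.sum(axis=0)
    col_sums[col_sums == 0] = 1.0        # avoid division by zero for isolated nodes
    w = adjacency / col_sums             # column-normalized transition matrix
    p0 = np.asarray(scores, dtype=float)  # initial scores, e.g. mutation burden
    p = p0.copy()
    for _ in range(max_iter):
        # Spread scores to neighbors, then pull back toward the initial scores.
        p_next = (1 - restart) * w @ p + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p
        p = p_next
    return p
```

For a four-node path graph 0-1-2-3 with all initial score on node 0 and `restart=0.5`, the fixed point is [26, 14, 4, 1] / 45, so the score decays with distance from the seed node while the total is preserved.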
13.
Bioinformatics ; 36(Suppl_1): i30-i38, 2020 07 01.
Article in English | MEDLINE | ID: mdl-32657381

ABSTRACT

MOTIVATION: Microbial species identification based on matrix-assisted laser desorption ionization time-of-flight (MALDI-TOF) mass spectrometry (MS) has become a standard tool in clinical microbiology. The resulting MALDI-TOF mass spectra also harbour the potential to deliver prediction results for other phenotypes, such as antibiotic resistance. However, the development of machine learning algorithms specifically tailored to MALDI-TOF MS-based phenotype prediction is still in its infancy. Moreover, current spectral pre-processing typically involves a parameter-heavy chain of operations without analyzing their influence on the prediction results. In addition, classification algorithms lack quantification of uncertainty, which is indispensable for predictions potentially influencing patient treatment. RESULTS: We present a novel prediction method for antimicrobial resistance based on MALDI-TOF mass spectra. First, we compare the complex conventional pre-processing to a new approach that exploits topological information and requires only a single parameter, namely the number of peaks of a spectrum to keep. Second, we introduce PIKE, the peak information kernel, a similarity measure specifically tailored to MALDI-TOF mass spectra which, combined with a Gaussian process classifier, provides well-calibrated uncertainty estimates about predictions. We demonstrate the utility of our approach by predicting antibiotic resistance of three clinically highly relevant bacterial species. Our method consistently outperforms competitor approaches, while demonstrating improved performance and security by rejecting out-of-distribution samples, such as bacterial species that are not represented in the training data. Ultimately, our method could contribute to an earlier and precise antimicrobial treatment in clinical patient care. AVAILABILITY AND IMPLEMENTATION: We make our code publicly available as an easy-to-use Python package under https://github.com/BorgwardtLab/maldi_PIKE.


Subjects
Anti-Bacterial Agents , Bacteria , Anti-Bacterial Agents/pharmacology , Drug Resistance, Microbial , Humans , Phenotype , Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization
14.
Bioinformatics ; 36(Suppl_2): i840-i848, 2020 12 30.
Article in English | MEDLINE | ID: mdl-33381811

ABSTRACT

MOTIVATION: Temporal biomarker discovery in longitudinal data is based on detecting reoccurring trajectories, the so-called shapelets. The search for shapelets requires considering all subsequences in the data. While the accompanying issue of multiple testing has been mitigated in previous work, the redundancy and overlap of the detected shapelets result in an a priori unbounded number of highly similar and structurally meaningless shapelets. As a consequence, current temporal biomarker discovery methods are impractical and underpowered. RESULTS: We find that the pre- or post-processing of shapelets does not sufficiently increase the power and practical utility. Consequently, we present a novel method for temporal biomarker discovery: Statistically Significant Submodular Subset Shapelet Mining (S5M), which retrieves short subsequences that (i) occur in the data, (ii) are statistically significantly associated with the phenotype and (iii) are of manageable quantity while maximizing structural diversity. Structural diversity is achieved by pruning non-representative shapelets via submodular optimization. This increases the statistical power and utility of S5M compared to state-of-the-art approaches on simulated and real-world datasets. For patients admitted to the intensive care unit (ICU) showing signs of severe organ failure, we find temporal patterns in the sequential organ failure assessment score that are associated with in-ICU mortality. AVAILABILITY AND IMPLEMENTATION: S5M is an option in the Python package of S3M: github.com/BorgwardtLab/S3M.


Subjects
Biomedical Research , Biomarkers , Humans , Phenotype , Research Design
15.
Bioinformatics ; 35(15): 2680-2682, 2019 08 01.
Article in English | MEDLINE | ID: mdl-30541062

ABSTRACT

SUMMARY: Combinatorial association mapping aims to assess the statistical association of higher-order interactions of genetic markers with a phenotype of interest. This article presents combinatorial association mapping (CASMAP), a software package that leverages recent advances in significant pattern mining to overcome the statistical and computational challenges that have hindered combinatorial association mapping. CASMAP can be used to perform region-based association studies and to detect higher-order epistatic interactions of genetic variants. Most importantly, unlike other existing significant pattern mining-based tools, CASMAP allows for the correction of categorical covariates such as age or gender, making it suitable for genome-wide association studies. AVAILABILITY AND IMPLEMENTATION: The R and Python packages can be downloaded from our GitHub repository http://github.com/BorgwardtLab/CASMAP. The R package is also available on CRAN. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Polymorphism, Single Nucleotide , Genome-Wide Association Study , Phenotype , Software
16.
Plant Cell ; 29(1): 5-19, 2017 01.
Article in English | MEDLINE | ID: mdl-27986896

ABSTRACT

The ever-growing availability of high-quality genotypes for a multitude of species has enabled researchers to explore the underlying genetic architecture of complex phenotypes at an unprecedented level of detail using genome-wide association studies (GWAS). The systematic comparison of results obtained from GWAS of different traits opens up new possibilities, including the analysis of pleiotropic effects. Other advantages that result from the integration of multiple GWAS are the ability to replicate GWAS signals and to increase statistical power to detect such signals through meta-analyses. In order to facilitate the simple comparison of GWAS results, we present easyGWAS, a powerful, species-independent online resource for computing, storing, sharing, annotating, and comparing GWAS. The easyGWAS tool supports multiple species, the uploading of private genotype data and summary statistics of existing GWAS, as well as advanced methods for comparing GWAS results across different experiments and data sets in an interactive and user-friendly interface. easyGWAS is also a public data repository for GWAS data and summary statistics and already includes published data and results from several major GWAS. We demonstrate the potential of easyGWAS with a case study of the model organism Arabidopsis thaliana, using flowering and growth-related traits.


Subjects
Computational Biology/methods , Genome, Plant/genetics , Genome-Wide Association Study/methods , Polymorphism, Single Nucleotide , Arabidopsis/genetics , Arabidopsis/growth & development , Flowers/genetics , Flowers/growth & development , Genotype , Humans , Phenotype , Reproducibility of Results , Software , User-Computer Interface
17.
Nucleic Acids Res ; 46(D1): D1150-D1156, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29059333

ABSTRACT

The abundance of high-quality genotype and phenotype data for the model organism Arabidopsis thaliana enables scientists to study the genetic architecture of many complex traits at an unprecedented level of detail using genome-wide association studies (GWAS). GWAS have been a great success in A. thaliana and many SNP-trait associations have been published. With the AraGWAS Catalog (https://aragwas.1001genomes.org) we provide a publicly available, manually curated and standardized GWAS catalog for all publicly available phenotypes from the central A. thaliana phenotype repository, AraPheno. All GWAS have been recomputed on the latest imputed genotype release of the 1001 Genomes Consortium using a standardized GWAS pipeline to ensure comparability between results. The catalog currently includes 167 phenotypes and more than 222 000 SNP-trait associations with P < 10⁻⁴, of which 3887 are significantly associated using permutation-based thresholds. The AraGWAS Catalog can be accessed via a modern web-interface and provides various features to easily access, download and visualize the results and summary statistics across GWAS.


Subjects
Arabidopsis/genetics , Databases, Genetic , Genome-Wide Association Study , Polymorphism, Single Nucleotide , User-Computer Interface
18.
J Am Soc Nephrol ; 30(11): 2262-2274, 2019 11.
Article in English | MEDLINE | ID: mdl-31653784

ABSTRACT

BACKGROUND: Patients on organ transplant waiting lists are evaluated for preexisting alloimmunity to minimize episodes of acute and chronic rejection by regularly monitoring for changes in alloimmune status. There are few studies on how alloimmunity changes over time in patients on kidney allograft waiting lists, and an apparent lack of research-based evidence supporting currently used monitoring intervals. METHODS: To investigate the dynamics of alloimmune responses directed at HLA antigens, we retrospectively evaluated data on anti-HLA antibodies measured by the single-antigen bead assay from 627 waitlisted patients who subsequently received a kidney transplant at University Hospital Zurich, Switzerland, between 2008 and 2017. Our analysis focused on a filtered dataset comprising 467 patients who had at least two assay measurements. RESULTS: Within the filtered dataset, we analyzed potential changes in mean fluorescence intensity values (reflecting bound anti-HLA antibodies) between consecutive measurements for individual patients in relation to the time interval between measurements. Using multiple approaches, we found no correlation between these two factors. However, when we stratified the dataset on the basis of documented previous immunizing events (transplant, pregnancy, or transfusion), we found significant differences in the magnitude of change in alloimmune status, especially among patients with a previous transplant versus patients without such a history. Further efforts to cluster patients according to statistical properties related to alloimmune status kinetics were unsuccessful, indicating considerable complexity in individual variability. CONCLUSIONS: Alloimmune kinetics in patients on a kidney transplant waiting list do not appear to be related to the interval between measurements, but are instead associated with alloimmunization history. This suggests that an individualized strategy for alloimmune status monitoring may be preferable to currently used intervals.


Subjects
HLA Antigens/immunology , Isoantibodies/analysis , Kidney Transplantation , Waiting Lists , Female , Humans , Kinetics , Male , Middle Aged , Retrospective Studies , Time Factors
19.
Bioinformatics ; 34(17): i687-i696, 2018 Sep 01.
Article in English | MEDLINE | ID: mdl-30423082

ABSTRACT

Motivation: Methods based on summary statistics obtained from genome-wide association studies have gained considerable interest in genetics due to the computational cost and privacy advantages they present. Imputing missing summary statistics has therefore become a key procedure in many bioinformatics pipelines, but available solutions may rely on additional knowledge about the populations used in the original study and, as a result, may not always ensure feasibility or high accuracy of the imputation procedure. Results: We present ARDISS, a method to impute missing summary statistics in mixed-ethnicity cohorts through Gaussian Process Regression and automatic relevance determination. ARDISS is trained on an external reference panel and does not require information about allele frequencies of genotypes from the original study. Our method approximates the original GWAS population by a combination of samples from a reference panel relying exclusively on the summary statistics and without any external information. ARDISS successfully reconstructs the original composition of mixed-ethnicity cohorts and outperforms alternative solutions in terms of speed and imputation accuracy both for heterogeneous and homogeneous datasets. Availability and implementation: The proposed method is available at https://github.com/BorgwardtLab/ARDISS. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Ethnicity/genetics , Cohort Studies , Gene Frequency , Genome-Wide Association Study/methods , Genotype , Humans , Software
20.
Bioinformatics ; 34(16): 2808-2816, 2018 Aug 15.
Article in English | MEDLINE | ID: mdl-29528376

ABSTRACT

Motivation: Large-scale screenings of cancer cell lines with detailed molecular profiles against libraries of pharmacological compounds are currently being performed in order to gain a better understanding of the genetic component of drug response and to enhance our ability to recommend therapies given a patient's molecular profile. These comprehensive screens differ from the clinical setting in which (i) medical records only contain the response of a patient to very few drugs, (ii) drugs are recommended by doctors based on their expert judgment and (iii) selecting the most promising therapy is often more important than accurately predicting the sensitivity to all potential drugs. Current regression models for drug sensitivity prediction fail to account for these three properties. Results: We present a machine learning approach, named Kernelized Rank Learning (KRL), that ranks drugs based on their predicted effect per cell line (patient), circumventing the difficult problem of precisely predicting the sensitivity to the given drug. Our approach outperforms several state-of-the-art predictors in drug recommendation, particularly if the training dataset is sparse, and generalizes to patient data. Our work phrases personalized drug recommendation as a new type of machine learning problem with translational potential to the clinic. Availability and implementation: The Python implementation of KRL and scripts for running our experiments are available at https://github.com/BorgwardtLab/Kernelized-Rank-Learning. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Antineoplastic Agents/therapeutic use , Neoplasms/drug therapy , Precision Medicine , Health Planning Guidelines , Humans , Machine Learning