1.
Nat Methods ; 21(8): 1444-1453, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39122953

ABSTRACT

Machine learning methods for extracting patterns from high-dimensional data are very important in the biological sciences. However, in certain cases, real-world applications cannot confirm the reported prediction performance. One of the main reasons for this is data leakage, which can be seen as the illicit sharing of information between the training data and the test data, resulting in performance estimates that are far better than the performance observed in the intended application scenario. Data leakage can be difficult to detect in biological datasets due to their complex dependencies. With this in mind, we present seven questions that should be asked to prevent data leakage when constructing machine learning models in biological domains. We illustrate the usefulness of our questions by applying them to nontrivial examples. Our goal is to raise awareness of potential data leakage problems and to promote robust and reproducible machine learning-based research in biology.
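One common safeguard against the kind of leakage described above is to split by group (for example, by patient or by sequence cluster) rather than by individual sample. The following is a minimal sketch under that assumption; it is not taken from the article, and the function name `group_split` and the toy patient labels are hypothetical.

```python
import random
from collections import defaultdict


def group_split(sample_groups, test_fraction=0.2, seed=0):
    """Split sample indices so that no group appears in both train and test."""
    # Collect the sample indices that belong to each group.
    groups = defaultdict(list)
    for idx, group in enumerate(sample_groups):
        groups[group].append(idx)

    # Shuffle whole groups, not individual samples.
    group_ids = sorted(groups)
    random.Random(seed).shuffle(group_ids)

    # Hold out at least one group for testing.
    n_test_groups = max(1, round(len(group_ids) * test_fraction))
    test_groups = set(group_ids[:n_test_groups])

    train = [i for g in group_ids if g not in test_groups for i in groups[g]]
    test = [i for g in test_groups for i in groups[g]]
    return sorted(train), sorted(test)


# Hypothetical example: six samples drawn from three patients.
patients = ["p1", "p1", "p2", "p2", "p3", "p3"]
train_idx, test_idx = group_split(patients, test_fraction=0.34)
```

Because whole groups are held out, dependent samples from one patient cannot sit on both sides of the split. Which grouping is appropriate depends on the dependency structure of the dataset.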


Subjects
Machine Learning , Humans , Computational Biology/methods , Algorithms
2.
Brief Bioinform ; 25(2), 2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38446741

ABSTRACT

Identifying protein-protein interactions (PPIs) is crucial for deciphering biological pathways. Numerous prediction methods have been developed as cheap alternatives to biological experiments, reporting surprisingly high accuracy estimates. We systematically investigated how much reproducible deep learning models depend on data leakage, sequence similarities and node degree information, and compared them with basic machine learning models. We found that overlaps between training and test sets resulting from random splitting lead to strongly overestimated performances. In this setting, models learn solely from sequence similarities and node degrees. When data leakage is avoided by minimizing sequence similarities between training and test set, performances become random. Moreover, baseline models directly leveraging sequence similarity and network topology show good performances at a fraction of the computational cost. Thus, we advocate that any improvements should be reported relative to baseline methods in the future. Our findings suggest that predicting PPIs remains an unsolved task for proteins showing little sequence similarity to previously studied proteins, highlighting that further experimental research into the 'dark' protein interactome and better computational methods are needed.
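To illustrate how much signal node degree alone can carry, the sketch below scores a candidate pair by the product of the two proteins' degrees in the training network. This is an illustrative assumption, not the baseline used in the article; the function name `degree_scores` and the toy network are hypothetical.

```python
from collections import Counter


def degree_scores(train_edges, candidate_pairs):
    """Score each candidate pair by the product of its proteins' training degrees."""
    degree = Counter()
    for a, b in train_edges:
        degree[a] += 1
        degree[b] += 1
    # Counter returns 0 for proteins never seen in training.
    return {pair: degree[pair[0]] * degree[pair[1]] for pair in candidate_pairs}


# Hypothetical toy network: protein "A" is a hub in the training set.
train_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
scores = degree_scores(train_edges, [("A", "E"), ("B", "D"), ("E", "F")])
```

Any pair that involves a protein absent from the training set scores zero here, which mirrors the abstract's point that degree information does not transfer to previously unstudied proteins.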


Subjects
Machine Learning
3.
Nucleic Acids Res ; 52(W1): W481-W488, 2024 Jul 05.
Article in English | MEDLINE | ID: mdl-38783119

ABSTRACT

In recent decades, the development of new drugs has become increasingly expensive and inefficient, and the molecular mechanisms of most pharmaceuticals remain poorly understood. In response, computational systems and network medicine tools have emerged to identify potential drug repurposing candidates. However, these tools often require complex installation and lack intuitive visual network mining capabilities. To tackle these challenges, we introduce Drugst.One, a platform that assists specialized computational medicine tools in becoming user-friendly, web-based utilities for drug repurposing. With just three lines of code, Drugst.One turns any systems biology software into an interactive web tool for modeling and analyzing complex protein-drug-disease networks. Demonstrating its broad adaptability, Drugst.One has been successfully integrated with 21 computational systems medicine tools. Available at https://drugst.one, Drugst.One has significant potential for streamlining the drug discovery process, allowing researchers to focus on essential aspects of pharmaceutical treatment research.


Subjects
Drug Repositioning , Software , Drug Repositioning/methods , Humans , Internet , Drug Discovery/methods , Systems Biology/methods , Computational Biology/methods
4.
Nucleic Acids Res ; 2024 Aug 23.
Article in English | MEDLINE | ID: mdl-39175109

ABSTRACT

Most heritable diseases are polygenic. To comprehend the underlying genetic architecture, it is crucial to discover the clinically relevant epistatic interactions (EIs) between genomic single nucleotide polymorphisms (SNPs) (1-3). Existing statistical computational methods for EI detection are mostly limited to pairs of SNPs due to the combinatorial explosion of higher-order EIs. With NeEDL (network-based epistasis detection via local search), we leverage network medicine to inform the selection of EIs that are an order of magnitude more statistically significant compared to existing tools and consist, on average, of five SNPs. We further show that this computationally demanding task can be substantially accelerated once quantum computing hardware becomes available. We apply NeEDL to eight different diseases and discover genes (affected by EIs of SNPs) that are partly known to affect the disease. Additionally, these results are reproducible across independent cohorts. EIs for these eight diseases can be interactively explored in the Epistasis Disease Atlas (https://epistasis-disease-atlas.com). In summary, NeEDL demonstrates the potential of seamlessly integrated quantum computing techniques to accelerate biomedical research. Our network medicine approach detects higher-order EIs with unprecedented statistical and biological evidence, yielding unique insights into polygenic diseases and providing a basis for the development of improved risk scores and combination therapies.

5.
Platelets ; 35(1): 2358244, 2024 Dec.
Article in English | MEDLINE | ID: mdl-38845541

ABSTRACT

Thromboembolic events are common in patients with essential thrombocythemia (ET). However, the pathophysiological mechanisms underlying the increased thrombotic risk remain to be determined. Here, we perform the first phenotypical characterization of platelet expression using single-cell mass cytometry in six ET patients and six age- and sex-matched healthy individuals. A large panel of 18 transmembrane regulators of platelet function and activation was analyzed, at baseline and after ex-vivo stimulation with thrombin receptor-activating peptide (TRAP). We detected a significant overexpression of the activation marker CD62P (p-Selectin) (p = .049) and the collagen receptor GPVI (p = .044) in non-stimulated ET platelets. In contrast, ET platelets had a lower expression of the integrin subunits of the fibrinogen receptor GPIIb/IIIa CD41 (p = .036) and CD61 (p = .044) and of the von Willebrand factor receptor CD42b (p = .044). Using the FlowSOM algorithm, we identified 2 subclusters of ET platelets with a prothrombotic expression profile, one of them (cluster 3) significantly overrepresented in ET (22.13% of the total platelets in ET, 2.94% in controls, p = .035). Platelet counts were significantly increased in ET compared to controls (p = .0123). In ET, mean platelet volume (MPV) inversely correlated with platelet count (r = -0.96). These data highlight the prothrombotic phenotype of ET and postulate GPVI as a potential target to prevent thrombosis in these patients.


Essential thrombocythemia (ET) is a rare disease characterized by an increased number of platelets in the blood. As a complication, many of these patients develop a blood clot, which can be life-threatening. So far, the reason behind the higher risk of blood clots is unclear. In this study, we analyzed platelet surface markers that play a critical role in platelet function and platelet activation using a modern technology called mass cytometry. For this purpose, blood samples from 6 patients with ET and 6 healthy control individuals were analyzed. We found significant differences between ET platelets and healthy platelets. ET platelets had higher expression levels of p-Selectin (CD62P), a key marker of platelet activation, and of the collagen receptor GPVI, which is important for clot formation. These results may be driven by a specific platelet subcluster overrepresented in ET. Other surface markers, such as the fibrinogen receptor GPIIb/IIIa CD41, CD61, and the von Willebrand factor receptor CD42b, were expressed at lower levels in ET platelets. When ET platelets were treated with the clotting factor thrombin (thrombin receptor-activating peptide, TRAP), we found a differential response in platelet activation compared to healthy platelets. In conclusion, our results show an increased activation and clotting potential of ET platelets. The platelet surface protein GPVI may be a potential drug target to prevent abnormal blood clotting in ET patients.


Subjects
Blood Platelets , Thrombocythemia, Essential , Thrombosis , Humans , Thrombocythemia, Essential/metabolism , Thrombocythemia, Essential/complications , Blood Platelets/metabolism , Male , Female , Thrombosis/metabolism , Thrombosis/etiology , Middle Aged , Aged , Flow Cytometry/methods , Platelet Activation , Case-Control Studies , Adult
6.
Bioinform Adv ; 4(1): vbae032, 2024.
Article in English | MEDLINE | ID: mdl-38464974

ABSTRACT

Summary: Transcriptome deconvolution has emerged as a reliable technique to estimate cell-type abundances from bulk RNA sequencing data. Unlike their human equivalents, methods to quantify the cellular composition of complex tissues from murine transcriptomics are sparse and sometimes not easy to use. We extended the immunedeconv R package to facilitate the deconvolution of mouse transcriptomics, enabling the quantification of murine immune-cell types using 13 different methods. Through immunedeconv, we further offer the possibility of tweaking cell signatures used by deconvolution methods, providing custom annotations tailored for specific cell types and tissues. These developments strongly facilitate the study of the immune-cell composition of mouse models and further open new avenues in the investigation of the cellular composition of other tissues and organisms. Availability and implementation: The R package and the documentation are available at https://github.com/omnideconv/immunedeconv.

7.
Sci Rep ; 14(1): 13525, 2024 Jun 12.
Article in English | MEDLINE | ID: mdl-38866945

ABSTRACT

The traditional nomenclature of enteroendocrine cells (EECs), established in 1977, applied the "one cell - one hormone" dogma, which distinguishes subpopulations based on the secretion of a specific hormone. These hormone-specific subpopulations included S cells for secretin (SCT), K cells for glucose-dependent insulinotropic polypeptide (GIP), N cells producing neurotensin (NTS), I cells producing cholecystokinin (CCK), D cells producing somatostatin (SST), and others. In the past 15 years, however, reinvestigations of murine and human organoid-derived EECs strongly questioned this dogma and established that certain EECs coexpress multiple hormones. Using the Gut Cell Atlas, the largest available single-cell transcriptome dataset of human intestinal cells, this study consolidates that the original dogma is outdated not only for murine and human organoid-derived EECs, but also for primary human EECs, showing that the expression of certain hormones is not restricted to their designated cell type. Moreover, specific analyses of SCT-expressing cells reject the presence of any cell population that exhibits significantly elevated secretin expression compared to other cell populations, previously referred to as S cells. Instead, this investigation indicates that secretin production is realized jointly by other enteroendocrine subpopulations, validating corresponding observations in murine EECs also for human EECs. Furthermore, our findings corroborate that SCT expression peaks in mature EECs. In contrast, progenitor EECs exhibit markedly lower expression levels, supporting the hypothesis that SCT expression is a hallmark of EEC maturation.


Subjects
Enteroendocrine Cells , Gene Expression Profiling , Secretin , Single-Cell Analysis , Humans , Enteroendocrine Cells/metabolism , Secretin/metabolism , Secretin/genetics , Single-Cell Analysis/methods , Mice , Animals , Transcriptome , Cell Differentiation , Organoids/metabolism , Organoids/cytology , Cholecystokinin/metabolism , Cholecystokinin/genetics , Somatostatin/metabolism , Somatostatin/genetics , Single-Cell Gene Expression Analysis
8.
Microb Genom ; 10(2)2024 Feb.
Article in English | MEDLINE | ID: mdl-38421266

ABSTRACT

Molecular profiling techniques such as metagenomics, metatranscriptomics or metabolomics offer important insights into the functional diversity of the microbiome. In contrast, 16S rRNA gene sequencing, a widespread and cost-effective technique to measure microbial diversity, only allows for indirect estimation of microbial function. To mitigate this, tools such as PICRUSt2, Tax4Fun2, PanFP and MetGEM infer functional profiles from 16S rRNA gene sequencing data using different algorithms. Prior studies have cast doubts on the quality of these predictions, motivating us to systematically evaluate these tools using matched 16S rRNA gene sequencing, metagenomic datasets, and simulated data. Our contribution is threefold: (i) using simulated data, we investigate if technical biases could explain the discordance between inferred and expected results; (ii) considering human cohorts for type 2 diabetes, colorectal cancer and obesity, we test if health-related differential abundance measures of functional categories are concordant between 16S rRNA gene-inferred and metagenome-derived profiles; and (iii) since 16S rRNA gene copy number is an important confounder in functional profile inference, we investigate if a customised copy number normalisation with the rrnDB database could improve the results. Our results show that 16S rRNA gene-based functional inference tools generally do not have the necessary sensitivity to delineate health-related functional changes in the microbiome and should thus be used with care. Furthermore, we outline important differences in the individual tools tested and offer recommendations for tool selection.
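The idea behind copy number normalisation can be sketched as follows: each taxon's 16S read count is divided by its 16S rRNA gene copy number, and the corrected values are rescaled to proportions. This is a minimal sketch, not the customised procedure evaluated in the article; the function name `normalise_by_copy_number`, the default copy number of 1 for unlisted taxa, and all values are assumptions made for illustration. In practice, the copy numbers would come from a resource such as rrnDB.

```python
def normalise_by_copy_number(counts, copy_numbers, default_copies=1.0):
    """Divide 16S read counts by per-taxon gene copy number, then rescale to proportions."""
    # Taxa without a known copy number fall back to default_copies (an assumption).
    corrected = {
        taxon: count / copy_numbers.get(taxon, default_copies)
        for taxon, count in counts.items()
    }
    total = sum(corrected.values())
    if total == 0:
        return {taxon: 0.0 for taxon in corrected}
    return {taxon: value / total for taxon, value in corrected.items()}


# Hypothetical values: counts and copy numbers are made up for illustration.
counts = {"taxon_a": 70, "taxon_b": 30}
copies = {"taxon_a": 7, "taxon_b": 1}
abundance = normalise_by_copy_number(counts, copies)
```

In this toy case, taxon_a dominates the raw reads only because it carries seven gene copies; after correction, taxon_b accounts for the larger share of cells.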


Subjects
Metagenome , Microbiota , Humans , RNA, Ribosomal, 16S/genetics , Genes, rRNA , Microbiota/genetics , Algorithms
9.
Bioinform Adv ; 4(1): vbae034, 2024.
Article in English | MEDLINE | ID: mdl-38505804

ABSTRACT

Summary: Diseases can be caused by molecular perturbations that induce specific changes in regulatory interactions and their coordinated expression, also referred to as network rewiring. However, the detection of complex changes in regulatory connections remains a challenging task and would benefit from the development of novel nonparametric approaches. We develop a new ensemble method called BoostDiff (boosted differential regression trees) to infer a differential network discriminating between two conditions. BoostDiff builds an adaptively boosted (AdaBoost) ensemble of differential trees with respect to a target condition. To build the differential trees, we propose differential variance improvement as a novel splitting criterion. Variable importance measures derived from the resulting models are used to reflect changes in gene expression predictability and to build the output differential networks. BoostDiff outperforms existing differential network methods on simulated data evaluated in four different complexity settings. We then demonstrate the power of our approach when applied to real transcriptomics data in COVID-19, Crohn's disease, breast cancer, prostate adenocarcinoma, and stress response in Bacillus subtilis. BoostDiff identifies context-specific networks that are enriched with genes of known disease-relevant pathways and complements standard differential expression analyses. Availability and implementation: BoostDiff is available at https://github.com/scibiome/boostdiff_inference.

10.
bioRxiv ; 2024 Jan 15.
Article in English | MEDLINE | ID: mdl-38313260

ABSTRACT

RNA sequencing offers unique insights into transcriptome diversity, and a plethora of tools have been developed to analyze alternative splicing. One important task is to detect changes in the relative transcript abundance in differential transcript usage (DTU) analysis. The choice of the right analysis tool is non-trivial and depends on experimental factors such as the availability of single- or paired-end and bulk or single-cell data. To help users select the most promising tool for their task, we performed a comprehensive benchmark of DTU detection tools. We cover a wide array of experimental settings, using simulated bulk and single-cell RNA-seq data as well as real transcriptomics datasets, including time-series data. Our results suggest that DEXSeq, edgeR, and LimmaDS are better choices for paired-end data, while DSGseq and DEXSeq can be used for single-end data. In single-cell simulation settings, we showed that satuRn performs better than DTUrtle. In addition, we showed that Spycone is optimal for time series DTU/IS analysis based on the evidence provided using GO terms enrichment analysis.
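The quantity that DTU analysis compares, relative transcript abundance within a gene, can be sketched in a few lines. This is only an illustration of the concept and not any of the benchmarked tools, which add statistical testing on top; the function name `transcript_usage` and the counts are hypothetical.

```python
def transcript_usage(counts_by_gene):
    """Convert per-transcript counts into within-gene usage proportions."""
    usage = {}
    for gene, transcripts in counts_by_gene.items():
        total = sum(transcripts.values())
        # Genes with no reads get a usage of 0.0 for every transcript.
        usage[gene] = {
            tx: (count / total if total > 0 else 0.0)
            for tx, count in transcripts.items()
        }
    return usage


# Hypothetical counts for one gene with two transcripts in two conditions.
condition_a = transcript_usage({"gene1": {"tx1": 80, "tx2": 20}})
condition_b = transcript_usage({"gene1": {"tx1": 40, "tx2": 60}})

# Change in usage per transcript between the two conditions.
shift = {
    tx: condition_b["gene1"][tx] - condition_a["gene1"][tx]
    for tx in condition_a["gene1"]
}
```

Here the total expression of gene1 is identical in both conditions, yet the dominant transcript switches, which is the kind of change a gene-level differential expression analysis would miss.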

11.
Sci Rep ; 14(1): 2808, 2024 Feb 02.
Article in English | MEDLINE | ID: mdl-38307916

ABSTRACT

Bulk RNA sequencing (RNA-seq) of blood is typically used for gene expression analysis in biomedical research but is still rarely used in clinical practice. In this study, we propose that RNA-seq should be considered a diagnostic tool, as it offers not only insights into aberrant gene expression and splicing but also delivers additional readouts on immune cell type composition as well as B-cell and T-cell receptor (BCR/TCR) repertoires. We demonstrate that RNA-seq offers insights into a patient's immune status via integrative analysis of RNA-seq data from patients infected with various SARS-CoV-2 variants (in total 196 samples with up to 200 million reads sequencing depth). We compare the results of computational cell-type deconvolution methods (e.g., MCP-counter, xCell, EPIC, quanTIseq) to complete blood count data, the current gold standard in clinical practice. We observe varying levels of lymphocyte depletion and significant differences in neutrophil levels between SARS-CoV-2 variants. Additionally, we identify B and T cell receptor (BCR/TCR) sequences using the tools MiXCR and TRUST4 to show that, combined with sequence alignments and BLASTp, they could be used to classify a patient's disease. Finally, we investigated the sequencing depth required for such analyses and concluded that 10 million reads per sample is sufficient. In conclusion, our study reveals that computational cell-type deconvolution and BCR/TCR methods using bulk RNA-seq analyses can supplement missing CBC data and offer insights into immune responses, disease severity, and pathogen-specific immunity, all achievable with a sequencing depth of 10 million reads per sample.


Subjects
COVID-19 , SARS-CoV-2 , Humans , SARS-CoV-2/genetics , COVID-19/genetics , Gene Expression Profiling , Receptors, Antigen, T-Cell/genetics , Sequence Analysis, RNA/methods , Immunity
12.
Cell Host Microbe ; 32(4): 573-587.e5, 2024 Apr 10.
Article in English | MEDLINE | ID: mdl-38569545

ABSTRACT

Microbiota assembly in the infant gut is influenced by diet. Breastfeeding and human breastmilk oligosaccharides promote the colonization of beneficial bifidobacteria. Infant formulas are supplemented with bifidobacteria or complex oligosaccharides, notably galacto-oligosaccharides (GOS), to mimic breast milk. To compare microbiota development across feeding modes, this randomized controlled intervention study (German Clinical Trial DRKS00012313) longitudinally sampled infant stool during the first year of life, revealing similar fecal bacterial communities between formula- and breast-fed infants (N = 210) but differences across age. Infant formula containing GOS sustained high levels of bifidobacteria compared with formula containing B. longum and B. breve or placebo. Metabolite and bacterial profiling revealed 24-h oscillations and circadian networks. Rhythmicity in bacterial diversity, specific taxa, and functional pathways increased with age and was strongest following breastfeeding and GOS supplementation. Circadian rhythms in dominant taxa were further maintained ex vivo in a chemostat model. Hence, microbiota rhythmicity develops early in life and is impacted by diet.


Subjects
Infant Formula , Microbiota , Female , Humans , Infant , Bifidobacterium , Breast Feeding , Circadian Rhythm , Feces/microbiology , Infant Formula/microbiology , Milk, Human , Oligosaccharides/metabolism
13.
Cell Host Microbe ; 32(8): 1347-1364.e10, 2024 Aug 14.
Article in English | MEDLINE | ID: mdl-39013472

ABSTRACT

Mitochondrial dysfunction is associated with inflammatory bowel diseases (IBDs). To understand how microbial-metabolic circuits contribute to intestinal injury, we disrupt mitochondrial function in the epithelium by deleting the mitochondrial chaperone, heat shock protein 60 (Hsp60Δ/ΔIEC). This metabolic perturbation causes self-resolving tissue injury. Regeneration is disrupted in the absence of the aryl hydrocarbon receptor (Hsp60Δ/ΔIEC;AhR-/-) involved in intestinal homeostasis or inflammatory regulator interleukin (IL)-10 (Hsp60Δ/ΔIEC;Il10-/-), causing IBD-like pathology. Injury is absent in the distal colon of germ-free (GF) Hsp60Δ/ΔIEC mice, highlighting bacterial control of metabolic injury. Colonizing GF Hsp60Δ/ΔIEC mice with the synthetic community OMM12 reveals expansion of metabolically flexible Bacteroides, and B. caecimuris mono-colonization recapitulates the injury. Transcriptional profiling of the metabolically impaired epithelium reveals gene signatures involved in oxidative stress (Ido1, Nos2, Duox2). These signatures are observed in samples from Crohn's disease patients, distinguishing active from inactive inflammation. Thus, mitochondrial perturbation of the epithelium causes microbiota-dependent injury with discriminative inflammatory gene profiles relevant for IBD.


Subjects
Chaperonin 60 , Gastrointestinal Microbiome , Mitochondria , Animals , Mice , Mitochondria/metabolism , Humans , Chaperonin 60/genetics , Chaperonin 60/metabolism , Inflammatory Bowel Diseases/microbiology , Intestinal Mucosa/microbiology , Intestinal Mucosa/metabolism , Interleukin-10/genetics , Interleukin-10/metabolism , Oxidative Stress , Bacteroides/genetics , Mice, Inbred C57BL , Mice, Knockout , Receptors, Aryl Hydrocarbon/metabolism , Receptors, Aryl Hydrocarbon/genetics , Gene Expression Profiling , Intestines/microbiology , Intestines/pathology , Disease Models, Animal , Crohn Disease/microbiology
14.
Nat Comput Sci ; 1(3): 183-191, 2021 Mar.
Article in English | MEDLINE | ID: mdl-38183187

ABSTRACT

Epigenetics studies inheritable and reversible modifications of DNA that allow cells to control gene expression throughout their development and in response to environmental conditions. In computational epigenomics, machine learning is applied to study various epigenetic mechanisms genome-wide. Its aim is to expand our understanding of cell differentiation, that is, the specialization of cells, in health and disease. Thus far, most efforts focus on understanding the functional encoding of the genome and on unraveling cell-type heterogeneity. Here, we provide an overview of state-of-the-art computational methods and their underlying statistical concepts, which range from matrix factorization and regularized linear regression to deep learning methods. We further show how the rise of single-cell technology leads to new computational challenges and creates opportunities to further our understanding of epigenetic regulation.
