Results 1-20 of 52
1.
Nat Methods ; 2024 Jun 07.
Article in English | MEDLINE | ID: mdl-38849569

ABSTRACT

The Long-read RNA-Seq Genome Annotation Assessment Project Consortium was formed to evaluate the effectiveness of long-read approaches for transcriptome analysis. Using different protocols and sequencing platforms, the consortium generated over 427 million long-read sequences from complementary DNA and direct RNA datasets, encompassing human, mouse and manatee species. Developers utilized these data to address challenges in transcript isoform detection, quantification and de novo transcript detection. The study revealed that libraries with longer, more accurate sequences produce more accurate transcripts than those with increased read depth, whereas greater read depth improved quantification accuracy. In well-annotated genomes, tools based on reference sequences demonstrated the best performance. Incorporating additional orthogonal data and replicate samples is advised when aiming to detect rare and novel transcripts or using reference-free approaches. This collaborative study offers a benchmark for current practices and provides direction for future method development in transcriptome analysis.

2.
Nat Commun ; 15(1): 1227, 2024 Feb 28.
Article in English | MEDLINE | ID: mdl-38418480

ABSTRACT

Exploring the molecular basis of disease severity in rare diseases is a challenging task, given the limited availability of data. Causative genes have been described for Congenital Myasthenic Syndromes (CMS), a diverse group of rare neuromuscular junction (NMJ) disorders, yet a molecular explanation for the differences in phenotypic severity remains unclear. Here, we present a workflow to explore the functional relationships between CMS causal genes and the altered genes of each patient, based on multilayer network community detection analysis of complementary biomedical information provided by relevant data sources, namely protein-protein interactions, pathways and metabolomics. Our results show that CMS severity can be ascribed to the personalized impairment of extracellular matrix components and postsynaptic modulators of acetylcholine receptor (AChR) clustering. This work showcases how coupling multilayer network analysis with personalized -omics information provides molecular explanations for the varying severity of rare diseases, paving the way for resolving similar cases in other rare diseases.


Subject(s)
Myasthenic Syndromes, Congenital , Humans , Myasthenic Syndromes, Congenital/genetics , Myasthenic Syndromes, Congenital/diagnosis , Neuromuscular Junction/metabolism , Rare Diseases/metabolism , Workflow , Receptors, Cholinergic/genetics , Receptors, Cholinergic/metabolism , Mutation
3.
Nucleic Acids Res ; 52(D1): D255-D264, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37971353

ABSTRACT

RegulonDB is a database that contains the most comprehensive corpus of knowledge of the regulation of transcription initiation of Escherichia coli K-12, including data from both classical molecular biology and high-throughput methodologies. Here, we describe biological advances since our last NAR paper of 2019. We explain the changes made to satisfy the FAIR requirements. We also present a full reconstruction of the RegulonDB computational infrastructure, which has significantly improved data storage, retrieval and accessibility and thus supports a more intuitive and user-friendly experience. The integration of graphical tools provides clear visual representations of genetic regulation data, facilitating data interpretation and knowledge integration. RegulonDB version 12.0 can be accessed at https://regulondb.ccg.unam.mx.


Subject(s)
Databases, Genetic , Escherichia coli K12 , Gene Expression Regulation, Bacterial , Computational Biology/methods , Escherichia coli K12/genetics , Internet , Transcription, Genetic
4.
J Proteome Res ; 23(1): 418-429, 2024 01 05.
Article in English | MEDLINE | ID: mdl-38038272

ABSTRACT

The inherent diversity of approaches in proteomics research has led to a wide range of software solutions for data analysis. These software solutions encompass multiple tools, each employing different algorithms for various tasks such as peptide-spectrum matching, protein inference, quantification, statistical analysis, and visualization. To enable an unbiased comparison of commonly used bottom-up label-free proteomics workflows, we introduce WOMBAT-P, a versatile platform designed for automated benchmarking and comparison. WOMBAT-P simplifies the processing of public data by utilizing the sample and data relationship format for proteomics (SDRF-Proteomics) as input. This feature streamlines the analysis of annotated local or public ProteomeXchange data sets, promoting efficient comparisons among diverse outputs. Through an evaluation using experimental ground truth data and a realistic biological data set, we uncover significant disparities and a limited overlap in the quantified proteins. WOMBAT-P not only enables rapid execution and seamless comparison of workflows but also provides valuable insights into the capabilities of different software solutions. These benchmarking metrics are a valuable resource for researchers in selecting the most suitable workflow for their specific data sets. The modular architecture of WOMBAT-P promotes extensibility and customization. The software is available at https://github.com/wombat-p/WOMBAT-Pipelines.


Subject(s)
Benchmarking , Proteomics , Workflow , Software , Proteins , Data Analysis
5.
Adv Healthc Mater ; 12(25): e2300150, 2023 10.
Article in English | MEDLINE | ID: mdl-37563883

ABSTRACT

Biomaterials research output has increased exponentially over the last three decades. Most research is published as scientific articles and is therefore available only as unstructured text, which makes it a challenging input for computational processing. Computational tools are becoming essential to overcome this information overload. Among them, text mining systems are an attractive option for automatically extracting information from text documents into structured datasets. This work presents the first automated system for extracting biomaterial-related information from research abstracts in MEDLINE, the National Library of Medicine's premier bibliographic database, into a searchable database. The system is a text mining pipeline that periodically retrieves abstracts from PubMed and identifies research and clinical studies of biomaterials. The pipeline then identifies sixteen concept types of interest in each abstract using the Biomaterials Annotator, a tool for biomaterials Named Entity Recognition (NER). These concepts of interest, along with the abstract and relevant metadata, are then deposited in DEBBIE, the Database of Experimental Biomaterials and their Biological Effect. DEBBIE is accessible through a web application that provides keyword searches and displays results in an intuitive and meaningful manner, aiming to facilitate the efficient mapping and organization of biomaterials information.


Subject(s)
Access to Information , Data Mining , United States , Data Mining/methods , PubMed , Databases, Factual , Software
7.
Sci Data ; 10(1): 292, 2023 05 19.
Article in English | MEDLINE | ID: mdl-37208467

ABSTRACT

The notion that data should be Findable, Accessible, Interoperable and Reusable, according to the FAIR Principles, has become a global norm for good data stewardship and a prerequisite for reproducibility. Nowadays, FAIR guides data policy actions and professional practices in the public and private sectors. Despite such global endorsements, however, the FAIR Principles are aspirational, remaining elusive at best and intimidating at worst. To address the lack of practical guidance and help with capability gaps, we developed the FAIR Cookbook, an open, online resource of hands-on recipes for "FAIR doers" in the Life Sciences. Created by researchers and data management professionals in academia, (bio)pharmaceutical companies and information service industries, the FAIR Cookbook covers the key steps in a FAIRification journey, the levels and indicators of FAIRness, the maturity model, the available technologies, tools and standards, the skills required, and the challenges of achieving and improving data FAIRness. Part of the ELIXIR ecosystem and recommended by funders, the FAIR Cookbook is open to contributions of new recipes.

8.
Cell Genom ; 3(1): 100244, 2023 Jan 11.
Article in English | MEDLINE | ID: mdl-36777183

ABSTRACT

Understanding the consequences of individual transcriptome variation is fundamental to deciphering human biology and disease. We implement a statistical framework to quantify the contributions of 21 individual traits as drivers of gene expression and alternative splicing variation across 46 human tissues and 781 individuals from the Genotype-Tissue Expression project. We demonstrate that ancestry, sex, age, and BMI make additive and tissue-specific contributions to expression variability, whereas interactions are rare. Variation in splicing is dominated by ancestry and is under genetic control in most tissues, with ribosomal proteins showing a strong enrichment of tissue-shared splicing events. Our analyses reveal a systemic contribution of types 1 and 2 diabetes to tissue transcriptome variation with the strongest signal in the nerve, where histopathology image analysis identifies novel genes related to diabetic neuropathy. Our multi-tissue and multi-trait approach provides an extensive characterization of the main drivers of human transcriptome variation in health and disease.

10.
Nat Med ; 28(8): 1662-1671, 2022 08.
Article in English | MEDLINE | ID: mdl-35953718

ABSTRACT

Richter transformation (RT) is a paradigmatic evolution of chronic lymphocytic leukemia (CLL) into a very aggressive large B cell lymphoma conferring a dismal prognosis. The mechanisms driving RT remain largely unknown. We characterized the whole genome, epigenome and transcriptome, combined with single-cell DNA/RNA-sequencing analyses and functional experiments, of 19 cases of CLL developing RT. Studying 54 longitudinal samples covering up to 19 years of disease course, we uncovered minute subclones carrying genomic, immunogenetic and transcriptomic features of RT cells already at CLL diagnosis, which were dormant for up to 19 years before transformation. We also identified new driver alterations, discovered a new mutational signature (SBS-RT), recognized an oxidative phosphorylation (OXPHOS)-high, B cell receptor (BCR)-signaling-low transcriptional axis in RT and showed that OXPHOS inhibition reduces the proliferation of RT cells. These findings demonstrate the early seeding of subclones driving advanced stages of cancer evolution and uncover potential therapeutic targets for RT.


Subject(s)
Leukemia, Lymphocytic, Chronic, B-Cell , Lymphoma, Large B-Cell, Diffuse , Cell Transformation, Neoplastic/genetics , Disease Progression , Humans , Leukemia, Lymphocytic, Chronic, B-Cell/genetics , Lymphoma, Large B-Cell, Diffuse/genetics , Lymphoma, Large B-Cell, Diffuse/pathology
11.
Nucleic Acids Res ; 50(W1): W623-W632, 2022 07 05.
Article in English | MEDLINE | ID: mdl-35552456

ABSTRACT

The Orthology Benchmark Service (https://orthology.benchmarkservice.org) is the gold standard for orthology inference evaluation, supported and maintained by the Quest for Orthologs consortium. It is an essential resource for comparing existing and new methods of orthology inference (the bedrock of many comparative genomics and phylogenetic analyses) over a standard dataset and through common procedures. The Quest for Orthologs Consortium is dedicated to keeping the resource up to date, through regular updates of the Reference Proteomes and increasingly accessible data through the OpenEBench platform. For this update, we have added a new benchmark based on curated orthology assertions from the Vertebrate Gene Nomenclature Committee, and provided an example meta-analysis of the public predictions present on the platform.


Subject(s)
Benchmarking , Genomics , Phylogeny , Genomics/methods , Proteome
12.
Nucleic Acids Res ; 50(D1): D1062-D1068, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34718760

ABSTRACT

PhylomeDB is a unique knowledge base providing public access to minable and browsable catalogues of pre-computed genome-wide collections of annotated sequences, alignments and phylogenies (i.e. phylomes) of homologous genes, as well as to their corresponding phylogeny-based orthology and paralogy relationships. In addition, PhylomeDB trees and alignments can be downloaded for further processing to detect and date gene duplication events, infer past events of inter-species hybridization and horizontal gene transfer, as well as to uncover footprints of selection, introgression, gene conversion, or other relevant evolutionary processes in the genes and organisms of interest. Here, we describe the latest evolution of PhylomeDB (version 5). This new version includes a newly implemented web interface and several new functionalities such as optimized searching procedures, the possibility to create user-defined phylome collections, and a fully redesigned data structure. This release also represents a significant core data expansion, with the database providing access to 534 phylomes, comprising over 8 million trees, and homology relationships for genes in over 6000 species. This makes PhylomeDB the largest and most comprehensive public repository of gene phylogenies. PhylomeDB is available at http://www.phylomedb.org.


Subject(s)
Databases, Genetic , Evolution, Molecular , Genome/genetics , Software , Animals , Humans , Knowledge Bases , Molecular Sequence Annotation , Phylogeny , Plants/genetics , Proteome/genetics
13.
Sci Data ; 8(1): 310, 2021 11 30.
Article in English | MEDLINE | ID: mdl-34848723

ABSTRACT

COVID-19 is an infectious disease caused by the SARS-CoV-2 virus, which has spread all over the world, leading to a global pandemic. The rapid spread of COVID-19 has been mainly attributed to the high contagion rate of the virus and the worldwide mobility of humans. In the absence of pharmacological therapies, governments in different countries have introduced several non-pharmaceutical interventions to reduce human mobility and social contact. Several studies based on anonymized mobile phone data have analysed the relationship between human mobility and the spread of the coronavirus. However, to our knowledge, none of these datasets integrates cross-referenced, geo-localised data on human mobility and COVID-19 cases into one all-inclusive open resource. Herein we present COVID-19 Flow-Maps, a cross-referenced Geographic Information System that integrates regularly updated time series of population mobility and daily reports of COVID-19 cases in Spain at different temporal and spatial resolutions. This integrated and up-to-date dataset can be used to analyse human dynamics in order to guide and support the design of more effective non-pharmaceutical interventions.


Subject(s)
COVID-19/epidemiology , Geographic Information Systems , Travel , COVID-19/transmission , Cell Phone , Humans , Pandemics , Spain/epidemiology
14.
F1000Res ; 10: 897, 2021.
Article in English | MEDLINE | ID: mdl-34804501

ABSTRACT

Scientific data analyses often combine several computational tools in automated pipelines, or workflows. Thousands of such workflows have been used in the life sciences, though their composition has remained a cumbersome manual process due to a lack of standards for annotation, assembly, and implementation. Recent technological advances have brought the long-standing vision of automated workflow composition back into focus. This article summarizes a recent Lorentz Center workshop dedicated to automated composition of workflows in the life sciences. We survey previous initiatives to automate the composition process, and discuss the current state of the art and future perspectives. We start by drawing the "big picture" of the scientific workflow development life cycle, before surveying and discussing current methods, technologies and practices for semantic domain modelling, automation in workflow development, and workflow assessment. Finally, we derive a roadmap of individual and community-based actions to work toward the vision of automated workflow development in the forthcoming years. A central outcome of the workshop is a general description of the workflow life cycle in six stages: 1) scientific question or hypothesis, 2) conceptual workflow, 3) abstract workflow, 4) concrete workflow, 5) production workflow, and 6) scientific results. The transitions between stages are facilitated by diverse tools and methods, usually incorporating domain knowledge in some form. Formal semantic domain modelling is hard and often a bottleneck for the application of semantic technologies. However, life science communities have made considerable progress here in recent years and are continuously improving, renewing interest in the application of semantic technologies for workflow exploration, composition and instantiation.
Combined with systematic benchmarking with reference data and large-scale deployment of production-stage workflows, such technologies enable a more systematic process of workflow development than we know today. We believe that this can lead to more robust, reusable, and sustainable workflows in the future.


Subject(s)
Biological Science Disciplines , Computational Biology , Benchmarking , Software , Workflow
15.
F1000Res ; 10, 2021.
Article in English | MEDLINE | ID: mdl-34249331

ABSTRACT

Background: Many types of data from genomic analyses can be represented as genomic tracks, i.e. features linked to the genomic coordinates of a reference genome. Examples of such data are epigenetic DNA methylation data, ChIP-seq peaks, germline or somatic DNA variants, as well as RNA-seq expression levels. Researchers often face difficulties in locating, accessing and combining relevant tracks from external sources, as well as locating the raw data, reducing the value of the generated information. Description of work: We propose to advance the application of FAIR data principles (Findable, Accessible, Interoperable, and Reusable) to produce searchable metadata for genomic tracks. Findability and Accessibility of metadata can then be ensured by a track search service that integrates globally identifiable metadata from various track hubs in the Track Hub Registry and other relevant repositories. Interoperability and Reusability need to be ensured by the specification and implementation of a basic set of recommendations for metadata. We have tested this concept by developing such a specification in a JSON Schema, called FAIRtracks, and have integrated it into a novel track search service, called TrackFind. We demonstrate practical usage by importing datasets through TrackFind into existing examples of relevant analytical tools for genomic tracks: EPICO and the GSuite HyperBrowser. Conclusion: We here provide a first iteration of a draft standard for genomic track metadata, as well as the accompanying software ecosystem. It can easily be adapted or extended to future needs of the research community regarding data, methods and tools, balancing the requirements of both data submitters and analytical end-users.


Subject(s)
Ecosystem , Metadata , Genome , Genomics , Software
16.
Pharmaceuticals (Basel) ; 14(3), 2021 Mar 08.
Article in English | MEDLINE | ID: mdl-33800393

ABSTRACT

eTRANSAFE is a research project funded within the Innovative Medicines Initiative (IMI), which aims at developing integrated databases and computational tools (the eTRANSAFE ToxHub) that support the translational safety assessment of new drugs by using legacy data provided by the pharmaceutical companies that participate in the project. The project objectives include the development of databases containing preclinical and clinical data, computational systems for translational analysis including tools for data query, analysis and visualization, as well as computational models to explain and predict drug safety events.

17.
Sci Data ; 8(1): 10, 2021 01 15.
Article in English | MEDLINE | ID: mdl-33452270

ABSTRACT

Rett syndrome (RTT) is a rare neurological disorder mostly caused by genetic variants in MECP2. Making new MECP2 variants and the related phenotypes available provides data for a better understanding of disease mechanisms and faster identification of variants for diagnosis. This is, however, currently hampered by the lack of interoperability between genotype-phenotype databases. Here, we demonstrate on the example of MECP2 in RTT that by making genotype-phenotype data more Findable, Accessible, Interoperable, and Reusable (FAIR), we can facilitate the prioritization and analysis of variants. In total, 10,968 MECP2 variants were successfully integrated. Among these variants, 863 unique confirmed RTT-causing and 209 unique confirmed benign variants were found. This dataset was used to compare pathogenicity prediction tools and protein consequences, and to identify ambiguous variants. Prediction tools generally recognised the RTT-causing and benign variants; however, there was a broad range of overlap. Nineteen variants were identified that were annotated as both disease-causing and benign, suggesting that additional factors contribute to disease development in these cases.


Subject(s)
Methyl-CpG-Binding Protein 2/genetics , Mutation , Rett Syndrome/etiology , DNA Mutational Analysis , Data Analysis , Humans , Rett Syndrome/genetics
18.
F1000Res ; 10: 80, 2021.
Article in English | MEDLINE | ID: mdl-35847383

ABSTRACT

Next Generation Sequencing technologies significantly impact the field of Antimicrobial Resistance (AMR) detection and monitoring, with immediate uses in diagnosis and risk assessment. For this application and in general, considerable challenges remain in demonstrating sufficient trust to act upon the meaningful information produced from raw data, partly because of the reliance on bioinformatics pipelines, which can produce different results and therefore lead to different interpretations. With the constant evolution of the field, it is difficult to identify, harmonise and recommend specific methods for large-scale implementations over time. In this article, we propose to address this challenge through establishing a transparent, performance-based, evaluation approach to provide flexibility in the bioinformatics tools of choice, while demonstrating proficiency in meeting common performance standards. The approach is two-fold: first, a community-driven effort to establish and maintain "live" (dynamic) benchmarking platforms to provide relevant performance metrics, based on different use-cases, that would evolve together with the AMR field; second, agreed and defined datasets to allow the pipelines' implementation, validation, and quality-control over time. Following previous discussions on the main challenges linked to this approach, we provide concrete recommendations and future steps, related to different aspects of the design of benchmarks, such as the selection and the characteristics of the datasets (quality, choice of pathogens and resistances, etc.), the evaluation criteria of the pipelines, and the way these resources should be deployed in the community.


Subject(s)
Benchmarking , High-Throughput Nucleotide Sequencing , Anti-Bacterial Agents/pharmacology , Computational Biology/methods , Drug Resistance, Bacterial/genetics , High-Throughput Nucleotide Sequencing/methods
19.
Clin Cancer Res ; 27(5): 1491-1504, 2021 03 01.
Article in English | MEDLINE | ID: mdl-33262138

ABSTRACT

PURPOSE: Recurrent and/or metastatic unresectable cutaneous squamous cell carcinomas (cSCCs) are treated with chemotherapy or radiotherapy, but have poor clinical responses. A limited response (up to 45% of cases) to EGFR-targeted therapies was observed in clinical trials with patients with advanced and metastatic cSCC. Here, we analyze the molecular traits underlying the response to EGFR inhibitors, and the mechanisms responsible for cSCC resistance to EGFR-targeted therapy. EXPERIMENTAL DESIGN: We generated primary cell cultures and patient cSCC-derived xenografts (cSCC-PDXs) that recapitulate the histopathologic and molecular features of patient tumors. Response to gefitinib treatment was tested and gefitinib-resistant (GefR) cSCC-PDXs were developed. RNA sequence analysis was performed in matched untreated and GefR cSCC-PDXs to determine the mechanisms driving gefitinib resistance. RESULTS: cSCCs conserving epithelial traits exhibited strong activation of EGFR signaling, which promoted tumor cell proliferation, in contrast to mesenchymal-like cSCCs. Gefitinib treatment strongly blocked epithelial-like cSCC-PDX growth in the absence of EGFR and RAS mutations, whereas tumors carrying the E545K PIK3CA-activating mutation were resistant to treatment. A subset of initially responding tumors acquired resistance after long-term treatment, which was induced by the bypass from EGFR to FGFR signaling to allow tumor cell proliferation and survival upon gefitinib treatment. Pharmacologic inhibition of FGFR signaling overcame resistance to EGFR inhibitor, even in PIK3CA-mutated tumors. CONCLUSIONS: EGFR-targeted therapy may be appropriate for treating many epithelial-like cSCCs without PIK3CA-activating mutations. Combined EGFR- and FGFR-targeted therapy may be used to treat cSCCs that show intrinsic or acquired resistance to EGFR inhibitors.


Subject(s)
Drug Resistance, Neoplasm , Gefitinib/pharmacology , Gene Expression Regulation, Neoplastic/drug effects , Neoplasms, Glandular and Epithelial/drug therapy , Receptor, Fibroblast Growth Factor, Type 1/antagonists & inhibitors , Skin Neoplasms/drug therapy , Animals , Apoptosis , Carcinoma, Squamous Cell/drug therapy , Carcinoma, Squamous Cell/metabolism , Carcinoma, Squamous Cell/pathology , Cell Proliferation , ErbB Receptors/antagonists & inhibitors , ErbB Receptors/genetics , Humans , Male , Mice , Mice, Inbred NOD , Mice, SCID , Mutation , Neoplasms, Glandular and Epithelial/metabolism , Neoplasms, Glandular and Epithelial/pathology , Protein Kinase Inhibitors/pharmacology , Skin Neoplasms/metabolism , Skin Neoplasms/pathology , Tumor Cells, Cultured , Xenograft Model Antitumor Assays