ABSTRACT
MOTIVATION: Tumor mutational burden (TMB) has been proposed as a predictive biomarker for immunotherapy response in cancer patients, as it is thought to enrich for tumors with a high neoantigen load. TMB assessed by whole-exome sequencing is considered the gold standard but remains confined to research settings. In the clinical setting, targeted gene panels covering genomic regions of various sizes, together with diverse strategies to estimate TMB, have been proposed, but no standard has emerged yet. RESULTS: We provide the community with TMBleR, a tool to measure the clinical impact of various strategies of panel-based TMB measurement. AVAILABILITY AND IMPLEMENTATION: R package and docker container (GPL-3 Open Source license): https://acc-bioinfo.github.io/TMBleR/. Graphical-user interface website: https://bioserver.ieo.it/shiny/app/tmbler. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
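TMB is commonly expressed as the number of qualifying somatic mutations per megabase (Mb) of sequenced territory, so both panel size and variant-filtering rules enter the estimate. As an illustration only, the following Python sketch (not TMBleR, which is an R package) shows how two hypothetical filtering choices, counting synonymous variants and applying a variant allele frequency cutoff, change the value computed from the same calls. The `Variant` record, the consequence labels and the 5% default cutoff are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str
    consequence: str  # hypothetical labels, e.g. "missense", "synonymous"
    vaf: float        # variant allele frequency (0-1)


def tmb(variants, panel_size_bp, keep_synonymous=True, min_vaf=0.05):
    """Mutations per megabase after applying two illustrative filters."""
    if panel_size_bp <= 0:
        raise ValueError("panel size must be positive")
    kept = [
        v for v in variants
        if v.vaf >= min_vaf
        and (keep_synonymous or v.consequence != "synonymous")
    ]
    return len(kept) / (panel_size_bp / 1_000_000)


calls = [
    Variant("GENE_A", "missense", 0.20),
    Variant("GENE_B", "synonymous", 0.30),
    Variant("GENE_C", "missense", 0.02),  # below the default VAF cutoff
]
print(tmb(calls, 1_000_000))                         # 2.0
print(tmb(calls, 1_000_000, keep_synonymous=False))  # 1.0
print(tmb(calls, 500_000))                           # 4.0
```

Halving the panel size doubles TMB for the same mutation count, which shows why estimates from small panels are sensitive to a single additional call.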
Subjects
Neoplasms , Humans , Mutation , Neoplasms/pathology , Immunotherapy , Biomarkers, Tumor/genetics , Computational Biology
ABSTRACT
BACKGROUND: Deeper knowledge of non-small-cell lung cancer (NSCLC) biology and the discovery of driver molecular alterations have opened the era of precision medicine in lung oncology, significantly revolutionizing the diagnostic and therapeutic approach to NSCLC. In Italy, however, molecular assessment remains heterogeneous across the country, and the number of patients accessing personalized treatments remains relatively low. Nationwide programs have demonstrated that creating consortia is a successful strategy to increase the number of patients with a molecular classification. PATIENTS AND METHODS: The Alliance Against Cancer (ACC), a network of 25 Italian Research Institutes, has developed a targeted sequencing panel for the detection of genomic alterations in 182 genes in patients diagnosed with NSCLC (ACC lung panel). One thousand metastatic NSCLC patients will be enrolled in a prospective trial designed to measure the sensitivity and specificity of the ACC lung panel as a molecular screening tool compared with standard methods. RESULTS AND CONCLUSION: The ongoing trial is part of a nationwide ACC strategy to develop infrastructure and build the competences needed to make Italian research institutes independent in the genomic profiling of cancer patients.
Subjects
Carcinoma, Non-Small-Cell Lung/diagnosis , Lung Neoplasms/diagnosis , Carcinoma, Non-Small-Cell Lung/genetics , Early Detection of Cancer , Genomics , Humans , Italy , Lung Neoplasms/genetics , Mass Screening/methods , Precision Medicine/methods , Prospective Studies , Sensitivity and Specificity
ABSTRACT
BACKGROUND: Genome browsers are widely used for locating interesting genomic regions, but their interactive use is limited to inspecting short genomic portions. Ideally, a user would specify a pattern of regions on the browser and then extract the other genomic regions, across the whole genome, where such patterns occur, ranked by similarity. RESULTS: We developed SimSearch, an optimized pattern-search method and an open source plugin for the Integrated Genome Browser (IGB), to find genomic region sets that are similar to a given region pattern. It provides efficient, visual genome-wide analytics on large datasets; the plugin supports intuitive user interactions for selecting an interesting pattern on IGB tracks and for visualizing the computed occurrences of similar patterns along the entire genome. SimSearch also includes functions for the annotation and enrichment of results, and is enhanced with a Quickload repository including numerous epigenomic feature datasets from ENCODE and Roadmap Epigenomics. The paper also presents use cases showing multiple genome-wide analyses of biological interest that can be easily performed with the presented approach. CONCLUSIONS: The novel SimSearch method provides innovative support for effective genome-wide pattern search and visualization; its relevance and practical usefulness are demonstrated through a number of significant use cases of biological interest. The SimSearch IGB plugin, documentation, and code are freely available at https://deib-geco.github.io/simsearch-app/ and https://github.com/DEIB-GECO/simsearch-app/ .
Subjects
Algorithms , Epigenomics , Genome , Pattern Recognition, Automated , Web Browser , Humans , Molecular Sequence Annotation , Protein Binding , Regulatory Sequences, Nucleic Acid/genetics , Transcription Factors/metabolism
ABSTRACT
The 17th International NETTAB workshop was held in Palermo, Italy, on October 16-18, 2017. The special topic for the meeting was "Methods, tools and platforms for Personalised Medicine in the Big Data Era", but the traditional topics of the meeting series were also included in the event. About 40 scientific contributions were presented, including four keynote lectures, five guest lectures, and many oral communications and posters. In addition, three tutorials were organised before and after the workshop. Full papers from some of the best works presented in Palermo were submitted for this Supplement of BMC Bioinformatics. Here, we provide an overview of the aims and scope of the meeting. We also briefly introduce the selected papers accepted for publication in this Supplement, for a complete presentation of the outcomes of the meeting.
Subjects
Computational Biology/methods , Delivery of Health Care , Genomics , Humans , Italy , Neoplasms/genetics , Precision Medicine
ABSTRACT
Next-generation sequencing (NGS) technologies have deeply changed our understanding of cellular processes by delivering an astonishing amount of data at affordable prices; many biology laboratories have already accumulated a large number of sequenced samples. However, managing and analyzing these data poses new challenges, which may easily be underestimated by research groups lacking IT and quantitative skills. In this perspective, we identify five issues that should be carefully addressed by research groups approaching NGS technologies. In particular, the five key issues to be considered concern: (1) adopting a laboratory information management system (LIMS) and safeguarding the resulting raw data structure in downstream analyses; (2) monitoring the flow of the data and standardizing input and output directories and file names, even when multiple analysis protocols are used on the same data; (3) ensuring complete traceability of the analyses performed; (4) enabling inexperienced users to run analyses through a graphical user interface (GUI) acting as a front-end for the pipelines; (5) relying on standard metadata to annotate the datasets, using controlled vocabularies when possible, ideally derived from biomedical ontologies. Finally, we discuss the currently available tools in light of these issues, and we introduce HTS-flow, a new workflow management system conceived to address the concerns we raised. HTS-flow retrieves information from a LIMS database, manages data analyses through a simple GUI, outputs data in standard locations and allows complete traceability of datasets, accompanying metadata and analysis scripts.
ABSTRACT
BACKGROUND: Biologists generally interrogate genomics data using web-based genome browsers that have limited analytical potential. New generation genome browsers such as the Integrated Genome Browser (IGB) have largely overcome this limitation and permit customized analyses to be implemented using plugins. We illustrate the use of a plugin for IGB that exploits advanced visualization techniques to integrate the analysis of genomics data with network and structural approaches. RESULTS: We show how visualization technologies that combine both genomics and network biology can facilitate the selection of the key amino acid contacts from protein-protein and protein-drug interactions. Starting from the MDM2-P53 interaction, which is a high-value target for cancer therapy, and Nutlin, the parent small molecule of an MDM2 antagonist that is currently in clinical trials, we show that this method can be generalized to analyze how drugs and mutations can interfere with both protein-protein and drug-protein networks. We illustrate this point by two additional use-cases exploring the molecular basis of tamoxifen side effects and of drug resistance in chronic myeloid leukemia patients. CONCLUSIONS: Combined network and structure biology approaches provide key insights into both the genetic and the edgetic roles of variants in diseases. 3D interactomes facilitate the identification of disease-relevant interactions that can then be specifically targeted by drugs. Recent advances in molecular interaction and structure visualization tools have greatly simplified the mapping of mutated residues to molecular interaction interfaces. Such approaches can now also be integrated with genome visualization tools to enable comparative analyses of interaction contacts.
Subjects
Computer Graphics , Gene Regulatory Networks/drug effects , Genome, Human , Mutation/genetics , Pharmaceutical Preparations/metabolism , Protein Interaction Maps/drug effects , Proteins/metabolism , Databases, Factual , Genomics/methods , Humans
ABSTRACT
BACKGROUND: The increasing availability of resequencing data has led to a better understanding of the most important genes in cancer development. Nevertheless, the mutational landscape of many tumor types is heterogeneous and encompasses a long tail of potential driver genes that are systematically excluded by currently available methods because of the low frequency of their mutations. We developed LowMACA (Low frequency Mutations Analysis via Consensus Alignment), a method that combines the mutations of various proteins sharing the same functional domains to identify conserved residues that harbor clustered mutations in multiple sequence alignments. LowMACA is designed to visualize and statistically assess potential driver genes through the identification of their mutational hotspots. RESULTS: We analyzed the Ras superfamily by exploiting the known driver mutations of the KRAS, NRAS and HRAS trio, identifying new putative driver mutations and genes belonging to less well-known members of the Rho, Rab and Rheb subfamilies. Furthermore, we applied the same concept to a list of known and candidate driver genes, and observed that low-confidence genes show mutation patterns similar to those of high-confidence genes of the same protein family. CONCLUSIONS: LowMACA is a software tool for the identification of gain-of-function mutations in putative oncogenic families, increasing the amount of information on functional domains and their possible role in cancer. In this context, LowMACA emphasizes the role of genes mutated at low frequency that would otherwise be undetectable by classical single-gene analysis. LowMACA is an R package available at http://www.bioconductor.org/packages/release/bioc/html/LowMACA.html. It is also available as a downloadable standalone GUI at: https://cgsb.genomics.iit.it/wiki/projects/LowMACA.
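The core idea of pooling mutations from different proteins once their residues are projected onto shared alignment columns can be sketched in a few lines of Python. This is a hedged illustration of the concept only, not LowMACA's implementation or its statistical test; the toy alignment, the mutation positions and the helper names `residue_to_column` and `pool_mutations` are hypothetical.

```python
from collections import Counter


def residue_to_column(aligned_seq):
    """Map 1-based residue positions to 0-based alignment columns (gaps skipped)."""
    mapping, residue = {}, 0
    for column, char in enumerate(aligned_seq):
        if char != "-":
            residue += 1
            mapping[residue] = column
    return mapping


def pool_mutations(alignment, mutations):
    """Count mutated residues per alignment column, pooled over all proteins."""
    counts = Counter()
    for protein, positions in mutations.items():
        column_of = residue_to_column(alignment[protein])
        for pos in positions:
            if pos in column_of:  # ignore residues outside the aligned domain
                counts[column_of[pos]] += 1
    return counts


# Toy alignment of two hypothetical domain sequences.
alignment = {"P1": "MK-LV", "P2": "M-KLV"}
mutations = {"P1": [3], "P2": [2, 3]}
print(pool_mutations(alignment, mutations).most_common(1))  # [(3, 2)]
```

Each protein carries only one mutation at residue 3, but after projection both fall in column 3, which is the kind of pooled signal that a single-gene analysis would miss.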
Subjects
DNA Mutational Analysis/methods , Mutation/genetics , Neoplasms/genetics , Neoplasms/metabolism , Proteins/metabolism , Sequence Analysis, Protein/methods , Software , Humans , Proteins/genetics
ABSTRACT
UNLABELLED: Prioritization of candidate genes emanating from large-scale screens requires integrated analyses at the genomics, molecular, network and structural biology levels. We have extended the Integrated Genome Browser (IGB) to facilitate these tasks. The graphical user interface greatly simplifies building disease networks and zooming in at atomic resolution to identify variations in molecular complexes that may affect molecular interactions in the context of genomic data. All results are summarized in genome tracks and can be visualized and analyzed at the transcript level. AVAILABILITY AND IMPLEMENTATION: The MI Bundle is a plugin for the IGB. The plugin, help, video and tutorial are available at http://cru.genomics.iit.it/igbmibundle/ and https://github.com/CRUiit/igb-mi-bundle/wiki. The source code is released under the Apache License, Version 2. CONTACT: arnaud.ceol@iit.it SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Genome , Genomics/methods , Software , Core Binding Factor Alpha 2 Subunit/genetics , Disease/genetics , Gene Regulatory Networks , Humans
ABSTRACT
Enterohemorrhagic E. coli (EHEC) manipulate their human host through at least 39 effector proteins that hijack host processes through direct protein-protein interactions (PPIs). To identify their protein targets in host cells, we performed yeast two-hybrid screens, which yielded 48 high-confidence protein-protein interactions between 15 EHEC effectors and 47 human host proteins. In comparison with other bacteria and viruses, we found that EHEC effectors bind more frequently to hub proteins as well as to proteins that participate in a larger number of protein complexes. The data set includes six new interactions that involve the translocated intimin receptor (TIR), namely HPCAL1, HPCAL4, NCALD, ARRB1, PDE6D, and STK16. We compared these TIR interactions in EHEC and enteropathogenic E. coli (EPEC) and found that five interactions were conserved. Notably, the conserved interactions included those with serine/threonine kinase 16 (STK16), hippocalcin-like 1 (HPCAL1) and neurocalcin-delta (NCALD). These proteins co-localize with the infection sites of EPEC. Furthermore, our results suggest putative functions of poorly characterized effectors (EspJ, EspY1). In particular, we observed that EspJ is connected to the microtubule system, while EspY1 appears to be involved in apoptosis/cell cycle regulation.
Subjects
Adhesins, Bacterial/metabolism , Enterohemorrhagic Escherichia coli/metabolism , Escherichia coli Proteins/metabolism , Host-Pathogen Interactions/physiology , Protein Interaction Domains and Motifs/physiology , Receptors, Cell Surface/metabolism , Humans , Neurocalcin/metabolism , Protein Serine-Threonine Kinases/metabolism , Transcription Factors/metabolism
ABSTRACT
BACKGROUND: Life-science laboratories make increasing use of Next Generation Sequencing (NGS) for studying bio-macromolecules and their interactions. Array-based methods for measuring gene expression or protein-DNA interactions are being replaced by RNA-Seq and ChIP-Seq. Sequencing is generally performed by specialized facilities that have to keep track of sequencing requests, trace samples, ensure quality and make data available according to predefined privileges. An integrated tool helps troubleshoot problems, maintain high quality standards, and reduce time and costs. Commercial and non-commercial tools called LIMS (Laboratory Information Management Systems) are available for this purpose. However, they often come at prohibitive cost and/or lack the flexibility and scalability needed to adjust seamlessly to frequently changing protocols. To manage the flow of sequencing data produced at the Genomic Unit of the Italian Institute of Technology (IIT), we developed SMITH (Sequencing Machine Information Tracking and Handling). METHODS: SMITH is a web application with a MySQL server at the backend. It was developed by wet-lab scientists of the Centre for Genomic Science and database experts from the Politecnico of Milan in the context of a Genomic Data Model Project. The database schema stores all the information of an NGS experiment, including the descriptions of all protocols and algorithms used in the process. Notably, an attribute-value table allows an unconstrained textual description to be associated with each sample and all the data produced afterwards. This method permits the creation of metadata that can be used to search the database for specific files as well as for statistical analyses. RESULTS: SMITH runs automatically and limits direct human interaction mainly to administrative tasks. SMITH data-delivery procedures were standardized, making it easier for biologists and analysts to navigate the data. Automation also saves time. The workflows are available through an API provided by the workflow management system. The parameters and input data are passed to the workflow engine, which performs de-multiplexing, quality control, alignments, etc. CONCLUSIONS: SMITH standardizes, automates, and speeds up sequencing workflows. Annotation of data with key-value pairs facilitates meta-analysis.
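The attribute-value (key-value) table described in this abstract follows the general entity-attribute-value pattern. The sketch below uses Python's built-in `sqlite3` instead of MySQL so that it runs without a server; the table names, columns and the `add_sample`/`find_samples` helpers are hypothetical and do not reproduce SMITH's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sample (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- One row per free-text attribute attached to a sample.
    CREATE TABLE sample_attribute (
        sample_id INTEGER NOT NULL REFERENCES sample(id),
        attribute TEXT NOT NULL,
        value TEXT NOT NULL
    );
    """
)


def add_sample(name, attributes):
    """Insert a sample and its unconstrained attribute-value pairs."""
    cur = conn.execute("INSERT INTO sample (name) VALUES (?)", (name,))
    conn.executemany(
        "INSERT INTO sample_attribute (sample_id, attribute, value) VALUES (?, ?, ?)",
        [(cur.lastrowid, key, value) for key, value in attributes.items()],
    )
    return cur.lastrowid


def find_samples(**criteria):
    """Names of samples that carry every requested attribute=value pair."""
    clause = (
        "EXISTS (SELECT 1 FROM sample_attribute a "
        "WHERE a.sample_id = s.id AND a.attribute = ? AND a.value = ?)"
    )
    where = " WHERE " + " AND ".join([clause] * len(criteria)) if criteria else ""
    params = [item for pair in criteria.items() for item in pair]
    rows = conn.execute("SELECT s.name FROM sample s" + where + " ORDER BY s.name", params)
    return [name for (name,) in rows]


add_sample("S1", {"assay": "ChIP-Seq", "antibody": "H3K4me3"})
add_sample("S2", {"assay": "RNA-Seq"})
add_sample("S3", {"assay": "ChIP-Seq", "antibody": "CTCF"})
print(find_samples(assay="ChIP-Seq"))                   # ['S1', 'S3']
print(find_samples(assay="ChIP-Seq", antibody="CTCF"))  # ['S3']
```

Because attributes are rows rather than columns, new descriptive fields can be added to any sample without a schema change, at the cost of one join (here an `EXISTS` subquery) per search criterion.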
Subjects
Sequence Analysis, DNA/methods , Software , Algorithms , Automation , Genomics , High-Throughput Nucleotide Sequencing/instrumentation , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/instrumentation , Workflow
ABSTRACT
Helicobacter pylori infections cause gastric ulcers and play a major role in the development of gastric cancer. In 2001, the first protein interactome was published for this species, revealing over 1500 binary protein interactions resulting from 261 yeast two-hybrid screens. Here we roughly double the number of previously published interactions using an ORFeome-based, proteome-wide yeast two-hybrid screening strategy. We identified a total of 1515 protein-protein interactions, of which 1461 are new. Integrating all the interactions reported in H. pylori yields 3004 unique interactions that connect about 70% of its proteome. By excluding interactions of promiscuous proteins, we derived from our new data a core network of 908 interactions. We compared our data set with several other bacterial interactomes and experimentally benchmarked the conservation of interactions using 365 E. coli protein pairs (interologs), of which one third turned out to be conserved in both species.
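The merging and filtering steps described in this abstract can be illustrated with a short Python sketch. It assumes that interactions are undirected protein pairs and that "promiscuous" proteins can be approximated by a simple degree cutoff; the cutoff, the helper names and the toy data are hypothetical, and the study's own criterion for promiscuity may differ.

```python
from collections import Counter


def unique_interactions(pairs):
    """Treat A-B and B-A as the same undirected binary interaction."""
    return {tuple(sorted(pair)) for pair in pairs}


def core_network(interactions, max_degree):
    """Drop every interaction involving a protein with more than max_degree partners."""
    degree = Counter()
    for a, b in interactions:
        degree[a] += 1
        if b != a:  # count a self-interaction only once
            degree[b] += 1
    promiscuous = {p for p, d in degree.items() if d > max_degree}
    return {(a, b) for a, b in interactions
            if a not in promiscuous and b not in promiscuous}


reported = [("A", "B"), ("B", "A"), ("A", "C"), ("A", "D"), ("C", "D")]
merged = unique_interactions(reported)
print(len(merged))                         # 4
print(core_network(merged, max_degree=2))  # {('C', 'D')}
```

Sorting each pair gives a canonical key, so duplicates reported in opposite bait/prey orientations collapse into a single interaction before any counting is done.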
Subjects
Bacterial Proteins/metabolism , Helicobacter pylori/metabolism , Protein Interaction Mapping/methods , Protein Interaction Maps , Amino Acid Sequence , Conserved Sequence , Open Reading Frames , Proteome/analysis , Proteomics , Two-Hybrid System Techniques
ABSTRACT
Efforts to map the Escherichia coli interactome have identified several hundred macromolecular complexes, but direct binary protein-protein interactions (PPIs) have not been surveyed on a large scale. Here we performed yeast two-hybrid screens of 3,305 baits against 3,606 preys (~70% of the E. coli proteome) in duplicate to generate a map of 2,234 interactions, which approximately doubles the number of known binary PPIs in E. coli. Integration of binary PPI and genetic-interaction data revealed functional dependencies among components involved in cellular processes, including envelope integrity, flagellum assembly and protein quality control. Many of the binary interactions that we could map in multiprotein complexes were informative regarding the internal topology of complexes and indicated that interactions within complexes are substantially more conserved than interactions connecting different complexes. This resource will be useful for inferring bacterial gene function and provides a draft reference of the basic physical wiring network of this evolutionarily important model microbe.
Subjects
Escherichia coli Proteins , Protein Interaction Mapping/methods , Protein Interaction Maps/physiology , Proteomics/methods , Escherichia coli Proteins/chemistry , Escherichia coli Proteins/genetics , Escherichia coli Proteins/metabolism , Two-Hybrid System Techniques
ABSTRACT
BACKGROUND: Modern genomic technologies produce large amounts of data that can be mapped to specific regions in the genome. Among the first steps in interpreting the results is annotation of genomic regions with known features such as genes, promoters, CpG islands etc. Several tools have been published to perform this task. However, using these tools often requires a significant amount of bioinformatics skills and/or downloading and installing dedicated software. RESULTS: Here we present AnnotateGenomicRegions, a web application that accepts genomic regions as input and outputs a selection of overlapping and/or neighboring genome annotations. Supported organisms include human (hg18, hg19), mouse (mm8, mm9, mm10), zebrafish (danRer7), and Saccharomyces cerevisiae (sacCer2, sacCer3). AnnotateGenomicRegions is accessible online on a public server or can be installed locally. Some frequently used annotations and genomes are embedded in the application while custom annotations may be added by the user. CONCLUSIONS: The increasing spread of genomic technologies generates the need for a simple-to-use annotation tool for genomic regions that can be used by biologists and bioinformaticians alike. AnnotateGenomicRegions meets this demand. AnnotateGenomicRegions is an open-source web application that can be installed on any personal computer or institute server. AnnotateGenomicRegions is available at: http://cru.genomics.iit.it/AnnotateGenomicRegions.
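Annotating input regions with "overlapping and/or neighboring" features reduces to an interval-overlap test. The minimal Python sketch below assumes BED-style half-open coordinates and models "neighboring" with a hypothetical `flank` window; it is a linear scan written for clarity, not AnnotateGenomicRegions' implementation, and production tools generally rely on sorted or indexed intervals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive (BED-style half-open)
    name: str = ""


def overlaps(a, b):
    """True if two half-open intervals on the same chromosome share a base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def annotate(queries, features, flank=0):
    """Names of features overlapping each query, extended by `flank` bp per side."""
    result = {}
    for q in queries:
        window = Region(q.chrom, max(0, q.start - flank), q.end + flank)
        result[q.name] = sorted(f.name for f in features if overlaps(window, f))
    return result


features = [
    Region("chr1", 100, 200, "geneA"),
    Region("chr1", 300, 400, "cpg1"),
    Region("chr2", 100, 200, "geneB"),
]
peaks = [Region("chr1", 150, 160, "peak1"), Region("chr1", 250, 290, "peak2")]
print(annotate(peaks, features))            # {'peak1': ['geneA'], 'peak2': []}
print(annotate(peaks, features, flank=20))  # {'peak1': ['geneA'], 'peak2': ['cpg1']}
```

With half-open coordinates, intervals that merely touch (one ending where the other starts) do not overlap, which avoids off-by-one ambiguity when mixing BED files from different sources.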
Subjects
Genomics/methods , Animals , Genome , Humans , Internet , Mice , Saccharomyces cerevisiae/genetics , Software , Zebrafish/genetics
ABSTRACT
The database of 3D interacting domains (3did, available online for browsing and bulk download at http://3did.irbbarcelona.org) is a catalog of protein-protein interactions for which a high-resolution 3D structure is known. 3did collects and classifies all structural templates of domain-domain interactions in the Protein Data Bank, providing molecular details for such interactions. The current version also includes a pipeline for the discovery and annotation of novel domain-motif interactions. For every interaction, 3did identifies and groups different binding modes by clustering similar interfaces into 'interaction topologies'. By maintaining a constantly updated collection of domain-based structural interaction templates, 3did is a reference source of information for the structural characterization of protein interaction networks. 3did is updated every 6 months.
Subjects
Databases, Protein , Protein Interaction Domains and Motifs , Internet , Models, Molecular , Protein Interaction Mapping , Protein Interaction Maps
ABSTRACT
Protein interaction maps are key to understanding the complex world of biological processes inside the cell. Public protein databases have already catalogued hundreds of thousands of experimentally discovered interactions, but struggle to curate all the existing information dispersed through the literature. To be most efficient, however, standard protocols need to be implemented for submitting new interaction sets directly into databases. At the same time, great efforts are being invested to expand the coverage of the interaction space and unveil the molecular details of such interactions down to the atomistic level. The net result will be the definition of a detailed atlas spanning the universe of protein interactions to guide the everyday work of the biologist.
Subjects
Protein Interaction Mapping/methods , Proteins/metabolism , Computational Biology , Humans , Proteins/chemistry
ABSTRACT
Network-centered approaches are increasingly used to understand the fundamentals of biology. However, the molecular details contained in the interaction networks, often necessary to understand cellular processes, are very limited, and the experimental difficulties surrounding the determination of protein complex structures make computational modeling techniques paramount. Here we present Interactome3D, a resource for the structural annotation and modeling of protein-protein interactions. Through the integration of interaction data from the main pathway repositories, we provide structural details at atomic resolution for over 12,000 protein-protein interactions in eight model organisms. Unlike static databases, Interactome3D also allows biologists to upload newly discovered interactions and pathways in any species, select the best combination of structural templates and build three-dimensional models in a fully automated manner. Finally, we illustrate the value of Interactome3D through the structural annotation of the complement cascade pathway, rationalizing a potential common mechanism of action suggested for several disease-causing mutations.
Subjects
Models, Biological , Multiprotein Complexes/chemistry , Protein Interaction Mapping , Proteins/chemistry , Proteins/metabolism , Animals , Computer Simulation , Databases, Protein , Humans , Multiprotein Complexes/metabolism , Protein Conformation
ABSTRACT
The many ongoing genome sequencing initiatives are delivering comprehensive lists of the individual molecular components present in an organism, but these reveal little about how they work together. Follow-up initiatives are revealing thousands of interrelationships between gene products that need to be analyzed with novel bioinformatics approaches able to capture their complex emerging properties. Recently, we developed NetAligner, a novel network alignment tool that allows the identification of conserved protein complexes and pathways across organisms, providing valuable hints as to how those interaction networks evolved. NetAligner includes the prediction of likely conserved interactions, based on evolutionary distances, to counter the high number of missing interactions in current interactome networks, and a fast assessment of the statistical significance of individual alignment solutions, which increases its performance with respect to existing tools. The web server implementation of the NetAligner algorithm presented here features complex, pathway and interactome to interactome alignments for seven model organisms, namely Homo sapiens, Mus musculus, Drosophila melanogaster, Caenorhabditis elegans, Arabidopsis thaliana, Saccharomyces cerevisiae and Escherichia coli. The user can query complexes and pathways of arbitrary topology against a target species interactome, or directly compare two complete interactomes to identify conserved complexes and subnetworks. Alignment solutions can be downloaded or directly visualized in the browser. The NetAligner web server is publicly available at http://netaligner.irbbarcelona.org/.
Subjects
Multiprotein Complexes/metabolism , Protein Interaction Mapping/methods , Software , Animals , Computer Graphics , Humans , Internet , Mice
ABSTRACT
The database of three-dimensional interacting domains (3did) is a collection of protein interactions for which high-resolution three-dimensional structures are known. 3did exploits the availability of structural data to provide molecular details on interactions between two globular domains as well as novel domain-peptide interactions, derived using a recently published method from our lab. The interface residues are presented for each interaction type individually, plus global domain interfaces at which one or more partners (domains or peptides) bind. The 3did web server at http://3did.irbbarcelona.org visualizes these interfaces along with atomic details of individual interactions using Jmol. The complete contents are also available for download.