ABSTRACT
Dealing with sequence coordinates in different formats and reference genomes is challenging in genetic research. This complexity arises from the need to convert and harmonize datasets from different sources that use differing nomenclatures. Since manual processing is time-consuming and requires specialized knowledge, the Sequence Conversion and Analysis Toolbox (SeqCAT) was developed for daily work with genetic datasets. Our tool provides a range of functions designed to standardize and convert gene variant coordinates based on various sequence types. Its user-friendly web interface provides easy access to all functionalities, while the Application Programming Interface (API) enables automation within pipelines. SeqCAT provides access to human genomic, protein and transcript data, drawing on various data resources and packages and extending them with its own unique features. The platform covers a wide range of genetic research needs with its 14 different applications and 3 info points, including search for transcript and gene information, transition between reference genomes, variant mapping, and genetic event review. Notable examples are 'Convert Protein to DNA Position', which translates amino acid changes into genomic single nucleotide variants, and 'Fusion Check', which determines whether a gene fusion causes a frameshift. SeqCAT is an excellent resource for converting sequence coordinate data into the required formats and is available at: https://mtb.bioinf.med.uni-goettingen.de/SeqCAT/.
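The coordinate arithmetic behind protein-to-DNA conversion and fusion frame checks can be illustrated with a short sketch. This is not SeqCAT's code or API: `codon_cds_range` and `fusion_in_frame` are hypothetical names, positions are assumed to be 1-based coding-sequence (c.) coordinates, both fusion breakpoints are assumed to lie inside the CDS, and the further step of mapping c. positions onto the genome (which needs exon boundaries, strand and the reference build) is left out.

```python
def codon_cds_range(aa_position: int) -> tuple[int, int]:
    """Return the first and last CDS positions (c.) of the codon for a residue.

    Hypothetical helper: residue p is encoded by c.(3p-2) to c.(3p).
    """
    if aa_position < 1:
        raise ValueError("amino acid positions are 1-based")
    last = 3 * aa_position
    return last - 2, last


def fusion_in_frame(last_cds_5prime: int, first_cds_3prime: int) -> bool:
    """Hypothetical frame check for a fusion joined after CDS position
    `last_cds_5prime` of the 5' partner and from CDS position
    `first_cds_3prime` of the 3' partner.

    The 5' partner leaves `last_cds_5prime % 3` bases of an unfinished codon;
    the 3' partner resumes at codon phase `(first_cds_3prime - 1) % 3`.
    The reading frame is kept when the two phases agree.
    """
    if last_cds_5prime < 1 or first_cds_3prime < 1:
        raise ValueError("CDS positions are 1-based")
    return last_cds_5prime % 3 == (first_cds_3prime - 1) % 3


print(codon_cds_range(600))        # (1798, 1800)
print(fusion_in_frame(301, 302))   # True
print(fusion_in_frame(300, 302))   # False
```

For example, residue 600 maps to c.1798 to c.1800, and a fusion that keeps 301 coding bases of the 5' partner stays in frame only if the 3' partner resumes at the second base of a codon, such as c.302.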
Subjects
Genomics , Software , Humans , Genomics/methods , Genome, Human , Sequence Analysis, DNA/methods , Internet , User-Computer Interface
ABSTRACT
Motile multiciliated cells (MCCs) have critical roles in respiratory health and disease and are essential for cleaning inhaled pollutants and pathogens from airways. Despite their significance for human disease, the transcriptional control that governs multiciliogenesis remains poorly understood. Here we identify TP73, a p53 homolog, as governing the program for airway multiciliogenesis. Mice with TP73 deficiency suffer from chronic respiratory tract infections due to profound defects in ciliogenesis and complete loss of mucociliary clearance. Organotypic airway cultures pinpoint TAp73 as necessary and sufficient for basal body docking, axonemal extension, and motility during the differentiation of MCC progenitors. Mechanistically, cross-species genomic analyses and complete ciliary rescue of knockout MCCs identify TAp73 as the conserved central transcriptional integrator of multiciliogenesis. TAp73 directly activates the key regulators FoxJ1, Rfx2, Rfx3, and miR34bc plus nearly 50 structural and functional ciliary genes, some of which are associated with human ciliopathies. Our results position TAp73 as a novel central regulator of MCC differentiation.
Subjects
Cell Differentiation/genetics , Cilia/genetics , Gene Expression Regulation/genetics , Nuclear Proteins/genetics , Nuclear Proteins/metabolism , Respiratory Mucosa/cytology , Animals , Cells, Cultured , Gene Knockout Techniques , Mice , Respiratory Tract Infections/genetics , Respiratory Tract Infections/physiopathology
ABSTRACT
BACKGROUND: Chronic kidney disease (CKD), a major public health problem with differing disease etiologies, leads to complications, comorbidities, polypharmacy, and mortality. Monitoring disease progression and personalized treatment efforts are crucial for long-term patient outcomes. Physicians need to integrate different data levels, e.g., clinical parameters, biomarkers, and drug information, with medical knowledge. Clinical decision support systems (CDSS) can tackle these issues and improve patient management. Knowledge about the awareness and implementation of CDSS in Germany within the field of nephrology is scarce. PURPOSE: Nephrologists' attitudes towards CDSS in general, and towards potential features of interest such as adverse event prediction algorithms, are important for successful implementation. This survey investigates nephrologists' experiences with, and expectations of, a useful CDSS for the daily medical routine in the outpatient setting. METHODS: The 38-item questionnaire survey was conducted either by telephone or as a self-administered online questionnaire among nephrologists across Germany. Answers were collected with the electronic data capture system REDCap and analysed with Stata SE 15.1 and Excel. The survey consisted of four modules: experiences with CDSS (M1), expectations towards a helpful CDSS (M2), evaluation of adverse event prediction algorithms (M3), and ethical aspects of CDSS (M4). Descriptive statistical analyses of all questions were conducted. RESULTS: The study population comprised 54 physicians, with a response rate of about 80-100% per question. The largest age group was 51-60 years (45.1%), 64% were male, and participants had been working in nephrology outpatient clinics for a median of 10.5 years. Overall, CDSS use was low (81.2% of participants did not use one), often owing to a lack of knowledge about existing CDSS. 
Most participants (79%) considered CDSS helpful in the management of CKD patients, and willingness to try out a CDSS was high. Of all adverse event prediction algorithms, prediction of CKD progression (97.8%) and in silico simulations of disease progression under changes in, e.g., lifestyle or medication (97.7%) were rated most important. The spectrum of answers on ethical aspects of CDSS was diverse. CONCLUSION: This survey provides insights into outpatient nephrologists' experiences with and expectations of CDSS. Despite the current lack of knowledge about CDSS, the willingness to integrate CDSS into daily patient care and the need for adverse event prediction algorithms were high.
Subjects
Decision Support Systems, Clinical , Renal Insufficiency, Chronic , Humans , Male , Middle Aged , Female , Nephrologists , Motivation , Renal Insufficiency, Chronic/therapy , Surveys and Questionnaires , Disease Progression
ABSTRACT
BACKGROUND: Most of the known genes required for developmental processes have been identified by genetic screens in a few well-studied model organisms, which have been considered representative of related species and informative, to some degree, for human biology. The fruit fly Drosophila melanogaster is a prime model for insect genetics, and while conservation of many gene functions has been observed among bilaterian animals, a plethora of data show evolutionary divergence of gene function among more closely related groups, such as within the insects. A quantification of conservation versus divergence of gene functions has been missing, without which it is unclear how representative data from model systems actually are. RESULTS: Here, we systematically compare the gene sets required for a number of homologous but divergent developmental processes between fly and beetle in order to quantify the differences between these gene sets. To that end, we expanded our RNAi screen in the red flour beetle Tribolium castaneum to cover more than half of the protein-coding genes. We then compared the gene sets required for four different developmental processes between beetle and fly. We found that around 50% of the gene functions were identified in the screens of both species, while for the rest, phenotypes were revealed only in fly (~10%) or only in beetle (~40%), reflecting both technical and biological differences. Accordingly, we were able to annotate novel developmental GO terms for 96 genes studied in this work. With this work, we publish the final dataset for the pupal injection screen of the iBeetle screen, reaching a coverage of 87% (13,020 genes). CONCLUSIONS: We conclude that the gene sets required for a homologous process diverge more than widely believed. Hence, the insights gained in flies may be less representative of insects or protostomes than previously thought, and work in complementary model systems is required to gain a comprehensive picture. 
The RNAi screening resources developed in this project, the expanding transgenic toolkit, and our large-scale functional data make T. castaneum an excellent model system in that endeavor.
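The ~50/10/40 split is a partition of all genes that showed a phenotype in at least one screen. A minimal sketch of that bookkeeping, assuming both screens' hits have already been mapped to shared ortholog identifiers (a mapping step not shown here); `partition_hits` and the gene IDs are hypothetical placeholders chosen to reproduce the reported proportions.

```python
def partition_hits(fly_hits: set[str], beetle_hits: set[str]) -> dict[str, float]:
    """Split the union of genes with a phenotype in either screen into
    fractions found in both screens, in the fly only, or in the beetle only.
    Assumes both sets use the same (ortholog) identifiers."""
    union = fly_hits | beetle_hits
    if not union:
        return {"both": 0.0, "fly_only": 0.0, "beetle_only": 0.0}
    n = len(union)
    return {
        "both": len(fly_hits & beetle_hits) / n,
        "fly_only": len(fly_hits - beetle_hits) / n,
        "beetle_only": len(beetle_hits - fly_hits) / n,
    }


# Placeholder ortholog IDs, chosen to reproduce a 50/10/40 split.
fly = {"og1", "og2", "og3", "og4", "og5", "og6"}
beetle = {"og1", "og2", "og3", "og4", "og5", "og7", "og8", "og9", "og10"}
print(partition_hits(fly, beetle))
# {'both': 0.5, 'fly_only': 0.1, 'beetle_only': 0.4}
```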
Subjects
Coleoptera , Tribolium , Animals , Coleoptera/genetics , Drosophila , Drosophila melanogaster/genetics , Pupa , RNA Interference , Tribolium/genetics
ABSTRACT
BACKGROUND: The red flour beetle Tribolium castaneum has emerged as an important model organism for the study of gene function in development and physiology, for ecological and evolutionary genomics, for pest control and for a plethora of other topics. RNA interference (RNAi), transgenesis and genome editing are well established, and resources for genome-wide RNAi screening have become available in this model. All these techniques depend on a high-quality genome assembly and precise gene models. However, the first version of the genome assembly was generated by Sanger sequencing, and only a small set of RNA sequence data was available, which limited annotation quality. RESULTS: Here, we present an improved genome assembly (Tcas5.2) and an enhanced genome annotation resulting in a new official gene set (OGS3) for Tribolium castaneum, which significantly increase the quality of the genomic resources. By adding long-distance jumping library DNA sequencing to join scaffolds and fill small gaps, the gaps in the genome assembly were reduced and the N50 increased to 4,753 kbp. The precision of the gene models was enhanced by the use of a large body of RNA-Seq reads from different life history stages and tissue types, leading to the discovery of 1452 novel gene sequences. We also added new features such as alternative splicing, well-defined UTRs and microRNA target predictions. For quality control, 399 gene models were evaluated by manual inspection. The current gene set was submitted to GenBank and accepted as a RefSeq genome by NCBI. CONCLUSIONS: The new genome assembly (Tcas5.2) and the official gene set (OGS3) provide enhanced genomic resources for genetic work in Tribolium castaneum. The much improved information on transcription start sites supports transgenic and gene editing approaches. Further, novel types of information such as splice variants and microRNA target genes open additional possibilities for analysis.
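N50 is the length L such that sequences of length L or longer together cover at least half of the total assembly length. A minimal sketch of the calculation; the `n50` helper is illustrative and the lengths in the example are toy values, not Tcas5.2 data.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50: the largest length L such that sequences of length >= L
    cover at least half of the total assembly length."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    total = sum(lengths)
    covered = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        covered += length
        if 2 * covered >= total:
            return length
    raise AssertionError("unreachable: the loop always returns")


print(n50([2, 3, 4, 5, 6, 10]))  # 6
```

With scaffolds of 10, 6, 5, 4, 3 and 2 kbp (30 kbp in total), the two longest already cover 16 kbp, so the N50 is 6 kbp.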
Subjects
Genes, Insect , Genome, Insect , Genomics , Tribolium/genetics , Animals , Binding Sites , Computational Biology/methods , Genomics/methods , MicroRNAs/genetics , Molecular Sequence Annotation , Phylogeny , RNA Interference , Reproducibility of Results
ABSTRACT
The iBeetle-Base provides access to sequence and phenotype information for genes of the beetle Tribolium castaneum. It has been updated with new and revised data and new functions. RNAi phenotypes are now available for >50% of the genes, an expansion of 60% compared with the previous version. Gene sequence information has been updated based on the new official gene set OGS3 and covers all genes. Interoperability with FlyBase has been enhanced. First, gene information pages of homologous genes are interlinked between both databases. Second, the steps of a new query pipeline transform gene lists from either species into lists of related gene IDs, names or GO terms. This facilitates the comparative analysis of gene functions between fly and beetle. The backend of the pipeline is implemented as endpoints of a RESTful interface, so that it can be reused by other projects or tools. A novel online interface allows the community to propose GO terms for their gene of interest, expanding the range of animals for which GO terms are defined. iBeetle-Base is available at http://ibeetle-base.uni-goettingen.de/.
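Translating a gene list between species amounts to a lookup in a two-way homology index. A minimal local sketch, assuming a toy table of beetle/fly ortholog pairs with placeholder IDs; the helper names are hypothetical, and the sketch does not call the iBeetle-Base RESTful endpoints, whose routes are not given here.

```python
from collections import defaultdict

# Toy ortholog pairs (beetle ID, fly ID) with placeholder identifiers.
ORTHOLOG_PAIRS = [
    ("TC_0001", "FB_A"),
    ("TC_0002", "FB_B"),
    ("TC_0002", "FB_C"),  # one-to-many relationships are possible
    ("TC_0003", "FB_D"),
]


def build_index(pairs):
    """Build beetle->fly and fly->beetle lookup tables from ortholog pairs."""
    beetle_to_fly, fly_to_beetle = defaultdict(set), defaultdict(set)
    for beetle_id, fly_id in pairs:
        beetle_to_fly[beetle_id].add(fly_id)
        fly_to_beetle[fly_id].add(beetle_id)
    return beetle_to_fly, fly_to_beetle


def translate_gene_list(genes, index):
    """Map each input ID to its sorted orthologs; report IDs without orthologs."""
    mapped, unmapped = {}, []
    for gene in genes:
        hits = index.get(gene)  # .get() does not insert keys into a defaultdict
        if hits:
            mapped[gene] = sorted(hits)
        else:
            unmapped.append(gene)
    return mapped, unmapped


b2f, f2b = build_index(ORTHOLOG_PAIRS)
print(translate_gene_list(["TC_0002", "TC_9999"], b2f))
# ({'TC_0002': ['FB_B', 'FB_C']}, ['TC_9999'])
```

Keeping unmapped IDs in a separate list makes it visible which genes have no counterpart, which matters for the non-conserved genes that cannot be compared this way.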
Subjects
Databases, Genetic , Tribolium/genetics , Animals , Gene Ontology , Phenotype , RNA Interference , User-Computer Interface
ABSTRACT
TFClass is a resource that classifies eukaryotic transcription factors (TFs) according to their DNA-binding domains (DBDs), available online at http://tfclass.bioinf.med.uni-goettingen.de. The classification scheme of TFClass was originally derived for human TFs and is expanded here to the whole taxonomic class of Mammalia. By combining information from different resources, manually checking the retrieved mammalian TF sequences and applying extensive phylogenetic analyses, >39,000 TFs from up to 41 mammalian species were assigned to the Superclasses, Classes, Families and Subfamilies of TFClass. As a result, TFClass now provides the corresponding sequence collection in FASTA format, sequence logos and phylogenetic trees at different classification levels, predicted TF binding sites for the human, mouse, dog and cow genomes, as well as links to several external databases. In particular, all TFs that are also documented in the TRANSFAC® database (FACTOR table) have been linked and can be freely accessed. TRANSFAC® FACTOR can also be queried through a dedicated search interface.
Subjects
Databases, Protein , Transcription Factors/classification , Animals , Binding Sites , Cattle , Dogs , Humans , Mammals , Mice , Phylogeny , Protein Domains , Transcription Factors/chemistry , Transcription Factors/metabolism , User-Computer Interface
ABSTRACT
TFClass aims at classifying eukaryotic transcription factors (TFs) according to their DNA-binding domains (DBDs). For this, a classification scheme comprising four generic levels (superclass, class, family and subfamily) was defined that could accommodate all known DNA-binding human TFs. They were assigned to their (sub)families as instances at two different levels: the corresponding TF genes and the individual gene products (protein isoforms). In the present version, all mouse and rat orthologs have been linked to the human TFs, and the mouse orthologs have been arranged in an independent ontology. Many TFs were assigned typical DNA-binding patterns and position weight matrices (PWMs) derived from high-throughput in vitro binding studies. Predicted TF binding sites in human gene upstream sequences are now also attached to each human TF whenever a PWM was available for this factor or one of its paralogs. TFClass is freely available at http://tfclass.bioinf.med.uni-goettingen.de/ through a web interface and for download in OBO format.
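Binding-site prediction from a PWM typically means sliding the matrix along a sequence and keeping windows whose summed log-odds score passes a threshold. The sketch below is a generic forward-strand log-odds scan, not the TFClass prediction pipeline; the helper names, the uniform background, the pseudocount of 0.5 and the threshold are illustrative assumptions.

```python
import math

BASES = "ACGT"


def log_odds_matrix(counts, background=0.25, pseudocount=0.5):
    """Convert per-position base counts (one dict per motif position)
    into log2-odds scores against a uniform background."""
    matrix = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        matrix.append({
            b: math.log2((column.get(b, 0) + pseudocount) / total / background)
            for b in BASES
        })
    return matrix


def scan(sequence, matrix, threshold):
    """Yield (start, score) for forward-strand windows scoring >= threshold."""
    seq = sequence.upper()
    width = len(matrix)
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(b not in BASES for b in window):
            continue  # skip windows with N or other ambiguity codes
        score = sum(matrix[i][b] for i, b in enumerate(window))
        if score >= threshold:
            yield start, score


# Toy 3-position motif "TGA" built from 10 observed sites per position.
pwm = log_odds_matrix([{"T": 10}, {"G": 10}, {"A": 10}])
print([(s, round(v, 2)) for s, v in scan("cctgacc", pwm, threshold=5.0)])
# [(2, 5.42)]
```

A full genome scan would also score the reverse complement and usually calibrate the threshold per matrix, since raw log-odds scores are not comparable across PWMs of different widths.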
Subjects
Databases, Protein , Transcription Factors/classification , Animals , Binding Sites , DNA/metabolism , DNA-Binding Proteins/chemistry , DNA-Binding Proteins/metabolism , Humans , Internet , Mice , Protein Structure, Tertiary , Rats , Transcription Factors/chemistry , Transcription Factors/metabolism
ABSTRACT
The iBeetle-Base (http://ibeetle-base.uni-goettingen.de) makes available annotations of RNAi phenotypes, which were gathered in a large-scale RNAi screen in the red flour beetle Tribolium castaneum (iBeetle screen). In addition, it provides access to sequence information and links for all Tribolium castaneum genes. The iBeetle-Base contains the phenotype annotations of several thousand genes knocked down during embryonic and metamorphic epidermis and muscle development, in addition to phenotypes linked to oogenesis and stink gland biology. The phenotypes are described according to the EQM (entity, quality, modifier) system using controlled vocabularies and the Tribolium morphological ontology (TrOn). Furthermore, images linked to the respective annotations are provided. The data can be searched either for specific phenotypes, using a complex 'search for morphological defects', or for gene names and IDs, using a 'quick search'. The red flour beetle Tribolium castaneum has become an important model system for insect functional genetics and is a representative of the most species-rich taxon, the Coleoptera, which comprises several devastating pests. It is used for studying insect-typical development and the evolution of development, and for research on metabolism and pest control. Besides Drosophila, Tribolium is the first insect model organism in which large-scale unbiased screens have been performed.
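Because an EQM annotation is essentially an entity-quality-modifier triple attached to a gene, a phenotype search reduces to filtering those triples. The sketch below assumes a hypothetical `PhenotypeAnnotation` record and a hypothetical `search_defects` helper, with plain-text labels standing in for TrOn and quality terms; it is not the iBeetle-Base schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhenotypeAnnotation:
    """Hypothetical EQM record: which gene, which structure, what changed."""
    gene_id: str
    entity: str         # anatomical entity, e.g. a TrOn term (placeholder text here)
    quality: str        # how the entity is affected
    modifier: str = ""  # optional qualifier, e.g. penetrance or severity


def search_defects(annotations, entity=None, quality=None):
    """Return sorted gene IDs with an annotation matching every given field."""
    return sorted({
        a.gene_id
        for a in annotations
        if (entity is None or a.entity == entity)
        and (quality is None or a.quality == quality)
    })


annotations = [
    PhenotypeAnnotation("TC_0001", "leg", "shortened", "strong"),
    PhenotypeAnnotation("TC_0002", "leg", "absent"),
    PhenotypeAnnotation("TC_0003", "antenna", "shortened"),
]
print(search_defects(annotations, entity="leg"))  # ['TC_0001', 'TC_0002']
```

A real ontology-backed search would also match child terms of the queried entity, which exact string comparison does not do.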
Subjects
Databases, Genetic , Genes, Insect , RNA Interference , Tribolium/genetics , Animals , Female , Internet , Phenotype , Tribolium/anatomy & histology , Tribolium/embryology , User-Computer Interface
ABSTRACT
BACKGROUND: Insect pest control is challenged by insecticide resistance and negative impacts on ecology and health. One promising pest-specific alternative is the generation of transgenic plants that express double-stranded RNAs (dsRNAs) targeting essential genes of a pest species. Upon feeding, the dsRNA induces gene silencing in the pest, resulting in its death. However, the identification of efficient RNAi target genes remains a major challenge, as genomic tools and breeding capacity are limited in most pest insects, impeding whole-animal high-throughput screening. RESULTS: We use the red flour beetle Tribolium castaneum as a screening platform in order to identify the most efficient RNAi target genes. From about 5,000 randomly screened genes of the iBeetle RNAi screen, we identify 11 novel and highly efficient RNAi targets. Our data allowed us to determine GO term combinations that are predictive of efficient RNAi target genes, with proteasomal genes being the most predictive. Finally, we show that RNAi target genes do not appear to act synergistically and that protein sequence conservation does not correlate with the number of potential off-target sites. CONCLUSIONS: Our results will aid the identification of RNAi target genes in many pest species by providing a manageable number of excellent candidate genes to be tested and the proteasome as a prime target. Further, the identified GO term combinations will help to identify efficient target genes from organ-specific transcriptomes. Our off-target analysis is relevant for the selection of sequences used in transgenic plants.
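Potential off-target sites are often approximated by exact sequence matches between the dsRNA and non-target transcripts. A minimal sketch, assuming a shared-k-mer criterion with a default of k = 19 and counting matches to either dsRNA strand; the criterion actually used in the study is not stated here, and the helper names are hypothetical.

```python
def kmers(seq: str, k: int) -> set[str]:
    """All distinct substrings of length k (uppercased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (non-ACGT characters kept as-is)."""
    return seq.upper()[::-1].translate(str.maketrans("ACGT", "TGCA"))


def count_off_target_sites(dsrna: str, transcripts: dict[str, str],
                           k: int = 19) -> dict[str, int]:
    """For each non-target transcript, count distinct k-mers it shares with
    either strand of the dsRNA, as a simple proxy for potential off-target sites."""
    probe = kmers(dsrna, k) | kmers(reverse_complement(dsrna), k)
    return {name: len(probe & kmers(seq, k)) for name, seq in transcripts.items()}


# Toy sequences with k = 4 so the overlap is easy to check by hand.
print(count_off_target_sites("ACGTACGGT", {"x": "TTACGTACTT", "y": "GGGGGGGG"}, k=4))
# {'x': 4, 'y': 0}
```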
Subjects
Genes, Insect , Pest Control, Biological , Proteasome Endopeptidase Complex/metabolism , RNA Interference , Tribolium/genetics , Animals , Base Sequence , Cluster Analysis , Conserved Sequence , Gene Ontology
ABSTRACT
TFClass (http://tfclass.bioinf.med.uni-goettingen.de/) provides a comprehensive classification of human transcription factors based on their DNA-binding domains. Transcription factors constitute a large functional family of proteins that directly regulate the activity of genes. Most of them are sequence-specific DNA-binding proteins, thus reading out the information encoded in cis-regulatory DNA elements of promoters, enhancers and other regulatory regions of a genome. TFClass is a database that classifies human transcription factors by a six-level classification scheme, four levels of which are abstractions according to different criteria, while the fifth level represents TF genes and the sixth individual gene products. Altogether, nine superclasses have been identified, comprising 40 classes and 111 families. Counted by genes, 1558 human TFs have been classified so far, or >2900 different TFs when including their isoforms generated by alternative splicing or protein processing events. With this classification, we hope to provide a basis for deciphering protein-DNA recognition codes; moreover, it can be used for constructing expanded transcriptional networks by inferring additional TF-target gene relations.
Subjects
Databases, Protein , Transcription Factors/classification , DNA-Binding Proteins/chemistry , Humans , Internet , Protein Structure, Tertiary , Sequence Alignment , Sequence Analysis, Protein , Transcription Factors/chemistry
ABSTRACT
BACKGROUND: An increasing human population, the emergence of resistance to pesticides and the potential impact of pesticides on the environment call for the development of new eco-friendly pest control strategies. RNA interference (RNAi)-based pesticides have emerged as a new option, with the first products entering the market. Essentially, double-stranded RNAs targeting essential genes of pests are either expressed in the plants or sprayed on their surface. Upon feeding, pests mount an RNAi response and die. However, it has remained unclear whether RNAi-based insecticides should target the same pathways as classic pesticides or whether their different mode of action would favor other processes. Moreover, there is no consensus on the best genes to target. RESULTS: We performed a genome-wide screen in the red flour beetle and identified 905 RNAi target genes. Based on a validation screen and clustering, we identified the 192 most effective target genes in that species. The transfer to oral application in other beetle pests revealed a list of 34 superior target genes, which are an excellent starting point for application in other pests. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses of our genome-wide dataset revealed that genes with high efficacy belonged mainly to basic cellular processes such as gene expression and protein homeostasis, processes not targeted by classic insecticides. CONCLUSION: Our work revealed the best target genes and target processes for RNAi-based pest control, and we propose a procedure to transfer our short list of superior target genes to other pests. © 2024 The Author(s). Pest Management Science published by John Wiley & Sons Ltd on behalf of Society of Chemical Industry.
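GO and KEGG enrichment of a gene subset is commonly assessed with a one-sided hypergeometric test. A minimal standard-library sketch; the helper name is hypothetical, the specific tools and multiple-testing correction used in the study are not stated here, and the counts in the example are toy values.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: background size (e.g. all screened genes)
    K: background genes carrying the GO/KEGG annotation
    n: selected genes (e.g. highly efficient targets)
    k: selected genes carrying the annotation
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    upper_tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return upper_tail / comb(N, n)


print(round(hypergeom_enrichment_p(k=3, n=3, K=4, N=10), 4))  # 0.0333
```

For instance, finding all 3 selected genes among the 4 annotated genes of a 10-gene background gives p = 4/120. When many terms are tested, the resulting p-values would still need a multiple-testing correction.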
ABSTRACT
INTRODUCTION: Whole Exome Sequencing (WES) has emerged as an efficient tool in clinical cancer diagnostics that broadens the scope from panel-based diagnostics to the screening of all genes and enables robust determination of complex biomarkers in a single analysis. METHODS: To assess concordance, six formalin-fixed paraffin-embedded (FFPE) tissue specimens and four commercial reference standards were analyzed by WES as matched tumor-normal DNA at 21 NGS centers in Germany, each employing its own local wet-lab and bioinformatics workflows. Somatic and germline variants, copy-number alterations (CNAs), and complex biomarkers were investigated. Somatic variant calling was performed in 494 diagnostically relevant cancer genes. The raw data were collected and re-analyzed with a central bioinformatic pipeline to separate wet-lab and dry-lab variability. RESULTS: The mean positive percentage agreement (PPA) of somatic variant calling was 76%, while the positive predictive value (PPV) was 89%, relative to a consensus list of variants found by at least five centers. Variant filtering was identified as the main cause of divergent variant calls. Adjusting filter criteria and re-analysis increased the PPA to 88% for all variants and 97% for the clinically relevant variants. CNA calls were concordant for 82% of genomic regions. Homologous recombination deficiency (HRD), tumor mutational burden (TMB), and microsatellite instability (MSI) status were concordant for 94%, 93%, and 93% of calls, respectively. Variability of CNAs and complex biomarkers did not decrease considerably after harmonization of the bioinformatic processing and was hence attributed mainly to wet-lab differences. CONCLUSION: Continuous optimization of bioinformatic workflows and participation in round-robin tests are recommended.
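Once each center's calls are in a common representation, the agreement metrics reduce to set operations. A minimal sketch, assuming variants have already been normalized to (chrom, pos, ref, alt) tuples and applying the at-least-five-centers consensus rule described above; the helper names and variants are hypothetical.

```python
from collections import Counter

Variant = tuple[str, int, str, str]  # (chrom, pos, ref, alt), assumed normalized


def consensus(call_sets: list[set[Variant]], min_centers: int = 5) -> set[Variant]:
    """Variants reported by at least `min_centers` centers (each set = one center)."""
    counts = Counter(v for calls in call_sets for v in calls)
    return {v for v, c in counts.items() if c >= min_centers}


def ppa_ppv(calls: set[Variant], reference: set[Variant]) -> tuple[float, float]:
    """PPA = TP / (TP + FN) and PPV = TP / (TP + FP) against a reference set."""
    tp = len(calls & reference)
    fn = len(reference - calls)
    fp = len(calls - reference)
    ppa = tp / (tp + fn) if reference else float("nan")
    ppv = tp / (tp + fp) if calls else float("nan")
    return ppa, ppv


a, b, c = ("chr1", 100, "A", "T"), ("chr2", 200, "G", "C"), ("chr3", 300, "C", "A")
centers = [{a, b}] * 5 + [{a, c}]  # five centers agree, one differs
ref = consensus(centers)           # {a, b}
print(ppa_ppv({a, c}, ref))        # (0.5, 0.5)
```

Exact tuple matching only works after left-alignment and normalization of indels; without that step, identical variants written differently would be counted as disagreements.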
Subjects
Benchmarking , DNA Copy Number Variations , Exome Sequencing , Neoplasms , Precision Medicine , Humans , Exome Sequencing/methods , Germany , Precision Medicine/methods , Precision Medicine/standards , Neoplasms/genetics , Biomarkers, Tumor/genetics , Computational Biology/methods
ABSTRACT
NGS is increasingly used in precision medicine, but there is a need for an automated sequencing pipeline that can detect different types of variants (single-nucleotide variants, SNVs; copy-number variants, CNVs; structural variants, SVs) and does not rely on normal samples for germline comparison. To address this, we developed Onkopipe, a Snakemake-based pipeline that integrates quality control, read alignment, BAM pre-processing, and variant calling tools to detect SNVs, CNVs, and SVs in a unified VCF format without matched normal samples. Onkopipe is containerized and provides features such as reproducibility, parallelization, and easy customization, enabling the analysis of genomic data in precision medicine. Our validation and evaluation demonstrate high accuracy and concordance, making Onkopipe a valuable open-source resource for molecular tumor boards. Onkopipe is available at https://gitlab.gwdg.de/MedBioinf/mtb/onkopipe.
Subjects
DNA , Precision Medicine , Reproducibility of Results , Sequence Analysis, DNA , Base Sequence
ABSTRACT
Formation of the mammalian primitive streak appears to rely on cell proliferation only to a minor extent, but compensating cell movements have not yet been directly observed. This study analyses individual cell migration and proliferation simultaneously, using multiphoton and differential interference contrast time-lapse microscopy of late pregastrulation rabbit blastocysts. Epiblast cells in the posterior gastrula extension area accumulated medially and displayed complex planar movements, including U-turns and a novel type of processional cell movement. In the same area, metaphase plates tended to be aligned parallel to the anterior-posterior axis, and statistical analysis showed that the rotations of metaphase plates causing this preferred orientation were nearly complete 8 min before anaphase onset; in some cases, rotations were strikingly rapid, reaching up to 45° per min. The mammalian primitive streak appears to be formed initially, with its typically minimal anteroposterior elongation, by a combination of oriented cell divisions and dedicated planar cell movements.
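Because a metaphase plate is an undirected line, its orientation is only defined modulo 180°, which matters when rotation speeds are derived from successive time-lapse frames. A minimal sketch, assuming in-plane angles in degrees and a fixed frame interval in minutes (both hypothetical inputs, as are the helper names); rotations larger than 90° between two frames cannot be distinguished from smaller ones in the opposite direction.

```python
def axial_difference(a_deg: float, b_deg: float) -> float:
    """Smallest rotation (0 to 90 degrees) between two undirected orientations."""
    d = abs(a_deg - b_deg) % 180.0
    return min(d, 180.0 - d)


def rotation_rates(orientations_deg: list[float], interval_min: float) -> list[float]:
    """Apparent rotation speed (degrees per minute) between successive frames."""
    return [
        axial_difference(a, b) / interval_min
        for a, b in zip(orientations_deg, orientations_deg[1:])
    ]


# A plate measured at 170° and then at 10° has turned by 20°, not 160°.
print(rotation_rates([170.0, 10.0, 55.0], interval_min=1.0))  # [20.0, 45.0]
```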
Subjects
Cell Division/physiology , Cell Movement/physiology , Embryo, Mammalian/cytology , Embryo, Mammalian/physiology , Primitive Streak/cytology , Animals , Blastocyst/cytology , Blastocyst/physiology , Cell Polarity , Cell Proliferation , Cells, Cultured , Gastrulation , Humans , Microscopy, Fluorescence, Multiphoton , Microscopy, Interference , Primitive Streak/physiology , Rabbits , Time-Lapse Imaging/methods
ABSTRACT
Next-generation sequencing methods continuously provide clinicians and researchers in precision oncology with growing numbers of genomic variants found in cancer. However, manually interpreting the list of variants to identify reliable targets is an inefficient and cumbersome process that does not scale with the increasing number of cases. Support by computer systems is needed for the analysis of large-scale experiments and clinical studies to identify new targets and therapies, and user-friendly applications are needed in molecular tumor boards to support clinicians in their decision-making processes. The MTB-Report tool annotates, filters and sorts genetic variants with information from public databases, providing evidence on actionable variants in both scenarios. A web interface supports medical doctors in the tumor board, and a command-line mode allows batch processing of large datasets. The MTB-Report tool is available as an R implementation as well as a Docker image that runs out of the box. Moreover, containerization ensures a stable application that delivers reproducible results over time. A public version of the web interface is available at: http://mtb.bioinf.med.uni-goettingen.de/mtb-report.
Subjects
Neoplasms , Genetic Variation , High-Throughput Nucleotide Sequencing , Humans , Medical Oncology , Neoplasms/genetics , Precision Medicine
ABSTRACT
Untargeted metabolomics is a promising tool for identifying novel disease biomarkers and unraveling underlying pathomechanisms. Nuclear magnetic resonance (NMR) spectroscopy is particularly suited for large-scale untargeted metabolomics studies due to its high reproducibility and cost effectiveness. Here, one-dimensional (1D) ¹H NMR experiments offer good sensitivity at reasonable measurement times. Their subsequent data analysis requires sophisticated data preprocessing steps, including the extraction of NMR features corresponding to specific metabolites. We developed a novel 1D NMR feature extraction procedure, called Bucket Fuser (BF), which is based on a regularized regression framework with fused group LASSO terms. The performance of the BF procedure was demonstrated using three independent NMR datasets and was benchmarked against existing state-of-the-art NMR feature extraction methods. BF dynamically constructs NMR metabolite features, the widths of which can be adjusted via a regularization parameter. BF consistently improved metabolite signal extraction, as demonstrated by our correlation analyses with absolutely quantified metabolites. It also yielded a higher proportion of statistically significant metabolite features in our differential metabolite analyses. The BF algorithm is computationally efficient and can deal with small sample sizes. In summary, the Bucket Fuser algorithm, which is available as supplementary Python code, facilitates the fast and dynamic extraction of 1D NMR signals for the improved detection of metabolic biomarkers.
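Writing out a fused group LASSO objective makes the bucketing idea concrete. The following is a generic sketch, not the authors' BF formulation or code: it assumes each spectral point j carries a vector y_j of intensities across all samples, fitted by b_j, penalized by the group norm of differences between neighbouring points, and it uses hypothetical helper names. Larger values of `lam` fuse more neighbours and so yield wider buckets; solving the optimization itself is not shown.

```python
import math


def fused_group_lasso_objective(Y, B, lam):
    """0.5 * sum_j ||y_j - b_j||^2 + lam * sum_j ||b_{j+1} - b_j||_2.

    Y, B: lists over spectral points; each entry is a list of intensities for
    all samples (Y observed, B fitted). Because the norm spans all samples,
    the penalty favours differences that are zero for every sample at once,
    so bucket boundaries are shared across the data set.
    """
    fit = 0.5 * sum((y - b) ** 2 for yj, bj in zip(Y, B) for y, b in zip(yj, bj))
    penalty = sum(math.dist(B[j + 1], B[j]) for j in range(len(B) - 1))
    return fit + lam * penalty


def buckets_from_fit(B, tol=1e-9):
    """Group consecutive spectral points whose fitted vectors are fused."""
    if not B:
        return []
    edges = [0]
    for j in range(1, len(B)):
        if math.dist(B[j], B[j - 1]) > tol:
            edges.append(j)
    edges.append(len(B))
    return list(zip(edges[:-1], edges[1:]))


B = [[1.0, 2.0], [1.0, 2.0], [5.0, 6.0]]  # 3 spectral points, 2 samples
print(buckets_from_fit(B))  # [(0, 2), (2, 3)]
```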
ABSTRACT
Identifying essential genes on a genome scale is resource intensive and has been performed for only a few eukaryotes. For less studied organisms, essentiality might be predicted by gene homology. However, this approach cannot be applied to non-conserved genes. Additionally, essentiality information diverges depending on whether it is obtained from single cells or from whole multicellular organisms, particularly when it is derived from human cell line screens and human population studies. We employed machine learning across six model eukaryotes and 60,381 genes, using 41,635 features derived from the sequence, gene function information and network topology. Within a leave-one-organism-out cross-validation, the classifiers showed high generalizability, with an average accuracy close to 80% in the left-out species. As a case study, we applied the method to Tribolium castaneum and Bombyx mori and validated the predictions experimentally, yielding similar performances. Finally, using the classifier based on the studied model organisms enabled linking the essentiality information of human cell line screens and population studies.
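Leave-one-organism-out cross-validation is leave-one-group-out splitting with organisms as the groups, so every fold tests on a species the model has never seen. A minimal standard-library sketch; `fit` and `predict` are hypothetical placeholders for any classifier, the organism labels are toy values, and scikit-learn's `LeaveOneGroupOut` produces the same kind of splits.

```python
from typing import Callable, Iterator, Sequence


def leave_one_group_out(groups: Sequence[str]) -> Iterator[tuple[str, list[int], list[int]]]:
    """Yield (held_out_group, train_indices, test_indices) for each group."""
    for held_out in sorted(set(groups)):
        train = [i for i, g in enumerate(groups) if g != held_out]
        test = [i for i, g in enumerate(groups) if g == held_out]
        yield held_out, train, test


def mean_held_out_accuracy(X, y, groups, fit: Callable, predict: Callable) -> float:
    """Average accuracy over folds, each fold testing on one unseen organism."""
    scores = []
    for _, train, test in leave_one_group_out(groups):
        model = fit([X[i] for i in train], [y[i] for i in train])
        preds = predict(model, [X[i] for i in test])
        scores.append(sum(p == y[i] for p, i in zip(preds, test)) / len(test))
    return sum(scores) / len(scores)


def majority(Xs, ys):
    """Toy stand-in for a real classifier: remember the most common label."""
    return max(set(ys), key=ys.count)


def constant(model, Xs):
    return [model] * len(Xs)


groups = ["org_a", "org_b", "org_b", "org_c", "org_c", "org_c"]
X = [[0.0]] * 6
y = [1, 1, 1, 1, 1, 0]
print(round(mean_held_out_accuracy(X, y, groups, majority, constant), 3))  # 0.889
```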
ABSTRACT
Precision oncology, which utilizes molecular biomarkers for targeted therapies, is one of the hopes for treating cancer. The availability of patient-specific molecular profiling through next-generation sequencing, though, increases the amount of available data per patient to an extent that computational support is required to identify potential driver alterations for targeted therapies and rational decision-making in molecular tumor boards (MTBs). For some genetic variants, evidence-based drug recommendations are available in public databases, but for the majority, the variants of unknown significance (VUS), this clinical information is missing. Additionally, for most of these variants no information about the functional impact on the protein is accessible. To acquire maximal functional evidence for VUS, the VUS-Predict pipeline collects estimations of the effect of a VUS by integrating multiple pre-existing tools. These tools implement different approaches for their predictions, which our newly developed tool summarizes in a common score and a classification into neutral or deleterious variants. The primary tools were chosen based on their sensitivity and specificity on well-known variants of the transcription factor TP53. The resulting negative and positive predictive values are used to calibrate the VUS-Predict pipeline. Further, the pipeline is evaluated using data from public cancer databases and cases of the MTB in Göttingen, in both cases also in comparison with the ensemble method REVEL. The results show that VUS-Predict has clear advantages in a clinical setting due to its clear and traceable predictions. In particular, VUS-Predict outperforms REVEL in the real-life setting of an MTB. Likewise, an evaluation on variants of public cancer databases confirms the good results of VUS-Predict and shows the need for a reliable gold standard and unambiguous results of the tools under test.
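The calibration step can be sketched from confusion counts on a benchmark set such as known TP53 variants. Everything named here is hypothetical: the counts, the `ToolCalibration` record and especially `combined_score`, whose weighting of deleterious calls by PPV and neutral calls by NPV is an illustrative assumption and not the scoring scheme of VUS-Predict, which the abstract does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCalibration:
    """Confusion counts of one predictor on variants with known effect."""
    tp: int  # deleterious variants called deleterious
    fp: int  # neutral variants called deleterious
    tn: int  # neutral variants called neutral
    fn: int  # deleterious variants called neutral

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)


def combined_score(calls: dict[str, bool], calibration: dict[str, ToolCalibration]) -> float:
    """Hypothetical score in [-1, 1]: each deleterious call votes +PPV and each
    neutral call votes -NPV; the sum is normalized by the total weight."""
    total = weight_sum = 0.0
    for tool, deleterious in calls.items():
        cal = calibration[tool]
        weight = cal.ppv if deleterious else cal.npv
        total += weight if deleterious else -weight
        weight_sum += weight
    return total / weight_sum if weight_sum else 0.0


calibration = {
    "tool_a": ToolCalibration(tp=8, fp=2, tn=9, fn=1),  # PPV 0.8, NPV 0.9
    "tool_b": ToolCalibration(tp=6, fp=4, tn=7, fn=3),  # PPV 0.6, NPV 0.7
}
score = combined_score({"tool_a": True, "tool_b": False}, calibration)
print("deleterious" if score > 0 else "neutral", round(score, 3))  # deleterious 0.067
```

Weighting each vote by the tool's predictive value for the call it actually made keeps the result traceable: every contribution to the score can be attributed to one tool and one benchmark statistic.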