ABSTRACT
Genomics studies routinely confront researchers with long lists of tumor alterations detected in patients. Such lists are difficult to interpret since only a minority of the alterations are relevant biomarkers for diagnosis and for designing therapeutic strategies. PanDrugs is a methodology that facilitates the interpretation of tumor molecular alterations and guides the selection of personalized treatments. To do so, PanDrugs scores gene actionability and drug feasibility to provide a prioritized evidence-based list of drugs. Here, we introduce PanDrugs2, a major upgrade of PanDrugs that, in addition to somatic variant analysis, supports a new integrated multi-omics analysis which simultaneously combines somatic and germline variants, copy number variation and gene expression data. Moreover, PanDrugs2 now considers cancer genetic dependencies to extend tumor vulnerabilities, providing therapeutic options for untargetable genes. Importantly, a novel intuitive report to support clinical decision-making is generated. The PanDrugs database has been updated, integrating 23 primary sources that support >74K drug-gene associations obtained from 4642 genes and 14 659 unique compounds. The database has also been reimplemented to allow semi-automatic updates to facilitate maintenance and release of future versions. PanDrugs2 does not require login and is freely available at https://www.pandrugs.org/.
Subjects
Multiomics , Neoplasms , Humans , DNA Copy Number Variations , Genomics/methods , Neoplasms/drug therapy , Neoplasms/genetics , Neoplasms/pathology , Precision Medicine/methods
ABSTRACT
BACKGROUND: The initial version of SEDA assists life science researchers without programming skills with the preparation of DNA and protein sequence FASTA files for multiple bioinformatics applications. However, the initial version of SEDA lacks a command-line interface for more advanced users and does not allow the creation of automated analysis pipelines. RESULTS: The present paper discusses the updates of the new SEDA release, including the addition of a complete command-line interface, new functionalities like gene annotation, a framework for automated pipelines, and improved integration in Linux environments. CONCLUSION: SEDA is an open-source Java application and can be installed using the different distributions available ( https://www.sing-group.org/seda/download.html ) as well as through a Docker image ( https://hub.docker.com/r/pegi3s/seda ). It is released under a GPL-3.0 license, and its source code is publicly accessible on GitHub ( https://github.com/sing-group/seda ). The software version at the time of submission is archived at Zenodo (version v1.6.0, http://doi.org/10.5281/zenodo.10201605 ).
Subjects
Computational Biology , Software , Computational Biology/methods , Data Analysis
ABSTRACT
SARS-CoV-2 amino acid variants that contribute to an increased transmissibility or to host immune system escape are likely to increase in frequency due to positive selection and may be identified using different methods, such as codeML, FEL, FUBAR, and MEME. Nevertheless, when using different methods, the results do not always agree. The sampling scheme used in different studies may partially explain the differences that are found, but there is also the possibility that some of the identified positively selected amino acid sites are false positives. This is especially important in the context of very large-scale projects where hundreds of analyses have been performed for the same protein-coding gene. To account for these issues, in this work, we have identified positively selected amino acid sites in SARS-CoV-2 and 15 other coronavirus species, using both codeML and FUBAR, and compared the location of such sites in the different species. Moreover, we also compared our results to those that are available in the COV2Var database and the frequency of the 10 most frequent variants and predicted protein location to identify those sites that are supported by multiple lines of evidence. Amino acid changes observed at these sites should always be of concern. The information reported for SARS-CoV-2 can also be used to identify variants of concern in other coronaviruses.
Subjects
COVID-19 , SARS-CoV-2 , Humans , SARS-CoV-2/genetics , Amino Acids/genetics
ABSTRACT
MOTIVATION: Drug immunomodulation modifies the response of the immune system and can be therapeutically exploited in pathologies such as cancer and autoimmune diseases. RESULTS: DREIMT is a new hypothesis-generation web tool, which performs drug prioritization analysis for immunomodulation. DREIMT provides significant immunomodulatory drugs targeting up to 70 immune cell subtypes through a curated database that integrates 4960 drug profiles and ~2600 immune gene expression signatures. The tool also suggests potential immunomodulatory drugs targeting user-supplied gene expression signatures. Final output includes drug-signature association scores, FDRs and downloadable plots and results tables. AVAILABILITY AND IMPLEMENTATION: http://www.dreimt.org. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Drug Repositioning , Transcriptome , Databases, Factual , Databases, Pharmaceutical , Immunomodulation
ABSTRACT
BACKGROUND: L-ascorbate (Vitamin C) is an important antioxidant and co-factor in eukaryotic cells, and in mammals it is indispensable for brain development and cognitive function. Vertebrates usually become L-ascorbate auxotrophs when the last enzyme of the synthetic pathway, an L-gulonolactone oxidase (GULO), is lost. Since Protostomes were until recently thought not to have a GULO gene, they were considered to be auxotrophs for Vitamin C. RESULTS: By performing phylogenetic analyses with tens of non-Bilateria and Protostomian genomes, it is shown that a GULO gene is present in the non-Bilateria Placozoa, Myxozoa (here reported for the first time) and Anthozoa groups, and in Protostomians, in the Araneae family, the Gastropoda class, the Acari subclass (here reported for the first time), and the Priapulida, Annelida (here reported for the first time) and Brachiopoda phyla lineages. GULO is an old gene that predates the separation of Animals and Fungi, although it could be much older. We also show that within Protostomes, GULO has been lost multiple times in large taxonomic groups, namely the Pancrustacea, Nematoda, Platyhelminthes and Bivalvia groups, a pattern similar to that reported for Vertebrate species. Nevertheless, we show that Drosophila melanogaster seems to be capable of synthesizing L-ascorbate, likely through an alternative pathway, as recently reported for Caenorhabditis elegans. CONCLUSIONS: Non-Bilaterians and Protostomians seem to be able to synthesize Vitamin C either through the conventional animal pathway or an alternative pathway, but in this animal group, not being able to synthesize L-ascorbate seems to be the exception rather than the rule.
Subjects
Ascorbic Acid/metabolism , Eukaryota/enzymology , Eukaryota/genetics , Evolution, Molecular , L-Gulonolactone Oxidase/genetics , Animals , Drosophila melanogaster/genetics , Eukaryota/classification , Eukaryota/metabolism , Genome , L-Gulonolactone Oxidase/chemistry , L-Gulonolactone Oxidase/metabolism , Models, Molecular , Phylogeny , Vertebrates/classification , Vertebrates/genetics
ABSTRACT
Web services are the de facto standard in biomedical data integration. However, there are data integration scenarios that cannot be fully covered by Web services. A number of Web databases and tools do not support Web services, and existing Web services do not cover all possible user data demands. As a consequence, Web data scraping, one of the oldest techniques for extracting Web contents, is still in a position to offer a valid and valuable service to a wide range of bioinformatics applications, ranging from simple extraction robots to online meta-servers. This article reviews existing scraping frameworks and tools, identifying their strengths and limitations in terms of extraction capabilities. The main focus is set on showing how straightforward it is today to set up a data scraping pipeline, with minimal programming effort, and answer a number of practical needs. For exemplification purposes, we introduce a biomedical data extraction scenario where the desired data sources, well-known in clinical microbiology and similar domains, do not offer programmatic interfaces yet. Moreover, we describe the operation of WhichGenes and PathJam, two bioinformatics meta-servers that use scraping as a means to cope with gene set enrichment analysis.
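As a rough illustration of how little code a basic extraction step can need, the following minimal sketch pulls the rows of an HTML table into Python lists using only the standard library. It is not taken from the reviewed article or from WhichGenes or PathJam; the class name, the function name, and the sample markup are assumptions made for this example.

```python
from html.parser import HTMLParser


class TableCellExtractor(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []        # finished rows, each a list of cell strings
        self._row = None      # row currently being built, or None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only text that appears inside a table cell is kept.
        if self._in_cell:
            self._row[-1] += data.strip()


def extract_rows(html):
    """Return the table rows found in an HTML string."""
    parser = TableCellExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical page fragment standing in for a source without an API.
SAMPLE_HTML = (
    "<table>"
    "<tr><th>Organism</th><th>Drug</th></tr>"
    "<tr><td>E. coli</td><td>ampicillin</td></tr>"
    "</table>"
)

if __name__ == "__main__":
    for row in extract_rows(SAMPLE_HTML):
        print(row)
```

In a real pipeline the HTML string would come from an HTTP request to the target site, and the extraction rules would have to be adapted whenever the page layout changes.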
Subjects
Internet , Database Management Systems , User-Computer Interface
ABSTRACT
BACKGROUND: The diagnosis and prognosis of several diseases can be shortened through the use of different large-scale genome experiments. In this context, microarrays can generate expression data for a huge set of genes. However, to obtain solid statistical evidence from the resulting data, it is necessary to train and to validate many classification techniques in order to find the best discriminative method. This is a time-consuming process that normally depends on intricate statistical tools. RESULTS: geneCommittee is a web-based interactive tool for routinely evaluating the discriminative classification power of custom hypotheses in the form of biologically relevant gene sets. While the user can work with different gene set collections and several microarray data files to configure specific classification experiments, the tool is able to run several tests in parallel. Provided with a straightforward and intuitive interface, geneCommittee is able to render valuable information for diagnostic analyses and clinical management decisions based on systematically evaluating custom hypotheses over different data sets using complementary classifiers, a key aspect in clinical research. CONCLUSIONS: geneCommittee allows the enrichment of raw microarray data with gene functional annotations, producing integrated datasets that simplify the construction of better discriminative hypotheses, and allows the creation of a set of complementary classifiers. The trained committees can then be used for clinical research and diagnosis. Full documentation including common use cases and guided analysis workflows is freely available at http://sing.ei.uvigo.es/GC/.
Subjects
Computational Biology/methods , Databases, Genetic , Gene Expression Profiling/methods , Internet , Software , Disease/genetics , Humans , User-Computer Interface
ABSTRACT
The integration of ultrasound (US)-assisted sample processing on-chip in a lab-on-a-valve (LOV) format for automated high-throughput shotgun proteomic assays is herein presented for the first time. The proof of concept of this system was demonstrated with the analysis of three proteins and sera from patients with lymphoma or myeloma.
Subjects
Biomarkers, Tumor/analysis , Mass Spectrometry/methods , Microchip Analytical Procedures/methods , Microfluidic Analytical Techniques/methods , Protein Denaturation , Humans
ABSTRACT
The vast amount of genome sequence data that is available, and that is predicted to drastically increase in the near future, can only be efficiently dealt with by building automated pipelines. Indeed, the Earth Biogenome Project will produce high-quality reference genome sequences for all 1.8 million named living eukaryote species, providing unprecedented insight into the evolution of genes and gene families, and thus on biological issues. Here, new modules for gene annotation, further BLAST search algorithms, further multiple sequence alignment methods, the addition of reference sequences, further tree rooting methods, the estimation of rates of synonymous and nonsynonymous substitutions, and the identification of positively selected amino acid sites have been added to auto-phylo (version 2), a recently developed software to address biological problems using phylogenetic inferences. Additionally, we present auto-phylo-pipeliner, a graphical user interface application that further facilitates the creation and running of auto-phylo pipelines. Inferences on S-RNase specificity are critical both for cross-based breeding and for the establishment of pollination requirements. Therefore, as a test case, we develop an auto-phylo pipeline to identify amino acid sites under positive selection, which are, in principle, those determining S-RNase specificity, starting from both non-annotated Prunus genomes and sequences available in public databases.
Subjects
Phylogeny , Software , Algorithms , Sequence Alignment , Selection, Genetic , Amino Acids/genetics , Amino Acids/chemistry
ABSTRACT
Background: PolyDeep is a computer-aided detection and classification (CADe/x) system trained to detect and classify polyps. During colonoscopy, CADe/x systems help endoscopists to predict the histology of colonic lesions. Objective: To compare the diagnostic performance of PolyDeep and expert endoscopists for the optical diagnosis of colorectal polyps on still images. Methods: PolyDeep Image Classification (PIC) is an in vitro diagnostic test study. The PIC database contains NBI images of 491 colorectal polyps with histological diagnosis. We evaluated the diagnostic performance of PolyDeep and four expert endoscopists for neoplasia (adenoma, sessile serrated lesion, traditional serrated adenoma) and adenoma characterization and compared them with the McNemar test. Receiver operating characteristic curves were constructed to assess the overall discriminatory ability, comparing the area under the curve of endoscopists and PolyDeep with the chi-square homogeneity areas test. Results: The diagnostic performance of the endoscopists and PolyDeep in the characterization of neoplasia is similar in terms of sensitivity (PolyDeep: 89.05%; E1: 91.23%, p=0.5; E2: 96.11%, p<0.001; E3: 86.65%, p=0.3; E4: 91.26%, p=0.3) and specificity (PolyDeep: 35.53%; E1: 33.80%, p=0.8; E2: 34.72%, p=1; E3: 39.24%, p=0.8; E4: 46.84%, p=0.2). The overall discriminative ability also showed no statistically significant differences (PolyDeep: 0.623; E1: 0.625, p=0.8; E2: 0.654, p=0.2; E3: 0.629, p=0.9; E4: 0.690, p=0.09). In the optical diagnosis of adenomatous polyps, we found that PolyDeep had a significantly higher sensitivity and a significantly lower specificity. The overall discriminative ability of adenomatous lesions by expert endoscopists is significantly higher than PolyDeep (PolyDeep: 0.582; E1: 0.685, p < 0.001; E2: 0.677, p < 0.0001; E3: 0.658, p < 0.01; E4: 0.694, p < 0.0001).
Conclusion: PolyDeep and endoscopists have similar diagnostic performance in the optical diagnosis of neoplastic lesions. However, endoscopists have a better global discriminatory ability than PolyDeep in the optical diagnosis of adenomatous polyps.
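The paired comparison mentioned in the Methods can be sketched as follows. The abstract does not state which variant of the McNemar test was used, so this example assumes the exact (binomial) two-sided form; the function names and the toy labels are hypothetical and are not data from the study.

```python
from math import comb


def discordant_counts(truth, reader_a, reader_b):
    """Count images where exactly one of two readers matches the reference.

    Returns (b, c): b = only reader A correct, c = only reader B correct.
    """
    b = c = 0
    for reference, pred_a, pred_b in zip(truth, reader_a, reader_b):
        a_ok = pred_a == reference
        b_ok = pred_b == reference
        if a_ok and not b_ok:
            b += 1
        elif b_ok and not a_ok:
            c += 1
    return b, c


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the two discordant counts.

    Under the null hypothesis the discordant pairs split 50/50, so the
    smaller count follows a Binomial(b + c, 0.5) distribution.
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # Toy labels (1 = neoplastic, 0 = non-neoplastic); illustrative only.
    truth = [1, 1, 0, 0]
    system = [1, 0, 0, 1]
    expert = [1, 1, 1, 0]
    b, c = discordant_counts(truth, system, expert)
    print(b, c, mcnemar_exact(b, c))
```

Only the discordant images carry information in this test, which is why two readers with similar overall sensitivity can still differ significantly, or the reverse.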
ABSTRACT
S-RNase-based gametophytic self-incompatibility evolved once before the split of the Asteridae and Rosidae. In Prunus (tribe Amygdaloideae of Rosaceae), the self-incompatibility S-pollen is a single F-box gene that presents the expected evolutionary signatures. In Malus and Pyrus (subtribe Pyrinae of Rosaceae), however, clusters of F-box genes (called SFBBs) have been described that are expressed in pollen only and are linked to the S-RNase gene. Although polymorphic, SFBB genes present levels of diversity lower than those of the S-RNase gene. They have been suggested as putative S-pollen genes, in a system of non-self recognition by multiple factors. Subsets of allelic products of the different SFBB genes interact with non-self S-RNases, marking them for degradation, and allowing compatible pollinations. This study performed a detailed characterization of SFBB genes in Sorbus aucuparia (Pyrinae) to address three predictions of the non-self recognition by multiple factors model. As predicted, the number of SFBB genes was large to account for the many S-RNase specificities. Secondly, like the S-RNase gene, the SFBB genes were old. Thirdly, amino acids under positive selection (those that could be involved in specificity determination) were identified when intra-haplotype SFBB genes were analysed using codon models. Overall, the findings reported here support the non-self recognition by multiple factors model.
Subjects
Genes, Plant/genetics , Pollen/genetics , Self-Incompatibility in Flowering Plants/genetics , Sorbus/physiology , Base Sequence , Biological Evolution , Genes, Plant/physiology , Haplotypes/genetics , Models, Genetic , Molecular Sequence Data , Phylogeny , Pollen/physiology , Self-Incompatibility in Flowering Plants/physiology , Sorbus/genetics
ABSTRACT
In this work we have critically revised and updated the literature dealing with wine quality control based on protein or peptide mass spectrometry-based fingerprinting. A number of pitfalls in the experimental design of most work dealing with this subject are highlighted along with recommendations on how to circumvent them. As a general trend, the conclusions reported to date in the literature of the topic are inconclusive mainly due to the (i) low number of representative samples, (ii) lack of basic analytical concepts, and (iii) lack of adequate statistical and software tools. In addition, we have critically revised the sample treatments commonly used to separate proteins from wines, emphasizing that the majority of literature is devoted to white wines, probably because of difficulties in isolating the protein content in red wines.
Subjects
Peptide Mapping/methods , Proteins/analysis , Spectrometry, Mass, Electrospray Ionization/methods , Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization/methods , Wine/analysis , Proteins/chemistry , Quality Control
ABSTRACT
Next-generation sequencing (NGS) technologies are making sequence data available on an unprecedented scale. In this context, new catalogs of single nucleotide polymorphisms and mutations generated by resequencing studies are usually stored in genome position files (e.g. Variant Call Format, SAMTools pileup, BED, GFF) comprising large lists of genomic positions, which are difficult for researchers to handle. Here, we present PileLineGUI, a novel desktop application primarily designed for manipulating, browsing and analysing genome position files (GPF), with specific support for somatic mutation finding studies. The developed tool also integrates a new genome browser module specially designed for inspecting GPFs. PileLineGUI is free, multiplatform and designed to be intuitively used by biomedical researchers. PileLineGUI is available at: http://sing.ei.uvigo.es/pileline/pilelinegui.html.
Subjects
Genome , High-Throughput Nucleotide Sequencing , Sequence Analysis, DNA , Software , Mutation , Polymorphism, Single Nucleotide
ABSTRACT
Deep learning object-detection models are being successfully applied to develop computer-aided diagnosis systems for aiding polyp detection during colonoscopies. Here, we show the need to include negative samples for both (i) reducing false positives during the polyp-finding phase, by including images with artifacts that may confuse the detection models (e.g., medical instruments, water jets, feces, blood, excessive proximity of the camera to the colon wall, blurred images, etc.) that are usually not included in model development datasets, and (ii) correctly estimating a more realistic performance of the models. By retraining our previously developed YOLOv3-based detection model with a dataset that includes 15% additional not-polyp images with a variety of artifacts, we were able to generally improve its F1 performance in our internal test datasets (from an average F1 of 0.869 to 0.893), which now include such images, as well as in four public datasets that include not-polyp images (from an average F1 of 0.695 to 0.722).
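To make the role of negative samples in the reported metric concrete, the following minimal sketch computes F1 from detection counts. The function name and the counts are illustrative assumptions, not figures from the study: the point is only that every detection fired on a not-polyp image counts as a false positive and lowers F1, so a test set without such images cannot expose that failure mode.

```python
def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


if __name__ == "__main__":
    # Hypothetical counts on a test set that contains polyp images only.
    polyp_only = f1_score(tp=80, fp=10, fn=10)

    # Same detections, plus hypothetical false alarms raised on added
    # not-polyp images (instruments, water jets, blurred frames, ...).
    with_negatives = f1_score(tp=80, fp=10 + 15, fn=10)

    print(round(polyp_only, 3), round(with_negatives, 3))
```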
ABSTRACT
EvoPPI (http://evoppi.i3s.up.pt), a meta-database for protein-protein interactions (PPI), has been upgraded (EvoPPI3) to accept new types of data, namely, PPI from patients, cell lines, and animal models, as well as data from gene modifier experiments, for nine neurodegenerative polyglutamine (polyQ) diseases caused by an abnormal expansion of the polyQ tract. The integration of the different types of data allows users to easily compare them, as here shown for Ataxin-1, the polyQ protein involved in spinocerebellar ataxia type 1 (SCA1) disease. Using all available datasets and the data here obtained for Drosophila melanogaster wt and exp Ataxin-1 mutants (also available at EvoPPI3), we show that, in humans, the Ataxin-1 network is much larger than previously thought (380 interactors), with at least 909 interactors. The functional profile of the newly identified interactors is similar to that of the interactors already reported in the main PPI databases. Sixteen of the 909 interactors are putative novel SCA1 therapeutic targets, and all but one are already being studied in the context of this disease. The 16 proteins are mainly involved in binding and catalytic activity (mainly kinase activity), functional features already thought to be important in the SCA1 disease.
Subjects
Drosophila melanogaster , Spinocerebellar Ataxias , Animals , Humans , Ataxin-1/genetics , Ataxin-1/metabolism , Drosophila melanogaster/genetics , Spinocerebellar Ataxias/genetics , Spinocerebellar Ataxias/metabolism
ABSTRACT
ALTER is an open web-based tool to transform between different multiple sequence alignment formats. The originality of ALTER lies in the fact that it focuses on the specifications of mainstream alignment and analysis programs rather than on the conversion among more or less specific formats. In addition, ALTER is capable of identifying and removing identical sequences during the transformation process. Besides its user-friendly environment, ALTER allows access to its functionalities in a programmatic way through a Representational State Transfer web service. ALTER's front-end and its API are freely available at http://sing.ei.uvigo.es/ALTER/ and http://sing.ei.uvigo.es/ALTER/api/, respectively.
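The idea of removing identical sequences can be sketched in a few lines. This is not ALTER's implementation and does not use its web service; it is a minimal standalone example that assumes FASTA-formatted input, treats sequences as identical when they match case-insensitively, and keeps the first record of each group. The function names are hypothetical.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (name, sequence) tuples."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def collapse_identical(records):
    """Keep only the first record for each distinct aligned sequence."""
    first_seen = {}
    for name, sequence in records:
        key = sequence.upper()  # assumption: comparison ignores letter case
        if key not in first_seen:
            first_seen[key] = (name, sequence)
    # Dictionaries preserve insertion order, so input order is kept.
    return list(first_seen.values())


if __name__ == "__main__":
    alignment = ">a\nAC-GT\n>b\nAC-GT\n>c\nACGGT\n"
    for name, sequence in collapse_identical(parse_fasta(alignment)):
        print(name, sequence)
```

Collapsing duplicates before phylogenetic or population analyses reduces run time, but the mapping from dropped names to the retained representative usually needs to be kept if haplotype frequencies matter downstream.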
Subjects
Sequence Alignment/methods , Sequence Analysis, DNA , Sequence Analysis, Protein , Software , Internet
ABSTRACT
Colorectal cancer is one of the most frequent malignancies. Colonoscopy is the de facto standard for precancerous lesion detection in the colon, i.e., polyps, during screening studies or after facultative recommendation. In recent years, artificial intelligence, and especially deep learning techniques such as convolutional neural networks, have been applied to polyp detection and localization in order to develop real-time CADe systems. However, the performance of machine learning models is very sensitive to changes in the nature of the testing instances, especially when trying to reproduce results on datasets entirely different from those used for model development, i.e., inter-dataset testing. Here, we report the results of testing our previously published polyp detection model using ten public colonoscopy image datasets and analyze them in the context of the results of 20 other state-of-the-art publications using the same datasets. The F1-score of our recently published model was 0.88 when evaluated on a private test partition, i.e., intra-dataset testing, but it decayed, on average, by 13.65% when tested on ten public datasets. In the published research, the average intra-dataset F1-score is 0.91, and we observed that it also decays in the inter-dataset setting to an average F1-score of 0.83.
ABSTRACT
SEDA (SEquence DAtaset builder) is a multiplatform desktop application for the manipulation of FASTA files containing DNA or protein sequences. The convenient graphical user interface gives access to a collection of simple (filtering, sorting, or file reformatting, among others) and advanced (BLAST searching, protein domain annotation, gene annotation, and sequence alignment) utilities not present in similar applications, which eases the work of life science researchers working with DNA and/or protein sequences, especially those who have no programming skills. This paper presents general guidelines on how to build efficient data handling protocols using SEDA, as well as practical examples on how to prepare high-quality datasets for single gene phylogenetic studies, the characterization of protein families, or phylogenomic studies. The user-friendliness of SEDA also relies on two important features: (i) the availability of easy-to-install distributable versions and installers of SEDA, including a Docker image for Linux, and (ii) the ease with which users can manage large datasets. SEDA is open-source, released under the GNU General Public License v3.0, and publicly available on GitHub (https://github.com/sing-group/seda). SEDA installers and documentation are available at https://www.sing-group.org/seda/.
Subjects
Proteins , Software , Amino Acid Sequence , Phylogeny , Sequence Alignment
ABSTRACT
The use of an ultrasonic probe, in conjunction with immobilized trypsin, has been explored in this work for potential enhancement of protein digestion. Several solid supports commonly used to immobilize trypsin were subjected to different ultrasonication amplitudes and times in order to investigate their mechanical resistance to ultrasonic energy when provided by the ultrasonic probe. Glass beads and magnetic particles were found to remain intact in most conditions studied. It was found that immobilized trypsin cannot be reused after ultrasonication since the enzymatic activity was greatly diminished. For comparative purposes, vortex shaking was also explored for protein cleavage. Four standard proteins (bovine serum albumin, α-lactalbumin, carbonic anhydrase and ovalbumin) were successfully identified using peptide mass fingerprinting or peptide fragment fingerprinting. In addition, the performance of the classical protein cleavage (overnight, 12 h) and the ultrasonic methods was found to be similar when the digestion of a complex proteome, human plasma, was assessed through 18-O quantification. The digestion yields found were 90-117% for the ultrasonic and 5-21% for the vortex when those methods were compared with the classical overnight digestion.
Subjects
Enzymes, Immobilized/metabolism , Proteins/analysis , Proteins/metabolism , Trypsin/metabolism , Ultrasonics/instrumentation , Animals , Blood Proteins/analysis , Blood Proteins/metabolism , Cattle , Equipment Design , Humans , Lactalbumin/analysis , Lactalbumin/metabolism , Peptide Fragments/analysis , Peptide Fragments/metabolism , Serum Albumin, Bovine/analysis , Serum Albumin, Bovine/metabolism , Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization/methods
ABSTRACT
BACKGROUND: Genomic position (GP) files currently used in next-generation sequencing (NGS) studies are always difficult to manipulate due to their huge size and the lack of appropriate tools to properly manage them. The structure of these flat files is based on representing one line per position that has been covered by at least one aligned read, imposing significant restrictions from a computational performance perspective. RESULTS: PileLine implements a flexible command-line toolkit providing specific support to the management, filtering, comparison and annotation of GP files produced by NGS experiments. PileLine tools are coded in Java and run on both UNIX (Linux, Mac OS) and Windows platforms. The set of tools comprising PileLine are designed to be memory efficient by performing fast seek on-disk operations over sorted GP files. CONCLUSIONS: Our novel toolbox has been extensively tested taking into consideration performance issues. It is publicly available at http://sourceforge.net/projects/pilelinetools under the GNU LGPL license. Full documentation including common use cases and guided analysis workflows is available at http://sing.ei.uvigo.es/pileline.
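The general idea behind seeking in a sorted position file can be sketched as a binary search over byte offsets, so that only a handful of lines are read regardless of file size. PileLine itself is written in Java and its internals are not described in the abstract, so what follows is only an illustrative Python sketch under stated assumptions: lines are tab-separated with chromosome and position in the first two columns, and the file is sorted by chromosome name (bytewise) and then by numeric position. All function names are hypothetical.

```python
import io


def parse_key(line):
    """Return (chromosome, position) from a tab-separated line of bytes."""
    chrom, pos = line.split(b"\t")[:2]
    return chrom, int(pos)


def line_after(handle, offset):
    """Return the first complete line that starts after the given offset.

    For offset 0 the first line of the file is returned. An empty bytes
    object means the end of the file was reached.
    """
    handle.seek(offset)
    if offset > 0:
        handle.readline()  # discard the (possibly partial) current line
    return handle.readline()


def find_position(handle, size, chrom, pos):
    """Binary-search a sorted position file for one (chrom, pos) entry."""
    target = (chrom, pos)
    lo, hi = 0, size
    while lo < hi:
        mid = (lo + hi) // 2
        line = line_after(handle, mid)
        if not line or parse_key(line) >= target:
            hi = mid
        else:
            lo = mid + 1
    line = line_after(handle, lo)
    if line and parse_key(line) == target:
        return line.rstrip(b"\n").decode()
    return None


# Tiny in-memory stand-in for a sorted on-disk file (hypothetical content).
SAMPLE = b"chr1\t100\tA\nchr1\t250\tC\nchr2\t50\tG\nchr2\t900\tT\n"

if __name__ == "__main__":
    handle = io.BytesIO(SAMPLE)
    print(find_position(handle, len(SAMPLE), b"chr2", 50))
```

With a real file, the handle would be opened in binary mode and `size` obtained from the file system; the number of seeks grows only logarithmically with file size, which is what keeps memory use low.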