Results 1 - 20 of 47
1.
Front Oncol ; 14: 1393815, 2024.
Article in English | MEDLINE | ID: mdl-38846970

ABSTRACT

Background: PolyDeep is a computer-aided detection and classification (CADe/x) system trained to detect and classify polyps. During colonoscopy, CADe/x systems help endoscopists to predict the histology of colonic lesions. Objective: To compare the diagnostic performance of PolyDeep and expert endoscopists for the optical diagnosis of colorectal polyps on still images. Methods: PolyDeep Image Classification (PIC) is an in vitro diagnostic test study. The PIC database contains NBI images of 491 colorectal polyps with histological diagnosis. We evaluated the diagnostic performance of PolyDeep and four expert endoscopists for neoplasia (adenoma, sessile serrated lesion, traditional serrated adenoma) and adenoma characterization and compared them with the McNemar test. Receiver operating characteristic curves were constructed to assess the overall discriminatory ability, and the areas under the curve of the endoscopists and PolyDeep were compared with the chi-square test for homogeneity of areas. Results: The diagnostic performance of the endoscopists and PolyDeep in the characterization of neoplasia is similar in terms of sensitivity (PolyDeep: 89.05%; E1: 91.23%, p=0.5; E2: 96.11%, p<0.001; E3: 86.65%, p=0.3; E4: 91.26%, p=0.3) and specificity (PolyDeep: 35.53%; E1: 33.80%, p=0.8; E2: 34.72%, p=1; E3: 39.24%, p=0.8; E4: 46.84%, p=0.2). The overall discriminative ability also showed no statistically significant differences (PolyDeep: 0.623; E1: 0.625, p=0.8; E2: 0.654, p=0.2; E3: 0.629, p=0.9; E4: 0.690, p=0.09). In the optical diagnosis of adenomatous polyps, we found that PolyDeep had a significantly higher sensitivity and a significantly lower specificity. The overall ability of the expert endoscopists to discriminate adenomatous lesions was significantly higher than that of PolyDeep (PolyDeep: 0.582; E1: 0.685, p < 0.001; E2: 0.677, p < 0.0001; E3: 0.658, p < 0.01; E4: 0.694, p < 0.0001).
Conclusion: PolyDeep and endoscopists have similar diagnostic performance in the optical diagnosis of neoplastic lesions. However, endoscopists have a better global discriminatory ability than PolyDeep in the optical diagnosis of adenomatous polyps.
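The PolyDeep comparison rests on the McNemar test, which only uses the paired images on which the two readers disagree. The Python sketch below shows the exact (binomial) form of that test under stated assumptions: the abstract does not say whether the exact or the chi-square version was used, and the function names and toy data are hypothetical. For a sensitivity comparison, the paired correctness flags would be restricted to histologically neoplastic polyps; for specificity, to non-neoplastic ones.

```python
from math import comb
from typing import Sequence, Tuple


def discordant_counts(correct_a: Sequence[bool], correct_b: Sequence[bool]) -> Tuple[int, int]:
    """Count paired cases where exactly one of two readers is correct.

    b: reader A correct, reader B wrong; c: reader B correct, reader A wrong.
    Concordant pairs (both right or both wrong) do not enter the test.
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("readers must be scored on the same cases")
    b = sum(1 for x, y in zip(correct_a, correct_b) if x and not y)
    c = sum(1 for x, y in zip(correct_a, correct_b) if y and not x)
    return b, c


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value.

    Under the null hypothesis of equal accuracy, the b + c discordant pairs
    split as Binomial(b + c, 0.5); the p-value doubles the smaller tail.
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical toy data: one case where only the CAD system is correct,
# six where only the expert is correct, one where both are correct.
cad = [True, True, False, False, False, False, False, False]
expert = [True, False, True, True, True, True, True, True]
b, c = discordant_counts(cad, expert)
print(b, c, mcnemar_exact(b, c))  # prints: 1 6 0.125
```

Because concordant pairs cancel out, two readers with very different overall accuracy on a small sample can still yield a non-significant p-value if they disagree on only a few images.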

2.
BMC Bioinformatics ; 25(1): 200, 2024 May 27.
Article in English | MEDLINE | ID: mdl-38802733

ABSTRACT

BACKGROUND: The initial version of SEDA assists life science researchers without programming skills with the preparation of DNA and protein sequence FASTA files for multiple bioinformatics applications. However, the initial version of SEDA lacks a command-line interface for more advanced users and does not allow the creation of automated analysis pipelines. RESULTS: The present paper discusses the updates in the new SEDA release, including the addition of a complete command-line interface, new functionalities such as gene annotation, a framework for automated pipelines, and improved integration in Linux environments. CONCLUSION: SEDA is an open-source Java application and can be installed using the different distributions available (https://www.sing-group.org/seda/download.html) as well as through a Docker image (https://hub.docker.com/r/pegi3s/seda). It is released under a GPL-3.0 license, and its source code is publicly accessible on GitHub (https://github.com/sing-group/seda). The software version at the time of submission is archived at Zenodo (version v1.6.0, http://doi.org/10.5281/zenodo.10201605).


Subjects
Computational Biology , Software , Computational Biology/methods , Data Analysis
3.
J Integr Bioinform ; 2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38529929

ABSTRACT

The vast amount of genome sequence data that is available, and that is predicted to increase drastically in the near future, can only be dealt with efficiently by building automated pipelines. Indeed, the Earth Biogenome Project will produce high-quality reference genome sequences for all 1.8 million named living eukaryote species, providing unprecedented insight into the evolution of genes and gene families, and thus into biological issues. Here, new modules for gene annotation, additional BLAST search algorithms, additional multiple sequence alignment methods, the addition of reference sequences, additional tree-rooting methods, the estimation of rates of synonymous and nonsynonymous substitutions, and the identification of positively selected amino acid sites have been added to auto-phylo (version 2), a recently developed software tool to address biological problems using phylogenetic inferences. Additionally, we present auto-phylo-pipeliner, a graphical user interface application that further facilitates the creation and running of auto-phylo pipelines. Inferences on S-RNase specificity are critical both for cross-based breeding and for establishing pollination requirements. Therefore, as a test case, we develop an auto-phylo pipeline to identify amino acid sites under positive selection (in principle, those determining S-RNase specificity), starting from both non-annotated Prunus genomes and sequences available in public databases.

4.
Int J Mol Sci ; 25(4)2024 Feb 19.
Article in English | MEDLINE | ID: mdl-38397104

ABSTRACT

SARS-CoV-2 amino acid variants that contribute to increased transmissibility or to host immune system escape are likely to increase in frequency due to positive selection and may be identified using different methods, such as codeML, FEL, FUBAR, and MEME. However, the results of different methods do not always agree. The sampling scheme used in different studies may partially explain the differences, but it is also possible that some of the identified positively selected amino acid sites are false positives. This is especially important in very large-scale projects where hundreds of analyses have been performed for the same protein-coding gene. To account for these issues, in this work we identified positively selected amino acid sites in SARS-CoV-2 and 15 other coronavirus species using both codeML and FUBAR, and compared the location of such sites across species. Moreover, we compared our results with the data available in the COV2Var database, with the frequencies of the 10 most frequent variants, and with the predicted protein location, to identify the sites that are supported by multiple lines of evidence. Amino acid changes observed at these sites should always be of concern. The information reported for SARS-CoV-2 can also be used to identify variants of concern in other coronaviruses.


Subjects
COVID-19 , SARS-CoV-2 , Humans , SARS-CoV-2/genetics , Amino Acids/genetics
5.
Nucleic Acids Res ; 51(W1): W411-W418, 2023 07 05.
Article in English | MEDLINE | ID: mdl-37207338

ABSTRACT

Genomics studies routinely confront researchers with long lists of tumor alterations detected in patients. Such lists are difficult to interpret since only a minority of the alterations are relevant biomarkers for diagnosis and for designing therapeutic strategies. PanDrugs is a methodology that facilitates the interpretation of tumor molecular alterations and guides the selection of personalized treatments. To do so, PanDrugs scores gene actionability and drug feasibility to provide a prioritized evidence-based list of drugs. Here, we introduce PanDrugs2, a major upgrade of PanDrugs that, in addition to somatic variant analysis, supports a new integrated multi-omics analysis which simultaneously combines somatic and germline variants, copy number variation and gene expression data. Moreover, PanDrugs2 now considers cancer genetic dependencies to extend tumor vulnerabilities, providing therapeutic options for untargetable genes. Importantly, a novel intuitive report to support clinical decision-making is generated. The PanDrugs database has been updated, integrating 23 primary sources that support >74K drug-gene associations obtained from 4642 genes and 14 659 unique compounds. The database has also been reimplemented to allow semi-automatic updates to facilitate maintenance and release of future versions. PanDrugs2 does not require login and is freely available at https://www.pandrugs.org/.


Subjects
Multiomics , Neoplasms , Humans , DNA Copy Number Variations , Genomics/methods , Neoplasms/drug therapy , Neoplasms/genetics , Neoplasms/pathology , Precision Medicine/methods
6.
Diagnostics (Basel) ; 13(5)2023 Mar 03.
Article in English | MEDLINE | ID: mdl-36900110

ABSTRACT

Deep learning object-detection models are being successfully applied to develop computer-aided diagnosis systems that aid polyp detection during colonoscopy. Here, we demonstrate the need to include negative samples both for (i) reducing false positives during the polyp-finding phase, by including images with artifacts that may confuse the detection models (e.g., medical instruments, water jets, feces, blood, excessive proximity of the camera to the colon wall, blurred images) and that are usually not included in model development datasets, and for (ii) obtaining a more realistic estimate of model performance. By retraining our previously developed YOLOv3-based detection model with a dataset that includes an additional 15% of non-polyp images with a variety of artifacts, we were able to generally improve its F1 performance on our internal test datasets (from an average F1 of 0.869 to 0.893), which now include such images, as well as on four public datasets that include non-polyp images (from an average F1 of 0.695 to 0.722).
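Why negative (non-polyp) images matter for this evaluation follows from the definition of the F1 score, F1 = 2PR/(P + R): artifact images can only add false positives, which lower precision P while leaving recall R unchanged. The counts in this Python sketch are hypothetical and are not taken from the study.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = harmonic mean of precision and recall, from detection counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: the same 90 detected and 10 missed polyps, evaluated
# without and with extra false alarms triggered by artifact images.
polyps_only = f1_score(tp=90, fp=10, fn=10)
with_artifacts = f1_score(tp=90, fp=30, fn=10)
print(round(polyps_only, 3), round(with_artifacts, 3))  # prints: 0.9 0.818
```

A test set without such images offers few opportunities for false positives, so the same detector can look better than it would in real colonoscopy footage.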

7.
J Integr Bioinform ; 20(2)2023 Jun 01.
Article in English | MEDLINE | ID: mdl-36848492

ABSTRACT

EvoPPI (http://evoppi.i3s.up.pt), a meta-database for protein-protein interactions (PPI), has been upgraded (EvoPPI3) to accept new types of data, namely PPI from patients, cell lines, and animal models, as well as data from gene modifier experiments, for nine neurodegenerative polyglutamine (polyQ) diseases caused by an abnormal expansion of the polyQ tract. The integration of the different types of data allows users to easily compare them, as shown here for Ataxin-1, the polyQ protein involved in spinocerebellar ataxia type 1 (SCA1). Using all available datasets and the data obtained here for Drosophila melanogaster wt and exp Ataxin-1 mutants (also available at EvoPPI3), we show that, in humans, the Ataxin-1 network is much larger than previously thought (380 interactors), comprising at least 909 interactors. The functional profile of the newly identified interactors is similar to that of the interactors already reported in the main PPI databases. Sixteen of the 909 interactors are putative novel SCA1 therapeutic targets, and all but one are already being studied in the context of this disease. The 16 proteins are mainly involved in binding and catalytic activity (mainly kinase activity), functional features already thought to be important in SCA1.


Subjects
Drosophila melanogaster , Spinocerebellar Ataxias , Animals , Humans , Ataxin-1/genetics , Ataxin-1/metabolism , Drosophila melanogaster/genetics , Spinocerebellar Ataxias/genetics , Spinocerebellar Ataxias/metabolism
8.
Diagnostics (Basel) ; 12(4)2022 Apr 04.
Article in English | MEDLINE | ID: mdl-35453946

ABSTRACT

Colorectal cancer is one of the most frequent malignancies. Colonoscopy is the de facto standard for detecting precancerous lesions in the colon, i.e., polyps, during screening studies or after facultative recommendation. In recent years, artificial intelligence, and especially deep learning techniques such as convolutional neural networks, have been applied to polyp detection and localization in order to develop real-time CADe systems. However, the performance of machine learning models is very sensitive to changes in the nature of the testing instances, especially when trying to reproduce results on datasets entirely different from those used for model development, i.e., inter-dataset testing. Here, we report the results of testing our previously published polyp detection model on ten public colonoscopy image datasets and analyze them in the context of the results of 20 other state-of-the-art publications using the same datasets. The F1-score of our recently published model was 0.88 when evaluated on a private test partition, i.e., intra-dataset testing, but it decayed, on average, by 13.65% when tested on the ten public datasets. In the published research, the average intra-dataset F1-score is 0.91, and we observed that it also decays in the inter-dataset setting, to an average F1-score of 0.83.
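As a rough worked comparison, assuming (the abstract does not specify this) that the 13.65% decay is a relative decrease that can be applied directly to the 0.88 intra-dataset F1-score, the implied average inter-dataset F1-score of the model and the corresponding relative decay in the published research are:

```latex
F_1^{\mathrm{inter}} \approx 0.88 \times (1 - 0.1365) = 0.88 \times 0.8635 \approx 0.76,
\qquad
\frac{0.91 - 0.83}{0.91} \approx 0.088 \;(\approx 8.8\%)
```

If the 13.65% were instead averaged per dataset or expressed in percentage points, these figures would shift somewhat.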

9.
IEEE/ACM Trans Comput Biol Bioinform ; 19(3): 1850-1860, 2022.
Article in English | MEDLINE | ID: mdl-33237866

ABSTRACT

SEDA (SEquence DAtaset builder) is a multiplatform desktop application for the manipulation of FASTA files containing DNA or protein sequences. The convenient graphical user interface gives access to a collection of simple (filtering, sorting, or file reformatting, among others) and advanced (BLAST searching, protein domain annotation, gene annotation, and sequence alignment) utilities not present in similar applications, which eases the work of life science researchers working with DNA and/or protein sequences, especially those who have no programming skills. This paper presents general guidelines on how to build efficient data handling protocols using SEDA, as well as practical examples on how to prepare high-quality datasets for single gene phylogenetic studies, the characterization of protein families, or phylogenomic studies. The user-friendliness of SEDA also relies on two important features: (i) the availability of easy-to-install distributable versions and installers of SEDA, including a Docker image for Linux, and (ii) the facility with which users can manage large datasets. SEDA is open-source, with GNU General Public License v3.0 license, and publicly available at GitHub (https://github.com/sing-group/seda). SEDA installers and documentation are available at https://www.sing-group.org/seda/.


Subjects
Proteins , Software , Amino Acid Sequence , Phylogeny , Sequence Alignment
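To make the "simple" operations listed for SEDA concrete (filtering, sorting, file reformatting), the following Python sketch parses FASTA text, keeps sequences above a length threshold, sorts them longest first, and rewraps them at a fixed width. It only illustrates those operation types: SEDA itself is a Java application, and the function names and the length-based filter here are hypothetical choices, not SEDA's implementation or options.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without ">", sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs, joining multi-line sequences.

    Blank lines are skipped; sequence lines before the first header are ignored.
    """
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_and_sort(records: Iterable[Record], min_length: int) -> List[Record]:
    """Keep sequences with at least `min_length` residues, longest first."""
    kept = [(h, s) for h, s in records if len(s) >= min_length]
    return sorted(kept, key=lambda r: len(r[1]), reverse=True)


def write_fasta(records: Iterable[Record], width: int = 60) -> str:
    """Reformat records as FASTA text, wrapping sequences at `width` characters."""
    out = []
    for header, seq in records:
        out.append(">" + header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n" if out else ""


text = ">a\nACGT\nAC\n>b\nAC\n>c\nACGTACGT\n"
kept = filter_and_sort(read_fasta(text.splitlines()), min_length=3)
print(write_fasta(kept, width=4), end="")
```

Keeping parsing, filtering, and writing as separate steps mirrors the idea of chaining small operations into a data-preparation protocol.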
10.
PeerJ Comput Sci ; 7: e593, 2021.
Article in English | MEDLINE | ID: mdl-34239974

ABSTRACT

Compi is an application framework to develop end-user, pipeline-based applications with a primary emphasis on: (i) user interface generation, by automatically generating a command-line interface based on the pipeline-specific parameter definitions; (ii) application packaging, with compi-dk, which is a version-control-friendly tool to package the pipeline application and its dependencies into a Docker image; and (iii) application distribution provided through a public repository of Compi pipelines, named Compi Hub, which allows users to discover, browse and reuse them easily. By addressing these three aspects, Compi goes beyond traditional workflow engines, having been specially designed for researchers who want to take advantage of common workflow engine features (such as automatic job scheduling or logging, among others) while keeping the simplicity and readability of shell scripts without the need to learn a new programming language. Here we discuss the design of various pipelines developed with Compi to describe its main functionalities, as well as to highlight the similarities and differences with other available tools. An open-source distribution under the Apache 2.0 License is available from GitHub (https://github.com/sing-group/compi). Documentation and installers are available from https://www.sing-group.org/compi. A specific repository for Compi pipelines is available from Compi Hub (https://www.sing-group.org/compihub).
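The automatic job scheduling that the Compi abstract lists among common workflow-engine features amounts to running tasks in an order compatible with their dependencies, and running independent tasks side by side. The Python sketch below illustrates that idea with the standard-library graphlib module (Python 3.9+); it is not Compi's pipeline format or API, and the task names are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
PIPELINE: Dict[str, Set[str]] = {
    "download": set(),
    "align": {"download"},
    "tree": {"align"},
    "selection": {"align"},
    "report": {"tree", "selection"},
}


def run_order(tasks: Dict[str, Set[str]]) -> List[str]:
    """Return one sequential order that respects every dependency."""
    return list(TopologicalSorter(tasks).static_order())


def parallel_batches(tasks: Dict[str, Set[str]]) -> List[List[str]]:
    """Group tasks into successive batches whose members could run concurrently."""
    sorter = TopologicalSorter(tasks)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies form a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)  # mark the whole batch as finished
    return batches


print(parallel_batches(PIPELINE))
# prints: [['download'], ['align'], ['selection', 'tree'], ['report']]
```

Here "tree" and "selection" land in the same batch because both depend only on "align", which is the kind of parallelism a workflow engine can exploit without the user writing any scheduling code.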

11.
Interdiscip Sci ; 13(2): 334-343, 2021 Jun.
Article in English | MEDLINE | ID: mdl-34009546

ABSTRACT

The identification of clinically relevant bacterial amino acid changes can be performed using different methods aimed at identifying genes showing positively selected amino acid sites (PSS). Nevertheless, such analyses are time-consuming, and the frequency of genes showing evidence of PSS can be low. Therefore, the development of a pipeline that allows the quick and efficient identification of the set of genes that show PSS is of interest. Here, we present Auto-PSS-Genome, a Compi-based pipeline distributed as a Docker image, that automates the process of identifying genes that show PSS using three different methods, namely codeML, FUBAR, and omegaMap. Auto-PSS-Genome accepts as input a set of FASTA files, one per genome, containing all coding sequences, thus minimizing the work needed to conduct positively selected site analyses. The Auto-PSS-Genome pipeline identifies orthologous gene sets and corrects multiple possible problems in input FASTA files that may prevent the automated identification of genes showing PSS. A FASTA file containing all coding sequences can also be given as an external global reference, easing the comparison of results across species when gene names differ. In this work, we use Auto-PSS-Genome to analyse Mycobacterium leprae (which causes leprosy) and the closely related species M. haemophilum, which mainly causes ulcerating skin infections and arthritis in severely immunocompromised persons and, in children, cervical and perihilar lymphadenitis. The genes identified in these two species as showing PSS may be partially responsible for virulence and drug resistance.


Subjects
Amino Acids/chemistry , Bacteria , Child , Genome, Bacterial , Humans , Mycobacterium leprae/genetics , Virulence
12.
Bioinformatics ; 37(4): 578-579, 2021 05 01.
Article in English | MEDLINE | ID: mdl-32818254

ABSTRACT

MOTIVATION: Drug immunomodulation modifies the response of the immune system and can be therapeutically exploited in pathologies such as cancer and autoimmune diseases. RESULTS: DREIMT is a new hypothesis-generation web tool that performs drug prioritization analysis for immunomodulation. DREIMT provides significant immunomodulatory drugs targeting up to 70 immune cell subtypes through a curated database that integrates 4960 drug profiles and ∼2600 immune gene expression signatures. The tool also suggests potential immunomodulatory drugs targeting user-supplied gene expression signatures. The final output includes drug-signature association scores, FDRs, and downloadable plots and results tables. AVAILABILITY AND IMPLEMENTATION: http://www.dreimt.org. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Drug Repositioning , Transcriptome , Databases, Factual , Databases, Pharmaceutical , Immunomodulation
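The DREIMT abstract states that the tool reports FDRs alongside drug-signature association scores but does not say how they are computed. Assuming a Benjamini-Hochberg correction, a widely used way to turn many p-values into FDR-adjusted values, the procedure can be sketched in Python as follows (the function name and toy p-values are hypothetical, and DREIMT may use a different method):

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Each p-value of rank r (1 = smallest) among m tests is scaled by m / r,
    and a running minimum from the largest rank down keeps the adjusted
    values monotone and capped at 1.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for four drug-signature associations.
print([round(q, 3) for q in benjamini_hochberg([0.01, 0.04, 0.03, 0.005])])
# prints: [0.02, 0.04, 0.04, 0.02]
```

Associations would then be kept if their adjusted value falls below a chosen FDR threshold, which controls the expected share of false discoveries among the reported drugs.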
13.
Front Immunol ; 11: 1470, 2020.
Article in English | MEDLINE | ID: mdl-32760401

ABSTRACT

A better understanding of the response against tuberculosis (TB) infection is required to accurately identify individuals with an active or a latent TB infection (LTBI), as well as those LTBI patients at higher risk of developing active TB. In this work, we used the information obtained from studying the gene expression profiles of active TB patients and their infected (LTBI) or uninfected (NoTBI) contacts, recruited in Spain and Mozambique, to build a class-prediction model that identifies individuals with a TB infection profile. Following this approach, we identified several genes and metabolic pathways that provide important information on the immune mechanisms triggered against TB infection. As a novelty of our work, a combination of this class-prediction model and the direct measurement of different immunological parameters was used to identify a subset of LTBI contacts (called TB-like) whose transcriptional and immunological profiles are suggestive of infection with a higher probability of developing active TB. Validation of this novel approach to identifying the LTBI individuals at highest risk of active TB disease merits further longitudinal studies in larger cohorts in TB-endemic areas.


Subjects
Latent Tuberculosis/diagnosis , Models, Immunological , Sequence Analysis, RNA/methods , T-Lymphocytes/immunology , Tuberculosis/diagnosis , Acute Disease , Adult , Aged , Cells, Cultured , Disease Progression , Female , Humans , Interferon-gamma/metabolism , Latent Tuberculosis/genetics , Latent Tuberculosis/immunology , Lymphocyte Activation , Machine Learning , Male , Middle Aged , Tuberculosis/genetics , Tuberculosis/immunology
14.
BMC Med Genomics ; 12(1): 145, 2019 10 26.
Article in English | MEDLINE | ID: mdl-31655597

ABSTRACT

BACKGROUND: Wild-type (wt) polyglutamine (polyQ) regions are implicated in the stabilization of protein-protein interactions (PPI). Pathological polyQ expansion, such as that in human Ataxin-1 (ATXN1), which causes spinocerebellar ataxia type 1 (SCA1), results in abnormal PPI. For ATXN1, a larger number of interactors has been reported for the expanded (82Q) than for the wt (29Q) protein. METHODS: To understand how the expanded polyQ affects PPI, protein structures were predicted for wt and expanded ATXN1, as well as for 71 ATXN1 interactors. Then, the binding surfaces of wt and expanded ATXN1 with the reported interactors were inferred. RESULTS: Our data support the view that the polyQ expansion alters the ATXN1 conformation and enhances the strength of interaction with ATXN1 partners. For both ATXN1 variants, the number of residues at the predicted binding interface is greater after the polyQ, mainly due to the AXH domain. Moreover, the difference in the interaction strength of the ATXN1 variants was due to an increase in the number of interactions at the N-terminal region, before the polyQ, in the expanded form. CONCLUSIONS: There are three regions in the AXH domain that are essential for ATXN1 PPI. The N-terminal region is responsible for the strength of the PPI with the ATXN1 variants. How the predicted motifs in this region affect PPI is discussed in the context of ATXN1 post-transcriptional modifications.


Subjects
Ataxin-1/metabolism , Amino Acid Motifs , Animals , Ataxin-1/chemistry , Ataxin-1/genetics , Binding Sites , Humans , Molecular Docking Simulation , Peptides/metabolism , Protein Binding , Protein Interaction Domains and Motifs , Protein Structure, Tertiary , Spinocerebellar Ataxias/genetics , Spinocerebellar Ataxias/pathology
15.
Front Plant Sci ; 10: 879, 2019.
Article in English | MEDLINE | ID: mdl-31379893

ABSTRACT

The non-self gametophytic self-incompatibility (GSI) recognition system is characterized by the presence of multiple F-box genes, tandemly located in the S-locus, that regulate pollen specificity. This reproductive barrier is present in Solanaceae, Plantaginaceae and Maleae (Rosaceae), but functional assays to gain insight into how this recognition mechanism works have been performed only in Petunia. In this system, each of the encoded S-pollen proteins (called SLFs in Solanaceae and Plantaginaceae, and SFBBs in Maleae) recognizes and interacts with a subset of non-self S-pistil proteins, called S-RNases, mediating their ubiquitination and degradation. In Petunia there are 17 SLF genes per S-haplotype, making it impossible to determine each SLF specificity experimentally. Moreover, domain-swapping experiments are unlikely to be performed on a large scale to determine S-pollen and S-pistil specificities. Phylogenetic analyses of the Petunia SLFs and those from two Solanum genomes suggest that the diversification of SLFs predates the separation of the two genera. Here, we first identify putative SLF genes from nine Solanum and 10 Nicotiana genomes to determine how many gene lineages are present in the three genera, and the rate of origin of new SLF gene lineages. The use of multiple genomes per genus precludes the effect of genome incompleteness at the S-locus. The similar number of gene lineages in the three genera implies comparable effective population sizes and numbers of specificities for these species. The rate of origin of new specificities is one per 10 million years. Moreover, we determine the amino acid positions under positive selection, which are those involved in SLF specificity recognition, using 10 Petunia S-haplotypes with more than 11 SLF genes. These 16 amino acid positions account for the differences in self-incompatible (SI) behavior described in the literature.
When SLF and S-RNase proteins are divided according to SI behavior, and the positively selected amino acids are classified according to hydrophobicity, charge, polarity and size, we identify fixed differences between SI groups. According to in silico 3D structures of the two proteins, these amino acid positions interact. Therefore, this methodology can be used to infer SLF/S-RNase specificity recognition.

16.
BMC Evol Biol ; 19(1): 126, 2019 06 18.
Article in English | MEDLINE | ID: mdl-31215418

ABSTRACT

BACKGROUND: L-ascorbate (Vitamin C) is an important antioxidant and co-factor in eukaryotic cells, and in mammals it is indispensable for brain development and cognitive function. Vertebrates usually become L-ascorbate auxotrophs when the last enzyme of the synthetic pathway, an L-gulonolactone oxidase (GULO), is lost. Since Protostomes were until recently thought not to have a GULO gene, they were considered to be auxotrophs for Vitamin C. RESULTS: By performing phylogenetic analyses with tens of non-Bilateria and Protostomian genomes, we show that a GULO gene is present in the non-Bilateria Placozoa, Myxozoa (here reported for the first time) and Anthozoa groups, and, in Protostomians, in the Araneae family, the Gastropoda class, the Acari subclass (here reported for the first time), and the Priapulida, Annelida (here reported for the first time) and Brachiopoda phyla lineages. GULO is an old gene that predates the separation of Animals and Fungi, although it could be much older. We also show that within Protostomes, GULO has been lost multiple times in large taxonomic groups, namely the Pancrustacea, Nematoda, Platyhelminthes and Bivalvia groups, a pattern similar to that reported for Vertebrate species. Nevertheless, we show that Drosophila melanogaster seems to be capable of synthesizing L-ascorbate, likely through an alternative pathway, as recently reported for Caenorhabditis elegans. CONCLUSIONS: Non-Bilaterians and Protostomians seem to be able to synthesize Vitamin C either through the conventional animal pathway or an alternative pathway, but in this animal group, being unable to synthesize L-ascorbate seems to be the exception rather than the rule.


Subjects
Ascorbic Acid/metabolism , Eukaryota/enzymology , Eukaryota/genetics , Evolution, Molecular , L-Gulonolactone Oxidase/genetics , Animals , Drosophila melanogaster/genetics , Eukaryota/classification , Eukaryota/metabolism , Genome , L-Gulonolactone Oxidase/chemistry , L-Gulonolactone Oxidase/metabolism , Models, Molecular , Phylogeny , Vertebrates/classification , Vertebrates/genetics
17.
Interdiscip Sci ; 11(1): 57-67, 2019 Mar.
Article in English | MEDLINE | ID: mdl-30712176

ABSTRACT

Nowadays, bioinformatics is one of the most important areas in modern biology, and the creation of high-quality scientific software supporting this recent research area is one of the core activities of many researchers. In this context, high-quality sequence datasets are needed to make inferences on the evolution of species, genes, and gene families, or to obtain evidence for adaptive amino acid evolution, among others. Nevertheless, sequence data are very often spread over several databases, many useful genomes and transcriptomes are non-annotated, and the available annotation is not for the desired coding sequence isoform and/or is unlikely to be accurate. Moreover, although the FASTA text-based format is quite simple and usable by most software applications, there are a number of issues that may be critical depending on the software used to analyse such files. Therefore, researchers without training in informatics often use only a fraction of all available data. The above issues can be addressed using existing software applications, but there is no single easy-to-use application that performs all these tasks within the same graphical interface, such as the one presented here, named BDBM (Blast DataBase Manager). BDBM can be used to efficiently obtain gene sequences from annotated and non-annotated genomes and transcriptomes. Moreover, it can be used to look for alternatives to existing annotations and to easily create reliable custom databases. Such databases are essential to prepare high-quality datasets. The analyses that we performed on the Coffea canephora genome using BDBM, aimed at identifying the S-locus region (which harbours the genes involved in gametophytic self-incompatibility), led to the conclusion that there are two likely regions, one on chromosome 2 (around region 6600000-6650000) and another on chromosome 5 (around 15830000-15930000).
Such findings are discussed in the context of the Rubiaceae gametophytic self-incompatibility evolution.


Subjects
Coffea/genetics , Computational Biology , Databases, Genetic , Software , Sequence Analysis, DNA
18.
Interdiscip Sci ; 11(1): 45-56, 2019 Mar.
Article in English | MEDLINE | ID: mdl-30707359

ABSTRACT

Protein-protein interaction (PPI) data is essential to elucidate the complex molecular relationships in living systems, and thus understand the biological functions at cellular and systems levels. The complete map of PPIs that can occur in a living organism is called the interactome. For animals, PPI data is stored in multiple databases (e.g., BioGRID, CCSB, DroID, FlyBase, HIPPIE, HitPredict, HomoMINT, INstruct, Interactome3D, mentha, MINT, and PINA2) with different formats. This makes PPI comparisons difficult to perform, especially between species, since orthologous proteins may have different names. Moreover, there is only a partial overlap between databases, even when considering a single species. The EvoPPI (http://evoppi.i3s.up.pt) web application presented in this paper allows comparison of data from the different databases at the species level, or between species using a BLAST approach. We show its usefulness by performing a comparative study of the interactome of the nine polyglutamine (polyQ) disease proteins, namely androgen receptor (AR), atrophin-1 (ATN1), ataxin 1 (ATXN1), ataxin 2 (ATXN2), ataxin 3 (ATXN3), ataxin 7 (ATXN7), calcium voltage-gated channel subunit alpha1 A (CACNA1A), Huntingtin (HTT), and TATA-binding protein (TBP). Here we show that none of the human interactors of these proteins is common to all nine interactomes. Only 15 proteins are common to at least 4 of these polyQ disease proteins, and 40% of these are involved in ubiquitin protein ligase-binding function. The results obtained in this study suggest that polyQ disease proteins are involved in different functional networks. Comparisons with Mus musculus PPIs are also made for AR and TBP, using the EvoPPI BLAST search approach (a unique feature of EvoPPI), with the goal of understanding why there is a significant excess of common interactors for these proteins in humans.


Subjects
Neurodegenerative Diseases/metabolism , Peptides/metabolism , Protein Interaction Maps , Humans , Internet , Protein Binding
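The EvoPPI comparison of the nine polyQ disease proteins (no human interactor shared by all nine, 15 shared by at least four) comes down to counting, for each interactor, how many interactomes it appears in. A minimal Python sketch, assuming interactors have already been mapped to common identifiers (across species, EvoPPI relies on a BLAST approach for this) and using placeholder protein names rather than EvoPPI data:

```python
from collections import Counter
from typing import Dict, Iterable, Set


def shared_interactors(interactomes: Dict[str, Iterable[str]], min_count: int) -> Set[str]:
    """Return interactors present in at least `min_count` of the interactomes."""
    counts = Counter(
        partner
        for partners in interactomes.values()
        for partner in set(partners)  # count each interactome at most once
    )
    return {partner for partner, n in counts.items() if n >= min_count}


# Hypothetical interactomes; P1..P4 are placeholders, not real interactors.
toy = {
    "AR": {"P1", "P2"},
    "TBP": {"P1", "P3"},
    "HTT": {"P1", "P2", "P4"},
}
print(sorted(shared_interactors(toy, min_count=3)))  # prints: ['P1']
print(sorted(shared_interactors(toy, min_count=2)))  # prints: ['P1', 'P2']
```

Raising `min_count` to the number of interactomes gives the "common to all" set, which was empty for the nine human polyQ interactomes in the study.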
19.
Interdiscip Sci ; 11(1): 1-9, 2019 Mar.
Article in English | MEDLINE | ID: mdl-30511150

ABSTRACT

Useful insight into the evolution of genes and gene families can be provided by the analysis of all available genome datasets rather than just a few, which are usually those of model species. Handling and transforming such datasets into the desired format for downstream analyses is, however, often a difficult and time-consuming task for researchers without a background in informatics. Therefore, we present two simple and fast protocols for data preparation, using an easy-to-install, open-source, cross-platform software application with a user-friendly, rich graphical user interface (SEDA; http://www.sing-group.org/seda/index.html). The first protocol is a substantial improvement over one recently published (López-Fernández et al. Practical applications of computational biology and bioinformatics, 12th International conference. Springer, Cham, pp 88-96 (2019)[1]), which was used to study the evolution of GULO, a gene that encodes the enzyme responsible for the last step of vitamin C synthesis. In this paper, we show how the sequence data file used for the phylogenetic analyses can now be obtained much faster by changing the way coding sequence isoforms are removed, using the newly implemented SEDA operation "Remove isoforms". This protocol can be used to easily show that putative functional GULO genes are present in several Protostomian groups such as Molluscs, Priapulida and Arachnida. Such findings could easily have been missed if only a few Protostomian model species had been used. The second protocol allowed us to identify positively selected amino acid sites in a set of 19 primate HLA immunity genes. Interestingly, the proteins encoded by MHC class II genes can show just as many positively selected amino acid sites as those encoded by classical MHC class I genes. Although a significant percentage of codons, which can be as high as 14.8%, are evolving under positive selection, the main mode of evolution of HLA immunity genes is purifying selection.
Using a large number of primate species lowers the probability of missing positively selected amino acid sites. Both projects were performed in less than one week, and most of the time was spent running the analyses rather than preparing the files. Such protocols can be easily adapted to answer many other questions using a phylogenetic approach.


Subjects
Computational Biology/methods , Genomics/methods , Phylogeny , Algorithms , Animals , Software
20.
PLoS One ; 13(9): e0204474, 2018.
Article in English | MEDLINE | ID: mdl-30235322

ABSTRACT

Modern bioinformatics and computational biology are fields of study driven by the availability of effective software required for conducting appropriate research tasks. Apart from providing reliable and fast implementations of different data analysis algorithms, these software applications should also be clear and easy to use through proper user interfaces, providing appropriate data management and visualization capabilities. In this regard, the user experience obtained by interacting with these applications via their Graphical User Interfaces (GUI) is a key factor for their final success and real utility for researchers. Despite the existence of different packages and applications focused on advanced data visualization, there is a lack of specific libraries providing pertinent GUI components able to help scientific bioinformatics software developers. To that end, this paper introduces GC4S, a bioinformatics-oriented collection of high-level, extensible, and reusable Java GUI elements specifically designed to speed up bioinformatics software development. Within GC4S, developers of new applications can focus on the specific GUI requirements of their projects, relying on GC4S for generalities and abstractions. GC4S is free software distributed under the terms of GNU Lesser General Public License and both source code and documentation are publicly available at http://www.sing-group.org/gc4s.


Subjects
Computational Biology , Computer Graphics , User-Computer Interface , Access to Information , Internet