1.
Biochim Biophys Acta Proteins Proteom ; 1872(2): 140985, 2024 02 01.
Article in English | MEDLINE | ID: mdl-38122964

ABSTRACT

MOTIVATION: The number of unannotated proteins in UniProt grows rapidly every year because of more efficient sequencing methods, whereas experimental annotation of proteins remains a lengthy and expensive process. Computational techniques that provide highly specific Gene Ontology (GO) terms can narrow the search and speed up this process. METHODOLOGY: We propose an ensemble approach that combines three generic base predictors, which predict GO terms in the Biological Process (BP), Cellular Component (CC) and Molecular Function (MF) ontologies from sequences across different species. We train our models on UniProtGOA annotation data and use the CATH domain resources to identify the protein families. We then calculate a score based on the prevalence of individual GO terms in the functional families; this score serves as an indicator of confidence when assigning a GO term to an uncharacterised protein. METHODS: The ensemble includes a statistics-based method that scores the occurrence of GO terms in a CATH FunFam against a background set of proteins annotated with the same GO term. We also developed a set-based method that uses set intersection and set union to score the occurrence of GO terms within the same CATH FunFam. Finally, we use FunFams-Plus, a predictor developed by the Orengo Group at UCL to predict GO terms for uncharacterised proteins in the CAFA3 challenge. EVALUATION: We evaluated the methods against the CAFA3 benchmark and DomFun, using the Precision, Recall and Fmax metrics and the CAFA3 benchmark datasets so that our models can be compared with the CAFA3 results. Our results show that FunPredCATH compares well with top CAFA methods across the different ontologies and benchmarks. CONTRIBUTIONS: FunPredCATH compares well with other prediction methods on CAFA3, and the ensemble approach outperforms the base methods. We show that non-IEA models obtain higher Fmax scores than their IEA (Inferred from Electronic Annotation) counterparts, while models that include IEA annotations achieve higher coverage at the expense of a lower Fmax score.
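
The abstract does not give the formula behind the set-based method, so the following is only a minimal sketch of how set intersection and set union could score a GO term within a FunFam. It assumes a Jaccard-style ratio between the FunFam's member proteins and the proteins annotated with the term; the function name `jaccard_go_scores` and the input layout are hypothetical and not taken from the paper.

```python
from typing import Dict, Set


def jaccard_go_scores(
    funfam_members: Set[str],
    annotations: Dict[str, Set[str]],
) -> Dict[str, float]:
    """Score each GO term for one FunFam (hypothetical Jaccard-style scoring).

    funfam_members: protein IDs belonging to the FunFam.
    annotations:    GO term -> set of protein IDs annotated with that term.
    """
    scores: Dict[str, float] = {}
    for term, annotated in annotations.items():
        shared = funfam_members & annotated  # set intersection
        if not shared:
            continue  # the term never occurs in this FunFam
        combined = funfam_members | annotated  # set union
        scores[term] = len(shared) / len(combined)
    return scores


if __name__ == "__main__":
    members = {"P1", "P2", "P3", "P4"}
    go_annotations = {
        "GO:0003824": {"P1", "P2", "P3", "P9"},
        "GO:0005515": {"P4"},
        "GO:0008150": {"P8"},
    }
    print(jaccard_go_scores(members, go_annotations))
```

Under this assumption, a term shared by most FunFam members and rarely seen elsewhere scores close to 1, which matches the idea of using prevalence within the family as a confidence indicator.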

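For readers unfamiliar with the Fmax metric named in the evaluation, here is a simplified, protein-centric sketch in the style used by CAFA. It assumes each benchmark protein has at least one true term, and it omits the propagation of terms up the ontology that a full evaluation performs; the function name `fmax` and the dictionary layout are illustrative choices, not the authors' code.

```python
from typing import Dict, Set


def fmax(
    predictions: Dict[str, Dict[str, float]],
    truth: Dict[str, Set[str]],
    steps: int = 100,
) -> float:
    """Maximum F-measure over score thresholds (simplified sketch).

    predictions: protein -> {GO term: confidence score in [0, 1]}.
    truth:       protein -> non-empty set of true GO terms.
    """
    best = 0.0
    for i in range(1, steps + 1):
        threshold = i / steps
        precisions = []
        recall_sum = 0.0
        for protein, true_terms in truth.items():
            predicted = {
                term
                for term, score in predictions.get(protein, {}).items()
                if score >= threshold
            }
            hits = len(predicted & true_terms)
            if predicted:
                # Precision is averaged only over proteins with a prediction.
                precisions.append(hits / len(predicted))
            # Recall is averaged over every benchmark protein.
            recall_sum += hits / len(true_terms)
        if not precisions:
            continue
        precision = sum(precisions) / len(precisions)
        recall = recall_sum / len(truth)
        if precision + recall > 0:
            best = max(best, 2 * precision * recall / (precision + recall))
    return best
```

Because precision ignores proteins without predictions while recall does not, a model that predicts for fewer proteins can keep a high precision but loses recall, which is the trade-off between coverage and Fmax that the abstract reports.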

Subject(s)
Proteins , Sequence Analysis, Protein , Databases, Protein , Proteins/metabolism , Molecular Sequence Annotation , Sequence Analysis, Protein/methods , Gene Ontology
2.
Biochim Biophys Acta Gene Regul Mech ; 1865(1): 194767, 2022 01.
Article in English | MEDLINE | ID: mdl-34749004

ABSTRACT

BACKGROUND: Bioinformatics research generates tools and datasets at a very fast rate. Meanwhile, considerable effort is going into making these resources findable and reusable, so that researchers can discover them more easily in the course of their work. PURPOSE: This paper proposes a semi-automated tool to assess a resource against the Findability, Accessibility, Interoperability and Reusability (FAIR) criteria. The aim is to create a portal that presents the assessment score together with a report that researchers can use to gauge a resource. METHOD: Our system, AutoFAIR, uses internet searches to automate the process of generating FAIR scores. The process is semi-automated: if AutoFAIR has not captured a particular property of the FAIR scores, a user can amend and supply the information to complete the assessment. RESULTS: We compare our results against FAIRshake, which was used as the benchmark tool for the assessments. The results show that AutoFAIR was able to match the FAIR criteria in FAIRshake with minimal intervention from the user. CONCLUSIONS: We show that AutoFAIR can be a good repository for storing metadata about tools and datasets, together with comprehensive reports detailing the assessments of the resources. Moreover, AutoFAIR is also able to score workflows, giving an overall indication of the FAIRness of the resources used in a scientific study.
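
The semi-automated step described in the METHOD section can be illustrated with a small sketch. Everything here is an assumption made for illustration: the criterion labels, the function name `fair_score`, and the scoring rule (the fraction of answered criteria that are satisfied) are hypothetical and do not describe AutoFAIR's actual implementation or FAIRshake's rubric.

```python
from typing import Dict, List, Optional, Tuple


def fair_score(
    automatic: Dict[str, Optional[bool]],
    user_supplied: Dict[str, bool],
) -> Tuple[float, List[str]]:
    """Combine automatic checks with user amendments (hypothetical sketch).

    automatic:     criterion -> True/False, or None if it was not captured.
    user_supplied: criterion -> answer given by the user for missing values.
    Returns the fraction of answered criteria that are satisfied and the
    list of criteria that remain unresolved.
    """
    merged: Dict[str, Optional[bool]] = {}
    for criterion, value in automatic.items():
        if value is None and criterion in user_supplied:
            # The user fills in only what the automatic step missed.
            merged[criterion] = user_supplied[criterion]
        else:
            merged[criterion] = value
    unresolved = sorted(c for c, v in merged.items() if v is None)
    answered = [v for v in merged.values() if v is not None]
    score = sum(answered) / len(answered) if answered else 0.0
    return score, unresolved


if __name__ == "__main__":
    auto = {"F1": True, "A1": None, "I1": False, "R1": None}
    print(fair_score(auto, {"A1": True}))
```

In this sketch the automatic result always takes precedence, so user input is needed only for criteria left as `None`, mirroring the "minimal intervention" idea in the abstract.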


Subject(s)
Computational Biology , Metadata
3.
Biochim Biophys Acta Gene Regul Mech ; 1865(1): 194768, 2022 01.
Article in English | MEDLINE | ID: mdl-34757206

ABSTRACT

As computational modeling becomes more essential for analyzing and understanding biological regulatory mechanisms, governance of the many databases and knowledge bases that support this domain is crucial to guarantee the reliability and interoperability of resources. To address this, the COST Action Gene Regulation Ensemble Effort for the Knowledge Commons (GREEKC, CA15205, www.greekc.org) organized nine workshops in a four-year period, starting in September 2016. The workshops brought together a wide range of experts from all over the world who work on various steps of the knowledge management process that focuses on understanding gene regulatory mechanisms. The discussions between ontologists, curators, text miners, biologists, bioinformaticians, philosophers and computational scientists spawned a host of activities aimed at standardizing and updating existing knowledge management workflows and involving end-users in the design of the Gene Regulation Knowledge Commons (GRKC). Here the GREEKC consortium describes its main achievements in improving this GRKC.


Subject(s)
Gene Expression Regulation , Reproducibility of Results