GOThresher: a program to remove annotation biases from protein function annotation datasets.
Joshi, Parnal; Banerjee, Sagnik; Hu, Xiao; Khade, Pranav M; Friedberg, Iddo.
Affiliations
  • Joshi P; Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA 50011, USA.
  • Banerjee S; Department of Veterinary Microbiology and Preventive Medicine, Iowa State University, Ames, IA 50011, USA.
  • Hu X; Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA 50011, USA.
  • Khade PM; Department of Statistics, Iowa State University, Ames, IA 50011, USA.
  • Friedberg I; Department of Veterinary Microbiology and Preventive Medicine, Iowa State University, Ames, IA 50011, USA.
Bioinformatics; 39(1), 2023 01 01.
Article in English | MEDLINE | ID: mdl-36688705
ABSTRACT
MOTIVATION:

Advances in sequencing technologies have led to a surge in genomic data, although the functions of many gene products coded by these genes remain unknown. While in-depth, targeted experiments that determine the functions of these gene products are crucial and routinely performed, they fail to keep up with the inflow of novel genomic data. In an attempt to address this gap, high-throughput experiments are being conducted in which a large number of genes are investigated in a single study. The annotations generated as a result of these experiments are generally biased towards a small subset of less informative Gene Ontology (GO) terms. Identifying and removing biases from protein function annotation databases is important since biases impact our understanding of protein function by providing a poor picture of the annotation landscape. Additionally, as machine learning methods for predicting protein function are becoming increasingly prevalent, it is essential that they are trained on unbiased datasets. Therefore, it is not only crucial to be aware of biases, but also to judiciously remove them from annotation datasets.

RESULTS:

We introduce GOThresher, a Python tool that identifies and removes biases in function annotations from protein function annotation databases.

AVAILABILITY AND IMPLEMENTATION:

GOThresher is written in Python and released via PyPI (https://pypi.org/project/gothresher/) and on the Bioconda Anaconda channel (https://anaconda.org/bioconda/gothresher). The source code is hosted on GitHub (https://github.com/FriedbergLab/GOThresher) and distributed under the GPL 3.0 license.

SUPPLEMENTARY INFORMATION:

Supplementary data are available at Bioinformatics online.

Full text: 1 Database: MEDLINE Main subject: Computational Biology / Genomics Language: En Journal: Bioinformatics Journal subject: MEDICAL INFORMATICS Year: 2023 Document type: Article Country of affiliation: United States
