
ORFanID: A web-based search engine for the discovery and identification of orphan and taxonomically restricted genes.

Gunasekera, Richard S; Raja, Komal K B; Hewapathirana, Suresh; Tundrea, Emanuel; Gunasekera, Vinodh; Galbadage, Thushara; Nelson, Paul A.

PLoS One ; 18(10): e0291260, 2023.

Article En | MEDLINE | ID: mdl-37879070

With the numerous genomes sequenced today, it has been revealed that a noteworthy percentage of genes in a given taxon of organisms in the phylogenetic tree of life do not have orthologous sequences in other taxa. These sequences are commonly referred to as "orphans" or "ORFans" if found as single occurrences in a single species or as "taxonomically restricted genes" (TRGs) when found at higher taxonomic levels. Quantitative and collective studies of these genes are necessary for understanding their biological origins. However, the current software for identifying orphan genes is limited in its functionality and database search range, and is algorithmically very complex. Thus, researchers studying orphan genes must harvest their data from many disparate sources. ORFanID is a graphical web-based search engine that facilitates the efficient identification of both orphan genes and TRGs at all taxonomic levels, from DNA or amino acid sequences in the NCBI database cluster and other large bioinformatics repositories. The software allows users to identify genes that are unique to any taxonomic rank, from species to domain, using NCBI systematic classifiers. It provides control over NCBI database search parameters, and the results are presented in a spreadsheet as well as a graphical display. The tables in the software are sortable, and results can be filtered using the fuzzy search functionality. The visual presentation follows the taxonomic tree, whose branches can be expanded and collapsed. Example results from searches on five species and gene expression data from specific orphan genes are provided in the Supplementary Information.

Search Engine , Software , Phylogeny , Genome , Internet

The ProteomeXchange consortium at 10 years: 2023 update.

Deutsch, Eric W; Bandeira, Nuno; Perez-Riverol, Yasset; Sharma, Vagisha; Carver, Jeremy J; Mendoza, Luis; Kundu, Deepti J; Wang, Shengbo; Bandla, Chakradhar; Kamatchinathan, Selvakumar; Hewapathirana, Suresh; Pullman, Benjamin S; Wertz, Julie; Sun, Zhi; Kawano, Shin; Okuda, Shujiro; Watanabe, Yu; MacLean, Brendan; MacCoss, Michael J; Zhu, Yunping; Ishihama, Yasushi; Vizcaíno, Juan Antonio.

Nucleic Acids Res ; 51(D1): D1539-D1548, 2023 01 06.

Article En | MEDLINE | ID: mdl-36370099

Mass spectrometry (MS) is by far the most used experimental approach in high-throughput proteomics. The ProteomeXchange (PX) consortium of proteomics resources (http://www.proteomexchange.org) was originally set up to standardize data submission and dissemination of public MS proteomics data. It is now 10 years since the initial data workflow was implemented. In this manuscript, we describe the main developments in PX since the previous update manuscript in Nucleic Acids Research was published in 2020. The six members of the Consortium are PRIDE, PeptideAtlas (including PASSEL), MassIVE, jPOST, iProX and Panorama Public. We report the current data submission statistics, showcasing that the number of datasets submitted to PX resources has continued to increase every year. As of June 2022, more than 34 233 datasets had been submitted to PX resources, and from those, 20 062 (58.6%) just in the last three years. We also report the development of the Universal Spectrum Identifiers and the improvements in capturing the experimental metadata annotations. In parallel, we highlight that data re-use activities of public datasets continue to increase, enabling connections between PX resources and other popular bioinformatics resources, novel research and also new data resources. Finally, we summarise the current state-of-the-art in data management practices for sensitive human (clinical) proteomics data.

Proteomics , Software , Humans , Databases, Protein , Mass Spectrometry , Proteomics/methods , Computational Biology/methods

The PRIDE database resources in 2022: a hub for mass spectrometry-based proteomics evidences.

Perez-Riverol, Yasset; Bai, Jingwen; Bandla, Chakradhar; García-Seisdedos, David; Hewapathirana, Suresh; Kamatchinathan, Selvakumar; Kundu, Deepti J; Prakash, Ananth; Frericks-Zipper, Anika; Eisenacher, Martin; Walzer, Mathias; Wang, Shengbo; Brazma, Alvis; Vizcaíno, Juan Antonio.

Nucleic Acids Res ; 50(D1): D543-D552, 2022 01 07.

Article En | MEDLINE | ID: mdl-34723319

The PRoteomics IDEntifications (PRIDE) database (https://www.ebi.ac.uk/pride/) is the world's largest data repository of mass spectrometry-based proteomics data. PRIDE is one of the founding members of the global ProteomeXchange (PX) consortium and an ELIXIR core data resource. In this manuscript, we summarize the developments in PRIDE resources and related tools since the previous update manuscript was published in Nucleic Acids Research in 2019. Submissions to PRIDE Archive (the archival component of PRIDE) averaged around 500 datasets per month during 2021. In addition to continuous improvements in PRIDE Archive data pipelines and infrastructure, the PRIDE Spectra Archive has been developed to provide direct access to the submitted mass spectra using Universal Spectrum Identifiers. As a key point, the file format MAGE-TAB for proteomics has been developed to enable the improvement of sample metadata annotation. Additionally, the resource PRIDE Peptidome provides access to aggregated peptide/protein evidences across PRIDE Archive. Furthermore, we describe how PRIDE has increased its efforts to reuse and disseminate high-quality proteomics data into other added-value resources such as UniProt, Ensembl and Expression Atlas.

Databases, Protein , Metadata/statistics & numerical data , Molecular Sequence Annotation/statistics & numerical data , Peptides/chemistry , Proteins/chemistry , Software , Amino Acid Sequence , Bibliometrics , Datasets as Topic , Humans , Information Storage and Retrieval , Internet , Mass Spectrometry , Peptides/genetics , Peptides/metabolism , Proteins/genetics , Proteins/metabolism , Proteomics/instrumentation , Proteomics/methods , Sequence Alignment

The ProteomeXchange consortium in 2020: enabling 'big data' approaches in proteomics.

Deutsch, Eric W; Bandeira, Nuno; Sharma, Vagisha; Perez-Riverol, Yasset; Carver, Jeremy J; Kundu, Deepti J; García-Seisdedos, David; Jarnuczak, Andrew F; Hewapathirana, Suresh; Pullman, Benjamin S; Wertz, Julie; Sun, Zhi; Kawano, Shin; Okuda, Shujiro; Watanabe, Yu; Hermjakob, Henning; MacLean, Brendan; MacCoss, Michael J; Zhu, Yunping; Ishihama, Yasushi; Vizcaíno, Juan A.

Nucleic Acids Res ; 48(D1): D1145-D1152, 2020 01 08.

Article En | MEDLINE | ID: mdl-31686107

The ProteomeXchange (PX) consortium of proteomics resources (http://www.proteomexchange.org) has standardized data submission and dissemination of mass spectrometry proteomics data worldwide since 2012. In this paper, we describe the main developments since the previous update manuscript was published in Nucleic Acids Research in 2017. Since then, in addition to the four existing PX members at the time (PRIDE, PeptideAtlas including the PASSEL resource, MassIVE and jPOST), two new resources have joined PX: iProX (China) and Panorama Public (USA). We first describe the updated submission guidelines, now expanded to include six members. Next, with current data submission statistics, we demonstrate that the proteomics field is now actively embracing public open data policies. At the end of June 2019, more than 14 100 datasets had been submitted to PX resources since 2012, of which more than 9 500 were submitted in just the last three years. In parallel, an unprecedented increase of data re-use activities in the field, including 'big data' approaches, is enabling novel research and new data resources. Finally, we outline some of our plans for the coming years.

Computational Biology/methods , Databases, Protein , Proteomics/methods , Big Data , Data Mining , Software , Software Design , Web Browser

The PRIDE database and related tools and resources in 2019: improving support for quantification data.

Perez-Riverol, Yasset; Csordas, Attila; Bai, Jingwen; Bernal-Llinares, Manuel; Hewapathirana, Suresh; Kundu, Deepti J; Inuganti, Avinash; Griss, Johannes; Mayer, Gerhard; Eisenacher, Martin; Pérez, Enrique; Uszkoreit, Julian; Pfeuffer, Julianus; Sachsenberg, Timo; Yilmaz, Sule; Tiwary, Shivani; Cox, Jürgen; Audain, Enrique; Walzer, Mathias; Jarnuczak, Andrew F; Ternent, Tobias; Brazma, Alvis; Vizcaíno, Juan Antonio.

Nucleic Acids Res ; 47(D1): D442-D450, 2019 01 08.

Article En | MEDLINE | ID: mdl-30395289

The PRoteomics IDEntifications (PRIDE) database (https://www.ebi.ac.uk/pride/) is the world's largest data repository of mass spectrometry-based proteomics data, and is one of the founding members of the global ProteomeXchange (PX) consortium. In this manuscript, we summarize the developments in PRIDE resources and related tools since the previous update manuscript was published in Nucleic Acids Research in 2016. In the last 3 years, public data sharing through PRIDE (as part of PX) has definitely become the norm in the field. In parallel, data re-use of public proteomics data has increased enormously, with multiple applications. We first describe the new architecture of PRIDE Archive, the archival component of PRIDE. PRIDE Archive and the related data submission framework have been further developed to support the increase in submitted data volumes and additional data types. A new scalable and fault tolerant storage backend, Application Programming Interface and web interface have been implemented, as part of an ongoing process. Additionally, we emphasize the improved support for quantitative proteomics data through the mzTab format. Finally, we outline key statistics on the current data contents and volume of downloads, and how PRIDE data are starting to be disseminated to added-value resources including Ensembl, UniProt and Expression Atlas.

Databases, Protein , Mass Spectrometry , Proteomics , Peptides/chemistry , Software