iProX in 2021: connecting proteomics data sharing with big data.

Chen, Tao; Ma, Jie; Liu, Yi; Chen, Zhiguang; Xiao, Nong; Lu, Yutong; Fu, Yinjin; Yang, Chunyuan; Li, Mansheng; Wu, Songfeng; Wang, Xue; Li, Dongsheng; He, Fuchu; Hermjakob, Henning; Zhu, Yunping

Chen T; State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China.
Ma J; State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China.
Liu Y; State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China.
Chen Z; School of Computer Science and Engineering, Sun Yat-Sen University, Guangzhou 26469, China.
Xiao N; School of Computer Science and Engineering, Sun Yat-Sen University, Guangzhou 26469, China.
Lu Y; School of Computer Science and Engineering, Sun Yat-Sen University, Guangzhou 26469, China.
Fu Y; School of Computer Science and Engineering, Sun Yat-Sen University, Guangzhou 26469, China.
Yang C; State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China.
Li M; State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China.
Wu S; State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China.
Wang X; State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China.
Li D; State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China.
He F; State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China.
Hermjakob H; State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China.
Zhu Y; European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.

Nucleic Acids Res; 50(D1): D1522-D1527, 2022 Jan 07.

Article in English | MEDLINE | ID: mdl-34871441

ABSTRACT

The rapid development of proteomics studies has resulted in large volumes of experimental data, and the emergence of big data platforms provides an opportunity to handle them. The integrated proteome resource, iProX (https://www.iprox.cn), which was initiated in 2017, was greatly improved in 2021 with the implementation of an up-to-date big data platform. Here, we describe the main iProX developments since its first publication in Nucleic Acids Research in 2019. First, a hyper-converged architecture with high scalability supports the submission process. A Hadoop cluster stores large volumes of proteomics datasets, and a distributed, RESTful Elasticsearch engine can query millions of records within one second. In addition, several new features have been added to iProX for better open data sharing: the Universal Spectrum Identifier (USI) mechanism proposed by ProteomeXchange, a RESTful Web Service API, and a high-efficiency reanalysis pipeline. By the end of August 2021, 1526 datasets had been submitted to iProX, reaching a total data volume of 92.42 TB. With the big data platform in place, iProX can support petabyte-scale data storage, hundreds of billions of spectrum records, and service latency on the order of seconds, meeting the requirements of the fast-growing field of proteomics.
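
The Universal Spectrum Identifier (USI) mentioned in the abstract is a colon-delimited string that points to a single spectrum in a public dataset. The following is a minimal Python sketch of how such an identifier could be split into its parts. It is not iProX code: the `USI` class and `parse_usi` function are hypothetical names, the set of accepted index types is an assumption, the sketch assumes the run name contains no colons, and the example identifier is purely illustrative.

```python
from dataclasses import dataclass
from typing import Optional

# Index types accepted by this sketch (an assumption, not an exhaustive list).
INDEX_TYPES = {"scan", "index", "nativeId"}


@dataclass(frozen=True)
class USI:
    """Parts of a Universal Spectrum Identifier (hypothetical container)."""
    collection: str                      # dataset accession, e.g. a PXD identifier
    ms_run: str                          # name of the MS run (raw file) in the dataset
    index_type: str                      # how the spectrum is addressed
    index: str                           # spectrum index value, kept as text
    interpretation: Optional[str] = None  # optional peptidoform/charge part


def parse_usi(usi: str) -> USI:
    """Split a USI of the form
    mzspec:<collection>:<msRun>:<indexType>:<index>[:<interpretation>].

    Assumes the run name itself contains no colons. The interpretation
    may contain colons, so splitting stops after the fifth one.
    """
    parts = usi.split(":", 5)
    if len(parts) < 5 or parts[0] != "mzspec":
        raise ValueError(f"not a USI: {usi!r}")
    _, collection, ms_run, index_type, index = parts[:5]
    if index_type not in INDEX_TYPES:
        raise ValueError(f"unsupported index type: {index_type!r}")
    interpretation = parts[5] if len(parts) == 6 else None
    return USI(collection, ms_run, index_type, index, interpretation)


if __name__ == "__main__":
    # Illustrative identifier only; any real dataset accession would do.
    example = "mzspec:PXD000561:Adult_Frontalcortex_bRP_Elite_85_f09:scan:17555:VLHPLEGAVVIIFK/2"
    print(parse_usi(example))
```

Keeping the index as text avoids assumptions about non-numeric index values; a production parser would also need to handle run names that contain colons.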

Subject(s)

Databases, Protein; Proteome/genetics; Proteomics; Software; Big Data; Computational Biology/standards; Information Dissemination

Full text: 1 | Database: MEDLINE | Main subject: Software / Proteome / Databases, Protein / Proteomics | Type of study: Prognostic_studies | Language: En | Year: 2022 | Document type: Article