PractiCPP: a deep learning approach tailored for extremely imbalanced datasets in cell-penetrating peptide prediction.
Shi, Kexin; Xiong, Yuanpeng; Wang, Yu; Deng, Yifan; Wang, Wenjia; Jing, Bingyi; Gao, Xin.
Affiliation
  • Shi K; Syneron Technology, Guangzhou 510000, China.
  • Xiong Y; Individualized Interdisciplinary Program (Data Science and Analytics), The Hong Kong University of Science and Technology, Hong Kong SAR, China.
  • Wang Y; Syneron Technology, Guangzhou 510000, China.
  • Deng Y; Syneron Technology, Guangzhou 510000, China.
  • Wang W; Syneron Technology, Guangzhou 510000, China.
  • Jing B; Data Science and Analytics Thrust, The Hong Kong University of Science and Technology (Guangzhou), Nansha, Guangzhou, 511400, Guangdong, China.
  • Gao X; Department of Statistics and Data Science, Southern University of Science and Technology, Shenzhen 518000, China.
Bioinformatics; 40(2), 2024-02-01.
Article in English | MEDLINE | ID: mdl-38305405
ABSTRACT
MOTIVATION:

Effective drug delivery systems are paramount in enhancing pharmaceutical outcomes, particularly through the use of cell-penetrating peptides (CPPs). These peptides are gaining prominence due to their ability to penetrate eukaryotic cells efficiently without inflicting significant damage to the cellular membrane, thereby ensuring optimal drug delivery. However, despite advances in proteomics, the identification and characterization of CPPs remain a challenge because conventional methods are laborious and time-consuming. Moreover, current computational models are predominantly tailored for balanced datasets, an approach that falls short in real-world applications characterized by a scarcity of known positive CPP instances.

RESULTS:

To navigate this shortfall, we introduce PractiCPP, a novel deep-learning framework tailored for CPP prediction in highly imbalanced data scenarios. Uniquely designed with the integration of hard negative sampling and a sophisticated feature extraction and prediction module, PractiCPP facilitates an intricate understanding and learning from imbalanced data. Our extensive computational validations highlight PractiCPP's exceptional ability to outperform existing state-of-the-art methods, demonstrating remarkable accuracy, even in datasets with an extreme positive-to-negative ratio of 1:1000. Furthermore, through methodical embedding visualizations, we have established that models trained on balanced datasets are not conducive to practical, large-scale CPP identification, as they do not accurately reflect real-world complexities. In summary, PractiCPP potentially offers new perspectives in CPP prediction methodologies. Its design and validation, informed by real-world dataset constraints, suggest its utility as a valuable tool in supporting the acceleration of drug delivery advancements.

AVAILABILITY AND IMPLEMENTATION:

The source code of PractiCPP is available on Figshare at https://doi.org/10.6084/m9.figshare.25053878.v1.
Subject(s)

Full text: 1 | Collection: 01-international | Database: MEDLINE | Main subject: Cell-Penetrating Peptides / Deep Learning | Type of study: Prognostic_studies / Risk_factors_studies | Language: En | Journal: Bioinformatics | Journal subject: MEDICAL INFORMATICS | Year: 2024 | Document type: Article | Affiliation country: China

...