Results 1 - 20 of 34
1.
Anal Chem ; 96(21): 8474-8483, 2024 May 28.
Article in English | MEDLINE | ID: mdl-38739687

ABSTRACT

Ultraviolet photodissociation (UVPD) mass spectrometry provides insight into protein structure and sequence through fragmentation patterns. While N- and C-terminal fragments are traditionally relied upon, this work highlights the critical role of internal fragments in achieving near-complete sequencing of proteins. Internal fragments have previously been difficult to use because of their abundance and their potential for random matching; these limitations are addressed here with Panda-UV, a novel software tool that combines spectral calibration with Pearson correlation coefficient scoring for confident fragment assignment. Panda-UV was benchmarked comprehensively on three model proteins. Including internal fragments increases the number of identified fragments by 26% and raises the average sequence coverage of intact proteins to 93%, revealing the previously hidden region of carbonic anhydrase II, the largest of the model proteins. Notably, an average of 65% of internal fragments can be identified in multiple replicates, demonstrating the high confidence of the fragments that Panda-UV reports. Finally, the sequence coverage of mAb subunits can be increased to as much as 86%, and the complementarity-determining regions (CDRs) are nearly completely sequenced in a single experiment. The source code of Panda-UV is available at https://github.com/PHOENIXcenter/Panda-UV.


Subject(s)
Mass Spectrometry , Software , Ultraviolet Rays , Proteins/chemistry , Proteins/analysis , Amino Acid Sequence , Animals
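
The Pearson correlation coefficient scoring mentioned in the abstract above can be illustrated with a minimal sketch. The function name, the example intensities, and the choice to compare a theoretical isotopic profile with an observed one are assumptions made for illustration only; none of this is taken from the Panda-UV source code.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined for a constant profile; this sketch
        # treats that case as "no match" (an illustrative choice).
        return 0.0
    return cov / math.sqrt(var_x * var_y)

# Hypothetical relative intensities of five isotopic peaks
theoretical = [0.30, 0.35, 0.22, 0.10, 0.03]
observed = [0.28, 0.37, 0.21, 0.11, 0.03]
score = pearson(theoretical, observed)
print(f"PCC = {score:.3f}")
```

A score close to 1 means the observed peak pattern follows the expected shape; where to place an acceptance threshold is a tuning decision that this sketch does not address.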
2.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38647153

ABSTRACT

Computational drug repositioning, which involves identifying new indications for existing drugs, is an increasingly attractive research area due to its advantages in reducing both overall cost and development time. As a result, a growing number of computational drug repositioning methods have emerged. Heterogeneous network-based drug repositioning methods have been shown to outperform other approaches. However, there is a dearth of systematic evaluation studies of these methods, encompassing performance, scalability and usability, as well as a standardized process for evaluating new methods. Additionally, previous studies have only compared several methods, with conflicting results. In this context, we conducted a systematic benchmarking study of 28 heterogeneous network-based drug repositioning methods on 11 existing datasets. We developed a comprehensive framework to evaluate their performance, scalability and usability. Our study revealed that methods such as HGIMC, ITRPCA and BNNR exhibit the best overall performance, as they rely on matrix completion or factorization. HINGRL, MLMC, ITRPCA and HGIMC demonstrate the best performance, while NMFDR, GROBMC and SCPMF display superior scalability. For usability, HGIMC, DRHGCN and BNNR are the top performers. Building on these findings, we developed an online tool called HN-DREP (http://hn-drep.lyhbio.com/) to facilitate researchers in viewing all the detailed evaluation results and selecting the appropriate method. HN-DREP also provides an external drug repositioning prediction service for a specific disease or drug by integrating predictions from all methods. Furthermore, we have released a Snakemake workflow named HN-DRES (https://github.com/lyhbio/HN-DRES) to facilitate benchmarking and support the extension of new methods into the field.


Subject(s)
Benchmarking , Drug Repositioning , Drug Repositioning/methods , Humans , Computational Biology/methods , Software , Algorithms
3.
Sci Rep ; 13(1): 20444, 2023 11 22.
Article in English | MEDLINE | ID: mdl-37993475

ABSTRACT

Detection of low-frequency variants with high accuracy plays an important role in biomedical research and clinical practice. However, it is challenging to do so with next-generation sequencing (NGS) approaches because of the high error rates of NGS. To accurately distinguish low-level true variants from these errors, many statistical tools for calling low-frequency variants have been proposed, but a systematic performance comparison of these tools has not yet been performed. Here, we evaluated four raw-reads-based variant callers (SiNVICT, outLyzer, Pisces, and LoFreq) and four UMI-based variant callers (DeepSNVMiner, MAGERI, smCounter2, and UMI-VarCal) for their ability to call single nucleotide variants (SNVs) with allelic frequencies as low as 0.025% in deep sequencing data. We analyzed a total of 54 simulated datasets with various sequencing depths and variant allele frequencies (VAFs), two reference datasets, and Horizon Tru-Q sample data. The results showed that the UMI-based callers, except smCounter2, outperformed the raw-reads-based callers regarding detection limit. Sequencing depth had almost no effect on the UMI-based callers but significantly influenced the raw-reads-based callers. Regardless of the sequencing depth, MAGERI showed the fastest analysis, while smCounter2 consistently took the longest to finish the variant calling process. Overall, DeepSNVMiner and UMI-VarCal performed the best, with sensitivity and precision of 88% and 100% for DeepSNVMiner and 84% and 100% for UMI-VarCal. In conclusion, the UMI-based callers, except smCounter2, outperformed the raw-reads-based callers in terms of sensitivity and precision. We recommend using DeepSNVMiner and UMI-VarCal for low-frequency variant detection. The results provide important information regarding future directions for reliable low-frequency variant detection and algorithm development, which is critical in genetics-based medical research and clinical applications.


Subject(s)
Biomedical Research , Polymorphism, Single Nucleotide , Algorithms , Gene Frequency , High-Throughput Nucleotide Sequencing/methods
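
The quantities used in the benchmark above (variant allele frequency, sensitivity, precision) can be written out as a short sketch. The function names, the variant tuples, and the read counts are hypothetical; they are not data or code from the study, and none of the eight callers is invoked.

```python
def variant_allele_frequency(alt_reads, depth):
    """VAF = reads supporting the variant / total reads covering the site."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    return alt_reads / depth

def sensitivity_and_precision(truth, calls):
    """Compare a set of called variants against a set of known true variants."""
    tp = len(truth & calls)   # true variants that were called
    fp = len(calls - truth)   # calls that are not true variants
    fn = len(truth - calls)   # true variants that were missed
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    prec = tp / (tp + fp) if (tp + fp) else 0.0
    return sens, prec

# Hypothetical variants keyed as (chromosome, position, ref, alt)
truth = {("chr7", 100, "A", "T"), ("chr7", 200, "G", "C"),
         ("chr12", 50, "C", "T"), ("chr17", 75, "T", "G")}
calls = {("chr7", 100, "A", "T"), ("chr7", 200, "G", "C"),
         ("chr12", 50, "C", "T"), ("chr1", 10, "G", "A")}

sens, prec = sensitivity_and_precision(truth, calls)
vaf = variant_allele_frequency(alt_reads=10, depth=40_000)
print(f"sensitivity={sens:.2f} precision={prec:.2f} VAF={vaf:.5%}")
```

The last call shows why a 0.025% allelic frequency is demanding: even at a depth of 40,000 reads, only about 10 reads are expected to carry the variant.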
4.
Article in English | MEDLINE | ID: mdl-37864708

ABSTRACT

Detecting low-abundance mutations is of particular interest in biology and medical science. However, most currently available molecular assays have limited sensitivity for the detection of low-abundance mutations. Here, we established a platform for detecting low-level DNA mutations with high sensitivity and accuracy by combining enhanced-ice-COLD-PCR (E-ice-COLD-PCR) with pyrosequencing with di-base addition (PDBA). In the PDBA assay, one di-base (AG, CT, AC, GT, AT, or GC) rather than one base (A, T, C, or G) is selectively added to the reaction at a time during sequencing primer extension, thereby increasing the sequencing intensity. A specific E-ice-COLD-PCR/PDBA assay was developed for the detection of the most frequent BRAF mutation, V600E, to verify the feasibility of our method. The E-ice-COLD-PCR/PDBA assay permitted the reliable detection of as little as 0.007% of mutant alleles in a wild-type background. Furthermore, it required only a small amount of starting material (20 pg) to sensitively detect and identify low-abundance mutations, thus increasing the screening capabilities when DNA material is limited. The E-ice-COLD-PCR/PDBA assay was applied in the current study to clinical formalin-fixed paraffin-embedded (FFPE) and plasma samples, and it enabled the detection of BRAF V600E mutations in samples that appeared to be wild type using PCR/conventional pyrosequencing (CP) and E-ice-COLD-PCR/CP. The E-ice-COLD-PCR/PDBA assay is a rapid, cost-effective, and highly sensitive method that could improve the detection of low-abundance mutations in routine clinical use.

5.
Sheng Wu Gong Cheng Xue Bao ; 39(9): 3579-3593, 2023 Sep 25.
Article in Chinese | MEDLINE | ID: mdl-37805839

ABSTRACT

Data-independent acquisition (DIA) is a high-throughput, unbiased mass spectrometry data acquisition method that offers good quantitative reproducibility and is well suited to low-abundance proteins. In recent years it has become the preferred choice for clinical proteomic studies, especially large cohort studies. The tandem mass spectrometry (MS/MS) spectra generated by DIA are usually heavily mixed, containing fragment ion information from multiple peptides, which makes protein identification and quantification more difficult. Currently, DIA data analysis methods fall into two main categories, namely peptide-centric and spectrum-centric. The peptide-centric strategy is more sensitive for identification and more accurate for quantification. Thus, it has become the mainstream strategy for DIA data analysis and comprises four key steps: building a spectral library, extracting ion chromatograms, feature scoring and statistical quality control. This work reviews the peptide-centric DIA data analysis procedure, introduces the corresponding algorithms and software tools, and summarizes the improvements made to existing algorithms. Finally, future development directions are discussed.


Subject(s)
Peptides , Proteomics , Humans , Proteomics/methods , Reproducibility of Results , Peptides/chemistry , Software , Algorithms , Tandem Mass Spectrometry/methods , Proteome/analysis
6.
Bioinform Adv ; 3(1): vbad057, 2023.
Article in English | MEDLINE | ID: mdl-37128577

ABSTRACT

Summary: De novo peptide sequencing for tandem mass spectrometry data is not only a key technology for novel peptide identification, but also a prerequisite for many downstream tasks, such as vaccine and antibody studies. In recent years, neural network models for de novo peptide sequencing have shown a remarkable ability to accommodate various data sources and have outperformed conventional peptide identification tools. However, the best of these models is computationally expensive, taking up to 1 week to process about 400 000 spectra. This article presents PGPointNovo, a novel neural network-based tool for parallel de novo peptide sequencing. PGPointNovo uses data parallelization technology to accelerate training and inference and mitigates the training difficulties caused by large batch sizes. The results of extensive experiments conducted on multiple datasets of different sizes demonstrate that, compared with PointNovo, a leading neural network-based de novo peptide sequencing tool, PGPointNovo accelerates de novo peptide sequencing by up to 7.35× without compromising precision or recall. Availability and implementation: The source code and the parameter settings are available at https://github.com/shallFun4Learning/PGPointNovo. Supplementary information: Supplementary data are available at Bioinformatics Advances online.

7.
Sheng Wu Gong Cheng Xue Bao ; 39(4): 1815-1824, 2023 Apr 25.
Article in Chinese | MEDLINE | ID: mdl-37154341

ABSTRACT

Antimicrobial peptides (AMPs) are small peptides that are widely found in living organisms and have broad-spectrum antibacterial activity and immunomodulatory effects. Owing to the slower emergence of resistance, excellent clinical potential and wide range of applications, AMPs are a strong alternative to conventional antibiotics. AMP recognition is an important direction in AMP research. The high cost, low efficiency and long turnaround of wet-lab methods prevent them from meeting the need for large-scale AMP recognition. Therefore, computer-aided identification methods are an important supplement to AMP recognition approaches, and one of the key issues is how to improve their accuracy. Protein sequences can be approximated as a language composed of amino acids. Consequently, rich features may be extracted using natural language processing (NLP) techniques. In this paper, we combine the pre-trained model BERT with the fine-tuning structure Text-CNN, both from the field of NLP, to model protein languages; we develop an open-source antimicrobial peptide recognition tool and compare it with five other published tools. The experimental results show that the optimized two-phase training approach brings an overall improvement in accuracy, sensitivity, specificity, and Matthews correlation coefficient, offering a novel approach for further research on AMP recognition.


Subject(s)
Anti-Bacterial Agents , Antimicrobial Cationic Peptides , Anti-Bacterial Agents/pharmacology , Anti-Bacterial Agents/chemistry , Amino Acid Sequence , Antimicrobial Cationic Peptides/pharmacology , Antimicrobial Cationic Peptides/chemistry , Antimicrobial Peptides , Natural Language Processing
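
The four evaluation measures named in the abstract above (accuracy, sensitivity, specificity, and the Matthews correlation coefficient) all derive from a binary confusion matrix. The following is a minimal sketch with hypothetical counts; it is not the evaluation code of the published tool.

```python
import math

def classification_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, specificity and MCC from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is reported as 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "mcc": mcc}

# Hypothetical result: 100 AMPs and 100 non-AMPs in a test set
metrics = classification_metrics(tp=90, tn=80, fp=20, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Unlike accuracy, the MCC uses all four cells of the confusion matrix, so it stays informative when the positive and negative classes are unbalanced.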
8.
Comput Math Methods Med ; 2022: 7300788, 2022.
Article in English | MEDLINE | ID: mdl-36479313

ABSTRACT

Hepatocellular carcinoma (LIHC) is the fifth most common cancer worldwide, and it requires effective diagnosis and treatment to prevent aggressive metastasis. The purpose of this study was to construct a machine learning-based model for the diagnosis of liver cancer. Weighted correlation network analysis (WGCNA), univariate analysis, Lasso-Cox regression analysis, and protein-protein interaction network analysis were used to construct gene networks from transcriptome data of hepatocellular carcinoma patients and to find hub genes for machine learning. Five models, namely gradient boosting, random forest, support vector machine, logistic regression, and integrated learning, were used to identify a multigene prediction model for patients. Immunological assessment, TP53 gene mutation and promoter methylation level analysis, and KEGG pathway analysis were performed on these groups. Potential drug molecular targets for the corresponding hepatocellular carcinomas were obtained by molecular docking. The analysis screened out 2 modules that may be relevant to the survival of hepatocellular carcinoma patients and produced 5 diagnostic models and multiple interaction networks. The modes of action of drug-molecule interactions that may be effective against the hepatocellular carcinoma core genes CCNA2, CCNB1, and CDK1 were investigated. This study is expected to provide research ideas for the early diagnosis of hepatocellular carcinoma.


Subject(s)
Carcinoma, Hepatocellular , Liver Neoplasms , Humans , Carcinoma, Hepatocellular/genetics , Liver Neoplasms/genetics , Molecular Docking Simulation , Machine Learning
9.
Sheng Wu Gong Cheng Xue Bao ; 38(10): 3616-3627, 2022 Oct 25.
Article in Chinese | MEDLINE | ID: mdl-36305397

ABSTRACT

Cancer is a heterogeneous disease with complex mechanisms that requires targeted precision medicine strategies. The growth of precision medicine is inseparable from the rapid development of genomics. However, genomics has certain limitations in molecular phenotype analysis, and proteogenomics, the merging of proteomics and genomics, arose in response. This review describes the limitations of genomic analysis and highlights the importance of proteogenomics, revisiting precision oncology from a proteogenomic perspective. In addition, the application of proteogenomics in precision oncology is briefly introduced, the related public data projects are described, and finally, the challenges that need to be addressed at this stage are outlined.


Subject(s)
Neoplasms , Proteogenomics , Humans , Precision Medicine , Neoplasms/genetics , Proteomics , Genomics
10.
Front Oncol ; 12: 847706, 2022.
Article in English | MEDLINE | ID: mdl-35651795

ABSTRACT

Gastric cancer (GC) is one of the most common malignant tumors with a high mortality rate worldwide and lacks effective methods for prognosis prediction. Postoperative adjuvant chemotherapy is the first-line treatment for advanced gastric cancer, but only a subgroup of patients benefits from it. Here, we used 833 formalin-fixed, paraffin-embedded resected tumor samples from patients with TNM stage II/III GC and established a proteomic subtyping workflow using 100 deep-learned features. Two proteomic subtypes (S-I and S-II) with overall survival differences were identified. S-I has a better survival rate and is sensitive to chemotherapy. Patients in the S-I who received adjuvant chemotherapy had a significant improvement in the 5-year overall survival rate compared with patients who received surgery alone (65.3% vs 52.6%; log-rank P = 0.014), but no improvement was observed in the S-II (54% vs 51%; log-rank P = 0.96). These results were verified in an independent validation set. Furthermore, we also evaluated the superiority and scalability of the deep learning-based workflow in cancer molecular subtyping, exhibiting its great utility and potential in prognosis prediction and therapeutic decision-making.

11.
Biol Direct ; 17(1): 13, 2022 06 05.
Article in English | MEDLINE | ID: mdl-35659725

ABSTRACT

BACKGROUND: The evolution of spliceosomal introns has been widely studied among various eukaryotic groups. Researchers have nearly reached a consensus on the patterns and mechanisms of intron losses and gains across eukaryotes. However, according to previous studies that analyzed a few genes or genomes, Nematoda seems to be an eccentric group. RESULTS: Taking advantage of the recent accumulation of sequenced genomes, we extensively analyzed intron losses and gains using 104 nematode genomes across all five Clades of the phylum. Nematodes have a wide range of intron density, from less than one to more than nine per kbp of coding sequence. The rates of intron losses and gains exhibit significant heterogeneity both across different nematode lineages and across different evolutionary stages of the same lineage. The frequency of intron losses far exceeds that of intron gains. Five pieces of evidence supporting the model of cDNA-mediated intron loss were observed in ten Caenorhabditis species: the dominance of precise intron losses, frequent loss of adjacent introns, high-level expression of the intron-lost genes, preferential loss of short introns, and preferential loss of introns close to the 3'-ends of genes. As in studies of most eukaryotic groups, we could not find the source sequences for the limited number of intron gains detected in the Caenorhabditis genomes. CONCLUSIONS: These results indicate that nematodes are a typical eukaryotic group rather than an outlier in intron evolution.


Subject(s)
Nematoda , Animals , Base Sequence , Eukaryota/genetics , Evolution, Molecular , Introns , Nematoda/genetics , Phylogeny , Spliceosomes/genetics
12.
Front Plant Sci ; 13: 839457, 2022.
Article in English | MEDLINE | ID: mdl-35242159

ABSTRACT

The plant circadian clock coordinates endogenous transcriptional rhythms with diurnal changes in environmental cues. OsPRR37, a negative component of the rice circadian clock, reportedly regulates transcriptome rhythms and agronomically important traits. However, the underlying regulatory mechanisms of OsPRR37-output genes remain largely unknown. In this study, whole genome bisulfite sequencing and high-throughput RNA sequencing were applied to verify the role of DNA methylation in the transcriptional control of OsPRR37-output genes. We found that the overexpression of OsPRR37 suppressed rice growth and altered cytosine methylation in the CG and CHG sequence contexts but not in the CHH context (H represents A, T, or C). In total, 35 overlapping genes were identified, and 25 of them showed a negative correlation between methylation level and gene expression. The promoter of the hexokinase gene OsHXK1 was hypomethylated at both CG and CHG sites, and the expression of OsHXK1 was significantly increased. Meanwhile, the leaf starch content was consistently lower in OsPRR37 overexpression lines than in the recipient parent Guangluai 4. Further analysis with published time-course transcriptome data revealed that most overlapping genes showed peak expression phases from dusk to dawn. The genes involved in DNA methylation, methylation maintenance, and DNA demethylation were found to be actively expressed around dusk. A DNA glycosylase, namely ROS1A/DNG702, was probably the upstream candidate that demethylated the promoter of OsHXK1. Taken together, our results revealed that CG and CHG methylation contribute to the transcriptional regulation of OsPRR37-output genes, and that hypomethylation of OsHXK1 leads to decreased starch content and reduced plant growth in rice.

13.
J Proteomics ; 232: 104070, 2021 02 10.
Article in English | MEDLINE | ID: mdl-33307250

ABSTRACT

Spectral similarity calculation is widely used in protein identification tools and mass spectra clustering algorithms when comparing theoretical or experimental spectra. The performance of the spectral similarity calculation plays an important role in these tools and algorithms, especially in the analysis of large-scale datasets. Recently, deep learning methods have been proposed to improve the performance of clustering algorithms and protein identification by training the algorithms with existing data and by using multiple spectra and identified peptide features. While the efficiency of these algorithms in comparison with traditional approaches is still under study, their application in proteomics data analysis is becoming more common. Here, we propose the use of deep learning to improve spectral similarity comparison. We assessed the performance of deep learning for spectral similarity with GLEAMS and a newly trained embedder model (DLEAMSE), which uses high-quality spectra from PRIDE Cluster. We also developed a new bioinformatics tool (mslookup - https://github.com/bigbio/DLEAMSE/) that allows users to quickly search for spectra among previously identified mass spectra published in public repositories and spectral libraries. Finally, we released a human database to enable bioinformaticians and biologists to search for identified spectra on their own machines. SIGNIFICANCE STATEMENT: Spectral similarity calculation plays an important role in proteomics data analysis. Because deep learning can learn implicit and effective features from large-scale training datasets, deep learning-based MS/MS spectra embedding models have emerged as a way to improve the similarity calculations used in mass spectral clustering. We compare multiple similarity scoring and deep learning methods in terms of accuracy (computing the similarity for a pair of mass spectra) and computing-time performance. The benchmark results showed no major differences in accuracy between DLEAMSE and the normalized dot product (NDP) for spectrum similarity calculations. The DLEAMSE GPU implementation is faster than NDP in preprocessing on the GPU server, and the similarity calculation of DLEAMSE (Euclidean distance on 32-D vectors) takes about 1/3 of the time of dot product calculations. The encoding and embedding steps of the deep learning model (DLEAMSE) need to be run only once for each spectrum, and the embedded 32-D points can be persisted in the repository, which speeds up future comparisons and large-scale analyses. Based on these results, we proposed a new tool, mslookup, that enables researchers to find spectra previously identified in public data. The tool can also be used to generate in-house databases of previously identified spectra to share with other laboratories and consortia.


Subject(s)
Deep Learning , Tandem Mass Spectrometry , Algorithms , Cluster Analysis , Databases, Protein , Humans , Proteomics , Software
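
The two similarity measures compared in the abstract above, the normalized dot product on binned spectra and the Euclidean distance on embedded vectors, can be sketched as follows. The binning scheme, the bin width, and the example peaks are assumptions made for illustration; no DLEAMSE model is invoked, and the short vectors in the last line merely stand in for 32-D embeddings.

```python
import math

def bin_spectrum(peaks, bin_width=1.0, max_mz=2000.0):
    """Turn a list of (m/z, intensity) peaks into a fixed-length intensity vector."""
    vector = [0.0] * (int(max_mz / bin_width) + 1)
    for mz, intensity in peaks:
        if 0 <= mz <= max_mz:
            vector[int(mz / bin_width)] += intensity
    return vector

def normalized_dot_product(a, b):
    """Cosine of the angle between two intensity vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def euclidean_distance(u, v):
    """Straight-line distance between two embedding vectors (0.0 = identical)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

# Hypothetical peak lists for two spectra
spectrum_a = [(175.1, 40.0), (276.2, 100.0), (389.3, 65.0)]
spectrum_b = [(175.1, 35.0), (276.2, 90.0), (502.4, 20.0)]
ndp = normalized_dot_product(bin_spectrum(spectrum_a), bin_spectrum(spectrum_b))
print(f"normalized dot product = {ndp:.3f}")

# Stand-ins for learned embeddings (a real embedder would produce 32 values each)
print(euclidean_distance([0.1, 0.4, 0.2], [0.1, 0.5, 0.0]))
```

The distance is computed on short fixed-length vectors that can be stored once per spectrum, whereas the dot product here runs over every bin of the binned spectra.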
14.
Ann Transplant ; 25: e925013, 2020 Sep 04.
Article in English | MEDLINE | ID: mdl-32883945

ABSTRACT

BACKGROUND: Oncolytic viruses (OVs) can specifically infect and kill tumor cells. Adeno-associated virus (AAV) is a widely studied OV. This study aimed to construct a tumor-targeted recombinant AAV (rAAV) using genetic engineering technology. MATERIAL AND METHODS: The transgene plasmid pAAV-HE1B19K-TE1A was constructed with 4 genes (hTERT, E1A, HKII, and E1B19K) and co-transfected with pAAV-RC and pHelper into tumor cells (HepG2, A549, BGC-803) and normal cells (HUVEC). rAAV was verified with fluorescence microscopy. A quantitative PCR (qPCR) assay was used to test the titer of rAAV in each cell line. Apoptosis was analyzed using qPCR and Western blot assays. MTT was used to detect the effect of rAAV on cell viability. RESULTS: The pAAV-HE1B19K-TE1A transgene plasmid was successfully constructed. pAAV-HE1B19K-TE1A was highly expressed in all tumor cells. The titers of pAAV-HE1B19K-TE1A in HepG2, A549, and BGC-803 were 7.4×10⁷, 1.4×10⁸, and 1.1×10⁸ gc/µl, respectively. pAAV-HE1B19K-TE1A significantly decreased the viability of tumor cells compared to that of HUVEC (p<0.05). pAAV-HE1B19K-TE1A markedly triggered cleaved caspase 3 (C-caspase 3) activity in tumor cells compared to that in untransfected tumor cells (p<0.05). pAAV-HE1B19K-TE1A significantly induced release of cytochrome C (Cyto C) in tumor cells compared to that in untransfected tumor cells (p<0.05). pAAV-HE1B19K-TE1A demonstrated no toxicity to vital tissues of animals. CONCLUSIONS: Tumor-targeted rAAV was successfully produced using the Helper-free system with a recombinant plasmid, demonstrating high efficacy in decreasing the viability of tumor cells without adverse effects on normal cells.


Subject(s)
Antineoplastic Agents/administration & dosage , Cell Survival/drug effects , Dependovirus , Hep G2 Cells/drug effects , Human Umbilical Vein Endothelial Cells/drug effects , Humans , Transfection
15.
Proteomics ; 20(21-22): e1900344, 2020 11.
Article in English | MEDLINE | ID: mdl-32643271

ABSTRACT

Since the launch of the Chinese Human Proteome Project (CNHPP) and the Clinical Proteomic Tumor Analysis Consortium (CPTAC), large-scale mass spectrometry (MS)-based proteomic profiling of different kinds of human tumor samples has provided a huge amount of valuable data for both basic and clinical researchers. Accurate prediction of tumor and non-tumor samples, as well as of tumor types, has become a key step in biological and medical research, such as biomarker discovery, diagnosis, and monitoring of diseases. The traditional MS-based classification strategy mainly depends on the identification and quantification results of MS data, which has some inherent limitations, such as the low identification rate of MS data. Here, a deep learning-based tumor classifier that directly uses MS raw data is proposed, which is independent of the identification and quantification results of MS data. First, the potential precursors, with their intensities and retention times, are detected and extracted from the MS data as input. Then, a deep learning-based classifier is trained, which can accurately distinguish between tumor and non-tumor samples. Finally, it is demonstrated that the deep learning-based classifier performs well compared with other machine learning methods and may help researchers find potential biomarkers that are likely to be missed by the traditional strategy.


Subject(s)
Deep Learning , Neoplasms , Proteomics , Humans , Mass Spectrometry , Proteome
16.
Proteomics ; 20(21-22): e1900345, 2020 11.
Article in English | MEDLINE | ID: mdl-32574431

ABSTRACT

Spectrum prediction using machine learning or deep learning models is an emerging method in computational proteomics. Several deep learning-based MS/MS spectrum prediction tools have been developed and have shown their potential not only for increasing the sensitivity and accuracy of data-dependent acquisition search engines, but also for building spectral libraries for data-independent acquisition analysis. Different tools, with their unique algorithms and implementations, may perform differently. Hence, it is necessary to evaluate these tools systematically to find out their preferences and intrinsic differences. In this study, multiple datasets with different collision energies, enzymes, instruments, and species are used to evaluate the performance of the deep learning-based MS/MS spectrum prediction tools, as well as the machine learning-based tool MS2PIP. The evaluations may provide helpful insights and guidelines on spectrum prediction tools for researchers in the field.


Subject(s)
Proteomics , Tandem Mass Spectrometry , Algorithms , Machine Learning , Search Engine
17.
Int J Mol Sci ; 21(2)2020 Jan 11.
Article in English | MEDLINE | ID: mdl-31940793

ABSTRACT

Protein-protein interaction (PPI) sites play a key role in the formation of protein complexes, which is the basis of a variety of biological processes. Experimental methods to solve PPI sites are expensive and time-consuming, which has led to the development of different kinds of prediction algorithms. We propose a convolutional neural network for PPI site prediction and use residue binding propensity to improve the positive samples. Our method obtains a remarkable result of the area under the curve (AUC) = 0.912 on the improved data set. In addition, it yields much better results on samples with high binding propensity than on randomly selected samples. This suggests that there are considerable false-positive PPI sites in the positive samples defined by the distance between residue atoms.


Subject(s)
Neural Networks, Computer , Protein Interaction Mapping/methods , Animals , Binding Sites , Datasets as Topic/standards , Humans , Protein Binding , Protein Interaction Mapping/standards , Reproducibility of Results
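
The area under the curve (AUC) reported in the abstract above can be read as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below computes it by direct pairwise comparison; the scores are hypothetical and the quadratic-time loop is chosen for clarity, not speed. It is not the evaluation code of the study.

```python
def auc_from_scores(positive_scores, negative_scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    if not positive_scores or not negative_scores:
        raise ValueError("need at least one positive and one negative score")
    correct = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positive_scores) * len(negative_scores))

# Hypothetical predicted probabilities for interface and non-interface residues
interface = [0.9, 0.7, 0.4]
non_interface = [0.6, 0.3, 0.2]
auc = auc_from_scores(interface, non_interface)
print(f"AUC = {auc:.3f}")
```

Because the AUC depends only on the ranking of scores, mislabeled samples in the positive set lower it even when the model's ranking is sound, which is consistent with the abstract's point about false-positive PPI sites.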
18.
Curr Mol Med ; 20(6): 429-441, 2020.
Article in English | MEDLINE | ID: mdl-31782363

ABSTRACT

BACKGROUND: Bipolar disorder (BD) is a type of chronic emotional disorder with a complex genetic structure. However, its molecular genetic mechanism is still unclear, which hampers its diagnosis and treatment. METHODS AND RESULTS: In this paper, we propose a model for predicting BD based on single nucleotide polymorphisms (SNPs) screened by genome-wide association study (GWAS); the model is a convolutional neural network (CNN) that predicts the probability of the disease. Two datasets, named group P001 and group P005, were defined according to different GWAS thresholds, and a different convolutional neural network was set up for each. The training accuracy of the model trained with group P001 data is 96%, and the test accuracy is 91%. The training accuracy of the model trained with group P005 data is 94.5%, and the test accuracy is 92%. In addition, we used gradient-weighted class activation mapping (Grad-CAM) to interpret the prediction model and thereby indirectly identify high-risk SNPs of BD. Finally, we compared these high-risk SNPs with human gene annotation information. CONCLUSION: The model prediction results of group P001 yielded 137 risk genes, of which 22 have been reported to be associated with the occurrence of BD. The model prediction results of group P005 yielded 407 risk genes, of which 51 have been reported to be associated with the occurrence of BD.


Subject(s)
Bipolar Disorder/genetics , Genome-Wide Association Study/methods , Neural Networks, Computer , Polymorphism, Single Nucleotide/genetics , Genetic Predisposition to Disease/genetics , Humans , Molecular Sequence Annotation
19.
Nucleic Acids Res ; 47(D1): D1211-D1217, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30252093

ABSTRACT

Sharing of research data in public repositories has become best practice in academia. With the accumulation of massive data, network bandwidth and storage requirements are rapidly increasing. The ProteomeXchange (PX) consortium implements a mode of centralized metadata and distributed raw data management, which promotes effective data sharing. To facilitate open access of proteome data worldwide, we have developed the integrated proteome resource iProX (http://www.iprox.org) as a public platform for collecting and sharing raw data, analysis results and metadata obtained from proteomics experiments. The iProX repository employs a web-based proteome data submission process and open sharing of mass spectrometry-based proteomics datasets. Also, it deploys extensive controlled vocabularies and ontologies to annotate proteomics datasets. Users can use a GUI to provide and access data through a fast Aspera-based transfer tool. iProX is a full member of the PX consortium; all released datasets are freely accessible to the public. iProX is based on a high availability architecture and has been deployed as part of the proteomics infrastructure of China, ensuring long-term and stable resource support. iProX will facilitate worldwide data analysis and sharing of proteomics experiments.


Subject(s)
Computational Biology/methods , Databases, Protein , Proteome/metabolism , Proteomics/methods , Animals , Humans , Information Storage and Retrieval/methods , Internet , Metadata/statistics & numerical data , User-Computer Interface
20.
Sheng Wu Gong Cheng Xue Bao ; 34(10): 1567-1578, 2018 Oct 25.
Article in Chinese | MEDLINE | ID: mdl-30394024

ABSTRACT

Mass spectrometry and database searching are necessary to identify proteins and peptides. With the rapid development of mass spectrometry technology, proteomic mass spectrometry data are acquired very quickly, providing a powerful means of identifying proteins and peptides on a large scale and bringing mass spectrometry-based proteomics research increasingly into the mainstream. The traditional database searching method has many limitations in identifying post-translational modifications of peptides. This paper systematically reviews the development, theoretical concepts and applications of the spectral network method, as well as the advantages of spectral network libraries for identifying peptides.


Subject(s)
Peptides/chemistry , Protein Processing, Post-Translational , Proteins/chemistry , Databases, Protein , Mass Spectrometry , Proteomics
...