Results 1-20 of 34
1.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38647153

ABSTRACT

Computational drug repositioning, which involves identifying new indications for existing drugs, is an increasingly attractive research area because it reduces both overall cost and development time. As a result, a growing number of computational drug repositioning methods have emerged, and heterogeneous network-based methods have been shown to outperform other approaches. However, there is a dearth of systematic evaluations of these methods covering performance, scalability and usability, and there is no standardized process for evaluating new methods. Additionally, previous studies have compared only a few methods, with conflicting results. In this context, we conducted a systematic benchmarking study of 28 heterogeneous network-based drug repositioning methods on 11 existing datasets, using a comprehensive framework we developed to evaluate their performance, scalability and usability. Our study revealed that methods such as HGIMC, ITRPCA and BNNR exhibit the best overall performance, as they rely on matrix completion or factorization. On predictive performance alone, HINGRL, MLMC, ITRPCA and HGIMC perform best, while NMFDR, GROBMC and SCPMF display superior scalability. For usability, HGIMC, DRHGCN and BNNR are the top performers. Building on these findings, we developed an online tool called HN-DREP (http://hn-drep.lyhbio.com/) to help researchers view all the detailed evaluation results and select an appropriate method. HN-DREP also provides an external drug repositioning prediction service for a specific disease or drug by integrating predictions from all methods. Furthermore, we have released a Snakemake workflow named HN-DRES (https://github.com/lyhbio/HN-DRES) to facilitate benchmarking and support the extension of new methods into the field.


Subject(s)
Benchmarking; Drug Repositioning; Drug Repositioning/methods; Humans; Computational Biology/methods; Software; Algorithms
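The top-ranked methods are described as relying on matrix completion or factorization. The sketch below is not HGIMC, ITRPCA or BNNR; it is a generic, minimal illustration of the shared idea, assuming a toy binary drug-disease association matrix in which only some entries are observed. Low-rank drug and disease factors are fitted by stochastic gradient descent on the observed entries only, and unobserved pairs are then scored from those factors. The rank, learning rate, regularization weight and toy data are hypothetical choices.

```python
import random

def predict(U, V, i, j):
    """Association score for drug i and disease j: dot product of their latent factors."""
    return sum(a * b for a, b in zip(U[i], V[j]))

def mse(U, V, triples):
    """Mean squared error over observed (drug, disease, value) triples."""
    return sum((r - predict(U, V, i, j)) ** 2 for i, j, r in triples) / len(triples)

def factorize(observed, n_drugs, n_diseases, rank=2, lr=0.05, reg=0.01,
              epochs=300, seed=0):
    """Fit R ~ U V^T by SGD on observed entries only (a matrix-completion objective)."""
    rng = random.Random(seed)
    U = [[rng.uniform(0.0, 0.1) for _ in range(rank)] for _ in range(n_drugs)]
    V = [[rng.uniform(0.0, 0.1) for _ in range(rank)] for _ in range(n_diseases)]
    triples = list(observed)  # copy so shuffling does not touch the caller's list
    for _ in range(epochs):
        rng.shuffle(triples)
        for i, j, r in triples:
            err = r - predict(U, V, i, j)
            for k in range(rank):
                u, v = U[i][k], V[j][k]
                # Gradient step on squared error plus an L2 penalty on both factors.
                U[i][k] += lr * (err * v - reg * u)
                V[j][k] += lr * (err * u - reg * v)
    return U, V

# Hypothetical toy data: 4 drugs x 3 diseases, 1.0 = known association, 0.0 = known non-association.
observed = [(0, 0, 1.0), (0, 1, 1.0), (1, 0, 1.0), (1, 1, 1.0),
            (2, 2, 1.0), (3, 2, 1.0), (2, 0, 0.0), (3, 1, 0.0)]
U, V = factorize(observed, n_drugs=4, n_diseases=3)
print(round(predict(U, V, 1, 2), 3))  # score for an unobserved drug-disease pair
```

Skipping unobserved pairs, rather than treating them as zeros, is what distinguishes completion from plain factorization; published methods add network side information and different solvers on top of this core.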
2.
Anal Chem ; 96(21): 8474-8483, 2024 05 28.
Article in English | MEDLINE | ID: mdl-38739687

ABSTRACT

Ultraviolet photodissociation (UVPD) mass spectrometry provides insights into protein structure and sequence through fragmentation patterns. While N- and C-terminal fragments are traditionally relied upon, this work highlights the critical role of internal fragments in achieving near-complete protein sequencing. Previous limitations in using internal fragments, owing to their abundance and potential for random matching, are addressed here with Panda-UV, a novel software tool that combines spectral calibration and Pearson correlation coefficient scoring for confident fragment assignment. Panda-UV was benchmarked comprehensively on three model proteins. Including internal fragments increases the number of identified fragments by 26% and raises average sequence coverage to 93% for intact proteins, revealing previously hidden regions of carbonic anhydrase II, the largest of the model proteins. Notably, an average of 65% of internal fragments are identified across multiple replicates, demonstrating the high confidence of the fragments Panda-UV reports. Finally, the sequence coverage of mAb subunits can be increased to up to 86%, and the complementarity-determining regions (CDRs) are nearly completely sequenced in a single experiment. The source code of Panda-UV is available at https://github.com/PHOENIXcenter/Panda-UV.


Subject(s)
Mass Spectrometry; Software; Ultraviolet Rays; Proteins/chemistry; Proteins/analysis; Amino Acid Sequence; Animals
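Panda-UV is described as scoring fragment assignments with the Pearson correlation coefficient. As a minimal sketch, assuming the correlation is taken between observed peak intensities and a theoretical isotope distribution for one candidate fragment (the abstract does not say exactly which vectors are correlated), the score and a hypothetical acceptance cutoff of 0.9 could look like this:

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length intensity vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat vector carries no shape information
    return sxy / math.sqrt(sxx * syy)

def accept_fragment(observed, theoretical, threshold=0.9):
    """Hypothetical rule: keep an assignment whose isotope pattern correlates strongly."""
    return pearson(observed, theoretical) >= threshold

# Observed isotope peak intensities vs. a hypothetical theoretical envelope.
print(accept_fragment([100.0, 80.0, 35.0], [1.0, 0.82, 0.33]))  # True
```

Because the correlation ignores absolute scale, the observed intensities do not need to be normalized to the theoretical envelope first; the shape of the pattern is what is scored.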
3.
Nucleic Acids Res ; 47(D1): D1211-D1217, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30252093

ABSTRACT

Sharing research data in public repositories has become best practice in academia. With the accumulation of massive data, network bandwidth and storage requirements are rapidly increasing. The ProteomeXchange (PX) consortium implements a model of centralized metadata and distributed raw data management, which promotes effective data sharing. To facilitate open access to proteome data worldwide, we have developed the integrated proteome resource iProX (http://www.iprox.org) as a public platform for collecting and sharing raw data, analysis results and metadata obtained from proteomics experiments. The iProX repository employs a web-based proteome data submission process and openly shares mass spectrometry-based proteomics datasets. It also uses extensive controlled vocabularies and ontologies to annotate proteomics datasets. Users can submit and access data through a GUI and a fast Aspera-based transfer tool. iProX is a full member of the PX consortium, and all released datasets are freely accessible to the public. iProX is built on a high-availability architecture and has been deployed as part of China's proteomics infrastructure, ensuring long-term, stable resource support. iProX will facilitate worldwide analysis and sharing of proteomics data.


Subject(s)
Computational Biology/methods; Databases, Protein; Proteome/metabolism; Proteomics/methods; Animals; Humans; Information Storage and Retrieval/methods; Internet; Metadata/statistics & numerical data; User-Computer Interface
4.
Proteomics ; 20(21-22): e1900345, 2020 11.
Article in English | MEDLINE | ID: mdl-32574431

ABSTRACT

Spectrum prediction using machine learning or deep learning models is an emerging method in computational proteomics. Several deep learning-based MS/MS spectrum prediction tools have been developed and have shown their potential both for increasing the sensitivity and accuracy of data-dependent acquisition search engines and for building spectral libraries for data-independent acquisition analysis. Different tools, with their own algorithms and implementations, may perform differently. Hence, it is necessary to evaluate these tools systematically to identify the conditions each tool favors and their intrinsic differences. In this study, multiple datasets covering different collision energies, enzymes, instruments and species are used to evaluate the performance of the deep learning-based MS/MS spectrum prediction tools, as well as the machine learning-based tool MS2PIP. The evaluations may provide helpful insights and guidelines for researchers using spectrum prediction tools.


Subject(s)
Proteomics; Tandem Mass Spectrometry; Algorithms; Machine Learning; Search Engine
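The abstract does not name the similarity metrics used in the evaluation. One widely used choice for comparing a predicted MS/MS spectrum with an experimental one is the normalized spectral contrast angle; the sketch below assumes both spectra are already annotated, so intensities can be aligned by fragment ion label. The labels and intensity values are hypothetical.

```python
import math

def spectral_angle(predicted, observed):
    """Normalized spectral contrast angle: 1 = identical shape, 0 = orthogonal.

    Spectra are dicts mapping an annotated fragment label (e.g. 'y3') to intensity;
    an ion missing from one spectrum counts as zero intensity there.
    """
    labels = sorted(set(predicted) | set(observed))
    a = [predicted.get(k, 0.0) for k in labels]
    b = [observed.get(k, 0.0) for k in labels]
    norm_a = math.sqrt(sum(v * v for v in a))
    norm_b = math.sqrt(sum(v * v for v in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    cosine = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    cosine = max(-1.0, min(1.0, cosine))  # guard acos against rounding error
    return 1.0 - 2.0 * math.acos(cosine) / math.pi

pred = {"b2": 0.2, "y3": 1.0, "y4": 0.6}
obs = {"b2": 0.25, "y3": 0.9, "y4": 0.5, "y5": 0.1}
print(round(spectral_angle(pred, obs), 3))
```

Mapping the cosine through the arccosine spreads out differences between spectra that are all highly similar, which is why the angle is often preferred over raw cosine similarity for ranking prediction tools.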
5.
Proteomics ; 20(21-22): e1900344, 2020 11.
Article in English | MEDLINE | ID: mdl-32643271

ABSTRACT

Since the launch of the Chinese Human Proteome Project (CNHPP) and the Clinical Proteomic Tumor Analysis Consortium (CPTAC), large-scale mass spectrometry (MS)-based proteomic profiling of different kinds of human tumor samples has provided a huge amount of valuable data for both basic and clinical researchers. Accurate prediction of tumor versus non-tumor samples, as well as of tumor type, has become a key step in biological and medical research such as biomarker discovery and disease diagnosis and monitoring. The traditional MS-based classification strategy depends mainly on the identification and quantification results of MS data, which has inherent limitations such as the low identification rate of MS data. Here, we propose a deep learning-based tumor classifier that uses MS raw data directly and is therefore independent of identification and quantification results. First, potential precursors, together with their intensities and retention times, are detected and extracted from the MS data as input. Then, a deep learning-based classifier is trained that can accurately distinguish between tumor and non-tumor samples. Finally, the classifier is shown to perform well compared with other machine learning methods, and it may help researchers find potential biomarkers that are likely to be missed by the traditional strategy.


Subject(s)
Deep Learning; Neoplasms; Proteomics; Humans; Mass Spectrometry; Proteome
6.
Int J Mol Sci ; 21(2)2020 Jan 11.
Article in English | MEDLINE | ID: mdl-31940793

ABSTRACT

Protein-protein interaction (PPI) sites play a key role in the formation of protein complexes, which underlie a variety of biological processes. Experimental methods for identifying PPI sites are expensive and time-consuming, which has led to the development of various prediction algorithms. We propose a convolutional neural network for PPI site prediction and use residue binding propensity to refine the positive samples. Our method achieves an area under the curve (AUC) of 0.912 on the refined dataset. In addition, it yields much better results on samples with high binding propensity than on randomly selected samples. This suggests that positive samples defined by the distance between residue atoms contain a considerable number of false-positive PPI sites.


Subject(s)
Neural Networks, Computer; Protein Interaction Mapping/methods; Animals; Binding Sites; Datasets as Topic/standards; Humans; Protein Binding; Protein Interaction Mapping/standards; Reproducibility of Results
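The reported AUC of 0.912 is the area under the ROC curve. A minimal way to compute it, shown as an illustration rather than the authors' evaluation code, uses the equivalent Mann-Whitney formulation: the fraction of (positive, negative) pairs in which the positive site receives the higher score, with ties counted as one half.

```python
def roc_auc(scores, labels):
    """ROC AUC via pairwise comparison (Mann-Whitney U / (n_pos * n_neg)).

    labels: 1 for a true interaction site, 0 otherwise. Quadratic in the
    number of samples, which is fine for illustration.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))

# Hypothetical per-residue scores from a classifier.
print(roc_auc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]))  # 0.75
```

The pairwise form makes the meaning of the number explicit: an AUC of 0.912 says a randomly chosen true site outranks a randomly chosen non-site about 91% of the time.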
7.
Int J Mol Sci ; 19(1)2018 Jan 12.
Article in English | MEDLINE | ID: mdl-29329273

ABSTRACT

Depression is a common complication of brain tumors. Is there a common pathogenesis underlying depression and glioma? To explore this question, we studied the most severe forms of the two diseases, major depressive disorder (MDD) and glioblastoma (GBM). In this article, we first used transcriptome data to obtain reliable differentially expressed genes (DEGs) by differential expression analysis. Then, we used the DEGs to identify and analyze pathways common to MDD and GBM from three directions. Finally, we used statistical analysis to determine the important biological pathways shared by MDD and GBM. Our findings provide the first direct transcriptomic evidence of common pathways underlying a shared pathogenesis of human MDD and GBM, and our results offer a new methodological reference for studying the pathogenesis of depression and glioblastoma.


Subject(s)
Depressive Disorder, Major/genetics; Gene Expression Regulation; Glioblastoma/genetics; Transcriptome/genetics; Algorithms; Data Mining; Gene Expression Profiling; Gene Expression Regulation, Neoplastic; Humans; MicroRNAs/genetics; MicroRNAs/metabolism; Protein Interaction Maps/genetics
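The abstract says only that shared pathways were determined with statistics. A common choice for this kind of question, assumed here and not confirmed by the source, is a one-sided hypergeometric (over-representation) test of how many DEGs fall into a pathway; a pathway that is significant for both the MDD and the GBM DEG lists would be a candidate common pathway. The gene counts in the example are hypothetical.

```python
from math import comb

def hypergeom_sf(k, population, pathway_size, n_degs):
    """P(X >= k) for X ~ Hypergeometric: the chance of seeing at least k pathway
    genes among n_degs genes drawn at random from `population` background genes."""
    total = comb(population, n_degs)
    upper = min(pathway_size, n_degs)
    # math.comb returns 0 when the lower index exceeds the upper, so impossible terms vanish.
    hits = sum(comb(pathway_size, i) * comb(population - pathway_size, n_degs - i)
               for i in range(k, upper + 1))
    return hits / total

# Hypothetical numbers: 20000 background genes, a 100-gene pathway, 500 DEGs,
# 10 of which fall in the pathway (about 2.5 would be expected by chance).
print(hypergeom_sf(10, 20000, 100, 500))
```

Exact integer arithmetic avoids the underflow that a naive floating-point product of factorials would hit at genome-scale counts; in practice the resulting p values would also be corrected for the number of pathways tested.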
8.
BMC Genomics ; 18(Suppl 2): 143, 2017 03 14.
Article in English | MEDLINE | ID: mdl-28361671

ABSTRACT

BACKGROUND: The mass spectrometry-based technical pipeline has provided a high-throughput, high-sensitivity and high-resolution platform for post-genomic biology. Different tools implement various models and algorithms to improve proteomics data analysis. The target-decoy search strategy has become the most popular strategy for controlling false identifications in peptide and protein identification. While this strategy can estimate the false discovery rate (FDR) within a dataset, it cannot directly evaluate the false-positive matches among target identifications. RESULTS: As a supplement to the target-decoy strategy, the entrapment sequence method was introduced to assess the key steps of the mass spectrometry data analysis process: database search engines and quality control methods. Using the entrapment sequences as the standard, we evaluated five database search engines, using both their original and reprocessed scores, as well as four quality control methods in terms of quantity and quality. Our results showed that the most recently developed search engine, MS-GF+, and the Percolator-embedded quality control method PepDistiller performed best among the search engines and the quality control methods, respectively. Combined with efficient quality control methods, the search engines can compensate for the low sensitivity of their original scores. Moreover, using the entrapment sequence method, we showed that filtering the identifications separately could increase the number of identified peptides while improving the confidence level. CONCLUSION: In this study, we have shown that the entrapment sequence method can be a useful strategy for assessing the key steps of the mass spectrometry data analysis process. Its application can be extended to all steps of the common workflow, such as protein assembly methods and data integration methods.


Subject(s)
Archaeal Proteins/isolation & purification; Peptides/isolation & purification; Proteomics/methods; Search Engine; Sequence Analysis, Protein/methods; Archaeal Proteins/chemistry; Databases, Protein; Datasets as Topic; Humans; Peptides/chemistry; Pyrococcus furiosus/chemistry; Quality Control; Tandem Mass Spectrometry
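The two ideas in this abstract can be contrasted in a few lines. The target-decoy estimate counts decoy hits above a score cutoff as a proxy for false target hits, whereas entrapment sequences are added to the target database so that any accepted hit to them is known to be false. The function names and toy data below are hypothetical, and the entrapment fraction shown is a simple estimate: it observes only the false matches that land on entrapment sequences, not those that land on genuine targets.

```python
def target_decoy_fdr(psms, threshold):
    """Estimated FDR at a score cutoff: decoy hits / target hits above the cutoff.

    psms: list of (score, is_decoy) peptide-spectrum matches.
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0

def entrapment_fraction(accepted_proteins, entrapment_ids):
    """Share of accepted target-side hits that map to entrapment sequences.

    Each entrapment hit is a known false match, so this counts false positives
    directly rather than estimating them from a decoy distribution.
    """
    if not accepted_proteins:
        return 0.0
    false_hits = sum(1 for p in accepted_proteins if p in entrapment_ids)
    return false_hits / len(accepted_proteins)

psms = [(10.0, False), (9.0, False), (8.0, True), (7.0, False), (6.0, True)]
print(target_decoy_fdr(psms, threshold=7.0))
print(entrapment_fraction(["P1", "P2", "ENT_1", "P3"], {"ENT_1"}))
```

The first number is an estimate that relies on decoys mimicking false targets; the second is an observed count on a known-false subset, which is why the abstract frames entrapment as a way to check the tools that the target-decoy strategy is used to calibrate.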
9.
Int J Mol Sci ; 18(12)2017 Dec 19.
Article in English | MEDLINE | ID: mdl-29257106

ABSTRACT

Bipolar disorder (BD) is a common and severe mental illness with unresolved pathophysiology. Genome-wide association studies (GWAS) have been used to find a number of risk genes, but it is difficult for a GWAS to find genes indirectly associated with a disease. To find core hub genes, we introduce a network analysis after the GWAS. In total, 6458 single nucleotide polymorphisms (SNPs) with p < 0.01 were sifted from the Wellcome Trust Case Control Consortium (WTCCC) dataset and mapped to 2045 genes, which were then matched against the protein-protein interaction network. The 112 genes with a degree > 17 were chosen as hub genes, from which five significant modules and four core hub genes (FBXL13, WDFY2, bFGF and MTHFD1L) were identified. These core hub genes have not been reported to be directly associated with BD but may act by interacting with genes directly related to BD. Our method offers a new way of finding genes that are indirectly associated with, but important for, complex diseases.


Subject(s)
Bipolar Disorder/genetics; Gene Regulatory Networks; Polymorphism, Single Nucleotide; F-Box Proteins/genetics; Fibroblast Growth Factors/genetics; Genome-Wide Association Study; Humans; Intracellular Signaling Peptides and Proteins/genetics; Methylenetetrahydrofolate Dehydrogenase (NADP)/genetics; Minor Histocompatibility Antigens/genetics
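The hub-gene step reduces to counting neighbours in a protein-protein interaction network and keeping genes whose degree exceeds 17. In this minimal sketch, degree is counted within the subnetwork induced by the GWAS-mapped genes; that choice and the toy edge list are assumptions, since the abstract does not state whether degree was measured in the full network.

```python
from collections import defaultdict

def hub_genes(edges, mapped_genes, min_degree=17):
    """Genes whose degree exceeds min_degree in the PPI subnetwork induced by
    mapped_genes (assumption: edges to unmapped genes are ignored)."""
    genes = set(mapped_genes)
    neighbours = defaultdict(set)
    for a, b in edges:
        if a != b and a in genes and b in genes:  # keep edges between mapped genes only
            neighbours[a].add(b)
            neighbours[b].add(a)
    return sorted(g for g, nb in neighbours.items() if len(nb) > min_degree)

# Hypothetical toy network; X is not among the GWAS-mapped genes.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "X")]
print(hub_genes(edges, {"A", "B", "C", "D"}, min_degree=1))  # ['A', 'B', 'C']
```

Storing neighbours in sets means duplicate or reversed edges in the interaction file are counted once, which matters because PPI databases often list the same pair in both directions.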
10.
Yi Chuan ; 36(7): 669-78, 2014 Jul.
Article in English | MEDLINE | ID: mdl-25076031

ABSTRACT

Phylogenomics is a new phylogenetic field that aims to reconstruct the phylogenetic relationships of organisms using whole-genome data. It can effectively eliminate the impact of horizontal gene transfer and variable evolutionary rates on phylogeny. Depending on the type of genome data they use, these methods can be classified into five groups: multi-gene based, gene content based, gene order based, K-string based and metabolic pathway based. The mechanisms, speed, accuracy, applicable range and applications of these methods are summarized. The prospects of phylogenomics and the challenges it faces are also discussed.


Subject(s)
Eukaryota/genetics; Genomics/trends; Phylogeny; Animals; Genome; Genomics/instrumentation; Genomics/methods; Humans
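K-string methods compare genomes through their k-mer (K-string) composition rather than through alignments. The sketch below uses raw k-mer counts and cosine distance as a minimal illustration; published K-string methods such as CVTree additionally subtract a Markov-model background before comparing vectors, which is omitted here. The sequences are hypothetical.

```python
import math
from collections import Counter

def kmer_profile(seq, k):
    """Count overlapping k-mers (K-strings) in a DNA or protein sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def cosine_distance(p, q):
    """1 - cosine similarity of two k-mer count vectors (0 = same composition)."""
    dot = sum(count * q[kmer] for kmer, count in p.items())  # Counter yields 0 for absent k-mers
    norm_p = math.sqrt(sum(c * c for c in p.values()))
    norm_q = math.sqrt(sum(c * c for c in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 1.0
    return 1.0 - dot / (norm_p * norm_q)

a = kmer_profile("ACGTACGT", 2)
b = kmer_profile("ACGTACGA", 2)
c = kmer_profile("TTTTTTTT", 2)
print(cosine_distance(a, b), cosine_distance(a, c))
```

Because no alignment is needed, the cost grows linearly with genome length, which is the speed advantage the review attributes to this family; the resulting pairwise distance matrix would then be passed to a distance-based tree method such as neighbor joining (not shown).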
11.
Sheng Wu Gong Cheng Xue Bao ; 39(9): 3579-3593, 2023 Sep 25.
Article in Chinese | MEDLINE | ID: mdl-37805839

ABSTRACT

Data-independent acquisition (DIA) is a high-throughput, unbiased mass spectrometry data acquisition method that offers good quantitative reproducibility and performs well for low-abundance proteins. In recent years it has become the preferred choice for clinical proteomic studies, especially large cohort studies. The tandem mass spectrometry (MS/MS) spectra generated by DIA are usually heavily mixed with fragment ion information from multiple peptides, which makes protein identification and quantification more difficult. Currently, DIA data analysis methods fall into two main categories: peptide-centric and spectrum-centric. The peptide-centric strategy is more sensitive for identification and more accurate for quantification, so it has become the mainstream strategy for DIA data analysis. It comprises four key steps: building a spectral library, extracting ion chromatograms, feature scoring and statistical quality control. This work reviews the peptide-centric DIA data analysis procedure, introduces the corresponding algorithms and software tools, and summarizes improvements to the existing algorithms. Finally, future development directions are discussed.


Subject(s)
Peptides; Proteomics; Humans; Proteomics/methods; Reproducibility of Results; Peptides/chemistry; Software; Algorithms; Tandem Mass Spectrometry/methods; Proteome/analysis
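Of the four peptide-centric steps listed, ion chromatogram extraction is the easiest to show in isolation: for each MS/MS scan, sum the intensity of peaks within a mass tolerance of a library fragment m/z. This is a minimal sketch assuming centroided scans that all come from the isolation window containing the precursor, and a hypothetical 20 ppm tolerance; real tools also handle window assignment and restrict extraction to a retention-time range.

```python
def extract_xic(spectra, target_mz, ppm=20.0):
    """Extracted ion chromatogram for one fragment m/z.

    spectra: list of (retention_time, [(mz, intensity), ...]) centroided scans.
    Returns [(retention_time, summed intensity within the ppm tolerance), ...].
    """
    tol = target_mz * ppm * 1e-6  # ppm tolerance converted to an absolute m/z window
    xic = []
    for rt, peaks in spectra:
        intensity = sum((i for mz, i in peaks if abs(mz - target_mz) <= tol), 0.0)
        xic.append((rt, intensity))
    return xic

# Hypothetical scans: the 500.005 peak falls inside 20 ppm of 500.0, 500.5 does not.
scans = [(10.0, [(500.0, 100.0), (600.0, 5.0)]),
         (10.5, [(500.005, 300.0)]),
         (11.0, [(500.5, 50.0)])]
print(extract_xic(scans, 500.0))  # [(10.0, 100.0), (10.5, 300.0), (11.0, 0.0)]
```

One such trace per library fragment is what the subsequent feature-scoring step works with, for example by checking that the fragment traces of one peptide rise and fall together.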
12.
Article in English | MEDLINE | ID: mdl-37864708

ABSTRACT

Detecting low-abundance mutations is of particular interest in biology and medical science. However, most currently available molecular assays have limited sensitivity for detecting low-abundance mutations. Here, we established a platform for detecting low-level DNA mutations with high sensitivity and accuracy by combining enhanced-ice-COLD-PCR (E-ice-COLD-PCR) and pyrosequencing with di-base addition (PDBA). The PDBA assay was performed by selectively adding one di-base (AG, CT, AC, GT, AT or GC) instead of one base (A, T, C or G) to the reaction at a time during sequencing primer extension, thereby increasing the sequencing signal intensity. A specific E-ice-COLD-PCR/PDBA assay was developed for the most frequent BRAF mutation, V600E, to verify the feasibility of the method. The E-ice-COLD-PCR/PDBA assay reliably detected mutant alleles down to 0.007% in a wild-type background. Furthermore, it required only a small amount of starting material (20 pg) to sensitively detect and identify low-abundance mutations, increasing screening capability when DNA material is limited. In this study, the E-ice-COLD-PCR/PDBA assay was applied to clinical formalin-fixed paraffin-embedded (FFPE) and plasma samples, and it detected BRAF V600E mutations in samples that appeared wild type by PCR/conventional pyrosequencing (CP) and E-ice-COLD-PCR/CP. The E-ice-COLD-PCR/PDBA assay is a rapid, cost-effective and highly sensitive method that could improve the detection of low-abundance mutations in routine clinical use.

13.
Sci Rep ; 13(1): 20444, 2023 11 22.
Article in English | MEDLINE | ID: mdl-37993475

ABSTRACT

Detecting low-frequency variants with high accuracy plays an important role in biomedical research and clinical practice. However, this is challenging with next-generation sequencing (NGS) approaches because of the high error rates of NGS. To distinguish low-level true variants accurately from these errors, many statistical tools for calling low-frequency variants have been proposed, but a systematic performance comparison of these tools has not yet been performed. Here, we evaluated four raw-reads-based variant callers (SiNVICT, outLyzer, Pisces and LoFreq) and four UMI-based variant callers (DeepSNVMiner, MAGERI, smCounter2 and UMI-VarCal) on their ability to call single nucleotide variants (SNVs) with allele frequencies as low as 0.025% in deep sequencing data. We analyzed a total of 54 simulated datasets with various sequencing depths and variant allele frequencies (VAFs), two reference datasets, and Horizon Tru-Q sample data. The results showed that the UMI-based callers, except smCounter2, outperformed the raw-reads-based callers in terms of detection limit. Sequencing depth had almost no effect on the UMI-based callers but significantly influenced the raw-reads-based callers. Regardless of sequencing depth, MAGERI was the fastest, while smCounter2 consistently took the longest to finish variant calling. Overall, DeepSNVMiner and UMI-VarCal performed best, with sensitivity and precision of 88% and 100% for DeepSNVMiner and 84% and 100% for UMI-VarCal. In conclusion, the UMI-based callers, except smCounter2, outperformed the raw-reads-based callers in terms of sensitivity and precision. We recommend DeepSNVMiner and UMI-VarCal for low-frequency variant detection. The results provide important information on future directions for reliable low-frequency variant detection and algorithm development, which is critical in genetics-based medical research and clinical applications.


Subject(s)
Biomedical Research; Polymorphism, Single Nucleotide; Algorithms; Gene Frequency; High-Throughput Nucleotide Sequencing/methods
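Sensitivity and precision, the two figures quoted for DeepSNVMiner and UMI-VarCal, follow directly from comparing the called variants against the known truth set (for example, the spiked-in variants of simulated data). A minimal sketch, assuming variants are represented as hypothetical (chromosome, position, ref, alt) tuples:

```python
def evaluate_calls(called, truth):
    """Sensitivity (recall) and precision of a variant call set against a truth set."""
    called, truth = set(called), set(truth)
    tp = len(called & truth)   # true variants that were called
    fp = len(called - truth)   # calls not in the truth set
    fn = len(truth - called)   # true variants that were missed
    sensitivity = tp / (tp + fn) if truth else 0.0
    precision = tp / (tp + fp) if called else 0.0
    return sensitivity, precision

truth = {("chr7", 100, "A", "T"), ("chr7", 200, "G", "C"),
         ("chr12", 300, "C", "T"), ("chr17", 400, "T", "G")}
called = {("chr7", 100, "A", "T"), ("chr7", 200, "G", "C"),
          ("chr12", 300, "C", "T"), ("chr1", 500, "A", "G")}
print(evaluate_calls(called, truth))  # (0.75, 0.75)
```

Matching on the full (chromosome, position, ref, alt) tuple means a call at the right position with the wrong alternate allele counts as both a false positive and a false negative, which is the stricter of the usual conventions.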
14.
Bioinform Adv ; 3(1): vbad057, 2023.
Article in English | MEDLINE | ID: mdl-37128577

ABSTRACT

Summary: De novo peptide sequencing of tandem mass spectrometry data is not only a key technology for identifying novel peptides but also a prerequisite for many downstream tasks, such as vaccine and antibody studies. In recent years, neural network models for de novo peptide sequencing have shown a remarkable ability to accommodate various data sources and have outperformed conventional peptide identification tools. However, the best-performing model is computationally expensive, taking up to 1 week to process about 400 000 spectra. This article presents PGPointNovo, a novel neural network-based tool for parallel de novo peptide sequencing. PGPointNovo uses data parallelization to accelerate training and inference and mitigates the training obstacles caused by large batch sizes. Extensive experiments on multiple datasets of different sizes demonstrate that, compared with PointNovo, a leading neural network-based de novo peptide sequencing tool, PGPointNovo accelerates de novo peptide sequencing by up to 7.35× without compromising precision or recall. Availability and implementation: The source code and parameter settings are available at https://github.com/shallFun4Learning/PGPointNovo. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
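Data parallelism splits each mini-batch across workers, computes gradients on each shard and averages them, which for a loss defined as a mean over examples reproduces the full-batch gradient. The toy one-parameter linear model below illustrates that equivalence; it is not PGPointNovo's code. The abstract does not say how PGPointNovo handles large-batch training, so the linear learning-rate scaling rule included here is only one common heuristic, labeled as an assumption.

```python
def grad_mse(w, batch):
    """d/dw of mean((w*x - y)^2) for a one-parameter linear model y = w*x."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)

def data_parallel_grad(w, batch, n_workers):
    """Shard the batch across workers, compute local gradients, then combine them.

    Local gradients are weighted by shard size, so the result equals the
    full-batch gradient even when the batch does not divide evenly.
    """
    shards = [batch[i::n_workers] for i in range(n_workers)]
    shards = [s for s in shards if s]  # drop empty shards when n_workers > len(batch)
    local = [grad_mse(w, s) * len(s) for s in shards]  # what each worker would contribute
    return sum(local) / len(batch)

def scaled_lr(base_lr, base_batch, batch):
    """Linear learning-rate scaling for larger batches (a common heuristic, assumed here)."""
    return base_lr * batch / base_batch

batch = [(1.0, 2.0), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]
print(grad_mse(0.5, batch), data_parallel_grad(0.5, batch, n_workers=2))
```

The two printed gradients agree up to floating-point rounding, which is why data parallelism speeds up each step without changing what is learned; the difficulty the abstract alludes to comes from the larger effective batch, not from the sharding itself.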

15.
Sheng Wu Gong Cheng Xue Bao ; 39(4): 1815-1824, 2023 Apr 25.
Article in Chinese | MEDLINE | ID: mdl-37154341

ABSTRACT

Antimicrobial peptides (AMPs) are small peptides widely found in living organisms, with broad-spectrum antibacterial activity and immunomodulatory effects. Because resistance to them emerges more slowly, and because of their excellent clinical potential and wide range of applications, AMPs are a strong alternative to conventional antibiotics. AMP recognition is a significant direction in AMP research. The high cost, low efficiency and long duration of wet-lab experimental methods prevent them from meeting the need for large-scale AMP recognition. Therefore, computer-aided identification methods are important supplements to AMP recognition approaches, and a key issue is how to improve their accuracy. Protein sequences can be approximated as a language composed of amino acids, so rich features may be extracted using natural language processing (NLP) techniques. In this paper, we combine the pre-trained model BERT with a fine-tuned Text-CNN structure from the NLP field to model protein language, develop an open-source antimicrobial peptide recognition tool, and compare it with five other published tools. The experimental results show that the optimized two-phase training approach brings an overall improvement in accuracy, sensitivity, specificity and Matthews correlation coefficient, offering a novel approach for further research on AMP recognition.


Subject(s)
Anti-Bacterial Agents; Antimicrobial Cationic Peptides; Anti-Bacterial Agents/pharmacology; Anti-Bacterial Agents/chemistry; Amino Acid Sequence; Antimicrobial Cationic Peptides/pharmacology; Antimicrobial Cationic Peptides/chemistry; Antimicrobial Peptides; Natural Language Processing
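Accuracy, sensitivity, specificity and the Matthews correlation coefficient (MCC) reported for the AMP classifier are all derived from the confusion-matrix counts of a binary AMP/non-AMP prediction. A minimal sketch with hypothetical counts:

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall on true AMPs
    specificity = tn / (tn + fp) if tn + fp else 0.0  # recall on non-AMPs
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0 by convention if undefined
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "mcc": mcc}

# Hypothetical test-set counts.
print(binary_metrics(tp=90, fp=10, tn=80, fn=20))
```

MCC uses all four cells of the confusion matrix, so unlike accuracy it stays informative when AMP and non-AMP classes are imbalanced, which is why it is commonly reported alongside the other three.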
16.
Sheng Wu Gong Cheng Xue Bao ; 38(10): 3616-3627, 2022 Oct 25.
Article in Chinese | MEDLINE | ID: mdl-36305397

ABSTRACT

Cancer is a heterogeneous disease with complex mechanisms that requires targeted precision medicine strategies. The growth of precision medicine is inseparable from the rapid development of genomics. However, genomics has certain limitations in molecular phenotype analysis, and proteogenomics emerged in response. Proteogenomics is the merging of proteomics and genomics. This review describes the limitations of genomic analysis and highlights the importance of proteogenomics for re-examining precision oncology from a proteogenomic perspective. In addition, it briefly introduces the application of proteogenomics in precision oncology, describes related public data projects, and finally outlines the challenges that need to be addressed at this stage.


Subject(s)
Neoplasms; Proteogenomics; Humans; Precision Medicine; Neoplasms/genetics; Proteomics; Genomics
17.
Comput Math Methods Med ; 2022: 7300788, 2022.
Article in English | MEDLINE | ID: mdl-36479313

ABSTRACT

Hepatocellular carcinoma (LIHC) is the fifth most common cancer worldwide, and effective diagnosis and treatment are required to prevent aggressive metastasis. The purpose of this study was to construct a machine learning-based model for the diagnosis of liver cancer. Using weighted correlation network analysis (WGCNA), univariate analysis, Lasso-Cox regression analysis and protein-protein interaction network analysis, we constructed gene networks from transcriptome data of hepatocellular carcinoma patients and identified hub genes for machine learning. Five models, namely gradient boosting, random forest, support vector machine, logistic regression and ensemble learning, were used to identify a multigene prediction model for patients. Immunological assessment, TP53 mutation and promoter methylation analysis, and KEGG pathway analysis were performed on these groups. Potential drug molecular targets for hepatocellular carcinoma were analyzed by molecular docking. These analyses screened 2 modules that may be relevant to the survival of hepatocellular carcinoma patients and produced 5 diagnostic models and multiple interaction networks. The modes of action of drug-molecule interactions that may be effective against the hepatocellular carcinoma core genes CCNA2, CCNB1 and CDK1 were investigated. This study is expected to provide research ideas for the early diagnosis of hepatocellular carcinoma.


Subject(s)
Carcinoma, Hepatocellular; Liver Neoplasms; Humans; Carcinoma, Hepatocellular/genetics; Liver Neoplasms/genetics; Molecular Docking Simulation; Machine Learning
18.
Biol Direct ; 17(1): 13, 2022 06 05.
Article in English | MEDLINE | ID: mdl-35659725

ABSTRACT

BACKGROUND: The evolution of spliceosomal introns has been widely studied among various eukaryotic groups, and researchers have nearly reached a consensus on the patterns and mechanisms of intron losses and gains across eukaryotes. However, according to previous studies that analyzed a few genes or genomes, Nematoda seems to be an eccentric group. RESULTS: Taking advantage of the recent accumulation of sequenced genomes, we extensively analyzed intron losses and gains using 104 nematode genomes across all five clades of the phylum. Nematodes have a wide range of intron density, from less than one to more than nine introns per kbp of coding sequence. The rates of intron loss and gain exhibit significant heterogeneity both across nematode lineages and across evolutionary stages of the same lineage. Intron losses far outnumber intron gains. Five pieces of evidence supporting the model of cDNA-mediated intron loss were observed in ten Caenorhabditis species: the dominance of precise intron losses, frequent loss of adjacent introns, high-level expression of intron-lost genes, preferential loss of short introns, and preferential loss of introns close to the 3' ends of genes. As in most eukaryotic groups, we could not find source sequences for the limited number of intron gains detected in the Caenorhabditis genomes. CONCLUSIONS: These results indicate that nematodes are a typical eukaryotic group rather than an outlier in intron evolution.


Subject(s)
Nematoda; Animals; Base Sequence; Eukaryota/genetics; Evolution, Molecular; Introns; Nematoda/genetics; Phylogeny; Spliceosomes/genetics
19.
Front Oncol ; 12: 847706, 2022.
Article in English | MEDLINE | ID: mdl-35651795

ABSTRACT

Gastric cancer (GC) is one of the most common malignant tumors worldwide, with a high mortality rate, and it lacks effective methods for prognosis prediction. Postoperative adjuvant chemotherapy is the first-line treatment for advanced gastric cancer, but only a subgroup of patients benefits from it. Here, we used 833 formalin-fixed, paraffin-embedded resected tumor samples from patients with TNM stage II/III GC and established a proteomic subtyping workflow using 100 deep-learned features. Two proteomic subtypes (S-I and S-II) with different overall survival were identified. S-I has a better survival rate and is sensitive to chemotherapy. Patients in S-I who received adjuvant chemotherapy had a significantly improved 5-year overall survival rate compared with patients who received surgery alone (65.3% vs 52.6%; log-rank P = 0.014), but no improvement was observed in S-II (54% vs 51%; log-rank P = 0.96). These results were verified in an independent validation set. Furthermore, we evaluated the superiority and scalability of the deep learning-based workflow in cancer molecular subtyping, demonstrating its great utility and potential in prognosis prediction and therapeutic decision-making.
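The treatment comparisons within each subtype are reported as log-rank P values. The standard two-sample log-rank test sums, over distinct event times, the observed minus expected deaths in one group, squares that sum and divides it by its hypergeometric variance, giving a chi-square statistic with one degree of freedom. The sketch below is a minimal illustration on hypothetical (time, event) data, not the authors' analysis.

```python
import math

def logrank(group_a, group_b):
    """Two-sample log-rank test.

    Each group is a list of (time, event) pairs, event = 1 for death and 0 for
    censoring. Returns (chi-square statistic, p-value with 1 degree of freedom).
    """
    data = [(t, e, 0) for t, e in group_a] + [(t, e, 1) for t, e in group_b]
    event_times = sorted({t for t, e, _ in data if e})
    o_minus_e = 0.0
    variance = 0.0
    for t in event_times:
        n = sum(1 for tt, _, _ in data if tt >= t)                   # at risk, both groups
        n_a = sum(1 for tt, _, g in data if tt >= t and g == 0)      # at risk, group A
        d = sum(1 for tt, e, _ in data if tt == t and e)             # deaths at t
        d_a = sum(1 for tt, e, g in data if tt == t and e and g == 0)
        o_minus_e += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = o_minus_e ** 2 / variance
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p

early = [(1, 1), (2, 1), (3, 1), (4, 1), (5, 1)]
late = [(6, 1), (7, 1), (8, 1), (9, 1), (10, 1)]
print(logrank(early, late))
```

The erfc expression works because a chi-square variable with one degree of freedom is the square of a standard normal, so its upper tail equals the two-sided normal tail at the square root of the statistic.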

20.
Front Plant Sci ; 13: 839457, 2022.
Article in English | MEDLINE | ID: mdl-35242159

ABSTRACT

The plant circadian clock coordinates endogenous transcriptional rhythms with diurnal changes in environmental cues. OsPRR37, a negative component of the rice circadian clock, reportedly regulates transcriptome rhythms and agronomically important traits. However, the regulatory mechanisms underlying OsPRR37-output genes remain largely unknown. In this study, whole-genome bisulfite sequencing and high-throughput RNA sequencing were applied to verify the role of DNA methylation in the transcriptional control of OsPRR37-output genes. We found that overexpression of OsPRR37 suppressed rice growth and altered cytosine methylation in the CG and CHG sequence contexts but not in the CHH context (H represents A, T or C). In total, 35 overlapping genes were identified, and 25 of them showed a negative correlation between methylation level and gene expression. The promoter of the hexokinase gene OsHXK1 was hypomethylated at both CG and CHG sites, and the expression of OsHXK1 was significantly increased. Meanwhile, leaf starch content was consistently lower in OsPRR37 overexpression lines than in the recipient parent Guangluai 4. Further analysis of published time-course transcriptome data revealed that most overlapping genes showed peak expression phases from dusk to dawn. Genes involved in DNA methylation, methylation maintenance and DNA demethylation were actively expressed around dusk. A DNA glycosylase, ROS1A/DNG702, was probably the upstream candidate that demethylated the OsHXK1 promoter. Taken together, our results reveal that CG and CHG methylation contribute to the transcriptional regulation of OsPRR37-output genes, and that hypomethylation of OsHXK1 leads to decreased starch content and reduced plant growth in rice.
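CG, CHG and CHH contexts are defined by the bases that follow a cytosine on the same strand, and a methylation level is the fraction of bisulfite reads that report the cytosine as methylated. A minimal sketch, assuming forward-strand positions only and hypothetical per-cytosine read counts (real pipelines also process the reverse strand):

```python
def cytosine_context(seq, i):
    """Return 'CG', 'CHG' or 'CHH' for the cytosine at index i (H = A, C or T).

    Only the given (forward) strand is examined; None is returned when the
    context cannot be determined (sequence end or an ambiguous 'N').
    """
    seq = seq.upper()
    if seq[i] != "C":
        raise ValueError(f"position {i} is not a cytosine")
    following = seq[i + 1:i + 3]
    if following[:1] == "G":
        return "CG"
    if len(following) < 2 or "N" in following:
        return None
    return "CHG" if following[1] == "G" else "CHH"

def context_methylation(seq, calls):
    """Pooled methylation level per context.

    calls maps a cytosine index to (methylated_reads, total_reads); the level
    is the summed methylated reads divided by summed total reads per context.
    """
    methylated, total = {}, {}
    for pos, (m, n) in calls.items():
        ctx = cytosine_context(seq, pos)
        if ctx is None or n == 0:
            continue
        methylated[ctx] = methylated.get(ctx, 0) + m
        total[ctx] = total.get(ctx, 0) + n
    return {ctx: methylated[ctx] / total[ctx] for ctx in total}

seq = "CGACAGCTT"  # C at 0 -> CG, C at 3 -> CHG, C at 6 -> CHH
print(context_methylation(seq, {0: (8, 10), 3: (3, 10), 6: (1, 10)}))
```

Comparing such per-context levels between overexpression lines and the recipient parent is the kind of contrast behind the reported CG and CHG changes without a CHH change.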
