Results 1 - 14 of 14
1.
Bioinformatics ; 40(Supplement_1): i257-i265, 2024 Jun 28.
Article in English | MEDLINE | ID: mdl-38940141

ABSTRACT

MOTIVATION: Tandem mass spectrometry (MS/MS) is a crucial technology for large-scale proteomic analysis. Protein database search and spectral library search are commonly used for peptide identification from MS/MS spectra; both, however, may face challenges due to experimental variations between replicated spectra and similar fragmentation patterns among distinct peptides. To address these challenges, we present SpecEncoder, a deep metric learning approach that transforms MS/MS spectra into robust and sensitive embedding vectors in a latent space. The SpecEncoder model can also embed predicted MS/MS spectra of peptides, enabling a hybrid search approach that combines spectral library and protein database searches for peptide identification. RESULTS: We evaluated SpecEncoder on three large human proteomics datasets, and the results showed a consistent improvement in peptide identification. For spectral library search, SpecEncoder identifies 1%-2% more unique peptides (and PSMs) than SpectraST. For protein database search, it identifies 6%-15% more unique peptides than MSGF+ enhanced by Percolator. Furthermore, SpecEncoder identified 6%-12% additional unique peptides when utilizing a combined library of experimental and predicted spectra. SpecEncoder can also identify more peptides than deep-learning enhanced methods (MSFragger boosted by MSBooster). These results demonstrate SpecEncoder's potential to enhance peptide identification for proteomic data analyses. AVAILABILITY AND IMPLEMENTATION: The source code and scripts for SpecEncoder and peptide identification are available on GitHub at https://github.com/lkytal/SpecEncoder. Contact: hatang@iu.edu.
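
The embedding-based search described in this abstract can be illustrated with a minimal sketch. It assumes that spectra (experimental or predicted) have already been mapped to fixed-length vectors and that candidates are ranked by Euclidean distance; the abstract does not state which distance SpecEncoder uses, and the peptide names and vectors below are hypothetical.

```python
import math

def nearest_peptide(query, library):
    """Return (peptide, distance) for the library embedding closest to `query`.

    `library` maps a peptide label to its embedding vector. Euclidean
    distance is an assumption made for this illustration only.
    """
    return min(
        ((peptide, math.dist(query, emb)) for peptide, emb in library.items()),
        key=lambda item: item[1],
    )

# Hypothetical 3-dimensional embeddings; real embeddings would be much longer.
library = {
    "PEPTIDEK": [0.90, 0.10, 0.00],  # e.g. embedded experimental library spectrum
    "SAMPLER": [0.00, 0.80, 0.60],   # e.g. embedded predicted spectrum
}
query = [0.85, 0.20, 0.05]

peptide, distance = nearest_peptide(query, library)
print(peptide, round(distance, 3))
```

Because experimental and predicted spectra share one latent space in this sketch, a single lookup covers both sources, which is the idea behind the hybrid search mentioned in the abstract.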


Subjects
Protein Databases , Peptides , Proteomics , Tandem Mass Spectrometry , Proteomics/methods , Peptides/chemistry , Humans , Tandem Mass Spectrometry/methods , Deep Learning , Software
2.
Gut Microbes ; 16(1): 2302076, 2024.
Article in English | MEDLINE | ID: mdl-38214657

ABSTRACT

We developed MicroKPNN, a prior-knowledge guided interpretable neural network for microbiome-based human host phenotype prediction. The prior knowledge used in MicroKPNN includes the metabolic activities of different bacterial species, phylogenetic relationships, and bacterial community structure, all in a shallow neural network. Application of MicroKPNN to seven gut microbiome datasets (involving five different human diseases including inflammatory bowel disease, type 2 diabetes, liver cirrhosis, colorectal cancer, and obesity) shows that incorporation of the prior knowledge helped improve the microbiome-based host phenotype prediction. MicroKPNN outperformed fully connected neural network-based approaches in all seven cases, with the most improvement of accuracy in the prediction of type 2 diabetes. MicroKPNN outperformed a recently developed deep-learning based approach DeepMicro, which selects the best combination of autoencoder and machine learning approach to make predictions, in all of the seven cases. Importantly, we showed that MicroKPNN provides a way for interpretation of the predictive models. Using importance scores estimated for the hidden nodes, MicroKPNN could provide explanations for prior research findings by highlighting the roles of specific microbiome components in phenotype predictions. In addition, it may suggest potential future research directions for studying the impacts of microbiome on host health and diseases. MicroKPNN is publicly available at https://github.com/mgtools/MicroKPNN.


Subjects
Type 2 Diabetes Mellitus , Gastrointestinal Microbiome , Microbiota , Humans , Phylogeny , Type 2 Diabetes Mellitus/microbiology , Microbiota/genetics , Phenotype
3.
Nat Commun ; 14(1): 7974, 2023 Dec 02.
Article in English | MEDLINE | ID: mdl-38042873

ABSTRACT

De novo peptide sequencing, which does not rely on a comprehensive target sequence database, provides us with a way to identify novel peptides from tandem mass spectra. However, current de novo sequencing algorithms suffer from low accuracy and coverage, which hinders their application in proteomics. In this paper, we present PepNet, a fully convolutional neural network for high accuracy de novo peptide sequencing. PepNet takes an MS/MS spectrum (represented as a high-dimensional vector) as input, and outputs the optimal peptide sequence along with its confidence score. The PepNet model is trained using a total of 3 million high-energy collisional dissociation MS/MS spectra from multiple human peptide spectral libraries. Evaluation results show that PepNet significantly outperforms current best-performing de novo sequencing algorithms (e.g. PointNovo and DeepNovo) in both peptide-level accuracy and positional-level accuracy. PepNet can sequence a large fraction of spectra that were not identified by database search engines, and thus could be used as a complementary tool to database search engines for peptide identification in proteomics. In addition, PepNet runs around 3x and 7x faster than PointNovo and DeepNovo on GPUs, respectively, thus being more suitable for the analysis of large-scale proteomics data.
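
The abstract states that PepNet takes a spectrum "represented as a high-dimensional vector". One common way to obtain such a vector is to bin peaks along the m/z axis. The sketch below assumes 1.0 Da bins up to m/z 2000 and base-peak normalization; these parameters are illustrative choices and are not taken from the paper.

```python
def bin_spectrum(peaks, max_mz=2000.0, bin_width=1.0):
    """Convert (m/z, intensity) peaks into a fixed-length intensity vector.

    Each bin keeps the most intense peak that falls into it; peaks at or
    above `max_mz` are dropped. The vector is scaled so the base peak is 1.0.
    Bin width and m/z range are assumptions made for this illustration.
    """
    n_bins = int(round(max_mz / bin_width))
    vector = [0.0] * n_bins
    for mz, intensity in peaks:
        index = int(mz / bin_width)
        if 0 <= index < n_bins:
            vector[index] = max(vector[index], intensity)
    base_peak = max(vector)
    if base_peak > 0:
        vector = [value / base_peak for value in vector]
    return vector

# Hypothetical peak list: two peaks share bin 175, one lies outside the range.
peaks = [(175.119, 50.0), (175.9, 20.0), (1000.5, 100.0), (2500.0, 10.0)]
vec = bin_spectrum(peaks)
print(len(vec), vec[175], vec[1000])
```

A fixed-length vector of this kind is what allows a convolutional network to consume spectra of varying peak counts.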


Subjects
Protein Sequence Analysis , Tandem Mass Spectrometry , Humans , Tandem Mass Spectrometry/methods , Protein Sequence Analysis/methods , Peptides , Amino Acid Sequence , Computer Neural Networks , Algorithms , Peptide Library
4.
J Proteome Res ; 22(2): 442-453, 2023 02 03.
Article in English | MEDLINE | ID: mdl-36688801

ABSTRACT

The microbiome has been shown to be important for human health because of its influence on disease and the immune response. Mass spectrometry is an important tool for evaluating protein expression and species composition in the microbiome but is technically challenging and time-consuming. Multiplexing has emerged as a way to make spectrometry workflows faster while improving results. Here, we present MetaProD (MetaProteomics in Django) as a highly configurable metaproteomic data analysis pipeline supporting label-free and multiplexed mass spectrometry. The pipeline is open-source, uses fully open-source tools, and is integrated with Django to offer a web-based interface for configuration and data access. Benchmarking of MetaProD using multiple metaproteomics data sets showed that MetaProD achieved fast and efficient identification of peptides and proteins. Application of MetaProD to a multiplexed cancer data set resulted in identification of more differentially expressed human proteins in cancer tissues versus healthy tissues as compared to previous studies; in addition, MetaProD identified bacterial proteins in those samples, some of which are differentially abundant.


Subjects
Microbiota , Proteomics , Humans , Proteomics/methods , Mass Spectrometry , Bacterial Proteins , Spectrum Analysis
5.
J Comput Biol ; 29(7): 738-751, 2022 07.
Article in English | MEDLINE | ID: mdl-35584271

ABSTRACT

Microbial organisms play important roles in many aspects of human health and diseases. Encouraged by the numerous studies that show the association between microbiomes and human diseases, computational and machine learning methods have been recently developed to generate and utilize microbiome features for prediction of host phenotypes such as disease versus healthy and cancer immunotherapy responder versus nonresponder. We have previously developed a subtractive assembly approach, which focuses on extraction and assembly of differential reads from metagenomic data sets that are likely sampled from differential genomes or genes between two groups of microbiome data sets (e.g., healthy vs. disease). In this article, we further improved our subtractive assembly approach by utilizing groups of k-mers with similar abundance profiles across multiple samples. We implemented a locality-sensitive hashing (LSH)-enabled approach (called kmerLSHSA) to group billions of k-mers into k-mer coabundance groups (kCAGs), which were subsequently used for the retrieval of differential kCAGs for subtractive assembly. Testing of the kmerLSHSA approach on simulated data sets and real microbiome data sets showed that, compared with the conventional approach that utilizes all genes, our approach can quickly identify differential genes that can be used for building promising predictive models for microbiome-based host phenotype prediction. We also discussed other potential applications of LSH-enabled clustering of k-mers according to their abundance profiles across multiple microbiome samples.
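
To illustrate how locality-sensitive hashing can group k-mers by abundance profile, the sketch below uses random-hyperplane (sign) hashing, in which profiles pointing in a similar direction tend to receive the same bit signature. The abstract does not specify which hash family kmerLSHSA uses, so this choice, the function names, and the toy profiles are all assumptions.

```python
import random
from collections import defaultdict

def make_hyperplanes(n_samples, n_bits, seed=0):
    """Draw `n_bits` random Gaussian hyperplanes in sample space."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n_samples)] for _ in range(n_bits)]

def signature(profile, hyperplanes):
    """One bit per hyperplane: 1 if the profile lies on its non-negative side."""
    return tuple(
        int(sum(p * h for p, h in zip(profile, plane)) >= 0)
        for plane in hyperplanes
    )

def group_kmers(profiles, n_bits=8, seed=0):
    """Bucket k-mers whose abundance profiles share an LSH signature."""
    n_samples = len(next(iter(profiles.values())))
    planes = make_hyperplanes(n_samples, n_bits, seed)
    buckets = defaultdict(list)
    for kmer, profile in profiles.items():
        buckets[signature(profile, planes)].append(kmer)
    return list(buckets.values())

# Hypothetical abundances of three k-mers across four samples.
profiles = {
    "ACGTA": [10, 0, 12, 1],
    "CGTAC": [20, 0, 24, 2],  # exactly proportional to ACGTA
    "TTTGA": [0, 15, 1, 9],
}
groups = group_kmers(profiles)
print(groups)
```

Each k-mer is hashed once, so bucketing scales linearly with the number of k-mers rather than requiring all pairwise comparisons. Proportional profiles always collide under sign hashing, which mirrors the notion of coabundance in this simplified setting.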


Subjects
Metagenomics , Microbiota , Cluster Analysis , Metagenome , Metagenomics/methods , Microbiota/genetics , Phenotype
6.
J Cachexia Sarcopenia Muscle ; 13(1): 728-742, 2022 02.
Article in English | MEDLINE | ID: mdl-34877814

ABSTRACT

BACKGROUND: Most of the microRNAs (MiRs) involved in myogenesis are transcriptionally regulated. The role of MiR biogenesis in myogenesis has not been characterized yet. RNA-binding protein Musashi 2 (Msi2) is considered to be one of the major drivers for oncogenesis and stem cell proliferation. The functions of Msi2 in myogenesis have not been explored yet. We sought to investigate Msi2-regulated biogenesis of MiRs in myogenesis and muscle stem cell (MuSC) ageing. METHODS: We detected the expression of Msi2 in MuSCs and differentiated myotubes by quantitative reverse transcription PCR (RT-qPCR) and western blot. Msi2-binding partner human antigen R (HuR) was identified by immunoprecipitation followed by mass spectrometry analysis. The cooperative binding of Msi2 and HuR on MiR7a-1 was analysed by RNA immunoprecipitation and electrophoresis mobility shift assays. The inhibition of the processing of pri-MiR7a-1 mediated by Msi2 and HuR was shown by Msi2 and HuR knockdown. Immunofluorescent staining, RT-qPCR and immunoblotting were used to characterize the function of MiR7a-1 in myogenesis. Up-regulation of cryptochrome circadian regulator 2 (Cry2) by Msi2 and HuR via MiR7a-1 was confirmed by the luciferase assay and western blot. The post-transcriptional regulatory cascade was further confirmed by RNAi and overexpression of Msi2 and HuR in MuSCs, and the in vivo function was characterized by histopathological and molecular biological methods in Msi2 knockout mice. RESULTS: We identified a post-transcriptional regulatory cascade governed by a pair of RNA-binding proteins, Msi2 and HuR. Msi2 is enriched in differentiated muscle cells and promotes MuSC differentiation despite its pro-proliferation functions in other cell types. Msi2 works synergistically with another RNA-binding protein, HuR, to repress the biogenesis of MiR7a-1 in an Msi2 dose-dependent manner to regulate the translation of the key component of the circadian core oscillator complex, Cry2. Down-regulation of Cry2 (0.6-fold, vs. control, P < 0.05) mediated by MiR7a-1 represses MuSC differentiation. The disruption of this cascade leads to differentiation defects of MuSCs. In aged muscles, Msi2 expression declined (0.3-fold, vs. control, P < 0.01), and the Cry2 protein level also decreased (0.5-fold, vs. control, P < 0.05), suggesting that the disruption of the Msi2-mediated post-transcriptional regulatory cascade could contribute to the declined ability of muscle regeneration in aged skeletal muscle. CONCLUSIONS: Our findings have identified a new post-transcriptional cascade regulating myogenesis. The cascade is disrupted in skeletal muscle ageing, which leads to declined muscle regeneration ability.


Subjects
MicroRNAs , Muscle Development , RNA-Binding Proteins/metabolism , Animals , Cell Differentiation/genetics , Mice , MicroRNAs/genetics , MicroRNAs/metabolism , Muscle Development/genetics , Skeletal Muscle Fibers/metabolism , Myoblasts/metabolism
7.
Microbiome ; 9(1): 80, 2021 04 01.
Article in English | MEDLINE | ID: mdl-33795009

ABSTRACT

BACKGROUND: A few recent large efforts significantly expanded the collection of human-associated bacterial genomes, which now contains thousands of entities including reference complete/draft genomes and metagenome assembled genomes (MAGs). These genomes provide a useful resource for studying the functionality of the human-associated microbiome and its relationship with human health and diseases. One application of these genomes is to provide a universal reference for database search in metaproteomic studies, when matched metagenomic/metatranscriptomic data are unavailable. However, a greater collection of reference genomes may not necessarily result in better peptide/protein identification because the increase of search space often leads to fewer spectrum-peptide matches, not to mention the drastic increase of computation time. METHODS: Here, we present a new approach that uses two steps to optimize the use of the reference genomes and MAGs as the universal reference for human gut metaproteomic MS/MS data analysis. The first step is to use only the high-abundance proteins (HAPs) (i.e., ribosomal proteins and elongation factors) for metaproteomic MS/MS database search and, based on the identification results, to derive the taxonomic composition of the underlying microbial community. The second step is to expand the search database by including all proteins from identified abundant species. We call our approach HAPiID (HAPs guided metaproteomics IDentification). RESULTS: We tested our approach using human gut metaproteomic datasets from a previous study and compared it to the state-of-the-art reference database search method MetaPro-IQ for metaproteomic identification in studying human gut microbiota. Our results show that our two-step method not only performed significantly faster but also was able to identify more peptides. We further demonstrated the application of HAPiID to revealing protein profiles of individual human-associated bacterial species, one or a few species at a time, using metaproteomic data. CONCLUSIONS: The HAP guided profiling approach presents a novel effective way for constructing a target database for metaproteomic data analysis. The HAPiID pipeline built upon this approach provides a universal tool for analyzing human gut-associated metaproteomic data.


Subjects
Gastrointestinal Microbiome , Gastrointestinal Microbiome/genetics , Humans , Metagenomics , Peptides/genetics , Proteomics , Tandem Mass Spectrometry
8.
Anal Chem ; 92(6): 4275-4283, 2020 03 17.
Article in English | MEDLINE | ID: mdl-32053352

ABSTRACT

The ability to predict tandem mass (MS/MS) spectra from peptide sequences can significantly enhance our understanding of the peptide fragmentation process and could improve peptide identification in proteomics. However, current approaches for predicting high-energy collisional dissociation (HCD) spectra are limited to predicting the intensities of expected ion types, that is, the a/b/c/x/y/z ions and their neutral loss derivatives (referred to as backbone ions). In practice, backbone ions only account for <70% of total ion intensities in HCD spectra, indicating that many intense ions are ignored by current predictors. In this paper, we present a deep learning approach that can predict the complete spectra (both backbone and nonbackbone ions) directly from peptide sequences. We made no assumptions about which kinds of ions to predict but instead predicted the intensities for all possible m/z. Training this model needs neither annotations of fragment ions nor any prior knowledge of the fragmentation rules. Our analyses show that the predicted 2+ and 3+ HCD spectra are highly similar to the experimental spectra, with average full-spectrum cosine similarities of 0.820 (±0.088) and 0.786 (±0.085), respectively, very close to the similarities between the experimental replicated spectra. In contrast, the best-performing backbone-only models achieve average similarities below 0.75 and 0.70 for 2+ and 3+ spectra, respectively. Furthermore, we developed a multitask learning (MTL) approach for predicting spectra with insufficient training samples, which allows our model to make accurate predictions for electron transfer dissociation (ETD) spectra and HCD spectra of less abundant charges (1+ and 4+).
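
The full-spectrum cosine similarity reported in this abstract can be computed as in the sketch below. It assumes each spectrum is stored sparsely as a mapping from an integer m/z bin to an intensity; the binning scheme and the example intensities are hypothetical and not taken from the paper.

```python
import math

def cosine_similarity(spec_a, spec_b):
    """Cosine similarity of two spectra given as {m/z bin: intensity} dicts.

    Bins missing from a spectrum count as zero intensity, so every peak in
    either spectrum (backbone or not) contributes to the score.
    """
    dot = sum(inten * spec_b.get(mz_bin, 0.0) for mz_bin, inten in spec_a.items())
    norm_a = math.sqrt(sum(x * x for x in spec_a.values()))
    norm_b = math.sqrt(sum(x * x for x in spec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical predicted and experimental spectra sharing two of three bins.
predicted = {175: 0.6, 300: 1.0, 457: 0.3}
experimental = {175: 0.5, 300: 1.0, 512: 0.2}

sim = cosine_similarity(predicted, experimental)
print(round(sim, 3))
```

Peaks present in only one spectrum lower the score through the norms, which is why a predictor that ignores nonbackbone ions is penalized under a full-spectrum comparison.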


Subjects
Computer Neural Networks , Peptides/analysis , Tandem Mass Spectrometry
9.
Mol Cell Proteomics ; 18(8 suppl 1): S183-S192, 2019 08 09.
Article in English | MEDLINE | ID: mdl-31142575

ABSTRACT

Matching metagenomic and/or metatranscriptomic data, currently often under-used, can be a useful reference for metaproteomic tandem mass spectra (MS/MS) data analysis. Here we developed a software pipeline for identification of peptides and proteins from metaproteomic MS/MS data using proteins derived from matching metagenomic (and metatranscriptomic) data as the search database, based on two novel approaches, Graph2Pro (published) and Var2Pep (new). Graph2Pro retains and uses uncertainties of metagenome assembly for reference-based MS/MS data analysis. Var2Pep considers the variations found in metagenomic/metatranscriptomic sequencing reads that are not retained in the assemblies (contigs). The new software pipeline provides a one-stop application of both tools, and it supports the use of metagenome assemblies from commonly used assemblers including MegaHit and metaSPAdes. When tested on two collections of multi-omic microbiome data sets, our pipeline improved the identification rate of the metaproteomic MS/MS spectra about twofold compared to conventional contig- or read-based approaches (Var2Pep alone identified 5.6% to 24.1% more unique peptides, depending on the data set). We also showed that identified variant peptides are important for functional profiling of microbiomes. All results suggested that it is important to take assembly uncertainties and genomic variants into consideration to facilitate metaproteomic MS/MS data interpretation.


Subjects
Algorithms , Microbiota/genetics , Proteogenomics/methods , Seawater/microbiology , Wastewater/microbiology , Protein Databases , Genetic Variation , Peptides/genetics , Tandem Mass Spectrometry
10.
Pac Symp Biocomput ; 24: 236-247, 2019.
Article in English | MEDLINE | ID: mdl-30864326

ABSTRACT

Microbiome research is going through an evolutionary transition from focusing on the characterization of reference microbiomes associated with different environments/hosts to translational applications, including using the microbiome for disease diagnosis, improving the efficacy of cancer treatments, and prevention of diseases (e.g., using probiotics). Microbial markers have been identified from microbiome data derived from cohorts of patients with different diseases, treatment responsiveness, etc., and often predictors based on these markers were built for predicting host phenotype given a microbiome dataset (e.g., to predict if a person has type 2 diabetes given his or her microbiome data). Unfortunately, these microbial markers and predictors are often not published so are not reusable by others. In this paper, we report the curation of a repository of microbial marker genes and predictors built from these markers for microbiome-based prediction of host phenotype, and a computational pipeline called Mi2P (from Microbiome to Phenotype) for using the repository. As an initial effort, we focus on microbial marker genes related to two diseases, type 2 diabetes and liver cirrhosis, and immunotherapy efficacy for two types of cancer, non-small-cell lung cancer (NSCLC) and renal cell carcinoma (RCC). We characterized the marker genes from metagenomic data using our recently developed subtractive assembly approach. We showed that predictors built from these microbial marker genes can provide fast and reasonably accurate prediction of host phenotype given microbiome data. As understanding and making use of microbiome data (our second genome) is becoming vital as we move forward in this age of precision health and precision medicine, we believe that such a repository will be useful for enabling translational applications of microbiome data.


Subjects
Microbial Genes , Host Microbial Interactions/genetics , Microbiota/genetics , Non-Small-Cell Lung Carcinoma/genetics , Non-Small-Cell Lung Carcinoma/microbiology , Non-Small-Cell Lung Carcinoma/therapy , Renal Cell Carcinoma/genetics , Renal Cell Carcinoma/microbiology , Renal Cell Carcinoma/therapy , Computational Biology/methods , Genetic Databases , Type 2 Diabetes Mellitus/genetics , Type 2 Diabetes Mellitus/microbiology , Genetic Markers , Humans , Immunotherapy , Kidney Neoplasms/genetics , Kidney Neoplasms/microbiology , Kidney Neoplasms/therapy , Liver Cirrhosis/genetics , Liver Cirrhosis/microbiology , Lung Neoplasms/genetics , Lung Neoplasms/microbiology , Lung Neoplasms/therapy , Machine Learning , Metagenomics/methods , Metagenomics/statistics & numerical data , Phenotype , Translational Biomedical Research
11.
PLoS Comput Biol ; 12(12): e1005224, 2016 12.
Article in English | MEDLINE | ID: mdl-27918579

ABSTRACT

Metaproteomic studies adopt the common bottom-up proteomics approach to investigate the protein composition and the dynamics of protein expression in microbial communities. When matched metagenomic and/or metatranscriptomic data of the microbial communities are available, metaproteomic data analyses often employ a metagenome-guided approach, in which complete or fragmental protein-coding genes are first directly predicted from metagenomic (and/or metatranscriptomic) sequences or from their assemblies, and the resulting protein sequences are then used as the reference database for peptide/protein identification from MS/MS spectra. This approach is often limited because protein coding genes predicted from metagenomes are incomplete and fragmental. In this paper, we present a graph-centric approach to improving metagenome-guided peptide and protein identification in metaproteomics. Our method exploits the de Bruijn graph structure reported by metagenome assembly algorithms to generate a comprehensive database of protein sequences encoded in the community. We tested our method using several public metaproteomic datasets with matched metagenomic and metatranscriptomic sequencing data acquired from complex microbial communities in a biological wastewater treatment plant. The results showed that many more peptides and proteins can be identified when assembly graphs were utilized, improving the characterization of the proteins expressed in the microbial communities. The additional proteins we identified contribute to the characterization of important pathways such as those involved in degradation of chemical hazards. Our tools are released as open-source software on github at https://github.com/COL-IU/Graph2Pro.
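
The de Bruijn graph structure mentioned in this abstract can be illustrated with a toy construction. The sketch below builds a graph whose nodes are (k-1)-mers and whose edges are k-mers, then enumerates the sequences spelled by walking it. It is a simplified illustration only; the function names are hypothetical and Graph2Pro's actual traversal and translation steps are not described in the abstract.

```python
from collections import defaultdict

def de_bruijn(reads, k):
    """Build a de Bruijn graph: each k-mer links its prefix to its suffix."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph

def paths_from(graph, node, n_edges):
    """List every sequence spelled by walking `n_edges` edges from `node`."""
    if n_edges == 0:
        return [node]
    sequences = []
    for successor in sorted(graph.get(node, ())):
        for tail in paths_from(graph, successor, n_edges - 1):
            sequences.append(node[0] + tail)
    return sequences

# Two hypothetical reads that diverge after "ACG", e.g. strain-level variants.
reads = ["ACGTT", "ACGAT"]
graph = de_bruijn(reads, k=3)

print(sorted(graph["CG"]))
print(paths_from(graph, "AC", 3))
```

The branch at node "CG" keeps both variants available, whereas a single linear contig would retain only one of them. Retaining such alternatives is the motivation the abstract gives for searching against the assembly graph rather than against contigs alone.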


Subjects
Metagenomics/methods , Proteins/classification , Proteins/genetics , Proteomics/methods , Algorithms , Humans , Microbiota/genetics , Peptides/classification , Peptides/genetics , Tandem Mass Spectrometry
12.
Biochem Genet ; 47(7-8): 533-9, 2009 Aug.
Article in English | MEDLINE | ID: mdl-19565205

ABSTRACT

Microsatellite markers and D-loop sequences of mtDNA from a female allotetraploid parent carp and her progenies of generations 1 and 2 induced by sperm of five distant fish species were analyzed. Eleven microsatellite markers were used to identify 48 alleles from the allotetraploid female. The same number of alleles (48) appeared in the first and second generations of the gynogenetic offspring, regardless of the source of the sperm used as an activator. The mtDNA D-loop analysis was performed on the female tetraploid parent, 25 gynogenetic offspring, and 5 sperm-donor species. Fourteen variable sites from the 1,018 bp sequences were observed in the offspring as compared to the female tetraploid parent. Results from D-loop sequence and microsatellite marker analysis showed exclusive maternal transmission, and no genetic information was derived from the father. Our study suggests that progenies of artificial tetraploid carp are genetically stable, which is important for genetic breeding of this tetraploid fish.


Subjects
Carps/genetics , Carps/physiology , Polyploidy , Reproduction/physiology , Spermatozoa/metabolism , Animals , Base Sequence , Carps/classification , Mitochondrial DNA/genetics , Female , Genomic Instability , Male , Microsatellite Repeats/genetics , Phylogeny
13.
Sci China C Life Sci ; 46(6): 595-604, 2003 Dec.
Article in English | MEDLINE | ID: mdl-18758716

ABSTRACT

A polyploid hybrid fish with natural gynogenesis can prevent segregation and maintain hybrid vigor in its progeny. On the supposition that the reproduction mode of induced polyploid fish is natural gynogenesis, hybridization between common carp and crucian carp was performed to produce allopolyploids. The purpose of this paper is to describe a lineage in which sexual diploid carp are transformed into allotriploid and allotetraploid unisexual clones by genome addition. The diploid hybrid between common carp and crucian carp produces an unreduced nucleus consisting of two parental genomes. This unreduced female pronucleus fuses with a male pronucleus and forms an allotriploid zygote after penetration by sperm of a related species. Allotriploid embryos grow normally, and some female allotriploids can produce unreduced mature ova with three genomes. Mature ova of most allotriploid females have a natural gynogenetic trait, and their nuclei do not fuse with any entering sperm. All-female offspring are produced by gynogenesis of the allotriploid egg under activation by penetrating sperm. These offspring maintain the morphological traits of their allotriploid maternal parent and form an allotetraploid unisexual clone by the gynogenetic reproduction mode. However, the female nuclei of rare allotriploid females can fuse with penetrating male pronuclei, resulting in the appearance of allotetraploid individuals by means of genome addition. All allotetraploid females can produce unreduced mature eggs containing four genomes. Therefore, mature eggs of allotetraploids maintain the gynogenetic trait, and an allotetraploid unisexual clone is produced under activation by sperm of related species.

14.
Hereditas ; 137(2): 140-4, 2002.
Article in English | MEDLINE | ID: mdl-12627840

ABSTRACT

Mature eggs of allotetraploid carp were activated by inactive sperm or crossed with normal sperm of common carp (Cyprinus carpio), crucian carp (Carassius auratus), Chinese blunt snout bream (Megalobrama amblycephala), Hemiculter leucisculus and Pseudorasbora parva. Chromosome counts showed that all offspring of these crosses presented a mode number of 200 chromosomes (4n = 200), and their morphological traits closely resembled those of the maternal parent. Microsatellite marker and RAPD patterns of the allotetraploid maternal parent and its offspring, reproduced from different paternal species, were identical. Cytological, morphological and molecular evidence suggested that the allotetraploid carp female nucleus would not fuse with any male nucleus and that its reproduction mode might be gynogenesis, so that the offspring retain their tetraploidy and give rise to clonal individuals.


Subjects
Carps/physiology , Polyploidy , Animals , Carps/genetics , Female , Genetic Hybridization , Male , Sex Ratio