2.
Nat Commun ; 14(1): 4046, 2023 07 08.
Article in English | MEDLINE | ID: mdl-37422459

ABSTRACT

Here we present GlycanFinder, a database search and de novo sequencing tool for the analysis of intact glycopeptides from mass spectrometry data. GlycanFinder integrates peptide-based and glycan-based search strategies to address the challenge of complex fragmentation of glycopeptides. A deep learning model is designed to capture glycan tree structures and their fragment ions for de novo sequencing of glycans that do not exist in the database. We performed extensive analyses to validate the false discovery rates (FDRs) at both peptide and glycan levels and to evaluate GlycanFinder based on comprehensive benchmarks from previous community-based studies. Our results show that GlycanFinder achieved comparable performance to other leading glycoproteomics software in terms of both FDR control and the number of identifications. Moreover, GlycanFinder was also able to identify glycopeptides not found in existing databases. Finally, we conducted a mass spectrometry experiment for antibody N-linked glycosylation profiling that could distinguish isomeric peptides and glycans in four immunoglobulin G subclasses, which had been a challenging problem for previous studies.


Subjects
Glycopeptides , Tandem Mass Spectrometry , Glycopeptides/chemistry , Tandem Mass Spectrometry/methods , Software , Glycosylation , Polysaccharides
3.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34891158

ABSTRACT

In this article, we review two challenging computational questions in protein science: neoantigen prediction and protein structure prediction. Both topics have seen significant leaps forward driven by deep learning within the past five years, which immediately unlocked new developments in drugs and immunotherapies. We show that deep learning models offer unique advantages, such as representation learning and multi-layer architecture, which make them an ideal choice to leverage a huge amount of protein sequence and structure data to address those two problems. We also discuss the impact and future possibilities enabled by those two applications, especially how the data-driven approach of deep learning will accelerate the progress towards personalized biomedicine.


Subjects
Deep Learning , Amino Acid Sequence , Immunotherapy , Proteins/chemistry
4.
Sci Rep ; 11(1): 18249, 2021 09 14.
Article in English | MEDLINE | ID: mdl-34521906

ABSTRACT

A promising technique for discovering disease biomarkers is to measure the relative protein abundance in multiple biofluid samples through liquid chromatography with tandem mass spectrometry (LC-MS/MS) based quantitative proteomics. The key step involves peptide feature detection in the LC-MS map, along with its charge and intensity. Existing heuristic algorithms suffer from inaccurate parameters and human errors. As a solution, we propose PointIso, the first point-cloud-based arbitrary-precision deep learning network to address this problem. It consists of an attention-based scanning step for segmenting the multi-isotopic pattern of 3D peptide features along with the charge, and a sequence classification step for grouping those isotopes into potential peptide features. PointIso achieves 98% detection of high-quality MS/MS identified peptide features in a benchmark dataset. Next, the model is adapted for handling the additional 'ion mobility' dimension and achieves 4% higher detection than existing algorithms on the human proteome dataset. Besides contributing to the proteomics study, our novel segmentation technique should serve the general object detection domain as well.

5.
Hum Mutat ; 42(10): 1229-1238, 2021 10.
Article in English | MEDLINE | ID: mdl-34233069

ABSTRACT

Accurate profiling of population-specific recessive diseases is essential for the design of cost-effective carrier screening programs. However, minority populations and ethnic groups, including Vietnamese, are still underrepresented in existing genetic studies. Here, we report the first comprehensive study of recessive diseases in the Vietnamese population. Clinical exome sequencing data of 4503 disease-associated genes obtained from a cohort of 985 Vietnamese individuals were analyzed to identify pathogenic variants, associated diseases and their carrier frequencies in the population. A total of 118 recessive diseases associated with 164 pathogenic or likely pathogenic variants were identified, among which 28 diseases had carrier frequencies of at least 1% (1 in 100 individuals). Three diseases were prevalent in the Vietnamese population, with carrier frequencies 2-12 times higher than in world populations: beta-thalassemia (1 in 23), citrin deficiency (1 in 31), and phenylketonuria (1 in 40). Seven novel pathogenic and two likely pathogenic variants associated with nine recessive diseases were discovered. The comprehensive profile of recessive diseases identified in this study enables the design of cost-effective carrier screening programs specific to the Vietnamese population.


Subjects
Ethnicity , Exome , Asian People , Cohort Studies , Exome/genetics , Humans , Exome Sequencing
6.
Sci Rep ; 10(1): 19142, 2020 11 05.
Article in English | MEDLINE | ID: mdl-33154511

ABSTRACT

The under-representation of several ethnic groups in existing genetic databases and studies has undermined our understanding of the genetic variations and associated traits or diseases in many populations. Cost and technology limitations remain the challenges in performing large-scale genome sequencing projects in many developing countries, including Vietnam. As one of the most rapidly adopted genetic tests, non-invasive prenatal testing (NIPT) offers an alternative untapped data resource for genetic studies. Here we performed a large-scale genomic analysis of 2683 pregnant Vietnamese women using their NIPT data and identified a comprehensive set of 8,054,515 single-nucleotide polymorphisms, among which 8.2% were new to the Vietnamese population. Our study also revealed 24,487 disease-associated genetic variants and their allele frequency distribution, especially 5 pathogenic variants for prevalent genetic disorders in Vietnam. We also observed major discrepancies in the allele frequency distribution of disease-associated genetic variants between the Vietnamese and other populations, thus highlighting a need for genome-wide association studies dedicated to the Vietnamese population. The resulting database of Vietnamese genetic variants, their allele frequency distribution, and their associated diseases presents a valuable resource for future genetic studies.


Subjects
Alleles , Asian People/genetics , Gene Frequency , Genetic Testing , Genotype , Noninvasive Prenatal Testing , Female , Genome-Wide Association Study , Humans , Phenotype , Polymorphism, Single Nucleotide , Pregnancy , Vietnam
7.
Sci Rep ; 9(1): 17168, 2019 11 20.
Article in English | MEDLINE | ID: mdl-31748623

ABSTRACT

Liquid chromatography with tandem mass spectrometry (LC-MS/MS) based quantitative proteomics measures differences in relative protein abundance between healthy and disease-afflicted patients, which offers information on molecular interactions, signaling pathways, and biomarker identification to serve drug discovery and clinical research. A typical analysis workflow begins with peptide feature detection and intensity calculation from the LC-MS map. We are the first to propose a deep learning based model, DeepIso, that combines recent advances in Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) to detect peptide features of different charge states, as well as to estimate their intensity. Existing tools are designed with limited engineered features and domain-specific parameters, which are rarely updated despite the huge amount of newly generated proteomic data. On the other hand, DeepIso, which consists of two separate deep learning based modules, learns multiple levels of representation of high-dimensional data through many layers of neurons and is adaptable to newly acquired data. The peptide feature list reported by our model matches 97.43% of high-quality MS/MS identifications in a benchmark dataset, which is higher than the matching produced by several widely used tools. Our results demonstrate that novel deep learning tools are desirable to advance the state of the art in protein identification and quantification.


Subjects
Peptides/chemistry , Biomarkers/chemistry , Chromatography, Liquid/methods , Deep Learning , Neural Networks, Computer , Neurons/metabolism , Proteins/chemistry , Proteomics/methods , Tandem Mass Spectrometry/methods , Workflow
8.
Nat Methods ; 16(1): 63-66, 2019 01.
Article in English | MEDLINE | ID: mdl-30573815

ABSTRACT

We present DeepNovo-DIA, a de novo peptide-sequencing method for data-independent acquisition (DIA) mass spectrometry data. We use neural networks to capture precursor and fragment ions across m/z, retention-time, and intensity dimensions. They are then further integrated with peptide sequence patterns to address the problem of highly multiplexed spectra. DIA coupled with de novo sequencing allowed us to identify novel peptides in human antibodies and antigens.


Subjects
Deep Learning , Mass Spectrometry/methods , Peptides/chemistry , Databases, Protein , Humans
9.
J Clin Med Res ; 10(5): 429-436, 2018 May.
Article in English | MEDLINE | ID: mdl-29581806

ABSTRACT

BACKGROUND: The benefit of computer-assisted planning in orthognathic surgery (OGS) has been extensively documented over the last decade. This study aimed to evaluate the accuracy of three-dimensional (3D) virtual planning in surgery-first OGS. METHODS: Fifteen patients with skeletal class III malocclusion who underwent bimaxillary OGS with a surgery-first approach were included. A composite skull model was reconstructed using data from cone-beam computed tomography and stereolithography from a scanned dental cast. Surgical procedures were simulated using Simplant O&O software, and the virtual plan was transferred to the operating room using 3D-printed splints. Differences in the 3D measurements between the virtual plan and postoperative results were evaluated, and the accuracy was reported using root mean square deviation (RMSD) and the Bland-Altman method. RESULTS: The virtual planning was successfully transferred to surgery. The overall mean linear difference was 0.88 mm (0.79 mm for the maxilla and 1 mm for the mandible), and the overall mean angular difference was 1.16°. The RMSD ranged from 0.86 to 1.46 mm and 1.27° to 1.45°, within the acceptable clinical criteria. CONCLUSION: In this study, virtual surgical planning and 3D-printed surgical splints facilitated the diagnosis and treatment planning, and offered an accurate outcome in surgery-first OGS.
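
The two accuracy measures named in the abstract above can be illustrated with a minimal sketch. It assumes paired planned and postoperative measurements in the same units, the usual 1.96 standard deviation limits of agreement for the Bland-Altman method, and hypothetical function names and values; the abstract does not describe the study's actual computation.

```python
import math

def rmsd(planned, actual):
    """Root mean square deviation between paired measurements."""
    if len(planned) != len(actual) or not planned:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(p - a) ** 2 for p, a in zip(planned, actual)]
    return math.sqrt(sum(squared) / len(squared))

def bland_altman(planned, actual):
    """Return (mean difference, lower limit, upper limit) of agreement.

    Uses the sample standard deviation, so at least two pairs are required.
    """
    if len(planned) != len(actual) or len(planned) < 2:
        raise ValueError("need two sequences of equal length with >= 2 pairs")
    diffs = [a - p for p, a in zip(planned, actual)]
    n = len(diffs)
    mean = sum(diffs) / n
    sd = math.sqrt(sum((d - mean) ** 2 for d in diffs) / (n - 1))
    return mean, mean - 1.96 * sd, mean + 1.96 * sd

# Hypothetical landmark positions in mm (not data from the study).
planned_mm = [10.0, 12.5, 8.0, 15.0]
actual_mm = [10.6, 11.8, 8.9, 15.4]
print(rmsd(planned_mm, actual_mm))
print(bland_altman(planned_mm, actual_mm))
```

RMSD summarizes the overall size of the error, while the Bland-Altman limits show whether the postoperative result is systematically offset from the plan.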

10.
Proteomics ; 18(2)2018 01.
Article in English | MEDLINE | ID: mdl-29239117

ABSTRACT

Deep learning has revolutionized research in image processing, speech recognition, natural language processing, and game playing, and will soon revolutionize research in proteomics and genomics. Through three examples in genomics, protein structure prediction, and proteomics, we demonstrate that deep learning is changing bioinformatics research, shifting from algorithm-centric to data-centric approaches.


Subjects
Algorithms , Computational Biology/methods , Genomics/methods , Proteomics/methods , Humans , Machine Learning , Natural Language Processing , Proteins/genetics , Proteins/metabolism
11.
Proc Natl Acad Sci U S A ; 114(31): 8247-8252, 2017 Aug 01.
Article in English | MEDLINE | ID: mdl-28720701

ABSTRACT

De novo peptide sequencing from tandem MS data is the key technology in proteomics for the characterization of proteins, especially for new sequences, such as mAbs. In this study, we propose a deep neural network model, DeepNovo, for de novo peptide sequencing. The DeepNovo architecture combines recent advances in convolutional neural networks and recurrent neural networks to learn features of tandem mass spectra, fragment ions, and sequence patterns of peptides. The networks are further integrated with local dynamic programming to solve the complex optimization task of de novo sequencing. We evaluated the method on a wide variety of species and found that DeepNovo considerably outperformed state-of-the-art methods, achieving 7.7-22.9% higher accuracy at the amino acid level and 38.1-64.0% higher accuracy at the peptide level. We further used DeepNovo to automatically reconstruct the complete sequences of antibody light and heavy chains of mouse, achieving 97.5-100% coverage and 97.2-99.5% accuracy, without assisting databases. Moreover, DeepNovo is retrainable to adapt to any source of data and provides a complete end-to-end training and prediction solution to the de novo sequencing problem. Not only does our study extend the deep learning revolution to a new field, but it also shows an innovative approach in solving optimization problems by using deep learning and dynamic programming.

12.
Sci Rep ; 6: 31730, 2016 08 26.
Article in English | MEDLINE | ID: mdl-27562653

ABSTRACT

De novo protein sequencing is one of the key problems in mass spectrometry-based proteomics, especially for novel proteins such as monoclonal antibodies for which genome information is often limited or not available. However, due to limitations in peptide fragmentation and coverage, as well as ambiguities in spectra interpretation, complete de novo assembly of unknown protein sequences still remains challenging. To address this problem, we propose an integrated system, ALPS, which for the first time can automatically assemble full-length monoclonal antibody sequences. Our system integrates de novo sequenced peptides, their quality scores and error-correction information from databases into a weighted de Bruijn graph to assemble protein sequences. We evaluated the performance of ALPS on two antibody data sets, each including a heavy chain and a light chain. The results show that ALPS was able to assemble three complete monoclonal antibody sequences of length 216-441 AA, at 100% coverage, and 96.64-100% accuracy.
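
The idea of a weighted de Bruijn graph built from scored peptides can be sketched roughly as follows. This is an illustration only, not the ALPS implementation: the greedy heaviest-edge traversal, the choice of k, the function names, and the peptide scores are all assumptions made here, and the database error-correction step mentioned in the abstract is omitted.

```python
from collections import defaultdict

def build_weighted_de_bruijn(peptides, k):
    """Build a graph whose nodes are (k-1)-mers and whose edge weights
    are the summed quality scores of the peptides supporting each k-mer."""
    graph = defaultdict(lambda: defaultdict(float))
    for sequence, score in peptides:
        for i in range(len(sequence) - k + 1):
            kmer = sequence[i:i + k]
            graph[kmer[:-1]][kmer[1:]] += score
    return graph

def greedy_assemble(graph, start):
    """Extend a contig from `start` by always taking the heaviest unused edge."""
    contig, node, used = start, start, set()
    while True:
        candidates = [
            (weight, nxt)
            for nxt, weight in graph.get(node, {}).items()
            if (node, nxt) not in used
        ]
        if not candidates:
            return contig
        _, best = max(candidates)
        used.add((node, best))
        contig += best[-1]  # each step adds one amino acid
        node = best

# Hypothetical overlapping peptides with made-up quality scores; the last
# one carries a low-scoring error (A instead of G) that the weights outvote.
peptides = [
    ("EVQLVESGG", 0.90),
    ("LVESGGGLV", 0.80),
    ("SGGGLVQPG", 0.95),
    ("LVESGAGLV", 0.20),
]
graph = build_weighted_de_bruijn(peptides, k=4)
print(greedy_assemble(graph, "EVQ"))
```

Weighting edges by score lets well-supported k-mers outvote sequencing errors at branch points, which is the intuition behind using quality scores in the graph.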


Subjects
Antibodies, Monoclonal/chemistry , Sequence Analysis, Protein/methods , Amino Acids/chemistry , Animals , Automation , Chickens , Chromatography, Liquid , Chymotrypsin/chemistry , Computational Biology , Contig Mapping , Glycosylation , Humans , Immunoglobulin G/chemistry , Metalloendopeptidases/chemistry , Muramidase , Peptides/chemistry , Reproducibility of Results , Sequence Homology, Amino Acid , Tandem Mass Spectrometry , Trypsin/chemistry
13.
Nucleic Acids Res ; 42(20): 12380-7, 2014 Nov 10.
Article in English | MEDLINE | ID: mdl-25300490

ABSTRACT

Neph et al. (2012) (Circuitry and dynamics of human transcription factor regulatory networks. Cell, 150: 1274-1286) reported the transcription factor (TF) regulatory networks of 41 human cell types using the DNaseI footprinting technique. This provides a valuable resource for uncovering regulation principles in different human cells. In this paper, the architectures of the 41 regulatory networks and the distributions of housekeeping and specific regulatory interactions are investigated. The TF regulatory networks of different human cell types demonstrate similar global three-layer (top, core and bottom) hierarchical architectures, which differ greatly from that of the yeast TF regulatory network. However, they have distinguishable local organizations, as suggested by the fact that wiring patterns of only a few TFs are enough to distinguish cell identities. The TF regulatory network of human embryonic stem cells (hESCs) is dense and enriched with interactions that are unseen in the networks of other cell types. The examination of specific regulatory interactions suggests that specific interactions play important roles in hESCs.


Subjects
Gene Regulatory Networks , Transcription Factors/metabolism , Algorithms , Embryonic Stem Cells/metabolism , Humans
14.
BMC Res Notes ; 7: 320, 2014 May 29.
Article in English | MEDLINE | ID: mdl-24886411

ABSTRACT

BACKGROUND: Enormous volumes of short read data from next-generation sequencing (NGS) technologies have posed new challenges to the area of genomic sequence comparison. The multiple sequence alignment approach is hardly applicable to NGS data due to the challenging problem of short read assembly. Thus, alignment-free methods are needed for the comparison of NGS samples of short reads. RESULTS: Recently, several k-mer based distance measures such as CVTree, d2(S), and co-phylog have been proposed or enhanced to address this problem. However, choosing an optimal k value for those distance measures is not trivial, since it may depend on different aspects of the sequence data. In this paper, we considered an alternative parameter-free approach: compression-based distance measures. These measures have shown good performance for the comparison of long genomic sequences, but they have not yet been tested on NGS short reads. Hence, we performed extensive validation in this study and showed that the compression-based distances are highly consistent with those distances obtained from the k-mer based methods, from the multiple sequence alignment approach, and from existing benchmarks in the literature. Moreover, as the compression-based distance measures are parameter-free, no parameter optimization is required and these measures still perform consistently well on multiple types of sequence data, for different kinds of species and taxonomy levels. CONCLUSIONS: The compression-based distance measures are assembly-free, alignment-free, parameter-free, and thus represent useful tools for the comparison of long genomic sequences as well as the comparison of NGS samples of short reads.
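
The abstract above does not say which compression-based measures or compressors were evaluated. As one illustration of the general idea, here is a minimal sketch of the normalized compression distance (NCD), assuming zlib as the compressor and treating each sample as a single byte string; the helper names and the synthetic sequences are hypothetical.

```python
import random
import zlib

def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input at maximum compression."""
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance:
    (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

def random_dna(length: int, seed: int) -> str:
    """Synthetic DNA string used only to exercise the distance."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(length))

def mutate_every(sequence: str, step: int) -> str:
    """Substitute every `step`-th base to simulate a close relative."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    bases = list(sequence)
    for i in range(0, len(bases), step):
        bases[i] = swap[bases[i]]
    return "".join(bases)

original = random_dna(2000, seed=1)
relative = mutate_every(original, step=100)
unrelated = random_dna(2000, seed=2)

print(ncd(original.encode(), relative.encode()))   # smaller distance
print(ncd(original.encode(), unrelated.encode()))  # larger distance
```

No k value or alignment is needed: similarity shows up as how much the compressor saves when the two inputs are concatenated. Note that zlib only looks back 32 KB, so a real analysis of large read sets would need a compressor with a longer context.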


Subjects
Phylogeny , Sequence Analysis
15.
Nat Commun ; 4: 2241, 2013.
Article in English | MEDLINE | ID: mdl-23917172

ABSTRACT

Small over-represented motifs in biological networks often form essential functional units of biological processes. A natural question is to gauge whether a motif occurs abundantly or rarely in a biological network. Here we develop an accurate method to estimate the occurrences of a motif in the entire network from noisy and incomplete data, and apply it to eukaryotic interactomes and cell-specific transcription factor regulatory networks. The number of triangles in the human interactome is about 194 times that in the Saccharomyces cerevisiae interactome. A strong positive linear correlation exists between the numbers of occurrences of triad and quadriad motifs in human cell-specific transcription factor regulatory networks. Our findings show that the proposed method is general and powerful for counting motifs and can be applied to any network regardless of its topological structure.
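
To make the motif-counting task concrete, the sketch below counts triangles in an observed undirected network and applies a deliberately naive correction for missing edges. The correction assumes each true edge is observed independently with a known probability and that there are no spurious edges; it is not the estimator proposed in the paper, and all names and values here are hypothetical.

```python
from itertools import combinations

def count_triangles(edges):
    """Count triangles in an undirected graph given as (u, v) pairs."""
    adjacency = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    total = 0
    for neighbors in adjacency.values():
        # A connected pair of neighbors closes a triangle through this node.
        for a, b in combinations(neighbors, 2):
            if b in adjacency[a]:
                total += 1
    return total // 3  # each triangle is seen once from each of its 3 nodes

def naive_scale_up(observed_triangles, edge_retention):
    """If every edge survives independently with probability p, a triangle
    survives with probability p**3, so divide the observed count by p**3."""
    return observed_triangles / edge_retention ** 3

# Hypothetical toy network: a 4-clique plus one pendant edge.
edges = [("A", "B"), ("A", "C"), ("A", "D"),
         ("B", "C"), ("B", "D"), ("C", "D"), ("D", "E")]
print(count_triangles(edges))
print(naive_scale_up(count_triangles(edges), edge_retention=0.5))
```

Real interactome data also contain false-positive interactions and uneven sampling, which is why a more careful estimator than this scale-up is needed.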


Subjects
Protein Interaction Mapping , Protein Interaction Maps , Animals , Arabidopsis/metabolism , Caenorhabditis elegans/metabolism , Gene Regulatory Networks , Humans , Protein Binding , Saccharomyces cerevisiae , Transcription Factors/metabolism