Results 1 - 13 of 13
1.
Theor Appl Genet ; 137(3): 74, 2024 Mar 07.
Article in English | MEDLINE | ID: mdl-38451289

ABSTRACT

KEY MESSAGE: Eight selected hotspots related to ear traits were identified from two maize-teosinte populations. Throughout the history of maize cultivation, ear-related traits have been selected. However, little is known about the specific genes involved in shaping these traits from their origins in the wild progenitor, teosinte, to the characteristics observed in modern maize. In this study, five ear traits (kernel row number [KRN], ear length [EL], kernel number per row [KNR], cob diameter [CD], and ear diameter [ED]) were investigated, and eight quantitative trait loci (QTL) hotspots were identified in two maize-teosinte populations. Notably, our findings revealed a significant enrichment of genes showing a selection signature and expressed in the ear in qbdCD1.1, qbdCD5.1, qbpCD2.1, qbdED1.1, qbpEL1.1, qbpEL5.1, qbdKNR1.1, and qbdKNR10.1, suggesting that these eight QTL are selected hotspots involved in shaping the maize ear. By combining the results of the QTL analysis with data from a previous genome-wide association study (GWAS) involving two natural panels, we identified eight candidate selected genes related to KRN, KNR, CD, and ED. Among these, considering their expression pattern and sequence variation, Zm00001d025111, encoding a WD40/YVTN protein, was proposed as a positive regulator of KNR. This study presents a framework for understanding the genomic distribution of selected loci crucial in determining ear-related traits.


Subjects
Genome-Wide Association Study; Zea mays; Zea mays/genetics; Genomics; Phenotype; Quantitative Trait Loci
2.
BMC Genomics ; 14 Suppl 8: S5, 2013.
Article in English | MEDLINE | ID: mdl-24564548

ABSTRACT

BACKGROUND: Tandem mass spectrometry (MS/MS) technology has been applied to identify proteins, as an ultimate approach to confirm the original genome annotation. To be able to identify gene fusion proteins, a special database containing peptides that cross over gene fusion breakpoints is needed. METHODS: It is impractical to construct a database that includes all possible fusion peptides originating from potential breakpoints. Focusing on 6259 reported and predicted gene fusion pairs from ChimerDB 2.0 and Cancer Gene Census, we created, for the first time, a database, CanProFu, that comprehensively annotates fusion peptides formed by exon-exon linkage between these paired genes. RESULTS: Applying this database to mass spectrometry datasets of 40 human non-small cell lung cancer (NSCLC) samples and 39 normal lung samples with stringent search criteria, we were able to identify 19 unique fusion peptides characterizing gene fusion events. Among them, 11 gene fusion events were found only in NSCLC samples. In addition, 4 alternative splicing events were characterized in cancerous or normal lung samples. CONCLUSIONS: The database and workflow in this work can be flexibly applied to other MS/MS-based human cancer experiments to detect gene fusions as potential disease biomarkers or drug targets.


Subjects
Carcinoma, Non-Small-Cell Lung/genetics; Lung Neoplasms/genetics; Oncogene Proteins, Fusion/analysis; Peptides/metabolism; Tandem Mass Spectrometry/methods; Alternative Splicing; Carcinoma, Non-Small-Cell Lung/pathology; Databases, Genetic; Gene Fusion; Humans; Lung/metabolism; Lung Neoplasms/pathology; Oncogene Fusion; Peptides/genetics; Recombinant Fusion Proteins/analysis
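
The idea of a peptide that "crosses over" a fusion breakpoint (record 2) can be illustrated with a minimal Python sketch. It is not the CanProFu pipeline: the function names, the example sequences, and the digestion rule (trypsin cleaving after K or R unless followed by P, with no missed cleavages) are all assumptions chosen for illustration.

```python
import re


def tryptic_peptides(protein):
    """Return (start, end, sequence) for each fully tryptic peptide.

    Assumed rule: cleave after K or R unless the next residue is P,
    with no missed cleavages.
    """
    cut_sites = [m.end() for m in re.finditer(r"[KR](?!P)", protein)]
    # Drop a cut at the very end of the protein to avoid an empty peptide.
    bounds = [0] + [c for c in cut_sites if c < len(protein)] + [len(protein)]
    return [(s, e, protein[s:e]) for s, e in zip(bounds, bounds[1:])]


def junction_peptides(upstream, downstream, min_len=6):
    """Peptides of the fused protein that contain residues from both partners.

    `upstream` and `downstream` are hypothetical translated exon sequences
    joined at an exon-exon boundary.
    """
    fused = upstream + downstream
    junction = len(upstream)  # index of the first downstream residue
    return [
        seq
        for start, end, seq in tryptic_peptides(fused)
        if start < junction < end and len(seq) >= min_len
    ]


# Hypothetical sequences, not taken from any real fusion gene.
print(junction_peptides("MAAGKLLSTE", "VDPGARWWK"))
```

Only junction-spanning peptides are informative for a fusion event, because every other peptide also occurs in one of the unfused parent proteins. If the upstream partner happens to end in K or R, this simple rule yields no spanning peptide, which is one reason a real search database would likely also allow missed cleavages.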
3.
Mol Cell Proteomics ; 10(4): M110.001750, 2011 Apr.
Article in English | MEDLINE | ID: mdl-21149613

ABSTRACT

Embryonic stem cells are pluripotent and capable of unlimited self-renewal. Elucidation of the underlying molecular mechanism may contribute to the advancement of cell-based regenerative medicine. In the present work, we performed a large scale analysis of the phosphoproteome in mouse embryonic stem (mES) cells. Using multiplex strategies, we detected 4581 proteins and 3970 high confidence distinct phosphosites in 1642 phosphoproteins. Notably, 22 prominent phosphorylated stem cell marker proteins with 39 novel phosphosites were identified for the first time by mass spectrometry, including phosphorylation sites in NANOG (Ser-65) and RE1 silencing transcription factor (Ser-950 and Thr-953). Quantitative profiles of NANOG peptides obtained during the differentiation of mES cells revealed that the abundance of phosphopeptides and non-phosphopeptides decreased with different trends. To our knowledge, this study presents the largest global characterization of phosphorylation in mES cells. Compared with a study of ultimately differentiated tissue cells, a bioinformatics analysis of the phosphorylation data set revealed a consistent phosphorylation motif in human and mouse ES cells. Moreover, investigations into phosphorylation conservation suggested that phosphoproteins were more conserved in the undifferentiated ES cell state than in the ultimately differentiated tissue cell state. However, the opposite conclusion was drawn from this conservation comparison with phosphosites. Overall, this work provides an overview of phosphorylation in mES cells and is a valuable resource for the future understanding of basic biology in mES cells.


Subjects
Embryonic Stem Cells/metabolism; Phosphoproteins/metabolism; Proteome/metabolism; Amino Acid Motifs; Amino Acid Sequence; Animals; Antigens, Differentiation/metabolism; Cell Differentiation; Cell Line; Databases, Protein; Embryonic Stem Cells/cytology; Humans; Mice; Molecular Sequence Data; Phosphoprotein Phosphatases/metabolism; Phosphorylation; Protein Kinases/metabolism; Protein Processing, Post-Translational
4.
Genomics ; 98(5): 343-51, 2011 Nov.
Article in English | MEDLINE | ID: mdl-21840390

ABSTRACT

Identifying protein-coding genes in eukaryotic genomes remains a challenge in the post-genome era because of complex gene models. We applied a proteogenomics strategy to detect un-annotated protein-coding regions in the mouse genome. High-accuracy tandem mass spectrometry (MS/MS) data from diverse mouse samples were generated in house on an LTQ-Orbitrap mass spectrometer. Two searchable diagnostic proteomic datasets were constructed, one with all possible encoding exon junctions, and the other with all putative encoding exons, for the discovery of novel exon splicing events and novel uninterrupted protein-coding regions. Altogether 29,586 unique peptides were identified. Aligning backwards to the mouse genome, the translation of 4471 annotated genes was validated by the known peptides, and 172 genic events were defined in the mouse genome by the novel peptides. The approach in the current work can provide substantial evidence for annotating protein-coding genes in eukaryotic genomes.


Subjects
Genome; Mice/genetics; Open Reading Frames; Software; Amino Acid Sequence; Animals; Databases, Protein; Molecular Sequence Data; Peptides/chemistry; Peptides/genetics; Protein Biosynthesis; RNA Splice Sites; Sequence Alignment; Tandem Mass Spectrometry/methods
5.
Front Oncol ; 12: 969238, 2022.
Article in English | MEDLINE | ID: mdl-36465367

ABSTRACT

Microsatellite instability (MSI) is a molecular signature of mismatch repair deficiency (dMMR), a predictive marker of immune checkpoint inhibitor therapy response. Despite its recognized pan-cancer value, most methods only support detection of this signature in colorectal cancer. In addition to the tissue-specific differences that impact the sensitivity of MSI detection in other tissues, the performance of most methods is also affected by patient ethnicity, tumor content, and other sample-specific properties. These limitations are particularly important when only tumor samples are available and restrict the performance and adoption of MSI testing. Here we introduce MSIdetect, a novel solution for NGS-based MSI detection. MSIdetect models the impact of indel burden and tumor content on read coverage at a set of homopolymer regions that we found are minimally impacted by sample-specific factors. We validated MSIdetect in 139 Formalin-Fixed Paraffin-Embedded (FFPE) clinical samples from colorectal and endometrial cancer as well as other more challenging tumor types, such as glioma or sebaceous adenoma or carcinoma. Based on analysis of these samples, MSIdetect displays 100% specificity and 96.3% sensitivity. Limit of detection analysis supports that MSIdetect is sensitive even in samples with relatively low tumor content and limited microsatellite instability. Finally, the results obtained using MSIdetect in tumor-only data correlate well (R=0.988) with what is obtained using tumor-normal matched pairs, demonstrating that the solution addresses the challenges posed by MSI detection from tumor-only data. The accuracy of MSI detection by MSIdetect in different cancer types coupled with the flexibility afforded by NGS-based testing will support the adoption of MSI testing in the clinical setting and increase the number of patients identified that are likely to benefit from immune checkpoint inhibitor therapy.

6.
Mol Cell Proteomics ; 8(8): 1839-49, 2009 Aug.
Article in English | MEDLINE | ID: mdl-19366988

ABSTRACT

With the rapid expansion of protein post-translational modification (PTM) research based on large-scale proteomic work, there is an increasing demand for a suitable repository to analyze PTM data. Here we present a curated, web-accessible PTM data base, SysPTM. SysPTM provides a systematic and sophisticated platform for proteomic PTM research equipped not only with a knowledge base of manually curated multi-type modification data but also with four fully developed, in-depth data mining tools. Currently, SysPTM contains data detailing 117,349 experimentally determined PTM sites on 33,421 proteins involving nearly 50 PTM types, curated from public resources including five data bases and four web servers and more than one hundred peer-reviewed mass spectrometry papers. Protein annotations including Pfam domains, KEGG pathways, GO functional classification, and ortholog groups are integrated into the data base. Four online tools have been developed and incorporated, including PTMBlast, to compare a user's PTM dataset with PTM data in SysPTM; PTMPathway, to map PTM proteins to KEGG pathways; PTMPhylog, to discover potentially conserved PTM sites; and PTMCluster, to find clusters of multi-site modifications. The workflow of SysPTM was demonstrated by analyzing an in-house phosphorylation dataset identified by MS/MS. It is shown that in SysPTM, the role of single-type and multi-type modifications can be systematically investigated in a full biological context. SysPTM could be an important contribution to modificomics research. SysPTM is freely available online at www.sysbio.ac.cn/SysPTM.


Subjects
Databases, Protein/statistics & numerical data; Protein Processing, Post-Translational; Proteins/analysis; Proteomics/statistics & numerical data; Amino Acid Sequence; Animals; Computational Biology/methods; Database Management Systems/statistics & numerical data; Humans; Internet; Molecular Sequence Data; Phosphorylation; Proteins/genetics; Proteins/metabolism; Research Design; Sequence Homology, Amino Acid; Signal Transduction; User-Computer Interface
7.
JCO Clin Cancer Inform ; 5: 1085-1095, 2021 Oct.
Article in English | MEDLINE | ID: mdl-34731027

ABSTRACT

PURPOSE: The ability of next-generation sequencing (NGS) assays to interrogate thousands of genomic loci has revolutionized genetic testing. However, translation to the clinic is impeded by false-negative results that pose a risk to patients. In response, regulatory bodies are calling for reliability measures to be reported alongside NGS results. Existing methods to estimate reliability do not account for sample- and position-specific variability, which can be significant. Here, we report an approach that computes reliability metrics for every genomic position and sample interrogated by an NGS assay. METHODS: Our approach predicts the limit of detection (LOD), the lowest reliably detectable variant fraction, by taking technical factors into account. We initially explored how LOD is affected by input material amount, library conversion rate, sequencing coverage, and sequencing error rate. This revealed that LOD depends heavily on genomic context and sample properties. Using these insights, we developed a computational approach to predict LOD on the basis of a biophysical model of the NGS workflow. We focused on targeted assays for cell-free DNA, but, in principle, this approach applies to any NGS assay. RESULTS: We validated our approach by showing that it accurately predicts LOD and distinguishes reliable from unreliable results when screening 580 lung cancer samples for actionable mutations. Compared with a standard variant calling workflow, our approach avoided most false negatives and improved interassay concordance from 94% to 99%. CONCLUSION: Our approach, which we name LAVA (LOD-aware variant analysis), reports the LOD for every position and sample interrogated by an NGS assay. This enables reliable results to be identified and improves the transparency and safety of genetic tests.


Subjects
Lung Neoplasms; Nucleotides; High-Throughput Nucleotide Sequencing/methods; Humans; Lung Neoplasms/diagnosis; Lung Neoplasms/genetics; Mutation; Reproducibility of Results
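
The dependence of the limit of detection (LOD) on the number of molecules sampled (record 7) can be made concrete with a deliberately simplified Python sketch. It is not the LAVA model, which the abstract describes only as a biophysical model of the NGS workflow. The assumptions here are hypothetical: mutant molecules are sampled binomially, a variant counts as detected when at least `min_mutant` mutant molecules are observed, and the LOD is the smallest variant fraction detected with 95% probability. Sequencing error enters only through the choice of `min_mutant`.

```python
from math import comb


def detection_probability(n_molecules, variant_fraction, min_mutant=3):
    """P(at least min_mutant mutant molecules among n_molecules), binomial model."""
    p_below = sum(
        comb(n_molecules, k)
        * variant_fraction**k
        * (1 - variant_fraction) ** (n_molecules - k)
        for k in range(min_mutant)
    )
    return 1.0 - p_below


def limit_of_detection(n_molecules, min_mutant=3, confidence=0.95, tol=1e-6):
    """Smallest variant fraction detected with probability >= confidence.

    Uses bisection, which is valid because the detection probability
    increases monotonically with the variant fraction.
    """
    if n_molecules < min_mutant:
        raise ValueError("too few molecules to ever reach the detection threshold")
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if detection_probability(n_molecules, mid, min_mutant) >= confidence:
            hi = mid
        else:
            lo = mid
    return hi


# Hypothetical molecule counts: more unique molecules give a lower LOD.
for n in (500, 1000, 5000):
    print(n, limit_of_detection(n))
```

Because the number of unique molecules varies by sample and by genomic position, even this toy model produces a different LOD for each sample and position, which is the kind of variability the abstract says existing reliability estimates ignore.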
8.
Clin Microbiol Infect ; 27(7): 1036.e1-1036.e8, 2021 Jul.
Article in English | MEDLINE | ID: mdl-33813118

ABSTRACT

OBJECTIVES: Genotyping of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has been instrumental in monitoring viral evolution and transmission during the pandemic. The quality of the sequence data obtained from these genotyping efforts depends on several factors, including the quantity/integrity of the input material, the technology, and laboratory-specific implementation. The current lack of guidelines for SARS-CoV-2 genotyping leads to inclusion of error-containing genome sequences in genomic epidemiology studies. We aimed to establish clear and broadly applicable recommendations for reliable virus genotyping. METHODS: We established and used a sequencing data analysis workflow that reliably identifies and removes technical artefacts; such artefacts can result in miscalls when using alternative pipelines to process clinical samples and synthetic viral genomes with an amplicon-based genotyping approach. We evaluated the impact of experimental factors, including viral load and sequencing depth, on correct sequence determination. RESULTS: We found that at least 1000 viral genomes are necessary to confidently detect variants in the SARS-CoV-2 genome at frequencies of ≥10%. The broad applicability of our recommendations was validated in over 200 clinical samples from six independent laboratories. The genotypes we determined for clinical isolates with sufficient quality cluster by sampling location and period. Our analysis also supports the rise in frequencies of 20A.EU1 and 20A.EU2, two recently reported European strains whose dissemination was facilitated by travel during the summer of 2020. CONCLUSIONS: We present much-needed recommendations for the reliable determination of SARS-CoV-2 genome sequences and demonstrate their broad applicability in a large cohort of clinical samples.


Subjects
COVID-19/diagnosis; Genotyping Techniques/standards; High-Throughput Nucleotide Sequencing/standards; SARS-CoV-2/genetics; Whole Genome Sequencing/standards; Artifacts; COVID-19/virology; Genome, Viral; Genotyping Techniques/methods; Guidelines as Topic; High-Throughput Nucleotide Sequencing/methods; Humans; RNA, Viral; Reproducibility of Results; SARS-CoV-2/isolation & purification; Sensitivity and Specificity; Whole Genome Sequencing/methods; Workflow
9.
Medicine (Baltimore) ; 98(45): e17858, 2019 Nov.
Article in English | MEDLINE | ID: mdl-31702648

ABSTRACT

Reliable molecular signatures are needed to improve the early and accurate diagnosis of autism spectrum disorder (ASD) and to prompt physicians to provide timely intervention. This study aimed to identify a robust blood small nuclear RNA (snRNA) signature for diagnosing ASD. 186 blood samples in the microarray dataset were randomly divided into the training set (n = 112) and validation set (n = 72). Then, the microarray probe expression profiles were re-annotated into the expression profiles of 1253 snRNAs through probe sequence mapping. In the training set, a least absolute shrinkage and selection operator (LASSO) penalized generalized linear model was adopted to identify the 9-snRNA signature (RNU1-16P, RNU6-1031P, RNU6-258P, RNU6-335P, RNU6-485P, RNU6-549P, RNU6-98P, RNU6ATAC26P, and RNVU1-15), and a diagnostic score was calculated for each sample according to the snRNA expression levels and the model coefficients. The score demonstrated a good diagnostic ability for ASD in the training set (area under receiver operating characteristic curve (AUC) = 0.90), validation set (AUC = 0.87), and overall (AUC = 0.88). Moreover, the blood samples of 23 ASD patients and 23 age- and gender-matched controls were collected as the external validation set, in which the signature also showed a good diagnostic ability for ASD (AUC = 0.88). In subgroup analysis, the signature was robust when considering the confounders of gender, age, and disease subtypes, and displayed a significantly better performance among the female and younger cases (P = .039; P = .002). In comparison with a 55-gene signature derived from the same dataset, the snRNA signature showed a better diagnostic ability (AUC: 0.88 vs 0.80, P = .049). In conclusion, this study identified a novel and robust blood snRNA signature for diagnosing ASD, which might help improve the diagnostic accuracy for ASD in clinical practice. Nevertheless, a large-scale prospective study is needed to validate our results.


Subjects
Autism Spectrum Disorder/diagnosis; RNA, Small Nuclear/blood; Adolescent; Case-Control Studies; Child; Child, Preschool; Early Diagnosis; Female; Gene Expression Profiling/methods; Humans; Male; ROC Curve; Reproducibility of Results
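
The diagnostic score in record 9 is described as a combination of snRNA expression levels and model coefficients, evaluated by AUC. The Python sketch below shows one plausible reading: a linear score and a rank-based AUC. The coefficient and expression values are placeholders invented for illustration; the abstract does not report the fitted coefficients, and the exact form of the published score is an assumption.

```python
def diagnostic_score(expression, coefficients, intercept=0.0):
    """Linear score: intercept + sum of coefficient * expression per snRNA."""
    return intercept + sum(
        coefficients[name] * expression[name] for name in coefficients
    )


def auc(case_scores, control_scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen case scores higher than
    a randomly chosen control, with ties counted as one half.
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Placeholder coefficients for two of the nine snRNAs (hypothetical values).
coefficients = {"RNU6-98P": 0.5, "RNVU1-15": -0.25}
sample = {"RNU6-98P": 2.0, "RNVU1-15": 4.0}
print(diagnostic_score(sample, coefficients, intercept=0.1))

# Hypothetical scores for three cases and three controls.
print(auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.1]))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which gives a scale for reading the reported values of 0.87 to 0.90.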
10.
Cancer Med ; 8(8): 3685-3697, 2019 Jul.
Article in English | MEDLINE | ID: mdl-31112372

ABSTRACT

Cell-free plasma DNA (cfDNA) and mimicking circulating tumor cells (mCTCs) have demonstrated tremendous potential for molecular diagnosis of cancer and have been rapidly implemented in specific settings. However, widespread clinical adoption still faces some obstacles. The purpose was to compare the performance of a BEAMing (beads, emulsion, amplification, and magnetics) assay (OncoBEAM™-epidermal growth factor receptor [EGFR] [Sysmex Inostics]) and a next-generation sequencing assay (NGS; 56G Oncology panel kit, Swift Bioscience) to detect the p.T790M EGFR mutation in cfDNA of non-small cell lung cancer (NSCLC) patients. CfDNA samples (n = 183) were collected within our hospital from patients with a known EGFR sensitizing mutation who presented disease progression while under first-line therapy. EGFR mutations were detected using NGS in 42.1% of cfDNA samples during progression. Testing using the OncoBEAM™-EGFR assay enabled detection of the p.T790M EGFR mutation in 40/183 NSCLC patients (21.8%) versus 20/183 (10.9%) using the NGS assay. Samples that were only positive with the OncoBEAM™-EGFR assay had lower mutant allelic fractions (Mean = 0.1304%; SD ± 0.1463%). In addition, we investigated the detection of p.T790M in mCTCs using H1975 cells. These cells, spiked into whole blood, were enriched using the ClearCellFX1 microfluidic device. Using the OncoBEAM™-EGFR assay, p.T790M was detected in as few as 1.33 tumoral cells/mL. Overall, these findings highlight the value of using the OncoBEAM™-EGFR assay to optimize detection of the p.T790M mutation, as well as the complementary clinical value that each mutation detection assay offers: NGS enabled the detection of mutations in other oncogenes that may be relevant to secondary resistance mechanisms, whereas the OncoBEAM™-EGFR assay achieved higher sensitivity for detection of clinically actionable mutations.


Subjects
Biomarkers, Tumor; Carcinoma, Non-Small-Cell Lung/diagnosis; Carcinoma, Non-Small-Cell Lung/genetics; Circulating Tumor DNA; DNA, Neoplasm; Lung Neoplasms/diagnosis; Lung Neoplasms/genetics; Alleles; Carcinoma, Non-Small-Cell Lung/blood; DNA Mutational Analysis; Disease Progression; ErbB Receptors/genetics; Gene Expression Profiling/methods; Genetic Association Studies; High-Throughput Nucleotide Sequencing; Humans; Liquid Biopsy/methods; Lung Neoplasms/blood; Mutation; Neoplastic Cells, Circulating/metabolism; Neoplastic Cells, Circulating/pathology
11.
PLoS One ; 7(7): e35230, 2012.
Article in English | MEDLINE | ID: mdl-22807998

ABSTRACT

Elucidation of the mechanisms of stem cell differentiation is of great scientific interest. Increasing evidence suggests that stem cell differentiation involves changes at multiple levels of biological regulation, which together orchestrate the complex differentiation process; many related studies have been performed to investigate the various levels of regulation. The resulting valuable data, however, remain scattered. Most of the current stem cell-relevant databases focus on a single level of regulation (mRNA expression) from limited stem cell types; thus, a unifying resource would be of great value to compile the multiple levels of research data available. Here we present a database for this purpose, SyStemCell, populated with multi-level experimental data from stem cell research. The database currently covers seven levels of stem cell differentiation-associated regulatory mechanisms, including DNA CpG 5-hydroxymethylcytosine/methylation, histone modification, transcript products, microRNA-based regulation, protein products, phosphorylation proteins and transcription factor regulation, all of which have been curated from 285 peer-reviewed publications selected from PubMed. The database contains 43,434 genes, recorded as 942,221 gene entries, for four organisms (Homo sapiens, Mus musculus, Rattus norvegicus, and Macaca mulatta) and various stem cell sources (e.g., embryonic stem cells, neural stem cells and induced pluripotent stem cells). Data in SyStemCell can be queried by Entrez gene ID, symbol, or alias, or browsed by specific stem cell type at each level of genetic regulation. An online analysis tool is integrated to help researchers mine potential relationships among different regulations, and the potential usage of the database is demonstrated by three case studies. SyStemCell is the first database to bridge multi-level experimental information of stem cell studies, and it can become an important reference resource for stem cell researchers. The database is available at http://lifecenter.sgst.cn/SyStemCell/.


Subjects
Databases, Factual; Databases, Genetic; Embryonic Stem Cells/metabolism; Induced Pluripotent Stem Cells/metabolism; Macaca mulatta/genetics; Neural Stem Cells/metabolism; Software; Animals; Cell Differentiation; DNA Methylation; Gene Expression Regulation; Histones/genetics; Humans; Internet; Mice; MicroRNAs/genetics; Phosphorylation; RNA, Messenger/genetics; Rats; Transcription Factors/genetics
12.
BMC Res Notes ; 4: 405, 2011 Oct 13.
Article in English | MEDLINE | ID: mdl-21992408

ABSTRACT

BACKGROUND: Escherichia coli has been extensively studied as a prokaryotic model organism whose whole genome was determined in 1997. However, it is difficult to identify all the gene products involved in diverse functions by using whole genome sequences alone. High-resolution transcriptome mapping using tiling arrays has proved effective in improving the annotation of transcript units and discovering new transcripts of ncRNAs. While abundant tiling array data have been generated, a lack of appropriate visualization tools to accommodate and integrate multiple sources of data has become apparent. FINDINGS: EcoBrowser is a web-based tool for visualizing genome annotations and transcriptome data of E. coli. Important tiling array data of E. coli from different experimental platforms are collected and processed for query. An AJAX-based genome browser is embedded for visualization. Thus, genome annotations can be compared with transcript profiling and genome occupancy profiling from independent experiments, which will be helpful in discovering new transcripts including novel mRNAs and ncRNAs, generating a detailed description of the transcription unit architecture, and providing clues for investigation of prokaryotic transcriptional regulation, which has proved to be far more complex than previously thought. CONCLUSIONS: With the help of EcoBrowser, users can get a systemic view both from the vertical and parallel sides, as well as inspiration for the design of new experiments that will expand our understanding of the regulation mechanism.

13.
PLoS One ; 3(10): e3357, 2008.
Article in English | MEDLINE | ID: mdl-18852873

ABSTRACT

Efforts in phylogenomics have greatly improved our understanding of the backbone tree of life. However, due to the systematic error in sequence data, a sequence-based phylogenomic approach leads to well-resolved but statistically significant incongruence. Thus, an independent test of current phylogenetic knowledge is required. Here, we have devised a distance-based strategy to reconstruct a highly resolved backbone tree of life, on the basis of the genome context networks of 195 fully sequenced representative species. Along with strongly supporting the monophylies of three superkingdoms and most taxonomic sub-divisions, the derived tree also suggests some intriguing results, such as a high G+C Gram-positive origin of Bacteria and the classification of Symbiobacterium thermophilum and Alcanivorax borkumensis in Firmicutes. Furthermore, simulation analyses indicate that addition of more gene relationships with high accuracy can greatly improve the resolution of the phylogenetic tree. Our results demonstrate the feasibility of reconstructing a highly resolved phylogenetic tree with extensible gene networks across all three domains of life. This strategy also implies that the relationships between the genes (gene network) can define what kind of species it is.


Subjects
Genome; Phylogeny; Computational Biology/methods; Gene Regulatory Networks; Methods