1.
Glycobiology ; 33(5): 354-357, 2023 06 03.
Article in English | MEDLINE | ID: mdl-36799723

ABSTRACT

Recent technological advances in glycobiology have resulted in a large influx of data and the publication of many papers describing discoveries in glycoscience. However, the terms used in describing glycan structural features are not standardized, making it difficult to harmonize data across biomolecular databases, hampering the harvesting of information across studies and hindering text mining and curation efforts. To address this shortcoming, the Glycan Structure Dictionary has been developed as a reference dictionary to provide a standardized list of widely used glycan terms that can help in the curation and mapping of glycan structures described in publications. Currently, the dictionary has 190 glycan structure terms with 297 synonyms linked to 3,332 publications. For a term to be included in the dictionary, it must be present in at least 2 peer-reviewed publications. Synonyms, annotations, and cross-references to GlyTouCan, GlycoMotif, and other relevant databases and resources are also provided when available. The purpose of this effort is to facilitate biocuration, assist in the development of text mining tools, improve the harmonization of search and browse capabilities in glycoinformatics resources, and help to map glycan structures to function and disease. It is also expected that authors will use these terms to describe glycan structures in their manuscripts over time. A mechanism is also provided for researchers to submit terms for potential incorporation. The dictionary is available at https://wiki.glygen.org/Glycan_structure_dictionary.


Subjects
Data Mining , Polysaccharides , Data Mining/methods , Databases, Factual , Polysaccharides/chemistry , Glycomics/methods
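
The inclusion rule and synonym mapping described in the abstract above can be illustrated with a minimal sketch. The records, publication identifiers, and function names below are hypothetical and are not taken from the Glycan Structure Dictionary itself; the only detail drawn from the abstract is the threshold of at least 2 publications per term.

```python
from collections import defaultdict

# Hypothetical records: (canonical term, synonym, publication id). Illustrative only.
RECORDS = [
    ("Lewis x", "LeX", "pub-A"),
    ("Lewis x", "CD15", "pub-B"),
    ("Sialyl Lewis x", "sLeX", "pub-C"),
]

MIN_PUBLICATIONS = 2  # inclusion threshold stated in the abstract


def build_lookup(records, min_pubs=MIN_PUBLICATIONS):
    """Map lower-cased terms and synonyms to the canonical term,
    keeping only terms supported by at least `min_pubs` publications."""
    pubs = defaultdict(set)
    synonyms = defaultdict(set)
    for term, synonym, pub_id in records:
        pubs[term].add(pub_id)
        synonyms[term].add(synonym)

    lookup = {}
    for term, pub_ids in pubs.items():
        if len(pub_ids) >= min_pubs:
            lookup[term.lower()] = term
            for synonym in synonyms[term]:
                lookup[synonym.lower()] = term
    return lookup


def normalize(text, lookup):
    """Return the canonical term for a mention, or None if it is not in the dictionary."""
    return lookup.get(text.strip().lower())


lookup = build_lookup(RECORDS)
```

Here `normalize(" lex ", lookup)` resolves to the canonical term, while a mention of a term backed by a single publication is left unmapped.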
2.
Brief Bioinform ; 22(6), 2021 11 05.
Article in English | MEDLINE | ID: mdl-34015823

ABSTRACT

In response to the COVID-19 outbreak, scientists and medical researchers are capturing a wide range of host responses, symptoms and lingering postrecovery problems within the human population. These variable clinical manifestations suggest differences in influential factors, such as innate and adaptive host immunity, existing or underlying health conditions, comorbidities, genetics and other factors, compounding the complexity of COVID-19 pathobiology and potential biomarkers associated with the disease, as they become available. The heterogeneous data pose challenges for efficient extrapolation of information into clinical applications. We have curated 145 COVID-19 biomarkers by developing a novel cross-cutting disease biomarker data model that allows integration and evaluation of biomarkers in patients with comorbidities. Most biomarkers are related to the immune (SAA, TNF-α and IP-10) or coagulation (D-dimer, antithrombin and VWF) cascades, suggesting complex vascular pathobiology of the disease. Furthermore, we observe commonality with established cancer biomarkers (ACE2, IL-6, IL-4 and IL-2) as well as biomarkers for metabolic syndrome and diabetes (CRP, NLR and LDL). We explore these trends as we put forth a COVID-19 biomarker resource (https://data.oncomx.org/covid19) that will help researchers and diagnosticians alike.

3.
Glycobiology ; 32(10): 855-870, 2022 09 19.
Article in English | MEDLINE | ID: mdl-35925813

ABSTRACT

Molecular biomarkers measure discrete components of biological processes that can contribute to disorders when impaired. Great interest exists in discovering early cancer biomarkers to improve outcomes. Biomarkers represented in a standardized data model, integrated with multi-omics data, may improve the understanding and use of novel biomarkers such as glycans and glycoconjugates. Among altered components in tumorigenesis, N-glycans exhibit substantial biomarker potential, when analyzed with their protein carriers. However, such data are distributed across publications and databases of diverse formats, which hamper their use in research and clinical application. Mass spectrometry measures of 50 N-glycans on 7 serum proteins in liver disease were integrated (as a panel) into a cancer biomarker data model, providing a unique identifier, standard nomenclature, links to glycan resources, and accession and ontology annotations to standard protein, gene, disease, and biomarker information. Data provenance was documented with a standardized United States Food and Drug Administration-supported BioCompute Object. Using the biomarker data model allows the capture of granular information, such as glycans with different levels of abundance in cirrhosis, hepatocellular carcinoma, and transplant groups. Such representation in a standardized data model harmonizes glycomics data in a unified framework, making glycan-protein biomarker data exploration more available to investigators and to other data resources. The biomarker data model we describe can be used by researchers to describe their novel glycan and glycoconjugate biomarkers; it can integrate N-glycan biomarker data with multi-source biomedical data and can foster discovery and insight within a unified data framework for glycan biomarker representation, thereby making the data FAIR (Findable, Accessible, Interoperable, Reusable) (https://www.go-fair.org/fair-principles/).


Subjects
Carcinoma, Hepatocellular , Liver Neoplasms , Biomarkers , Biomarkers, Tumor , Carcinoma, Hepatocellular/diagnosis , Glycomics/methods , Humans , Liver Neoplasms/diagnosis , Polysaccharides/chemistry
4.
Glycobiology ; 31(11): 1510-1519, 2021 12 18.
Article in English | MEDLINE | ID: mdl-34314492

ABSTRACT

Glycans play a vital role in health, disease, bioenergy, biomaterials and bio-therapeutics. As a result, there is keen interest in identifying and increasing glycan data in bioinformatics databases like ChEBI and PubChem, and in connecting them to resources at the EMBL-EBI and NCBI to facilitate access to important annotations at a global level. GlyTouCan is a comprehensive archival database that contains glycans obtained primarily through batch upload from glycan repositories, glycoprotein databases and individual laboratories. In many instances, the glycan structures deposited in GlyTouCan may not be fully defined or have supporting experimental evidence and citations. Databases like ChEBI and PubChem were designed to accommodate complete atomistic structures with well-defined chemical linkages. As a result, they cannot easily accommodate the structural ambiguity inherent in glycan databases. Consequently, there is a need to improve the organization of glycan data coherently to enhance connectivity across the major NCBI, EMBL-EBI and glycoscience databases. This paper outlines a workflow developed in collaboration between GlyGen, ChEBI and PubChem to improve the visibility and connectivity of glycan data across these resources. GlyGen hosts a subset of glycans (~29,000) from the GlyTouCan database and has submitted valuable glycan annotations to the PubChem database and integrated over 10,500 (including ambiguously defined) glycans into the ChEBI database. The integrated glycans were prioritized based on links to PubChem and connectivity to glycoprotein data. The pipeline provides a blueprint for how glycan data can be harmonized between different resources. The current PubChem, ChEBI and GlyTouCan mappings can be downloaded from GlyGen (https://data.glygen.org).


Subjects
Databases, Chemical , Glycoproteins/chemistry , Polysaccharides/chemistry , Software , Carbohydrate Conformation , Glycomics
5.
Gastroenterology ; 158(1): 238-252, 2020 01.
Article in English | MEDLINE | ID: mdl-31585122

ABSTRACT

BACKGROUND & AIMS: We studied interactions among proteins of the carcinoembryonic antigen-related cell adhesion molecule (CEACAM) family, which interact with microbes, and transforming growth factor beta (TGFB) signaling pathway, which is often altered in colorectal cancer cells. We investigated mechanisms by which CEACAM proteins inhibit TGFB signaling and alter the intestinal microbiome to promote colorectal carcinogenesis. METHODS: We collected data on DNA sequences, messenger RNA expression levels, and patient survival times from 456 colorectal adenocarcinoma cases, and a separate set of 594 samples of colorectal adenocarcinomas, in The Cancer Genome Atlas. We performed shotgun metagenomic sequencing analyses of feces from wild-type mice and mice with defects in TGFB signaling (Sptbn1+/- and Smad4+/-/Sptbn1+/-) to identify changes in microbiota composition before development of colon tumors. CEACAM protein and its mutants were overexpressed in SW480 and HCT116 colorectal cancer cell lines, which were analyzed by immunoblotting and proliferation and colony formation assays. RESULTS: In colorectal adenocarcinomas, high expression levels of genes encoding CEACAM proteins, especially CEACAM5, were associated with reduced survival times of patients. There was an inverse correlation between expression of CEACAM genes and expression of TGFB pathway genes (TGFBR1, TGFBR2, and SMAD3). In colorectal adenocarcinomas, we also found an inverse correlation between expression of genes in the TGFB signaling pathway and genes that regulate stem cell features of cells. We found mutations encoding L640I and A643T in the B3 domain of human CEACAM5 in colorectal adenocarcinomas; structural studies indicated that these mutations would alter the interaction between CEACAM5 and TGFBR1. 
Overexpression of these mutants in SW480 and HCT116 colorectal cancer cell lines increased their anchorage-independent growth and inhibited TGFB signaling to a greater extent than overexpression of wild-type CEACAM5, indicating that they are gain-of-function mutations. Compared with feces from wild-type mice, feces from mice with defects in TGFB signaling had increased abundance of bacterial species that have been associated with the development of colon tumors, including Clostridium septicum, and decreased amounts of beneficial bacteria, such as Bacteroides vulgatus and Parabacteroides distasonis. CONCLUSION: We found expression of CEACAMs and genes that regulate stem cell features of cells to be increased in colorectal adenocarcinomas and inversely correlated with expression of TGFB pathway genes. We found colorectal adenocarcinomas to express mutant forms of CEACAM5 that inhibit TGFB signaling and increase proliferation and colony formation. We propose that CEACAM proteins disrupt TGFB signaling, which alters the composition of the intestinal microbiome to promote colorectal carcinogenesis.


Subjects
Carcinoembryonic Antigen/genetics , Carcinogenesis/genetics , Colorectal Neoplasms/genetics , Gastrointestinal Microbiome/physiology , Signal Transduction/genetics , Animals , Bacteria/genetics , Bacteria/isolation & purification , Carcinoembryonic Antigen/metabolism , Colorectal Neoplasms/microbiology , Colorectal Neoplasms/mortality , Disease Models, Animal , Feces/microbiology , GPI-Linked Proteins/genetics , GPI-Linked Proteins/metabolism , Gain of Function Mutation , Gene Expression Regulation, Neoplastic , HCT116 Cells , Humans , Metagenomics , Mice , Mice, Transgenic , Protein Domains/genetics , Receptor, Transforming Growth Factor-beta Type I/metabolism , Smad4 Protein/genetics , Smad4 Protein/metabolism , Spheroids, Cellular , Survival Analysis , Transforming Growth Factor beta/metabolism
6.
Bioinformatics ; 36(12): 3941-3943, 2020 06 01.
Article in English | MEDLINE | ID: mdl-32324859

ABSTRACT

SUMMARY: Glycoinformatics plays a major role in glycobiology research, and the development of a comprehensive glycoinformatics knowledgebase is critical. This application note describes the GlyGen data model, processing workflow and the data access interfaces featuring programmatic use case example queries based on specific biological questions. The GlyGen project is a data integration, harmonization and dissemination project for carbohydrate and glycoconjugate-related data retrieved from multiple international data sources including UniProtKB, GlyTouCan, UniCarbKB and other key resources. AVAILABILITY AND IMPLEMENTATION: GlyGen web portal is freely available to access at https://glygen.org. The data portal, web services, SPARQL endpoint and GitHub repository are also freely available at https://data.glygen.org, https://api.glygen.org, https://sparql.glygen.org and https://github.com/glygener, respectively. All code is released under license GNU General Public License version 3 (GNU GPLv3) and is available on GitHub https://github.com/glygener. The datasets are made available under Creative Commons Attribution 4.0 International (CC BY 4.0) license. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Knowledge Bases , Software , Glycomics , Information Storage and Retrieval , Workflow
7.
PLoS Biol ; 16(12): e3000099, 2018 12.
Article in English | MEDLINE | ID: mdl-30596645

ABSTRACT

A personalized approach based on a patient's or pathogen's unique genomic sequence is the foundation of precision medicine. Genomic findings must be robust and reproducible, and experimental data capture should adhere to findable, accessible, interoperable, and reusable (FAIR) guiding principles. Moreover, effective precision medicine requires standardized reporting that extends beyond wet-lab procedures to computational methods. The BioCompute framework (https://w3id.org/biocompute/1.3.0) enables standardized reporting of genomic sequence data provenance, including provenance domain, usability domain, execution domain, verification kit, and error domain. This framework facilitates communication and promotes interoperability. Bioinformatics computation instances that employ the BioCompute framework are easily relayed, repeated if needed, and compared by scientists, regulators, test developers, and clinicians. Easing the burden of performing the aforementioned tasks greatly extends the range of practical application. Large clinical trials, precision medicine, and regulatory submissions require a set of agreed upon standards that ensures efficient communication and documentation of genomic analyses. The BioCompute paradigm and the resulting BioCompute Objects (BCOs) offer that standard and are freely accessible as a GitHub organization (https://github.com/biocompute-objects) following the "Open-Stand.org principles for collaborative open standards development." With high-throughput sequencing (HTS) studies communicated using a BCO, regulatory agencies (e.g., Food and Drug Administration [FDA]), diagnostic test developers, researchers, and clinicians can expand collaboration to drive innovation in precision medicine, potentially decreasing the time and cost associated with next-generation sequencing workflow exchange, reporting, and regulatory reviews.


Subjects
Computational Biology/methods , Sequence Analysis, DNA/methods , Animals , Communication , Computational Biology/standards , Genome , Genomics/methods , High-Throughput Nucleotide Sequencing , Humans , Precision Medicine/trends , Reproducibility of Results , Sequence Analysis, DNA/standards , Software , Workflow
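
The domain list in the abstract above suggests a simple completeness check on a BioCompute-style record. The sketch below is only an illustration: the key names are assumptions derived from the abstract's wording ("provenance domain", "usability domain", and so on) and should not be read as the official BioCompute schema, and the example record is invented.

```python
# Key names are assumptions based on the abstract's wording, not the official schema.
EXPECTED_DOMAINS = (
    "provenance_domain",
    "usability_domain",
    "execution_domain",
    "verification_kit",
    "error_domain",
)


def missing_domains(record, expected=EXPECTED_DOMAINS):
    """Return the expected top-level domains that are absent from the record."""
    return [name for name in expected if name not in record]


# Hypothetical, deliberately incomplete record.
example_record = {
    "provenance_domain": {"name": "example variant-calling run"},
    "usability_domain": ["Illustrative record for a completeness check"],
    "execution_domain": {"script": "run_pipeline.sh"},
}

gaps = missing_domains(example_record)
```

A report of the gaps (here the verification kit and error domain) is the kind of check a submitter might run before sharing a record with collaborators or reviewers.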
8.
Nucleic Acids Res ; 46(D1): D1128-D1136, 2018 01 04.
Article in English | MEDLINE | ID: mdl-30053270

ABSTRACT

Single-nucleotide variation and gene expression of disease samples represent important resources for biomarker discovery. Many databases have been built to host and make available such data to the community, but these databases are frequently limited in scope and/or content. BioMuta, a database of cancer-associated single-nucleotide variations, and BioXpress, a database of cancer-associated differentially expressed genes and microRNAs, differ from other disease-associated variation and expression databases primarily through the aggregation of data across many studies into a single source with a unified representation and annotation of functional attributes. Early versions of these resources were initiated by pilot funding for specific research applications, but newly awarded funds have enabled hardening of these databases to production-level quality and will allow for sustained development of these resources for the next few years. Because both resources were developed using a similar methodology of integration, curation, unification, and annotation, we present BioMuta and BioXpress as allied databases that will facilitate a more comprehensive view of gene associations in cancer. BioMuta and BioXpress are hosted on the High-performance Integrated Virtual Environment (HIVE) server at the George Washington University at https://hive.biochemistry.gwu.edu/biomuta and https://hive.biochemistry.gwu.edu/bioxpress, respectively.


Subjects
Biomarkers, Tumor/genetics , Databases, Genetic , Knowledge Bases , Mutation , Neoplasms/genetics , Gene Expression Regulation, Neoplastic , Humans , MicroRNAs , User-Computer Interface
9.
Gastroenterology ; 154(1): 195-210, 2018 01.
Article in English | MEDLINE | ID: mdl-28918914

ABSTRACT

BACKGROUND & AIMS: Development of hepatocellular carcinoma (HCC) is associated with alterations in the transforming growth factor-beta (TGF-β) signaling pathway, which regulates liver inflammation and can have tumor suppressor or promoter activities. Little is known about the roles of specific members of this pathway at specific stages of HCC development. We took an integrated approach to identify and validate the effects of changes in this pathway in HCC and identify therapeutic targets. METHODS: We performed transcriptome analyses for a total of 488 HCCs that include data from The Cancer Genome Atlas. We also screened 301 HCCs reported in the Catalogue of Somatic Mutations in Cancer and 202 from The Cancer Genome Atlas for mutations in genome sequences. We expressed mutant forms of spectrin beta, non-erythrocytic 1 (SPTBN1) in HepG2, SNU398, and SNU475 cells and measured phosphorylation, nuclear translocation, and transcriptional activity of SMAD family member 3 (SMAD3). RESULTS: We found somatic mutations in at least 1 gene whose product is a member of the TGF-β signaling pathway in 38% of HCC samples. SPTBN1 was mutated in the largest proportion of samples (12 of 202, 6%). Unsupervised clustering of transcriptome data identified a group of HCCs with activation of the TGF-β signaling pathway (increased transcription of genes in the pathway) and a group of HCCs with inactivation of TGF-β signaling (reduced expression of genes in this pathway). Patients with tumors with inactivation of TGF-β signaling had shorter survival times than patients with tumors with activation of TGF-β signaling (P = .0129). Patterns of TGF-β signaling correlated with activation of the DNA damage response and sirtuin signaling pathways. HepG2, SNU398, and SNU475 cells that expressed the D1089Y mutant or with knockdown of SPTBN1 had increased sensitivity to DNA crosslinking agents and reduced survival compared with cells that expressed normal SPTBN1 (controls).
CONCLUSIONS: In genome and transcriptome analyses of HCC samples, we found mutations in genes in the TGF-β signaling pathway in almost 40% of samples. These correlated with changes in expression of genes in the pathways; up-regulation of genes in this pathway would contribute to inflammation and fibrosis, whereas down-regulation would indicate loss of TGF-β tumor suppressor activity. Our findings indicate that therapeutic agents for HCCs can be effective, based on genetic features of the TGF-β pathway; agents that block TGF-β should be used only in patients with specific types of HCCs.


Subjects
Carcinoma, Hepatocellular/genetics , Carcinoma, Hepatocellular/metabolism , Liver Neoplasms/genetics , Liver Neoplasms/metabolism , Mutation/genetics , Signal Transduction/physiology , Transforming Growth Factor beta/physiology , Aged , Carcinoma, Hepatocellular/mortality , Case-Control Studies , Cluster Analysis , Female , Humans , Liver Neoplasms/mortality , Male , Middle Aged
10.
Nucleic Acids Res ; 45(19): 10989-11003, 2017 Nov 02.
Article in English | MEDLINE | ID: mdl-28977510

ABSTRACT

Sequence heterogeneity is a common characteristic of RNA viruses that is often referred to as sub-populations or quasispecies. Traditional techniques used for assembly of short sequence reads produced by deep sequencing, such as de novo assemblers, ignore the underlying diversity. Here, we introduce a novel algorithm that simultaneously assembles discrete sequences of multiple genomes present in populations. Using in silico data we were able to detect populations at as low as 0.1% frequency with complete global genome reconstruction and in a single sample detected 16 resolved sequences with no mismatches. We also applied the algorithm to high throughput sequencing data obtained for viruses present in sewage samples and successfully detected multiple sub-populations and recombination events in these diverse mixtures. High sensitivity of the algorithm also enables genomic analysis of heterogeneous pathogen genomes from patient samples and accurate detection of intra-host diversity, enabling not just basic research in personalized medicine but also accurate diagnostics and monitoring of drug therapies, which are critical in clinical and regulatory decision-making processes.


Subjects
Algorithms , Computational Biology/methods , Genome, Human/genetics , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Genome, Viral/genetics , Humans , Phylogeny , Poliovirus/classification , Poliovirus/genetics , Reproducibility of Results
11.
Bioinformatics ; 32(13): 2041-3, 2016 07 01.
Article in English | MEDLINE | ID: mdl-27153712

ABSTRACT

MOTIVATION: The enormous number of redundant sequenced genomes has hindered efforts to analyze and functionally annotate proteins. As the taxonomy of viruses is not uniformly defined, viral proteomes pose special challenges in this regard. Grouping viruses based on the similarity of their proteins at proteome scale can normalize against potential taxonomic nomenclature anomalies. RESULTS: We present Viral Reference Proteomes (Viral RPs), which are computed from complete virus proteomes within UniProtKB. Viral RPs based on 95, 75, 55, 35 and 15% co-membership in proteome similarity based clusters are provided. Comparison of our computational Viral RPs with UniProt's curator-selected Reference Proteomes indicates that the two sets are consistent and complementary. Furthermore, each Viral RP represents a cluster of virus proteomes that was consistent with virus or host taxonomy. We provide BLASTP search and FTP download of Viral RP protein sequences, and a browser to facilitate the visualization of Viral RPs. AVAILABILITY AND IMPLEMENTATION: http://proteininformationresource.org/rps/viruses/ CONTACT: chenc@udel.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Databases, Protein , Proteome/analysis , Viral Proteins/analysis , Amino Acid Sequence , Cluster Analysis , Computational Biology , Knowledge Bases
12.
J Biol Chem ; 290(8): 4966-4980, 2015 Feb 20.
Article in English | MEDLINE | ID: mdl-25538240

ABSTRACT

Human N-methylpurine DNA glycosylase (hMPG) initiates base excision repair of a number of structurally diverse purine bases including 1,N(6)-ethenoadenine, hypoxanthine, and alkylation adducts in DNA. Genetic studies discovered at least eight validated non-synonymous single nucleotide polymorphisms (nsSNPs) of the hMPG gene in human populations that result in specific single amino acid substitutions. In this study, we tested the functional consequences of these nsSNPs of hMPG. Our results showed that two specific arginine residues, Arg-141 and Arg-120, are important for the activity of hMPG as the germ line variants R120C and R141Q had reduced enzymatic activity in vitro as well as in mammalian cells. Expression of these two variants in mammalian cells lacking endogenous MPG also showed an increase in mutations and sensitivity to an alkylating agent compared with the WT hMPG. Real time binding experiments by surface plasmon resonance spectroscopy suggested that these variants have substantial reduction in the equilibrium dissociation constant of binding (KD) of hMPG toward 1,N(6)-ethenoadenine-containing oligonucleotide (ϵA-DNA). Pre-steady-state kinetic studies showed that the substitutions at arginine residues affected the turnover of the enzyme significantly under multiple turnover condition. Surface plasmon resonance spectroscopy further showed that both variants had significantly decreased nonspecific (undamaged) DNA binding. Molecular modeling suggested that R141Q substitution may have resulted in a direct loss of the salt bridge between ϵA-DNA and hMPG, whereas R120C substitution redistributed, at a distance, the interactions among residues in the catalytic pocket. Together our results suggest that individuals carrying R120C and R141Q MPG variants may be at risk for genomic instability and associated diseases as a consequence.


Subjects
Adenine/analogs & derivatives , DNA Glycosylases , DNA Repair , Mutagens/pharmacology , Mutation, Missense , Polymorphism, Single Nucleotide , Adenine/pharmacology , Amino Acid Substitution , Animals , Catalytic Domain , DNA Glycosylases/chemistry , DNA Glycosylases/genetics , DNA Glycosylases/metabolism , DNA Repair/drug effects , DNA Repair/genetics , Gene Expression , Genomic Instability , HEK293 Cells , Humans , Kinetics , Mice , Mice, Knockout , Surface Plasmon Resonance
13.
Nucleic Acids Res ; 42(18): 11570-88, 2014 Oct.
Article in English | MEDLINE | ID: mdl-25232094

ABSTRACT

Identification of non-synonymous single nucleotide variations (nsSNVs) has exponentially increased due to advances in Next-Generation Sequencing technologies. The functional impacts of these variations have been difficult to ascertain because the corresponding knowledge about sequence functional sites is quite fragmented. It is clear that mapping of variations to sequence functional features can help us better understand the pathophysiological role of variations. In this study, we investigated the effect of nsSNVs on more than 17 common types of post-translational modification (PTM) sites, active sites and binding sites. Out of 1 705 285 distinct nsSNVs on 259 216 functional sites we identified 38 549 variations that significantly affect 10 major functional sites. Furthermore, we found distinct patterns of site disruptions due to germline and somatic nsSNVs. Pan-cancer analysis across 12 different cancer types led to the identification of 51 genes with 106 nsSNV affected functional sites found in 3 or more cancer types. 13 of the 51 genes overlap with previously identified Significantly Mutated Genes (Nature. 2013 Oct 17;502(7471)). 62 mutations in these 13 genes affecting functional sites such as DNA, ATP binding and various PTM sites occur across several cancers and can be prioritized for additional validation and investigations.


Subjects
Genes, Neoplasm , Genetic Variation , Acetylation , Binding Sites/genetics , Catalytic Domain/genetics , Disease/genetics , Gene Ontology , Genomics , Glycosylation , Humans , Methylation , Mutation , Neoplasm Proteins/genetics , Phosphorylation/genetics , Phylogeny , Protein Processing, Post-Translational/genetics , Proteome/genetics , Ubiquitination/genetics
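
The core idea of the abstract above, mapping nsSNVs onto annotated functional sites and then counting recurrence across cancer types, can be sketched as a positional join. All protein names, positions, and cancer labels below are hypothetical, and the sketch covers only the positional overlap; it does not reproduce the statistical significance testing the study reports.

```python
from collections import defaultdict

# Hypothetical annotations: (protein, residue position) -> functional site type.
FUNCTIONAL_SITES = {
    ("P1", 15): "phosphorylation",
    ("P1", 40): "active site",
    ("P2", 7): "N-glycosylation",
}

# Hypothetical nsSNVs: (protein, position, reference residue, variant residue, cancer type).
VARIANTS = [
    ("P1", 15, "S", "A", "cancer_a"),
    ("P1", 22, "G", "D", "cancer_a"),
    ("P2", 7, "N", "K", "cancer_b"),
    ("P1", 15, "S", "F", "cancer_b"),
]


def affected_sites(variants, sites):
    """Keep variants whose position coincides with an annotated functional site."""
    hits = []
    for protein, pos, ref, alt, cancer in variants:
        site_type = sites.get((protein, pos))
        if site_type is not None and ref != alt:
            hits.append((protein, pos, site_type, cancer))
    return hits


def cancer_types_per_site(hits):
    """Count the distinct cancer types in which each functional site is hit."""
    cancers = defaultdict(set)
    for protein, pos, _site_type, cancer in hits:
        cancers[(protein, pos)].add(cancer)
    return {site: len(names) for site, names in cancers.items()}


hits = affected_sites(VARIANTS, FUNCTIONAL_SITES)
# The study uses a threshold of 3 or more cancer types; 2 is used here for the toy data.
recurrent = {s: n for s, n in cancer_types_per_site(hits).items() if n >= 2}
```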
16.
Genomics ; 104(1): 1-7, 2014 Jul.
Article in English | MEDLINE | ID: mdl-24930720

ABSTRACT

Next-generation sequencing data can be mapped to a reference genome to identify single-nucleotide polymorphisms/variations (SNPs/SNVs; called SNPs hereafter). In theory, SNPs can be compared across several samples and the differences can be used to create phylogenetic trees depicting relatedness among the samples. However, in practice this is difficult because currently there is no stand-alone tool that takes SNP data directly as input and produces phylogenetic trees. In response to this need, the PhyloSNP application was created with two analysis methods: 1) a quantitative method that creates the presence/absence matrix which can be directly used to generate phylogenetic trees or creates a tree from a shrunk genome alignment (includes additional bases surrounding the SNP position) and 2) a qualitative method that clusters samples based on the frequency of different bases found at a particular position. The algorithms were used to generate trees from Poliovirus, Burkholderia and human cancer genomics NGS datasets. AVAILABILITY: PhyloSNP is freely available for download at http://hive.biochemistry.gwu.edu/dna.cgi?cmd=phylosnp.


Subjects
Burkholderia pseudomallei/genetics , Genome, Human , Genomics/methods , Phylogeny , Poliovirus/genetics , Polymorphism, Single Nucleotide , Sequence Alignment/methods , Algorithms , Humans , Software
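
The presence/absence matrix mentioned in the abstract above can be illustrated with a short sketch. This is not PhyloSNP's code: the sample names and SNP calls are invented, and the Hamming distance is used here simply as one plausible way to turn the matrix into pairwise distances that a tree-building method could consume.

```python
def presence_absence_matrix(sample_snps):
    """Build one 0/1 row per sample over the union of all observed SNPs.

    `sample_snps` maps a sample name to a set of (position, variant base) tuples.
    """
    all_snps = sorted(set().union(*sample_snps.values()))
    matrix = {
        sample: [1 if snp in snps else 0 for snp in all_snps]
        for sample, snps in sample_snps.items()
    }
    return all_snps, matrix


def hamming(row_a, row_b):
    """Number of SNP columns in which two samples differ."""
    return sum(a != b for a, b in zip(row_a, row_b))


def pairwise_distances(matrix):
    """Hamming distance for every unordered pair of samples."""
    names = sorted(matrix)
    return {
        (a, b): hamming(matrix[a], matrix[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Hypothetical SNP calls for three samples.
calls = {
    "s1": {(10, "A"), (25, "T")},
    "s2": {(10, "A"), (40, "G")},
    "s3": {(25, "T")},
}
columns, matrix = presence_absence_matrix(calls)
distances = pairwise_distances(matrix)
```

The resulting distances could be passed to any distance-based tree method, such as neighbor joining; which method the tool itself applies is not stated in the abstract.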
17.
BMC Bioinformatics ; 15: 28, 2014 Jan 27.
Article in English | MEDLINE | ID: mdl-24467687

ABSTRACT

BACKGROUND: Next-generation sequencing (NGS) technologies have resulted in petabytes of scattered data, decentralized in archives, databases and sometimes in isolated hard-disks which are inaccessible for browsing and analysis. It is expected that curated secondary databases will help organize some of this Big Data, thereby allowing users to better navigate, search and compute on it. RESULTS: To address the above challenge, we have implemented an NGS biocuration workflow and are analyzing short read sequences and associated metadata from cancer patients to better understand the human variome. Curation of variation and other related information from control (normal tissue) and case (tumor) samples will provide comprehensive background information that can be used in genomic medicine research and application studies. Our approach includes a CloudBioLinux Virtual Machine which is used upstream of an integrated High-performance Integrated Virtual Environment (HIVE) that encapsulates the Curated Short Read archive (CSR) and a proteome-wide variation effect analysis tool (SNVDis). As a proof-of-concept, we have curated and analyzed control and case breast cancer datasets from the NCI cancer genomics program - The Cancer Genome Atlas (TCGA). Our efforts include reviewing and recording in CSR available clinical information on patients, mapping of the reads to the reference followed by identification of non-synonymous Single Nucleotide Variations (nsSNVs) and integrating the data with tools that allow analysis of the effect of nsSNVs on the human proteome. Furthermore, we have also developed a novel phylogenetic analysis algorithm that uses SNV positions and can be used to classify the patient population. The workflow described here lays the foundation for analysis of short read sequence data to identify rare and novel SNVs that are not present in dbSNP and therefore provides a more comprehensive understanding of the human variome. 
Variation results for single genes as well as the entire study are available from the CSR website (http://hive.biochemistry.gwu.edu/dna.cgi?cmd=csr). CONCLUSIONS: Availability of thousands of sequenced samples from patients provides a rich repository of sequence information that can be utilized to identify individual level SNVs and their effect on the human proteome beyond what the dbSNP database provides.


Subjects
High-Throughput Nucleotide Sequencing/methods , Neoplasms/genetics , Proteome/genetics , Proteomics/methods , Algorithms , Biomedical Research , Database Management Systems , Databases, Genetic , Humans , Neoplasms/metabolism , Phylogeny , Polymorphism, Single Nucleotide , Proteome/classification , Proteome/metabolism , User-Computer Interface
18.
BMC Genomics ; 15: 918, 2014 Oct 21.
Article in English | MEDLINE | ID: mdl-25336203

ABSTRACT

BACKGROUND: Understanding the taxonomic composition of a sample, whether from a patient, food or the environment, is important to several types of studies, including pathogen diagnostics, epidemiological studies, biodiversity analysis and food quality regulation. With the decreasing costs of sequencing, metagenomic data is quickly becoming the preferred type of data for such analysis. RESULTS: Rapidly defining the taxonomic composition (both taxonomic profile and relative frequency) in a metagenomic sequence dataset is challenging because mapping millions of sequence reads from a metagenomic study to a database such as the NCBI non-redundant nucleotide database (nt) is computationally intensive. We have developed a robust subsampling-based algorithm, implemented in a tool called CensuScope, meant to take a 'sneak peek' into the population distribution and estimate taxonomic composition as if a census were taken of the metagenomic landscape. CensuScope is a rapid and accurate metagenome taxonomic profiling tool that randomly extracts a small number of reads (based on user input) and maps them to NCBI's nt database. This process is repeated multiple times to ascertain the taxonomic composition that is found in the majority of the iterations, thereby providing a robust estimate of the population and measures of accuracy for the results. CONCLUSION: CensuScope can be run on a laptop or on a high-performance computer. Based on our analysis, we are able to provide some recommendations on the number of sequence reads to analyze and the number of iterations to use. For example, to quantify taxonomic groups present in the sample at a level of 1% or higher, a subsampling size of 250 random reads with 50 iterations yields a statistical power of >99%. Windows and UNIX versions of CensuScope are available for download at https://hive.biochemistry.gwu.edu/dna.cgi?cmd=censuscope. CensuScope is also available through the High-performance Integrated Virtual Environment (HIVE) and can be used in conjunction with other HIVE analysis and visualization tools.
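
The repeated-subsampling idea described in this abstract can be illustrated with a minimal Python sketch. This is not the CensuScope implementation: the function name `census`, its parameters, and the `classify` callback (a hypothetical stand-in for mapping a read against the nt database) are assumptions made for illustration. Only the defaults of 250 reads and 50 iterations, and the rule of keeping taxa found in the majority of iterations, come from the abstract.

```python
import random
from collections import Counter


def census(reads, classify, sample_size=250, iterations=50, seed=0):
    """Estimate taxonomic composition by repeated random subsampling.

    `classify` is a hypothetical callback that maps one read to a taxon
    label; a real tool would align the read against a reference database.
    Returns {taxon: (mean_relative_frequency, fraction_of_iterations_seen)}
    for taxa observed in more than half of the iterations.
    """
    rng = random.Random(seed)
    n = min(sample_size, len(reads))
    presence = Counter()   # number of iterations in which a taxon appeared
    freq_sum = Counter()   # sum of per-iteration relative frequencies

    for _ in range(iterations):
        sample = rng.sample(reads, n)            # draw without replacement
        counts = Counter(classify(read) for read in sample)
        for taxon, count in counts.items():
            presence[taxon] += 1
            freq_sum[taxon] += count / n

    # Keep only taxa found in the majority of iterations.
    return {
        taxon: (freq_sum[taxon] / iterations, presence[taxon] / iterations)
        for taxon in presence
        if presence[taxon] > iterations / 2
    }


if __name__ == "__main__":
    # Toy data: each "read" is already labelled with its taxon.
    toy_reads = ["Escherichia"] * 900 + ["Bacteroides"] * 100
    for taxon, (freq, seen) in census(toy_reads, classify=lambda r: r).items():
        print(f"{taxon}: mean frequency {freq:.3f}, seen in {seen:.0%} of iterations")
```

The fraction of iterations in which a taxon appears serves here as a simple stability measure; the accuracy measures reported by the actual tool may be defined differently.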


Subjects
Classification/methods , Metagenome , Biodiversity , Databases, Genetic , Genome, Fungal/genetics , Humans , Intestines/microbiology , Microbiota/genetics , Respiratory Tract Infections/microbiology , Time Factors
19.
Appl Physiol Nutr Metab ; 49(1): 125-134, 2024 Jan 01.
Article in English | MEDLINE | ID: mdl-37902107

ABSTRACT

Sucralose and acesulfame-potassium consumption alters gut microbiota in rodents, with unclear effects in humans. We examined the effects of three-times-daily consumption of diet soda containing sucralose and acesulfame-potassium for 1 (n = 17) or 8 (n = 8) weeks on gut microbiota composition in young adults. After 8 weeks of diet soda consumption, the relative abundance of Proteobacteria, specifically Enterobacteriaceae, increased; increased abundance of two Proteobacteria taxa was also observed after 1 week of diet soda consumption compared with sparkling water. In addition, three taxa in the Bacteroides genus increased following 1 week of diet soda consumption compared with sparkling water. The clinical relevance of these findings and the effects of sucralose and acesulfame-potassium consumption on human gut microbiota warrant further investigation in larger studies. Clinical trial registration: NCT02877186 and NCT03125356.


Subjects
Carbonated Water , Young Adult , Humans , Pilot Projects , Sweetening Agents/pharmacology , Diet , Potassium
20.
Drug Discov Today ; 29(3): 103884, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38219969

ABSTRACT

The volume of nucleic acid sequence data has exploded recently, amplifying the challenge of transforming data into meaningful information. Processing data can require an increasingly complex ecosystem of customized tools, which makes it harder to communicate analyses in a way that is understandable yet sufficiently detailed to enable informed decisions or repetition of the analysis. This can be of particular interest to institutions and companies communicating computations in a regulatory environment. BioCompute Objects (BCOs; an instance of pipeline documentation that conforms to the IEEE 2791-2020 standard) were developed as a standardized mechanism for analysis reporting. A suite of BCOs is presented, representing interconnected elements of a computation modeled after those that might be found in a regulatory submission but shared publicly: in this case, a pipeline designed to identify viral contaminants in biological manufacturing, such as for vaccines.


Subjects
Computational Biology , Vaccines , High-Throughput Nucleotide Sequencing , Workflow