ABSTRACT
SUMMARY: The simplicity and precision of the CRISPR/Cas9 system have brought in a new era of gene editing. Screening a large number of samples for desired clones with CRISPR-mediated genomic edits is made possible by next-generation sequencing (NGS) owing to its multiplexing capability. Here we present the CRISPR-DAV (CRISPR Data Analysis and Visualization) pipeline to analyze CRISPR NGS data in a high-throughput manner. In the pipeline, Burrows-Wheeler Aligner and Assembly Based ReAlignment are used for small and large indel detection, and results are presented in a comprehensive set of charts and an interactive alignment view. AVAILABILITY AND IMPLEMENTATION: CRISPR-DAV is available at GitHub and Docker Hub repositories: https://github.com/pinetree1/crispr-dav.git and https://hub.docker.com/r/pinetree1/crispr-dav/. CONTACT: xuning.wang@bms.com.
Subject(s)
Clone Cells , Clustered Regularly Interspaced Short Palindromic Repeats , High-Throughput Nucleotide Sequencing/methods , INDEL Mutation , Sequence Analysis, DNA/methods , Software , Bacteria/genetics , Genome, Bacterial , Genomics/methods
ABSTRACT
This study reports findings of an unusual cluster of mutations spanning 22 bp (base pairs) in a monoclonal antibody expression vector. It was identified by two orthogonal methods: mass spectrometry on expressed protein and next-generation sequencing (NGS) on the plasmid DNA. While the initial NGS analysis confirmed the designed sequence modification, intact mass analysis detected an additional mass of the antibody molecule expressed in CHO cells. The extra mass was eventually found to be associated with unmatched nucleotides in a distal region by checking full-length sequence alignment plots. Interestingly, the complementary sequence of the mutated sequence was a reverse sequence of the original sequence and flanked by two 10-bp reverse-complementary sequences, leading to an undesirable DNA recombination. The finding highlights the necessity of rigorous examination of expression vector design and early monitoring of molecule integrity at both DNA and protein levels to prevent clones from having sequence variants during cell line development.
Subject(s)
Antibodies/metabolism , Genetic Vectors , Immunologic Factors/metabolism , Mutation , Recombinant Proteins/metabolism , Animals , Antibodies/chemistry , Antibodies/genetics , CHO Cells , Cricetulus , High-Throughput Nucleotide Sequencing , Immunologic Factors/chemistry , Immunologic Factors/genetics , Mass Spectrometry , Plasmids , Recombinant Proteins/chemistry , Recombinant Proteins/genetics , Recombination, Genetic
ABSTRACT
The PAZAR database unites independently created and maintained data collections of transcription factor and regulatory sequence annotation. The flexible PAZAR schema permits the representation of diverse information derived from experiments ranging from biochemical protein-DNA binding to cellular reporter gene assays. Data collections can be made available to the public, or restricted to specific system users. The data 'boutiques' within the shopping-mall-inspired system facilitate the analysis of genomics data and the creation of predictive models of gene regulation. Since its initial release, PAZAR has grown in both data and features and has gained an associated package of software tools, the ORCA toolkit (ORCAtk). ORCAtk allows users to rapidly develop analyses based on the information stored in the PAZAR system. PAZAR is available at http://www.pazar.info. ORCAtk can be accessed through convenient buttons located in the PAZAR pages or via our website at http://www.cisreg.ca/ORCAtk.
Subject(s)
Databases, Genetic , Gene Expression Regulation , Regulatory Elements, Transcriptional , Software , Transcription Factors/metabolism , Base Sequence , Binding Sites , Conserved Sequence , Sequence Alignment , Sequence Analysis, DNA
ABSTRACT
The steady elaboration of the Metagenomics and Metadesign of Subways and Urban Biomes (MetaSUB) international consortium project raises important new questions about the origin, variation, and antimicrobial resistance of the collected samples. The CAMDA (Critical Assessment of Massive Data Analysis, http://camda.info/) forum organizes annual challenges in which different bioinformatics and statistical approaches are tested on samples collected around the world for bacterial classification and prediction of geographical origin. This work proposes a method that not only predicts the locations of unknown samples but also estimates the relative risk of antimicrobial resistance through spatial modeling. We introduce a new component into the standard analysis: a Bayesian spatial convolution model that accounts for the spatial structure of the data, as defined by the longitude and latitude of the samples, and assesses the relative risk of antimicrobial resistance taxa across regions, which is relevant to public health. The estimated relative risk can then serve as a new measure of antimicrobial resistance. We also compare the performance of several machine learning methods (Gradient Boosting Machine, Random Forest, and Neural Network) in predicting the geographical origin of the mystery samples. All three methods show consistent results, with some superiority of the Random Forest classifier. In future work, we can consider a broader class of spatial models and incorporate covariates related to the environment and climate profiles of the samples to achieve more reliable estimation of the relative risk related to antimicrobial resistance.
ABSTRACT
Although next-generation sequencing is widely used in cancer to profile tumors and detect variants, most somatic variant callers used in these pipelines identify variants at the lowest possible granularity, single-nucleotide variants (SNV). As a result, multiple adjacent SNVs are called individually instead of as a single multi-nucleotide variant (MNV). With this approach, the amino acid change from an individual SNV within a codon can differ from the amino acid change based on the MNV that results from combining the SNVs, leading to incorrect conclusions about the downstream effects of the variants. Here, we analyzed 10,383 variant call files (VCF) from The Cancer Genome Atlas (TCGA) and found 12,141 incorrectly annotated MNVs. Analysis of seven commonly mutated genes from 178 studies in cBioPortal revealed that MNVs were consistently missed in 20 of these studies, whereas they were correctly annotated in 15 more recent studies. At the BRAF V600 locus, the most common example of MNV, several public datasets reported separate BRAF V600E and BRAF V600M variants instead of a single merged V600K variant. VCFs from the TCGA Mutect2 caller were used to develop a solution to merge SNVs into MNVs. Our custom script used the phasing information from the SNV VCF and determined whether SNVs were at the same codon and needed to be merged into an MNV before variant annotation. This study shows that institutions performing NGS for cancer genomics should incorporate the step of merging MNVs as a best practice in their pipelines. SIGNIFICANCE: Identification of incorrect mutation calls in TCGA, including clinically relevant BRAF V600 and KRAS G12, will influence research and potentially clinical decisions.
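The merging step described in this abstract can be sketched as follows. This is a minimal illustration in Python and not the authors' script: the `Snv` record, its `phase_set` field, the toy coding sequence, and the four-entry codon table are all assumptions introduced for the example. It groups SNVs that share a codon and a phasing label, applies them together, and annotates the resulting codon once.

```python
from dataclasses import dataclass

# Partial codon table: only the four codons this toy example needs.
CODON_TABLE = {"GTG": "V", "GAG": "E", "ATG": "M", "AAG": "K"}


@dataclass(frozen=True)
class Snv:
    cds_pos: int    # 1-based position in the coding sequence (assumed input)
    ref: str
    alt: str
    phase_set: str  # hypothetical phasing label; equal labels mean same haplotype


def annotate(ref_cds, snvs):
    """Group SNVs by (codon, phase set) and annotate each group as one variant."""
    groups = {}
    for snv in snvs:
        codon_idx = (snv.cds_pos - 1) // 3
        groups.setdefault((codon_idx, snv.phase_set), []).append(snv)

    changes = []
    for (codon_idx, _phase), members in sorted(groups.items()):
        start = codon_idx * 3
        ref_codon = ref_cds[start:start + 3]
        alt_codon = list(ref_codon)
        for snv in members:
            offset = snv.cds_pos - 1 - start
            if ref_codon[offset] != snv.ref:
                raise ValueError(f"reference mismatch at CDS position {snv.cds_pos}")
            alt_codon[offset] = snv.alt
        ref_aa = CODON_TABLE[ref_codon]
        alt_aa = CODON_TABLE["".join(alt_codon)]
        changes.append(f"{ref_aa}{codon_idx + 1}{alt_aa}")
    return changes


# Toy coding sequence whose codon 600 is GTG (valine); not the real BRAF transcript.
TOY_CDS = "ATG" * 599 + "GTG"

phased = [Snv(1798, "G", "A", "PS1"), Snv(1799, "T", "A", "PS1")]
unphased = [Snv(1798, "G", "A", "PS1"), Snv(1799, "T", "A", "PS2")]

print(annotate(TOY_CDS, phased))    # SNVs on the same haplotype are merged
print(annotate(TOY_CDS, unphased))  # SNVs on different haplotypes stay separate
```

In this toy setup the phased pair is annotated as a single V600K change, while the same two SNVs with different phasing labels are annotated separately as V600M and V600E, which mirrors the discrepancy the abstract reports.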
Subject(s)
Genome, Human , Genomics/standards , Molecular Sequence Annotation/standards , Mutation , Neoplasms/genetics , Polymorphism, Single Nucleotide , Scientific Experimental Error/statistics & numerical data , Algorithms , High-Throughput Nucleotide Sequencing/methods , Humans , Neoplasms/pathology
ABSTRACT
Although next-generation sequencing assays are routinely carried out using samples from cancer trials, the sequencing data are not always of the required quality. There is a need to evaluate the performance of tissue collection sites and provide feedback about the quality of next-generation sequencing data. This study used a modeling approach based on whole exome sequencing quality control (QC) metrics to evaluate the relative performance of sites participating in the Bristol Myers Squibb Immuno-Oncology clinical trials sample collection. We identified several sample swap events. Overall, most sites performed well and few showed poor performance. These findings can increase awareness of sample failure and improve the quality of samples.
Subject(s)
Exome Sequencing , Models, Theoretical , Specimen Handling , Clinical Laboratory Techniques , Humans , Quality Control , Exome Sequencing/standards
ABSTRACT
The scientific value of re-analyzing existing datasets is often proportional to the complexity of the data. Proteomics data are inherently complex and can be analyzed at many levels, including proteins, peptides, and post-translational modifications to verify and/or develop new hypotheses. In this paper, we present our re-analysis of a previously published study comparing colon biopsy samples from ulcerative colitis (UC) patients to non-affected controls. We used a different statistical approach, employing a linear mixed-effects regression model and analyzed the data both on the protein and peptide level. In addition to confirming and reinforcing the original finding of upregulation of neutrophil extracellular traps (NETs), we report novel findings, including that Extracellular Matrix (ECM) degradation and neutrophil maturation are involved in the pathology of UC. The pharmaceutically most relevant differential protein expressions were confirmed using immunohistochemistry as an orthogonal method. As part of this study, we also compared proteomics data to previously published mRNA expression data. These comparisons indicated compensatory regulation at transcription levels of the ECM proteins we identified and open possible new avenues for drug discovery.
Subject(s)
Colitis, Ulcerative/metabolism , Colitis, Ulcerative/pathology , Extracellular Matrix/metabolism , Biopsy , Case-Control Studies , Colon/metabolism , Colon/pathology , Humans , Hydroxyproline/metabolism , Proteins/genetics , Proteins/metabolism , Quality Control
ABSTRACT
INTRODUCTION: Tumor mutational burden (TMB) has emerged as a clinically relevant biomarker that may be associated with immune checkpoint inhibitor efficacy. Standardization of TMB measurement is essential for implementing diagnostic tools to guide treatment. OBJECTIVE: Here we describe the in-depth evaluation of bioinformatic TMB analysis by whole exome sequencing (WES) in formalin-fixed, paraffin-embedded samples from a phase III clinical trial. METHODS: In the CheckMate 026 clinical trial, TMB was retrospectively assessed in 312 patients with non-small-cell lung cancer (58% of the intent-to-treat population) who received first-line nivolumab treatment or standard-of-care chemotherapy. We examined the sensitivity of TMB assessment to bioinformatic filtering methods and assessed concordance between TMB data derived by WES and the FoundationOne® CDx assay. RESULTS: TMB scores comprising synonymous, indel, frameshift, and nonsense mutations (all mutations) were 3.1-fold higher than data including missense mutations only, but values were highly correlated (Spearman's r = 0.99). Scores from CheckMate 026 samples including missense mutations only were similar to those generated from data in The Cancer Genome Atlas, but those including all mutations were generally higher. Using databases for germline subtraction (instead of matched controls) showed a trend for race-dependent increases in TMB scores. WES and FoundationOne CDx outputs were highly correlated (Spearman's r = 0.90). CONCLUSIONS: Parameter variation can impact TMB calculations, highlighting the need for standardization. Encouragingly, differences between assays could be accounted for by empirical calibration, suggesting that reliable TMB assessment across assays, platforms, and centers is achievable.
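The sensitivity of TMB to the mutation filter, and the rank correlation used to compare scores, can be illustrated with a small sketch. Everything here is an assumption for illustration: the class labels, the reading that "all mutations" includes missense calls alongside the other listed classes, and the made-up per-sample calls. It is not the study's pipeline and reproduces none of its results.

```python
# Mutation classes counted under each filter (labels are illustrative).
ALL_CLASSES = {"missense", "synonymous", "indel", "frameshift", "nonsense"}
MISSENSE_ONLY = {"missense"}


def tmb_score(mutation_classes, allowed):
    """Count the mutations whose class passes the chosen filter."""
    return sum(1 for cls in mutation_classes if cls in allowed)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when all ranks are equal")
    return cov / (var_x * var_y) ** 0.5


# Made-up per-sample mutation calls, for illustration only.
samples = {
    "S1": ["missense", "missense", "synonymous"],
    "S2": ["missense"] * 5 + ["synonymous"] * 2 + ["nonsense", "frameshift"],
    "S3": ["missense"],
}
all_scores = [tmb_score(m, ALL_CLASSES) for m in samples.values()]
missense_scores = [tmb_score(m, MISSENSE_ONLY) for m in samples.values()]
rho = spearman(all_scores, missense_scores)
print(all_scores, missense_scores, rho)
```

The sketch shows why two filters can give very different absolute scores yet remain highly rank-correlated: the broader filter raises every sample's count without necessarily changing the ordering of samples.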
Subject(s)
Biomarkers, Tumor , Carcinoma, Non-Small-Cell Lung/genetics , Computational Biology , Lung Neoplasms/genetics , Mutation , Carcinoma, Non-Small-Cell Lung/mortality , Carcinoma, Non-Small-Cell Lung/pathology , Computational Biology/methods , Genetic Association Studies , Genetic Predisposition to Disease , Humans , Lung Neoplasms/pathology , Prognosis , Reproducibility of Results , Exome Sequencing , Workflow
ABSTRACT
KRAS is the most common oncogenic driver in lung adenocarcinoma (LUAC). We previously reported that STK11/LKB1 (KL) or TP53 (KP) comutations define distinct subgroups of KRAS-mutant LUAC. Here, we examine the efficacy of PD-1 inhibitors in these subgroups. Objective response rates to PD-1 blockade differed significantly among KL (7.4%), KP (35.7%), and K-only (28.6%) subgroups (P < 0.001) in the Stand Up To Cancer (SU2C) cohort (174 patients) with KRAS-mutant LUAC and in patients treated with nivolumab in the CheckMate-057 phase III trial (0% vs. 57.1% vs. 18.2%; P = 0.047). In the SU2C cohort, KL LUAC exhibited shorter progression-free (P < 0.001) and overall (P = 0.0015) survival compared with KRAS-mutant, STK11/LKB1 wild-type LUAC. Among 924 LUACs, STK11/LKB1 alterations were the only marker significantly associated with PD-L1 negativity in TMB-intermediate/high LUAC. The impact of STK11/LKB1 alterations on clinical outcomes with PD-1/PD-L1 inhibitors extended to PD-L1-positive non-small cell lung cancer. In Kras-mutant murine LUAC models, Stk11/Lkb1 loss promoted PD-1/PD-L1 inhibitor resistance, suggesting a causal role. Our results identify STK11/LKB1 alterations as a major driver of primary resistance to PD-1 blockade in KRAS-mutant LUAC. Significance: This work identifies STK11/LKB1 alterations as the most prevalent genomic driver of primary resistance to PD-1 axis inhibitors in KRAS-mutant lung adenocarcinoma. Genomic profiling may enhance the predictive utility of PD-L1 expression and tumor mutation burden and facilitate establishment of personalized combination immunotherapy approaches for genomically defined LUAC subsets. Cancer Discov; 8(7); 822-35. ©2018 AACR. See related commentary by Etxeberria et al., p. 794. This article is highlighted in the In This Issue feature, p. 781.
Subject(s)
Adenocarcinoma of Lung/drug therapy , Drug Resistance, Neoplasm/genetics , Lung Neoplasms/drug therapy , Mutation , Nivolumab/therapeutic use , Protein Serine-Threonine Kinases/genetics , Proto-Oncogene Proteins p21(ras)/genetics , AMP-Activated Protein Kinase Kinases , Adenocarcinoma of Lung/genetics , Adenocarcinoma of Lung/metabolism , Adenocarcinoma of Lung/therapy , Adult , Aged , Aged, 80 and over , Animals , Antineoplastic Agents, Immunological/pharmacology , Antineoplastic Agents, Immunological/therapeutic use , Disease Models, Animal , Humans , Immunotherapy , Lung Neoplasms/genetics , Lung Neoplasms/metabolism , Lung Neoplasms/therapy , Male , Mice , Middle Aged , Nivolumab/pharmacology , Prognosis , Programmed Cell Death 1 Receptor/antagonists & inhibitors , Progression-Free Survival
ABSTRACT
High-throughput experiments in biology often produce sets of genes of potential interest. Some of those gene sets can be of considerable size. Therefore, computer-assisted analysis is necessary for the biological interpretation of the gene sets and for creating working hypotheses that can be tested experimentally. One obvious way to analyze gene set data is to associate the genes with a particular biological feature, for example, a given pathway. Statistical analysis can be used to evaluate whether a gene set is truly associated with a feature. Over the past few years, many tools that perform such analysis have been created. In this chapter, using WebGestalt as an example, it will be explained in detail how to associate gene sets with functional annotations, pathways, publication records, and protein domains.
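One common way to test whether a gene set is associated with a feature is an over-representation test based on the hypergeometric distribution. The sketch below is a generic Python illustration under that assumption; the abstract does not state which test the chapter uses, and the function and parameter names are hypothetical.

```python
from math import comb


def enrichment_p_value(population, annotated, drawn, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    population: number of genes in the reference set
    annotated:  reference genes that carry the feature (e.g. a pathway)
    drawn:      size of the gene set of interest
    overlap:    genes of interest that carry the feature
    """
    upper = min(annotated, drawn)
    total = comb(population, drawn)
    tail = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


# Made-up numbers: 40 of 200 genes of interest fall in a pathway
# that covers 500 of 20000 reference genes.
p = enrichment_p_value(population=20000, annotated=500, drawn=200, overlap=40)
print(p)
```

A small p-value suggests that the overlap is larger than expected by chance. When many features are tested at once, a multiple-testing correction would normally be applied on top of this.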
Subject(s)
Databases, Genetic , Genetic Techniques/statistics & numerical data , Software , Computational Biology , Data Interpretation, Statistical , Gene Expression Profiling/statistics & numerical data , Genomics/statistics & numerical data , Oligonucleotide Array Sequence Analysis/statistics & numerical data
ABSTRACT
High-throughput technologies have led to the rapid generation of large-scale datasets about genes and gene products. These technologies have also shifted our research focus from 'single genes' to 'gene sets'. We have developed a web-based integrated data mining system, WebGestalt (http://genereg.ornl.gov/webgestalt/), to help biologists in exploring large sets of genes. WebGestalt is composed of four modules: gene set management, information retrieval, organization/visualization, and statistics. The management module uploads, saves, retrieves and deletes gene sets, as well as performs Boolean operations to generate the unions, intersections or differences between different gene sets. The information retrieval module currently retrieves information for up to 20 attributes for all genes in a gene set. The organization/visualization module organizes and visualizes gene sets in various biological contexts, including Gene Ontology, tissue expression pattern, chromosome distribution, metabolic and signaling pathways, protein domain information and publications. The statistics module recommends and performs statistical tests to suggest biological areas that are important to a gene set and warrant further investigation. In order to demonstrate the use of WebGestalt, we have generated 48 gene sets with genes over-represented in various human tissue types. Exploration of all the 48 gene sets using WebGestalt is available for the public at http://genereg.ornl.gov/webgestalt/wg_enrich.php.
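The Boolean operations that the management module performs on gene sets correspond directly to set algebra. A minimal Python sketch follows; the `combine` helper and the placeholder gene identifiers are hypothetical and say nothing about how WebGestalt itself is implemented.

```python
def combine(set_a, set_b, operation):
    """Apply a Boolean operation to two gene sets and return a sorted list."""
    operations = {
        "union": set_a | set_b,
        "intersection": set_a & set_b,
        "difference": set_a - set_b,
    }
    return sorted(operations[operation])


# Placeholder identifiers, not real experimental gene lists.
liver_set = {"GENE_A", "GENE_B", "GENE_C"}
kidney_set = {"GENE_B", "GENE_C", "GENE_D"}

for name in ("union", "intersection", "difference"):
    print(name, combine(liver_set, kidney_set, name))
```

Returning sorted lists keeps the output stable between runs, which is convenient when the combined set is saved or compared later.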
Subject(s)
Genes , Software , Computer Graphics , Data Interpretation, Statistical , Databases, Genetic , Gene Expression , Genomics , Humans , Internet , Proteomics , Systems Integration , Tissue Distribution , User-Computer Interface
ABSTRACT
OBJECTIVE: To examine the incidence of nonsynonymous missense variants in SCN9A (NaV1.7), SCN10A (NaV1.8), and SCN11A (NaV1.9) in patients with painful and nonpainful peripheral neuropathy. METHODS: Next-generation sequencing was performed on 457 patient DNA samples provided by the Peripheral Neuropathy Research Registry (PNRR). The patient diagnosis was as follows: 278 idiopathic peripheral neuropathy (67% painful and 33% nonpainful) and 179 diabetic distal polyneuropathy (77% painful and 23% nonpainful). RESULTS: We identified 36 (SCN9A), 31 (SCN10A), and 15 (SCN11A) nonsynonymous missense variants, with 47.7% of patients carrying a low-frequency (minor allele frequency <5%) missense variant in at least 1 gene. The incidence of previously reported gain-of-function missense variants was low (≤3%), and these were detected in patients with and without pain. There were no significant differences in missense variant allele frequencies of any gene, or SCN9A haplotype frequencies, between PNRR patients with painful or nonpainful peripheral neuropathy. PNRR patient SCN9A and SCN11A missense variant allele frequencies were not significantly different from the Exome Variant Server, European American (EVS-EA) reference population. For SCN10A, there was a significant increase in the alternate allele frequency of the common variant p.V1073A and low-frequency variant p.S509P in PNRR patients compared with EVS-EA and the 1000 Genomes European reference populations. CONCLUSIONS: These results suggest that identification of a genetically defined subpopulation for testing of NaV1.7 inhibitors in patients with peripheral neuropathy is unlikely and that additional factors, beyond expression of previously reported disease "mutations," are more important for the development of painful neuropathy than previously discussed.
ABSTRACT
INTRODUCTION: The combination of daclatasvir (DCV, pan-genotypic NS5A inhibitor) plus asunaprevir (ASV; NS3 protease inhibitor) is approved in Japan, Korea and other countries for the treatment of chronic hepatitis C virus (HCV) genotype (GT)-1. A high (~90 to 100%) sustained virologic response (SVR) with DCV/ASV therapy has been achieved by excluding patients infected with HCV GT-1b with baseline NS5A resistance-associated variants (RAVs) at L31 or Y93H detected by direct sequencing (DS). We set out to determine whether minor variants at NS5A-L31 or -Y93H, detected by next-generation sequencing (NGS), impacted SVR rates with DCV/ASV therapy. METHODS: Baseline samples from 222 interferon (IFN)-ineligible/intolerant (N = 135) and prior non-responder (N = 87) patients infected with GT-1b who were treated with DCV/ASV for 24 weeks in the Phase 3 clinical study AI447026 were prepared for NGS (Ion-Torrent platform). The prevalence of baseline NS5A RAVs and their impact on SVR when observed at ≥1% by NGS in a patient's virus population were examined. NGS and DS (sensitivity ≥20%) data were compared. RESULTS: The prevalence of baseline NS5A RAVs at L31 or Y93H was 29% (63/219) and 18% (39/214) by NGS and DS, respectively. SVR24 rates were comparable in patients without observed baseline L31 or Y93H polymorphisms whether assessed by NGS (96%; 148/154) or by the less sensitive DS platform (95%; 164/173). CONCLUSION: Optimal SVR rates (≥95%) to DCV/ASV treatment were achieved using DS to exclude patients infected with GT-1b with NS5A RAVs at L31 or Y93H representing ≥20% of their virus population. Exclusion by NGS of patients with minor variants in NS5A (<20%) did not enhance SVR rates. These results suggest that the presence of minor variants in NS5A does not impact the overall SVR rate in patients with GT-1b treated with DCV/ASV. FUNDING: This study was sponsored by Bristol-Myers Squibb. TRIAL REGISTRATION: ClinicalTrials.gov identifier: NCT01497834.
Subject(s)
Antiviral Agents/therapeutic use , Hepacivirus/genetics , Hepatitis C, Chronic/drug therapy , Imidazoles/therapeutic use , Isoquinolines/therapeutic use , Sulfonamides/therapeutic use , Adult , Aged , Antiviral Agents/administration & dosage , Carbamates , Drug Therapy, Combination , Genotype , Humans , Imidazoles/administration & dosage , Isoquinolines/administration & dosage , Japan , Male , Middle Aged , Polymorphism, Genetic , Pyrrolidines , Sulfonamides/administration & dosage , Valine/analogs & derivatives , Viral Nonstructural Proteins/genetics , Young Adult
ABSTRACT
The BET (bromodomain and extra-terminal) proteins bind acetylated histones and recruit protein complexes to promote transcription elongation. In hematologic cancers, BET proteins have been shown to regulate expression of MYC and other genes that are important to disease pathology. Pharmacologic inhibition of BET protein binding has been shown to inhibit tumor growth in MYC-dependent cancers, such as multiple myeloma. In this study, we demonstrate that small cell lung cancer (SCLC) cells are exquisitely sensitive to growth inhibition by the BET inhibitor JQ1. JQ1 treatment has no impact on MYC protein expression, but results in downregulation of the lineage-specific transcription factor ASCL1. SCLC cells that are sensitive to JQ1 are also sensitive to ASCL1 depletion by RNAi. Chromatin immunoprecipitation studies confirmed the binding of the BET protein BRD4 to the ASCL1 enhancer, and the ability of JQ1 to disrupt the interaction. The importance of ASCL1 as a potential driver oncogene in SCLC is further underscored by the observation that ASCL1 is overexpressed in >50% of SCLC specimens, an extent greater than that observed for other putative oncogenes (MYC, MYCN, and SOX2) previously implicated in SCLC. Our studies have provided a mechanistic basis for the sensitivity of SCLC to BET inhibition and a rationale for the clinical development of BET inhibitors in this disease with high unmet medical need.
Subject(s)
Antineoplastic Agents/pharmacology , Azepines/pharmacology , Basic Helix-Loop-Helix Transcription Factors/genetics , Lung Neoplasms/metabolism , Nuclear Proteins/metabolism , Small Cell Lung Carcinoma/metabolism , Transcription Factors/metabolism , Triazoles/pharmacology , Basic Helix-Loop-Helix Transcription Factors/metabolism , Cell Cycle Proteins , Cell Line, Tumor , Cell Proliferation/drug effects , Cell Survival/drug effects , Drug Resistance, Neoplasm/genetics , Drug Screening Assays, Antitumor , Enhancer Elements, Genetic , Gene Expression , Gene Expression Regulation, Neoplastic/drug effects , Humans , Inhibitory Concentration 50 , Lung Neoplasms/drug therapy , Lung Neoplasms/genetics , Protein Binding , Small Cell Lung Carcinoma/drug therapy , Small Cell Lung Carcinoma/genetics , Transcriptome/drug effects
ABSTRACT
BACKGROUND: Microarray and other high-throughput technologies are producing large sets of interesting genes that are difficult to analyze directly. Bioinformatics tools are needed to interpret the functional information in the gene sets. RESULTS: We have created a web-based tool for data analysis and data visualization for sets of genes called GOTree Machine (GOTM). This tool was originally intended to analyze sets of co-regulated genes identified from microarray analysis but is adaptable for use with other gene sets from other high-throughput analyses. GOTree Machine generates a GOTree, a tree-like structure to navigate the Gene Ontology Directed Acyclic Graph for input gene sets. This system provides user friendly data navigation and visualization. Statistical analysis helps users to identify the most important Gene Ontology categories for the input gene sets and suggests biological areas that warrant further study. GOTree Machine is available online at http://genereg.ornl.gov/gotm/. CONCLUSION: GOTree Machine has a broad application in functional genomic, proteomic and other high-throughput methods that generate large sets of interesting genes; its primary purpose is to help users sort for interesting patterns in gene sets.
Subject(s)
Genes, Insect/physiology , Genes/physiology , Animals , Cluster Analysis , Computational Biology/statistics & numerical data , Computer Graphics/statistics & numerical data , Data Interpretation, Statistical , Databases, Genetic/statistics & numerical data , Diptera/genetics , Gene Expression Profiling/statistics & numerical data , Genome , Genome, Human , Humans , Internet , Mice , Oligonucleotide Array Sequence Analysis/statistics & numerical data , Rats , Software/statistics & numerical data , Software Design , User-Computer Interface
ABSTRACT
Most high-throughput methods used in molecular biology generate gene lists. Interpreting large gene lists can reveal mechanistic insights and generate useful testable hypotheses, but the process can be cumbersome and challenging. Multiple commercial and open solutions currently exist that can aid researchers in the functional annotation of gene lists. The process of gene set annotation includes dataset preparation, which is method specific; gene list annotation; and analysis and interpretation of the significant associations that were found. In this chapter, we demonstrate how WebGestalt can be applied to gene lists generated from transcriptional profiling data.
Subject(s)
Antibodies, Monoclonal/pharmacology , Antineoplastic Agents/pharmacology , Gene Expression Regulation, Neoplastic/drug effects , Molecular Sequence Annotation/methods , Software , Binding Sites , Biopsy , Gene Ontology , Gene Regulatory Networks , HLA-D Antigens/physiology , Humans , Ipilimumab , Neoplasms/drug therapy , Neoplasms/genetics , Neoplasms/metabolism , Regulatory Sequences, Nucleic Acid , Signal Transduction , Transcription Factors/physiology
ABSTRACT
PAZAR is an open-access and open-source database of transcription factor and regulatory sequence annotation with associated web interface and programming tools for data submission and extraction. Curated boutique data collections can be maintained and disseminated through the unified schema of the mall-like PAZAR repository. The Pleiades Promoter Project collection of brain-linked regulatory sequences is introduced to demonstrate the depth of annotation possible within PAZAR. PAZAR, located at http://www.pazar.info, is open for business.
Subject(s)
Databases, Genetic , Internet , Regulatory Sequences, Nucleic Acid/genetics , Transcription Factors/genetics
ABSTRACT
Gene expression microarray data can be used for the assembly of genetic coexpression network graphs. Using mRNA samples obtained from recombinant inbred Mus musculus strains, it is possible to integrate allelic variation with molecular and higher-order phenotypes. The depth of quantitative genetic analysis of microarray data can be vastly enhanced utilizing this mouse resource in combination with powerful computational algorithms, platforms, and data repositories. The resulting network graphs transect many levels of biological scale. This approach is illustrated with the extraction of cliques of putatively co-regulated genes and their annotation using gene ontology analysis and cis-regulatory element discovery. The causal basis for co-regulation is detected through the use of quantitative trait locus mapping.
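The idea of extracting cliques from a coexpression network can be sketched in a few lines of Python. The assumptions are mine, not the abstract's: edges are drawn where the absolute Pearson correlation meets a threshold of 0.9, maximal cliques are enumerated with the basic Bron-Kerbosch recursion, and the expression values are invented. The abstract does not specify which correlation measure, threshold, or clique algorithm was actually used.

```python
def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def coexpression_graph(expression, threshold):
    """Connect two genes when |correlation| is at least the threshold."""
    genes = sorted(expression)
    adjacency = {gene: set() for gene in genes}
    for i, a in enumerate(genes):
        for b in genes[i + 1:]:
            if abs(pearson(expression[a], expression[b])) >= threshold:
                adjacency[a].add(b)
                adjacency[b].add(a)
    return adjacency


def maximal_cliques(adjacency):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(sorted(current))
            return
        for vertex in list(candidates):
            neighbours = adjacency[vertex]
            expand(current | {vertex}, candidates & neighbours, excluded & neighbours)
            candidates = candidates - {vertex}
            excluded = excluded | {vertex}

    expand(set(), set(adjacency), set())
    return sorted(cliques)


# Invented expression profiles across four samples (illustration only).
toy_expression = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [2.0, 4.0, 6.0, 8.0],
    "C": [3.0, 6.0, 9.0, 12.0],
    "D": [1.0, -1.0, 1.0, -1.0],
}
graph = coexpression_graph(toy_expression, threshold=0.9)
print(maximal_cliques(graph))
```

The basic recursion is adequate for a toy graph; genome-scale networks would typically need pivoting or other optimizations, since the number of maximal cliques can grow very quickly.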