Results 1 - 8 of 8
1.
Mol Cell ; 84(13): 2553-2572.e19, 2024 Jul 11.
Article in English | MEDLINE | ID: mdl-38917794

ABSTRACT

CRISPR-Cas technology has transformed functional genomics, yet our understanding of how individual exons differentially shape cellular phenotypes remains limited. Here, we optimized and conducted massively parallel exon deletion and splice-site mutation screens in human cell lines to identify exons that regulate cellular fitness. Fitness-promoting exons are prevalent in essential and highly expressed genes and commonly overlap with protein domains and interaction interfaces. Conversely, fitness-suppressing exons are enriched in nonessential genes, exhibit lower inclusion levels, and overlap with intrinsically disordered regions and disease-associated mutations. In-depth mechanistic investigation of a screen hit, TAF5 alternative exon 8, revealed that its inclusion is required for assembly of the TFIID general transcription initiation complex, thereby regulating global gene expression output. Collectively, our orthogonal exon perturbation screens established a comprehensive repository of phenotypically important exons and uncovered regulatory mechanisms governing cellular fitness and gene expression.
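The abstract classifies exons as fitness-promoting or fitness-suppressing but does not describe its scoring. As a minimal sketch of one common way dropout screens are scored (an assumption, not the authors' pipeline), guide read counts are normalized to counts per million, a log2 fold change is taken between a late and an early time point, and the median is taken over all guides targeting the same exon. The guide names, exon labels, counts, and the `pseudocount` parameter below are hypothetical.

```python
import math
from collections import defaultdict
from statistics import median


def cpm(counts):
    """Normalize raw guide read counts to counts per million (CPM)."""
    total = sum(counts.values())
    return {guide: c * 1e6 / total for guide, c in counts.items()}


def exon_fitness_scores(early, late, guide_to_exon, pseudocount=1.0):
    """Median log2 fold change (late vs. early time point) per exon.

    Negative scores mean the guides were depleted over time (the exon
    supports fitness); positive scores mean they were enriched (the exon
    suppresses fitness). The pseudocount avoids log(0) for dropped guides.
    """
    early_cpm, late_cpm = cpm(early), cpm(late)
    lfcs_by_exon = defaultdict(list)
    for guide, exon in guide_to_exon.items():
        ratio = (late_cpm[guide] + pseudocount) / (early_cpm[guide] + pseudocount)
        lfcs_by_exon[exon].append(math.log2(ratio))
    return {exon: median(lfcs) for exon, lfcs in lfcs_by_exon.items()}


# Hypothetical counts for four guides targeting two exons.
early = {"g1": 100, "g2": 100, "g3": 100, "g4": 100}
late = {"g1": 10, "g2": 20, "g3": 300, "g4": 270}
guide_to_exon = {"g1": "exonA", "g2": "exonA", "g3": "exonB", "g4": "exonB"}
scores = exon_fitness_scores(early, late, guide_to_exon)
print(scores)  # exonA negative (depleted), exonB positive (enriched)
```

Taking the median across guides, rather than the mean, limits the influence of a single inefficient or off-target guide on the exon-level call.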


Subject(s)
Exons , Humans , Exons/genetics , CRISPR-Cas Systems , Transcription Factor TFIID/genetics , Transcription Factor TFIID/metabolism , Genetic Fitness , HEK293 Cells , TATA-Binding Protein Associated Factors/genetics , TATA-Binding Protein Associated Factors/metabolism , RNA Splice Sites , Mutation , Gene Expression Regulation , Alternative Splicing
2.
Bioinformatics ; 37(16): 2467-2469, 2021 08 25.
Article in English | MEDLINE | ID: mdl-33289511

ABSTRACT

SUMMARY: The Annotation, Visualization and Impact Analysis (AVIA) is a web application combining multiple features to annotate and visualize genomic variant data. Users can investigate the functional significance of their genetic alterations across samples, genes and pathways. Version 3.0 of AVIA offers filtering options through interactive charts and by linking disease-relevant data sources. Newly incorporated services include gene-, variant- and sample-level reporting, literature and functional correlations among impacted genes, comparative analysis across samples and against data sources such as TCGA and ClinVar, and cohort building. Sample and data management is now possible within the application, which allows greater flexibility in sharing, reannotating and organizing data. Most importantly, AVIA's utility stems from letting users upload and explore results without any a priori knowledge and without the need to install, update and maintain software or databases. Together, these enhancements strengthen AVIA as a comprehensive, user-driven variant analysis portal. AVAILABILITY AND IMPLEMENTATION: AVIA is accessible online at https://avia-abcc.ncifcrf.gov.


Subject(s)
Databases, Genetic , Genetic Variation , Data Management , Genome , Genomics , Humans , Internet , Software
3.
Bioinformatics ; 31(16): 2748-50, 2015 Aug 15.
Article in English | MEDLINE | ID: mdl-25861966

ABSTRACT

UNLABELLED: As sequencing becomes cheaper and more widely available, there is a greater need to quickly and effectively analyze large-scale genomic data. While the functionality of AVIA v1.0, whose implementation was based on ANNOVAR, was comparable with other annotation web servers, AVIA v2.0 represents an enhanced web-based server that extends genomic annotations to cell-specific transcripts and protein-level functional annotations. With AVIA's improved interface, users can better visualize their data, perform comprehensive searches and categorize both coding and non-coding variants. AVAILABILITY AND IMPLEMENTATION: AVIA is freely available through the web at http://avia.abcc.ncifcrf.gov. CONTACT: Hue.Vuong@fnlcr.nih.gov SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genes , Genetic Variation , Molecular Sequence Annotation , Software , Databases, Genetic , Internet
4.
Proc Natl Acad Sci U S A ; 108(14): 5626-31, 2011 Apr 05.
Article in English | MEDLINE | ID: mdl-21427231

ABSTRACT

DNA methylation is critical for normal development and plays important roles in genome organization and transcriptional regulation. Although DNA methyltransferases have been identified, the factors that establish and contribute to genome-wide methylation patterns remain elusive. Here, we report a high-resolution cytosine methylation map of the murine genome modulated by Lsh, a chromatin remodeling family member that has previously been shown to regulate CpG methylation at repetitive sequences. We provide evidence that Lsh also controls genome-wide cytosine methylation at nonrepeat sequences and relate those changes to alterations in H3K4me3 modification and gene expression. Deletion of Lsh alters the allocation of cytosine methylation in chromosomal regions of 50 kb to 2 Mb and, in addition, leads to changes in the methylation profile at the 5' end of genes. Furthermore, we demonstrate that loss of Lsh both promotes and prevents cytosine methylation. Our data indicate that Lsh is an epigenetic modulator that is critical for the normal distribution of cytosine methylation throughout the murine genome.


Subject(s)
Cytosine/metabolism , DNA Helicases/metabolism , DNA Methylation , Epigenomics , Animals , Blotting, Southern , Cell Line , Chromatin Immunoprecipitation , Chromatography, High Pressure Liquid , Gene Expression Profiling , Genomics , Mice , Mice, Knockout , Oligonucleotide Array Sequence Analysis , Statistics, Nonparametric
5.
Bioinformatics ; 25(4): 555-6, 2009 Feb 15.
Article in English | MEDLINE | ID: mdl-19129209

ABSTRACT

SUMMARY: bioDBnet is an online web resource that provides interconnected access to many types of biological databases. It integrates many of the most commonly used biological databases and currently has 153 database identifiers (nodes) covering all aspects of biology, including genes, proteins, pathways and other biological concepts. bioDBnet offers various ways to work with these databases, including conversions, extensive database reports and custom navigation, and provides tools to enhance the quality of the results. Importantly, bioDBnet is updated regularly, providing access to the most recent release of each individual database. AVAILABILITY: http://biodbnet.abcc.ncifcrf.gov.
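The summary describes database identifiers as interconnected nodes that support conversions and custom navigation. A minimal sketch of how a conversion between two identifier types might be routed through intermediate nodes is a breadth-first search over a node graph. The graph below is a hypothetical, heavily simplified example; it is not bioDBnet's actual node network, and `conversion_path` is not part of bioDBnet's interface.

```python
from collections import deque


def conversion_path(graph, source, target):
    """Shortest chain of identifier conversions from source to target.

    `graph` maps each identifier type (node) to the types it can be
    converted to directly. Breadth-first search guarantees the returned
    path uses the fewest conversion steps; returns None if unreachable.
    """
    previous = {source: None}  # predecessor map, also used as the visited set
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk back through predecessors to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in graph.get(node, ()):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


# Hypothetical, heavily simplified node graph (not bioDBnet's real one).
graph = {
    "Gene Symbol": ["Gene ID"],
    "Gene ID": ["Gene Symbol", "UniProt Accession", "KEGG Gene ID"],
    "UniProt Accession": ["Gene ID", "PDB ID"],
    "KEGG Gene ID": ["KEGG Pathway ID"],
}
print(conversion_path(graph, "Gene Symbol", "PDB ID"))
```

Choosing the fewest hops is one reasonable design criterion, since each intermediate conversion can lose or multiply identifiers.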


Subject(s)
Databases, Genetic , Software , Internet
6.
BMC Bioinformatics ; 10: 200, 2009 Jun 29.
Article in English | MEDLINE | ID: mdl-19563622

ABSTRACT

BACKGROUND: One of the challenges in the analysis of microarray data is to integrate and compare the selected (e.g., differential) gene lists from multiple experiments for common or unique underlying biological themes. A common way to approach this problem is to extract common genes from these gene lists and then subject these genes to enrichment analysis to reveal the underlying biology. However, the capacity of this approach is largely restricted by the limited number of common genes shared by datasets from multiple experiments, which could be caused by the complexity of the biological system itself. RESULTS: We now introduce a new Pathway Pattern Extraction Pipeline (PPEP), which extends the existing WPS application by providing a new pathway-level comparative analysis scheme. To facilitate comparing and correlating results from different studies and sources, PPEP contains new interfaces that allow evaluation of the pathway-level enrichment patterns across multiple gene lists. As an exploratory tool, this analysis pipeline may help reveal the underlying biological themes at both the pathway and gene levels. The analysis scheme provided by PPEP begins with multiple gene lists, which may be derived from different studies in terms of the biological contexts, applied technologies, or methodologies. These lists are then subjected to pathway-level comparative analysis for extraction of pathway-level patterns. This analysis pipeline helps to explore the commonality or uniqueness of these lists at the level of pathways or biological processes from different but relevant biological systems using a combination of statistical enrichment measurements, pathway-level pattern extraction, and graphical display of the relationships of genes and their associated pathways as Gene-Term Association Networks (GTANs) within the WPS platform. As a proof of concept, we have used the new method to analyze many datasets from our collaborators as well as some public microarray datasets. 
CONCLUSION: This tool provides a new pathway-level analysis scheme for integrative and comparative analysis of data derived from different but relevant systems. The tool is freely available as a Pathway Pattern Extraction Pipeline implemented in our existing software package WPS, which can be obtained at http://www.abcc.ncifcrf.gov/wps/wps_index.php.
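The abstract describes evaluating pathway-level enrichment patterns across multiple gene lists but does not name the statistic. As a minimal sketch, assuming a one-sided hypergeometric (Fisher-type) enrichment test, which is not necessarily what PPEP uses, each pathway can be scored against each gene list to give a pathway-by-list matrix of p-values; comparing rows of that matrix is one way to see whether a pathway is enriched in all lists or only some. Gene, pathway, and list names below are hypothetical.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for a hypergeometric variable: the one-sided enrichment p-value."""
    upper = min(successes, draws)
    total = comb(population, draws)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible terms contribute nothing to the sum.
    return sum(comb(successes, i) * comb(population - successes, draws - i)
               for i in range(k, upper + 1)) / total


def enrichment_matrix(gene_lists, pathways, universe):
    """Return {pathway: {list_name: p-value}} for every pathway/list pair."""
    n = len(universe)
    matrix = {}
    for pathway, members in pathways.items():
        members = members & universe
        row = {}
        for name, genes in gene_lists.items():
            genes = genes & universe
            overlap = len(genes & members)
            row[name] = hypergeom_sf(overlap, n, len(members), len(genes))
        matrix[pathway] = row
    return matrix


# Hypothetical universe of 20 genes, one pathway, and two gene lists.
universe = {f"g{i}" for i in range(20)}
pathways = {"P1": {f"g{i}" for i in range(5)}}
gene_lists = {"A": {f"g{i}" for i in range(5)}, "B": {f"g{i}" for i in range(10, 15)}}
matrix = enrichment_matrix(gene_lists, pathways, universe)
print(matrix)  # P1 strongly enriched in list A, not in list B
```

In practice the p-values in such a matrix would also need a multiple-testing correction across pathways before patterns are interpreted.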


Subject(s)
Computational Biology/methods , Gene Expression Profiling/methods , Oligonucleotide Array Sequence Analysis/methods , Databases, Genetic , Software , User-Computer Interface
7.
PLoS One ; 8(12): e80503, 2013.
Article in English | MEDLINE | ID: mdl-24312478

ABSTRACT

As the discipline of biomedical science continues to apply new technologies capable of producing unprecedented volumes of noisy and complex biological data, it has become evident that available methods for deriving meaningful information from such data are simply not keeping pace. In order to achieve useful results, researchers require methods that consolidate, store and query combinations of structured and unstructured data sets efficiently and effectively. As we move towards personalized medicine, the need to combine unstructured data, such as medical literature, with large amounts of highly structured and high-throughput data, such as human variation or expression data from very large cohorts, is especially urgent. In this study, we investigated a realistic biomedical query using the Hadoop framework. We ran queries using native MapReduce tools we developed as well as other open source and proprietary tools. Our results suggest that the available technologies within the Big Data domain can reduce the time and effort needed to apply distributed queries over large datasets in practical clinical applications in the life sciences domain. The methodologies and technologies discussed in this paper set the stage for a more detailed evaluation that investigates how various data structures and data models are best mapped to the proper computational framework.
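The abstract mentions native MapReduce tools but not the query itself. The following is a minimal single-process sketch of the MapReduce pattern (map, shuffle by key, reduce) in plain Python, not Hadoop code and not the authors' tools. The query, counting distinct samples with at least one variant in each gene, and the tab-separated record format are hypothetical illustrations.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Mapper: parse one tab-separated 'sample, gene, variant' line and emit (gene, sample)."""
    sample, gene, _variant = record.split("\t")
    yield gene, sample


def shuffle(pairs):
    """Group mapper output by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(gene, samples):
    """Reducer: count distinct samples carrying at least one variant in the gene."""
    return gene, len(set(samples))


def run_job(records):
    """Run map, shuffle and reduce in sequence on a list of input lines."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


# Hypothetical variant records: sample, gene, variant.
records = [
    "s1\tTP53\tc.215C>G",
    "s2\tTP53\tc.743G>A",
    "s1\tTP53\tc.817C>T",
    "s2\tBRCA1\tc.68_69del",
]
print(run_job(records))  # {'TP53': 2, 'BRCA1': 1}
```

Because each mapper call sees one record and each reducer call sees one key, a framework such as Hadoop can distribute both phases across many machines; the shuffle step is what a distributed framework performs over the network.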


Subject(s)
Data Mining/methods , Databases, Factual , Humans
8.
Stat Med ; 25(18): 3134-49, 2006 Sep 30.
Article in English | MEDLINE | ID: mdl-16252274

ABSTRACT

The genetic case-control association study of unrelated subjects is a leading method to identify single nucleotide polymorphisms (SNPs) and SNP haplotypes that modulate the risk of complex diseases. Association studies often genotype several SNPs in a number of candidate genes; we propose a two-stage approach to address the inherent statistical multiple comparisons problem. In the first stage, each gene's association with disease is summarized by a single p-value that controls a familywise error rate. In the second stage, summary p-values are adjusted for multiplicity using a false discovery rate (FDR) controlling procedure. For the first stage, we consider marginal and joint tests of SNPs and haplotypes within genes, and we construct an omnibus test that combines SNP and haplotype analysis. Simulation studies show that when disease susceptibility is conferred by a SNP, and all common SNPs in a gene are genotyped, marginal analysis of SNPs using the Simes test has similar or higher power than marginal or joint haplotype analysis. Conversely, haplotype analysis can be more powerful when disease susceptibility is conferred by a haplotype. The omnibus test tracks the more powerful of the two approaches, which is generally unknown. Multiple testing balances the desire for statistical power against the implicit costs of false positive results, which up to now appear to be common in the literature.
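The two-stage procedure can be sketched as follows. Stage one combines each gene's per-SNP p-values with the Simes test, which the abstract names: with sorted p-values $p_{(1)} \le \dots \le p_{(m)}$, the gene-level p-value is $\min_i m\,p_{(i)}/i$. Stage two adjusts the gene-level p-values with an FDR procedure; the abstract does not say which one, so the Benjamini-Hochberg step-up procedure used here is an assumption. Gene names, p-values, and `alpha` are hypothetical.

```python
def simes(pvalues):
    """Simes combined p-value for one gene: min over i of m * p_(i) / i."""
    p = sorted(pvalues)
    m = len(p)
    return min(m * pi / i for i, pi in enumerate(p, start=1))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Step up from the largest p-value, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def two_stage(gene_snp_pvalues, alpha=0.05):
    """Stage 1: Simes per gene. Stage 2: BH across genes. Returns (summary p, q, significant)."""
    genes = list(gene_snp_pvalues)
    summary = [simes(gene_snp_pvalues[g]) for g in genes]
    qvalues = benjamini_hochberg(summary)
    return {g: (s, q, q <= alpha) for g, s, q in zip(genes, summary, qvalues)}


# Hypothetical marginal SNP p-values for three candidate genes.
result = two_stage({
    "GENE_A": [0.001, 0.02, 0.3],
    "GENE_B": [0.4, 0.6],
    "GENE_C": [0.2, 0.5, 0.9],
})
for gene, (summary_p, q, significant) in result.items():
    print(gene, round(summary_p, 4), round(q, 4), significant)
```

Summarizing each gene by one p-value first means the FDR step counts genes rather than SNPs, so genes with many genotyped SNPs are not over-represented among the tests.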


Subject(s)
Case-Control Studies , Data Interpretation, Statistical , Genetic Predisposition to Disease , Logistic Models , Polymorphism, Single Nucleotide , Computer Simulation , Haplotypes , Humans