1.
Nucleic Acids Res ; 46(D1): D1128-D1136, 2018 01 04.
Article in English | MEDLINE | ID: mdl-30053270

ABSTRACT

Single-nucleotide variation and gene expression of disease samples represent important resources for biomarker discovery. Many databases have been built to host and make available such data to the community, but these databases are frequently limited in scope and/or content. BioMuta, a database of cancer-associated single-nucleotide variations, and BioXpress, a database of cancer-associated differentially expressed genes and microRNAs, differ from other disease-associated variation and expression databases primarily through the aggregation of data across many studies into a single source with a unified representation and annotation of functional attributes. Early versions of these resources were initiated by pilot funding for specific research applications, but newly awarded funds have enabled hardening of these databases to production-level quality and will allow for sustained development of these resources for the next few years. Because both resources were developed using a similar methodology of integration, curation, unification, and annotation, we present BioMuta and BioXpress as allied databases that will facilitate a more comprehensive view of gene associations in cancer. BioMuta and BioXpress are hosted on the High-performance Integrated Virtual Environment (HIVE) server at the George Washington University at https://hive.biochemistry.gwu.edu/biomuta and https://hive.biochemistry.gwu.edu/bioxpress, respectively.


Subject(s)
Biomarkers, Tumor/genetics , Databases, Genetic , Knowledge Bases , Mutation , Neoplasms/genetics , Gene Expression Regulation, Neoplastic , Humans , MicroRNAs , User-Computer Interface
2.
Methods Mol Biol ; 1878: 1-37, 2019.
Article in English | MEDLINE | ID: mdl-30378067

ABSTRACT

The use of large datasets has become ubiquitous in biomedical sciences. Researchers in the field of cancer genomics have, in recent years, generated large volumes of data from their experiments. Those responsible for production of this data often analyze a narrow subset of this data based on the research question they are trying to address: this is the case whether they are acting independently or in conjunction with a large-scale cancer genomics project. The reality of this situation creates the opportunity for other researchers to repurpose this data for different hypotheses if the data is made easily and freely available. New insights in biology resulting from more researchers having access to data they otherwise would be unable to generate on their own are a boon for the field. The following chapter reviews several cancer genomics-related databases and outlines the type of data they contain, as well as the methods required to access each database. While this list is not comprehensive, it should provide a basis for cancer researchers to begin exploring some of the many large datasets that are available to them.


Subject(s)
Neoplasms/genetics , Databases, Genetic , Genomics/methods , Humans , Research
3.
PLoS One ; 14(4): e0213770, 2019.
Article in English | MEDLINE | ID: mdl-30934003

ABSTRACT

Human endogenous retroviruses (HERVs) have been investigated for potential links with human cancer. However, the distribution of somatic nucleotide variations in HERV elements has not been explored in detail. This study aims to identify HERV elements with an over-representation of somatic mutations (hot spots) in cancer patients. Four HERV elements with mutation hotspots were identified that overlap with exons of four human protein coding genes. These hotspots were identified based on the significant over-representation (p<8.62e-4) of non-synonymous single-nucleotide variations (nsSNVs). These genes are TNN (HERV-9/LTR12), OR4K15 (HERV-IP10F/LTR10F), ZNF99 (HERV-W/HERV17/LTR17), and KIR2DL1 (MST/MaLR). In an effort to identify mutations that affect survival, all nsSNVs were further evaluated and it was found that kidney cancer patients with mutation C2270G in ZNF99 have a significantly lower survival rate (hazard ratio = 2.6) compared to those without it. Among HERV elements in the human non-protein coding regions, we found 788 HERVs with significantly elevated numbers of somatic single-nucleotide variations (SNVs) (p<1.60e-5). From this category the top three HERV elements with significantly over-represented SNVs are HERV-H/LTR7, HERV-9/LTR12 and HERV-L/MLT2. The majority of the SNVs in these 788 HERV elements are located in three DNA functional groups: long non-coding RNAs (lncRNAs) (60%), introns (22.2%) and transcriptional factor binding sites (TFBS) (14.8%). This study provides a list of mutational hotspots in HERVs, which could potentially be used as biomarkers and therapeutic targets.


Subject(s)
Endogenous Retroviruses/genetics , Genome, Human/genetics , Kidney Neoplasms/genetics , Polymorphism, Single Nucleotide/genetics , Exons/genetics , Gene Expression Regulation, Neoplastic , Humans , Introns/genetics , Kidney Neoplasms/pathology , Mutation , RNA, Long Noncoding/genetics , Receptors, KIR2DL1/genetics , Survival Analysis , Tenascin/genetics , Terminal Repeat Sequences/genetics
4.
Sci Rep ; 8(1): 16549, 2018 11 08.
Article in English | MEDLINE | ID: mdl-30409989

ABSTRACT

Leishmania donovani is responsible for visceral leishmaniasis, a neglected and lethal parasitic disease with limited treatment options and no vaccine. The study of L. donovani has been hindered by the lack of a high-quality reference genome and this can impact experimental outcomes including the identification of virulence genes, drug targets and vaccine development. We therefore generated a complete genome assembly by deep sequencing using a combination of second generation (Illumina) and third generation (PacBio) sequencing technologies. Compared to the current L. donovani assembly, the genome assembly reported within resulted in the closure of over 2,000 gaps, the extension of several chromosomes up to telomeric repeats, the re-annotation of close to 15% of protein coding genes and the annotation of hundreds of non-coding RNA genes. It was possible to correctly assemble the highly repetitive A2 and Amastin virulence gene clusters. A comparative sequence analysis using the improved reference genome confirmed 70 published and identified 15 novel genomic differences between closely related visceral and atypical cutaneous disease-causing L. donovani strains providing a more complete map of genes associated with virulence and visceral organ tropism. Bioinformatic tools including protein variation effect analyzer and basic local alignment search tool were used to prioritize a list of potential virulence genes based on mutation severity, gene conservation and function. This complete genome assembly and novel information on virulence factors will support the identification of new drug targets and the development of a vaccine for L. donovani.


Subject(s)
Leishmania donovani/pathogenicity , Virulence Factors/genetics , Whole Genome Sequencing/methods , Animals , Genetic Variation , High-Throughput Nucleotide Sequencing , Leishmania donovani/genetics , Leishmaniasis, Visceral/parasitology , Molecular Sequence Annotation , Sri Lanka , Tropism
5.
Article in English | MEDLINE | ID: mdl-28113865

ABSTRACT

Services such as Facebook, Amazon, and eBay were once solely accessed from stationary computers. These web services are now being used increasingly on mobile devices. We acknowledge this new reality by providing users a way to access publications and a curated cancer mutation database on their mobile device with daily automated updates. AVAILABILITY: http://hive.biochemistry.gwu.edu/tools/HivePubcast.


Subject(s)
Data Mining/methods , Database Management Systems , Databases, Genetic , Periodicals as Topic , Smartphone , User-Computer Interface , Data Curation , Internet
6.
Sci Rep ; 7: 42169, 2017 02 08.
Article in English | MEDLINE | ID: mdl-28176830

ABSTRACT

Single nucleotide variations (SNVs) can result in loss or gain of protein functional sites. We analyzed the effects of SNVs on enzyme active sites, ligand binding sites, and various types of post translational modification (PTM) sites. We found that, for most types of protein functional sites, the SNV pattern differs between germline and somatic mutations as well as between synonymous and non-synonymous mutations. From a total of 51,138 protein functional site affecting SNVs (pfsSNVs), a pan-cancer analysis revealed 142 somatic pfsSNVs in five or more cancer types. By leveraging patient information for somatic pfsSNVs, we identified 17 loss of functional site SNVs and 60 gain of functional site SNVs which are significantly enriched in patients with specific cancer types. Of the key pfsSNVs identified in our analysis above, we highlight 132 key pfsSNVs within 17 genes that are found in well-established cancer associated gene lists. For illustrating how key pfsSNVs can be prioritized further, we provide a use case where we performed survival analysis showing that a loss of phosphorylation site pfsSNV at position 105 in MEF2A is significantly associated with decreased pancreatic cancer patient survival rate. These 132 pfsSNVs can be used in developing genetic testing pipelines.


Subject(s)
Gene Expression Regulation, Neoplastic , Germ-Line Mutation , Neoplasm Proteins/genetics , Neoplasms/genetics , Polymorphism, Single Nucleotide , Protein Processing, Post-Translational , Acetylation , Catalytic Domain , Databases, Genetic , Gene Expression Profiling , Gene Ontology , Glycosylation , Humans , Methylation , Molecular Sequence Annotation , Neoplasm Proteins/chemistry , Neoplasm Proteins/metabolism , Neoplasms/metabolism , Neoplasms/mortality , Neoplasms/pathology , Phosphorylation , Survival Analysis , Ubiquitination
7.
Article in English | MEDLINE | ID: mdl-26989153

ABSTRACT

The High-performance Integrated Virtual Environment (HIVE) is a distributed storage and compute environment designed primarily to handle next-generation sequencing (NGS) data. This multicomponent cloud infrastructure provides secure web access for authorized users to deposit, retrieve, annotate and compute on NGS data, and to analyse the outcomes using web interface visual environments appropriately built in collaboration with research and regulatory scientists and other end users. Unlike many massively parallel computing environments, HIVE uses a cloud control server which virtualizes services, not processes. It is both very robust and flexible due to the abstraction layer introduced between computational requests and operating system processes. The novel paradigm of moving computations to the data, instead of moving data to computational nodes, has proven to be significantly less taxing for both hardware and network infrastructure. The honeycomb data model developed for HIVE integrates metadata into an object-oriented model. Its distinction from other object-oriented databases is in the additional implementation of a unified application program interface to search, view and manipulate data of all types. This model simplifies the introduction of new data types, thereby minimizing the need for database restructuring and streamlining the development of new integrated information systems. The honeycomb model employs a highly secure hierarchical access control and permission system, allowing determination of data access privileges in a finely granular manner without flooding the security subsystem with a multiplicity of rules. HIVE infrastructure will allow engineers and scientists to perform NGS analysis in a manner that is both efficient and secure. HIVE is actively supported in public and private domains, and project collaborations are welcomed. Database URL: https://hive.biochemistry.gwu.edu.


Subject(s)
High-Throughput Nucleotide Sequencing/methods , User-Computer Interface , Computational Biology , Mutation/genetics , Poliovirus/genetics , Poliovirus Vaccines/immunology , Proteomics , Recombination, Genetic , Sequence Alignment , Statistics as Topic