1.
Bioinformatics ; 38(7): 1788-1793, 2022 03 28.
Article in English | MEDLINE | ID: mdl-35022670

ABSTRACT

MOTIVATION: Telomeres are the repetitive sequences found at the ends of eukaryotic chromosomes and are often thought of as a 'biological clock,' with their average length shortening during division in most cells. In addition to their association with senescence, abnormal telomere lengths are well known to be associated with multiple cancers, short telomere syndromes and as risk factors for a broad range of diseases. While a majority of methods for measuring telomere length will report average lengths across all chromosomes, it is known that aberrations in specific chromosome arms are biomarkers for certain diseases. Due to their repetitive nature, characterizing telomeres at this resolution is prohibitive for short read sequencing approaches, and is challenging still even with longer reads. RESULTS: We present Telogator: a method for reporting chromosome-specific telomere length from long read sequencing data. We demonstrate Telogator's sensitivity in detecting chromosome-specific telomere length in simulated data across a range of read lengths and error rates. Telogator is then applied to 10 germline samples, yielding a high correlation with short read methods in reporting average telomere length. In addition, we investigate common subtelomere rearrangements and identify the minimum read length required to anchor telomere/subtelomere boundaries in samples with these haplotypes. AVAILABILITY AND IMPLEMENTATION: Telogator is written in Python3 and is available at github.com/zstephens/telogator. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
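
To illustrate why telomeric reads are recognizable at all, the sketch below measures how much of a read's terminal window is covered by the canonical human telomere hexamer (TTAGGG, or CCCTAA on the opposite strand). It is not Telogator's algorithm: the function names, the 60-base window and the 0.5 threshold are assumptions chosen only for this example.

```python
import re

# Canonical human telomere repeat and its reverse complement.
TELOMERE_MOTIFS = ("TTAGGG", "CCCTAA")


def repeat_fraction(seq: str, motif: str) -> float:
    """Fraction of bases in seq covered by back-to-back copies of motif."""
    if not seq:
        return 0.0
    # Require two or more consecutive copies so isolated hits are ignored.
    pattern = re.compile(f"(?:{motif}){{2,}}")
    covered = sum(m.end() - m.start() for m in pattern.finditer(seq.upper()))
    return covered / len(seq)


def looks_telomeric(read: str, window: int = 60, threshold: float = 0.5) -> bool:
    """Hypothetical check: is either end of the read dominated by telomere repeats?"""
    ends = (read[:window], read[-window:])
    return any(
        repeat_fraction(end, motif) >= threshold
        for end in ends
        for motif in TELOMERE_MOTIFS
    )
```

A read that passes such a check still has to be anchored in unique subtelomeric sequence before it can be assigned to a chromosome arm, which is why read length matters in the abstract above.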


Subject(s)
Repetitive Sequences, Nucleic Acid; Telomere; Telomere/genetics; Haplotypes
2.
Bioinformatics ; 37(11): 1598-1599, 2021 07 12.
Article in English | MEDLINE | ID: mdl-31808791

ABSTRACT

MOTIVATION: DNA methylation can be measured at the single CpG level using sodium bisulfite conversion of genomic DNA followed by sequencing or array hybridization. Many analytic tools have been developed, yet there is still a high demand for a comprehensive and multifaceted tool suite to analyze, annotate, QC and visualize the DNA methylation data. RESULTS: We developed the CpGtools package to analyze DNA methylation data generated from bisulfite sequencing or Illumina methylation arrays. The CpGtools package consists of three types of modules: (i) 'CpG position modules' focus on analyzing the genomic positions of CpGs, including associating other genomic and epigenomic features to a given list of CpGs and generating the DNA motif logo enriched in the genomic contexts of a given list of CpGs; (ii) 'CpG signal modules' are designed to analyze DNA methylation values, such as performing the PCA or t-SNE analyses, using Bayesian Gaussian mixture modeling to classify CpG sites into fully methylated, partially methylated and unmethylated groups, profiling the average DNA methylation level over user-specified genomics regions and generating the bean/violin plots and (iii) 'differential CpG analysis modules' focus on identifying differentially methylated CpGs between groups using different statistical methods including Fisher's Exact Test, Student's t-test, ANOVA, non-parametric tests, linear regression, logistic regression, beta-binomial regression and Bayesian estimation. AVAILABILITY AND IMPLEMENTATION: CpGtools is written in Python under the open-source GPL license. The source code and documentation are freely available at https://github.com/liguowang/cpgtools. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
DNA Methylation; High-Throughput Nucleotide Sequencing; Bayes Theorem; CpG Islands; Humans; Sequence Analysis, DNA
3.
Gastroenterology ; 157(1): 210-226.e12, 2019 07.
Article in English | MEDLINE | ID: mdl-30878468

ABSTRACT

BACKGROUND & AIMS: The CCNE1 locus, which encodes cyclin E1, is amplified in many types of cancer cells and is activated in hepatocellular carcinomas (HCCs) from patients infected with hepatitis B virus or adeno-associated virus type 2, due to integration of the virus nearby. We investigated cell-cycle and oncogenic effects of cyclin E1 overexpression in tissues of mice. METHODS: We generated mice with doxycycline-inducible expression of Ccne1 (Ccne1T mice) and activated overexpression of cyclin E1 from age 3 weeks onward. At 14 months of age, livers were collected from mice that overexpress cyclin E1 and nontransgenic mice (controls) and analyzed for tumor burden and by histology. Mouse embryonic fibroblasts (MEFs) and hepatocytes from Ccne1T and control mice were analyzed to determine the extent to which cyclin E1 overexpression perturbs S-phase entry, DNA replication, and numbers and structures of chromosomes. Tissues from 4-month-old Ccne1T and control mice (which were free of tumors at that age) were analyzed for chromosome alterations, to investigate the mechanisms by which cyclin E1 predisposes hepatocytes to transformation. RESULTS: Ccne1T mice developed more hepatocellular adenomas and HCCs than control mice. Tumors developed only in livers of Ccne1T mice, despite high levels of cyclin E1 in other tissues. Ccne1T MEFs had defects that promoted chromosome missegregation and aneuploidy, including incomplete replication of DNA, centrosome amplification, and formation of nonperpendicular mitotic spindles. Whereas Ccne1T mice accumulated near-diploid aneuploid cells in multiple tissues and organs, polyploidization was observed only in hepatocytes, with losses and gains of whole chromosomes, DNA damage, and oxidative stress. CONCLUSIONS: Livers, but not other tissues of mice with inducible overexpression of cyclin E1, develop tumors. More hepatocytes from the cyclin E1-overexpressing mice were polyploid than from control mice, and had losses or gains of whole chromosomes, DNA damage, and oxidative stress; all of these have been observed in human HCC cells. The increased risk of HCC in patients with hepatitis B virus or adeno-associated virus type 2 infection might involve activation of cyclin E1 and its effects on chromosomes and genomes of liver cells.


Subject(s)
Adenoma, Liver Cell/genetics; Carcinoma, Hepatocellular/genetics; Chromosomal Instability/genetics; Cyclin E/genetics; Liver Neoplasms/genetics; Liver/metabolism; Oncogene Proteins/genetics; Adenoma, Liver Cell/pathology; Adenoma, Liver Cell/virology; Animals; Carcinoma, Hepatocellular/pathology; Carcinoma, Hepatocellular/virology; Chromosome Structures; DNA Damage/genetics; DNA Replication; Dependovirus; Fibroblasts; Hepatitis B, Chronic; Hepatocytes; Liver/pathology; Liver Neoplasms/pathology; Liver Neoplasms/virology; Liver Neoplasms, Experimental/genetics; Liver Neoplasms, Experimental/pathology; Mice; Oxidative Stress/genetics; Parvoviridae Infections; Parvovirinae; Polyploidy; S Phase Cell Cycle Checkpoints
4.
Gynecol Oncol ; 156(2): 387-392, 2020 02.
Article in English | MEDLINE | ID: mdl-31787246

ABSTRACT

OBJECTIVE: We aimed to assess whether endometrial cancer (EC) can be detected in shed DNA collected with a vaginal tampon by analyzing copy number, methylation markers, and mutations. METHODS: Tampons were collected prior to hysterectomy from 38 EC patients and 28 women with benign indications. Extracted tampon DNA underwent the following: 1) low-coverage whole genome sequencing (LC-WGS) to assess copy number, 2) pyrosequencing to measure percent promoter methylation of HOXA9, RASSF1, and CDH13 and 3) next-generation sequencing (NGS) to identify mutations in 19 genes associated with EC identified through The Cancer Genome Atlas. Sensitivity and specificity for each test and test combinations were calculated. RESULTS: Methylation analysis yielded the highest specificities but lowest sensitivities (37-40% sensitivity; 100% specificity for HOXA9, RASSF1 and HTR1B) while mutation analysis had improved sensitivity (50% sensitivity; 83% specificity). Only one "false positive" result for copy number variants was identified among women with benign surgical indications, which was based on detection of copy number changes, and associated with a leiomyosarcoma that was only recognized at hysterectomy. Counting a positive result in any of the 3 biomarker classes as positive yielded a sensitivity of 92% and specificity of 86%. Mutation analysis did not add sensitivity to the combination of analysis of copy number and methylation. CONCLUSIONS: This study demonstrates a proof-of-principle for non-invasive yet precise detection of endometrial cancer. We propose that with improved biomarker testing, it may be possible to develop a clinically useful test for detecting EC.
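
The way sensitivity and specificity shift when tests are combined under an "any positive" rule can be sketched as follows. The six-sample data set is invented for illustration and is not the study's data; the function and variable names are likewise assumptions.

```python
from typing import Sequence


def sensitivity_specificity(calls: Sequence[bool], truth: Sequence[bool]) -> tuple[float, float]:
    """Return (sensitivity, specificity) of binary test calls against known status."""
    pairs = list(zip(calls, truth))
    tp = sum(1 for c, t in pairs if c and t)
    fn = sum(1 for c, t in pairs if not c and t)
    tn = sum(1 for c, t in pairs if not c and not t)
    fp = sum(1 for c, t in pairs if c and not t)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one diseased and one non-diseased sample")
    return tp / (tp + fn), tn / (tn + fp)


def any_positive(*tests: Sequence[bool]) -> list[bool]:
    """Combine tests: a sample is called positive if any single test is positive."""
    return [any(sample) for sample in zip(*tests)]


# Invented toy data: four cancers followed by two benign cases.
truth = [True, True, True, True, False, False]
copy_number = [True, False, False, True, False, False]
methylation = [False, True, False, False, False, False]
mutation = [False, False, False, True, True, False]

combined = any_positive(copy_number, methylation, mutation)
```

On this toy data methylation alone gives (0.25, 1.0), while the combined call gives (0.75, 0.5): the "any positive" rule can only raise sensitivity and can only lower specificity, which is the trade-off the abstract reports.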


Subject(s)
DNA Methylation; Endometrial Neoplasms/genetics; Gene Dosage; Menstrual Hygiene Products; Biomarkers, Tumor/genetics; Diagnosis, Differential; Endometrial Neoplasms/diagnosis; Endometrial Neoplasms/pathology; Female; Humans; Middle Aged; Mutation; Uterine Diseases/diagnosis; Uterine Diseases/genetics; Uterine Diseases/pathology; Vaginal Smears/methods
5.
Brief Bioinform ; 18(6): 973-983, 2017 Nov 01.
Article in English | MEDLINE | ID: mdl-27473065

ABSTRACT

Driver somatic mutations are a hallmark of a tumor that can be used for diagnosis and targeted therapy. Mutations are primarily detected from tumor DNA. Because RNA reflects dynamic gene activity, transcriptome profiling by RNA sequencing (RNA-seq) is becoming increasingly popular; it measures not only gene expression but also structural variations such as mutations and fusion transcripts. Although single-nucleotide variants (SNVs) can be easily identified from RNA-seq, intermediate-length insertions/deletions (indels longer than 2 bases but shorter than the sequence reads) pose significant challenges and are ignored by most RNA-seq analysis tools. This study evaluates commonly used RNA-seq analysis programs along with variant and somatic mutation callers in a series of data sets with simulated and known indels. The aim is to develop strategies for accurate indel detection. Our results show that the RNA-seq alignment is the most important step for indel identification and that the evaluated programs vary widely in their sensitivity to map sequence reads with indels, from not at all to decently sensitive. The sensitivity is affected by sequence read length. Most variant calling programs rely on hard evidence of indels marked in the alignment, and the programs with realignment may use soft-clipped reads for indel inference. Based on these observations, we provide practical recommendations for indel detection when different RNA-seq aligners are used and demonstrate the best option with highly reliable results. With careful customization of bioinformatics algorithms, RNA-seq can be reliably used for both SNV and indel mutation detection that can be used for clinical decision-making.
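
"Hard evidence" of an indel in an alignment is an I or D operation in the read's CIGAR string. The sketch below walks a CIGAR string and reports indels of at least 3 bases, matching the abstract's "longer than 2 bases" definition. It is a generic illustration rather than any evaluated tool's method; the function name and the 0-based coordinate convention are assumptions.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM specification).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def indels_from_cigar(cigar: str, ref_start: int, min_len: int = 3):
    """Yield (type, reference_position, length) for indels of at least min_len bases.

    ref_start is the 0-based leftmost reference position of the alignment.
    """
    ref_pos = ref_start
    for length_str, op in CIGAR_OP.findall(cigar):
        length = int(length_str)
        if op in "ID" and length >= min_len:
            yield ("insertion" if op == "I" else "deletion", ref_pos, length)
        # M, D, N, = and X consume reference bases; I, S, H and P do not.
        if op in "MDN=X":
            ref_pos += length
```

Note that the N operation (a spliced intron in RNA-seq) advances the reference position without being reported, so a splice junction is not mistaken for a deletion.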


Subject(s)
Computational Biology/methods; ErbB Receptors/genetics; High-Throughput Nucleotide Sequencing/methods; INDEL Mutation; Lung Neoplasms/genetics; Software; Algorithms; Case-Control Studies; Humans; Exome Sequencing
6.
Nucleic Acids Res ; 45(22): e179, 2017 Dec 15.
Article in English | MEDLINE | ID: mdl-28981748

ABSTRACT

Linnorm is a novel normalization and transformation method for the analysis of single cell RNA sequencing (scRNA-seq) data. Linnorm is developed to remove technical noises and simultaneously preserve biological variations in scRNA-seq data, such that existing statistical methods can be improved. Using real scRNA-seq data, we compared Linnorm with existing normalization methods, including NODES, SAMstrt, SCnorm, scran, DESeq and TMM. Linnorm shows advantages in speed, technical noise removal and preservation of cell heterogeneity, which can improve existing methods in the discovery of novel subtypes, pseudo-temporal ordering of cells, clustering analysis, etc. Linnorm also performs better than existing DEG analysis methods, including BASiCS, NODES, SAMstrt, Seurat and DESeq2, in false positive rate control and accuracy.


Subject(s)
Algorithms; Biostatistics/methods; Gene Expression Profiling/methods; Sequence Analysis, RNA/methods; Single-Cell Analysis/methods; Cluster Analysis; Linear Models; RNA/classification; RNA/genetics; Reproducibility of Results
7.
Nucleic Acids Res ; 45(10): 5653-5665, 2017 Jun 02.
Article in English | MEDLINE | ID: mdl-28472449

ABSTRACT

Competing endogenous RNAs (ceRNAs) are RNA molecules that sequester shared microRNAs (miRNAs), thereby affecting the expression of other targets of the miRNAs. Whether genetic variants in ceRNA can affect its biological function and disease development is still an open question. Here we identified a large number of genetic variants that are associated with ceRNA function using Geuvadis RNA-seq data for 462 individuals from the 1000 Genomes Project. We call these loci competing endogenous RNA expression quantitative trait loci or 'cerQTL', and found that a large number of them were unexplored in conventional eQTL mapping. We identified many cerQTLs that have undergone recent positive selection in different human populations, and showed that single nucleotide polymorphisms in gene 3′ UTRs at the miRNA seed binding regions can simultaneously regulate gene expression changes in both cis and trans by the ceRNA mechanism. We also discovered that cerQTLs are significantly enriched in trait/disease-associated variants reported from genome-wide association studies in the miRNA binding sites, suggesting that disease susceptibilities could be attributed to ceRNA regulation. Further in vitro functional experiments demonstrated that a cerQTL, rs11540855, can regulate ceRNA function. These results provide a comprehensive catalog of functional non-coding regulatory variants that may be responsible for ceRNA crosstalk at the post-transcriptional level.


Subject(s)
Gene Expression Regulation; Gene Regulatory Networks; Genome, Human; MicroRNAs/genetics; Quantitative Trait Loci; RNA, Untranslated/genetics; 3' Untranslated Regions; Base Pairing; Binding Sites; Chromosome Mapping; Genome-Wide Association Study; Humans; MicroRNAs/metabolism; Polymorphism, Single Nucleotide; RNA, Untranslated/metabolism
8.
Hum Hered ; 83(2): 79-91, 2018.
Article in English | MEDLINE | ID: mdl-30347404

ABSTRACT

AIMS: We propose a novel machine learning approach to expand the knowledge about drug-target interactions. Our method may help to develop effective, less harmful treatment strategies and to enable the detection of novel indications for existing drugs. METHODS: We developed a novel machine learning strategy to predict drug-target interactions based on drug side effects and traits from genome-wide association studies. We integrated data from the databases SIDER and GWASdb and utilized them in a unique way by a neural network approach. RESULTS: We validate our method using drug-target interactions from the STITCH database. In addition, we compare the chemical similarity of the predicted target to known targets of the drug under consideration and present literature-based evidence for predicted interactions. We find drug combination warnings for drugs we predict to target the same protein, hinting at synergistic effects aggravating harmful events. This substantiates the translational value of our approach, because we are able to detect drugs that should be taken together with care due to common mechanisms of action. CONCLUSION: Taken together, we conclude that our approach is able to generate novel and clinically applicable insight into the molecular determinants of drug action.


Subject(s)
Drug Interactions; Drug-Related Side Effects and Adverse Reactions; Genome-Wide Association Study; Machine Learning; Humans; Neural Networks, Computer
9.
BMC Bioinformatics ; 19(1): 271, 2018 07 17.
Article in English | MEDLINE | ID: mdl-30016933

ABSTRACT

BACKGROUND: Transfer of genetic material from microbes or viruses into the host genome is known as horizontal gene transfer (HGT). The integration of viruses into the human genome is associated with multiple cancers, and these can now be detected using next-generation sequencing methods such as whole genome sequencing and RNA-sequencing. RESULTS: We designed a novel computational workflow, HGT-ID, to identify the integration of viruses into the human genome using the sequencing data. The HGT-ID workflow primarily follows a four-step procedure: i) pre-processing of unaligned reads, ii) virus detection using subtraction approach, iii) identification of virus integration site using discordant and soft-clipped reads and iv) HGT candidates prioritization through a scoring function. Annotation and visualization of the events, as well as primer design for experimental validation, are also provided in the final report. We evaluated the tool performance with the well-understood cervical cancer samples. The HGT-ID workflow accurately detected known human papillomavirus (HPV) integration sites with high sensitivity and specificity compared to previous HGT methods. We applied HGT-ID to The Cancer Genome Atlas (TCGA) whole-genome sequencing data (WGS) from liver tumor-normal pairs. Multiple hepatitis B virus (HBV) integration sites were identified in TCGA liver samples and confirmed by HGT-ID using the RNA-Seq data from the matched liver pairs. This shows the applicability of the method in both the data types and cross-validation of the HGT events in liver samples. We also processed 220 breast tumor WGS data through the workflow; however, there were no HGT events detected in those samples. CONCLUSIONS: HGT-ID is a novel computational workflow to detect the integration of viruses in the human genome using the sequencing data. It is fast and accurate with functions such as prioritization, annotation, visualization and primer design for future validation of HGTs. 
The HGT-ID workflow is released under the MIT License and available at http://kalarikrlab.org/Software/HGT-ID.html .
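
Step iii of the workflow relies on two alignment signals: soft-clipped reads, whose clip boundary marks a candidate breakpoint, and discordant pairs, in which one mate maps to the host and the other to a virus. The sketch below shows one way to extract both from basic alignment fields. It is not HGT-ID's implementation; the `Alignment` class, the `min_clip` default and the contig names in the usage note are hypothetical.

```python
import re
from dataclasses import dataclass

# One CIGAR operation: a length followed by an operation code (SAM specification).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


@dataclass
class Alignment:
    """Minimal, hypothetical stand-in for one aligned read of a pair."""
    ref_name: str       # contig this read aligned to
    mate_ref_name: str  # contig the mate aligned to
    pos: int            # 0-based leftmost aligned reference position
    cigar: str


def soft_clip_breakpoints(aln: Alignment, min_clip: int = 10) -> list[int]:
    """Reference positions bordering a soft clip of at least min_clip bases."""
    # Drop hard clips, which may flank soft clips at either end.
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(aln.cigar) if op != "H"]
    breakpoints = []
    if ops and ops[0][1] == "S" and ops[0][0] >= min_clip:
        breakpoints.append(aln.pos)
    # The alignment ends after all reference-consuming operations.
    ref_end = aln.pos + sum(n for n, op in ops if op in "MDN=X")
    if len(ops) > 1 and ops[-1][1] == "S" and ops[-1][0] >= min_clip:
        breakpoints.append(ref_end)
    return breakpoints


def is_discordant(aln: Alignment, viral_contigs: set[str]) -> bool:
    """True when exactly one mate of the pair maps to a viral contig."""
    return (aln.ref_name in viral_contigs) != (aln.mate_ref_name in viral_contigs)
```

For example, a read aligned to chr8 at position 1000 with CIGAR 60M40S and a mate on a contig named HPV16 would give a candidate breakpoint at 1060 and count as discordant; clustering such positions across reads is what localizes an integration site.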


Subject(s)
Gene Transfer, Horizontal/genetics; Genome, Human; High-Throughput Nucleotide Sequencing/methods; Virus Integration/genetics; Algorithms; Base Sequence; Breast Neoplasms/virology; Cell Line, Tumor; Computer Simulation; Female; Humans; ROC Curve; Software; Whole Genome Sequencing; Workflow
10.
BMC Genomics ; 19(1): 841, 2018 Nov 27.
Article in English | MEDLINE | ID: mdl-30482155

ABSTRACT

BACKGROUND: Copy number alterations (CNAs) are defined as somatic gains or losses of DNA regions. The profiles of CNAs may provide a fingerprint specific to a tumor type or tumor grade. Low-coverage sequencing for reporting CNAs has recently gained interest since it was successfully translated into clinical applications. Ovarian serous carcinomas can be classified into two largely mutually exclusive grades, low grade and high grade, based on their histologic features. Grade classification based on genomics may provide valuable clues on how best to manage these patients in the clinic. Based on the study of ovarian serous carcinomas, we explore the methodology of combining CNA reporting from low-coverage sequencing with machine learning techniques to stratify tumor biospecimens of different grades. RESULTS: We have developed a data-driven methodology for tumor classification using the profiles of CNAs reported by low-coverage sequencing. The proposed method, called Bag-of-Segments, is used to summarize fixed-length CNA features predictive of tumor grades. These features are further processed by machine learning techniques to obtain classification models. High accuracy is obtained for classifying ovarian serous carcinoma into high and low grades based on leave-one-out cross-validation experiments. Models that are only weakly influenced by the sequence coverage and the purity of the sample can also be built, which would be of higher relevance for clinical applications. The patterns captured by Bag-of-Segments features correlate with current clinical knowledge: low grade ovarian tumors being related to aneuploidy events associated with mitotic errors while high grade ovarian tumors are induced by DNA repair gene malfunction. CONCLUSIONS: The proposed data-driven method obtains high accuracy with various parametrizations for the ovarian serous carcinoma study, indicating that it has good generalization potential towards other CNA classification problems. This method could be applied to the more difficult task of classifying ovarian serous carcinomas with ambiguous histology or in those with low grade tumor co-existing with high grade tumor. The closer genomic relationship of these tumor samples to low or high grade may provide important clinical value.


Subject(s)
Cystadenocarcinoma, Serous/classification; DNA Copy Number Variations; Data Science/methods; Genome, Human; Ovarian Neoplasms/classification; Cystadenocarcinoma, Serous/genetics; Cystadenocarcinoma, Serous/pathology; Female; Humans; Neoplasm Grading; Ovarian Neoplasms/genetics; Ovarian Neoplasms/pathology; Whole Genome Sequencing
11.
Mol Carcinog ; 57(1): 114-124, 2018 Jan.
Article in English | MEDLINE | ID: mdl-28926134

ABSTRACT

Chromosome instability (CIN) is widely observed in both sporadic and hereditary colorectal cancer (CRC). Defects in APC and WNT signaling are primarily associated with CIN in hereditary CRC, but the genetic causes for CIN in sporadic CRC remain elusive. Using high-density SNP array and exome data from The Cancer Genome Atlas (TCGA), we characterized loss of heterozygosity (LOH) and copy number variation (CNV) in the peripheral blood, normal colon, and corresponding tumor tissue in 15 CRC patients with proficient mismatch repair (pMMR) and 24 CRC patients with deficient MMR. We found a high frequency of 18q LOH in tumors and arm-specific enrichment of genetic aberrations on 18q in the normal colon (primarily copy neutral LOH) and blood (primarily copy gain). These aberrations were specific to sporadic, pMMR CRC. Though in tumor samples genetic aberrations were observed for genes commonly mutated in hereditary CRC (e.g., APC, CTNNB1, SMAD4, BRAF), none of them showed LOH or CNV in the normal colon or blood. DCC, located on 18q21.1, topped the list of genes with genetic aberrations in the tumor. In an independent cohort of 13 patients subjected to whole genome sequencing (WGS), we found LOH and CNV on 18q in adenomatous polyp and tumor tissues. Our data suggest that patients with sporadic CRC may have genetic aberrations preferentially enriched on 18q in their blood, normal colon epithelium, and non-malignant polyp lesions that may prove useful as a clinical marker for sporadic CRC detection and risk assessment.


Subject(s)
Colorectal Neoplasms/genetics; DNA Copy Number Variations; DNA Mismatch Repair/genetics; Loss of Heterozygosity; Aged; Aged, 80 and over; Chromosomal Instability; Chromosomes, Human, Pair 18/genetics; Cohort Studies; Colorectal Neoplasms/pathology; Female; Genotype; Humans; Male; Middle Aged; Mutation
12.
BMC Cancer ; 18(1): 743, 2018 07 18.
Article in English | MEDLINE | ID: mdl-30021563

ABSTRACT

Correction to: BMC Cancer (2018) 18:577 DOI https://doi.org/10.1186/s12885-018-4345-2.

13.
BMC Cancer ; 18(1): 577, 2018 May 21.
Article in English | MEDLINE | ID: mdl-29783934

ABSTRACT

BACKGROUND: The right drug to the right patient at the right time is one of the ideals of Individualized Medicine (IM) and remains one of the most compelling promises of the post-genomic age. The addition of genomic information is expected to increase the precision of an individual patient's treatment, resulting in improved outcomes. While pilot studies have been encouraging, key aspects of interpreting tumor genomics information, such as somatic activation of drug transport or metabolism, have not been systematically evaluated. METHODS: In this work, we developed a simple rule-based approach to classify the therapies administered to each patient from The Cancer Genome Atlas PanCancer dataset (n = 2858) as effective or ineffective. Our Therapy Efficacy model used each patient's drug target and pharmacokinetic (PK) gene expression profile; the specific genes considered for each patient depended on the therapies they received. Patients who received predictably ineffective therapies were considered at high risk of cancer-related mortality and those who did not receive ineffective therapies were considered at low risk. The utility of our Therapy Efficacy model was assessed using per-cancer and pan-cancer differential survival. RESULTS: Our simple rule-based Therapy Efficacy model classified 143 (5%) patients as high-risk. High-risk patients had age ranges comparable to low-risk patients of the same cancer type and tended to be later stage and higher grade (odds ratios of 1.6 and 1.4, respectively). A significant pan-cancer association was identified between predictions of our Therapy Efficacy model and poorer overall survival (hazard ratio, HR = 1.47, p = 6.3 × 10⁻³). Individually, drug export (HR = 1.49, p = 4.70 × 10⁻³) and drug metabolism (HR = 1.73, p = 9.30 × 10⁻⁵) genes demonstrated significant survival associations. Survival associations for target gene expression are mechanism-dependent. Similar results were observed for event-free survival. CONCLUSIONS: While the resolution of clinical information within the dataset is not ideal, and modeling the relative contribution of each gene to the activity of each therapy remains a challenge, our approach demonstrates that somatic PK alterations should be integrated into the interpretation of somatic transcriptomic profiles as they likely have a significant impact on the survival of specific patients. We believe that this approach will aid the prospective design of personalized therapeutic strategies.


Subject(s)
Antineoplastic Agents/pharmacokinetics; Models, Biological; Neoplasms/drug therapy; Precision Medicine/methods; Antineoplastic Agents/therapeutic use; Datasets as Topic; Gene Expression Profiling; Humans; Neoplasms/genetics; Pharmacogenomic Variants/genetics; Progression-Free Survival; Proportional Hazards Models; Treatment Outcome
14.
Nucleic Acids Res ; 44(D1): D869-76, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26615194

ABSTRACT

Genome-wide association studies (GWASs), now a routine approach to study single-nucleotide polymorphism (SNP)-trait association, have uncovered over ten thousand significant trait/disease associated SNPs (TASs). Here, we updated GWASdb (GWASdb v2, http://jjwanglab.org/gwasdb), which provides comprehensive data curation and knowledge integration for GWAS TASs. These updates include: (i) Up to August 2015, we collected 2479 unique publications from PubMed and other resources; (ii) We further curated moderate SNP-trait associations (P-value < 1.0 × 10⁻³) from each original publication, and generated a total of 252,530 unique TASs in all GWASdb v2 collected studies; (iii) We manually mapped 1610 GWAS traits to 501 Human Phenotype Ontology (HPO) terms, 435 Disease Ontology (DO) terms and 228 Disease Ontology Lite (DOLite) terms. For each ontology term, we also predicted the putative causal genes; (iv) We curated the detailed sub-populations and related sample size for each study; (v) Importantly, we performed extensive function annotation for each TAS by incorporating gene-based information, ENCODE ChIP-seq assays, eQTL, population haplotype, functional prediction across multiple biological domains, evolutionary signals and disease-related annotation; (vi) Additionally, we compiled a SNP-drug response association dataset for 650 pharmacogenetic studies involving 257 drugs in this update; (vii) Last, we improved the user interface of the website.


Subject(s)
Databases, Genetic; Genome-Wide Association Study; Polymorphism, Single Nucleotide; Biological Ontologies; Disease/genetics; Genes; Humans; Molecular Sequence Annotation
15.
BMC Bioinformatics ; 18(1): 269, 2017 May 22.
Article in English | MEDLINE | ID: mdl-28532394

ABSTRACT

BACKGROUND: The sequence logo has been widely used to represent DNA or RNA motifs for more than three decades. Despite its intelligibility and intuitiveness, the traditional sequence logo is unable to display the intra-motif dependencies and therefore is insufficient to fully characterize nucleotide motifs. Many methods have been developed to quantify the intra-motif dependencies, but fewer tools are available for visualization. RESULT: We developed CircularLogo, a web-based interactive application, which is able to not only visualize the position-specific nucleotide consensus and diversity but also display the intra-motif dependencies. Applying CircularLogo to HNF6 binding sites and tRNA sequences demonstrated its ability to show intra-motif dependencies and intuitively reveal biomolecular structure. CircularLogo is implemented in JavaScript and Python based on the Django web framework. The program's source code and user's manual are freely available at http://circularlogo.sourceforge.net . CircularLogo web server can be accessed from http://bioinformaticstools.mayo.edu/circularlogo/index.html . CONCLUSION: CircularLogo is an innovative web application that is specifically designed to visualize and interactively explore intra-motif dependencies.


Subject(s)
Internet; Nucleotide Motifs/genetics; Software; Base Sequence; Binding Sites; Eukaryotic Cells/metabolism; Introns/genetics; Nucleic Acid Conformation; RNA Splice Sites/genetics; RNA, Transfer/chemistry; RNA, Transfer/genetics; Sequence Analysis, DNA
16.
Bioinformatics ; 32(3): 469-71, 2016 Feb 01.
Article in English | MEDLINE | ID: mdl-26449931

ABSTRACT

SUMMARY: The development of the Infinium HumanMethylation450 BeadChip enables epigenome-wide association studies at a reduced cost. One observation of the 450K data is that many CpG sites the beadchip interrogates have very large measurement errors. Including these noisy CpGs will decrease the statistical power of detecting relevant associations due to multiple testing correction. We propose to use intra-class correlation coefficient (ICC), which characterizes the relative contribution of the biological variability to the total variability, to filter CpGs when technical replicates are available. We estimate the ICC based on a linear mixed effects model by pooling all the samples instead of using the technical replicates only. An ultra-fast algorithm has been developed to address the computational complexity and CpG filtering can be completed in minutes on a desktop computer for a 450K data set of over 1000 samples. Our method is very flexible and can accommodate any replicate design. Simulations and a real data application demonstrate that our whole-sample ICC method performs better than replicate-sample ICC or variance-based method. AVAILABILITY AND IMPLEMENTATION: CpGFilter is implemented in R and publicly available under CRAN via the R package 'CpGFilter'. CONTACT: chen.jun2@mayo.edu or xlin@hsph.harvard.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
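
The intra-class correlation coefficient is the share of total variance attributable to differences between samples rather than to measurement noise. The sketch below estimates it for a single CpG with the classical one-way ANOVA estimator on balanced technical replicates. This is a deliberate simplification and not the CpGFilter method, which fits a linear mixed effects model over all samples; the function name and toy values are assumptions.

```python
from statistics import mean


def icc_oneway(groups: list[list[float]]) -> float:
    """One-way random-effects ICC for a balanced replicate design.

    groups[i] holds the k replicate measurements of sample i at one CpG.
    """
    n = len(groups)
    k = len(groups[0]) if groups else 0
    if n < 2 or k < 2 or any(len(g) != k for g in groups):
        raise ValueError("need >= 2 samples, each with the same number (>= 2) of replicates")
    grand = mean(x for g in groups for x in g)
    group_means = [mean(g) for g in groups]
    # Between-sample and within-sample mean squares.
    msb = k * sum((m - grand) ** 2 for m in group_means) / (n - 1)
    msw = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g) / (n * (k - 1))
    denom = msb + (k - 1) * msw
    return (msb - msw) / denom if denom > 0 else 0.0


# Invented methylation values: replicates agree closely at the first CpG,
# and disagree strongly at the second.
reliable_cpg = [[0.10, 0.12], [0.50, 0.52], [0.90, 0.88]]
noisy_cpg = [[0.10, 0.90], [0.90, 0.10], [0.50, 0.50]]
```

A CpG like `reliable_cpg` scores close to 1 and would be retained, while `noisy_cpg` scores at or below 0 and would be filtered out before association testing, reducing the multiple-testing burden the abstract describes.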


Subject(s)
Algorithms , Computational Biology/methods , CpG Islands , DNA Methylation , Epigenomics/methods , Genome-Wide Association Study , Human Genome , Humans , Oligonucleotide Array Sequence Analysis , Software
17.
Bioinformatics ; 32(18): 2729-36, 2016 09 15.
Article in English | MEDLINE | ID: mdl-27273672

ABSTRACT

MOTIVATION: Prediction and prioritization of human non-coding regulatory variants is critical for understanding the regulatory mechanisms of disease pathogenesis and promoting personalized medicine. Existing tools use functional genomics data and evolutionary information to evaluate the pathogenicity or regulatory functions of non-coding variants. However, different algorithms lead to inconsistent and even conflicting predictions. Combining multiple methods may increase accuracy in regulatory variant prediction. RESULTS: Here, we compiled an integrative resource of predictions from eight different tools for functional annotation of non-coding variants. We further developed a composite strategy to integrate multiple predictions and computed the composite likelihood of a given variant being a regulatory variant. Benchmarking against multiple independent causal-variant datasets, we demonstrated that our composite model significantly improves prediction performance. AVAILABILITY AND IMPLEMENTATION: We implemented our model and scoring procedure as a tool, named PRVCS, which is freely available for academic and non-profit use at http://jjwanglab.org/PRVCS CONTACT: wang.junwen@mayo.edu, jliu@stat.harvard.edu, or limx54@gmail.com SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
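
One simple way to picture combining several tools' outputs into a single score is to treat each tool's probability as independent evidence and add the log-odds. The sketch below does that under a naive independence assumption; the abstract does not spell out the PRVCS model, so this is not its scoring procedure, and the function name and the example inputs are hypothetical.

```python
import math


def composite_probability(tool_probs, prior=0.5):
    """Combine per-tool probabilities that a variant is regulatory.

    Assumes (for illustration) that every tool used the same prior and that
    tools are conditionally independent, so likelihood ratios multiply.
    """
    prior_logit = math.log(prior / (1 - prior))
    log_odds = prior_logit
    for p in tool_probs:
        p = min(max(p, 1e-9), 1 - 1e-9)   # avoid log(0) for extreme scores
        # Each tool contributes its log likelihood ratio relative to the prior.
        log_odds += math.log(p / (1 - p)) - prior_logit
    return 1 / (1 + math.exp(-log_odds))


# Tools that agree reinforce one another; conflicting tools cancel out.
print(composite_probability([0.8, 0.8]))
print(composite_probability([0.8, 0.2]))
```

Real predictors are correlated because they share training data and features, so a product of this kind tends to be overconfident; it is shown only to illustrate how conflicting predictions can be reconciled into one number.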


Subject(s)
Algorithms , Theoretical Models , Molecular Sequence Annotation , Software , Biological Evolution , Genetic Variation , Humans , Untranslated RNA
18.
Nucleic Acids Res ; 43(2): e7, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25378314

ABSTRACT

Integrative analyses of epigenetic data promise a deeper understanding of the epigenome. Epidaurus is a bioinformatics tool used to effectively reveal inter-dataset relevance and differences through data aggregation, integration and visualization. In this study, we demonstrated the utility of Epidaurus in validating hypotheses and generating novel biological insights. In particular, we described the use of Epidaurus to (i) integrate epigenetic data from prostate cancer cell lines to validate the activation function of EZH2 in castration-resistant prostate cancer and (ii) study the mechanism of androgen receptor (AR) binding deregulation induced by the knockdown of FOXA1. We found that EZH2's noncanonical activation function was reaffirmed by its association with active histone markers and its lack of association with repressive markers. More importantly, we revealed that the binding of AR was selectively reprogrammed to promoter regions, leading to the up-regulation of hundreds of cancer-associated genes including EGFR. The prebuilt epigenetic dataset from commonly used cell lines (LNCaP, VCaP, LNCaP-Abl, MCF7, GM12878, K562, HeLa-S3, A549, HepG2) makes Epidaurus a useful online resource for epigenetic research. As standalone software, Epidaurus is specifically designed to process user-customized datasets with both efficiency and convenience.


Subject(s)
Epigenomics/methods , Prostatic Neoplasms/genetics , Software , Tumor Cell Line , Enhancer of Zeste Homolog 2 Protein , Genetic Epigenesis , Neoplastic Gene Expression Regulation , Hepatocyte Nuclear Factor 3-alpha/antagonists & inhibitors , Humans , Male , Polycomb Repressive Complex 2/metabolism , Androgen Receptors/metabolism , Trans-Activators/metabolism
19.
Nucleic Acids Res ; 43(14): 6945-58, 2015 Aug 18.
Article in English | MEDLINE | ID: mdl-25916844

ABSTRACT

To determine early somatic changes in high-grade serous ovarian cancer (HGSOC), we performed whole genome sequencing on a rare collection of 16 low stage HGSOCs. The majority showed extensive structural alterations (one had an ultramutated profile), exhibited high levels of p53 immunoreactivity, and harboured a TP53 mutation, deletion or inactivation. BRCA1 and BRCA2 mutations were observed in two tumors, with nine showing evidence of a homologous recombination (HR) defect. Combined analysis with The Cancer Genome Atlas (TCGA) indicated that low and late stage HGSOCs have similar mutation and copy number profiles. We also found evidence that deleterious TP53 mutations are the earliest events, followed by deletions or loss of heterozygosity (LOH) of chromosomes carrying TP53, BRCA1 or BRCA2. Inactivation of HR appears to be an early event, as 62.5% of tumours showed a LOH pattern suggestive of HR defects. Three tumours with the highest ploidy had little genome-wide LOH, yet one of these had a homozygous somatic frame-shift BRCA2 mutation, suggesting that some carcinomas begin as tetraploid and then descend into diploidy accompanied by genome-wide LOH. Lastly, we found evidence that structural variants (SV) cluster in HGSOC, but are absent in one ultramutated tumor, providing insights into the pathogenesis of low stage HGSOC.


Subject(s)
p53 Genes , Mutation , Ovarian Neoplasms/genetics , Recombinational DNA Repair , Tetraploidy , Carcinoma/genetics , DNA Primase/genetics , Female , Humans , Loss of Heterozygosity , Mutation Rate
20.
BMC Bioinformatics ; 17: 58, 2016 Feb 03.
Article in English | MEDLINE | ID: mdl-26842848

ABSTRACT

BACKGROUND: Stored biological samples with pathology information and medical records are invaluable resources for translational medical research. However, RNAs extracted from archived clinical tissues are often substantially degraded. RNA degradation distorts the RNA-seq read coverage in a gene-specific manner and has a profound influence on whole-genome gene expression profiling. RESULT: We developed the transcript integrity number (TIN) to measure RNA degradation. When applied to 3 independent RNA-seq datasets, we demonstrated that TIN is a reliable and sensitive measure of RNA degradation at both the transcript and sample level. By comparing 10 prostate cancer clinical samples with lower RNA integrity to 10 samples with higher RNA quality, we demonstrated that calibrating gene expression counts with TIN scores could effectively neutralize RNA degradation effects by reducing false positives and recovering biologically meaningful pathways. When further evaluating the performance of TIN correction using spike-in transcripts in RNA-seq data generated by the Sequencing Quality Control consortium, we found that TIN adjustment had better control of false positives and false negatives (sensitivity = 0.89, specificity = 0.91, accuracy = 0.90) than gene expression analysis without TIN correction (sensitivity = 0.98, specificity = 0.50, accuracy = 0.86). CONCLUSION: TIN is a reliable measurement of RNA integrity and a valuable approach for neutralizing in vitro RNA degradation effects and improving differential gene expression analysis.
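
A per-transcript integrity score can be pictured as a measure of how evenly reads cover the transcript: intact RNA gives fairly uniform coverage, while degraded RNA piles reads toward one end. The sketch below scores evenness from 0 to 100 using the Shannon entropy of the coverage profile. The abstract does not give the TIN formula, so treat this entropy-based definition, the function name and the example coverage vectors as assumptions for illustration.

```python
import math


def coverage_evenness(coverage):
    """Score from 0 to 100 describing how uniform a coverage profile is.

    `coverage` is a list of read depths at sampled positions along one
    transcript. Illustrative only: an assumed entropy-based score.
    """
    total = sum(coverage)
    if not coverage or total == 0:
        return 0.0                       # no reads: nothing to score
    probs = [c / total for c in coverage if c > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    # exp(entropy) is the "effective" number of evenly covered positions.
    return 100.0 * math.exp(entropy) / len(coverage)


print(coverage_evenness([5, 5, 5, 5]))     # uniform coverage scores highest
print(coverage_evenness([40, 10, 2, 0]))   # coverage skewed toward one end scores lower
```

A score of this kind could then enter a differential expression model as a covariate, which is one reading of "calibrating gene expression counts with TIN scores".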


Subject(s)
High-Throughput Nucleotide Sequencing/standards , Prostatic Neoplasms/genetics , Quality Control , RNA Stability/genetics , Messenger RNA/genetics , Neoplasm RNA/genetics , RNA Sequence Analysis/standards , Human Genome , Humans , Male , Messenger RNA/chemistry , Neoplasm RNA/chemistry