Results 1 - 20 of 116
1.
BMC Bioinformatics ; 25(1): 194, 2024 May 17.
Article in English | MEDLINE | ID: mdl-38755561

ABSTRACT

Telomeres are regions of repetitive DNA at the ends of linear chromosomes which protect chromosome ends from degradation. Telomere lengths have been extensively studied in the context of aging and disease, though most studies use average telomere lengths which are of limited utility. We present a method for identifying all 92 telomere alleles from long read sequencing data. Individual telomeres are identified using variant repeats proximal to telomere regions, which are unique across alleles. This high-throughput and high-resolution characterization of telomeres could be foundational to future studies investigating the roles of specific telomeres in aging and disease.


Subjects
Alleles , Telomere , Telomere/genetics , Humans , Sequence Analysis, DNA/methods , High-Throughput Nucleotide Sequencing/methods , Repetitive Sequences, Nucleic Acid/genetics
2.
Gene Ther ; 30(3-4): 386-397, 2023 Apr.
Article in English | MEDLINE | ID: mdl-36258038

ABSTRACT

Gene editing for the cure of inborn errors of metabolism (IEMs) has been limited by inefficiency of adult hepatocyte targeting. Here, we demonstrate that in utero CRISPR/Cas9-mediated gene editing in a mouse model of hereditary tyrosinemia type 1 provides stable cure of the disease. Following this, we performed an extensive gene expression analysis to explore the inherent characteristics of fetal/neonatal hepatocytes that make them more susceptible to efficient gene editing than adult hepatocytes. We showed that fetal and neonatal livers are comprised of proliferative hepatocytes with abundant expression of genes involved in homology-directed repair (HDR) of DNA double-strand breaks (DSBs), key for efficient gene editing by CRISPR/Cas9. We demonstrated the same is true of hepatocytes after undergoing a regenerative stimulus (partial hepatectomy), where post-hepatectomy cells show a higher efficiency of HDR and correction. Specifically, we demonstrated that HDR-related genome correction is most effective in the replicative phase, or S-phase, of an actively proliferating cell. In conclusion, this study shows that taking advantage of or triggering cell proliferation, specifically DNA replication in S-phase, may serve as an important tool to improve efficiency of CRISPR/Cas9-mediated genome editing in the liver and provide a curative therapy for IEMs in both children and adults.


Subjects
CRISPR-Cas Systems , Gene Editing , Animals , Mice , Recombinational DNA Repair , DNA Breaks, Double-Stranded , DNA , DNA Repair
3.
Bioinformatics ; 38(7): 1788-1793, 2022 Mar 28.
Article in English | MEDLINE | ID: mdl-35022670

ABSTRACT

MOTIVATION: Telomeres are the repetitive sequences found at the ends of eukaryotic chromosomes and are often thought of as a 'biological clock,' with their average length shortening during division in most cells. In addition to their association with senescence, abnormal telomere lengths are well known to be associated with multiple cancers, short telomere syndromes and as risk factors for a broad range of diseases. While a majority of methods for measuring telomere length will report average lengths across all chromosomes, it is known that aberrations in specific chromosome arms are biomarkers for certain diseases. Due to their repetitive nature, characterizing telomeres at this resolution is prohibitive for short read sequencing approaches, and is challenging still even with longer reads. RESULTS: We present Telogator: a method for reporting chromosome-specific telomere length from long read sequencing data. We demonstrate Telogator's sensitivity in detecting chromosome-specific telomere length in simulated data across a range of read lengths and error rates. Telogator is then applied to 10 germline samples, yielding a high correlation with short read methods in reporting average telomere length. In addition, we investigate common subtelomere rearrangements and identify the minimum read length required to anchor telomere/subtelomere boundaries in samples with these haplotypes. AVAILABILITY AND IMPLEMENTATION: Telogator is written in Python3 and is available at github.com/zstephens/telogator. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Repetitive Sequences, Nucleic Acid , Telomere , Telomere/genetics , Haplotypes
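
To make the idea of read-level telomere length in the abstract above concrete, here is a minimal Python sketch. It is not Telogator's algorithm, which the abstract does not describe in detail. It assumes each read is oriented so that the telomere lies at its 3' end, that the canonical human repeat TTAGGG dominates the tract, and that reads have already been assigned to a chromosome arm. The function names, the window size, and the density threshold are all hypothetical choices for illustration.

```python
def telomere_tract_length(read: str, motif: str = "TTAGGG",
                          window: int = 60, min_density: float = 0.5) -> int:
    """Estimate the length (bp) of the telomeric repeat tract at the 3' end of a read.

    Walks backwards from the read end in fixed-size windows and stops at the
    first window in which the motif covers less than `min_density` of the bases.
    The result is therefore only resolved to a multiple of `window`.
    """
    read = read.upper()
    tract = 0
    end = len(read)
    while end - window >= 0:
        chunk = read[end - window:end]
        # Fraction of the window covered by non-overlapping copies of the motif.
        density = chunk.count(motif) * len(motif) / window
        if density < min_density:
            break
        tract += window
        end -= window
    return tract


def mean_tract_length(reads_by_arm: dict[str, list[str]]) -> dict[str, float]:
    """Average the per-read estimates for each chromosome arm (hypothetical input layout)."""
    return {
        arm: sum(telomere_tract_length(r) for r in reads) / len(reads)
        for arm, reads in reads_by_arm.items()
        if reads
    }
```

Tolerating windows with imperfect repeat density is one simple way to absorb long-read sequencing errors; a real tool would also need to handle reads in the opposite orientation (CCCTAA at the 5' end) and variant repeats.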
4.
Bioinformatics ; 37(11): 1598-1599, 2021 Jul 12.
Article in English | MEDLINE | ID: mdl-31808791

ABSTRACT

MOTIVATION: DNA methylation can be measured at the single CpG level using sodium bisulfite conversion of genomic DNA followed by sequencing or array hybridization. Many analytic tools have been developed, yet there is still a high demand for a comprehensive and multifaceted tool suite to analyze, annotate, QC and visualize the DNA methylation data. RESULTS: We developed the CpGtools package to analyze DNA methylation data generated from bisulfite sequencing or Illumina methylation arrays. The CpGtools package consists of three types of modules: (i) 'CpG position modules' focus on analyzing the genomic positions of CpGs, including associating other genomic and epigenomic features to a given list of CpGs and generating the DNA motif logo enriched in the genomic contexts of a given list of CpGs; (ii) 'CpG signal modules' are designed to analyze DNA methylation values, such as performing the PCA or t-SNE analyses, using Bayesian Gaussian mixture modeling to classify CpG sites into fully methylated, partially methylated and unmethylated groups, profiling the average DNA methylation level over user-specified genomics regions and generating the bean/violin plots and (iii) 'differential CpG analysis modules' focus on identifying differentially methylated CpGs between groups using different statistical methods including Fisher's Exact Test, Student's t-test, ANOVA, non-parametric tests, linear regression, logistic regression, beta-binomial regression and Bayesian estimation. AVAILABILITY AND IMPLEMENTATION: CpGtools is written in Python under the open-source GPL license. The source code and documentation are freely available at https://github.com/liguowang/cpgtools. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
DNA Methylation , High-Throughput Nucleotide Sequencing , Bayes Theorem , CpG Islands , Humans , Sequence Analysis, DNA
5.
Blood ; 133(26): 2776-2789, 2019 Jun 27.
Article in English | MEDLINE | ID: mdl-31101622

ABSTRACT

Anaplastic large cell lymphomas (ALCLs) represent a relatively common group of T-cell non-Hodgkin lymphomas (T-NHLs) that are unified by similar pathologic features but demonstrate marked genetic heterogeneity. ALCLs are broadly classified as being anaplastic lymphoma kinase (ALK)+ or ALK-, based on the presence or absence of ALK rearrangements. Exome sequencing of 62 T-NHLs identified a previously unreported recurrent mutation in the musculin gene, MSC E116K, exclusively in ALK- ALCLs. Additional sequencing for a total of 238 T-NHLs confirmed the specificity of MSC E116K for ALK- ALCL and further demonstrated that 14 of 15 mutated cases (93%) had coexisting DUSP22 rearrangements. Musculin is a basic helix-loop-helix (bHLH) transcription factor that heterodimerizes with other bHLH proteins to regulate lymphocyte development. The E116K mutation localized to the DNA binding domain of musculin and permitted formation of musculin-bHLH heterodimers but prevented their binding to authentic target sequence. Functional analysis showed MSC E116K acted in a dominant-negative fashion, reversing wild-type musculin-induced repression of MYC and cell cycle inhibition. Chromatin immunoprecipitation-sequencing and transcriptome analysis identified the cell cycle regulatory gene E2F2 as a direct transcriptional target of musculin. MSC E116K reversed E2F2-induced cell cycle arrest and promoted expression of the CD30-IRF4-MYC axis, whereas its expression was reciprocally induced by binding of IRF4 to the MSC promoter. Finally, ALCL cells expressing MSC E116K were preferentially targeted by the BET inhibitor JQ1. These findings identify a novel recurrent MSC mutation as a key driver of the CD30-IRF4-MYC axis and cell cycle progression in a unique subset of ALCLs.


Subjects
Basic Helix-Loop-Helix Transcription Factors/genetics , Lymphoma, Large-Cell, Anaplastic/genetics , Anaplastic Lymphoma Kinase/genetics , Cell Cycle/genetics , Gene Expression Regulation, Neoplastic/genetics , Humans , Mutation
6.
Gastroenterology ; 157(1): 210-226.e12, 2019 Jul.
Article in English | MEDLINE | ID: mdl-30878468

ABSTRACT

BACKGROUND & AIMS: The CCNE1 locus, which encodes cyclin E1, is amplified in many types of cancer cells and is activated in hepatocellular carcinomas (HCCs) from patients infected with hepatitis B virus or adeno-associated virus type 2, due to integration of the virus nearby. We investigated cell-cycle and oncogenic effects of cyclin E1 overexpression in tissues of mice. METHODS: We generated mice with doxycycline-inducible expression of Ccne1 (Ccne1T mice) and activated overexpression of cyclin E1 from age 3 weeks onward. At 14 months of age, livers were collected from mice that overexpress cyclin E1 and nontransgenic mice (controls) and analyzed for tumor burden and by histology. Mouse embryonic fibroblasts (MEFs) and hepatocytes from Ccne1T and control mice were analyzed to determine the extent to which cyclin E1 overexpression perturbs S-phase entry, DNA replication, and numbers and structures of chromosomes. Tissues from 4-month-old Ccne1T and control mice (which were free of tumors at that age) were analyzed for chromosome alterations, to investigate the mechanisms by which cyclin E1 predisposes hepatocytes to transformation. RESULTS: Ccne1T mice developed more hepatocellular adenomas and HCCs than control mice. Tumors developed only in livers of Ccne1T mice, despite high levels of cyclin E1 in other tissues. Ccne1T MEFs had defects that promoted chromosome missegregation and aneuploidy, including incomplete replication of DNA, centrosome amplification, and formation of nonperpendicular mitotic spindles. Whereas Ccne1T mice accumulated near-diploid aneuploid cells in multiple tissues and organs, polyploidization was observed only in hepatocytes, with losses and gains of whole chromosomes, DNA damage, and oxidative stress. CONCLUSIONS: Livers, but not other tissues of mice with inducible overexpression of cyclin E1, develop tumors. More hepatocytes from the cyclin E1-overexpressing mice were polyploid than from control mice, and had losses or gains of whole chromosomes, DNA damage, and oxidative stress; all of these have been observed in human HCC cells. The increased risk of HCC in patients with hepatitis B virus or adeno-associated virus type 2 infection might involve activation of cyclin E1 and its effects on chromosomes and genomes of liver cells.


Subjects
Adenoma, Liver Cell/genetics , Carcinoma, Hepatocellular/genetics , Chromosomal Instability/genetics , Cyclin E/genetics , Liver Neoplasms/genetics , Liver/metabolism , Oncogene Proteins/genetics , Adenoma, Liver Cell/pathology , Adenoma, Liver Cell/virology , Animals , Carcinoma, Hepatocellular/pathology , Carcinoma, Hepatocellular/virology , Chromosome Structures , DNA Damage/genetics , DNA Replication , Dependovirus , Fibroblasts , Hepatitis B, Chronic , Hepatocytes , Liver/pathology , Liver Neoplasms/pathology , Liver Neoplasms/virology , Liver Neoplasms, Experimental/genetics , Liver Neoplasms, Experimental/pathology , Mice , Oxidative Stress/genetics , Parvoviridae Infections , Parvovirinae , Polyploidy , S Phase Cell Cycle Checkpoints
7.
Gynecol Oncol ; 156(2): 387-392, 2020 Feb.
Article in English | MEDLINE | ID: mdl-31787246

ABSTRACT

OBJECTIVE: We aimed to assess whether endometrial cancer (EC) can be detected in shed DNA collected with vaginal tampon by analyzing copy number, methylation markers, and mutations. METHODS: Tampons were collected prior to hysterectomy from 38 EC patients and 28 women with benign indications. Extracted tampon DNA underwent the following: 1) low-coverage whole genome sequencing (LC-WGS) to assess copy number, 2) pyrosequencing to measure percent promoter methylation of HOXA9, RASSF1, and CDH13 and 3) next generation sequencing (NGS) to identify mutations in 19 genes associated with EC identified through The Cancer Genome Atlas. Sensitivity and specificity for each test and test combinations were calculated. RESULTS: Methylation analysis yielded the highest specificities but lowest sensitivities (37-40% sensitivity; 100% specificity for HOXA9, RASSF1 and HTR1B) while mutation analysis had improved sensitivity (50% sensitivity; 83% specificity). Only one "false positive" result for copy number variants was identified among women with benign surgical indications, which was based on detection of copy number changes, and associated with a leiomyosarcoma that was only recognized at hysterectomy. Considering any of the 3 biomarker classes as a positive resulted in a sensitivity of 92% and specificity of 86%. Mutation analysis did not add sensitivity to the combination of analysis of copy number and methylation. CONCLUSIONS: This study demonstrates a proof-of-principle for non-invasive yet precise detection of endometrial cancer. We propose that with improved biomarker testing, it may be possible to develop a clinically useful test for detecting EC.


Subjects
DNA Methylation , Endometrial Neoplasms/genetics , Gene Dosage , Menstrual Hygiene Products , Biomarkers, Tumor/genetics , Diagnosis, Differential , Endometrial Neoplasms/diagnosis , Endometrial Neoplasms/pathology , Female , Humans , Middle Aged , Mutation , Uterine Diseases/diagnosis , Uterine Diseases/genetics , Uterine Diseases/pathology , Vaginal Smears/methods
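
The abstract above reports sensitivity and specificity for each test and for the rule "positive if any of the three biomarker classes is positive". The following Python sketch shows how such figures are computed from per-sample calls. The `Sample` record, its field names, and the helper functions are hypothetical illustrations and do not come from the study or its data.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Sample:
    has_cancer: bool            # ground truth from surgical pathology
    copy_number_positive: bool  # hypothetical per-assay calls
    methylation_positive: bool
    mutation_positive: bool


def any_positive(s: Sample) -> bool:
    """Combined rule: a sample is called positive if any assay is positive."""
    return s.copy_number_positive or s.methylation_positive or s.mutation_positive


def sensitivity_specificity(samples: Iterable[Sample],
                            call: Callable[[Sample], bool]) -> tuple[float, float]:
    """Return (sensitivity, specificity) of `call` against the ground truth."""
    tp = fn = tn = fp = 0
    for s in samples:
        positive = call(s)
        if s.has_cancer:
            tp += positive
            fn += not positive
        else:
            fp += positive
            tn += not positive
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity
```

An "any positive" rule can only raise sensitivity and lower (or leave unchanged) specificity relative to each single assay, which matches the trade-off the abstract describes.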
8.
Brief Bioinform ; 18(6): 973-983, 2017 Nov 01.
Article in English | MEDLINE | ID: mdl-27473065

ABSTRACT

Driver somatic mutations are a hallmark of a tumor that can be used for diagnosis and targeted therapy. Mutations are primarily detected from tumor DNA. Because RNA molecules reflect dynamic gene activity, transcriptome profiling by RNA sequencing (RNA-seq) is becoming increasingly popular; it measures not only gene expression but also structural variations such as mutations and fusion transcripts. Although single-nucleotide variants (SNVs) can be easily identified from RNA-seq, intermediate-length insertions/deletions (indels longer than 2 bases but shorter than the sequence reads) cause significant challenges and are ignored by most RNA-seq analysis tools. This study evaluates commonly used RNA-seq analysis programs along with variant and somatic mutation callers in a series of data sets with simulated and known indels. The aim is to develop strategies for accurate indel detection. Our results show that the RNA-seq alignment is the most important step for indel identification and that the evaluated programs vary widely in their sensitivity to map sequence reads with indels, from not at all to decently sensitive. The sensitivity is impacted by sequence read lengths. Most variant calling programs rely on hard evidence, that is, indels marked in the alignment, whereas programs with realignment may use soft-clipped reads for indel inference. Based on the observations, we have provided practical recommendations for indel detection when different RNA-seq aligners are used and demonstrated the best option with highly reliable results. With careful customization of bioinformatics algorithms, RNA-seq can be reliably used for both SNV and indel mutation detection that can be used for clinical decision-making.


Subjects
Computational Biology/methods , ErbB Receptors/genetics , High-Throughput Nucleotide Sequencing/methods , INDEL Mutation , Lung Neoplasms/genetics , Software , Algorithms , Case-Control Studies , Humans , Exome Sequencing
9.
Nucleic Acids Res ; 45(22): e179, 2017 Dec 15.
Article in English | MEDLINE | ID: mdl-28981748

ABSTRACT

Linnorm is a novel normalization and transformation method for the analysis of single cell RNA sequencing (scRNA-seq) data. Linnorm is developed to remove technical noise and simultaneously preserve biological variation in scRNA-seq data, such that existing statistical methods can be improved. Using real scRNA-seq data, we compared Linnorm with existing normalization methods, including NODES, SAMstrt, SCnorm, scran, DESeq and TMM. Linnorm shows advantages in speed, technical noise removal and preservation of cell heterogeneity, which can improve existing methods in the discovery of novel subtypes, pseudo-temporal ordering of cells, clustering analysis, etc. Linnorm also performs better than existing DEG analysis methods, including BASiCS, NODES, SAMstrt, Seurat and DESeq2, in false positive rate control and accuracy.


Subjects
Algorithms , Biostatistics/methods , Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Cluster Analysis , Linear Models , RNA/classification , RNA/genetics , Reproducibility of Results
10.
Nucleic Acids Res ; 45(10): 5653-5665, 2017 Jun 02.
Article in English | MEDLINE | ID: mdl-28472449

ABSTRACT

Competing endogenous RNAs (ceRNAs) are RNA molecules that sequester shared microRNAs (miRNAs) thereby affecting the expression of other targets of the miRNAs. Whether genetic variants in ceRNA can affect its biological function and disease development is still an open question. Here we identified a large number of genetic variants that are associated with ceRNA's function using Geuvadis RNA-seq data for 462 individuals from the 1000 Genomes Project. We call these loci competing endogenous RNA expression quantitative trait loci or 'cerQTL', and found that a large number of them were unexplored in conventional eQTL mapping. We identified many cerQTLs that have undergone recent positive selection in different human populations, and showed that single nucleotide polymorphisms in gene 3′ UTRs at the miRNA seed binding regions can simultaneously regulate gene expression changes in both cis and trans by the ceRNA mechanism. We also discovered that cerQTLs are significantly enriched in trait/disease-associated variants reported from genome-wide association studies in the miRNA binding sites, suggesting that disease susceptibilities could be attributed to ceRNA regulation. Further in vitro functional experiments demonstrated that a cerQTL rs11540855 can regulate ceRNA function. These results provide a comprehensive catalog of functional non-coding regulatory variants that may be responsible for ceRNA crosstalk at the post-transcriptional level.


Subjects
Gene Expression Regulation , Gene Regulatory Networks , Genome, Human , MicroRNAs/genetics , Quantitative Trait Loci , RNA, Untranslated/genetics , 3' Untranslated Regions , Base Pairing , Binding Sites , Chromosome Mapping , Genome-Wide Association Study , Humans , MicroRNAs/metabolism , Polymorphism, Single Nucleotide , RNA, Untranslated/metabolism
11.
Nucleic Acids Res ; 45(W1): W215-W221, 2017 Jul 03.
Article in English | MEDLINE | ID: mdl-28482068

ABSTRACT

Cancer therapies have experienced rapid progress in recent years, with a number of novel small-molecule kinase inhibitors and monoclonal antibodies now being widely used to treat various types of human cancers. During cancer treatments, mutations can have important effects on drug sensitivity. However, the relationship between tumor genomic profiles and the effectiveness of cancer drugs remains elusive. We introduce the Mutation To Cancer Therapy Scan (mTCTScan) web server (http://jjwanglab.org/mTCTScan), which can systematically analyze mutations affecting cancer drug sensitivity based on individual genomic profiles. The platform was developed by leveraging the latest knowledge on mutation-cancer drug sensitivity associations and the results from large-scale chemical screening using human cancer cell lines. Using an evidence-based scoring scheme based on current integrative evidence, mTCTScan is able to prioritize mutations according to their associations with cancer drugs and preclinical compounds. It can also show related drugs/compounds with sensitivity classification by considering the context of the entire genomic profile. In addition, mTCTScan incorporates comprehensive filtering functions and cancer-related annotations to better interpret mutation effects and their association with cancer drugs. This platform will greatly benefit both researchers and clinicians for interrogating mechanisms of mutation-dependent drug response, which will have a significant impact on cancer precision medicine.


Subjects
Drug Resistance, Neoplasm/genetics , Mutation , Software , Antineoplastic Agents/pharmacology , Cell Line, Tumor , Genomics , Humans , Internet , Molecular Sequence Annotation , Neoplasms/genetics
12.
Hum Hered ; 83(2): 79-91, 2018.
Article in English | MEDLINE | ID: mdl-30347404

ABSTRACT

AIMS: We propose a novel machine learning approach to expand the knowledge about drug-target interactions. Our method may help to develop effective, less harmful treatment strategies and to enable the detection of novel indications for existing drugs. METHODS: We developed a novel machine learning strategy to predict drug-target interactions based on drug side effects and traits from genome-wide association studies. We integrated data from the databases SIDER and GWASdb and utilized them in a unique way by a neural network approach. RESULTS: We validate our method using drug-target interactions from the STITCH database. In addition, we compare the chemical similarity of the predicted target to known targets of the drug under consideration and present literature-based evidence for predicted interactions. We find drug combination warnings for drugs we predict to target the same protein, hinting at synergistic effects aggravating harmful events. This substantiates the translational value of our approach, because we are able to detect drugs that should be taken together with care due to common mechanisms of action. CONCLUSION: Taken together, we conclude that our approach is able to generate a novel and clinically applicable insight into the molecular determinants of drug action.


Subjects
Drug Interactions , Drug-Related Side Effects and Adverse Reactions , Genome-Wide Association Study , Machine Learning , Humans , Neural Networks, Computer
13.
BMC Bioinformatics ; 19(Suppl 20): 508, 2018 Dec 21.
Article in English | MEDLINE | ID: mdl-30577744

ABSTRACT

BACKGROUND: With applications in cancer, drug metabolism, and disease etiology, understanding structural variation in the human genome is critical in advancing the thrusts of individualized medicine. However, structural variants (SVs) remain challenging to detect with high sensitivity using short read sequencing technologies. This problem is exacerbated when considering complex SVs comprised of multiple overlapping or nested rearrangements. Longer reads, such as those from Pacific Biosciences platforms, often span multiple breakpoints of such events, and thus provide a way to unravel small-scale complexities in SVs with higher confidence. RESULTS: We present CORGi (COmplex Rearrangement detection with Graph-search), a method for the detection and visualization of complex local genomic rearrangements. This method leverages the ability of long reads to span multiple breakpoints to untangle SVs that appear very complicated with respect to a reference genome. We validated our approach against both simulated long reads, and real data from two long read sequencing technologies. We demonstrate the ability of our method to identify breakpoints inserted in synthetic data with high accuracy, and the ability to detect and plot SVs from NA12878 germline, achieving 88.4% concordance between the two sets of sequence data. The patterns of complexity we find in many NA12878 SVs match known mechanisms associated with DNA replication and structural variant formation, and highlight the ability of our method to automatically label complex SVs with an intuitive combination of adjacent or overlapping reference transformations. CONCLUSIONS: CORGi is a method for interrogating genomic regions suspected to contain local rearrangements using long reads. Using pairwise alignments and graph search CORGi produces labels and visualizations for local SVs of arbitrary complexity.


Subjects
Genomic Structural Variation , Sequence Analysis, DNA/methods , Computer Simulation , Gene Duplication , Genome, Human , Humans , Sequence Alignment , Software
14.
BMC Bioinformatics ; 19(1): 271, 2018 Jul 17.
Article in English | MEDLINE | ID: mdl-30016933

ABSTRACT

BACKGROUND: Transfer of genetic material from microbes or viruses into the host genome is known as horizontal gene transfer (HGT). The integration of viruses into the human genome is associated with multiple cancers, and these can now be detected using next-generation sequencing methods such as whole genome sequencing and RNA-sequencing. RESULTS: We designed a novel computational workflow, HGT-ID, to identify the integration of viruses into the human genome using the sequencing data. The HGT-ID workflow primarily follows a four-step procedure: i) pre-processing of unaligned reads, ii) virus detection using subtraction approach, iii) identification of virus integration site using discordant and soft-clipped reads and iv) HGT candidates prioritization through a scoring function. Annotation and visualization of the events, as well as primer design for experimental validation, are also provided in the final report. We evaluated the tool performance with the well-understood cervical cancer samples. The HGT-ID workflow accurately detected known human papillomavirus (HPV) integration sites with high sensitivity and specificity compared to previous HGT methods. We applied HGT-ID to The Cancer Genome Atlas (TCGA) whole-genome sequencing data (WGS) from liver tumor-normal pairs. Multiple hepatitis B virus (HBV) integration sites were identified in TCGA liver samples and confirmed by HGT-ID using the RNA-Seq data from the matched liver pairs. This shows the applicability of the method in both the data types and cross-validation of the HGT events in liver samples. We also processed 220 breast tumor WGS data through the workflow; however, there were no HGT events detected in those samples. CONCLUSIONS: HGT-ID is a novel computational workflow to detect the integration of viruses in the human genome using the sequencing data. It is fast and accurate with functions such as prioritization, annotation, visualization and primer design for future validation of HGTs. 
The HGT-ID workflow is released under the MIT License and available at http://kalarikrlab.org/Software/HGT-ID.html.


Subjects
Gene Transfer, Horizontal/genetics , Genome, Human , High-Throughput Nucleotide Sequencing/methods , Virus Integration/genetics , Algorithms , Base Sequence , Breast Neoplasms/virology , Cell Line, Tumor , Computer Simulation , Female , Humans , ROC Curve , Software , Whole Genome Sequencing , Workflow
15.
BMC Genomics ; 19(1): 841, 2018 Nov 27.
Article in English | MEDLINE | ID: mdl-30482155

ABSTRACT

BACKGROUND: Copy Number Alterations (CNAs) are defined as somatic gains or losses of DNA regions. The profiles of CNAs may provide a fingerprint specific to a tumor type or tumor grade. Low-coverage sequencing for reporting CNAs has recently gained interest since it has been successfully translated into clinical applications. Ovarian serous carcinomas can be classified into two largely mutually exclusive grades, low grade and high grade, based on their histologic features. A grade classification based on genomics may provide valuable clues on how best to manage these patients in the clinic. Based on the study of ovarian serous carcinomas, we explore the methodology of combining CNA reporting from low-coverage sequencing with machine learning techniques to stratify tumor biospecimens of different grades. RESULTS: We have developed a data-driven methodology for tumor classification using the profiles of CNAs reported by low-coverage sequencing. The proposed method, called Bag-of-Segments, is used to summarize fixed-length CNA features predictive of tumor grades. These features are further processed by machine learning techniques to obtain classification models. High accuracy is obtained for classifying ovarian serous carcinoma into high and low grades based on leave-one-out cross-validation experiments. Models that are only weakly influenced by the sequence coverage and the purity of the sample can also be built, which would be of higher relevance for clinical applications. The patterns captured by Bag-of-Segments features correlate with current clinical knowledge: low grade ovarian tumors are related to aneuploidy events associated with mitotic errors, while high grade ovarian tumors are induced by DNA repair gene malfunction. CONCLUSIONS: The proposed data-driven method obtains high accuracy with various parametrizations for the ovarian serous carcinoma study, indicating that it has good generalization potential towards other CNA classification problems. This method could be applied to the more difficult task of classifying ovarian serous carcinomas with ambiguous histology or those with low grade tumor co-existing with high grade tumor. The closer genomic relationship of these tumor samples to low or high grade may provide important clinical value.


Subjects
Cystadenocarcinoma, Serous/classification , DNA Copy Number Variations , Data Science/methods , Genome, Human , Ovarian Neoplasms/classification , Cystadenocarcinoma, Serous/genetics , Cystadenocarcinoma, Serous/pathology , Female , Humans , Neoplasm Grading , Ovarian Neoplasms/genetics , Ovarian Neoplasms/pathology , Whole Genome Sequencing
16.
Mol Carcinog ; 57(1): 114-124, 2018 Jan.
Article in English | MEDLINE | ID: mdl-28926134

ABSTRACT

Chromosome instability (CIN) is widely observed in both sporadic and hereditary colorectal cancer (CRC). Defects in APC and WNT signaling are primarily associated with CIN in hereditary CRC, but the genetic causes for CIN in sporadic CRC remain elusive. Using high-density SNP array and exome data from The Cancer Genome Atlas (TCGA), we characterized loss of heterozygosity (LOH) and copy number variation (CNV) in the peripheral blood, normal colon, and corresponding tumor tissue in 15 CRC patients with proficient mismatch repair (MMR) and 24 CRC patients with deficient MMR. We found a high frequency of 18q LOH in tumors and arm-specific enrichment of genetic aberrations on 18q in the normal colon (primarily copy neutral LOH) and blood (primarily copy gain). These aberrations were specific to the sporadic, pMMR CRC. Though in tumor samples genetic aberrations were observed for genes commonly mutated in hereditary CRC (eg, APC, CTNNB1, SMAD4, BRAF), none of them showed LOH or CNV in the normal colon or blood. DCC located on 18q21.1 topped the list of genes with genetic aberrations in the tumor. In an independent cohort of 13 patients subjected to Whole Genome Sequencing (WGS), we found LOH and CNV on 18q in adenomatous polyp and tumor tissues. Our data suggests that patients with sporadic CRC may have genetic aberrations preferentially enriched on 18q in their blood, normal colon epithelium, and non-malignant polyp lesions that may prove useful as a clinical marker for sporadic CRC detection and risk assessment.


Subjects
Colorectal Neoplasms/genetics; DNA Copy Number Variations; DNA Mismatch Repair/genetics; Loss of Heterozygosity; Aged; Aged, 80 and over; Chromosomal Instability; Chromosomes, Human, Pair 18/genetics; Cohort Studies; Colorectal Neoplasms/pathology; Female; Genotype; Humans; Male; Middle Aged; Mutation
17.
Brief Bioinform ; 17(2): 346-51, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26210358

ABSTRACT

Next-generation sequencing platforms are widely used to discover variants associated with disease. The processing of sequencing data involves read alignment, variant calling, variant annotation and variant filtering. The standard file format to hold variant calls is the variant call format (VCF) file. According to the format specifications, any arbitrary annotation can be added to the VCF file for downstream processing. However, most downstream analysis programs disregard annotations already present in the VCF and re-annotate variants using the annotation provided by that particular program. This precludes investigators who have collected information on variants from literature or other sources from including these annotations in the filtering and mining of variants. We have developed VCF-Miner, a graphical user interface-based stand-alone tool, to mine variants and annotation stored in the VCF. Powered by a MongoDB database engine, VCF-Miner enables the stepwise trimming of non-relevant variants. The grouping feature implemented in VCF-Miner can be used to identify somatic variants by contrasting variants in tumor and in normal samples or to identify recessive/dominant variants in family studies. It is not limited to human data, but can also be extended to include non-diploid organisms. It also supports copy number or any other variant type supported by the VCF specification. VCF-Miner can be used on a personal computer or large institutional servers and is freely available for download from http://bioinformaticstools.mayo.edu/research/vcf-miner/.
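
The idea of keeping annotations already stored in a VCF and contrasting tumor with normal samples can be sketched in a few lines. This is a minimal illustration, not VCF-Miner's implementation (which the abstract describes as a GUI tool backed by MongoDB): the helper names, the sample names `NORMAL` and `TUMOR`, and the custom INFO key `LIT` are all hypothetical, and the parser handles only the simple tab-separated records shown here.

```python
def parse_info(info: str) -> dict:
    """Turn a VCF INFO string such as 'DP=40;DB' into a dict."""
    if info == ".":
        return {}
    parsed = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        parsed[key] = value if sep else True  # flag keys carry no value
    return parsed


def has_alt(genotype: str) -> bool:
    """True if a GT string such as '0/1' or '1|1' carries a non-reference allele."""
    alleles = genotype.replace("|", "/").split("/")
    return any(a not in ("0", ".") for a in alleles)


def read_vcf(text: str) -> list:
    """Parse VCF text into records, keeping every INFO annotation as-is."""
    samples, records = [], []
    for line in text.splitlines():
        if not line.strip() or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        format_keys = fields[8].split(":")
        genotypes = {
            name: dict(zip(format_keys, column.split(":")))["GT"]
            for name, column in zip(samples, fields[9:])
        }
        records.append({
            "chrom": fields[0], "pos": int(fields[1]),
            "ref": fields[3], "alt": fields[4],
            "info": parse_info(fields[7]), "gt": genotypes,
        })
    return records


def somatic_candidates(records: list, tumor: str, normal: str) -> list:
    """Keep variants called in the tumor sample but not in the normal sample."""
    return [r for r in records
            if has_alt(r["gt"][tumor]) and not has_alt(r["gt"][normal])]


HEADER = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER",
          "INFO", "FORMAT", "NORMAL", "TUMOR"]
EXAMPLE_VCF = "\n".join([
    "##fileformat=VCFv4.2",
    "\t".join(HEADER),
    "\t".join(["1", "1000", ".", "A", "G", "50", "PASS", "DP=40;LIT=reported", "GT", "0/0", "0/1"]),
    "\t".join(["1", "2000", ".", "C", "T", "60", "PASS", "DP=35;DB", "GT", "0/1", "0/1"]),
    "\t".join(["2", "3000", ".", "G", "A", "45", "PASS", ".", "GT", "0/0", "0/0"]),
])

if __name__ == "__main__":
    for hit in somatic_candidates(read_vcf(EXAMPLE_VCF), "TUMOR", "NORMAL"):
        print(hit["chrom"], hit["pos"], hit["info"])
```

Because the INFO field is parsed generically, an investigator-supplied key such as the hypothetical `LIT` survives into the filtered records instead of being discarded by re-annotation.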


Subjects
Algorithms; Databases, Genetic; Genetic Predisposition to Disease/genetics; Genetic Variation/genetics; High-Throughput Nucleotide Sequencing/methods; User-Computer Interface; Database Management Systems; Humans; Polymorphism, Single Nucleotide/genetics; Software
18.
BMC Cancer ; 18(1): 743, 2018 Jul 18.
Article in English | MEDLINE | ID: mdl-30021563

ABSTRACT

Correction to: BMC Cancer (2018) 18:577 DOI https://doi.org/10.1186/s12885-018-4345-2.

19.
BMC Cancer ; 18(1): 577, 2018 May 21.
Article in English | MEDLINE | ID: mdl-29783934

ABSTRACT

BACKGROUND: The right drug to the right patient at the right time is one of the ideals of Individualized Medicine (IM) and remains one of the most compelling promises of the post-genomic age. The addition of genomic information is expected to increase the precision of an individual patient's treatment, resulting in improved outcomes. While pilot studies have been encouraging, key aspects of interpreting tumor genomics information, such as somatic activation of drug transport or metabolism, have not been systematically evaluated.
METHODS: In this work, we developed a simple rule-based approach to classify the therapies administered to each patient from The Cancer Genome Atlas PanCancer dataset (n = 2858) as effective or ineffective. Our Therapy Efficacy model used each patient's drug target and pharmacokinetic (PK) gene expression profile; the specific genes considered for each patient depended on the therapies they received. Patients who received predictably ineffective therapies were considered at high risk of cancer-related mortality, and those who did not receive ineffective therapies were considered at low risk. The utility of our Therapy Efficacy model was assessed using per-cancer and pan-cancer differential survival.
RESULTS: Our simple rule-based Therapy Efficacy model classified 143 (5%) patients as high-risk. High-risk patients had age ranges comparable to low-risk patients of the same cancer type and tended to be later stage and higher grade (odds ratios of 1.6 and 1.4, respectively). A significant pan-cancer association was identified between predictions of our Therapy Efficacy model and poorer overall survival (hazard ratio, HR = 1.47, p = 6.3 × 10⁻³). Individually, drug export (HR = 1.49, p = 4.70 × 10⁻³) and drug metabolism (HR = 1.73, p = 9.30 × 10⁻⁵) genes demonstrated significant survival associations. Survival associations for target gene expression are mechanism-dependent. Similar results were observed for event-free survival.
CONCLUSIONS: While the resolution of clinical information within the dataset is not ideal, and modeling the relative contribution of each gene to the activity of each therapy remains a challenge, our approach demonstrates that somatic PK alterations should be integrated into the interpretation of somatic transcriptomic profiles as they likely have a significant impact on the survival of specific patients. We believe that this approach will aid the prospective design of personalized therapeutic strategies.


Subjects
Antineoplastic Agents/pharmacokinetics; Models, Biological; Neoplasms/drug therapy; Precision Medicine/methods; Antineoplastic Agents/therapeutic use; Datasets as Topic; Gene Expression Profiling; Humans; Neoplasms/genetics; Pharmacogenomic Variants/genetics; Progression-Free Survival; Proportional Hazards Models; Treatment Outcome
20.
Nucleic Acids Res ; 44(D1): D869-76, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26615194

ABSTRACT

Genome-wide association studies (GWASs), now a routine approach to studying single-nucleotide polymorphism (SNP)-trait associations, have uncovered over ten thousand significant trait/disease-associated SNPs (TASs). Here, we updated GWASdb (GWASdb v2, http://jjwanglab.org/gwasdb), which provides comprehensive data curation and knowledge integration for GWAS TASs. These updates include: (i) Up to August 2015, we collected 2479 unique publications from PubMed and other resources; (ii) We further curated moderate SNP-trait associations (P-value < 1.0 × 10⁻³) from each original publication, and generated a total of 252,530 unique TASs in all GWASdb v2 collected studies; (iii) We manually mapped 1610 GWAS traits to 501 Human Phenotype Ontology (HPO) terms, 435 Disease Ontology (DO) terms and 228 Disease Ontology Lite (DOLite) terms. For each ontology term, we also predicted the putative causal genes; (iv) We curated the detailed sub-populations and related sample size for each study; (v) Importantly, we performed extensive functional annotation for each TAS by incorporating gene-based information, ENCODE ChIP-seq assays, eQTL, population haplotype, functional prediction across multiple biological domains, evolutionary signals and disease-related annotation; (vi) Additionally, we compiled a SNP-drug response association dataset for 650 pharmacogenetic studies involving 257 drugs in this update; (vii) Last, we improved the user interface of the website.


Subjects
Databases, Genetic; Genome-Wide Association Study; Polymorphism, Single Nucleotide; Biological Ontologies; Disease/genetics; Genes; Humans; Molecular Sequence Annotation