Search | Biblioteca Virtual em Saúde Fiocruz

1.

Genome-wide detection of tandem DNA repeats that are expanded in autism.

Trost, Brett; Engchuan, Worrawat; Nguyen, Charlotte M; Thiruvahindrapuram, Bhooma; Dolzhenko, Egor; Backstrom, Ian; Mirceta, Mila; Mojarad, Bahareh A; Yin, Yue; Dov, Alona; Chandrakumar, Induja; Prasolava, Tanya; Shum, Natalie; Hamdan, Omar; Pellecchia, Giovanna; Howe, Jennifer L; Whitney, Joseph; Klee, Eric W; Baheti, Saurabh; Amaral, David G; Anagnostou, Evdokia; Elsabbagh, Mayada; Fernandez, Bridget A; Hoang, Ny; Lewis, M E Suzanne; Liu, Xudong; Sjaarda, Calvin; Smith, Isabel M; Szatmari, Peter; Zwaigenbaum, Lonnie; Glazer, David; Hartley, Dean; Stewart, A Keith; Eberle, Michael A; Sato, Nozomu; Pearson, Christopher E; Scherer, Stephen W; Yuen, Ryan K C.

Nature ; 586(7827): 80-86, 2020 10.

Article in English | MEDLINE | ID: mdl-32717741

ABSTRACT

Tandem DNA repeats vary in the size and sequence of each unit (motif). When expanded, these tandem DNA repeats have been associated with more than 40 monogenic disorders. Their involvement in disorders with complex genetics is largely unknown, as is the extent of their heterogeneity. Here we investigated the genome-wide characteristics of tandem repeats that had motifs with a length of 2-20 base pairs in 17,231 genomes of families containing individuals with autism spectrum disorder (ASD) and population control individuals. We found extensive polymorphism in the size and sequence of motifs. Many of the tandem repeat loci that we detected correlated with cytogenetic fragile sites. At 2,588 loci, gene-associated expansions of tandem repeats that were rare among population control individuals were significantly more prevalent among individuals with ASD than their siblings without ASD, particularly in exons and near splice junctions, and in genes related to the development of the nervous system and cardiovascular system or muscle. Rare tandem repeat expansions had a prevalence of 23.3% in children with ASD compared with 20.7% in children without ASD, which suggests that tandem repeat expansions make a collective contribution to the risk of ASD of 2.6%. These rare tandem repeat expansions included previously undescribed ASD-linked expansions in DMPK and FXN, which are associated with neuromuscular conditions, and in previously unknown loci such as FGF14 and CACNB1. Rare tandem repeat expansions were associated with lower IQ and adaptive ability. Our results show that tandem DNA repeat expansions contribute strongly to the genetic aetiology and phenotypic complexity of ASD.

Subjects

Autism Spectrum Disorder/genetics , DNA Repeat Expansion/genetics , Genome, Human/genetics , Genomics , Tandem Repeat Sequences/genetics , Female , Fibroblast Growth Factors/genetics , Genetic Predisposition to Disease , Humans , Intelligence/genetics , Iron-Binding Proteins/genetics , Male , Myotonin-Protein Kinase/genetics , Nucleotide Motifs , Polymorphism, Genetic , Frataxin

2.

Monoallelic Mutations to DNAJB11 Cause Atypical Autosomal-Dominant Polycystic Kidney Disease.

Cornec-Le Gall, Emilie; Olson, Rory J; Besse, Whitney; Heyer, Christina M; Gainullin, Vladimir G; Smith, Jessica M; Audrézet, Marie-Pierre; Hopp, Katharina; Porath, Binu; Shi, Beili; Baheti, Saurabh; Senum, Sarah R; Arroyo, Jennifer; Madsen, Charles D; Férec, Claude; Joly, Dominique; Jouret, François; Fikri-Benbrahim, Oussamah; Charasse, Christophe; Coulibaly, Jean-Marie; Yu, Alan S; Khalili, Korosh; Pei, York; Somlo, Stefan; Le Meur, Yannick; Torres, Vicente E; Harris, Peter C.

Am J Hum Genet ; 102(5): 832-844, 2018 05 03.

Article in English | MEDLINE | ID: mdl-29706351

ABSTRACT

Autosomal-dominant polycystic kidney disease (ADPKD) is characterized by the progressive development of kidney cysts, often resulting in end-stage renal disease (ESRD). This disorder is genetically heterogeneous with ~7% of families genetically unresolved. We performed whole-exome sequencing (WES) in two multiplex ADPKD-like pedigrees, and we analyzed a further 591 genetically unresolved, phenotypically similar families by targeted next-generation sequencing of 65 candidate genes. WES identified a DNAJB11 missense variant (p.Pro54Arg) in two family members presenting with non-enlarged polycystic kidneys and a frameshifting change (c.166_167insTT) in a second family with small renal and liver cysts. DNAJB11 is a co-factor of BiP, a key chaperone in the endoplasmic reticulum controlling folding, trafficking, and degradation of secreted and membrane proteins. Five additional multigenerational families carrying DNAJB11 mutations were identified by the targeted analysis. The clinical phenotype was consistent in the 23 affected members, with non-enlarged cystic kidneys that often evolved to kidney atrophy; 7 subjects reached ESRD from 59 to 89 years. The lack of kidney enlargement, histologically evident interstitial fibrosis in non-cystic parenchyma, and recurring episodes of gout (one family) suggested partial phenotypic overlap with autosomal-dominant tubulointerstitial diseases (ADTKD). Characterization of DNAJB11-null cells and kidney samples from affected individuals revealed a pathogenesis associated with maturation and trafficking defects involving the ADPKD protein, PC1, and ADTKD proteins, such as UMOD. DNAJB11-associated disease is a phenotypic hybrid of ADPKD and ADTKD, characterized by normal-sized cystic kidneys and progressive interstitial fibrosis resulting in late-onset ESRD.

Subjects

Alleles , HSP40 Heat-Shock Proteins/genetics , Mutation/genetics , Polycystic Kidney, Autosomal Dominant/genetics , Adult , Aged , Aged, 80 and over , Amino Acid Sequence , Base Sequence , Epithelial Cells/metabolism , Family , Female , HSP40 Heat-Shock Proteins/chemistry , Humans , Loop of Henle/pathology , Male , Middle Aged , Pedigree , Polycystic Kidney, Autosomal Dominant/diagnostic imaging , Polycystic Kidney, Autosomal Dominant/pathology , TRPP Cation Channels/genetics , Uromodulin/metabolism , Exome Sequencing , Young Adult

3.

Next-Generation Sequencing of CYP2C19 in Stent Thrombosis: Implications for Clopidogrel Pharmacogenomics.

Morales-Rosado, Joel A; Goel, Kashish; Zhang, Lingxin; Åkerblom, Axel; Baheti, Saurabh; Black, John L; Eriksson, Niclas; Wallentin, Lars; James, Stefan; Storey, Robert F; Goodman, Shaun G; Jenkins, Gregory D; Eckloff, Bruce W; Bielinski, Suzette J; Sicotte, Hugues; Johnson, Stephen; Roger, Veronique L; Wang, Liewei; Weinshilboum, Richard; Klee, Eric W; Rihal, Charanjit S; Pereira, Naveen L.

Cardiovasc Drugs Ther ; 35(3): 549-559, 2021 06.

Article in English | MEDLINE | ID: mdl-32623598

ABSTRACT

PURPOSE: Describe CYP2C19 sequencing results in the largest series of clopidogrel-treated cases with stent thrombosis (ST), the closest clinical phenotype to clopidogrel resistance. Evaluate the impact of CYP2C19 genetic variation detected by next-generation sequencing (NGS) with comprehensive annotation and functional studies. METHODS: Seventy ST cases on clopidogrel identified from the PLATO trial (n = 58) and Mayo Clinic biorepository (n = 12) were matched 1:1 with controls for age, race, sex, diabetes mellitus, presentation, and stent type. NGS was performed to cover the entire CYP2C19 gene. Assessment of exonic variants involved measuring in vitro protein expression levels. Intronic variants were evaluated for potential splicing motif variations. RESULTS: Poor metabolizers (n = 4) and rare CYP2C19*8, CYP2C19*15, and CYP2C19*11 alleles were identified only in ST cases. CYP2C19*17 heterozygote carriers were observed more frequently in cases (n = 29) than controls (n = 18). Functional studies of CYP2C19 exonic variants (n = 11) revealed 3 cases and only 1 control carrying a deleterious variant as determined by in vitro protein expression studies. Greater intronic variation unique to ST cases (n = 169) compared with controls (n = 84) was observed with predictions revealing 13 allele candidates that may lead to a potential disruption of splicing and a loss-of-function effect of CYP2C19 in ST cases. CONCLUSION: NGS detected CYP2C19 poor metabolizers and paradoxically greater number of so-called rapid metabolizers in ST cases. Rare deleterious exonic variation occurs in 4%, and potentially disruptive intronic alleles occur in 16% of ST cases. Additional studies are required to evaluate the role of these variants in platelet aggregation and clopidogrel metabolism.

Subjects

Clopidogrel/pharmacokinetics , Cytochrome P-450 CYP2C19/genetics , Drug Resistance/genetics , Platelet Aggregation Inhibitors/pharmacokinetics , Thrombosis/prevention & control , Aged , Alleles , Clopidogrel/administration & dosage , Exome/genetics , Female , Humans , Introns , Male , Middle Aged , Platelet Aggregation Inhibitors/administration & dosage , Stents

4.

Detection and characterization of mosaicism in autosomal dominant polycystic kidney disease.

Hopp, Katharina; Cornec-Le Gall, Emilie; Senum, Sarah R; Te Paske, Iris B A W; Raj, Sonam; Lavu, Sravanthi; Baheti, Saurabh; Edwards, Marie E; Madsen, Charles D; Heyer, Christina M; Ong, Albert C M; Bae, Kyongtae T; Fatica, Richard; Steinman, Theodore I; Chapman, Arlene B; Gitomer, Berenice; Perrone, Ronald D; Rahbari-Oskoui, Frederic F; Torres, Vicente E; Harris, Peter C.

Kidney Int ; 97(2): 370-382, 2020 02.

Article in English | MEDLINE | ID: mdl-31874800

ABSTRACT

Autosomal dominant polycystic kidney disease (ADPKD) is an inherited, progressive nephropathy accounting for 4-10% of end stage renal disease worldwide. PKD1 and PKD2 are the most common disease loci, but even accounting for other genetic causes, about 7% of families remain unresolved. Typically, these unsolved cases have relatively mild kidney disease and often have a negative family history. Mosaicism, due to de novo mutation in the early embryo, has rarely been identified by conventional genetic analysis of ADPKD families. Here we screened for mosaicism by employing two next generation sequencing screens, specific analysis of PKD1 and PKD2 employing long-range polymerase chain reaction, or targeted capture of cystogenes. We characterized mosaicism in 20 ADPKD families; the pathogenic variant was transmitted to the next generation in five families and sporadic in 15. The mosaic pathogenic variant was newly discovered by next generation sequencing in 13 families, and these methods precisely quantified the level of mosaicism in all. All of the mosaic cases had PKD1 mutations, 14 were deletions or insertions, and 16 occurred in females. Analysis of kidney size and function showed the mosaic cases had milder disease than a control PKD1 population, but only a few had clearly asymmetric disease. Thus, in a typical ADPKD population, readily detectable mosaicism by next generation sequencing accounts for about 1% of cases, and about 10% of genetically unresolved cases with an uncertain family history. Hence, identification of mosaicism is important to fully characterize ADPKD populations and provides informed prognostic information.

Subjects

Polycystic Kidney, Autosomal Dominant , Female , High-Throughput Nucleotide Sequencing , Humans , Mosaicism , Mutation , Polycystic Kidney, Autosomal Dominant/diagnosis , Polycystic Kidney, Autosomal Dominant/genetics , TRPP Cation Channels/genetics

5.

Statistical method evaluation for differentially methylated CpGs in base resolution next-generation DNA sequencing data.

Zhang, Yun; Baheti, Saurabh; Sun, Zhifu.

Brief Bioinform ; 19(3): 374-386, 2018 05 01.

Article in English | MEDLINE | ID: mdl-28040747

ABSTRACT

High-throughput bisulfite methylation sequencing such as reduced representation bisulfite sequencing (RRBS), Agilent SureSelect Human Methyl-Seq (Methyl-seq) or whole-genome bisulfite sequencing is commonly used for base resolution methylome research. These data are represented either by the ratio of methylated cytosine versus total coverage at a CpG site or numbers of methylated and unmethylated cytosines. Multiple statistical methods can be used to detect differentially methylated CpGs (DMCs) between conditions, and these methods are often the basis for the next step of differentially methylated region identification. The ratio data have a flexibility of fitting to many linear models, but the raw count data take consideration of coverage information. There is an array of options in each datatype for DMC detection; however, it is not clear which is an optimal statistical method. In this study, we systematically evaluated four statistical methods on methylation ratio data and four methods on count-based data and compared their performances with regard to type I error control, sensitivity and specificity of DMC detection and computational resource demands using real RRBS data along with simulation. Our results show that the ratio-based tests are generally more conservative (less sensitive) than the count-based tests. However, some count-based methods have high false-positive rates and should be avoided. The beta-binomial model gives a good balance between sensitivity and specificity and is the preferred method. Selection of methods in different settings, signal versus noise and sample size estimation are also discussed.

Subjects

CpG Islands , DNA Methylation , Genome, Human , High-Throughput Nucleotide Sequencing/methods , Leukemia, Myelomonocytic, Chronic/genetics , Models, Statistical , Sequence Analysis, DNA/methods , Computational Biology/methods , Humans
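
Illustrative note (not part of the indexed record): the abstract of entry 5 contrasts two representations of bisulfite sequencing data at a CpG site, a methylation ratio versus raw methylated/unmethylated counts. The sketch below is a minimal illustration of why counts retain coverage information that a ratio discards. The function names and the example read counts are hypothetical and are not taken from the paper, and the simple binomial standard error used here is an assumption chosen for illustration, not one of the methods the authors evaluated.

```python
from math import sqrt


def methylation_ratio(methylated: int, unmethylated: int) -> float:
    """Ratio representation: methylated reads divided by total coverage."""
    coverage = methylated + unmethylated
    return methylated / coverage


def binomial_standard_error(methylated: int, unmethylated: int) -> float:
    """Standard error of the ratio under a plain binomial model.

    This needs the total coverage, which is only available from the counts.
    """
    coverage = methylated + unmethylated
    p = methylated / coverage
    return sqrt(p * (1 - p) / coverage)


# Hypothetical CpG sites: (methylated reads, unmethylated reads).
low_coverage_site = (3, 1)      # 4 reads in total
high_coverage_site = (75, 25)   # 100 reads in total

# Both sites collapse to the same ratio ...
ratio_low = methylation_ratio(*low_coverage_site)
ratio_high = methylation_ratio(*high_coverage_site)

# ... but the counts show that the low-coverage estimate is far less certain.
se_low = binomial_standard_error(*low_coverage_site)
se_high = binomial_standard_error(*high_coverage_site)

print(f"ratio: low={ratio_low:.2f}, high={ratio_high:.2f}")
print(f"std. error: low={se_low:.3f}, high={se_high:.3f}")
```

Both hypothetical sites have the same ratio, so a ratio-based test treats them alike, whereas a count-based model can weight the deeper site more heavily.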

6.

Molecular profiling reveals immunogenic cues in anaplastic large cell lymphomas with DUSP22 rearrangements.

Luchtel, Rebecca A; Dasari, Surendra; Oishi, Naoki; Pedersen, Martin Bjerregård; Hu, Guangzhen; Rech, Karen L; Ketterling, Rhett P; Sidhu, Jagmohan; Wang, Xueju; Katoh, Ryohei; Dogan, Ahmet; Kip, N Sertac; Cunningham, Julie M; Sun, Zhifu; Baheti, Saurabh; Porcher, Julie C; Said, Jonathan W; Jiang, Liuyan; Hamilton-Dutoit, Stephen Jacques; Møller, Michael Boe; Nørgaard, Peter; Bennani, N Nora; Chng, Wee-Joo; Huang, Gaofeng; Link, Brian K; Facchetti, Fabio; Cerhan, James R; d'Amore, Francesco; Ansell, Stephen M; Feldman, Andrew L.

Blood ; 132(13): 1386-1398, 2018 09 27.

Article in English | MEDLINE | ID: mdl-30093402

ABSTRACT

Anaplastic large cell lymphomas (ALCLs) are CD30-positive T-cell non-Hodgkin lymphomas broadly segregated into ALK-positive and ALK-negative types. Although ALK-positive ALCLs consistently bear rearrangements of the ALK tyrosine kinase gene, ALK-negative ALCLs are clinically and genetically heterogeneous. About 30% of ALK-negative ALCLs have rearrangements of DUSP22 and have excellent long-term outcomes with standard therapy. To better understand this group of tumors, we evaluated their molecular signature using gene expression profiling. DUSP22-rearranged ALCLs belonged to a distinct subset of ALCLs that lacked expression of genes associated with JAK-STAT3 signaling, a pathway contributing to growth in the majority of ALCLs. Reverse-phase protein array and immunohistochemical studies confirmed the lack of activated STAT3 in DUSP22-rearranged ALCLs. DUSP22-rearranged ALCLs also overexpressed immunogenic cancer-testis antigen (CTA) genes and showed marked DNA hypomethylation by reduced representation bisulfite sequencing and DNA methylation arrays. Pharmacologic DNA demethylation in ALCL cells recapitulated the overexpression of CTAs and other DUSP22 signature genes. In addition, DUSP22-rearranged ALCLs minimally expressed PD-L1 compared with other ALCLs, but showed high expression of the costimulatory gene CD58 and HLA class II. Taken together, these findings indicate that DUSP22 rearrangements define a molecularly distinct subgroup of ALCLs, and that immunogenic cues related to antigenicity, costimulatory molecule expression, and inactivity of the PD-1/PD-L1 immune checkpoint likely contribute to their favorable prognosis. More aggressive ALCLs might be pharmacologically reprogrammed to a DUSP22-like immunogenic molecular signature through the use of demethylating agents and/or immune checkpoint inhibitors.

Subjects

DNA Methylation , Dual-Specificity Phosphatases/genetics , Gene Expression Regulation, Neoplastic , Gene Rearrangement , Lymphoma, Large-Cell, Anaplastic/genetics , Mitogen-Activated Protein Kinase Phosphatases/genetics , Antigens, Neoplasm/genetics , Dual-Specificity Phosphatases/immunology , Female , Humans , Lymphoma, Large-Cell, Anaplastic/diagnosis , Lymphoma, Large-Cell, Anaplastic/immunology , Lymphoma, Large-Cell, Anaplastic/pathology , Male , Middle Aged , Mitogen-Activated Protein Kinase Phosphatases/immunology , Phosphorylation , Prognosis , STAT3 Transcription Factor/analysis , Transcriptome , Tumor Escape

7.

Association of intraneural perineurioma with neurofibromatosis type 2.

Pendleton, Courtney; Spinner, Robert J; Dyck, P James B; Mauermann, Michelle L; Ladak, Adil; Restrepo, Carlos E; Baheti, Saurabh; Klein, Christopher J.

Acta Neurochir (Wien) ; 162(8): 1891-1897, 2020 08.

Article in English | MEDLINE | ID: mdl-32529330

ABSTRACT

BACKGROUND: Neurofibromatosis type 2 (NF2) is a genetic disorder characterized by mutations of the NF2 tumor suppressor gene that predisposes patients to develop multiple tumors in the peripheral and central nervous system. The most common neoplasms associated with the disease are schwannomas and meningiomas. Both have been shown to contain abnormalities in chromosome 22 and the NF2 gene, suggesting a genetic component to their pathogenesis. Perineuriomas are rare benign tumors arising from the perineural cells. They are commonly classified as intraneural and soft tissue perineuriomas. Several studies have reported mutations in genes on chromosome 22 in both types of perineuriomas, and there are reports of soft tissue perineuriomas associated with NF2 gene mutations. Despite this, perineuriomas are not considered as part of the NF2 constellation of tumors. METHOD: The electronic medical records were searched for patients with a radiologic or pathologic diagnosis of intraneural perineurioma. Patients with clinical signs and genetic testing consistent with a diagnosis of NF2 were further evaluated. RESULTS: Of 112 patients meeting inclusion criteria, there were two cases of intraneural perineurioma in patients with NF2 treated at our institution (1.8%). We include a third patient treated at another facility for whom we performed a virtual consultation. CONCLUSIONS: The rarity of both NF2 and perineuriomas could explain the rarity of perineuriomas in the setting of NF2. Furthermore, there is divergent intraneural and soft tissue perineurioma somatic mutation pathogenesis, and there may be cytogenetic overlap between perineuriomas and multiple tumor syndromes. Our observed occurrence of intraneural perineurioma in the setting of NF2 in several patients provides further evidence of a potential link between the NF2 gene and the development of intraneural perineurioma.

Subjects

Nerve Sheath Neoplasms/complications , Neurofibromatosis 2/epidemiology , Humans , Neurofibromatosis 2/complications

8.

Recommendations for performance optimizations when using GATK3.8 and GATK4.

Heldenbrand, Jacob R; Baheti, Saurabh; Bockol, Matthew A; Drucker, Travis M; Hart, Steven N; Hudson, Matthew E; Iyer, Ravishankar K; Kalmbach, Michael T; Kendig, Katherine I; Klee, Eric W; Mattson, Nathan R; Wieben, Eric D; Wiepert, Mathieu; Wildman, Derek E; Mainzer, Liudmila S.

BMC Bioinformatics ; 20(1): 557, 2019 Nov 08.

Article in English | MEDLINE | ID: mdl-31703611

ABSTRACT

BACKGROUND: Use of the Genome Analysis Toolkit (GATK) continues to be the standard practice in genomic variant calling in both research and the clinic. Recently the toolkit has been rapidly evolving. Significant computational performance improvements have been introduced in GATK3.8 through collaboration with Intel in 2017. The first release of GATK4 in early 2018 revealed rewrites in the code base, as the stepping stone toward a Spark implementation. As the software continues to be a moving target for optimal deployment in highly productive environments, we present a detailed analysis of these improvements, to help the community stay abreast with changes in performance. RESULTS: We re-evaluated multiple options, such as threading, parallel garbage collection, I/O options and data-level parallelization. Additionally, we considered the trade-offs of using GATK3.8 and GATK4. We found optimized parameter values that reduce the time of executing the best practices variant calling procedure by 29.3% for GATK3.8 and 16.9% for GATK4. Further speedups can be accomplished by splitting data for parallel analysis, resulting in run time of only a few hours on whole human genome sequenced to the depth of 20X, for both versions of GATK. Nonetheless, GATK4 is already much more cost-effective than GATK3.8. Thanks to significant rewrites of the algorithms, the same analysis can be run largely in a single-threaded fashion, allowing users to process multiple samples on the same CPU. CONCLUSIONS: In time-sensitive situations, when a patient has a critical or rapidly developing condition, it is useful to minimize the time to process a single sample. In such cases we recommend using GATK3.8 by splitting the sample into chunks and computing across multiple nodes. The resultant walltime will be nnn.4 hours at the cost of $41.60 on 4 c5.18xlarge instances of Amazon Cloud. 
For cost-effectiveness of routine analyses or for large population studies, it is useful to maximize the number of samples processed per unit time. Thus we recommend GATK4, running multiple samples on one node. The total walltime will be ~34.1 hours on 40 samples, with 1.18 samples processed per hour at the cost of $2.60 per sample on c5.18xlarge instance of Amazon Cloud.

Subjects

Genomics/methods , Software , Algorithms , Chromosomes, Human/genetics , Genome, Human , Haplotypes/genetics , High-Throughput Nucleotide Sequencing , Humans

9.

Correction to: Recommendations for performance optimizations when using GATK3.8 and GATK4.

Heldenbrand, Jacob R; Baheti, Saurabh; Bockol, Matthew A; Drucker, Travis M; Hart, Steven N; Hudson, Matthew E; Iyer, Ravishankar K; Kalmbach, Michael T; Kendig, Katherine I; Klee, Eric W; Mattson, Nathan R; Wieben, Eric D; Wiepert, Mathieu; Wildman, Derek E; Mainzer, Liudmila S.

BMC Bioinformatics ; 20(1): 722, 2019 12 17.

Article in English | MEDLINE | ID: mdl-31847808

ABSTRACT

Following publication of the original article [1], the author explained that Table 2 is displayed incorrectly. The correct Table 2 is given below. The original article has been corrected.

10.

Mutations in GANAB, Encoding the Glucosidase IIα Subunit, Cause Autosomal-Dominant Polycystic Kidney and Liver Disease.

Porath, Binu; Gainullin, Vladimir G; Cornec-Le Gall, Emilie; Dillinger, Elizabeth K; Heyer, Christina M; Hopp, Katharina; Edwards, Marie E; Madsen, Charles D; Mauritz, Sarah R; Banks, Carly J; Baheti, Saurabh; Reddy, Bharathi; Herrero, José Ignacio; Bañales, Jesús M; Hogan, Marie C; Tasic, Velibor; Watnick, Terry J; Chapman, Arlene B; Vigneau, Cécile; Lavainne, Frédéric; Audrézet, Marie-Pierre; Ferec, Claude; Le Meur, Yannick; Torres, Vicente E; Harris, Peter C.

Am J Hum Genet ; 98(6): 1193-1207, 2016 06 02.

Article in English | MEDLINE | ID: mdl-27259053

ABSTRACT

Autosomal-dominant polycystic kidney disease (ADPKD) is a common, progressive, adult-onset disease that is an important cause of end-stage renal disease (ESRD), which requires transplantation or dialysis. Mutations in PKD1 or PKD2 (~85% and ~15% of resolved cases, respectively) are the known causes of ADPKD. Extrarenal manifestations include an increased level of intracranial aneurysms and polycystic liver disease (PLD), which can be severe and associated with significant morbidity. Autosomal-dominant PLD (ADPLD) with no or very few renal cysts is a separate disorder caused by PRKCSH, SEC63, or LRP5 mutations. After screening, 7%-10% of ADPKD-affected and ~50% of ADPLD-affected families were genetically unresolved (GUR), suggesting further genetic heterogeneity of both disorders. Whole-exome sequencing of six GUR ADPKD-affected families identified one with a missense mutation in GANAB, encoding glucosidase II subunit α (GIIα). Because PRKCSH encodes GIIβ, GANAB is a strong ADPKD and ADPLD candidate gene. Sanger screening of 321 additional GUR families identified eight further likely mutations (six truncating), and a total of 20 affected individuals were identified in seven ADPKD- and two ADPLD-affected families. The phenotype was mild PKD and variable, including severe, PLD. Analysis of GANAB-null cells showed an absolute requirement of GIIα for maturation and surface and ciliary localization of the ADPKD proteins (PC1 and PC2), and reduced mature PC1 was seen in GANAB(+/-) cells. PC1 surface localization in GANAB(-/-) cells was rescued by wild-type, but not mutant, GIIα. Overall, we show that GANAB mutations cause ADPKD and ADPLD and that the cystogenesis is most likely driven by defects in PC1 maturation.

Subjects

Cysts/genetics , Liver Diseases/genetics , Mutation/genetics , Polycystic Kidney, Autosomal Dominant/genetics , alpha-Glucosidases/genetics , Adult , Aged , Amino Acid Sequence , CRISPR-Cas Systems , Cells, Cultured , Child , Female , Fluorescent Antibody Technique , Humans , Immunoprecipitation , Male , Microscopy, Confocal , Middle Aged , Pedigree , Polycystic Kidney, Autosomal Dominant/pathology , Sequence Homology, Amino Acid

11.

REVEL: An Ensemble Method for Predicting the Pathogenicity of Rare Missense Variants.

Ioannidis, Nilah M; Rothstein, Joseph H; Pejaver, Vikas; Middha, Sumit; McDonnell, Shannon K; Baheti, Saurabh; Musolf, Anthony; Li, Qing; Holzinger, Emily; Karyadi, Danielle; Cannon-Albright, Lisa A; Teerlink, Craig C; Stanford, Janet L; Isaacs, William B; Xu, Jianfeng; Cooney, Kathleen A; Lange, Ethan M; Schleutker, Johanna; Carpten, John D; Powell, Isaac J; Cussenot, Olivier; Cancel-Tassin, Geraldine; Giles, Graham G; MacInnis, Robert J; Maier, Christiane; Hsieh, Chih-Lin; Wiklund, Fredrik; Catalona, William J; Foulkes, William D; Mandal, Diptasri; Eeles, Rosalind A; Kote-Jarai, Zsofia; Bustamante, Carlos D; Schaid, Daniel J; Hastie, Trevor; Ostrander, Elaine A; Bailey-Wilson, Joan E; Radivojac, Predrag; Thibodeau, Stephen N; Whittemore, Alice S; Sieh, Weiva.

Am J Hum Genet ; 99(4): 877-885, 2016 Oct 06.

Article in English | MEDLINE | ID: mdl-27666373

ABSTRACT

The vast majority of coding variants are rare, and assessment of the contribution of rare variants to complex traits is hampered by low statistical power and limited functional data. Improved methods for predicting the pathogenicity of rare coding variants are needed to facilitate the discovery of disease variants from exome sequencing studies. We developed REVEL (rare exome variant ensemble learner), an ensemble method for predicting the pathogenicity of missense variants on the basis of individual tools: MutPred, FATHMM, VEST, PolyPhen, SIFT, PROVEAN, MutationAssessor, MutationTaster, LRT, GERP, SiPhy, phyloP, and phastCons. REVEL was trained with recently discovered pathogenic and rare neutral missense variants, excluding those previously used to train its constituent tools. When applied to two independent test sets, REVEL had the best overall performance (p < 10⁻¹²) as compared to any individual tool and seven ensemble methods: MetaSVM, MetaLR, KGGSeq, Condel, CADD, DANN, and Eigen. Importantly, REVEL also had the best performance for distinguishing pathogenic from rare neutral variants with allele frequencies <0.5%. The area under the receiver operating characteristic curve (AUC) for REVEL was 0.046-0.182 higher in an independent test set of 935 recent SwissVar disease variants and 123,935 putatively neutral exome sequencing variants and 0.027-0.143 higher in an independent test set of 1,953 pathogenic and 2,406 benign variants recently reported in ClinVar than the AUCs for other ensemble methods. We provide pre-computed REVEL scores for all possible human missense variants to facilitate the identification of pathogenic variants in the sea of rare variants discovered as sequencing studies expand in scale.

Subjects

Disease/genetics , Mutation, Missense/genetics , Software , Area Under Curve , DNA Mutational Analysis , Exome/genetics , Gene Frequency , Humans , ROC Curve

12.

HGT-ID: an efficient and sensitive workflow to detect human-viral insertion sites using next-generation sequencing data.

Baheti, Saurabh; Tang, Xiaojia; O'Brien, Daniel R; Chia, Nicholas; Roberts, Lewis R; Nelson, Heidi; Boughey, Judy C; Wang, Liewei; Goetz, Matthew P; Kocher, Jean-Pierre A; Kalari, Krishna R.

BMC Bioinformatics ; 19(1): 271, 2018 07 17.

Article in English | MEDLINE | ID: mdl-30016933

ABSTRACT

BACKGROUND: Transfer of genetic material from microbes or viruses into the host genome is known as horizontal gene transfer (HGT). The integration of viruses into the human genome is associated with multiple cancers, and these can now be detected using next-generation sequencing methods such as whole genome sequencing and RNA-sequencing. RESULTS: We designed a novel computational workflow, HGT-ID, to identify the integration of viruses into the human genome using the sequencing data. The HGT-ID workflow primarily follows a four-step procedure: i) pre-processing of unaligned reads, ii) virus detection using subtraction approach, iii) identification of virus integration site using discordant and soft-clipped reads and iv) HGT candidates prioritization through a scoring function. Annotation and visualization of the events, as well as primer design for experimental validation, are also provided in the final report. We evaluated the tool performance with the well-understood cervical cancer samples. The HGT-ID workflow accurately detected known human papillomavirus (HPV) integration sites with high sensitivity and specificity compared to previous HGT methods. We applied HGT-ID to The Cancer Genome Atlas (TCGA) whole-genome sequencing data (WGS) from liver tumor-normal pairs. Multiple hepatitis B virus (HBV) integration sites were identified in TCGA liver samples and confirmed by HGT-ID using the RNA-Seq data from the matched liver pairs. This shows the applicability of the method in both the data types and cross-validation of the HGT events in liver samples. We also processed 220 breast tumor WGS data through the workflow; however, there were no HGT events detected in those samples. CONCLUSIONS: HGT-ID is a novel computational workflow to detect the integration of viruses in the human genome using the sequencing data. It is fast and accurate with functions such as prioritization, annotation, visualization and primer design for future validation of HGTs. 
The HGT-ID workflow is released under the MIT License and available at http://kalarikrlab.org/Software/HGT-ID.html .

Subjects

Gene Transfer, Horizontal/genetics , Genome, Human , High-Throughput Nucleotide Sequencing/methods , Virus Integration/genetics , Algorithms , Base Sequence , Breast Neoplasms/virology , Cell Line, Tumor , Computer Simulation , Female , Humans , ROC Curve , Software , Whole Genome Sequencing , Workflow

13.

The Role of the Histone Methyltransferase Enhancer of Zeste Homolog 2 (EZH2) in the Pathobiological Mechanisms Underlying Inflammatory Bowel Disease (IBD).

Sarmento, Olga F; Svingen, Phyllis A; Xiong, Yuning; Sun, Zhifu; Bamidele, Adebowale O; Mathison, Angela J; Smyrk, Thomas C; Nair, Asha A; Gonzalez, Michelle M; Sagstetter, Mary R; Baheti, Saurabh; McGovern, Dermot P B; Friton, Jessica J; Papadakis, Konstantinos A; Gautam, Goel; Xavier, Ramnik J; Urrutia, Raul A; Faubion, William A.

J Biol Chem ; 292(2): 706-722, 2017 Jan 13.

Article in English | MEDLINE | ID: mdl-27909059

ABSTRACT

Regulatory T (Treg) cells expressing the transcription factor FOXP3 play a pivotal role in maintaining immunologic self-tolerance. We and others have shown previously that EZH2 is recruited to the FOXP3 promoter and its targets in Treg cells. To further address the role for EZH2 in Treg cellular function, we have now generated mice that lack EZH2 specifically in Treg cells (EZH2Δ/ΔFOXP3+). We find that EZH2 deficiency in FOXP3+ T cells results in lethal multiorgan autoimmunity. We further demonstrate that EZH2Δ/ΔFOXP3+ T cells lack a regulatory phenotype in vitro and secrete proinflammatory cytokines. Of special interest, EZH2Δ/ΔFOXP3+ mice develop spontaneous inflammatory bowel disease. Guided by these results, we assessed the FOXP3 and EZH2 gene networks by RNA sequencing in isolated intestinal CD4+ T cells from patients with Crohn's disease. Gene network analysis demonstrates that these CD4+ T cells display a Th1/Th17-like phenotype with an enrichment of gene targets shared by FOXP3 and EZH2. Combined, these results suggest that the inflammatory milieu found in Crohn's disease could lead to or result from deregulation of FOXP3/EZH2-enforced T cell gene networks contributing to the underlying intestinal inflammation.

Subjects

Crohn Disease/immunology; Enhancer of Zeste Homolog 2 Protein/immunology; Gene Regulatory Networks/immunology; T-Lymphocytes, Regulatory/immunology; Th17 Cells/immunology; Animals; Crohn Disease/pathology; Cytokines/genetics; Cytokines/immunology; Enhancer of Zeste Homolog 2 Protein/genetics; Forkhead Transcription Factors/genetics; Forkhead Transcription Factors/immunology; Humans; Mice; Mice, Transgenic; T-Lymphocytes, Regulatory/pathology; Th17 Cells/pathology

14.

Comprehensively evaluating cis-regulatory variation in the human prostate transcriptome by using gene-level allele-specific expression.

Larson, Nicholas B; McDonnell, Shannon; French, Amy J; Fogarty, Zach; Cheville, John; Middha, Sumit; Riska, Shaun; Baheti, Saurabh; Nair, Asha A; Wang, Liang; Schaid, Daniel J; Thibodeau, Stephen N.

Am J Hum Genet ; 96(6): 869-82, 2015 Jun 04.

Article in English | MEDLINE | ID: mdl-25983244

ABSTRACT

The identification of cis-acting regulatory variation in primary tissues has the potential to elucidate the genetic basis of complex traits and further our understanding of transcriptomic diversity across cell types. Expression quantitative trait locus (eQTL) association analysis using RNA sequencing (RNA-seq) data can improve upon the detection of cis-acting regulatory variation by leveraging allele-specific expression (ASE) patterns in association analysis. Here, we present a comprehensive evaluation of cis-acting eQTLs by analyzing RNA-seq gene-expression data and genome-wide high-density genotypes from 471 samples of normal primary prostate tissue. Using statistical models that integrate ASE information, we identified extensive cis-eQTLs across the prostate transcriptome and found that approximately 70% of expressed genes corresponded to a significant eQTL at a gene-level false-discovery rate of 0.05. Overall, cis-eQTLs were heavily concentrated near the transcription start and stop sites of affected genes, and effects were negatively correlated with distance. We identified multiple instances of cis-acting co-regulation by using phased genotype data and discovered 233 SNPs as the most strongly associated eQTLs for more than one gene. We also noted significant enrichment (25/50, p = 2E-5) of previously reported prostate cancer risk SNPs in prostate eQTLs. Our results illustrate the benefit of assessing ASE data in cis-eQTL analyses by showing better reproducibility of prior eQTL findings than of eQTL mapping based on total expression alone. Altogether, our analysis provides extensive functional context of thousands of SNPs in prostate tissue, and these results will be of critical value in guiding studies examining disease of the human prostate.

Subjects

Genetic Variation; Prostate/metabolism; Quantitative Trait Loci/genetics; Regulatory Sequences, Nucleic Acid/genetics; Transcriptome/genetics; Computational Biology; Genotype; Humans; Male; Models, Genetic; Molecular Sequence Annotation/methods; Polymorphism, Single Nucleotide/genetics; Reproducibility of Results; Sequence Analysis, RNA/methods

15.

Germline miRNA DNA variants and the risk of colorectal cancer by subtype.

Lindor, Noralane M; Larson, Melissa C; DeRycke, Melissa S; McDonnell, Shannon K; Baheti, Saurabh; Fogarty, Zachary C; Win, Aung Ko; Potter, John D; Buchanan, Daniel D; Clendenning, Mark; Newcomb, Polly A; Casey, Graham; Gallinger, Steven; Le Marchand, Loïc; Hopper, John L; Jenkins, Mark A; Goode, Ellen L; Thibodeau, Stephen N.

Genes Chromosomes Cancer ; 56(3): 177-184, 2017 03.

Article in English | MEDLINE | ID: mdl-27636879

ABSTRACT

MicroRNAs (miRNAs) regulate up to one-third of all protein-coding genes including genes relevant to cancer. Variants within miRNAs have been reported to be associated with prognosis, survival, response to chemotherapy across cancer types, in vitro parameters of cell growth, and altered risks for development of cancer. Five miRNA variants have been reported to be associated with risk for development of colorectal cancer (CRC). In this study, we evaluated germline genetic variation in 1,123 miRNAs in 899 individuals with CRCs categorized by clinical subtypes and in 204 controls. The role of common miRNA variation in CRC was investigated using single variant and miRNA-level association tests. Twenty-nine miRNAs and 30 variants exhibited some marginal association with CRC in at least one subtype of CRC. Previously reported associations were not confirmed (n = 4) or could not be evaluated (n = 1). The variants noted for the CRCs with deficient mismatch repair showed little overlap with the variants noted for CRCs with proficient mismatch repair, consistent with our evolving understanding of the distinct biology underlying these two groups. © 2016 The Authors Genes, Chromosomes & Cancer Published by Wiley Periodicals, Inc.

Subjects

Biomarkers, Tumor/genetics; Colorectal Neoplasms/genetics; Genetic Variation/genetics; Germ-Line Mutation/genetics; MicroRNAs/genetics; Case-Control Studies; Follow-Up Studies; Humans; Neoplasm Staging; Prognosis; Risk Factors

16.

TP53 mutations, tetraploidy and homologous recombination repair defects in early stage high-grade serous ovarian cancer.

Chien, Jeremy; Sicotte, Hugues; Fan, Jian-Bing; Humphray, Sean; Cunningham, Julie M; Kalli, Kimberly R; Oberg, Ann L; Hart, Steven N; Li, Ying; Davila, Jaime I; Baheti, Saurabh; Wang, Chen; Dietmann, Sabine; Atkinson, Elizabeth J; Asmann, Yan W; Bell, Debra A; Ota, Takayo; Tarabishy, Yaman; Kuang, Rui; Bibikova, Marina; Cheetham, R Keira; Grocock, Russell J; Swisher, Elizabeth M; Peden, John; Bentley, David; Kocher, Jean-Pierre A; Kaufmann, Scott H; Hartmann, Lynn C; Shridhar, Viji; Goode, Ellen L.

Nucleic Acids Res ; 43(14): 6945-58, 2015 Aug 18.

Article in English | MEDLINE | ID: mdl-25916844

ABSTRACT

To determine early somatic changes in high-grade serous ovarian cancer (HGSOC), we performed whole genome sequencing on a rare collection of 16 low stage HGSOCs. The majority showed extensive structural alterations (one had an ultramutated profile), exhibited high levels of p53 immunoreactivity, and harboured a TP53 mutation, deletion or inactivation. BRCA1 and BRCA2 mutations were observed in two tumors, with nine showing evidence of a homologous recombination (HR) defect. Combined analysis with The Cancer Genome Atlas (TCGA) indicated that low and late stage HGSOCs have similar mutation and copy number profiles. We also found evidence that deleterious TP53 mutations are the earliest events, followed by deletions or loss of heterozygosity (LOH) of chromosomes carrying TP53, BRCA1 or BRCA2. Inactivation of HR appears to be an early event, as 62.5% of tumours showed a LOH pattern suggestive of HR defects. Three tumours with the highest ploidy had little genome-wide LOH, yet one of these had a homozygous somatic frame-shift BRCA2 mutation, suggesting that some carcinomas begin as tetraploid then descend into diploidy accompanied by genome-wide LOH. Lastly, we found evidence that structural variants (SV) cluster in HGSOC, but are absent in one ultramutated tumor, providing insights into the pathogenesis of low stage HGSOC.

Subjects

Genes, p53; Mutation; Ovarian Neoplasms/genetics; Recombinational DNA Repair; Tetraploidy; Carcinoma/genetics; DNA Primase/genetics; Female; Humans; Loss of Heterozygosity; Mutation Rate

17.

Targeted alignment and end repair elimination increase alignment and methylation measure accuracy for reduced representation bisulfite sequencing data.

Baheti, Saurabh; Kanwar, Rahul; Goelzenleuchter, Meike; Kocher, Jean-Pierre A; Beutler, Andreas S; Sun, Zhifu.

BMC Genomics ; 17: 149, 2016 Feb 27.

Article in English | MEDLINE | ID: mdl-26922377

ABSTRACT

BACKGROUND: DNA methylation is an important epigenetic modification involved in many biological processes. Reduced representation bisulfite sequencing (RRBS) is a cost-effective method for studying DNA methylation at single base resolution. Although several tools are available for RRBS data processing and analysis, it is not clear which strategy performs best, and little attention has been paid to contamination from artificial cytosines incorporated during the end repair step of library preparation. To address these issues, we describe a new method, Targeted Alignment and Artificial Cytosine Elimination for RRBS (TRACE-RRBS), which aligns bisulfite sequence reads to an MSP1 digitally digested reference and specifically removes the end repair cytosines. We compared this approach on a simulated and a real dataset with 7 other RRBS analysis tools and the Illumina 450K microarray platform. RESULTS: TRACE-RRBS aligns sequence reads to the small fraction of the genome that the RRBS protocol targets and was demonstrated to be the fastest, most sensitive and most specific tool for the simulated dataset. For the real dataset, TRACE-RRBS took about the same time as RRBSMAP, and a third to a sixth of the time needed for BISMARK and NOVOALIGN. TRACE-RRBS aligned more reads uniquely than other tools and achieved the highest correlation with 450K microarray data. The end repair artificial cytosine removal increased correlation between nearby CpGs and accuracy of methylation quantification. CONCLUSIONS: TRACE-RRBS is a fast and more accurate tool for RRBS data analysis. It is freely available for academic use at http://bioinformaticstools.mayo.edu/.

Subjects

Cytosine; DNA Methylation; Epigenesis, Genetic; Genomics/methods; Sequence Analysis, DNA/methods; Algorithms; CpG Islands; Humans; Polymorphism, Single Nucleotide; Sequence Alignment

18.

Gene expression, methylation and neuropathology correlations at progressive supranuclear palsy risk loci.

Allen, Mariet; Burgess, Jeremy D; Ballard, Travis; Serie, Daniel; Wang, Xue; Younkin, Curtis S; Sun, Zhifu; Kouri, Naomi; Baheti, Saurabh; Wang, Chen; Carrasquillo, Minerva M; Nguyen, Thuy; Lincoln, Sarah; Malphrus, Kimberly; Murray, Melissa; Golde, Todd E; Price, Nathan D; Younkin, Steven G; Schellenberg, Gerard D; Asmann, Yan; Ordog, Tamas; Crook, Julia; Dickson, Dennis; Ertekin-Taner, Nilüfer.

Acta Neuropathol ; 132(2): 197-211, 2016 08.

Article in English | MEDLINE | ID: mdl-27115769

ABSTRACT

To determine the effects of single nucleotide polymorphisms (SNPs) identified in a genome-wide association study of progressive supranuclear palsy (PSP), we tested their association with brain gene expression, CpG methylation and neuropathology. In 175 autopsied PSP subjects, we performed associations between seven PSP risk variants and temporal cortex levels of 20 genes in-cis, within ±100 kb. Methylation measures were collected using reduced representation bisulfite sequencing in 43 PSP brains. To determine whether SNP/expression associations are due to epigenetic modifications, CpG methylation levels of associated genes were tested against relevant variants. Quantitative neuropathology endophenotypes were tested for SNP associations in 422 PSP subjects. Brain levels of LRRC37A4 and ARL17B were associated with rs8070723; MOBP with rs1768208 and both ARL17A and ARL17B with rs242557. Expression associations for LRRC37A4 and MOBP were available in an additional 100 PSP subjects. Meta-analysis revealed highly significant associations for PSP risk alleles of rs8070723 and rs1768208 with higher LRRC37A4 and MOBP brain levels, respectively. Methylation levels of one CpG in the 3' region of ARL17B associated with rs242557 and rs8070723. Additionally, methylation levels of an intronic ARL17A CpG associated with rs242557 and that of an intronic MOBP CpG with rs1768208. MAPT and MOBP region risk alleles also associated with higher levels of neuropathology. Strongest associations were observed for rs242557/coiled bodies and tufted astrocytes; and for rs1768208/coiled bodies and tau threads. These findings suggest that PSP variants at MAPT and MOBP loci may confer PSP risk via influencing gene expression and tau neuropathology. MOBP, LRRC37A4, ARL17A and ARL17B warrant further assessment as candidate PSP risk genes. Our findings have implications for the mechanism of action of variants at some of the top PSP risk loci.

Subjects

Alleles; DNA Methylation; Gene Expression/physiology; Genome-Wide Association Study; Supranuclear Palsy, Progressive/genetics; Supranuclear Palsy, Progressive/metabolism; Aged; Aged, 80 and over; Female; Gene Expression/genetics; Genetic Loci; Humans; Male; Neuropathology/methods; Polymorphism, Single Nucleotide/genetics; Risk; tau Proteins/genetics; tau Proteins/metabolism

19.

The eSNV-detect: a computational system to identify expressed single nucleotide variants from transcriptome sequencing data.

Tang, Xiaojia; Baheti, Saurabh; Shameer, Khader; Thompson, Kevin J; Wills, Quin; Niu, Nifang; Holcomb, Ilona N; Boutet, Stephane C; Ramakrishnan, Ramesh; Kachergus, Jennifer M; Kocher, Jean-Pierre A; Weinshilboum, Richard M; Wang, Liewei; Thompson, E Aubrey; Kalari, Krishna R.

Nucleic Acids Res ; 42(22): e172, 2014 Dec 16.

Article in English | MEDLINE | ID: mdl-25352556

ABSTRACT

Rapid development of next generation sequencing technology has enabled the identification of genomic alterations from short sequencing reads. There are a number of software pipelines available for calling single nucleotide variants from genomic DNA, but no comprehensive pipelines to identify, annotate and prioritize expressed SNVs (eSNVs) from non-directional paired-end RNA-Seq data. We have developed the eSNV-Detect, a novel computational system, which utilizes data from multiple aligners to call, even at low read depths, and rank variants from RNA-Seq. Multi-platform comparisons with the eSNV-Detect variant candidates were performed. The method was first applied to RNA-Seq from a lymphoblastoid cell-line, achieving 99.7% precision and 91.0% sensitivity in the expressed SNPs for the matching HumanOmni2.5 BeadChip data. Comparison of RNA-Seq eSNV candidates from 25 ER+ breast tumors from The Cancer Genome Atlas (TCGA) project with whole exome coding data showed 90.6-96.8% precision and 91.6-95.7% sensitivity. Contrasting single-cell mRNA-Seq variants with matching traditional multicellular RNA-Seq data for the MD-MB231 breast cancer cell-line delineated variant heterogeneity among the single-cells. Further, Sanger sequencing validation was performed for an ER+ breast tumor with paired normal adjacent tissue validating 29 out of 31 candidate eSNVs. The source code and user manuals of the eSNV-Detect pipeline for Sun Grid Engine and virtual machine are available at http://bioinformaticstools.mayo.edu/research/esnv-detect/.

Subjects

Gene Expression Profiling/methods; Genetic Variation; Sequence Analysis, RNA/methods; Breast Neoplasms/genetics; Cell Line; Cell Line, Tumor; Exome; Female; Humans; Mutation; Polymorphism, Single Nucleotide; Sequence Alignment; Single-Cell Analysis; Software

20.

RVboost: RNA-seq variants prioritization using a boosting method.

Wang, Chen; Davila, Jaime I; Baheti, Saurabh; Bhagwate, Aditya V; Wang, Xue; Kocher, Jean-Pierre A; Slager, Susan L; Feldman, Andrew L; Novak, Anne J; Cerhan, James R; Thompson, E Aubrey; Asmann, Yan W.

Bioinformatics ; 30(23): 3414-6, 2014 Dec 01.

Article in English | MEDLINE | ID: mdl-25170027

ABSTRACT

MOTIVATION: RNA-seq has become the method of choice to quantify genes and exons, discover novel transcripts and detect fusion genes. However, reliable variant identification from RNA-seq data remains challenging because of the complexities of the transcriptome, the challenges of accurately mapping exon boundary spanning reads and the bias introduced during the sequencing library preparation. METHOD: We developed RVboost, a novel method specific for RNA variant prioritization. RVboost uses several attributes unique in the process of RNA library preparation, sequencing and RNA-seq data analyses. It uses a boosting method to train a model of 'good quality' variants using common variants from HapMap, and prioritizes and calls the RNA variants based on the trained model. We packaged RVboost in a comprehensive workflow, which integrates tools of variant calling, annotation and filtering. RESULTS: RVboost consistently outperforms the variant quality score recalibration from the Genome Analysis Tool Kit and the RNA-seq variant-calling pipeline SNPiR in 12 RNA-seq samples using ground-truth variants from paired exome sequencing data. Several RNA-seq-specific attributes were identified as critical to differentiate true and false variants, including the distance of the variant positions to exon boundaries, and the percent of the reads supporting the variant in the first six base pairs. The latter identifies false variants introduced by the random hexamer priming during the library construction. AVAILABILITY AND IMPLEMENTATION: The RVboost package is implemented to readily run in Mac or Linux environments. The software and user manual are available at http://bioinformaticstools.mayo.edu/research/rvboost/.

Subjects

Genetic Variation; Sequence Analysis, RNA/methods; Software; Exome; Exons; Gene Expression Profiling; High-Throughput Nucleotide Sequencing/methods
