Results 1 - 14 of 14
1.
NAR Genom Bioinform ; 5(4): lqad102, 2023 Dec.
Article in English | MEDLINE | ID: mdl-38025047

ABSTRACT

Analyses of cell-free DNA (cfDNA) are increasingly being employed for various diagnostic and research applications. Many technologies aim to increase resolution, e.g. for detecting early-stage cancer or minimal residual disease. However, these efforts may be confounded by inherent base composition biases of cfDNA, specifically the over- and underrepresentation of guanine (G) and cytosine (C) sequences. Currently, there is no universally applicable tool to correct these effects on sequencing read-level data. Here, we present GCparagon, a two-stage algorithm for computing and correcting GC biases in cfDNA samples. In the initial step, length and GC base count parameters are determined. Here, our algorithm minimizes the inclusion of known problematic genomic regions, such as low-mappability regions, in its calculations. In the second step, GCparagon computes weights counterbalancing the distortion of cfDNA attributes (correction matrix). These fragment weights are added to a binary alignment map (BAM) file as alignment tags for individual reads. The GC correction matrix or the tagged BAM file can be used for downstream analyses. Parallel computing allows GC bias to be estimated in under 1 min. We demonstrate that GCparagon vastly improves the analysis of regulatory regions, which frequently show specific GC composition patterns, and will contribute to standardized cfDNA applications.
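
As a rough illustration of the second step, a correction matrix can be thought of as the ratio of expected to observed fragment frequencies in each (fragment length, GC base count) bin. The sketch below is not GCparagon's implementation: the function name `gc_weights`, the toy counts, and the choice to give empty bins a neutral weight of 1.0 are all assumptions made for illustration.

```python
from typing import Dict, Tuple

Bin = Tuple[int, int]  # (fragment length, GC base count)


def gc_weights(observed: Dict[Bin, int], expected: Dict[Bin, float]) -> Dict[Bin, float]:
    """Return one weight per bin: expected frequency divided by observed frequency."""
    obs_total = sum(observed.values())
    exp_total = sum(expected.values())
    weights: Dict[Bin, float] = {}
    for bin_key, exp_count in expected.items():
        obs_count = observed.get(bin_key, 0)
        if obs_count == 0 or exp_count == 0:
            # Assumption: bins without data receive a neutral weight.
            weights[bin_key] = 1.0
            continue
        obs_frac = obs_count / obs_total
        exp_frac = exp_count / exp_total
        weights[bin_key] = exp_frac / obs_frac
    return weights


# Toy data (hypothetical): 166 bp fragments with 60, 80, or 100 G/C bases.
observed = {(166, 60): 50, (166, 80): 100, (166, 100): 50}
expected = {(166, 60): 100.0, (166, 80): 100.0, (166, 100): 100.0}
weights = gc_weights(observed, expected)
```

In this toy case the overrepresented bin is down-weighted and the two underrepresented bins are up-weighted, while the weighted fragment total stays equal to the observed total.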

2.
Genome Res ; 32(4): 766-777, 2022 04.
Article in English | MEDLINE | ID: mdl-35197310

ABSTRACT

Although technological advances improved the identification of structural variants (SVs) in the human genome, their interpretation remains challenging. Several methods utilize individual mechanistic principles like the deletion of coding sequence or 3D genome architecture disruptions. However, a comprehensive tool using the broad spectrum of available annotations is missing. Here, we describe CADD-SV, a method to retrieve and integrate a wide set of annotations to predict the effects of SVs. Previously, supervised learning approaches were limited due to a small number and biased set of annotated pathogenic or benign SVs. We overcome this problem by using a surrogate training objective, the Combined Annotation Dependent Depletion (CADD) of functional variants. We use human- and chimpanzee-derived SVs as proxy-neutral and contrast them with matched simulated variants as proxy-deleterious, an approach that has proven powerful for short sequence variants. Our tool computes summary statistics over diverse variant annotations and uses random forest models to prioritize deleterious structural variants. The resulting CADD-SV scores correlate with known pathogenic and rare population variants. We further show that we can prioritize somatic cancer variants as well as noncoding variants known to affect gene expression. We provide a website and offline-scoring tool for easy application of CADD-SV.


Subjects
Genome, Human , Humans
3.
PLoS One ; 15(12): e0237412, 2020.
Article in English | MEDLINE | ID: mdl-33259518

ABSTRACT

Regulatory regions, like promoters and enhancers, cover an estimated 5-15% of the human genome. Changes to these sequences are thought to underlie much of human phenotypic variation and a substantial proportion of genetic causes of disease. However, our understanding of their functional encoding in DNA is still very limited. Applying machine or deep learning methods can shed light on this encoding, and gapped k-mer support vector machines (gkm-SVMs) or convolutional neural networks (CNNs) are commonly trained on putative regulatory sequences. Here, we investigate the impact of negative sequence selection on model performance. By training gkm-SVM and CNN models on open chromatin data and corresponding negative training datasets, both learners and two approaches for negative training data are compared. Negative sets use either genomic background sequences or sequence shuffles of the positive sequences. Model performance was evaluated on three different tasks: predicting elements active in a cell-type, predicting cell-type specific elements, and predicting elements' relative activity as measured from independent experimental data. Our results indicate strong effects of the negative training data, with genomic backgrounds showing overall best results. Specifically, models trained on highly shuffled sequences perform worse on the complex tasks of tissue-specific activity and quantitative activity prediction, and seem to learn features of artificial sequences rather than regulatory activity. Further, we observe that insufficient matching of genomic background sequences results in model biases. While CNNs achieved and exceeded the performance of gkm-SVMs for larger training datasets, gkm-SVMs gave robust and best results for typical training dataset sizes without the need for hyperparameter optimization.
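
To make the "sequence shuffles of the positive sequences" idea concrete, the sketch below builds a negative example by permuting the bases of a positive sequence, which preserves length and base composition but destroys motif order. This is the simplest possible variant (a mononucleotide shuffle) and is only an assumption about what such a negative set could look like; the abstract does not specify the shuffling procedure used. The function name `shuffle_sequence` and the example sequence are hypothetical.

```python
import random
from collections import Counter


def shuffle_sequence(seq: str, seed: int = 0) -> str:
    """Return a random permutation of the bases in seq (composition is preserved)."""
    rng = random.Random(seed)  # seeded so the negative set is reproducible
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


# Hypothetical positive (open chromatin) sequence and its shuffled negative.
positive = "ACGTGGCCATTAGC"
negative = shuffle_sequence(positive, seed=1)
```

Because such a shuffle keeps only single-base frequencies, a model can separate positives from negatives using low-level sequence statistics alone, which is consistent with the concern raised above about learning features of artificial sequences.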


Subjects
Regulatory Sequences, Nucleic Acid/genetics , A549 Cells , Cell Line, Tumor , Chromatin/genetics , DNA/genetics , Genome/genetics , Genomics/methods , HeLa Cells , Hep G2 Cells , Humans , K562 Cells , MCF-7 Cells , Neural Networks, Computer , Promoter Regions, Genetic/genetics , Sequence Analysis, DNA , Support Vector Machine
4.
Nat Methods ; 17(11): 1083-1091, 2020 11.
Article in English | MEDLINE | ID: mdl-33046894

ABSTRACT

Massively parallel reporter assays (MPRAs) functionally screen thousands of sequences for regulatory activity in parallel. To date, there are limited studies that systematically compare differences in MPRA design. Here, we screen a library of 2,440 candidate liver enhancers and controls for regulatory activity in HepG2 cells using nine different MPRA designs. We identify subtle but significant differences that correlate with epigenetic and sequence-level features, as well as differences in dynamic range and reproducibility. We also validate that enhancer activity is largely independent of orientation, at least for our library and designs. Finally, we assemble and test the same enhancers as 192-mers, 354-mers and 678-mers and observe sizable differences. This work provides a framework for the experimental design of high-throughput reporter assays, suggesting that the extended sequence context of tested elements and, to a lesser degree, the precise assay influence MPRA results.


Subjects
Gene Library , Genes, Reporter , High-Throughput Nucleotide Sequencing/methods , Regulatory Sequences, Nucleic Acid , Sequence Analysis, DNA/methods , Enhancer Elements, Genetic , Hep G2 Cells , Humans , Reproducibility of Results , Transcription Factors/genetics
5.
J Inherit Metab Dis ; 42(5): 993-997, 2019 09.
Article in English | MEDLINE | ID: mdl-30945312

ABSTRACT

The translocon-associated protein (TRAP) complex facilitates the translocation of proteins across the endoplasmic reticulum membrane and associates with the oligosaccharyl transferase (OST) complex to maintain proper glycosylation of nascent polypeptides. Pathogenic variants in either complex cause a group of rare genetic disorders termed congenital disorders of glycosylation (CDG). We report an individual who presented with severe intellectual and developmental disabilities and sensorineural deafness with an unsolved type I CDG, and sought to identify the underlying genetic basis. Exome sequencing identified a novel homozygous variant c.278_281delAGGA [p.Glu93Valfs*7] in the signal sequence receptor 3 (SSR3) subunit of the TRAP complex. Biochemical studies in patient fibroblasts showed that the variant destabilized the TRAP complex, with a complete loss of SSR3 protein and partial loss of SSR1 and SSR4. Importantly, all subunit levels were corrected by expression of wild-type SSR3. Abnormal glycosylation status in fibroblasts was confirmed using two marker proteins, GP130 and ICAM1. Our findings confirm that mutations in SSR3 cause a novel CDG. A novel frameshift variant in the translocon-associated protein, SSR3, disrupts the stability of the TRAP complex and causes a novel congenital disorder of glycosylation.


Subjects
Calcium-Binding Proteins/genetics , Congenital Disorders of Glycosylation/genetics , Developmental Disabilities/etiology , Membrane Glycoproteins/genetics , Mutation , Receptors, Cytoplasmic and Nuclear/genetics , Receptors, Peptide/genetics , Child, Preschool , Congenital Disorders of Glycosylation/pathology , Exome , Glycosylation , Homozygote , Humans , Male
6.
Am J Hum Genet ; 104(1): 35-44, 2019 01 03.
Article in English | MEDLINE | ID: mdl-30554721

ABSTRACT

Baratela-Scott syndrome (BSS) is a rare, autosomal-recessive disorder characterized by short stature, facial dysmorphisms, developmental delay, and skeletal dysplasia caused by pathogenic variants in XYLT1. We report clinical and molecular investigation of 10 families (12 individuals) with BSS. Standard sequencing methods identified biallelic pathogenic variants in XYLT1 in only two families. Of the remaining cohort, two probands had no variants and six probands had only a single variant, including four with a heterozygous 3.1 Mb 16p13 deletion encompassing XYLT1 and two with a heterozygous truncating variant. Bisulfite sequencing revealed aberrant hypermethylation in exon 1 of XYLT1, always in trans with the sequence variant or deletion when present; both alleles were methylated in those with no identified variant. Expression of the methylated XYLT1 allele was severely reduced in fibroblasts from two probands. Southern blot studies combined with repeat expansion analysis of genome sequence data showed that the hypermethylation is associated with expansion of a GGC repeat in the XYLT1 promoter region that is not present in the reference genome, confirming that BSS is a trinucleotide repeat expansion disorder. The hypermethylated allele accounts for 50% of disease alleles in our cohort and is not present in 130 control subjects. Our study highlights the importance of investigating non-sequence-based alterations, including epigenetic changes, to identify the missing heritability in genetic disorders.


Subjects
Abnormalities, Multiple/genetics , DNA Methylation/genetics , Epigenesis, Genetic/genetics , Exons/genetics , Mutation , Pentosyltransferases/genetics , Trinucleotide Repeat Expansion/genetics , Alleles , Blotting, Southern , Cohort Studies , Female , Humans , Infant , Infant, Newborn , Male , Pedigree , Sulfites/metabolism , Syndrome , UDP Xylose-Protein Xylosyltransferase
7.
Nat Genet ; 50(6): 874-882, 2018 06.
Article in English | MEDLINE | ID: mdl-29785012

ABSTRACT

Determining the pathogenicity of genetic variants is a critical challenge, and functional assessment is often the only option. Experimentally characterizing millions of possible missense variants in thousands of clinically important genes requires generalizable, scalable assays. We describe variant abundance by massively parallel sequencing (VAMP-seq), which measures the effects of thousands of missense variants of a protein on intracellular abundance simultaneously. We apply VAMP-seq to quantify the abundance of 7,801 single-amino-acid variants of PTEN and TPMT, proteins in which functional variants are clinically actionable. We identify 1,138 PTEN and 777 TPMT variants that result in low protein abundance, and may be pathogenic or alter drug metabolism, respectively. We observe selection for low-abundance PTEN variants in cancer, and show that p.Pro38Ser, which accounts for ~10% of PTEN missense variants in melanoma, functions via a dominant-negative mechanism. Finally, we demonstrate that VAMP-seq is applicable to other genes, highlighting its generalizability.


Subjects
Mutation, Missense , Amino Acids/genetics , Cell Line , HEK293 Cells , High-Throughput Nucleotide Sequencing/methods , Humans , PTEN Phosphohydrolase/genetics , Sequence Analysis, DNA/methods
8.
JCI Insight ; 1(9), 2016 06 16.
Article in English | MEDLINE | ID: mdl-27631024

ABSTRACT

Mosaicism is increasingly recognized as a cause of developmental disorders with the advent of next-generation sequencing (NGS). Mosaic mutations of PIK3CA have been associated with the widest spectrum of phenotypes associated with overgrowth and vascular malformations. We performed targeted NGS using 2 independent deep-coverage methods that utilize molecular inversion probes and amplicon sequencing in a cohort of 241 samples from 181 individuals with brain and/or body overgrowth. We identified PIK3CA mutations in 60 individuals. Several other individuals (n = 12) were identified separately to have mutations in PIK3CA by clinical targeted-panel testing (n = 6), whole-exome sequencing (n = 5), or Sanger sequencing (n = 1). Based on the clinical and molecular features, this cohort segregated into three distinct groups: (a) severe focal overgrowth due to low-level but highly activating (hotspot) mutations, (b) predominantly brain overgrowth and less severe somatic overgrowth due to less-activating mutations, and (c) intermediate phenotypes (capillary malformations with overgrowth) with intermediately activating mutations. Sixteen of 29 PIK3CA mutations were novel. We also identified constitutional PIK3CA mutations in 10 patients. Our molecular data, combined with review of the literature, show that PIK3CA-related overgrowth disorders comprise a discontinuous spectrum of disorders that correlate with the severity and distribution of mutations.


Subjects
Class I Phosphatidylinositol 3-Kinases/genetics , Malformations of Cortical Development/genetics , Mosaicism , Vascular Malformations/genetics , Female , Genetic Association Studies , High-Throughput Nucleotide Sequencing , Humans , Infant , Male , Mutation , Phenotype , Tissue Distribution
9.
Am J Med Genet A ; 170(12): 3165-3171, 2016 12.
Article in English | MEDLINE | ID: mdl-27480077

ABSTRACT

Increasing numbers of congenital disorders of glycosylation (CDG) have been reported recently, resulting in an expansion of the phenotypes associated with this group of disorders. SRD5A3 codes for polyprenol reductase, which converts polyprenol to dolichol. This is a major pathway for dolichol biosynthesis for N-glycosylation, O-mannosylation, C-mannosylation, and GPI anchor synthesis. We present the features of five individuals (three children and two adults) with mutations in SRD5A3, focusing on the variable eye and skin involvement. We compare these to 13 affected individuals from the literature, including five adults, allowing us to delineate the features that may develop over time with this disorder, including kyphosis, retinitis pigmentosa, and cataracts. © 2016 Wiley Periodicals, Inc.


Subjects
3-Oxo-5-alpha-Steroid 4-Dehydrogenase/genetics , Congenital Disorders of Glycosylation/genetics , Eye/physiopathology , Membrane Proteins/genetics , Skin/physiopathology , Adult , Child , Congenital Disorders of Glycosylation/physiopathology , Dolichols/metabolism , Female , Glycosylation , Homozygote , Humans , Male , Mutation , Phenotype , Tretinoin/analogs & derivatives , Tretinoin/metabolism
10.
Cell ; 164(1-2): 57-68, 2016 Jan 14.
Article in English | MEDLINE | ID: mdl-26771485

ABSTRACT

Nucleosome positioning varies between cell types. By deep sequencing cell-free DNA (cfDNA), isolated from circulating blood plasma, we generated maps of genome-wide in vivo nucleosome occupancy and found that short cfDNA fragments harbor footprints of transcription factors. The cfDNA nucleosome occupancies correlate well with the nuclear architecture, gene structure, and expression observed in cells, suggesting that they could inform the cell type of origin. Nucleosome spacing inferred from cfDNA in healthy individuals correlates most strongly with epigenetic features of lymphoid and myeloid cells, consistent with hematopoietic cell death as the normal source of cfDNA. We build on this observation to show how nucleosome footprints can be used to infer cell types contributing to cfDNA in pathological states such as cancer. Since this strategy does not rely on genetic differences to distinguish between contributing tissues, it may enable the noninvasive monitoring of a much broader set of clinical conditions than currently possible.


Subjects
DNA/chemistry , Nucleosomes/chemistry , Organ Specificity , CCCTC-Binding Factor , Cell Line , Chromatin Assembly and Disassembly , DNA/metabolism , DNA Footprinting , Genome, Human , Genome-Wide Association Study , Humans , Neoplasms/genetics , Repressor Proteins/metabolism , Sequence Analysis, DNA
11.
Cancer Discov ; 5(2): 135-42, 2015 Feb.
Article in English | MEDLINE | ID: mdl-25472942

ABSTRACT

Deficiency in BRCA-dependent DNA interstrand crosslink (ICL) repair is intimately connected to breast cancer susceptibility and to the rare developmental syndrome Fanconi anemia. Bona fide Fanconi anemia proteins, BRCA2 (FANCD1), PALB2 (FANCN), and BRIP1 (FANCJ), interact with BRCA1 during ICL repair. However, the lack of detailed phenotypic and cellular characterization of a patient with biallelic BRCA1 mutations has precluded assignment of BRCA1 as a definitive Fanconi anemia susceptibility gene. Here, we report the presence of biallelic BRCA1 mutations in a woman with multiple congenital anomalies consistent with a Fanconi anemia-like disorder and breast cancer at age 23. Patient cells exhibited deficiency in BRCA1 and RAD51 localization to DNA-damage sites, combined with radial chromosome formation and hypersensitivity to ICL-inducing agents. Restoration of these functions was achieved by ectopic introduction of a BRCA1 transgene. These observations provide evidence in support of BRCA1 as a new Fanconi anemia gene (FANCS). SIGNIFICANCE: We establish that biallelic BRCA1 mutations cause a distinct FA-S, which has implications for risk counselling in families where both parents harbor BRCA1 mutations. The genetic basis of hereditary cancer susceptibility syndromes provides diagnostic information, insights into treatment strategies, and more accurate recurrence risk counseling to families.


Subjects
Breast Neoplasms/genetics , Fanconi Anemia/genetics , Genes, BRCA1 , Mutation , Adult , Alleles , BRCA1 Protein/genetics , Base Sequence , Fanconi Anemia Complementation Group Proteins/genetics , Female , Genetic Predisposition to Disease , Humans , Young Adult
12.
N Engl J Med ; 371(8): 733-43, 2014 Aug 21.
Article in English | MEDLINE | ID: mdl-25140959

ABSTRACT

BACKGROUND: Although there is increasing recognition of the role of somatic mutations in genetic disorders, the prevalence of somatic mutations in neurodevelopmental disease and the optimal techniques to detect somatic mosaicism have not been systematically evaluated. METHODS: Using a customized panel of known and candidate genes associated with brain malformations, we applied targeted high-coverage sequencing (depth, ≥200×) to leukocyte-derived DNA samples from 158 persons with brain malformations, including the double-cortex syndrome (subcortical band heterotopia, 30 persons), polymicrogyria with megalencephaly (20), periventricular nodular heterotopia (61), and pachygyria (47). We validated candidate mutations with the use of Sanger sequencing and, for variants present at unequal read depths, subcloning followed by colony sequencing. RESULTS: Validated, causal mutations were found in 27 persons (17%; range, 10 to 30% for each phenotype). Mutations were somatic in 8 of the 27 (30%), predominantly in persons with the double-cortex syndrome (in whom we found mutations in DCX and LIS1), persons with periventricular nodular heterotopia (FLNA), and persons with pachygyria (TUBB2B). Of the somatic mutations we detected, 5 (63%) were undetectable with the use of traditional Sanger sequencing but were validated through subcloning and subsequent sequencing of the subcloned DNA. We found potentially causal mutations in the candidate genes DYNC1H1, KIF5C, and other kinesin genes in persons with pachygyria. CONCLUSIONS: Targeted sequencing was found to be useful for detecting somatic mutations in patients with brain malformations. High-coverage sequencing panels provide an important complement to whole-exome and whole-genome sequencing in the evaluation of somatic mutations in neuropsychiatric disease. (Funded by the National Institute of Neurological Disorders and Stroke and others.).


Subjects
Cerebral Cortex/abnormalities , DNA Mutational Analysis/methods , Malformations of Cortical Development/genetics , Mutation , Classical Lissencephalies and Subcortical Band Heterotopias/genetics , Humans , Lissencephaly/genetics , Magnetic Resonance Imaging , Malformations of Cortical Development/pathology , Periventricular Nodular Heterotopia/genetics
13.
Nucleic Acids Res ; 40(1): e3, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22021376

ABSTRACT

Due to the increasing throughput of current DNA sequencing instruments, sample multiplexing is necessary for making economical use of available sequencing capacities. A widely used multiplexing strategy for the Illumina Genome Analyzer utilizes sample-specific indexes, which are embedded in one of the library adapters. However, this and similar multiplex approaches come with a risk of sample misidentification. By introducing indexes into both library adapters (double indexing), we have developed a method that reveals the rate of sample misidentification within current multiplex sequencing experiments. At ~0.3%, these rates are orders of magnitude higher than expected and may severely confound applications in cancer genomics and other fields requiring accurate detection of rare variants. We identified the occurrence of mixed clusters on the flow cell as the predominant source of error. The accuracy of sample identification is further impaired if indexed oligonucleotides are cross-contaminated or if indexed libraries are amplified in bulk. Double indexing eliminates these problems and increases both the scope and accuracy of multiplex sequencing on the Illumina platform.
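
One way to picture how double indexing exposes misidentification: if every sample carries a known pair of indexes, any read showing a pair that was never used must have been misassigned. The sketch below estimates that fraction from a list of observed index pairs. It is an illustrative assumption, not the authors' pipeline; the function name `misidentification_rate`, the index sequences, and the read counts are hypothetical, and the estimate only counts swaps that produce an unused pair.

```python
from typing import Iterable, Set, Tuple

IndexPair = Tuple[str, str]  # (index in first adapter, index in second adapter)


def misidentification_rate(read_pairs: Iterable[IndexPair],
                           expected_pairs: Set[IndexPair]) -> float:
    """Fraction of reads whose index pair is not one of the pairs actually used."""
    total = 0
    unexpected = 0
    for pair in read_pairs:
        total += 1
        if pair not in expected_pairs:
            unexpected += 1
    if total == 0:
        raise ValueError("no reads given")
    return unexpected / total


# Hypothetical run: two samples, each tagged with its own index pair.
expected_pairs = {("ACGT", "TTAG"), ("GGCA", "CCAT")}
reads = [("ACGT", "TTAG")] * 6 + [("GGCA", "CCAT")] * 3 + [("ACGT", "CCAT")]
rate = misidentification_rate(reads, expected_pairs)
```

With a single index per library, the last read in this toy run would have been silently assigned to one of the two samples; the second index is what makes the error visible.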


Subjects
High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Gene Library , Reproducibility of Results
14.
Mol Syst Biol ; 7: 548, 2011 Nov 08.
Article in English | MEDLINE | ID: mdl-22068331

ABSTRACT

While the number and identity of proteins expressed in a single human cell type are currently unknown, this fundamental question can be addressed by advanced mass spectrometry (MS)-based proteomics. Online liquid chromatography coupled to high-resolution MS and MS/MS yielded 166 420 peptides with unique amino-acid sequence from HeLa cells. These peptides identified 10 255 different human proteins encoded by 9207 human genes, providing a lower limit on the proteome in this cancer cell line. Deep transcriptome sequencing revealed transcripts for nearly all detected proteins. We calculate copy numbers for the expressed proteins and show that the abundances of > 90% of them are within a factor of 60 of the median protein expression level. Comparisons of the proteome and the transcriptome, and analysis of protein complex databases and GO categories, suggest that we achieved deep coverage of the functional transcriptome and the proteome of a single cell type.


Subjects
Gene Expression Profiling/methods , Proteome , Proteomics/methods , Transcriptome , Base Sequence , Cell Line, Tumor , HeLa Cells , High-Throughput Nucleotide Sequencing/methods , Humans , Mass Spectrometry/methods , Models, Biological , Proteome/genetics , Proteome/metabolism