1.
medRxiv ; 2024 Mar 07.
Article in English | MEDLINE | ID: mdl-38496498

ABSTRACT

Less than half of individuals with a suspected Mendelian condition receive a precise molecular diagnosis after comprehensive clinical genetic testing. Improvements in data quality and costs have heightened interest in using long-read sequencing (LRS) to streamline clinical genomic testing, but the absence of control datasets for variant filtering and prioritization has made tertiary analysis of LRS data challenging. To address this, the 1000 Genomes Project ONT Sequencing Consortium aims to generate LRS data from at least 800 of the 1000 Genomes Project samples. Our goal is to use LRS to identify a broader spectrum of variation so we may improve our understanding of normal patterns of human variation. Here, we present data from analysis of the first 100 samples, representing all 5 superpopulations and 19 subpopulations. These samples, sequenced to an average depth of coverage of 37x and sequence read N50 of 54 kbp, have high concordance with previous studies for identifying single nucleotide and indel variants outside of homopolymer regions. Using multiple structural variant (SV) callers, we identify an average of 24,543 high-confidence SVs per genome, including shared and private SVs likely to disrupt gene function as well as pathogenic expansions within disease-associated repeats that were not detected using short reads. Evaluation of methylation signatures revealed expected patterns at known imprinted loci, samples with skewed X-inactivation patterns, and novel differentially methylated regions. All raw sequencing data, processed data, and summary statistics are publicly available, providing a valuable resource for the clinical genetics community to discover pathogenic SVs.
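
The read N50 quoted above (54 kbp) summarizes read lengths. Under the standard definition, the N50 is the largest length L such that reads of length at least L together contain at least half of all sequenced bases. The sketch below assumes that definition. The function name and the toy lengths are hypothetical and are not data from the study.

```python
def n50(read_lengths):
    """Return the N50 of a collection of read lengths.

    N50 is the largest length L such that reads of length >= L
    together contain at least half of all sequenced bases.
    """
    lengths = sorted(read_lengths, reverse=True)
    total = sum(lengths)
    if total == 0:
        raise ValueError("need at least one non-empty read")
    running = 0
    for length in lengths:
        running += length
        # Stop at the first (longest-first) read that brings us to >= 50% of bases.
        if 2 * running >= total:
            return length


# Toy read lengths in kbp (illustrative only).
print(n50([100, 60, 54, 30, 10, 5]))  # 60
```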

2.
Am J Med Genet A ; 194(6): e63548, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38264805

ABSTRACT

Pathogenic PHF21A variation causes PHF21A-related neurodevelopmental disorders (NDDs). Although amorphic alleles, including haploinsufficiency, have been established as a disease mechanism, increasing evidence suggests that missense variants, as well as frameshift variants extending the BHC80 carboxyl terminus, also cause disease. Expanding on these findings, we report a proposita with intellectual disability and overgrowth and a novel de novo heterozygous PHF21A splice variant (NM_001352027.3:c.[153+1G>C];[=]) that causes skipping of exon 6, resulting in an in-frame BHC80 deletion (p.(Asn30_Gln51del)). This deletion disrupts a predicted leucine zipper domain, implicating this domain in BHC80 function and as a target of variation causing PHF21A-related NDDs. This finding emphasizes the value of RNA analysis in precision genomic medicine practice.


Subject(s)
Intellectual Disability , Neurodevelopmental Disorders , RNA Splicing , Female , Humans , Alleles , Exons/genetics , Intellectual Disability/genetics , Intellectual Disability/pathology , Neurodevelopmental Disorders/genetics , Neurodevelopmental Disorders/pathology , RNA Splicing/genetics , Sequence Analysis, RNA , Child
3.
Am J Med Genet A ; 191(8): 2219-2224, 2023 08.
Article in English | MEDLINE | ID: mdl-37196051

ABSTRACT

Tandem splice acceptors (NAGNₙAG) are a common mechanism of alternative splicing, but variants that are likely to generate or to disrupt tandem splice sites have rarely been reported as disease causing. We identify a pathogenic intron 23 CLTC variant (NM_004859.4:c.[3766-13_3766-5del];[=]) in a propositus with intellectual disability and behavioral problems. RNA-seq analysis of peripheral blood mRNA showed that this variant generates transcripts using cryptic proximal splice acceptors (NM_004859.4:r.3765_3766insTTCACAGAAAGGAACTAG and NM_004859.4:r.3765_3766insAAAGGAACTAG). Given that the propositus expresses CLTC transcripts at 38% of the level seen in unaffected controls, these variant transcripts, which encode premature termination codons, likely undergo nonsense-mediated mRNA decay (NMD). This is the first functional evidence for CLTC haploinsufficiency as a cause of CLTC-related disorder and the first evidence that the generation of tandem alternative splice sites causes CLTC-related disorder. We suggest that variants creating tandem alternative splice sites are an underreported disease mechanism and that transcriptome-level analysis should be routinely pursued to define the pathogenicity of such variants.


Subject(s)
Haploinsufficiency , RNA Splice Sites , Humans , RNA Splice Sites/genetics , Haploinsufficiency/genetics , Alternative Splicing/genetics , RNA, Messenger/genetics , RNA, Messenger/metabolism , Mutation , Clathrin Heavy Chains/genetics
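
The claim that both CLTC variant transcripts encode premature termination codons can be partly checked from the reported insertions. Both sit between c.3765 and c.3766, and because (3766 - 1) is divisible by 3, c.3766 is the first base of a codon, so each inserted sequence starts a new codon. The sketch below rests on that assumption. It reads only the inserted bases: the 18-nt insertion stays in frame and ends in a TAG stop, while the 11-nt insertion shifts the frame, so its stop codon would lie downstream and is not modelled here. The helper names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codon_start(c_position):
    """True if coding position c.N (N >= 1, counted from the A of ATG)
    is the first base of a codon."""
    return (c_position - 1) % 3 == 0


def describe_insertion(inserted):
    """Reading-frame consequence of an insertion placed at a codon boundary.

    Only the inserted bases are examined; stop codons that a frameshift
    would create further downstream are not modelled.
    """
    codons = [inserted[i:i + 3] for i in range(0, len(inserted) - 2, 3)]
    stops = [k for k, codon in enumerate(codons) if codon in STOP_CODONS]
    return {
        "length": len(inserted),
        "frameshift": len(inserted) % 3 != 0,
        "stop_codon_indices": stops,
    }


# The reported insertions lie between c.3765 and c.3766.
print(codon_start(3766))  # True: c.3766 begins codon 1256
for seq in ("TTCACAGAAAGGAACTAG", "AAAGGAACTAG"):
    print(seq, describe_insertion(seq))
```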
4.
J Exp Med ; 220(5)2023 05 01.
Article in English | MEDLINE | ID: mdl-36884218

ABSTRACT

STAT6 (signal transducer and activator of transcription 6) is a transcription factor that plays a central role in the pathophysiology of allergic inflammation. We have identified 16 patients from 10 families spanning three continents with a profound phenotype of early-life onset allergic immune dysregulation, widespread treatment-resistant atopic dermatitis, hypereosinophilia with eosinophilic gastrointestinal disease, asthma, elevated serum IgE, IgE-mediated food allergies, and anaphylaxis. The cases were either sporadic (seven kindreds) or followed an autosomal dominant inheritance pattern (three kindreds). All patients carried monoallelic rare variants in STAT6, and functional studies established their gain-of-function (GOF) phenotype, with sustained STAT6 phosphorylation, increased STAT6 target gene expression, and TH2 skewing. Precision treatment with the anti-IL-4Rα antibody dupilumab was highly effective, improving both clinical manifestations and immunological biomarkers. This study identifies heterozygous GOF variants in STAT6 as the cause of a novel autosomal dominant allergic disorder. We anticipate that our discovery of multiple kindreds with germline STAT6 GOF variants will facilitate the recognition of more affected individuals and the full definition of this new primary atopic disorder.


Subject(s)
Asthma , Food Hypersensitivity , Humans , STAT6 Transcription Factor , Gain of Function Mutation , Immunoglobulin E/genetics
5.
Genome Med ; 14(1): 84, 2022 08 11.
Article in English | MEDLINE | ID: mdl-35948990

ABSTRACT

BACKGROUND: Expansions of short tandem repeats are the cause of many neurogenetic disorders, including familial amyotrophic lateral sclerosis, Huntington disease, and many others. Multiple methods have recently been developed that can identify repeat expansions in whole genome or exome sequencing data. Despite the widely recognized need for visual assessment of variant calls in clinical settings, current computational tools lack the ability to produce such visualizations for repeat expansions. Expanded repeats are difficult to visualize because they correspond to large insertions relative to the reference genome and involve many misaligning and ambiguously aligning reads. RESULTS: We implemented REViewer, a computational method for visualizing sequencing data in genomic regions containing long repeat expansions, and FlipBook, a companion image viewer designed for manual curation of large collections of REViewer images. To generate a read pileup, REViewer reconstructs local haplotype sequences and distributes reads to these haplotypes in a way that is most consistent with the fragment lengths and evenness of read coverage. To create appropriate training materials for onboarding new users, we performed a concordance study involving 12 scientists engaged in short tandem repeat research. We used the results of this study to create a user guide that describes the basic principles of using REViewer as well as the typical features of read pileups that correspond to low-confidence repeat genotype calls. Additionally, we demonstrated that REViewer can be used to annotate clinically relevant repeat interruptions by comparing visual assessment results of 44 FMR1 repeat alleles with the results of triplet repeat primed PCR. For 38 of these alleles, the results of visual assessment were consistent with triplet repeat primed PCR.
CONCLUSIONS: Read pileup plots generated by REViewer offer an intuitive way to visualize sequencing data in regions containing long repeat expansions. Laboratories can use REViewer and FlipBook to assess the quality of repeat genotype calls as well as to visually detect interruptions or other imperfections in the repeat sequence and the surrounding flanking regions. REViewer and FlipBook are available under open-source licenses at https://github.com/illumina/REViewer and https://github.com/broadinstitute/flipbook respectively.


Subject(s)
Amyotrophic Lateral Sclerosis , Tandem Repeat Sequences , Alleles , Amyotrophic Lateral Sclerosis/genetics , Exome , Fragile X Mental Retardation Protein/genetics , Haplotypes , High-Throughput Nucleotide Sequencing/methods , Humans
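
The REViewer record describes distributing reads among reconstructed local haplotypes. The sketch below uses a deliberately simplified stand-in: each read goes to the haplotype where it fits with the fewest mismatches under ungapped placement. REViewer itself is described as also weighing fragment lengths and evenness of coverage, and that is not reproduced here. The toy CAG alleles and function names are hypothetical. The example shows why reads lying entirely inside a repeat are informative: a 15-nt pure-repeat read fits the expanded allele exactly but cannot fit the short allele without mismatches.

```python
def best_placement(read, haplotype):
    """Best ungapped placement of `read` on `haplotype` as (mismatches, offset)."""
    best = None
    for offset in range(len(haplotype) - len(read) + 1):
        window = haplotype[offset:offset + len(read)]
        mismatches = sum(1 for a, b in zip(read, window) if a != b)
        if best is None or mismatches < best[0]:
            best = (mismatches, offset)
    return best


def assign_reads(reads, haplotypes):
    """Assign each read to the haplotype where it fits with the fewest mismatches.

    Returns {read_name: (haplotype_index, offset, mismatches)}. Ties go to the
    lower haplotype index; reads longer than every haplotype are left out.
    """
    assignment = {}
    for name, read in reads.items():
        candidates = []
        for h, hap in enumerate(haplotypes):
            if len(read) <= len(hap):
                mismatches, offset = best_placement(read, hap)
                candidates.append((mismatches, h, offset))
        if candidates:
            mismatches, h, offset = min(candidates)
            assignment[name] = (h, offset, mismatches)
    return assignment


# Toy diploid locus: short (3 x CAG) and expanded (6 x CAG) alleles with flanks.
flank_left, flank_right = "TTGA", "CCTA"
haplotypes = [flank_left + "CAG" * 3 + flank_right,
              flank_left + "CAG" * 6 + flank_right]
reads = {
    "r1": "GACAGCAGCAGCC",     # spans the whole short allele
    "r2": "CAGCAGCAGCAGCAG",   # lies entirely inside a repeat
    "r3": "CAGCAGCAGCAGCCTA",  # repeat plus right flank
}
print(assign_reads(reads, haplotypes))
```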
6.
Am J Med Genet A ; 188(10): 3089-3095, 2022 10.
Article in English | MEDLINE | ID: mdl-35946377

ABSTRACT

Alternative use of short-distance tandem sites such as NAGNₙAG is a common mechanism of alternative splicing; however, single nucleotide variants are rarely reported as likely to generate or to disrupt tandem splice sites. We identify a pathogenic intron 5 STK11 variant (NM_000455.4:c.[735-6A>G];[=]) segregating with the mucocutaneous features, but not the hamartomatous polyps, of Peutz-Jeghers syndrome in two individuals. RNA-seq analysis of peripheral blood mRNA showed that this variant generates a novel and preferentially used tandem proximal splice acceptor (AAGTGAAG). The variant transcript (NM_000455.4:c.734_734+1insTGAAG), which encodes a frameshift (p.[Tyr246Glufs*43]), constituted 36%-43% of STK11 transcripts, suggesting partial escape from nonsense-mediated mRNA decay and translation of a truncated protein. A review of the ClinVar database identified other similar variants. We suggest that nucleotide changes creating or disrupting tandem alternative splice sites are a pertinent disease mechanism and require contextualization for clinical reporting. Additionally, we hypothesize that some pathogenic STK11 variants cause an attenuated phenotype.


Subject(s)
Peutz-Jeghers Syndrome , AMP-Activated Protein Kinase Kinases , Alternative Splicing , Codon, Nonsense , Humans , Nucleotides , Peutz-Jeghers Syndrome/genetics , Peutz-Jeghers Syndrome/pathology
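
A tandem acceptor motif such as AAGTGAAG contains two AG dinucleotides a few bases apart, and splicing cuts immediately after whichever AG is used. The sketch below lists such AG pairs and the extra bases that use of the upstream (proximal) AG would add to the transcript. For the STK11 motif it recovers the 5-nt TGAAG insertion, whose length is not a multiple of 3 and therefore shifts the frame, consistent with the reported frameshift. The `max_spacing` cutoff and the function name are illustrative assumptions, not parameters from the study.

```python
def tandem_acceptor_pairs(seq, max_spacing=12):
    """Find pairs of AG dinucleotides that could act as tandem 3' splice acceptors.

    Returns tuples (i, j, extra): i < j are 0-based start indices of the
    upstream and downstream AG, and `extra` is the sequence added to the
    transcript if the upstream AG is used instead of the downstream one.
    """
    ag_starts = [k for k in range(len(seq) - 1) if seq[k:k + 2] == "AG"]
    pairs = []
    for a, i in enumerate(ag_starts):
        for j in ag_starts[a + 1:]:
            if j - i > max_spacing:  # starts are sorted, so later AGs are farther away
                break
            pairs.append((i, j, seq[i + 2:j + 2]))
    return pairs


for i, j, extra in tandem_acceptor_pairs("AAGTGAAG"):
    effect = "in-frame" if len(extra) % 3 == 0 else "frameshift"
    print(i, j, extra, effect)  # 1 6 TGAAG frameshift
```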
7.
HGG Adv ; 3(3): 100108, 2022 Jul 14.
Article in English | MEDLINE | ID: mdl-35599849

ABSTRACT

Genome-wide sequencing (GWS) is a standard of care for diagnosis of suspected genetic disorders, but the proportion of patients found to have pathogenic or likely pathogenic variants ranges from less than 30% to more than 60% in reported studies. It has been suggested that the diagnostic rate can be improved by interpreting genomic variants in the context of each affected individual's full clinical picture and by regular follow-up and reinterpretation of GWS laboratory results. Trio exome sequencing was performed in 415 families and trio genome sequencing in 85 families in the CAUSES study. The variants observed were interpreted by a multidisciplinary team including laboratory geneticists, bioinformaticians, clinical geneticists, genetic counselors, pediatric subspecialists, and the referring physician, and independently by a clinical laboratory using standard American College of Medical Genetics and Genomics (ACMG) criteria. Individuals were followed for an average of 5.1 years after testing, with clinical reassessment and reinterpretation of the GWS results as necessary. The multidisciplinary team established a diagnosis of genetic disease in 43.0% of the families at the time of initial GWS interpretation, and longitudinal follow-up and reinterpretation of GWS results produced new diagnoses in 17.2% of families whose initial GWS interpretation was uninformative or uncertain. Reinterpretation also resulted in rescinding a diagnosis in four families (1.9%). Of the families studied, 33.6% had ACMG pathogenic or likely pathogenic variants related to the clinical indication. Close collaboration among clinical geneticists, genetic counselors, laboratory geneticists, bioinformaticians, and individuals' primary physicians, with ongoing follow-up, reanalysis, and reinterpretation over time, can improve the clinical value of GWS.

8.
J Med Genet ; 59(1): 46-55, 2022 01.
Article in English | MEDLINE | ID: mdl-33257509

ABSTRACT

Strabismus is a common condition, affecting 1%-4% of individuals. Isolated strabismus has been studied in families with Mendelian inheritance patterns. Despite the identification of multiple loci via linkage analyses, no specific genes have been identified from these studies. The current study is based on a seven-generation family with isolated strabismus inherited in an autosomal dominant manner. A total of 13 individuals descended from a common ancestor were included in the linkage analysis. Of these, nine are affected and four are unaffected. A single linkage signal was identified in an 8.5 Mb region of chromosome 14q12 with a multipoint LOD (logarithm of the odds) score of 4.69. Disruption of this locus is known to cause FOXG1 syndrome (or congenital Rett syndrome; OMIM #613454 and *164874), in which 84% of affected individuals present with strabismus. With the incorporation of next-generation sequencing and in-depth bioinformatic analyses, a 4 bp non-coding deletion was prioritised as the top candidate for the observed strabismus phenotype. The deletion is predicted to disrupt regulation of FOXG1, which encodes a transcription factor of the Forkhead family. Suggestive of an autoregulation effect, the disrupted sequence matches the consensus FOXG1 and Forkhead family transcription factor binding site and has been observed in previous ChIP-seq studies to be bound by Foxg1 in early mouse brain development. Future study of this specific deletion may shed light on the regulation of FOXG1 expression and may enhance our understanding of the mechanisms contributing to strabismus and FOXG1 syndrome.


Subject(s)
Forkhead Transcription Factors/genetics , Nerve Tissue Proteins/genetics , Rett Syndrome/genetics , Sequence Deletion , Strabismus/genetics , Adolescent , Aged , Aged, 80 and over , Animals , Genetic Linkage , High-Throughput Nucleotide Sequencing , Humans , Middle Aged , Pedigree , Exome Sequencing , Whole Genome Sequencing , Young Adult
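
The strabismus record reports its linkage evidence as a LOD score. In the standard two-point form (a textbook definition, not a detail taken from this study), the score compares the likelihood of the pedigree data at recombination fraction θ with the likelihood under no linkage (θ = 1/2). Multipoint analyses evaluate an analogous ratio at positions along a marker map:

$$
\mathrm{LOD}(\theta) = \log_{10} \frac{L(\text{data} \mid \theta)}{L(\text{data} \mid \theta = \tfrac{1}{2})}
$$

A LOD of 4.69 therefore corresponds to odds of about $10^{4.69} \approx 4.9 \times 10^{4}$ to 1 in favor of linkage, above the conventional significance threshold of 3.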
9.
Mol Genet Metab Rep ; 27: 100761, 2021 Jun.
Article in English | MEDLINE | ID: mdl-33996490

ABSTRACT

Guanidinoacetate methyltransferase (GAMT) deficiency is a creatine deficiency disorder and an inborn error of metabolism presenting with progressive intellectual and neurological deterioration. As most cases are identified and treated in early childhood, adult phenotypes that can help in understanding the natural history of the disorder are rare. We describe two adult cases of GAMT deficiency from a consanguineous family in Pakistan that presented with a history of global developmental delay, cognitive impairments, excessive drooling, behavioral abnormalities, contractures and apparent bone deformities initially presumed to be the reason for abnormal gait. Exome sequencing identified a homozygous nonsense variant in GAMT: NM_000156.5:c.134G>A (p.Trp45*). We also performed a literature review and compiled the genetic and clinical characteristics of all adult cases of GAMT deficiency reported to date. When compared to the adult cases previously reported, the musculoskeletal phenotype and the rapidly progressive nature of neurological and motor decline seen in our patients is striking. This study presents an opportunity to gain insights into the adult presentation of GAMT deficiency and highlights the need for in-depth evaluation and reporting of clinical features to expand our understanding of the phenotypic spectrum.

10.
PLoS Comput Biol ; 17(3): e1008815, 2021 03.
Article in English | MEDLINE | ID: mdl-33750951

ABSTRACT

Across the life sciences, processing next generation sequencing data commonly relies upon a computationally expensive process where reads are mapped onto a reference sequence. Prior to such processing, however, there is a vast amount of information that can be ascertained from the reads, potentially obviating the need for processing, or allowing optimized mapping approaches to be deployed. Here, we present a method termed FlexTyper which facilitates a "reverse mapping" approach in which high throughput sequence queries, in the form of k-mer searches, are run against indexed short-read datasets in order to extract useful information. This reverse mapping approach enables the rapid counting of target sequences of interest. We demonstrate FlexTyper's utility for recovering depth of coverage, and accurate genotyping of SNP sites across the human genome. We show that genotyping unmapped reads can correctly inform a sample's population, sex, and relatedness in a family setting. Detection of pathogen sequences within RNA-seq data was sensitive and accurate, performing comparably to existing methods, but with increased flexibility. We present two examples of ways in which this flexibility allows the analysis of genome features not well-represented in a linear reference. First, we analyze contigs from African genome sequencing studies, showing how they distribute across families from three distinct populations. Second, we show how gene-marking k-mers for the killer immune receptor locus allow allele detection in a region that is challenging for standard read mapping pipelines. The future adoption of the reverse mapping approach represented by FlexTyper will be enabled by more efficient methods for FM-index generation and biology-informed collections of reference queries. In the long-term, selection of population-specific references or weighting of edges in pan-population reference genome graphs will be possible using the FlexTyper approach. 
FlexTyper is available at https://github.com/wassermanlab/OpenFlexTyper.


Subject(s)
Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Software , Genome, Human/genetics , Humans , Polymorphism, Single Nucleotide/genetics , Sequence Alignment/methods
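
FlexTyper is described as running k-mer queries against an FM-index built from the reads. An FM-index does not fit in a short example, so the sketch below substitutes a plain hash table (`collections.Counter`) of canonical k-mers. This shows only the "reverse mapping" idea: index the reads once, then count query sequences on either strand without aligning anything. The function names are hypothetical and are not FlexTyper's API.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(kmer):
    """Strand-independent key: the smaller of a k-mer and its reverse complement."""
    return min(kmer, reverse_complement(kmer))


def build_kmer_index(reads, k):
    """Count canonical k-mers across all reads (a hash-table stand-in for an FM-index)."""
    index = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip k-mers containing N or other symbols
                index[canonical(kmer)] += 1
    return index


def count_query(index, kmer):
    """Occurrences of a query k-mer on either strand; 0 if never seen."""
    return index[canonical(kmer.upper())]


index = build_kmer_index(["ACGTAC", "GTACGT"], k=4)
print(count_query(index, "TACG"), count_query(index, "CGTA"))
```

Canonicalizing each k-mer means a query and its reverse complement share one counter, so a genotyping query (for example, one k-mer per allele at a SNP) counts reads from both strands.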
11.
Allergy Asthma Clin Immunol ; 17(1): 9, 2021 Jan 14.
Article in English | MEDLINE | ID: mdl-33446255

ABSTRACT

X-linked hypohidrotic ectodermal dysplasia (XLHED) is the most common form of ectodermal dysplasia. Clinical and genetic heterogeneity between different ectodermal dysplasia types and evidence of incomplete penetrance and variable expressivity increase the potential for misdiagnosis. We describe a family with XLHED presenting with variable expressivity of symptoms between affected siblings. In addition to the classical signs of hypohidrosis, hypotrichosis and hypodontia, the index patient, a 5-year-old boy, also presented with a severe atopy phenotype that was not observed in his two other affected brothers. Exome sequencing in the index patient and his mother identified a pathogenic nonsense variant in EDA (NM_001399.4:c.766C>T; p.Gln256Ter). This study highlights how exome sequencing was crucial in establishing a precise molecular diagnosis of XLHED by enabling us to rule out other differential diagnoses, including NEMO deficiency syndrome, which had initially been presented to the family as a clinical diagnosis.

12.
Hum Mutat ; 42(4): 346-358, 2021 04.
Article in English | MEDLINE | ID: mdl-33368787

ABSTRACT

Mendelian rare genetic diseases affect 5%-10% of the population, and with over 5300 genes responsible for ∼7000 different diseases, they are challenging to diagnose. The use of whole-genome sequencing (WGS) has bolstered the diagnosis rate significantly. The effective use of WGS relies on the ability to identify the disrupted gene responsible for disease phenotypes. This process involves genomic variant calling and prioritization, and is the beneficiary of improvements to sequencing technology, variant calling approaches, and increased capacity to prioritize genomic variants with potential pathogenicity. As analysis pipelines continue to improve, careful testing of their efficacy is paramount. However, real-life cases typically emerge anecdotally, and the use of clinically sensitive and identifiable data for testing pipeline improvements is regulated and therefore limited. We identified the need for a gene-based variant simulation framework that can create mock rare disease scenarios, utilizing known pathogenic variants or through the creation of novel gene-disrupting variants. To fill this need, we present GeneBreaker, a tool that creates synthetic rare disease cases with utility for benchmarking variant calling approaches, testing the efficacy of variant prioritization, and as an educational mechanism for training diagnostic practitioners in the expanding field of genomic medicine. GeneBreaker is freely available at http://GeneBreaker.cmmt.ubc.ca.


Subject(s)
Genomics , Rare Diseases , Computer Simulation , High-Throughput Nucleotide Sequencing , Humans , Phenotype , Rare Diseases/diagnosis , Rare Diseases/genetics , Whole Genome Sequencing
14.
PLoS One ; 15(10): e0240253, 2020.
Article in English | MEDLINE | ID: mdl-33095786

ABSTRACT

We have been using the Inbred Long- and Short-Sleep mouse strains (ILS, ISS) and a recombinant inbred panel derived from them, the LXS, to investigate the genetic underpinnings of acute ethanol tolerance, which is considered to be a risk factor for alcohol use disorders (AUDs). Here, we have used RNA-seq to examine the whole-brain transcriptome in 40 of the LXS strains 8 hours after a saline or ethanol "pretreatment", as in previous behavioral studies. Approximately one-third of the 14,184 expressed genes were significantly heritable, and many were unique to the pretreatment. Several thousand cis- and trans-eQTLs were mapped; a portion of these were also unique to pretreatment. Ethanol pretreatment caused differential expression (DE) of 1,230 genes. Gene Ontology (GO) enrichment analysis suggested involvement in numerous biological processes, including astrocyte differentiation, histone acetylation, mRNA splicing, and neuron projection development. Genetic correlation analysis identified hundreds of genes that were correlated with the behaviors. GO analysis indicated that these genes are involved in gene expression, chromosome organization, and protein transport, among others. The expression profiles of the DE genes and of genes correlated with acute functional tolerance (AFT) in the ethanol pretreatment group (AFT-Et) were found to be similar to profiles of HDAC inhibitors. Hdac1, a cis-regulated gene located at the peak of a previously mapped QTL for AFT-Et, was correlated with 437 genes, most of which were also correlated with AFT-Et. GO analysis of these genes identified several enriched biological process terms, including neuron-neuron synaptic transmission and potassium transport. In summary, the results suggest widespread genetic effects on gene expression, including effects that are pretreatment-specific. A number of candidate genes and biological functions were identified that could be mediating the behavioral responses. The most prominent of these was Hdac1, which may be regulating genes associated with glutamatergic signaling and potassium conductance.


Subject(s)
Drug Tolerance/genetics , Ethanol/pharmacology , Alcoholism , Animals , Brain/drug effects , Brain/metabolism , Chromosome Mapping , Female , Genotype , Male , Mice , Mice, Inbred Strains , Quantitative Trait Loci/genetics
15.
Front Cell Dev Biol ; 8: 520, 2020.
Article in English | MEDLINE | ID: mdl-32671069

ABSTRACT

X-linked adrenoleukodystrophy (ALD) is a peroxisomal metabolic disorder with a highly complex clinical presentation. ALD is caused by mutations in the ABCD1 gene and is characterized by the accumulation of very long-chain fatty acids in plasma and tissues. Disease-causing mutations are loss-of-function mutations with no prognostic value with respect to the clinical outcome of an individual. All male patients with ALD develop spinal cord disease and a peripheral neuropathy in adulthood, although age of onset is highly variable. However, the lifetime prevalence of developing progressive white matter lesions, termed cerebral ALD (CALD), is only about 60%. Early identification of the transition to CALD is critical, since CALD can be halted by allogeneic hematopoietic stem cell therapy only at an early stage. The primary goal of this study is to identify molecular markers that may be prognostic of cerebral demyelination from a simple blood sample, with the hope that blood-based assays can replace the current protocols for diagnosis. We collected six well-characterized brother pairs affected by ALD and discordant for the presence of CALD and performed multi-omic profiling of blood samples, including genome, epigenome, transcriptome, metabolome/lipidome, and proteome profiling. In our analysis, we identify discordant genomic alleles present across all families as well as differentially abundant molecular features across the omics technologies. The analysis focused on univariate modeling to discriminate the two phenotypic groups but was unable to identify statistically significant candidate molecular markers. Our study highlights the issues caused by a large amount of inter-individual variation and supports the emerging hypothesis that cerebral demyelination results from a complex mix of environmental factors and/or heterogeneous genomic alleles. We confirm previous observations about the role of the immune response, specifically autoimmunity, and the potential role of PFN1 protein overabundance in CALD in a subset of the families. We envision that our methodology and dataset will be useful to the field for reproducing previous modifier investigations or enabling future ones.

16.
NPJ Genom Med ; 5: 25, 2020.
Article in English | MEDLINE | ID: mdl-32637154

ABSTRACT

Many inborn errors of metabolism (IEMs) are amenable to treatment; therefore, early diagnosis is imperative. Whole-exome sequencing (WES) variant prioritization coupled with phenotype-guided clinical and bioinformatics expertise is typically used to identify disease-causing variants; however, it can be challenging to identify the causal candidate gene when a large number of rare and potentially pathogenic variants are detected. Here, we present a network-based approach, metPropagate, that uses untargeted metabolomics (UM) data from a single patient and a group of controls to prioritize candidate genes in patients with suspected IEMs. We validate metPropagate on 107 patients with IEMs diagnosed in Miller et al. (2015) and 11 patients with both CNS and metabolic abnormalities. The metPropagate method ranks candidate genes by label propagation, a graph-smoothing algorithm that considers each gene's metabolic perturbation in addition to the network of interactions between neighbors. metPropagate was able to prioritize at least one causative gene in the top 20th percentile of candidate genes for 92% of patients with known IEMs. Applied to patients with suspected neurometabolic disease, metPropagate placed at least one causative gene in the top 20th percentile in 9/11 patients, and ranked the causative gene more highly than Exomiser's phenotype-based ranking in 6/11 patients. Interestingly, ranking by a weighted combination of metPropagate and Exomiser scores resulted in improved prioritization. The results of this study indicate that network-based analysis of UM data can provide an additional mode of evidence to prioritize causal genes in patients with suspected IEMs.
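
Label propagation, the graph-smoothing step named in the metPropagate record, spreads each gene's initial score to its network neighbors while pulling every gene back toward its own starting value. The sketch below uses one common formulation, iterating F ← αSF + (1 − α)Y with the symmetrically normalized adjacency S = D^(-1/2) W D^(-1/2). metPropagate's exact normalization, α, and stopping rule may differ, and the toy genes and seed scores are hypothetical.

```python
import math
from collections import defaultdict


def label_propagation(edges, seed_scores, alpha=0.8, iterations=100):
    """Smooth per-gene scores over an undirected gene network.

    edges: iterable of (gene_a, gene_b) pairs.
    seed_scores: {gene: initial score}, e.g. a per-gene perturbation score;
        genes without a score start at 0.
    Repeats F <- alpha * S F + (1 - alpha) * Y, where Y holds the seed scores
    and S = D^-1/2 W D^-1/2 is the symmetrically normalised adjacency.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:
            neighbors[a].add(b)
            neighbors[b].add(a)
    nodes = set(neighbors) | set(seed_scores)
    degree = {n: len(neighbors[n]) for n in nodes}
    y = {n: float(seed_scores.get(n, 0.0)) for n in nodes}
    f = dict(y)
    for _ in range(iterations):
        # The right-hand side reads the previous f before it is rebound.
        f = {
            n: alpha * sum(f[m] / math.sqrt(degree[n] * degree[m]) for m in neighbors[n])
            + (1 - alpha) * y[n]
            for n in nodes
        }
    return f


# Toy chain A-B-C-D seeded at A, plus an isolated seeded gene E.
scores = label_propagation([("A", "B"), ("B", "C"), ("C", "D")], {"A": 1.0, "E": 0.5})
for gene, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(gene, round(score, 3))
```

Genes close to a strongly perturbed gene inherit part of its score, which is why a causal gene can rank highly even when its own metabolic signal is weak.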

17.
Genome Biol ; 21(1): 102, 2020 04 28.
Article in English | MEDLINE | ID: mdl-32345345

ABSTRACT

Repeat expansions are responsible for over 40 monogenic disorders, and undoubtedly more pathogenic repeat expansions remain to be discovered. Existing methods for detecting repeat expansions in short-read sequencing data require predefined repeat catalogs. Recent discoveries emphasize the need for methods that do not require pre-specified candidate repeats. To address this need, we introduce ExpansionHunter Denovo, an efficient catalog-free method for genome-wide repeat expansion detection. Analysis of real and simulated data shows that our method can identify large expansions of 41 out of 44 pathogenic repeats, including nine recently reported non-reference repeat expansions not discoverable via existing methods.


Subject(s)
DNA Repeat Expansion , Software , Case-Control Studies , Fragile X Syndrome/genetics , Friedreich Ataxia/genetics , High-Throughput Nucleotide Sequencing , Humans , Huntington Disease/genetics , Microsatellite Repeats , Myotonic Dystrophy/genetics , Whole Genome Sequencing
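
The ExpansionHunter Denovo abstract does not describe its internals. One generic signal a catalog-free approach could tally genome-wide is the set of reads made almost entirely of a short repeated motif, which would then be compared between cases and controls. The sketch below shows only that signal and is not the published algorithm. The period limit, the 90% match threshold, and the function names are hypothetical.

```python
from collections import Counter


def repeat_period(read, max_period=6, min_match=0.9):
    """Smallest period p in 1..max_period with read[i] == read[i - p] for at
    least `min_match` of comparable positions, or None if the read is not
    (nearly) periodic."""
    for p in range(1, max_period + 1):
        comparisons = len(read) - p
        if comparisons <= 0:
            break
        matches = sum(1 for i in range(p, len(read)) if read[i] == read[i - p])
        if matches / comparisons >= min_match:
            return p
    return None


def canonical_motif(motif):
    """Smallest rotation, so CAG, AGC and GCA are counted as one motif."""
    return min(motif[i:] + motif[:i] for i in range(len(motif)))


def count_in_repeat_reads(reads, max_period=6, min_match=0.9):
    """Tally reads that look like pure short tandem repeats, keyed by motif.

    Homopolymers (period 1) are skipped, and reverse-complement motifs
    (e.g. CAG vs CTG) are not merged in this sketch.
    """
    counts = Counter()
    for read in reads:
        p = repeat_period(read, max_period, min_match)
        if p is not None and p > 1:
            counts[canonical_motif(read[:p])] += 1
    return counts


reads = ["CAG" * 10, "GCA" * 10, "AACCGGTT", "A" * 20]
print(count_in_repeat_reads(reads))
```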
18.
Nucleic Acids Res ; 48(D1): D87-D92, 2020 01 08.
Article in English | MEDLINE | ID: mdl-31701148

ABSTRACT

JASPAR (http://jaspar.genereg.net) is an open-access database of curated, non-redundant transcription factor (TF)-binding profiles stored as position frequency matrices (PFMs) for TFs across multiple species in six taxonomic groups. In this 8th release of JASPAR, the CORE collection has been expanded with 245 new PFMs (169 for vertebrates, 42 for plants, 17 for nematodes, 10 for insects, and 7 for fungi), and 156 PFMs were updated (125 for vertebrates, 28 for plants and 3 for insects). These new profiles represent an 18% expansion compared to the previous release. JASPAR 2020 comes with a novel collection of unvalidated TF-binding profiles for which our curators did not find orthogonal supporting evidence in the literature. This collection has a dedicated web form to engage the community in the curation of unvalidated TF-binding profiles. Moreover, we created a Q&A forum to ease the communication between the user community and JASPAR curators. Finally, we updated the genomic tracks, inference tool, and TF-binding profile similarity clusters. All the data is available through the JASPAR website, its associated RESTful API, and through the JASPAR2020 R/Bioconductor package.


Subject(s)
Binding Sites , Computational Biology , Databases, Genetic , Software , Transcription Factors , Animals , Genomics/methods , Protein Binding , Transcription Factors/metabolism , User-Computer Interface , Web Browser
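
JASPAR distributes binding profiles as position frequency matrices (PFMs). A common downstream step, not described in the abstract, is converting a PFM into a log-odds position weight matrix (PWM) and scanning sequences with it. The sketch below assumes a uniform background and a pseudocount of 1 spread by background frequency. Both are illustrative choices rather than JASPAR defaults, and the toy three-column PFM is hypothetical.

```python
import math

BASES = "ACGT"


def pfm_to_pwm(pfm, pseudocount=1.0, background=None):
    """Convert a PFM {base: [count per position]} into a log2-odds PWM."""
    if background is None:
        background = {b: 0.25 for b in BASES}
    width = len(pfm["A"])
    pwm = {b: [] for b in BASES}
    for j in range(width):
        total = sum(pfm[b][j] for b in BASES)
        for b in BASES:
            # Pseudocount is spread by background frequency so no probability is zero.
            p = (pfm[b][j] + pseudocount * background[b]) / (total + pseudocount)
            pwm[b].append(math.log2(p / background[b]))
    return pwm


def best_hit(pwm, seq):
    """Highest-scoring window, as (score, offset), on the given strand of an
    uppercase ACGT sequence."""
    width = len(pwm["A"])
    best = None
    for i in range(len(seq) - width + 1):
        score = sum(pwm[base][j] for j, base in enumerate(seq[i:i + width]))
        if best is None or score > best[0]:
            best = (score, i)
    return best


# Toy PFM from 10 aligned sites that all read TGA.
pfm = {"A": [0, 0, 10], "C": [0, 0, 0], "G": [0, 10, 0], "T": [10, 0, 0]}
pwm = pfm_to_pwm(pfm)
print(best_hit(pwm, "CCTGACC"))
```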
19.
F1000Res ; 8: 1221, 2019.
Article in English | MEDLINE | ID: mdl-31602299

ABSTRACT

Researchers in the life sciences are increasingly faced with the task of obtaining compute resources and training to analyze large datasets generated by high-throughput technologies. As demand for compute resources has grown, high performance computing (HPC) systems have been implemented by research organizations and international consortia to support academic researchers. However, life science researchers lack effective time-of-need training resources for using these systems. Current training options have drawbacks that inhibit the effective training of researchers without experience in computational analysis. We identified the need for flexible, centrally organized, easily accessible, interactive, and compute-resource-specific training for academic HPC use. In our delivery of a modular workshop series, we provided foundational training to a group of researchers in a coordinated manner, allowing them to further pursue additional training and analysis on compute resources available to them. Efficacy measures indicate that the material was effectively delivered to a broad audience in a short time period, including both virtual and on-site students. The practical approach to catalyze academic HPC use is amenable to diverse systems worldwide.


Subject(s)
Genomics , Research Personnel , Humans
20.
Gigascience ; 8(7)2019 07 01.
Article in English | MEDLINE | ID: mdl-31289836

ABSTRACT

BACKGROUND: Mammalian X and Y chromosomes share a common evolutionary origin and retain regions of high sequence similarity. Similar sequence content can confound the mapping of short next-generation sequencing reads to a reference genome. It is therefore possible that the presence of both sex chromosomes in a reference genome can cause technical artifacts in genomic data and affect downstream analyses and applications. Understanding this problem is critical for medical genomics and population genomic inference. RESULTS: Here, we characterize how sequence homology can affect analyses on the sex chromosomes and present XYalign, a new tool that (1) facilitates the inference of sex chromosome complement from next-generation sequencing data; (2) corrects erroneous read mapping on the sex chromosomes; and (3) tabulates and visualizes important metrics for quality control such as mapping quality, sequencing depth, and allele balance. We find that sequence homology affects read mapping on the sex chromosomes, and this has downstream effects on variant calling. However, we show that XYalign can correct mismapping, resulting in more accurate variant calling. We also show how metrics output by XYalign can be used to identify XX and XY individuals across diverse sequencing experiments, including low- and high-coverage whole-genome sequencing, and exome sequencing. Finally, we discuss how the flexibility of the XYalign framework can be leveraged for other uses, including the identification of aneuploidy on the autosomes. XYalign is available open source under the GNU General Public License (version 3). CONCLUSIONS: Sex chromosome sequence homology causes the mismapping of short reads, which in turn affects downstream analyses. XYalign provides a reproducible framework to correct mismapping and improve variant calling on the sex chromosomes.


Subject(s)
Chromosomes, Human, X/genetics , Chromosomes, Human, Y/genetics , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Sequence Homology, Nucleic Acid , Artifacts , Contig Mapping/methods , Contig Mapping/standards , Female , High-Throughput Nucleotide Sequencing/standards , Humans , Male , Sequence Alignment/methods , Sequence Alignment/standards , Sequence Analysis, DNA/standards
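
Inferring sex chromosome complement, the first XYalign capability listed above, often comes down to comparing X and Y read depth with autosomal depth. The sketch below shows only that depth-ratio idea. It assumes the depths were computed from high-mapping-quality reads outside the pseudoautosomal regions, which is exactly where mismapping correction matters. XYalign's own windowed metrics and filters may differ, and the `y_presence_ratio` cutoff is a hypothetical choice.

```python
def infer_sex_complement(x_depth, y_depth, autosome_depth, y_presence_ratio=0.1):
    """Guess the sex chromosome complement from mean sequencing depths.

    Depths are divided by the autosomal mean, so two copies are expected near
    1.0 and one copy near 0.5. `y_presence_ratio` is an arbitrary cutoff below
    which residual Y depth is treated as mismapping noise.
    """
    if autosome_depth <= 0:
        raise ValueError("autosomal depth must be positive")
    x_ratio = x_depth / autosome_depth
    y_ratio = y_depth / autosome_depth
    x_copies = round(2 * x_ratio)
    y_copies = max(1, round(2 * y_ratio)) if y_ratio >= y_presence_ratio else 0
    return "X" * x_copies + "Y" * y_copies, x_ratio, y_ratio


print(infer_sex_complement(15, 14, 30)[0])   # XY
print(infer_sex_complement(30, 0.3, 30)[0])  # XX
```

Normalizing by autosomal depth makes the calls comparable across low- and high-coverage samples. The same ratio logic, applied per autosome, is one way to flag candidate aneuploidies.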