Results 1 - 20 of 47
1.
Nucleic Acids Res ; 49(12): 6687-6701, 2021 07 09.
Article in English | MEDLINE | ID: mdl-34157124

ABSTRACT

Nucleic acid microarrays are the only tools that can supply very large oligonucleotide libraries, cornerstones of the nascent fields of de novo gene assembly and DNA data storage. Although the chemical synthesis of oligonucleotides is highly developed and robust, it is not error free, requiring the design of methods that can correct or compensate for errors, or select for high-fidelity oligomers. However, outside the realm of array manufacturers, little is known about the sources of errors and their extent. In this study, we look at the error rate of DNA libraries synthesized by photolithography and dissect the proportion of deletion, insertion and substitution errors. We find that the deletion rate is governed by the photolysis yield. We identify the most important substitution error and correlate it to phosphoramidite coupling. Besides synthetic failures originating from the coupling cycle, we uncover the role of imperfections and limitations related to optics, highlight the importance of absorbing UV light to avoid internal reflections and chart the dependence of error rate on both position on the array and position within individual oligonucleotides. Being able to precisely quantify all types of errors will allow for optimal choice of fabrication parameters and array design.


Subject(s)
Gene Library , High-Throughput Nucleotide Sequencing , Sequence Analysis, DNA , Light , Nucleotides/analysis , Oligonucleotide Array Sequence Analysis , Photochemical Processes
2.
Nature ; 538(7624): 201-206, 2016 Oct 13.
Article in English | MEDLINE | ID: mdl-27654912

ABSTRACT

Here we report the Simons Genome Diversity Project data set: high quality genomes from 300 individuals from 142 diverse populations. These genomes include at least 5.8 million base pairs that are not present in the human reference genome. Our analysis reveals key features of the landscape of human genome variation, including that the rate of accumulation of mutations has accelerated by about 5% in non-Africans compared to Africans since divergence. We show that the ancestors of some pairs of present-day human populations were substantially separated by 100,000 years ago, well before the archaeologically attested onset of behavioural modernity. We also demonstrate that indigenous Australians, New Guineans and Andamanese do not derive substantial ancestry from an early dispersal of modern humans; instead, their modern human ancestry is consistent with coming from the same source as that of other non-Africans.


Subject(s)
Genetic Variation/genetics , Genome, Human/genetics , Genomics , Mutation Rate , Phylogeny , Racial Groups/genetics , Animals , Australia , Black People/genetics , Datasets as Topic , Genetics, Population , History, Ancient , Human Migration/history , Humans , Native Hawaiian or Other Pacific Islander/genetics , Neanderthals/genetics , New Guinea , Sequence Analysis, DNA , Species Specificity , Time Factors
3.
PLoS Genet ; 15(5): e1008124, 2019 05.
Article in English | MEDLINE | ID: mdl-31071088

ABSTRACT

The rapid digitization of genealogical and medical records enables the assembly of extremely large pedigree records spanning millions of individuals and trillions of pairs of relatives. Such pedigrees provide the opportunity to investigate the sociological and epidemiological history of human populations in scales much larger than previously possible. Linear mixed models (LMMs) are routinely used to analyze extremely large animal and plant pedigrees for the purposes of selective breeding. However, LMMs have not been previously applied to analyze population-scale human family trees. Here, we present Sparse Cholesky factorIzation LMM (Sci-LMM), a modeling framework for studying population-scale family trees that combines techniques from the animal and plant breeding literature and from human genetics literature. The proposed framework can construct a matrix of relationships between trillions of pairs of individuals and fit the corresponding LMM in several hours. We demonstrate the capabilities of Sci-LMM via simulation studies and by estimating the heritability of longevity and of reproductive fitness (quantified via number of children) in a large pedigree spanning millions of individuals and over five centuries of human history. Sci-LMM provides a unified framework for investigating the epidemiological history of human populations via genealogical records.


Subject(s)
Genealogy and Heraldry , Genetics, Population , Longevity/genetics , Models, Genetic , Pedigree , Animals , Computer Simulation , Female , Genetic Fitness , Humans , Linear Models , Male , Plants/genetics
4.
Nat Methods ; 14(6): 590-592, 2017 Jun.
Article in English | MEDLINE | ID: mdl-28436466

ABSTRACT

Short tandem repeats (STRs) are highly variable elements that play a pivotal role in multiple genetic diseases, population genetics applications, and forensic casework. However, it has proven problematic to genotype STRs from high-throughput sequencing data. Here, we describe HipSTR, a novel haplotype-based method for robustly genotyping and phasing STRs from Illumina sequencing data, and we report a genome-wide analysis and validation of de novo STR mutations. HipSTR is freely available at https://hipstr-tool.github.io/HipSTR.


Subject(s)
Chromosome Mapping/methods , DNA Fingerprinting/methods , Genetic Predisposition to Disease/genetics , Genetic Variation/genetics , Genome, Human/genetics , Microsatellite Repeats/genetics , Algorithms , High-Throughput Nucleotide Sequencing , Humans , Sequence Alignment , Sequence Analysis, DNA , Software
5.
Bioinformatics ; 35(12): 2162-2164, 2019 06 01.
Article in English | MEDLINE | ID: mdl-30445428

ABSTRACT

MOTIVATION: Hidden Markov models (HMMs) are powerful tools for modeling processes along the genome. In a standard genomic HMM, observations are drawn, at each genomic position, from a distribution whose parameters depend on a hidden state, and the hidden states evolve along the genome as a Markov chain. Often, the hidden state is the Cartesian product of multiple processes, each evolving independently along the genome. Inference in these so-called Factorial HMMs has a naïve running time that scales as the square of the number of possible states, which by itself increases exponentially with the number of sub-chains; such a running time scaling is impractical for many applications. While faster algorithms exist, there is no available implementation suitable for developing bioinformatics applications. RESULTS: We developed FactorialHMM, a Python package for fast exact inference in Factorial HMMs. Our package allows simulating either directly from the model or from the posterior distribution of states given the observations. Additionally, we allow the inference of all key quantities related to HMMs: (i) the (Viterbi) sequence of states with the highest posterior probability; (ii) the likelihood of the data and (iii) the posterior probability (given all observations) of the marginal and pairwise state probabilities. The running time and space requirement of all procedures is linearithmic in the number of possible states. Our package is highly modular, providing the user with maximal flexibility for developing downstream applications. AVAILABILITY AND IMPLEMENTATION: https://github.com/regevs/factorial_hmm. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Genome , Genomics , Markov Chains , Probability , Software
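
To illustrate why the naive running time described in the abstract above becomes impractical, here is a minimal sketch in plain Python. It does not use or reproduce the FactorialHMM package's API; the helper name `joint_transition` and the toy transition matrices are hypothetical. It builds the joint transition matrix of independently evolving sub-chains: the joint state space is the Cartesian product of the sub-chain states, so the matrix has as many entries as the square of a state count that grows exponentially with the number of sub-chains.

```python
from itertools import product
from math import prod


def joint_transition(sub_chains):
    """Return (states, matrix) for independent Markov sub-chains.

    sub_chains: list of square row-stochastic matrices (lists of lists),
    one per sub-chain. Hypothetical helper, for illustration only.
    """
    sizes = [len(t) for t in sub_chains]
    # A joint hidden state holds one state index per sub-chain.
    states = list(product(*(range(k) for k in sizes)))
    # Independence: a joint transition probability is the product of the
    # per-chain transition probabilities.
    matrix = [
        [prod(t[a][b] for t, a, b in zip(sub_chains, src, dst)) for dst in states]
        for src in states
    ]
    return states, matrix


# Three toy binary sub-chains (made-up probabilities).
chains = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.7, 0.3], [0.4, 0.6]],
    [[0.5, 0.5], [0.5, 0.5]],
]
states, matrix = joint_transition(chains)
# Number of joint states, and number of entries in the joint matrix.
print(len(states), sum(len(row) for row in matrix))
```

Each additional binary sub-chain doubles the number of joint states and quadruples the size of this explicit matrix, which is the scaling that faster factorial algorithms are designed to avoid.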
6.
Nat Rev Genet ; 15(6): 409-21, 2014 06.
Article in English | MEDLINE | ID: mdl-24805122

ABSTRACT

We are entering an era of ubiquitous genetic information for research, clinical care and personal curiosity. Sharing these data sets is vital for progress in biomedical research. However, a growing concern is the ability to protect the genetic privacy of the data originators. Here, we present an overview of genetic privacy breaching strategies. We outline the principles of each technique, indicate the underlying assumptions, and assess their technological complexity and maturation. We then review potential mitigation methods for privacy-preserving dissemination of sensitive data and highlight different cases that are relevant to genetic applications.


Subject(s)
Computer Security , Genetic Privacy , Genetics, Medical , Humans
7.
Am J Hum Genet ; 98(5): 919-933, 2016 05 05.
Article in English | MEDLINE | ID: mdl-27126583

ABSTRACT

Short tandem repeats (STRs) are mutation-prone loci that span nearly 1% of the human genome. Previous studies have estimated the mutation rates of highly polymorphic STRs by using capillary electrophoresis and pedigree-based designs. Although this work has provided insights into the mutational dynamics of highly mutable STRs, the mutation rates of most others remain unknown. Here, we harnessed whole-genome sequencing data to estimate the mutation rates of Y chromosome STRs (Y-STRs) with 2-6 bp repeat units that are accessible to Illumina sequencing. We genotyped 4,500 Y-STRs by using data from the 1000 Genomes Project and the Simons Genome Diversity Project. Next, we developed MUTEA, an algorithm that infers STR mutation rates from population-scale data by using a high-resolution SNP-based phylogeny. After extensive intrinsic and extrinsic validations, we harnessed MUTEA to derive mutation-rate estimates for 702 polymorphic STRs by tracing each locus over 222,000 meioses, resulting in the largest collection of Y-STR mutation rates to date. Using our estimates, we identified determinants of STR mutation rates and built a model to predict rates for STRs across the genome. These predictions indicate that the load of de novo STR mutations is at least 75 mutations per generation, rivaling the load of all other known variant types. Finally, we identified Y-STRs with potential applications in forensics and genetic genealogy, assessed the ability to differentiate between the Y chromosomes of father-son pairs, and imputed Y-STR genotypes.


Subject(s)
Chromosomes, Human, Y/genetics , Genome, Human , Haplotypes/genetics , Microsatellite Repeats/genetics , Mutation Rate , Mutation/genetics , Genotype , Humans , Male
8.
Genome Res ; 25(10): 1411-6, 2015 Oct.
Article in English | MEDLINE | ID: mdl-26430149

ABSTRACT

Genomics has recently celebrated reaching the $1000 genome milestone, making affordable DNA sequencing a reality. With this goal successfully completed, the next goal of the sequencing revolution can be sequencing sensors: miniaturized sequencing devices that are manufactured for real-time applications and deployed in large quantities at low cost. The first part of this manuscript envisions applications that will benefit from moving the sequencers to the samples in a range of domains. In the second part, the manuscript outlines the critical barriers that need to be addressed in order to reach the goal of ubiquitous sequencing sensors.


Subject(s)
Genome, Human , Genomics , Sequence Analysis, DNA/economics , Sequence Analysis, DNA/instrumentation , Biosensing Techniques , Costs and Cost Analysis , Food Technology/trends , Forensic Sciences/trends , Genomics/economics , Genomics/legislation & jurisprudence , Genomics/methods , Humans , Miniaturization
9.
Bioinformatics ; 33(14): 2191-2193, 2017 Jul 15.
Article in English | MEDLINE | ID: mdl-28334237

ABSTRACT

MOTIVATION: Millions of individuals have access to raw genomic data using direct-to-consumer companies. The advent of large-scale sequencing projects, such as the Precision Medicine Initiative, will further increase the number of individuals with access to their own genomic information. However, querying genomic data requires a computer terminal and computational skill to analyze the data, an impediment for the general public. RESULTS: DNA Compass is a website designed to empower the public by enabling simple navigation of personal genomic data. Users can query the status of their genomic variants for over 1658 markers or tens of millions of documented single nucleotide polymorphisms (SNPs). DNA Compass presents the relevant genotypes of the user side-by-side with explanatory scientific resources. The genotype data never leaves the user's computer, a feature that provides improved security and performance. More than 12 000 unique users, mainly from the general genetic genealogy community, have already used DNA Compass, demonstrating its utility. AVAILABILITY AND IMPLEMENTATION: DNA Compass is freely available at https://compass.dna.land. CONTACT: yaniv@cs.columbia.edu.


Subject(s)
Genome, Human , Information Storage and Retrieval , Polymorphism, Single Nucleotide , Sequence Analysis, DNA/instrumentation , Software , Genetic Privacy , Humans , Precision Medicine
10.
Nucleic Acids Res ; 44(8): 3750-62, 2016 05 05.
Article in English | MEDLINE | ID: mdl-27060133

ABSTRACT

Despite representing an important source of genetic variation, tandem repeats (TRs) remain poorly studied due to technical difficulties. We hypothesized that TRs can operate as expression (eQTLs) and methylation (mQTLs) quantitative trait loci. To test this we analyzed the effect of variation at 4849 promoter-associated TRs, genotyped in 120 individuals, on neighboring gene expression and DNA methylation. Polymorphic promoter TRs were associated with increased variance in local gene expression and DNA methylation, suggesting functional consequences related to TR variation. We identified >100 TRs associated with expression/methylation levels of adjacent genes. These potential eQTL/mQTL TRs were enriched for overlaps with transcription factor binding and DNaseI hypersensitivity sites, providing a rationale for their effects. Moreover, we showed that most TR variants are poorly tagged by nearby single nucleotide polymorphism (SNP) markers, indicating that many functional TR variants are not effectively assayed by SNP-based approaches. Our study assigns biological significance to TR variations in the human genome, and suggests that a significant fraction of TR variations exert functional effects via alterations of local gene expression or epigenetics. We conclude that targeted studies that focus on genotyping TR variants are required to fully ascertain functional variation in the genome.


Subject(s)
DNA Methylation , Gene Expression Regulation , Polymorphism, Single Nucleotide , Promoter Regions, Genetic , Tandem Repeat Sequences , Genotyping Techniques , Humans , Linkage Disequilibrium , Quantitative Trait Loci , Sequence Analysis, DNA
11.
Plant J ; 86(4): 349-59, 2016 05.
Article in English | MEDLINE | ID: mdl-26959378

ABSTRACT

Screening large populations for carriers of known or de novo rare single nucleotide polymorphisms (SNPs) is required both in Targeting induced local lesions in genomes (TILLING) experiments in plants and in screening of human populations. We previously suggested an approach that combines the mathematical field of compressed sensing with next-generation sequencing to allow such large-scale screening. Based on pooled measurements, this method identifies multiple carriers of heterozygous or homozygous rare alleles while using only a small fraction of resources. Its rigorous mathematical foundations allow scalable and robust detection, and provide error correction and resilience to experimental noise. Here we present a large-scale experimental demonstration of our computational approach, in which we targeted a TILLING population of 1024 Sorghum bicolor lines to detect carriers of de novo SNPs whose frequency was less than 0.1%, using only 48 pools. Subsequent validation confirmed that all detected lines were indeed carriers of the predicted mutations. This novel approach provides a highly cost-effective and robust tool for biologists and breeders to allow identification of novel alleles and subsequent functional analysis.


Subject(s)
Genome, Plant , Polymorphism, Single Nucleotide , Sorghum/genetics , Alleles , Computational Biology/methods , Genes, Plant , Heterozygote
12.
Genome Res ; 24(11): 1894-904, 2014 Nov.
Article in English | MEDLINE | ID: mdl-25135957

ABSTRACT

Short tandem repeats are among the most polymorphic loci in the human genome. These loci play a role in the etiology of a range of genetic diseases and have been frequently utilized in forensics, population genetics, and genetic genealogy. Despite this plethora of applications, little is known about the variation of most STRs in the human population. Here, we report the largest-scale analysis of human STR variation to date. We collected information for nearly 700,000 STR loci across more than 1000 individuals in Phase 1 of the 1000 Genomes Project. Extensive quality controls show that reliable allelic spectra can be obtained for close to 90% of the STR loci in the genome. We utilize this call set to analyze determinants of STR variation, assess the human reference genome's representation of STR alleles, find STR loci with common loss-of-function alleles, and obtain initial estimates of the linkage disequilibrium between STRs and common SNPs. Overall, these analyses further elucidate the scale of genetic variation beyond classical point mutations.


Subject(s)
Genetics, Population/methods , Genome, Human/genetics , Microsatellite Repeats/genetics , Polymorphism, Single Nucleotide , Alleles , Gene Frequency , Genetic Variation , Genomics/methods , Genotype , Humans , Linkage Disequilibrium
13.
PLoS Biol ; 12(11): e1001983, 2014 Nov.
Article in English | MEDLINE | ID: mdl-25369215

ABSTRACT

Fulfilling the promise of the genetic revolution requires the analysis of large datasets containing information from thousands to millions of participants. However, sharing human genomic data requires protecting subjects from potential harm. Current models rely on de-identification techniques in which privacy versus data utility becomes a zero-sum game. Instead, we propose the use of trust-enabling techniques to create a solution in which researchers and participants both win. To do so we introduce three principles that facilitate trust in genetic research and outline one possible framework built upon those principles. Our hope is that such trust-centric frameworks provide a sustainable solution that reconciles genetic privacy with data sharing and facilitates genetic research.


Subject(s)
Genetic Privacy , Genome, Human , Genomics/legislation & jurisprudence , Informed Consent , Trust , Community-Based Participatory Research , Humans
14.
Mol Cell ; 36(3): 445-56, 2009 Nov 13.
Article in English | MEDLINE | ID: mdl-19917252

ABSTRACT

Drosophila Argonaute-1 and Argonaute-2 differ in function and small RNA content. AGO2 binds to siRNAs, whereas AGO1 is almost exclusively occupied by microRNAs. MicroRNA duplexes are intrinsically asymmetric, with one strand, the miR strand, preferentially entering AGO1 to recognize and regulate the expression of target mRNAs. The other strand, miR*, has been viewed as a byproduct of microRNA biogenesis. Here, we show that miR*s are often loaded as functional species into AGO2. This indicates that each microRNA precursor can potentially produce two mature small RNA strands that are differentially sorted within the RNAi pathway. miR* biogenesis depends upon the canonical microRNA pathway, but loading into AGO2 is mediated by factors traditionally dedicated to siRNAs. By inferring and validating hierarchical rules that predict differential AGO loading, we find that intrinsic determinants, including structural and thermodynamic properties of the processed duplex, regulate the fate of each RNA strand within the RNAi pathway.


Subject(s)
Arabidopsis Proteins/metabolism , Drosophila Proteins/metabolism , MicroRNAs/metabolism , RNA, Small Interfering/metabolism , RNA-Induced Silencing Complex/metabolism , 3' Untranslated Regions , Animals , Arabidopsis Proteins/genetics , Argonaute Proteins , Base Pairing , Blotting, Northern , Cell Line , Drosophila Proteins/genetics , Drosophila melanogaster/cytology , Drosophila melanogaster/genetics , Drosophila melanogaster/metabolism , Immunoprecipitation , MicroRNAs/chemistry , MicroRNAs/genetics , Models, Biological , Nucleic Acid Conformation , Protein Binding , RNA Interference , RNA Precursors/genetics , RNA Precursors/metabolism , RNA, Double-Stranded/genetics , RNA, Double-Stranded/metabolism , RNA, Messenger/genetics , RNA, Messenger/metabolism , RNA, Small Interfering/chemistry , RNA, Small Interfering/genetics , RNA-Induced Silencing Complex/genetics , Thermodynamics
15.
Genes Dev ; 23(16): 1971-9, 2009 Aug 15.
Article in English | MEDLINE | ID: mdl-19684116

ABSTRACT

In some organisms, small RNA pathways can act nonautonomously, with responses spreading from cell to cell. Dedicated intercellular RNA delivery pathways have not yet been characterized in mammals, although secretory compartments have been found to contain RNA. Here we show that, upon cell contact, T cells acquire from B cells small RNAs that can impact the expression of target genes in the recipient T cells. Synthetic microRNA (miRNA) mimetics, viral miRNAs expressed by infected B cells, and endogenous miRNAs could all be transferred into T cells. These mechanisms may allow small RNA-mediated communication between immune cells. The documented transfer of viral miRNAs raises the possible exploitation of these pathways for viral manipulation of the host immune response.


Subject(s)
B-Lymphocytes/metabolism , Cell Communication , Gene Expression Regulation , MicroRNAs/metabolism , RNA, Viral/metabolism , T-Lymphocytes/metabolism , Cells, Cultured , Humans , Jurkat Cells
16.
Genome Res ; 22(6): 1154-62, 2012 Jun.
Article in English | MEDLINE | ID: mdl-22522390

ABSTRACT

Short tandem repeats (STRs) have a wide range of applications, including medical genetics, forensics, and genetic genealogy. High-throughput sequencing (HTS) has the potential to profile hundreds of thousands of STR loci. However, mainstream bioinformatics pipelines are inadequate for the task. These pipelines treat STR mapping as gapped alignment, which results in cumbersome processing times and a biased sampling of STR alleles. Here, we present lobSTR, a novel method for profiling STRs in personal genomes. lobSTR harnesses concepts from signal processing and statistical learning to avoid gapped alignment and to address the specific noise patterns in STR calling. The speed and reliability of lobSTR exceed the performance of current mainstream algorithms for STR profiling. We validated lobSTR's accuracy by measuring its consistency in calling STRs from whole-genome sequencing of two biological replicates from the same individual, by tracing Mendelian inheritance patterns in STR alleles in whole-genome sequencing of a HapMap trio, and by comparing lobSTR results to traditional molecular techniques. Encouraged by the speed and accuracy of lobSTR, we used the algorithm to conduct a comprehensive survey of STR variations in a deeply sequenced personal genome. We traced the mutation dynamics of close to 100,000 STR loci and observed more than 50,000 STR variations in a single genome. lobSTR's implementation is an end-to-end solution. The package accepts raw sequencing reads and provides the user with the genotyping results. It is written in C/C++, includes multi-threading capabilities, and is compatible with the BAM format.


Subject(s)
Genome, Human , Genomics/methods , Microsatellite Repeats , Software , Algorithms , Electrophoresis/methods , Female , Genetic Variation , HapMap Project , Humans , Male , Pedigree , Reproducibility of Results
17.
Genome Res ; 21(5): 658-64, 2011 May.
Article in English | MEDLINE | ID: mdl-21487076

ABSTRACT

Whole exome sequencing has become a pivotal methodology for rapid and cost-effective detection of pathogenic variations in Mendelian disorders. A major challenge of this approach is determining the causative mutation from a substantial number of bystander variations that do not play any role in the disease etiology. Current strategies to analyze variations have mainly relied on genetic and functional arguments such as mode of inheritance, conservation, and loss of function prediction. Here, we demonstrate that disease-network analysis provides an additional layer of information to stratify variations even in the presence of incomplete sequencing coverage, a known limitation of exome sequencing. We studied a case of Hereditary Spastic Paraparesis (HSP) in a single inbred Palestinian family. HSP is a group of neuropathological disorders that are characterized by abnormal gait and spasticity of the lower limbs. Forty-five loci have been associated with HSP and lesions in 20 genes have been documented to induce the disorder. We used whole exome sequencing and homozygosity mapping to create a list of possible candidates. After exhausting the genetic and functional arguments, we stratified the remaining candidates according to their similarity to the previously known disease genes. Our analysis implicated the causative mutation in the motor domain of KIF1A, a gene that has not yet been associated with HSP and that functions in anterograde axonal transport. Our strategy can be useful for a large class of disorders that are characterized by locus heterogeneity, particularly when studying disorders in single families.


Subject(s)
Kinesins/genetics , Sequence Analysis, DNA/methods , Spastic Paraplegia, Hereditary/genetics , Adolescent , Adult , Amino Acid Sequence , Databases, Genetic , Exons/genetics , Genotype , Homozygote , Humans , Male , Models, Molecular , Mutation , Pedigree , Phenotype , Polymorphism, Single Nucleotide , Spastic Paraplegia, Hereditary/pathology , Young Adult
18.
bioRxiv ; 2024 Mar 07.
Article in English | MEDLINE | ID: mdl-38496508

ABSTRACT

Whether neurodegenerative diseases linked to misfolding of the same protein share genetic risk drivers or whether different protein-aggregation pathologies in neurodegeneration are mechanistically related remains uncertain. Conventional genetic analyses are underpowered to address these questions. Through careful selection of patients based on protein aggregation phenotype (rather than clinical diagnosis) we can increase statistical power to detect associated variants in a targeted set of genes that modify proteotoxicities. Genetic modifiers of alpha-synuclein (αS) and beta-amyloid (Aβ) cytotoxicity in yeast are enriched in risk factors for Parkinson's disease (PD) and Alzheimer's disease (AD), respectively. Here, along with known AD/PD risk genes, we deeply sequenced exomes of 430 αS/Aβ modifier genes in patients across alpha-synucleinopathies (PD, Lewy body dementia and multiple system atrophy). Beyond known PD genes GBA1 and LRRK2, rare variants in AD genes (CD33, CR1 and PSEN2) and Aβ toxicity modifiers involved in RhoA/actin cytoskeleton regulation (ARHGEF1, ARHGEF28, MICAL3, PASK, PKN2, PSEN2) were shared risk factors across synucleinopathies. Actin pathology occurred in iPSC synucleinopathy models and RhoA downregulation exacerbated αS pathology. Even in sporadic PD, the expression of these genes was altered across CNS cell types. Genome-wide CRISPR screens revealed the essentiality of PSEN2 in both human cortical and dopaminergic neurons, and PSEN2 mutation carriers exhibited diffuse brainstem and cortical synucleinopathy independent of AD pathology. PSEN2 contributes to a common-risk signal in PD GWAS and regulates αS expression in neurons. Our results identify convergent mechanisms across synucleinopathies, some shared with AD.

19.
Am J Hum Genet ; 86(1): 93-7, 2010 Jan.
Article in English | MEDLINE | ID: mdl-20036350

ABSTRACT

Patients with Joubert syndrome 2 (JBTS2) suffer from a neurological disease manifested by psychomotor retardation, hypotonia, ataxia, nystagmus, and oculomotor apraxia and variably associated with dysmorphism, as well as retinal and renal involvement. Brain MRI results show cerebellar vermis hypoplasia and additional anomalies of the fourth ventricle, corpus callosum, and occipital cortex. The disease has previously been mapped to the centromeric region of chromosome 11. Using homozygosity mapping in 13 patients from eight Ashkenazi Jewish families, we identified a homozygous mutation, R12L, in the TMEM216 gene, in all affected individuals. Thirty individuals heterozygous for the mutation were detected among 2766 anonymous Ashkenazi Jews, indicating a carrier rate of 1:92. Given the small size of the TMEM216 gene relative to other JBTS genes, its sequence analysis is warranted in all JBTS patients, especially those who suffer from associated anomalies.


Subject(s)
Mutation , Nervous System Diseases/genetics , Adolescent , Adult , Alleles , Brain/pathology , Child , Child, Preschool , DNA Mutational Analysis , Homozygote , Humans , Infant , Jews , Magnetic Resonance Imaging/methods , Nervous System Diseases/ethnology , Syndrome
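
The carrier rate quoted in the abstract above follows directly from the screening counts. The short sketch below reproduces that arithmetic; the variable names are illustrative. The final step, an expected frequency of affected homozygotes, is not stated in the abstract and assumes Hardy-Weinberg proportions (random mating, with no homozygotes among the anonymous screened individuals), so it should be read as a rough estimate only.

```python
# Counts reported in the abstract.
heterozygous_carriers = 30
screened_individuals = 2766

# Carrier rate expressed as "1 in N".
one_in_n = screened_individuals / heterozygous_carriers
print(f"carrier rate is about 1:{round(one_in_n)}")

# ASSUMPTION (not from the abstract): Hardy-Weinberg proportions.
# Each carrier holds one mutant allele out of two.
allele_frequency = heterozygous_carriers / (2 * screened_individuals)
expected_homozygote_frequency = allele_frequency ** 2
print(f"expected homozygotes: about 1 in {round(1 / expected_homozygote_frequency):,}")
```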
20.
Bioinformatics ; 28(12): i197-206, 2012 Jun 15.
Article in English | MEDLINE | ID: mdl-22689761

ABSTRACT

MOTIVATION: Despite the rapid decline in sequencing costs, sequencing large cohorts of individuals is still prohibitively expensive. Recently, several sophisticated pooling designs were suggested that can identify carriers of rare alleles in large cohorts with a significantly smaller number of pools, thus dramatically reducing the cost of such large-scale sequencing projects. These approaches use combinatorial pooling designs where each individual is either present or absent from a pool. One can then infer the number of carriers in a pool, and by combining information across pools, reconstruct the identity of the carriers. RESULTS: We show that one can gain further efficiency and cost reduction by using 'weighted' designs, in which different individuals donate different amounts of DNA to the pools. Intuitively, in this situation, the number of mutant reads in a pool does not only indicate the number of carriers, but also their identity. We describe and study a powerful example of such weighted designs, using non-overlapping pools. We demonstrate that this approach is not only easier to implement and analyze but is also competitive in terms of accuracy with combinatorial designs when identifying rare variants, and is superior when sequencing common variants. We then discuss how weighting can be incorporated into existing combinatorial designs to increase their accuracy and demonstrate the resulting improvement using simulations. Finally, we argue that weighted designs have enough power to facilitate detection of common alleles, so they can be used as a cornerstone of whole-exome sequencing projects.


Subject(s)
Computational Biology/methods , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Alleles , DNA/analysis , Genetic Carrier Screening , High-Throughput Nucleotide Sequencing/economics , Humans , Models, Theoretical , Sequence Analysis, DNA/economics
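
The intuition in the abstract above, that with unequal DNA contributions the number of mutant reads indicates which individuals are carriers and not only how many, can be sketched with a toy example. The power-of-two weights, the function names and the noise-free setting are all assumptions made here for illustration; they are not taken from the paper, and a real design must cope with read sampling and sequencing error.

```python
def mutant_fraction(weights, carriers):
    """Expected mutant-allele fraction in one pool, ignoring all noise.

    weights: integer DNA amount donated by each individual.
    carriers: set of indices of heterozygous carriers (half of a carrier's
    DNA carries the mutant allele).
    """
    return sum(0.5 * weights[i] for i in carriers) / sum(weights)


def decode_carriers(weights, fraction):
    """Recover the carrier set, assuming power-of-two weights."""
    # Twice the fraction times the total weight is the sum of carrier weights.
    code = round(2 * fraction * sum(weights))
    # With distinct powers of two, each subset of individuals has a unique sum.
    return {i for i, w in enumerate(weights) if code & w}


weighted = [1, 2, 4, 8]  # hypothetical weighted design
equal = [1, 1, 1, 1]     # conventional equal-contribution pool

observed = mutant_fraction(weighted, {0, 2})
print(decode_carriers(weighted, observed))

# With equal weights, different single carriers are indistinguishable.
print(mutant_fraction(equal, {0}) == mutant_fraction(equal, {3}))
```

In this idealized setting a single weighted pool identifies the carriers, whereas an equal-weight pool only reveals how many there are. The practical trade-off is that the lowest-weight individuals contribute few reads, so their signal is the first to be lost to noise.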