ABSTRACT
Predefined sets of short DNA sequences are commonly used as barcodes to identify individual biomolecules in pooled populations. Such use requires either sufficiently small DNA error rates, or else an error-correction methodology. Most existing DNA error-correcting codes (ECCs) correct only one or two errors per barcode in sets of typically ≲10⁴ barcodes. We here consider the use of random barcodes of sufficient length that they remain accurately decodable even with ≳6 errors and even at [Formula: see text] or 20% nucleotide error rates. We show that length ∼34 nt is sufficient even with ≳10⁶ barcodes. The obvious objection to this scheme is that it requires comparing every read to every possible barcode by a slow Levenshtein or Needleman-Wunsch comparison. We show that several orders of magnitude speedup can be achieved by (i) a fast triage method that compares only trimer (three consecutive nucleotide) occurrence statistics, precomputed in linear time for both reads and barcodes, and (ii) the massive parallelism available even on today's commodity-grade Graphics Processing Units (GPUs). With 10⁶ barcodes of length 34 and 10% DNA errors (substitutions and indels), we achieve in simulation 99.9% precision (decode accuracy) with 98.8% recall (read acceptance rate). Similarly high precision with somewhat smaller recall is achievable even with 20% DNA errors. The amortized computation cost on a commodity workstation with two GPUs (2022 capability and price) is estimated as between US$ 0.15 and US$ 0.60 per million decoded reads.
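The two-stage idea described above (cheap trimer triage, then an exact edit-distance comparison on the survivors) can be illustrated with a minimal CPU-only sketch. The L1 distance between trimer count vectors, the shortlist size `keep`, and all function names are assumptions chosen for illustration; the abstract does not specify the triage metric, and the GPU parallelism is not modeled here.

```python
from itertools import product

# Map each of the 64 possible trimers to a vector index.
TRIMER_INDEX = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=3))}

def trimer_counts(seq):
    """Count trimer occurrences in one linear pass over the sequence."""
    counts = [0] * 64
    for i in range(len(seq) - 2):
        idx = TRIMER_INDEX.get(seq[i:i + 3])
        if idx is not None:  # skip trimers containing N or other symbols
            counts[idx] += 1
    return counts

def l1_distance(a, b):
    """Assumed triage metric: L1 distance between two count vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def levenshtein(a, b):
    """Standard dynamic-programming edit distance (substitutions and indels)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def decode(read, barcodes, barcode_counts, keep=2):
    """Shortlist barcodes by trimer statistics, then pick the closest by edit distance."""
    rc = trimer_counts(read)
    shortlist = sorted(range(len(barcodes)),
                       key=lambda k: l1_distance(rc, barcode_counts[k]))[:keep]
    return min(shortlist, key=lambda k: levenshtein(read, barcodes[k]))

# Toy usage: the read is barcode 0 with one deleted base.
barcodes = ["ACGTACGTAC", "TTTTGGGGCC", "GATTACAGAT"]
barcode_counts = [trimer_counts(b) for b in barcodes]
best = decode("ACGTACTAC", barcodes, barcode_counts)
```

In this sketch the expensive `levenshtein` call runs only on the `keep` shortlisted barcodes rather than on the whole set, which is where the speedup would come from on realistic barcode counts.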
ABSTRACT
We need a definitive public reference for the history of events.
Subjects
Advisory Committees , COVID-19 , Pandemics , Advisory Committees/economics , Advisory Committees/organization & administration , COVID-19/epidemiology , COVID-19/prevention & control , Communicable Disease Control , Disaster Planning , Financing, Government , Health Equity , Humans , Public Health , Public Health Informatics , Strategic Stockpile , United States
ABSTRACT
Engineered SpCas9s and AsCas12a cleave fewer off-target genomic sites than wild-type (wt) Cas9. However, understanding their fidelity, mechanisms and cleavage outcomes requires systematic profiling across mispaired target DNAs. Here we describe NucleaSeq (nuclease digestion and deep sequencing), a massively parallel platform that measures the cleavage kinetics and time-resolved cleavage products for over 10,000 targets containing mismatches, insertions and deletions relative to the guide RNA. Combining cleavage rates and binding specificities on the same target libraries, we benchmarked five SpCas9 variants and AsCas12a. A biophysical model built from these data sets revealed mechanistic insights into off-target cleavage. Engineered Cas9s, especially Cas9-HF1, dramatically increased cleavage specificity but not binding specificity compared to wtCas9. Surprisingly, AsCas12a cleavage specificity differed little from that of wtCas9. Initial DNA cleavage sites and end trimming varied by nuclease, guide RNA and the positions of mispaired nucleotides. More broadly, NucleaSeq enables rapid, quantitative and systematic comparisons of specificity and cleavage outcomes across engineered and natural nucleases.
Subjects
Bacterial Proteins , CRISPR-Associated Protein 9 , CRISPR-Associated Proteins , Endodeoxyribonucleases , High-Throughput Nucleotide Sequencing/methods , Bacterial Proteins/chemistry , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , CRISPR-Associated Protein 9/chemistry , CRISPR-Associated Protein 9/genetics , CRISPR-Associated Protein 9/metabolism , CRISPR-Associated Proteins/chemistry , CRISPR-Associated Proteins/genetics , CRISPR-Associated Proteins/metabolism , CRISPR-Cas Systems , Endodeoxyribonucleases/chemistry , Endodeoxyribonucleases/genetics , Endodeoxyribonucleases/metabolism , Gene Editing , Kinetics , Protein Binding/genetics , Protein Engineering , RNA, Guide, Kinetoplastida/chemistry , RNA, Guide, Kinetoplastida/genetics , RNA, Guide, Kinetoplastida/metabolism , Substrate Specificity/genetics
Subjects
COVID-19/epidemiology , Pandemics , Forecasting , Humans , Models, Biological , United States/epidemiology
ABSTRACT
Synthetic DNA is rapidly emerging as a durable, high-density information storage platform. A major challenge for DNA-based information encoding strategies is the high rate of errors that arise during DNA synthesis and sequencing. Here, we describe the HEDGES (Hash Encoded, Decoded by Greedy Exhaustive Search) error-correcting code that repairs all three basic types of DNA errors: insertions, deletions, and substitutions. HEDGES also converts unresolved or compound errors into substitutions, restoring synchronization for correction via a standard Reed-Solomon outer code that is interleaved across strands. Moreover, HEDGES can incorporate a broad class of user-defined sequence constraints, such as avoiding excess repeats, or too high or too low windowed guanine-cytosine (GC) content. We test our code both via in silico simulations and with synthesized DNA. From its measured performance, we develop a statistical model applicable to much larger datasets. Predicted performance indicates the possibility of error-free recovery of petabyte- and exabyte-scale data from DNA degraded with as much as 10% errors. As the cost of DNA synthesis and sequencing continues to drop, we anticipate that HEDGES will find applications in large-scale error-free information encoding.
Subjects
DNA/genetics , INDEL Mutation , DNA Replication , Information Storage and Retrieval , Models, Statistical
ABSTRACT
Long thought to be dispensable after establishing X chromosome inactivation (XCI), Xist RNA is now known to also maintain the inactive X (Xi). To what extent somatic X reactivation causes physiological abnormalities is an active area of inquiry. Here, we use multiple mouse models to investigate in vivo consequences. First, when Xist is deleted systemically in post-XCI embryonic cells using the Meox2-Cre driver, female pups exhibit no morbidity or mortality despite partial X reactivation. Second, when Xist is conditionally deleted in epithelial cells using Keratin14-Cre or in B cells using CD19-Cre, female mice have a normal life span without obvious illness. Third, when Xist is deleted in gut using Villin-Cre, female mice remain healthy despite significant X-autosome dosage imbalance. Finally, when the gut is acutely stressed by azoxymethane/dextran sulfate (AOM/DSS) exposure, both Xist-deleted and wild-type mice develop gastrointestinal tumors. Intriguingly, however, under prolonged stress, mutant mice develop larger tumors and have a higher tumor burden. The effect is female specific. Altogether, these observations reveal a surprising systemic tolerance to Xist loss but importantly reveal that Xist and XCI are protective to females during chronic stress.
Subjects
Gastrointestinal Neoplasms/physiopathology , Genetic Diseases, X-Linked/genetics , Genetic Diseases, X-Linked/microbiology , RNA, Long Noncoding/genetics , X Chromosome/genetics , Animals , Female , Gastrointestinal Neoplasms/etiology , Gastrointestinal Neoplasms/genetics , Gastrointestinal Neoplasms/metabolism , Gastrointestinal Tract/metabolism , Genetic Diseases, X-Linked/complications , Genetic Diseases, X-Linked/metabolism , Humans , Male , Mice , RNA, Long Noncoding/metabolism , Stress, Physiological , Tumor Burden , X Chromosome Inactivation
ABSTRACT
Historically, the evolution of bats has been analyzed using a small number of genetic loci for many species or many genetic loci for a few species. Here we present a phylogeny of 18 bat species, each of which is represented in 1,107 orthologous gene alignments used to build the tree. We generated a transcriptome sequence of Hypsignathus monstrosus, the African hammer-headed bat, and additional transcriptome sequence for Rousettus aegyptiacus, the Egyptian fruit bat. We then combined these data with existing genomic and transcriptomic data from 16 other bat species. In the analysis of such datasets, there is no clear consensus on the most reliable computational methods for the curation of quality multiple sequence alignments since these public datasets represent multiple investigators and methods, including different source materials (chromosomal DNA or expressed RNA). Here we lay out a systematic analysis of parameters and produce an advanced pipeline for curating orthologous gene alignments from combined transcriptomic and genomic data, including a software package: the Mismatching Isoform eXon Remover (MIXR). Using this method, we created alignments of 11,677 bat genes, 1,107 of which contain orthologs from all 18 species. Using the orthologous gene alignments created, we assessed bat phylogeny and also performed a holistic analysis of positive selection acting in bat genomes. We found that 181 genes have been subject to positive natural selection. This list is dominated by genes involved in immune responses and genes involved in the production of collagens.
Subjects
Chiroptera/genetics , Genome/genetics , Selection, Genetic/genetics , Transcriptome/genetics , Amino Acid Sequence , Animals , Genome-Wide Association Study/methods , Phylogeny , Sequence Alignment
ABSTRACT
Animal cloning can be achieved through somatic cell nuclear transfer (SCNT), although the live birth rate is relatively low. Recent studies have identified H3K9me3 in donor cells and abnormal Xist activation as epigenetic barriers that impede SCNT. Here we overcome these barriers using a combination of Xist knockout donor cells and overexpression of Kdm4 to achieve more than 20% efficiency of mouse SCNT. However, post-implantation defects and abnormal placentas were still observed, indicating that additional epigenetic barriers impede SCNT cloning. Comparative DNA methylome analysis of IVF and SCNT blastocysts identified abnormally methylated regions in SCNT embryos despite successful global reprogramming of the methylome. Strikingly, allelic transcriptomic and ChIP-seq analyses of pre-implantation SCNT embryos revealed complete loss of H3K27me3 imprinting, which may account for the postnatal developmental defects observed in SCNT embryos. Together, these results provide an efficient method for mouse cloning while paving the way for further improving SCNT efficiency.
Subjects
Embryo Implantation/genetics , Embryo, Mammalian/metabolism , Genomic Imprinting , Histones/metabolism , Nuclear Transfer Techniques , Animals , Embryo, Mammalian/embryology , Female , Male , Mice , Mice, Inbred Strains , Mice, Knockout
ABSTRACT
Many large-scale, high-throughput experiments use DNA barcodes, short DNA sequences prepended to DNA libraries, for identification of individuals in pooled biomolecule populations. However, DNA synthesis and sequencing errors confound the correct interpretation of observed barcodes and can lead to significant data loss or spurious results. Widely used error-correcting codes borrowed from computer science (e.g., Hamming, Levenshtein codes) do not properly account for insertions and deletions (indels) in DNA barcodes, even though deletions are the most common type of synthesis error. Here, we present and experimentally validate filled/truncated right end edit (FREE) barcodes, which correct substitution, insertion, and deletion errors, even when these errors alter the barcode length. FREE barcodes are designed with experimental considerations in mind, including balanced guanine-cytosine (GC) content, minimal homopolymer runs, and reduced internal hairpin propensity. We generate and include lists of barcodes with different lengths and error correction levels that may be useful in diverse high-throughput applications, including >10⁶ single-error-correcting 16-mers that strike a balance between decoding accuracy, barcode length, and library size. Moreover, concatenating two or more FREE codes into a single barcode increases the available barcode space combinatorially, generating lists with >10¹⁵ error-correcting barcodes. The included software for creating barcode libraries and decoding sequenced barcodes is efficient and designed to be user-friendly for the general biology community.
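The experimental design constraints mentioned above (balanced GC content and short homopolymer runs) can be expressed as a simple candidate filter. This is a minimal sketch only: the thresholds (`gc_min`, `gc_max`, `max_run`) and function names are hypothetical, it is not the FREE barcode software, and it does not cover hairpin propensity or the error-correcting construction itself.

```python
import re

def gc_fraction(seq):
    """Fraction of G and C bases; returns 0.0 for an empty sequence."""
    if not seq:
        return 0.0
    return sum(1 for base in seq if base in "GC") / len(seq)

def longest_homopolymer(seq):
    """Length of the longest run of one repeated base (0 for an empty sequence)."""
    return max((len(m.group(0)) for m in re.finditer(r"(.)\1*", seq)), default=0)

def passes_filters(seq, gc_min=0.4, gc_max=0.6, max_run=2):
    """Hypothetical acceptance rule: GC fraction within bounds and no long runs."""
    return (gc_min <= gc_fraction(seq) <= gc_max
            and longest_homopolymer(seq) <= max_run)

# Toy usage: keep only the candidates that satisfy both constraints.
candidates = ["ACGTACGTACGTACGT", "AAAACGTACGTACGTG", "GGCCGGCCGGCCGGCC"]
accepted = [c for c in candidates if passes_filters(c)]
```

A filter of this kind would typically be applied before checking pairwise distances between candidates, since it is much cheaper than any distance computation.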
Subjects
Base Sequence , DNA Barcoding, Taxonomic , High-Throughput Nucleotide Sequencing , INDEL Mutation
ABSTRACT
The X-chromosome harbors hundreds of disease genes whose associated diseases predominantly affect males. However, a subset, including neurodevelopmental disorders, Rett syndrome (RTT), fragile X syndrome, and CDKL5 syndrome, also affects females. These disorders lack disease-specific treatment. Because female cells carry two X chromosomes, an emerging treatment strategy has been to reawaken the healthy allele on the inactive X (Xi). Here, we focus on methyl-CpG binding protein 2 (MECP2) restoration for RTT and combinatorially target factors in the interactome of Xist, the noncoding RNA responsible for X inactivation. We identify a mixed modality approach combining an Xist antisense oligonucleotide and a small-molecule inhibitor of DNA methylation, which, together, achieve 30,000-fold MECP2 up-regulation from the Xi in cultured cells. Combining a brain-specific genetic Xist ablation with short-term 5-aza-2'-deoxycytidine (Aza) treatment models the synergy in vivo without evident toxicity. The Xi is selectively reactivated. These experiments provide proof of concept for a mixed modality approach for treating X-linked disorders in females.
Subjects
Azacitidine/analogs & derivatives , Genetic Therapy/methods , Methyl-CpG-Binding Protein 2/genetics , Oligonucleotides, Antisense/therapeutic use , Rett Syndrome/therapy , Animals , Azacitidine/pharmacology , Azacitidine/therapeutic use , Brain/metabolism , Cell Line , DNA Methylation/drug effects , Decitabine , Female , Gene Expression Profiling , Male , Mice , Rett Syndrome/genetics , X Chromosome Inactivation
ABSTRACT
CRISPR-Cas nucleoproteins target foreign DNA via base pairing with a crRNA. However, a quantitative description of protein binding and nuclease activation at off-target DNA sequences remains elusive. Here, we describe a chip-hybridized association-mapping platform (CHAMP) that repurposes next-generation sequencing chips to simultaneously measure the interactions between proteins and ∼10⁷ unique DNA sequences. Using CHAMP, we provide the first comprehensive survey of DNA recognition by a type I-E CRISPR-Cas (Cascade) complex and Cas3 nuclease. Analysis of mutated target sequences and human genomic DNA reveals that Cascade recognizes an extended protospacer adjacent motif (PAM). Cascade recognizes DNA with a surprising 3-nt periodicity. The identity of the PAM and the PAM-proximal nucleotides control Cas3 recruitment by releasing the Cse1 subunit. These findings are used to develop a model for the biophysical constraints governing off-target DNA binding. CHAMP provides a framework for high-throughput, quantitative analysis of protein-DNA interactions on synthetic and genomic DNA.
Subjects
DNA-Binding Proteins/analysis , High-Throughput Nucleotide Sequencing/methods , Protein Binding , Sequence Analysis, DNA/methods , CRISPR-Cas Systems , Electrophoretic Mobility Shift Assay , Microscopy, Fluorescence , Nucleotide Motifs
ABSTRACT
Schlafen11 (encoded by the SLFN11 gene) has been shown to inhibit the accumulation of HIV-1 proteins. We show that the SLFN11 gene is under positive selection in simian primates and is species-specific in its activity against HIV-1. The activity of human Schlafen11 is relatively weak compared to that of some other primate versions of this protein, with the versions encoded by chimpanzee, orangutan, gibbon, and marmoset being particularly potent inhibitors of HIV-1 protein production. Interestingly, we find that Schlafen11 is functional in the absence of infection and reduces protein production from certain non-viral (GFP) and even host (Vinculin and GAPDH) transcripts. This suggests that Schlafen11 may just generally block protein production from non-codon optimized transcripts. Because Schlafen11 is an interferon-stimulated gene with a broad ability to inhibit protein production from many host and viral transcripts, its role may be to create a general antiviral state in the cell. Interestingly, the strong inhibitors such as marmoset Schlafen11 consistently block protein production better than weak primate Schlafen11 proteins, regardless of the virus or host target being analyzed. Further, we show that the residues to which species-specific differences in Schlafen11 potency map are distinct from residues that have been targeted by positive selection. We speculate that the positive selection of SLFN11 could have been driven by a number of different factors, including interaction with one or more viral antagonists that have yet to be identified.
Subjects
Evolution, Molecular , Nuclear Proteins/immunology , Viral Proteins/immunology , Virus Diseases/immunology , Animals , Callithrix , Flow Cytometry , HIV-1/immunology , Humans , Hylobates , Immunoblotting , Mutagenesis, Site-Directed , Nuclear Proteins/genetics , Pan troglodytes , Polymerase Chain Reaction , Pongo , Primates , Selection, Genetic , Species Specificity , Transfection , Viral Proteins/genetics
ABSTRACT
Ribosome profiling produces snapshots of the locations of actively translating ribosomes on messenger RNAs. These snapshots can be used to make inferences about translation dynamics. Recent ribosome profiling studies in yeast, however, have reached contradictory conclusions regarding the average translation rate of each codon. Some experiments have used cycloheximide (CHX) to stabilize ribosomes before measuring their positions, and these studies all counterintuitively report a weak negative correlation between the translation rate of a codon and the abundance of its cognate tRNA. In contrast, some experiments performed without CHX report strong positive correlations. To explain this contradiction, we identify unexpected patterns in ribosome density downstream of each type of codon in experiments that use CHX. These patterns are evidence that elongation continues to occur in the presence of CHX but with dramatically altered codon-specific elongation rates. The measured positions of ribosomes in these experiments therefore do not reflect the amounts of time ribosomes spend at each position in vivo. These results suggest that conclusions from experiments in yeast using CHX may need reexamination. In particular, we show that in all such experiments, codons decoded by less abundant tRNAs were in fact being translated more slowly before the addition of CHX disrupted these dynamics.
Subjects
Peptide Chain Elongation, Translational , Ribosomes/metabolism , Saccharomyces cerevisiae/genetics , Codon , Cycloheximide/pharmacology , Protein Synthesis Inhibitors/pharmacology , RNA, Transfer/genetics , RNA, Transfer/metabolism , Ribosomes/genetics , Saccharomyces cerevisiae/drug effects , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae Proteins/metabolism
ABSTRACT
The long noncoding X-inactivation-specific transcript (Xist gene) is responsible for mammalian X-chromosome dosage compensation between the sexes, the process by which one of the two X chromosomes is inactivated in the female soma. Xist is essential for both the random and imprinted forms of X-chromosome inactivation. In the imprinted form, Xist is paternally marked to be expressed in female embryos. To investigate the mechanism of Xist imprinting, we introduce Xist transgenes (Tg) into the male germ line. Although ectopic high-level Xist expression on autosomes can be compatible with viability, transgenic animals demonstrate reduced fitness, subfertility, defective meiotic pairing, and other germ-cell abnormalities. In the progeny, paternal-specific expression is recapitulated by the 200-kb Xist Tg. However, Xist imprinting occurs efficiently only when it is in an unpaired or unpartnered state during male meiosis. When transmitted from a hemizygous father (+/Tg), the Xist Tg demonstrates paternal-specific expression in the early embryo. When transmitted by a homozygous father (Tg/Tg), the Tg fails to show imprinted expression. Thus, Xist imprinting is directed by sequences within a 200-kb X-linked region, and the hemizygous (unpaired) state of the Xist region promotes its imprinting in the male germ line.
Subjects
Genomic Imprinting , Germ Cells/metabolism , RNA, Long Noncoding/genetics , Animals , Blastocyst/metabolism , Epigenesis, Genetic , Female , Hemizygote , Infertility, Male/genetics , Infertility, Male/pathology , Male , Mice, Transgenic , Phenotype , RNA, Long Noncoding/chemical synthesis , RNA, Long Noncoding/metabolism , Transgenes
ABSTRACT
It has been proposed that patterns in the usage of synonymous codons provide evidence that individual tRNA molecules are recycled through the ribosome, translating several occurrences of the same amino acid before diffusing away. The claimed evidence is based on counting the frequency with which pairs of synonymous codons are used at nearby occurrences of the same amino acid, as compared to the frequency expected if each codon were chosen independently from a single genome-wide distribution. We show that such statistics simply measure variation in codon preferences across a genome. As a negative control on the potential contribution of pressure to exploit tRNA recycling on these signals, we examine correlations in the usage of codons that encode different amino acids. We find that these controls are statistically as strong as the claimed evidence and conclude that there is no informatic evidence that tRNA recycling is a force shaping codon usage.
Subjects
Models, Genetic , RNA, Transfer/metabolism , Codon , Databases, Genetic , Ribosomes/genetics , Ribosomes/metabolism , Yeasts/genetics
ABSTRACT
A major limitation of high-throughput DNA sequencing is the high rate of erroneous base calls produced. For instance, Illumina sequencing machines produce errors at a rate of ~0.1-1 × 10⁻² per base sequenced. These technologies typically produce billions of base calls per experiment, translating to millions of errors. We have developed a unique library preparation strategy, "circle sequencing," which allows for robust downstream computational correction of these errors. In this strategy, DNA templates are circularized, copied multiple times in tandem with a rolling circle polymerase, and then sequenced on any high-throughput sequencing machine. Each read produced is computationally processed to obtain a consensus sequence of all linked copies of the original molecule. Physically linking the copies ensures that each copy is independently derived from the original molecule and allows for efficient formation of consensus sequences. The circle-sequencing protocol precedes standard library preparations and is therefore suitable for a broad range of sequencing applications. We tested our method using the Illumina MiSeq platform and obtained errors in our processed sequencing reads at a rate as low as 7.6 × 10⁻⁶ per base sequenced, dramatically improving the error rate of Illumina sequencing and putting error on par with low-throughput, but highly accurate, Sanger sequencing. Circle sequencing also had substantially higher efficiency and lower cost than existing barcode-based schemes for correcting sequencing errors.
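The consensus step described above can be sketched as a per-position majority vote over the tandem copies contained in one read. This is a simplified illustration, not the published pipeline: it assumes the template length is already known, that copies are aligned back to back with no indels, and that positions without a strict majority are reported as `N`. The function name and parameters are hypothetical.

```python
from collections import Counter

def consensus(read, template_len):
    """Majority-vote consensus over consecutive full-length copies in a read."""
    # Slice the read into complete tandem copies; a trailing partial copy is ignored.
    copies = [read[i:i + template_len]
              for i in range(0, len(read) - template_len + 1, template_len)]
    result = []
    for column in zip(*copies):
        base, votes = Counter(column).most_common(1)[0]
        # Require a strict majority; otherwise the position is left ambiguous.
        result.append(base if votes * 2 > len(column) else "N")
    return "".join(result)

# Toy usage: three copies of an 8-nt template, the middle copy has one miscall.
read = "ACGTTGCA" + "ACGATGCA" + "ACGTTGCA"
corrected = consensus(read, 8)
```

Because the copies are physically linked within one read, a miscall in a single copy is outvoted by the others, which is the intuition behind the lower error rate reported above.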