1.
Cell ; 170(1): 35-47.e13, 2017 Jun 29.
Article in English | MEDLINE | ID: mdl-28666121

ABSTRACT

CRISPR-Cas nucleoproteins target foreign DNA via base pairing with a crRNA. However, a quantitative description of protein binding and nuclease activation at off-target DNA sequences remains elusive. Here, we describe a chip-hybridized association-mapping platform (CHAMP) that repurposes next-generation sequencing chips to simultaneously measure the interactions between proteins and ∼10^7 unique DNA sequences. Using CHAMP, we provide the first comprehensive survey of DNA recognition by a type I-E CRISPR-Cas (Cascade) complex and Cas3 nuclease. Analysis of mutated target sequences and human genomic DNA reveals that Cascade recognizes an extended protospacer adjacent motif (PAM). Cascade recognizes DNA with a surprising 3-nt periodicity. The identity of the PAM and the PAM-proximal nucleotides control Cas3 recruitment by releasing the Cse1 subunit. These findings are used to develop a model for the biophysical constraints governing off-target DNA binding. CHAMP provides a framework for high-throughput, quantitative analysis of protein-DNA interactions on synthetic and genomic DNA.


Subjects
DNA-Binding Proteins/analysis , High-Throughput Nucleotide Sequencing/methods , Protein Binding , Sequence Analysis, DNA/methods , CRISPR-Cas Systems , Electrophoretic Mobility Shift Assay , Microscopy, Fluorescence , Nucleotide Motifs
2.
Proc Natl Acad Sci U S A ; 117(31): 18489-18496, 2020 08 04.
Article in English | MEDLINE | ID: mdl-32675237

ABSTRACT

Synthetic DNA is rapidly emerging as a durable, high-density information storage platform. A major challenge for DNA-based information encoding strategies is the high rate of errors that arise during DNA synthesis and sequencing. Here, we describe the HEDGES (Hash Encoded, Decoded by Greedy Exhaustive Search) error-correcting code that repairs all three basic types of DNA errors: insertions, deletions, and substitutions. HEDGES also converts unresolved or compound errors into substitutions, restoring synchronization for correction via a standard Reed-Solomon outer code that is interleaved across strands. Moreover, HEDGES can incorporate a broad class of user-defined sequence constraints, such as avoiding excess repeats, or too high or too low windowed guanine-cytosine (GC) content. We test our code both via in silico simulations and with synthesized DNA. From its measured performance, we develop a statistical model applicable to much larger datasets. Predicted performance indicates the possibility of error-free recovery of petabyte- and exabyte-scale data from DNA degraded with as much as 10% errors. As the cost of DNA synthesis and sequencing continues to drop, we anticipate that HEDGES will find applications in large-scale error-free information encoding.


Subjects
DNA/genetics , INDEL Mutation , DNA Replication , Information Storage and Retrieval , Models, Statistical
3.
Proc Natl Acad Sci U S A ; 116(23): 11351-11360, 2019 06 04.
Article in English | MEDLINE | ID: mdl-31113885

ABSTRACT

Historically, the evolution of bats has been analyzed using a small number of genetic loci for many species or many genetic loci for a few species. Here we present a phylogeny of 18 bat species, each of which is represented in 1,107 orthologous gene alignments used to build the tree. We generated a transcriptome sequence of Hypsignathus monstrosus, the African hammer-headed bat, and additional transcriptome sequence for Rousettus aegyptiacus, the Egyptian fruit bat. We then combined these data with existing genomic and transcriptomic data from 16 other bat species. In the analysis of such datasets, there is no clear consensus on the most reliable computational methods for the curation of quality multiple sequence alignments since these public datasets represent multiple investigators and methods, including different source materials (chromosomal DNA or expressed RNA). Here we lay out a systematic analysis of parameters and produce an advanced pipeline for curating orthologous gene alignments from combined transcriptomic and genomic data, including a software package: the Mismatching Isoform eXon Remover (MIXR). Using this method, we created alignments of 11,677 bat genes, 1,107 of which contain orthologs from all 18 species. Using the orthologous gene alignments created, we assessed bat phylogeny and also performed a holistic analysis of positive selection acting in bat genomes. We found that 181 genes have been subject to positive natural selection. This list is dominated by genes involved in immune responses and genes involved in the production of collagens.


Subjects
Chiroptera/genetics , Genome/genetics , Selection, Genetic/genetics , Transcriptome/genetics , Amino Acid Sequence , Animals , Genome-Wide Association Study/methods , Phylogeny , Sequence Alignment
5.
Proc Natl Acad Sci U S A ; 115(27): E6217-E6226, 2018 07 03.
Article in English | MEDLINE | ID: mdl-29925596

ABSTRACT

Many large-scale, high-throughput experiments use DNA barcodes, short DNA sequences prepended to DNA libraries, for identification of individuals in pooled biomolecule populations. However, DNA synthesis and sequencing errors confound the correct interpretation of observed barcodes and can lead to significant data loss or spurious results. Widely used error-correcting codes borrowed from computer science (e.g., Hamming, Levenshtein codes) do not properly account for insertions and deletions (indels) in DNA barcodes, even though deletions are the most common type of synthesis error. Here, we present and experimentally validate filled/truncated right end edit (FREE) barcodes, which correct substitution, insertion, and deletion errors, even when these errors alter the barcode length. FREE barcodes are designed with experimental considerations in mind, including balanced guanine-cytosine (GC) content, minimal homopolymer runs, and reduced internal hairpin propensity. We generate and include lists of barcodes with different lengths and error correction levels that may be useful in diverse high-throughput applications, including >10^6 single-error-correcting 16-mers that strike a balance between decoding accuracy, barcode length, and library size. Moreover, concatenating two or more FREE codes into a single barcode increases the available barcode space combinatorially, generating lists with >10^15 error-correcting barcodes. The included software for creating barcode libraries and decoding sequenced barcodes is efficient and designed to be user-friendly for the general biology community.


Subjects
Base Sequence , DNA Barcoding, Taxonomic , High-Throughput Nucleotide Sequencing , INDEL Mutation
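
The design constraints named in the abstract above (balanced GC content, short homopolymer runs) can be illustrated with a minimal sketch. This is not the FREE barcode software: the function names and the thresholds (40-60% GC, runs of at most 2) are assumptions chosen only for illustration, and hairpin propensity is not checked.

```python
from itertools import groupby


def gc_fraction(seq):
    """Fraction of bases in seq that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)


def longest_homopolymer(seq):
    """Length of the longest run of one repeated base."""
    return max(len(list(run)) for _, run in groupby(seq))


def passes_filters(seq, gc_min=0.4, gc_max=0.6, max_run=2):
    """True if seq meets the (hypothetical) GC and homopolymer thresholds."""
    return (gc_min <= gc_fraction(seq) <= gc_max
            and longest_homopolymer(seq) <= max_run)


candidates = ["ACGTACGTAGCT", "AAAACGCGTTTT", "ACGGGTCA"]
kept = [seq for seq in candidates if passes_filters(seq)]
```

A filter of this kind would be applied to candidate sequences before any error-correction properties are checked; the thresholds would depend on the experiment.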
6.
PLoS Pathog ; 12(12): e1006066, 2016 Dec.
Article in English | MEDLINE | ID: mdl-28027315

ABSTRACT

Schlafen11 (encoded by the SLFN11 gene) has been shown to inhibit the accumulation of HIV-1 proteins. We show that the SLFN11 gene is under positive selection in simian primates and is species-specific in its activity against HIV-1. The activity of human Schlafen11 is relatively weak compared to that of some other primate versions of this protein, with the versions encoded by chimpanzee, orangutan, gibbon, and marmoset being particularly potent inhibitors of HIV-1 protein production. Interestingly, we find that Schlafen11 is functional in the absence of infection and reduces protein production from certain non-viral (GFP) and even host (Vinculin and GAPDH) transcripts. This suggests that Schlafen11 may just generally block protein production from non-codon optimized transcripts. Because Schlafen11 is an interferon-stimulated gene with a broad ability to inhibit protein production from many host and viral transcripts, its role may be to create a general antiviral state in the cell. Interestingly, the strong inhibitors such as marmoset Schlafen11 consistently block protein production better than weak primate Schlafen11 proteins, regardless of the virus or host target being analyzed. Further, we show that the residues to which species-specific differences in Schlafen11 potency map are distinct from residues that have been targeted by positive selection. We speculate that the positive selection of SLFN11 could have been driven by a number of different factors, including interaction with one or more viral antagonists that have yet to be identified.


Subjects
Evolution, Molecular , Nuclear Proteins/immunology , Viral Proteins/immunology , Virus Diseases/immunology , Animals , Callithrix , Flow Cytometry , HIV-1/immunology , Humans , Hylobates , Immunoblotting , Mutagenesis, Site-Directed , Nuclear Proteins/genetics , Pan troglodytes , Polymerase Chain Reaction , Pongo , Primates , Selection, Genetic , Species Specificity , Transfection , Viral Proteins/genetics
7.
PLoS Genet ; 11(12): e1005732, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26656907

ABSTRACT

Ribosome profiling produces snapshots of the locations of actively translating ribosomes on messenger RNAs. These snapshots can be used to make inferences about translation dynamics. Recent ribosome profiling studies in yeast, however, have reached contradictory conclusions regarding the average translation rate of each codon. Some experiments have used cycloheximide (CHX) to stabilize ribosomes before measuring their positions, and these studies all counterintuitively report a weak negative correlation between the translation rate of a codon and the abundance of its cognate tRNA. In contrast, some experiments performed without CHX report strong positive correlations. To explain this contradiction, we identify unexpected patterns in ribosome density downstream of each type of codon in experiments that use CHX. These patterns are evidence that elongation continues to occur in the presence of CHX but with dramatically altered codon-specific elongation rates. The measured positions of ribosomes in these experiments therefore do not reflect the amounts of time ribosomes spend at each position in vivo. These results suggest that conclusions from experiments in yeast using CHX may need reexamination. In particular, we show that in all such experiments, codons decoded by less abundant tRNAs were in fact being translated more slowly before the addition of CHX disrupted these dynamics.


Subjects
Peptide Chain Elongation, Translational , Ribosomes/metabolism , Saccharomyces cerevisiae/genetics , Codon , Cycloheximide/pharmacology , Protein Synthesis Inhibitors/pharmacology , RNA, Transfer/genetics , RNA, Transfer/metabolism , Ribosomes/genetics , Saccharomyces cerevisiae/drug effects , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae Proteins/metabolism
8.
Proc Natl Acad Sci U S A ; 110(49): 19872-7, 2013 Dec 03.
Article in English | MEDLINE | ID: mdl-24243955

ABSTRACT

A major limitation of high-throughput DNA sequencing is the high rate of erroneous base calls produced. For instance, Illumina sequencing machines produce errors at a rate of ~0.1-1 × 10^-2 per base sequenced. These technologies typically produce billions of base calls per experiment, translating to millions of errors. We have developed a unique library preparation strategy, "circle sequencing," which allows for robust downstream computational correction of these errors. In this strategy, DNA templates are circularized, copied multiple times in tandem with a rolling circle polymerase, and then sequenced on any high-throughput sequencing machine. Each read produced is computationally processed to obtain a consensus sequence of all linked copies of the original molecule. Physically linking the copies ensures that each copy is independently derived from the original molecule and allows for efficient formation of consensus sequences. The circle-sequencing protocol precedes standard library preparations and is therefore suitable for a broad range of sequencing applications. We tested our method using the Illumina MiSeq platform and obtained errors in our processed sequencing reads at a rate as low as 7.6 × 10^-6 per base sequenced, dramatically improving the error rate of Illumina sequencing and putting error on par with low-throughput, but highly accurate, Sanger sequencing. Circle sequencing also had substantially higher efficiency and lower cost than existing barcode-based schemes for correcting sequencing errors.


Subjects
Computational Biology/methods , DNA, Circular/genetics , High-Throughput Nucleotide Sequencing/methods , Research Design , Gene Library
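
The consensus step described in the abstract above can be sketched as a per-position majority vote over the tandem copies in one read. This is a simplified illustration, not the published pipeline: it assumes the repeat length is already known and that copies contain substitutions only (no indels), and the function name is hypothetical.

```python
from collections import Counter


def consensus_from_tandem_read(read, unit_len):
    """Majority-vote consensus over the complete tandem copies in one read.

    Assumes unit_len (the template length) is known and that the copies
    are aligned end to end with no insertions or deletions.
    """
    copies = [read[i:i + unit_len]
              for i in range(0, len(read) - unit_len + 1, unit_len)]
    consensus = []
    for column in zip(*copies):
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


template = "ACGTTGCA"
# Three linked copies; the second carries one miscalled base (T -> A).
read = "ACGTTGCA" + "ACGTAGCA" + "ACGTTGCA"
recovered = consensus_from_tandem_read(read, len(template))
```

Because each copy derives independently from the original molecule, an error in one copy is outvoted by the others in this example.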
9.
Proc Natl Acad Sci U S A ; 109(26): 10409-13, 2012 Jun 26.
Article in English | MEDLINE | ID: mdl-22615375

ABSTRACT

The two-player Iterated Prisoner's Dilemma game is a model for both sentient and evolutionary behaviors, especially including the emergence of cooperation. It is generally assumed that there exists no simple ultimatum strategy whereby one player can enforce a unilateral claim to an unfair share of rewards. Here, we show that such strategies unexpectedly do exist. In particular, a player X who is witting of these strategies can (i) deterministically set her opponent Y's score, independently of his strategy or response, or (ii) enforce an extortionate linear relation between her and his scores. Against such a player, an evolutionary player's best response is to accede to the extortion. Only a player with a theory of mind about his opponent can do better, in which case Iterated Prisoner's Dilemma is an Ultimatum Game.


Subjects
Game Theory , Models, Psychological , Prisoners , Humans
10.
Proc Natl Acad Sci U S A ; 106(6): 1716-9, 2009 Feb 10.
Article in English | MEDLINE | ID: mdl-19188610

ABSTRACT

The use of profiling by ethnicity or nationality to trigger secondary security screening is a controversial social and political issue. Overlooked is the question of whether such actuarial methods are in fact mathematically justified, even under the most idealized assumptions of completely accurate prior probabilities, and secondary screenings concentrated on the highest-probability individuals. We show here that strong profiling (defined as screening at least in proportion to prior probability) is no more efficient than uniform random sampling of the entire population, because resources are wasted on the repeated screening of higher probability, but innocent, individuals. A mathematically optimal strategy would be "square-root biased sampling," the geometric mean between strong profiling and uniform sampling, with secondary screenings distributed broadly, although not uniformly, over the population. Square-root biased sampling is a general idea that can be applied whenever a "bell-ringer" event must be found by sampling with replacement, but can be recognized (either with certainty, or with some probability) when seen.


Subjects
Probability , Security Measures/standards , Ethnicity , Humans , Models, Statistical , Psychology, Social , Terrorism/prevention & control
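
The comparison in the abstract above can be checked numerically under a simple model: exactly one "bell-ringer" exists, individual i is that person with prior p_i, and each screening draws individual i with probability proportional to a weight w_i, with replacement. The expected number of screenings is then the sum over i of p_i * (sum of weights) / w_i. The population below is hypothetical and the function name is illustrative.

```python
import math


def expected_screenings(priors, weights):
    """Mean number of draws (with replacement) until the bell-ringer is drawn.

    priors[i]  : probability that individual i is the bell-ringer (sums to 1)
    weights[i] : unnormalized chance of picking individual i on each draw
    """
    total = sum(weights)
    # If i is the bell-ringer, the wait is geometric with mean total / weights[i].
    return sum(p * total / w for p, w in zip(priors, weights))


# Hypothetical population: 10 people carry half the prior, 990 share the rest.
priors = [0.05] * 10 + [0.5 / 990] * 990

uniform = expected_screenings(priors, [1.0] * len(priors))
strong = expected_screenings(priors, priors)  # weights proportional to prior
sqrt_biased = expected_screenings(priors, [math.sqrt(p) for p in priors])
```

In this model, uniform sampling and sampling in proportion to the prior both cost N screenings on average (here 1000), while square-root weights cost (sum of sqrt(p_i))^2, which is never larger than N.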
11.
Proc Natl Acad Sci U S A ; 106(52): 22387-92, 2009 Dec 29.
Article in English | MEDLINE | ID: mdl-20018711

ABSTRACT

As electronic medical records enable increasingly ambitious studies of treatment outcomes, ethical issues previously important only to limited clinical trials become relevant to unlimited whole populations. For randomized clinical trials, adaptive assignment strategies are known to expose substantially fewer patients to avoidable treatment failures than strategies with fixed assignments (e.g., equal sample sizes). An idealized adaptive case, the two-armed Bernoulli bandit problem, can be exactly optimized for a variety of ethically motivated cost functions that embody principles of duty-to-patient, but the solutions have been thought computationally infeasible when the number of patients in the study (the "horizon") is large. We report numerical experiments that yield a heuristic approximation that applies even to very large horizons, and we propose a near-optimal strategy that remains valid even when the horizon is unknown or unbounded, thus applicable to comparative effectiveness studies on large populations or to standard-of-care recommendations. For the case in which the economic cost of treatment is a parameter, we give a heuristic, near-optimal strategy for determining the superior treatment (whether more or less costly) while minimizing resources wasted on any inferior, more expensive, treatment. Key features of our heuristics can be generalized to more complicated protocols.


Subjects
Comparative Effectiveness Research/ethics , Randomized Controlled Trials as Topic/ethics , Bayes Theorem , Comparative Effectiveness Research/statistics & numerical data , Cost-Benefit Analysis/ethics , Cost-Benefit Analysis/statistics & numerical data , Electronic Health Records/ethics , Electronic Health Records/statistics & numerical data , Evidence-Based Medicine , Humans , Models, Statistical , Outcome Assessment, Health Care , Randomized Controlled Trials as Topic/economics , Randomized Controlled Trials as Topic/statistics & numerical data
13.
PNAS Nexus ; 1(5): pgac252, 2022 Nov.
Article in English | MEDLINE | ID: mdl-36712375

ABSTRACT

Predefined sets of short DNA sequences are commonly used as barcodes to identify individual biomolecules in pooled populations. Such use requires either sufficiently small DNA error rates, or else an error-correction methodology. Most existing DNA error-correcting codes (ECCs) correct only one or two errors per barcode in sets of typically ≲10^4 barcodes. We here consider the use of random barcodes of sufficient length that they remain accurately decodable even with ≳6 errors and even at [Formula: see text] or 20% nucleotide error rates. We show that length ∼34 nt is sufficient even with ≳10^6 barcodes. The obvious objection to this scheme is that it requires comparing every read to every possible barcode by a slow Levenshtein or Needleman-Wunsch comparison. We show that several orders of magnitude speedup can be achieved by (i) a fast triage method that compares only trimer (three consecutive nucleotide) occurrence statistics, precomputed in linear time for both reads and barcodes, and (ii) the massive parallelism available on today's even commodity-grade Graphics Processing Units (GPUs). With 10^6 barcodes of length 34 and 10% DNA errors (substitutions and indels), we achieve in simulation 99.9% precision (decode accuracy) with 98.8% recall (read acceptance rate). Similarly high precision with somewhat smaller recall is achievable even with 20% DNA errors. The amortized computation cost on a commodity workstation with two GPUs (2022 capability and price) is estimated as between US$ 0.15 and US$ 0.60 per million decoded reads.
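
The two-stage idea in the abstract above (a cheap trimer-based triage followed by a slow edit-distance comparison on the survivors) can be sketched on the CPU as follows. This is an illustration only: the overlap score (multiset intersection of trimer counts), the shortlist size, the toy barcodes, and all function names are assumptions, not the paper's statistic or its GPU implementation.

```python
from collections import Counter


def trimer_counts(seq):
    """Occurrence counts of every three-nucleotide window, in linear time."""
    return Counter(seq[i:i + 3] for i in range(len(seq) - 2))


def trimer_overlap(a, b):
    """Number of trimers shared by two profiles (multiset intersection)."""
    return sum((a & b).values())


def levenshtein(s, t):
    """Edit distance counting substitutions, insertions, and deletions."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                 # delete from s
                           cur[j - 1] + 1,              # insert into s
                           prev[j - 1] + (cs != ct)))   # match or substitute
        prev = cur
    return prev[-1]


def decode(read, barcodes, profiles, shortlist=2):
    """Index of the best barcode: triage by trimers, then edit distance."""
    read_profile = trimer_counts(read)
    ranked = sorted(range(len(barcodes)),
                    key=lambda k: -trimer_overlap(read_profile, profiles[k]))
    return min(ranked[:shortlist],
               key=lambda k: levenshtein(read, barcodes[k]))


barcodes = ["ACGTACGTTGCA", "TTGACCGATAGC", "GGCATCTAGGAT"]  # toy examples
profiles = [trimer_counts(b) for b in barcodes]              # precomputed once
read = "TTGACGATTGC"  # barcode 1 with one deletion and one substitution
best = decode(read, barcodes, profiles)
```

Only the shortlisted barcodes reach the quadratic-time edit-distance step, which is where the saving would come from when the barcode set is large.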

16.
Nat Biotechnol ; 39(1): 84-93, 2021 01.
Article in English | MEDLINE | ID: mdl-32895548

ABSTRACT

Engineered SpCas9s and AsCas12a cleave fewer off-target genomic sites than wild-type (wt) Cas9. However, understanding their fidelity, mechanisms and cleavage outcomes requires systematic profiling across mispaired target DNAs. Here we describe NucleaSeq (nuclease digestion and deep sequencing), a massively parallel platform that measures the cleavage kinetics and time-resolved cleavage products for over 10,000 targets containing mismatches, insertions and deletions relative to the guide RNA. Combining cleavage rates and binding specificities on the same target libraries, we benchmarked five SpCas9 variants and AsCas12a. A biophysical model built from these data sets revealed mechanistic insights into off-target cleavage. Engineered Cas9s, especially Cas9-HF1, dramatically increased cleavage specificity but not binding specificity compared to wtCas9. Surprisingly, AsCas12a cleavage specificity differed little from that of wtCas9. Initial DNA cleavage sites and end trimming varied by nuclease, guide RNA and the positions of mispaired nucleotides. More broadly, NucleaSeq enables rapid, quantitative and systematic comparisons of specificity and cleavage outcomes across engineered and natural nucleases.


Subjects
Bacterial Proteins , CRISPR-Associated Protein 9 , CRISPR-Associated Proteins , Endodeoxyribonucleases , High-Throughput Nucleotide Sequencing/methods , Bacterial Proteins/chemistry , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , CRISPR-Associated Protein 9/chemistry , CRISPR-Associated Protein 9/genetics , CRISPR-Associated Protein 9/metabolism , CRISPR-Associated Proteins/chemistry , CRISPR-Associated Proteins/genetics , CRISPR-Associated Proteins/metabolism , CRISPR-Cas Systems , Endodeoxyribonucleases/chemistry , Endodeoxyribonucleases/genetics , Endodeoxyribonucleases/metabolism , Gene Editing , Kinetics , Protein Binding/genetics , Protein Engineering , RNA, Guide, Kinetoplastida/chemistry , RNA, Guide, Kinetoplastida/genetics , RNA, Guide, Kinetoplastida/metabolism , Substrate Specificity/genetics
17.
Genetics ; 174(2): 1029-40, 2006 Oct.
Article in English | MEDLINE | ID: mdl-16951086

ABSTRACT

The genomes of mammals and birds can be partitioned into megabase-long regions, termed isochores, with consistently high, or low, average C + G content. Isochores with high CG contain a mixture of CG-rich and AT-rich genes, while high-AT isochores contain predominantly AT-rich genes. The two gene populations in the high-CG isochores are functionally distinguishable by statistical analysis of their gene ontology categories. However, the aggregate of the two populations in CG isochores is not statistically distinct from AT-rich genes in AT isochores. Genes tend to be located at local extrema of composition within the isochores, indicating that the CG-enriching mechanism acted differently when near to genes. On the other hand, maximum-likelihood reconstruction of molecular phylogenetic trees shows that branch lengths (evolutionary distances) for third codon positions in CG-rich genes are not substantially larger than those for AT-rich genes. In the context of neutral mutation theory this argues against any strong positive selection. Disparate features of isochores might be explained by a model in which about half of all genes functionally require AT richness, while, in warm-blooded organisms, about half the genome (in large coherent blocks) acquired a strong bias for mutations to CG. Using mutations in CG-rich genes as convenient indicators, we show that approximately 20% of amino acids in proteins are broadly substitutable, without regard to chemical similarity.


Subjects
Isochores/genetics , Animals , Dinucleotide Repeats/genetics , Genetics, Population , Genome, Human/physiology , Humans , Isochores/physiology , Phylogeny , Sequence Analysis, DNA , Zebrafish/genetics
19.
Cell Rep ; 8(6): 1624-1629, 2014 Sep 25.
Article in English | MEDLINE | ID: mdl-25199837

ABSTRACT

It has been proposed that patterns in the usage of synonymous codons provide evidence that individual tRNA molecules are recycled through the ribosome, translating several occurrences of the same amino acid before diffusing away. The claimed evidence is based on counting the frequency with which pairs of synonymous codons are used at nearby occurrences of the same amino acid, as compared to the frequency expected if each codon were chosen independently from a single genome-wide distribution. We show that such statistics simply measure variation in codon preferences across a genome. As a negative control on the potential contribution of pressure to exploit tRNA recycling on these signals, we examine correlations in the usage of codons that encode different amino acids. We find that these controls are statistically as strong as the claimed evidence and conclude that there is no informatic evidence that tRNA recycling is a force shaping codon usage.


Subjects
Models, Genetic , RNA, Transfer/metabolism , Codon , Databases, Genetic , Ribosomes/genetics , Ribosomes/metabolism , Yeasts/genetics