ABSTRACT
We constructed a corpus of digitized texts containing about 4% of all books ever printed. Analysis of this corpus enables us to investigate cultural trends quantitatively. We survey the vast terrain of 'culturomics,' focusing on linguistic and cultural phenomena that were reflected in the English language between 1800 and 2000. We show how this approach can provide insights about fields as diverse as lexicography, the evolution of grammar, collective memory, the adoption of technology, the pursuit of fame, censorship, and historical epidemiology. Culturomics extends the boundaries of rigorous quantitative inquiry to a wide array of new phenomena spanning the social sciences and the humanities.
Subjects
Books, Culture, Humanities, Linguistics, Literature, Social Sciences, Vocabulary, Algorithms, Cultural Evolution, Data Collection, Dictionaries as Topic, Encyclopedias as Topic, Famous Persons, Technology
ABSTRACT
Transposition of bacteriophage Mu uses two DNA cleavage sites and six transposase recognition sites, with each recognition site divided into two half-sites. The recognition sites can activate transposition of non-Mu DNA sequences if a complete set of Mu sequences is not available. We have analyzed 18 sequences from a non-Mu DNA molecule, selected in a functional assay for the ability to be transposed by MuA transposase. These sequences are remarkably diverse. Nonetheless, when viewed as a group they resemble a Mu DNA end, with a cleavage site and a single recognition site. Analysis of these "pseudo-Mu ends" indicates that most positions in the cleavage and recognition sites contribute sequence-specific information that helps drive transposition, though only the strongest contributors are apparent from mutagenesis data. The sequence analysis also suggests variability in the alignment of recognition half-sites. Transposition assays of specifically designed DNA substrates support the conclusion that the transposition machinery is flexible enough to permit variability in half-site spacing and also perhaps variability in the placement of the recognition site with respect to the cleavage site. This variability causes only local perturbations in the protein-DNA complex, as indicated by experiments in which altered and unaltered DNA substrates are paired.