ABSTRACT
The increasing proportion of variance in human complex traits explained by polygenic scores, along with progress in preimplantation genetic diagnosis, suggests the possibility of screening embryos for traits such as height or cognitive ability. However, the expected outcomes of embryo screening are unclear, which undermines discussion of associated ethical concerns. Here, we use theory, simulations, and real data to evaluate the potential gain of embryo screening, defined as the difference in trait value between the top-scoring embryo and the average embryo. The gain increases very slowly with the number of embryos but more rapidly with the variance explained by the score. Given current technology, the average gain due to screening would be ≈2.5 cm for height and ≈2.5 IQ points for cognitive ability. These mean values are accompanied by wide prediction intervals, and indeed, in large nuclear families, the majority of children top-scoring for height are not the tallest.
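The dependence of the gain on the number of embryos and on the variance explained can be illustrated with a simplified model. This sketch assumes that polygenic scores of sibling embryos are normally distributed with half the population score variance, and that the score is an unbiased predictor of the trait. It is not the study's full model. The function names and the example inputs are hypothetical.

```python
import math


def expected_max_std_normal(n: int, lo: float = -10.0, hi: float = 10.0,
                            steps: int = 20000) -> float:
    """E[max of n i.i.d. standard normals], by trapezoidal integration of
    x * n * phi(x) * Phi(x)**(n - 1) over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        pdf = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
        cdf = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
        weight = 0.5 if i in (0, steps) else 1.0  # trapezoid end-point weights
        total += weight * x * n * pdf * cdf ** (n - 1)
    return total * h


def screening_gain(n_embryos: int, r2: float, trait_sd: float) -> float:
    """Expected trait gain of the top-scoring embryo over the average embryo.

    Illustrative assumptions: the score explains a fraction r2 of trait
    variance in the population, scores among sibling embryos vary with half
    that variance, and the score is an unbiased predictor of the trait.
    """
    within_family_score_sd = trait_sd * math.sqrt(r2 / 2.0)
    return within_family_score_sd * expected_max_std_normal(n_embryos)
```

For example, `screening_gain(10, 0.25, 6.0)` evaluates the model for 10 embryos, a score explaining 25% of variance, and a trait SD of 6 cm (all hypothetical inputs). Because the expected maximum of N normals grows roughly like the square root of log N, the gain rises slowly with N but scales directly with the square root of r2.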
Subject(s)
Embryo, Mammalian/metabolism, Genetic Testing, Multifactorial Inheritance/genetics, Adult, Family, Genome-Wide Association Study, Humans, Phenotype
ABSTRACT
Despite intensive study, most of the specific genetic factors that contribute to variation in human height remain undiscovered. We conducted a family-based linkage study of height in a unique cohort of very large nuclear families from a founder (Jewish) population. This design allowed for increased power to detect linkage, compared to previous family-based studies. Loci we identified in discovery families could explain an estimated lower bound of 6% of the variance in height in validation families. We showed that these loci are not tagging known common variants associated with height. Rather, we suggest that the observed signals arise from variants with large effects that are rare globally but elevated in frequency in the Jewish population.
Subject(s)
Body Height/genetics, Chromosome Mapping/methods, Jews/genetics, Quantitative Trait Loci, Adult, Aged, Aged, 80 and over, Cohort Studies, Female, Gene Frequency, Genetic Linkage, Humans, Male, Middle Aged, Pedigree, Young Adult
ABSTRACT
Recent studies have shown a surprising phenomenon, whereby orthologous regulatory regions from different species drive similar expression levels despite being highly diverged in sequence. Here, we investigated this phenomenon by genomically integrating hundreds of ribosomal protein (RP) promoters from nine different yeast species into S. cerevisiae and accurately measuring their activity. We found that orthologous RP promoters have extreme expression conservation even across evolutionarily distinct yeast species. Notably, our measurements reveal two distinct mechanisms that underlie this conservation and which act in different regions of the promoter. In the core promoter region, we found compensatory changes, whereby effects of sequence variations in one part of the core promoter were reversed by variations in another part. In contrast, we observed robustness in Rap1 transcription factor binding sites, whereby significant sequence variations had little effect on promoter activity. Finally, cases in which orthologous promoter activities were not conserved could largely be explained by the sequence variation within the core promoter. Together, our results provide novel insights into the mechanisms by which expression is conserved throughout evolution across diverged promoter sequences.
Subject(s)
Promoter Regions, Genetic, Ribosomal Proteins/genetics, Saccharomyces cerevisiae/genetics, Binding Sites, Evolution, Molecular, Gene Expression Regulation, Fungal, Genetic Variation, Mutation, Protein Binding, Saccharomyces cerevisiae/metabolism, Transcription Factors/metabolism
ABSTRACT
The Gene Promoter Expression Prediction challenge consisted of predicting gene expression from promoter sequences in a previously unknown experimentally generated data set. The challenge was presented to the community in the framework of the sixth Dialogue for Reverse Engineering Assessments and Methods (DREAM6), a community effort to evaluate the status of systems biology modeling methodologies. Nucleotide-specific promoter activity was obtained by measuring fluorescence from promoter sequences fused upstream of a gene for yellow fluorescent protein and inserted in the same genomic site of the yeast Saccharomyces cerevisiae. Twenty-one teams submitted results predicting the expression levels of 53 different promoters from yeast ribosomal protein genes. Analysis of participant predictions shows that accurate values for low-expressed and mutated promoters were difficult to obtain, although in the latter case, only when the mutation induced a large change in promoter activity compared to the wild-type sequence. As in previous DREAM challenges, we found that aggregation of participant predictions provided robust results, but did not fare better than the three best algorithms. Finally, this study not only provides a benchmark for the assessment of methods predicting activity of a specific set of promoters from their sequence, but it also shows that the top performing algorithm, which used machine-learning approaches, can be improved by the addition of biological features such as transcription factor binding sites.
Subject(s)
Crowdsourcing, Gene Expression, Promoter Regions, Genetic, Ribosomal Proteins/genetics, Ribosomes/genetics, Saccharomyces cerevisiae/genetics, Algorithms, Binding Sites/genetics, Gene Expression Profiling, Gene Expression Regulation, Fungal, Gene Regulatory Networks, Genes, Fungal, Models, Genetic, Mutation, Regulatory Elements, Transcriptional, Ribosomes/metabolism, Saccharomyces cerevisiae/metabolism, Saccharomyces cerevisiae Proteins/genetics, Saccharomyces cerevisiae Proteins/metabolism, Systems Biology
ABSTRACT
The 5'-untranslated region (5'-UTR) of mRNAs contains elements that affect expression, yet the rules by which these regions exert their effect are poorly understood. Here, we studied the impact of 5'-UTR sequences on protein levels in yeast, by constructing a large-scale library of mutants that differ only in the 10 bp preceding the translational start site of a fluorescent reporter. Using a high-throughput sequencing strategy, we obtained highly accurate measurements of protein abundance for over 2,000 unique sequence variants. The resulting pool spanned an approximately sevenfold range of protein levels, demonstrating the powerful consequences of sequence manipulations of even 1-10 nucleotides immediately upstream of the start codon. We devised computational models that predicted over 70% of the measured expression variability in held-out sequence variants. Notably, a combined model of the most prominent features successfully explained protein abundance in an additional, independently constructed library, whose nucleotide composition differed greatly from the library used to parameterize the model. Our analysis reveals the dominant contribution of the start codon context at positions -3 to -1, mRNA secondary structure, and out-of-frame upstream AUGs (uAUGs) to phenotypic diversity, thereby advancing our understanding of how protein levels are modulated by 5'-UTR sequences, and paving the way toward predictably tuning protein expression through manipulations of 5'-UTRs.
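Two of the features highlighted above, the start-codon context at positions -3 to -1 and upstream AUGs that are out of frame with the main start codon, are simple to compute from a 5'-UTR sequence. This is a minimal sketch with hypothetical helper names; it does not reproduce the study's models.

```python
def upstream_augs(utr: str):
    """Return (position, in_frame) for each AUG in a 5'-UTR.

    utr is the RNA (or DNA) sequence immediately upstream of the main start
    codon, 5'->3'. Positions are 0-based from the UTR start. A uAUG is in
    frame with the main ORF when its distance to the main start codon is a
    multiple of 3.
    """
    seq = utr.upper().replace("T", "U")
    hits = []
    for i in range(len(seq) - 2):
        if seq[i:i + 3] == "AUG":
            distance_to_main_start = len(seq) - i
            hits.append((i, distance_to_main_start % 3 == 0))
    return hits


def start_context(utr: str) -> str:
    """Nucleotides at positions -3 to -1 relative to the A of the main AUG."""
    return utr.upper().replace("T", "U")[-3:]
```

For instance, in the 10-nt sequence `CCAUGCAAAA` the uAUG at position 2 lies 8 nt from the main start, so it is out of frame.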
Subject(s)
5' Untranslated Regions, Fungal Proteins/metabolism, Saccharomyces cerevisiae/metabolism, Base Sequence, Codon, Initiator, DNA Primers, Fungal Proteins/genetics, Nucleic Acid Conformation, RNA, Messenger/genetics, Saccharomyces cerevisiae/genetics
ABSTRACT
Coordinate regulation of ribosomal protein (RP) genes is key for controlling cell growth. In yeast, it is unclear how this regulation achieves the required equimolar amounts of the different RP components, given that some RP genes exist in duplicate copies, while others have only one copy. Here, we tested whether the solution to this challenge is partly encoded within the DNA sequence of the RP promoters, by fusing 110 different RP promoters to a fluorescent gene reporter, allowing us to robustly detect differences in their promoter activities that are as small as ~10%. We found that single-copy RP promoters have significantly higher activities, suggesting that proper RP stoichiometry is indeed partly encoded within the RP promoters. Notably, we also partially uncovered how this regulation is encoded by finding that RP promoters with higher activity have more nucleosome-disfavoring sequences and characteristic spatial organizations of these sequences and of binding sites for key RP regulators. Mutations in these elements result in a significant decrease of RP promoter activity. Thus, our results suggest that intrinsic (DNA-dependent) nucleosome organization may be a key mechanism by which genomes encode biologically meaningful promoter activities. Our approach can readily be applied to uncover how transcriptional programs of other promoters are encoded.
Subject(s)
Gene Dosage/physiology, Gene Expression Regulation, Fungal/physiology, Genome, Fungal/physiology, Ribosomal Proteins/biosynthesis, Saccharomyces cerevisiae Proteins/biosynthesis, Saccharomyces cerevisiae/metabolism, Nucleosomes/genetics, Nucleosomes/metabolism, Ribosomal Proteins/genetics, Saccharomyces cerevisiae/genetics, Saccharomyces cerevisiae Proteins/genetics
ABSTRACT
Most genes change expression levels across conditions, but it is unclear which of these changes represents specific regulation and what determines their quantitative degree. Here, we accurately measured activities of ~900 S. cerevisiae and ~1800 E. coli promoters using fluorescent reporters. We show that in both organisms 60-90% of promoters change their expression between conditions by a constant global scaling factor that depends only on the conditions and not on the promoter's identity. Quantifying such global effects allows precise characterization of specific regulation, that is, of promoters deviating from the global scale line. These are organized into a few functionally related groups that also adhere to scale lines and preserve their relative activities across conditions. Thus, only a few scaling factors suffice to accurately describe genome-wide expression profiles across conditions. We present a parameter-free passive resource allocation model that quantitatively accounts for the global scaling factors. It suggests that many changes in expression across conditions result from global effects and not specific regulation, and provides means for quantitative interpretation of expression profiles.
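The idea of a global scale line can be sketched as follows: estimate one scaling factor between two conditions and flag promoters whose change departs from it. Using the median ratio as the estimator and a two-fold tolerance are assumptions made for illustration, not the study's procedure, and the function name is hypothetical.

```python
import math
from statistics import median


def global_scaling(cond_a, cond_b, tolerance=math.log(2)):
    """Estimate a single global scaling factor between two conditions and
    flag promoters that deviate from the resulting scale line.

    cond_a, cond_b: dicts mapping promoter name -> activity (positive).
    The scaling factor is the median ratio b/a (a robust choice assumed
    here). A promoter is flagged as specifically regulated when its log
    ratio differs from the log scaling factor by more than `tolerance`
    (default: two-fold, an arbitrary illustrative threshold).
    """
    names = sorted(set(cond_a) & set(cond_b))
    log_ratios = {n: math.log(cond_b[n] / cond_a[n]) for n in names}
    log_scale = median(log_ratios.values())
    deviating = [n for n in names if abs(log_ratios[n] - log_scale) > tolerance]
    return math.exp(log_scale), deviating
```

A robust estimator such as the median is convenient here because, if most promoters follow the global scale line, the minority that are specifically regulated barely move the estimate.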
Subject(s)
Escherichia coli Proteins/genetics, Escherichia coli/genetics, Gene Expression Regulation, Bacterial, Gene Expression Regulation, Fungal, Promoter Regions, Genetic, Saccharomyces cerevisiae Proteins/genetics, Saccharomyces cerevisiae/genetics, Culture Media, Escherichia coli/growth & development, Escherichia coli/metabolism, Escherichia coli Proteins/metabolism, Genes, Reporter, Glucose/metabolism, Luminescent Proteins/genetics, Luminescent Proteins/metabolism, Models, Genetic, Saccharomyces cerevisiae/growth & development, Saccharomyces cerevisiae/metabolism, Saccharomyces cerevisiae Proteins/metabolism, Red Fluorescent Protein
ABSTRACT
A full understanding of gene regulation requires an understanding of the contributions that the various regulatory regions make to gene expression. Although it is well established that sequences downstream of the main promoter can affect expression, our understanding of the scale of this effect and how it is encoded in the DNA is limited. Here, to measure the effect of native S. cerevisiae 3' end sequences on expression, we constructed a library of 85 fluorescent reporter strains that differ only in their 3' end region. Notably, despite being driven by the same strong promoter, our library spans a continuous twelve-fold range of expression values. These measurements correlate with endogenous mRNA levels, suggesting that the 3' end contributes to constitutive differences in mRNA levels. We used deep sequencing to map the 3'UTR ends of our strains and show that determination of polyadenylation sites is intrinsic to the local 3' end sequence. Sequence analysis following polyadenylation mapping showed that increased A/T content upstream of the main polyadenylation site correlates with higher expression, both in the library and genome-wide, suggesting that native genes differ in the encoded efficiency of 3' end processing. Finally, we use single-cell fluorescence measurements at different promoter activation levels to show that 3' end sequences modulate protein expression dynamics differently than promoters do, predominantly affecting the size of protein production bursts rather than the frequency at which these bursts occur. Altogether, our results lead to a more complete understanding of gene regulation by demonstrating that 3' end regions have a unique and sequence-dependent effect on gene expression.
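How burst size can be separated from burst frequency using single-cell data can be illustrated with the standard steady-state gamma model of bursty expression, in which the protein distribution has mean a·b and variance a·b², with a the number of bursts per protein lifetime and b the mean burst size. Whether the study used exactly this estimator is not stated in the abstract; the function name is hypothetical.

```python
from statistics import mean, pvariance


def burst_parameters(protein_levels):
    """Estimate burst size and burst frequency from single-cell protein levels.

    Assumes the steady-state gamma model of bursty expression: protein
    copy number has shape a (bursts per protein lifetime) and scale b
    (mean proteins per burst), so mean = a*b and variance = a*b**2.
    Fluorescence must be converted to molecule counts, and extrinsic noise
    ignored, for the estimates to be meaningful.
    """
    m = mean(protein_levels)
    v = pvariance(protein_levels)
    burst_size = v / m           # b = Fano factor
    burst_frequency = m * m / v  # a = mean**2 / variance
    return burst_size, burst_frequency
```

Under this model, a change that scales burst size raises the mean and the Fano factor together, whereas a change in burst frequency raises the mean while leaving the Fano factor unchanged. Comparing these signatures across promoter activation levels is what allows the two mechanisms to be distinguished.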
Subject(s)
3' Untranslated Regions, Gene Expression Regulation, Fungal, RNA, Messenger/genetics, RNA, Messenger/metabolism, Base Composition, Computational Biology, Genes, Fungal, Genes, Reporter, Poly A/genetics, Poly A/metabolism, Promoter Regions, Genetic, Saccharomyces cerevisiae/genetics, Saccharomyces cerevisiae/metabolism
ABSTRACT
Background: For the past 50 years, standard guidelines have recommended the use of sex-adjusted mid-parental height to predict a child's final height. Here, we studied the accuracy of this procedure. Methods: We used height data from a cohort of 23 very large nuclear families (mean = 11 adult children per family). We compared the actual final height of the children to their height predicted by the standard procedure, as well as to alternative height predictions that incorporate corrections of mid-parental height for age, sex, and regression to the mean. Results: Standard mid-parental height explained 36% of the variance in children's heights, with a heritability of 74%, and children were on average 2.7 cm taller than predicted by their target heights. When we introduced a nonlinear correction for the age of the parents, employed a multiplicative (rather than additive) correction for sex, and accounted for regression to the mean, the variance explained increased to 40%, heritability increased to 80%, and prediction bias was reduced from 2.7 cm to 0.14 cm (an improvement in prediction of half a standard deviation of the height distribution). We further measured the empirical distribution of the heights of adult children around their predicted height, and we describe how this distribution can be used to estimate the probability that a child's height is within the normal expected range. Conclusions and Relevance: Based on these observations, we propose an improved method for predicting children's target heights. Our procedure for determining whether the deviation of a child's projected height from the target height is in the normal range can be used to assess whether the child should be tested further for potential medical abnormalities.
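The contrast between the standard additive sex adjustment and a multiplicative one with regression to the mean can be sketched as follows. The standard formula uses the commonly cited 13 cm sex offset. The adjusted variant is only an illustration of the ideas named in the abstract: `male_female_ratio`, `population_mean_cm`, and `regression_coef` are hypothetical inputs to be estimated from data, and the study's nonlinear age correction is omitted.

```python
SEX_OFFSET_CM = 13.0  # conventional adult male-female height difference


def standard_target_height(father_cm: float, mother_cm: float, boy: bool) -> float:
    """Sex-adjusted mid-parental height as in standard guidelines."""
    offset = SEX_OFFSET_CM if boy else -SEX_OFFSET_CM
    return (father_cm + mother_cm + offset) / 2.0


def adjusted_target_height(father_cm, mother_cm, boy, male_female_ratio,
                           population_mean_cm, regression_coef):
    """Illustrative variant with a multiplicative sex correction and
    regression to the mean (all extra parameters are hypothetical inputs;
    no age correction is applied)."""
    # Put the mother on the male scale by multiplying, not adding.
    midparent_male = (father_cm + mother_cm * male_female_ratio) / 2.0
    # Convert back to the child's sex.
    midparent = midparent_male if boy else midparent_male / male_female_ratio
    # Shrink toward the population mean for the child's sex.
    return population_mean_cm + regression_coef * (midparent - population_mean_cm)
```

For a father of 180 cm and a mother of 165 cm, the standard formula gives 179 cm for a boy and 166 cm for a girl. A `regression_coef` below 1 pulls predictions for children of unusually tall or short parents toward the population mean.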
ABSTRACT
OBJECTIVES: Despite the success in developing COVID-19 vaccines, containment of the disease is obstructed worldwide by vaccine production bottlenecks, logistics hurdles, vaccine refusal, transmission through unvaccinated children, and the appearance of new viral variants. This underscores the need for effective strategies for identifying carriers and patients, which was the main aim of this study. METHODS: We present a bubble-based PCR testing approach using swab pooling into lysis buffer. A bubble is a cluster of people who can be periodically tested for SARS-CoV-2 by swab pooling. A positive pool test mandates quarantining each of its members, who are then individually tested while in isolation to identify the carrier(s) for further epidemiological contact tracing. RESULTS: We tested an overall sample of 25 831 individuals, divided into 1273 bubbles, with an average size of 20.3 ± 7.7 swabs per test tube. Relative to individual testing, all pools (≤37 swabs/pool) had a specificity of 97.5% (lower bound 96.6%) and a sensitivity of 86.3% (lower bound 78.2%); in a post hoc analysis of pools with ≤25 swabs, sensitivity was 94.6% (lower bound 86.7%) and specificity was 97.2% (lower bound 96.2%). DISCUSSION: This approach offers a significant scale-up in sampling and testing throughput and savings in testing cost, without reducing sensitivity or affecting the standard PCR testing laboratory routine. It can be used in school classes, airplanes, hospitals, military units, and workplaces, and may be applicable to future pandemics.
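The cost savings of the bubble scheme follow the logic of classic two-stage (Dorfman) pooling: one test per pool, then individual tests for every member of a positive pool. The sketch below computes the expected number of tests per person under idealized assumptions (independent infections, a perfect assay), which ignores the imperfect pooled sensitivity reported above; the function name and any prevalence values used with it are hypothetical.

```python
def expected_tests_per_person(prevalence: float, pool_size: int) -> float:
    """Expected PCR tests per person under two-stage (Dorfman) pooling:
    one test per pool, plus individual tests for every member of a
    positive pool. Assumes independent infections and a perfect assay."""
    p_pool_negative = (1.0 - prevalence) ** pool_size
    return 1.0 / pool_size + (1.0 - p_pool_negative)
```

At a hypothetical prevalence of 1% and pools of 20, this gives roughly 0.23 tests per person, about a four-fold saving over individual testing; the saving shrinks as prevalence rises, because more pools test positive.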
Subject(s)
COVID-19, SARS-CoV-2, COVID-19/diagnosis, COVID-19 Testing, COVID-19 Vaccines, Child, Humans, Pandemics, RNA, Viral, SARS-CoV-2/genetics, Sensitivity and Specificity, Specimen Handling
ABSTRACT
Conducting numerous, rapid, and reliable PCR tests for SARS-CoV-2 is essential for our ability to monitor and control the current COVID-19 pandemic. Here, we tested the sensitivity and efficiency of SARS-CoV-2 detection in clinical samples collected directly into a mix of lysis buffer and RNA preservative, thus inactivating the virus immediately after sampling. We tested 79 COVID-19 patients and 20 healthy controls. We collected two samples (nasopharyngeal swabs) from each participant: one swab was inserted into a test tube with Viral Transport Medium (VTM), following the standard guideline used as the recommended method for sample collection; the other swab was inserted into a lysis buffer supplemented with nucleic acid stabilization mix (termed NSLB). We found that RT-qPCR tests of patients were significantly more sensitive with NSLB sampling, reaching the detection threshold 2.1±0.6 (mean±SE) PCR cycles earlier than VTM samples from the same patient. We show that this improvement most likely arises because NSLB samples are not diluted in lysis buffer before RNA extraction. Re-extracting RNA from NSLB samples after 72 hours at room temperature did not affect the sensitivity of detection, demonstrating that NSLB allows for long periods of sample preservation without special cooling equipment. We also show that swirling the swab in NSLB and discarding it did not reduce sensitivity compared to retaining the swab in the tube, thus allowing improved automation of COVID-19 tests. Overall, we show that using NSLB instead of VTM can improve the sensitivity, safety, and rapidity of COVID-19 tests when they are most needed.
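The size of a cycle-threshold (Ct) shift can be translated into a fold difference in starting template. With ideal amplification efficiency, each cycle doubles the product. That ideal is an assumption that real assays only approximate. The function name is hypothetical.

```python
def fold_change_from_delta_ct(delta_ct: float, efficiency: float = 1.0) -> float:
    """Fold difference in starting template implied by a Ct difference.

    efficiency is the per-cycle amplification efficiency (1.0 = perfect
    doubling each cycle), an idealization of real assays.
    """
    return (1.0 + efficiency) ** delta_ct
```

Under this idealization, the reported 2.1-cycle advantage corresponds to about 2**2.1 ≈ 4.3-fold more detectable template. By the same logic, a single two-fold dilution step alone would delay detection by about one cycle, which is consistent with dilution being a plausible contributor to the difference.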
Subject(s)
Limit of Detection, SARS-CoV-2/isolation & purification, Safety, Specimen Handling/methods, Adult, Buffers, Female, Humans, Male, Pandemics, Polymerase Chain Reaction, SARS-CoV-2/genetics, Time Factors
ABSTRACT
Understanding how precise control of gene expression is specified within regulatory DNA sequences is a key challenge with far-reaching implications. Many studies have focused on the regulatory role of transcription factor-binding sites. Here, we explore the transcriptional effects of a different class of elements, nucleosome-disfavoring sequences, and specifically poly(dA:dT) tracts, which are highly prevalent in eukaryotic promoters. By measuring promoter activity for a large-scale promoter library, designed with systematic manipulations to the properties and spatial arrangement of poly(dA:dT) tracts, we show that these tracts significantly and causally affect transcription. We show that manipulating these elements offers a general genetic mechanism, applicable to promoters regulated by different transcription factors, for tuning expression in a predictable manner, with resolution that can be even finer than that attained by altering transcription factor sites. Overall, our results advance the understanding of the regulatory code and suggest a potential mechanism by which promoters yielding prespecified expression patterns can be designed.
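Locating poly(dA:dT) tracts in a promoter sequence is a natural first step when analyzing or designing such manipulations. This minimal sketch treats a tract as a maximal run of A or of T on the given strand (a run of T on one strand is a run of A on the other). The minimum length of 8 is an illustrative assumption, not a threshold taken from the study, and the function name is hypothetical.

```python
import re


def poly_da_dt_tracts(seq: str, min_len: int = 8):
    """Find poly(dA:dT) tracts: maximal runs of A or of T on the given strand.

    A run of T on the top strand is a run of A on the bottom strand, so both
    count. min_len is an illustrative threshold. Returns (start, end, base)
    tuples with 0-based, end-exclusive coordinates.
    """
    pattern = re.compile(r"A{%d,}|T{%d,}" % (min_len, min_len))
    return [(m.start(), m.end(), m.group()[0]) for m in pattern.finditer(seq.upper())]
```

Because the regex quantifiers are greedy, each reported tract is a maximal run, so its length and distance from other promoter elements (the properties varied in the library) can be read directly from the coordinates.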
Subject(s)
Gene Expression Regulation, Fungal, Genes, Fungal, Nucleosomes/genetics, Yeasts/genetics, Base Sequence, Binding Sites, DNA, Fungal/genetics, Molecular Sequence Data, Promoter Regions, Genetic, Transcription Factors/genetics, Transcription, Genetic
ABSTRACT
Despite extensive research, our understanding of the rules according to which cis-regulatory sequences are converted into gene expression is limited. We devised a method for obtaining parallel, highly accurate gene expression measurements from thousands of designed promoters and applied it to measure the effect of systematic changes in the location, number, orientation, affinity and organization of transcription-factor binding sites and nucleosome-disfavoring sequences. Our analyses reveal a clear relationship between expression and binding-site multiplicity, as well as dependencies of expression on the distance between transcription-factor binding sites and gene starts which are transcription-factor specific, including a striking ~10-bp periodic relationship between gene expression and binding-site location. We show how this approach can measure transcription-factor sequence specificities and the sensitivity of transcription-factor sites to the surrounding sequence context, and compare the activity of 75 yeast transcription factors. Our method can be used to study both cis and trans effects of genotype on transcriptional, post-transcriptional and translational control.
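A periodic dependence of expression on binding-site location (of the order of the DNA helical repeat) can be detected by measuring how strongly expression varies sinusoidally at each candidate period. The sketch below uses a simple Fourier-style projection. It is one possible approach, not the study's analysis, and it assumes positions spread roughly uniformly over several periods. The function names are hypothetical.

```python
import math


def periodic_amplitude(positions, values, period):
    """Amplitude of the sinusoidal component of `values` at a given period.

    Projects mean-centred values onto cos and sin of 2*pi*position/period.
    """
    n = len(values)
    mu = sum(values) / n
    c = sum((v - mu) * math.cos(2 * math.pi * x / period)
            for x, v in zip(positions, values))
    s = sum((v - mu) * math.sin(2 * math.pi * x / period)
            for x, v in zip(positions, values))
    return 2.0 * math.hypot(c, s) / n


def best_period(positions, values, candidates):
    """Candidate period with the largest periodic amplitude."""
    return max(candidates, key=lambda p: periodic_amplitude(positions, values, p))
```

Combining the cosine and sine projections makes the amplitude independent of phase, so the test detects periodicity wherever the expression maxima fall relative to the gene start.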