ABSTRACT
The increasing proportion of variance in human complex traits explained by polygenic scores, along with progress in preimplantation genetic diagnosis, suggests the possibility of screening embryos for traits such as height or cognitive ability. However, the expected outcomes of embryo screening are unclear, which undermines discussion of associated ethical concerns. Here, we use theory, simulations, and real data to evaluate the potential gain of embryo screening, defined as the difference in trait value between the top-scoring embryo and the average embryo. The gain increases very slowly with the number of embryos but more rapidly with the variance explained by the score. Given current technology, the average gain due to screening would be ≈2.5 cm for height and ≈2.5 IQ points for cognitive ability. These mean values are accompanied by wide prediction intervals, and indeed, in large nuclear families, the majority of children top-scoring for height are not the tallest.
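The gain defined above can be illustrated with a minimal simulation sketch. It rests on assumptions that are not stated in the abstract: embryo scores within a family are drawn as independent normal deviates around the parental mean, and the within-family score variance is half the population variance explained by the score (the usual result for an additive score under random mating). The function name and all input values are hypothetical, chosen only for illustration.

```python
import random
import statistics


def simulate_mean_gain(n_embryos, r2, trait_sd, n_families=20000, seed=0):
    """Monte Carlo estimate of the mean gain from picking the top-scoring embryo.

    n_embryos : number of embryos per family
    r2        : proportion of trait variance explained by the score (assumed input)
    trait_sd  : population standard deviation of the trait, in trait units
    """
    rng = random.Random(seed)
    # Assumption: half of the score variance segregates within a family.
    within_sd = trait_sd * (r2 / 2) ** 0.5
    gains = []
    for _ in range(n_families):
        scores = [rng.gauss(0.0, within_sd) for _ in range(n_embryos)]
        # Gain = predicted trait value of the top embryo minus the family average.
        gains.append(max(scores) - statistics.fmean(scores))
    return statistics.fmean(gains)


if __name__ == "__main__":
    # Placeholder inputs, not figures taken from the study.
    for n in (2, 5, 10, 20):
        print(n, round(simulate_mean_gain(n, r2=0.1, trait_sd=6.0), 2))
```

Varying `n_embryos` and `r2` separately lets a reader compare how each one moves the mean gain under these assumptions. The sketch reports only the mean and says nothing about the prediction intervals mentioned in the abstract.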
Subjects
Mammalian Embryo/metabolism, Genetic Testing, Multifactorial Inheritance/genetics, Adult, Family, Genome-Wide Association Study, Humans, Phenotype
ABSTRACT
Theoretical guarantees for causal inference using propensity scores are partially based on the scores behaving like conditional probabilities. However, prediction scores between zero and one do not necessarily behave like probabilities, especially when output by flexible statistical estimators. We perform a simulation study to assess the error in estimating the average treatment effect before and after applying a simple and well-established postprocessing method to calibrate the propensity scores. We observe that postcalibration reduces the error in effect estimation and that larger improvements in calibration result in larger improvements in effect estimation. Specifically, we find that expressive tree-based estimators, which are often less calibrated than logistic regression-based models initially, tend to show larger improvements relative to logistic regression-based models. Given the improvement in effect estimation and that postcalibration is computationally cheap, we recommend its adoption when modeling propensity scores with expressive models.
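The calibration step can be illustrated with a small sketch. The abstract does not name the postprocessing method, so Platt scaling (a two-parameter logistic fit on the logit of the raw score) is used here as one common choice, and the data-generating process is a toy assumption. All function names are hypothetical. The sketch covers calibration only, not the treatment-effect estimation.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logit(p, eps=1e-6):
    # Clip to avoid infinities at 0 or 1.
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def fit_platt(scores, treated, lr=0.2, n_iter=500):
    """Fit a, b so that sigmoid(a * logit(score) + b) approximates P(treated)."""
    z = [logit(s) for s in scores]
    a, b = 1.0, 0.0
    n = len(z)
    for _ in range(n_iter):
        grad_a = grad_b = 0.0
        for zi, ti in zip(z, treated):
            err = sigmoid(a * zi + b) - ti  # gradient of the log loss
            grad_a += err * zi
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def calibrate(score, a, b):
    return sigmoid(a * logit(score) + b)


def demo(n=2000, seed=0):
    """Toy setup: the true propensity is sigmoid(x); the raw score is overconfident."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    true_p = [sigmoid(xi) for xi in x]
    treated = [1 if rng.random() < p else 0 for p in true_p]
    raw = [sigmoid(3.0 * xi) for xi in x]  # assumed miscalibrated estimator
    a, b = fit_platt(raw, treated)
    cal = [calibrate(s, a, b) for s in raw]

    def mse(est):
        return sum((e - p) ** 2 for e, p in zip(est, true_p)) / n

    return a, b, mse(raw), mse(cal)


if __name__ == "__main__":
    a, b, mse_raw, mse_cal = demo()
    print(f"a={a:.3f} b={b:.3f} mse_raw={mse_raw:.4f} mse_cal={mse_cal:.4f}")
```

For brevity the sketch fits and evaluates on the same sample. In practice the calibration map would normally be fitted on held-out data.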
Subjects
Probability, Propensity Score, Humans, Logistic Models, Computer Simulation, Calibration, Statistical Models, Causality
ABSTRACT
Bacterial RNase III plays important roles in the processing and degradation of RNA transcripts. A major goal is to identify the cleavage targets of this endoribonuclease at a transcriptome-wide scale and delineate its in vivo cleavage rules. Here we applied a tailored RNA-seq-based technology, which allows transcriptome-wide mapping of RNase III cleavage sites at nucleotide resolution, to Escherichia coli grown to either exponential or stationary phase. Our analysis of the large-scale in vivo cleavage data substantiated the established cleavage pattern of a double cleavage in an intra-molecular stem structure, leaving 2-nt-long 3' overhangs, and refined the base-pairing preferences in the cleavage site vicinity. Intriguingly, we observed that the two stem positions between the cleavage sites are highly base-paired, usually involving at least one G-C or C-G base pair. We present a clear distinction between intra-molecular stem structures that are RNase III substrates and intra-molecular stem structures randomly selected across the transcriptome, emphasizing the in vivo specificity of RNase III. Our study provides a comprehensive map of the cleavage sites in both intra-molecular and inter-molecular duplex substrates, providing novel insights into the involvement of RNase III in post-transcriptional regulation in the bacterial cell.
Subjects
Escherichia coli Proteins/genetics, Escherichia coli/genetics, Bacterial Gene Expression Regulation, Bacterial RNA/genetics, Messenger RNA/genetics, Ribonuclease III/genetics, Base Pairing, Base Sequence, Binding Sites, Escherichia coli/metabolism, Escherichia coli Proteins/chemistry, Escherichia coli Proteins/metabolism, Nucleic Acid Conformation, Protein Binding, RNA Cleavage, Post-Transcriptional RNA Processing, Bacterial RNA/chemistry, Bacterial RNA/metabolism, Messenger RNA/chemistry, Messenger RNA/metabolism, Ribonuclease III/chemistry, Ribonuclease III/metabolism, RNA Sequence Analysis, Substrate Specificity, Transcriptome
ABSTRACT
Background: Computational models on the basis of deep neural networks are increasingly used to analyze health care data. However, the efficacy of traditional computational models in radiology is a matter of debate.
Purpose: To evaluate the accuracy and efficiency of a combined machine and deep learning approach for early breast cancer detection applied to a linked set of digital mammography images and electronic health records.
Materials and Methods: In this retrospective study, 52 936 images were collected in 13 234 women who underwent at least one mammogram between 2013 and 2017, and who had health records for at least 1 year before undergoing mammography. The algorithm was trained on 9611 mammograms and health records of women to make two breast cancer predictions: to predict biopsy malignancy and to differentiate normal from abnormal screening examinations. The study estimated the association of features with outcomes by using t test and Fisher exact test. The model comparisons were performed with a 95% confidence interval (CI) or by using the DeLong test.
Results: The resulting algorithm was validated in 1055 women and tested in 2548 women (mean age, 55 years ± 10 [standard deviation]). In the test set, the algorithm identified 34 of 71 (48%) false-negative findings on mammograms. For the malignancy prediction objective, the algorithm obtained an area under the receiver operating characteristic curve (AUC) of 0.91 (95% CI: 0.89, 0.93), with specificity of 77.3% (95% CI: 69.2%, 85.4%) at a sensitivity of 87%. When trained on clinical data alone, the model performed significantly better than the Gail model (AUC, 0.78 vs 0.54, respectively; P < .004).
Conclusion: The algorithm, which combined machine-learning and deep-learning approaches, can be applied to assess breast cancer at a level comparable to radiologists and has the potential to substantially reduce missed diagnoses of breast cancer.
© RSNA, 2019. Online supplemental material is available for this article.
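The reported metrics (AUC, and specificity at a fixed sensitivity) can be made concrete with a short sketch. The labels and scores below are invented toy data, not study data, and the function names are hypothetical. The AUC is computed by its pairwise-ranking definition. The abstract does not describe how the study chose its operating point, so the threshold search is only one plausible reading.

```python
def auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (Mann-Whitney formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def specificity_at_sensitivity(labels, scores, target_sens):
    """Best specificity over thresholds whose sensitivity meets the target.

    A case is called positive when its score is >= the threshold.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    best = 0.0
    for thr in sorted(set(scores)):
        sens = sum(s >= thr for s in pos) / len(pos)
        if sens >= target_sens:
            spec = sum(s < thr for s in neg) / len(neg)
            best = max(best, spec)
    return best


if __name__ == "__main__":
    # Toy data for illustration only.
    labels = [0, 0, 0, 0, 1, 1, 1, 1]
    scores = [0.1, 0.2, 0.3, 0.6, 0.4, 0.7, 0.8, 0.9]
    print(auc(labels, scores))
    print(specificity_at_sensitivity(labels, scores, 0.75))
    # The "34 of 71 (48%)" figure in the abstract is a simple proportion:
    print(round(100 * 34 / 71))
```

The pairwise AUC is quadratic in sample size. That is fine for a sketch, but a rank-based implementation would be preferable for data sets of the size described.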
Subjects
Breast Neoplasms/diagnostic imaging, Deep Learning, Electronic Health Records, Mammography/methods, Computer-Assisted Radiographic Image Interpretation/methods, Breast/diagnostic imaging, Female, Humans, Middle Aged, Predictive Value of Tests, Reproducibility of Results, Retrospective Studies, Sensitivity and Specificity
ABSTRACT
Polygenic risk scores (PRS) are increasingly used to estimate the personal risk of a trait based on genetics. However, most genomic cohorts are of European populations, with a strong under-representation of non-European groups. Given that PRS transfer poorly across racial groups, their use in clinical care has the potential to exacerbate health disparities. Hence there is a need to generate PRS that perform comparably across ethnic groups. Borrowing from recent advancements in the domain adaptation field of machine learning, we propose FairPRS, an Invariant Risk Minimization (IRM) approach for estimating fair PRS or debiasing a pre-computed PRS. We test our method on both a diverse set of synthetic data and real data from the UK Biobank. We show our method can create ancestry-invariant PRS distributions that are both racially unbiased and largely improve phenotype prediction. We hope that FairPRS will contribute to a fairer characterization of patients by genetics rather than by race.