ABSTRACT
OBJECTIVE: To investigate the association between infections and disability worsening in people with multiple sclerosis (MS) treated with either B-cell depleting therapy (rituximab) or interferon-beta/glatiramer acetate (IFN/GA). METHODS: This cohort study spanned from 2000 to 2021, using data from the Swedish MS Registry linked to national health care registries, comprising 8,759 rituximab and 7,561 IFN/GA treatment episodes. The risk of hospital-treated infection was estimated using multivariable Cox models. The association between infections and increase in Expanded Disability Status Scale (EDSS) scores was assessed using a doubly robust generalized estimating equations model. Additionally, a piece-wise exponential model analyzed events of increased disability beyond defined cut-off values, controlling for relapses and MRI activity. RESULTS: Compared with IFN/GA, rituximab displayed an increased risk of both inpatient- and outpatient-treated infections (hazard ratio [HR], 2.08; 95% confidence interval [CI], 1.50-2.90 and HR, 1.37; 95% CI, 1.13-1.67, respectively). An inpatient-treated infection was associated with a 0.19-unit increase in EDSS (95% CI, 0.12-0.26). The degree of worsening was greatest for progressive MS and under IFN/GA treatment, which, unlike rituximab, was more commonly associated with MRI activity. After controlling for relapses and MRI activity, inpatient-treated infections were associated with disability worsening in people with relapsing-remitting MS treated with IFN/GA (HR, 2.01; 95% CI, 1.59-2.53), but not in those treated with rituximab. INTERPRETATION: Compared to IFN/GA, rituximab doubled the infection risk, but reduced the risk of subsequent disability worsening. Further, the risk of worsening after hospital-treated infection was greater with progressive MS than with relapsing-remitting MS. Infection risk should be considered to improve long-term outcomes. ANN NEUROL 2024.
ABSTRACT
Obesity has a highly complex genetic architecture, making it difficult to understand the genetic mechanisms, despite the large number of loci discovered via genome-wide association studies (GWAS). Omics techniques have provided a better resolution to view this problem. As a proxy of cell-level biology, extracellular vesicles (EVs) are useful for studying cellular regulation of complex phenotypes such as obesity. Here, in a well-established Scottish cohort, we utilized a novel technology to detect surface proteins across millions of single EVs in each individual's plasma sample. Integrating the results with established obesity GWAS, we inferred 78 types of EVs carrying one or two of 12 surface proteins to be associated with adiposity-related traits such as waist circumference. We then verified that the abundance of particular EVs is negatively correlated with body adiposity, while showing no association with lean body mass. We also revealed that genetic variants associated with protein-specific EVs capture 2-4-fold heritability enrichment for blood cholesterol levels. Our findings provide evidence that EVs with specific surface proteins have phenotypic and genetic links to obesity and blood lipids, respectively, guiding future EV biomarker research.
Subjects
Extracellular Vesicles; Obesity; Humans; Extracellular Vesicles/genetics; Genome-Wide Association Study; Membrane Proteins/genetics; Obesity/genetics; Phenotype
ABSTRACT
Finding an adequate dose of a drug by revealing the dose-response relationship is a crucial and challenging problem in clinical development. The main concerns in dose-finding studies are to identify the minimum effective dose (MED) in anesthesia studies and the maximum tolerated dose (MTD) in oncology clinical trials. For the estimation of MED and MTD, we propose two modifications of Firth's logistic regression using reparametrization, called reparametrized Firth's logistic regression (rFLR) and ridge-penalized reparametrized Firth's logistic regression (RrFLR). The proposed methods are designed by directly reducing the small-sample bias of the maximum likelihood estimate for the parameter of interest. In addition, we develop a method for constructing confidence intervals for rFLR and RrFLR using profile penalized likelihood. In the up-and-down biased-coin design, numerical studies confirm the superior performance of the proposed methods in terms of the mean squared error, bias, and coverage accuracy of confidence intervals.
ABSTRACT
MOTIVATION: RNA expression at isoform level is biologically more informative than at gene level and can potentially reveal cellular subsets and corresponding biomarkers that are not visible at gene level. However, due to the strong 3' bias sequencing protocol, mRNA quantification for high-throughput single-cell RNA sequencing such as Chromium Single Cell 3' 10× Genomics is currently performed at the gene level. RESULTS: We have developed an isoform-level quantification method for high-throughput single-cell RNA sequencing by exploiting the concepts of transcription clusters and isoform paralogs. The method, called Scasa, compares well in simulations against competing approaches including Alevin, Cellranger, Kallisto, Salmon, Terminus and STARsolo at both isoform- and gene-level expression. The reanalysis of a CITE-Seq dataset with isoform-based Scasa reveals a subgroup of CD14 monocytes missed by gene-based methods. AVAILABILITY AND IMPLEMENTATION: Implementation of Scasa including source code, documentation, tutorials and test data supporting this study is available at Github: https://github.com/eudoraleer/scasa and Zenodo: https://doi.org/10.5281/zenodo.5712503. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Gene Expression Profiling; Software; Gene Expression Profiling/methods; Sequence Analysis, RNA/methods; Protein Isoforms/genetics; Protein Isoforms/metabolism; RNA, Messenger/genetics; RNA
ABSTRACT
BACKGROUND AND PURPOSE: Evidence has accumulated to support the early involvement of altered gastrointestinal (GI) function in neurodegenerative disease. However, risk of Alzheimer disease (AD) and Parkinson disease (PD) among individuals with a GI biopsy of normal mucosa or nonspecific inflammation is unknown. METHODS: This matched cohort study included all individuals in Sweden with a GI biopsy of normal mucosa (n = 480,346) or nonspecific inflammation (n = 655,937) during 1965-2016 (exposed group) as well as their individually matched population references and unexposed full siblings. A flexible parametric model and stratified Cox model were used to estimate hazard ratio (HR) and its 95% confidence interval (CI). RESULTS: Individuals with normal mucosa or nonspecific inflammation had a higher risk of AD and PD during the 20 years after biopsy. Compared with the population references, individuals with normal mucosa had an increased risk of AD (incidence rate [IR] difference = 13.53 per 100,000 person-years, HR [95% CI] = 1.15 [1.11-1.20]) and PD (IR difference = 6.72, HR [95% CI] = 1.16 [1.10-1.23]). Elevated risk was also observed for nonspecific inflammation regarding AD (IR difference = 13.28, HR [95% CI] = 1.11 [1.08-1.14]) and PD (IR difference = 6.83, HR [95% CI] = 1.10 [1.06-1.14]). Similar results were observed in subgroup and sensitivity analyses and when comparing with their unexposed siblings. CONCLUSIONS: Individuals with a GI biopsy of normal mucosa or nonspecific inflammation had an increased risk of AD and PD. This adds new evidence of the early involvement of GI dysfunction in neurodegenerative disease.
Subjects
Alzheimer Disease; Neurodegenerative Diseases; Parkinson Disease; Humans; Cohort Studies; Neurodegenerative Diseases/epidemiology; Inflammation; Biopsy; Mucous Membrane; Parkinson Disease/epidemiology; Sweden/epidemiology; Risk Factors
ABSTRACT
BACKGROUND: The occurrence of misattributed paternity has consequences throughout society with implications ranging from inheritance and royal succession to transplantation. However, its frequency in Sweden is unknown. OBJECTIVE: To estimate the contemporary frequency of misattributed paternity in Sweden. METHODS: The study was based on nationwide ABO blood group data and a nationwide register of familial relationships in Sweden. These data were analysed using both a frequentist Poisson model and the Bayesian Gibbs model. The conduct of the study was approved by the regional ethics committee in Stockholm, Sweden (reference numbers 2018/167-31 and 2019-04656). RESULTS: Nearly two million mother-father-offspring family units were included. Overall, the frequency of misattributed paternity was estimated at 1.7% in both models. Misattributed paternity was more common among parents with low educational levels, and has decreased over time to a current 1%. CONCLUSIONS: The misattributed paternity rate is similar to the rates in other West European populations. Apart from widespread societal implications, studies on heritability may consider misattributed paternity as a minor source of error.
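The abstract does not describe how the ABO data enter the Poisson and Bayesian models, so the following is only a minimal sketch of the underlying Mendelian exclusion idea: a child's blood group is checked against the groups that the registered parents could produce. The function names are hypothetical, and rare variants (for example cis-AB or the Bombay phenotype) are ignored.

```python
from itertools import product

# Possible genotypes (allele pairs) behind each ABO phenotype.
GENOTYPES = {
    "A": [("A", "A"), ("A", "O")],
    "B": [("B", "B"), ("B", "O")],
    "AB": [("A", "B")],
    "O": [("O", "O")],
}


def phenotype(allele1, allele2):
    """Map a pair of ABO alleles to the observed blood group."""
    alleles = {allele1, allele2}
    if alleles == {"A", "B"}:
        return "AB"
    if "A" in alleles:
        return "A"
    if "B" in alleles:
        return "B"
    return "O"


def possible_child_groups(mother, father):
    """Return every blood group a child could have given both parents' groups."""
    groups = set()
    for m_geno, f_geno in product(GENOTYPES[mother], GENOTYPES[father]):
        # The child inherits one allele from each parent.
        for m_allele, f_allele in product(m_geno, f_geno):
            groups.add(phenotype(m_allele, f_allele))
    return groups


def is_incompatible(mother, father, child):
    """True if the child's group cannot arise from the registered parents."""
    return child not in possible_child_groups(mother, father)
```

Because many misattributed fathers are ABO-compatible with the child by chance, the raw share of incompatible family units understates the true frequency, which is presumably why a statistical model is needed on top of a check like this.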
Subjects
Paternity; Truth Disclosure; Bayes Theorem; Cohort Studies; Humans; Male; Sweden/epidemiology
ABSTRACT
Reproducibility, a hallmark of science, is typically assessed in validation studies. We focus on high-throughput studies where a large number of biomarkers is measured in a training study, but only a subset of the most significant findings is selected and re-tested in a validation study. Our aim is to obtain statistical measures of overall assessment for the selected markers by integrating the information in both the training and validation studies. Naive statistical measures, such as the combined P-value from conventional meta-analysis, that ignore the non-random selection are clearly biased, producing over-optimistic significance. We use the false-discovery rate (FDR) concept to develop a selection-adjusted FDR (sFDR) as an overall assessment measure. We describe the link between the overall assessment and other concepts such as replicability and meta-analysis. Some simulation studies and two real metabolomic datasets are considered to illustrate the application of sFDR in high-throughput data analyses.
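The abstract does not give the sFDR formula, so no attempt is made to reproduce it here. As background for the FDR concept it builds on, the sketch below shows the standard Benjamini-Hochberg step-up procedure, which does not adjust for selection; the function name and the default `alpha` are illustrative assumptions.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Standard Benjamini-Hochberg procedure (not the selection-adjusted sFDR).

    Returns a list of booleans, True where the hypothesis is rejected.
    """
    m = len(p_values)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= k * alpha / m.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected
```

Applying a procedure like this only to markers that were already selected for being significant in the training study is the kind of naive use the abstract warns against.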
Subjects
Algorithms; Humans; Reproducibility of Results; Computer Simulation
ABSTRACT
BACKGROUND: Circular RNA (circRNA) is an emerging class of RNA molecules attracting researchers due to its potential for serving as markers for diagnosis, prognosis, or therapeutic targets of cancer, cardiovascular, and autoimmune diseases. Current methods for detection of circRNA from RNA sequencing (RNA-seq) focus mostly on improving mapping quality of reads supporting the back-splicing junction (BSJ) of a circRNA to eliminate false positives (FPs). We show that mapping information alone often cannot predict if a BSJ-supporting read is derived from a true circRNA or not, thus increasing the rate of FP circRNAs. RESULTS: We have developed Circall, a novel circRNA detection method from RNA-seq. Circall controls the FPs using a robust multidimensional local false discovery rate method based on the length and expression of circRNAs. It is computationally highly efficient by using a quasi-mapping algorithm for fast and accurate RNA read alignments. We applied Circall to two simulated datasets and three experimental datasets of human cell lines. The results show that Circall achieves high sensitivity and precision in the simulated data. In the experimental datasets it performs well against current leading methods. Circall is also substantially faster than the other methods, particularly for large datasets. CONCLUSIONS: With its better performance in circRNA detection and in computational time, Circall facilitates the analysis of circRNAs in large numbers of samples. Circall is implemented in C++ and R, and available for use at https://www.meb.ki.se/sites/biostatwiki/circall and https://github.com/datngu/Circall.
Subjects
RNA, Circular; RNA; Humans; RNA/genetics; RNA Splicing; RNA-Seq; Sequence Analysis, RNA
ABSTRACT
MOTIVATION: Estimation of isoform-level gene expression from RNA-seq data depends on simplifying assumptions, such as uniform read distribution, that are easily violated in real data. Such violations typically lead to biased estimates. Most existing methods provide bias-correction steps, which are based on biological considerations, such as GC content, and are applied in single samples separately. The main problem is that not all biases are known. RESULTS: We have developed a novel method called XAEM based on a more flexible and robust statistical model. Existing methods are essentially based on a linear model Xβ, where the design matrix X is known and is computed based on the simplifying assumptions. In contrast, XAEM considers Xβ as a bilinear model with both X and β unknown. Joint estimation of X and β is made possible by a simultaneous analysis of multi-sample RNA-seq data. Compared to existing methods, XAEM automatically performs empirical correction of potentially unknown biases. We use an alternating expectation-maximization (AEM) algorithm, alternating between estimation of X and β. For speed, XAEM utilizes quasi-mapping for read alignment, thus leading to a fast algorithm. Overall, XAEM performs favorably compared to recent advanced methods. For simulated datasets, XAEM obtains higher accuracy for multiple-isoform genes. In a differential-expression analysis of a real single-cell RNA-seq dataset, XAEM achieves substantially better rediscovery rates in independent validation sets. AVAILABILITY AND IMPLEMENTATION: The method and pipeline are implemented as a tool and freely available for use at http://fafner.meb.ki.se/biostatwiki/xaem/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Gene Expression Profiling; RNA-Seq; Algorithms; Protein Isoforms/genetics; Sequence Analysis, RNA; Software
ABSTRACT
Molecular classification of acute myeloid leukemia (AML) aids prognostic stratification and clinical management. Our aim in this study is to identify transcriptome-wide mRNAs that are specific to each of the molecular subtypes of AML. We analyzed RNA-sequencing data of 955 AML samples from three cohorts, including the BeatAML project, the Cancer Genome Atlas, and a cohort of Swedish patients to provide a comprehensive transcriptome-wide view of subtype-specific mRNA expression. We identified 729 subtype-specific mRNAs, discovered in the BeatAML project and validated in the other two cohorts. Using unique proteomics data, we also validated the presence of subtype-specific mRNAs at the protein level, yielding a rich collection of potential protein-based biomarkers for the AML community. To enable the exploration of subtype-specific mRNA expression by the broader scientific community, we provide an interactive resource to the public.
Subjects
Leukemia, Myeloid, Acute/genetics; RNA, Messenger/biosynthesis; RNA, Neoplasm/biosynthesis; Transcriptome; Biomarkers, Tumor; Genes, Neoplasm; Humans; Leukemia, Myeloid, Acute/classification; Leukemia, Myeloid, Acute/metabolism; Neoplasm Proteins/biosynthesis; Neoplasm Proteins/genetics; Oncogene Proteins, Fusion/biosynthesis; Oncogene Proteins, Fusion/genetics; Proteome; RNA, Messenger/genetics; RNA, Neoplasm/genetics; RNA-Seq; Retrospective Studies; Sweden
ABSTRACT
In recent years, as a secondary analysis in genome-wide association studies (GWASs), conditional and joint multiple-SNP analysis (GCTA-COJO) has been successful in allowing the discovery of additional association signals within detected loci. This suggests that many loci mapped in GWASs harbor more than a single causal variant. In order to interpret the underlying mechanism regulating a complex trait of interest in each discovered locus, researchers must assess the magnitude of allelic heterogeneity within the locus. We developed a penalized selection operator for jointly analyzing multiple variants (SOJO) within each mapped locus on the basis of LASSO (least absolute shrinkage and selection operator) regression derived from summary association statistics. We found that, compared to stepwise conditional multiple-SNP analysis, SOJO provided better sensitivity and specificity in predicting the number of alleles associated with complex traits in each locus. SOJO suggested causal variants potentially missed by GCTA-COJO. Compared to using top variants from genome-wide significant loci in GWAS, using SOJO increased the proportion of variance prediction for height by 65% without additional discovery samples or additional loci in the genome. Our empirical results indicate that human height is not only a highly polygenic trait, but also has high allelic heterogeneity within its established hundreds of loci.
Subjects
Body Height/genetics; Multifactorial Inheritance/genetics; Polymorphism, Single Nucleotide/genetics; Alleles; Body Mass Index; Genome-Wide Association Study; Humans; Quantitative Trait Loci
ABSTRACT
MOTIVATION: Both single-cell RNA sequencing (scRNA-seq) and DNA sequencing (scDNA-seq) have been applied for cell-level genomic profiling. For mutation profiling, the latter seems more natural. However, the task is highly challenging due to the limited input materials from only two copies of DNA molecules, while whole-genome amplification generates biases and other technical noise. ScRNA-seq starts with a higher input amount, so generally has better data quality. Various methods exist for mutation detection from DNA sequencing, but it is not clear whether these methods work for scRNA-seq data. RESULTS: Mutation detection methods developed for either bulk-cell sequencing data or scDNA-seq data do not work well for the scRNA-seq data, as they produce substantial numbers of false positives. We develop a novel and robust statistical method, called SCmut, to identify specific cells that harbor mutations discovered in bulk-cell data. Statistically, SCmut controls the false positives using the 2D local false discovery rate method. We apply SCmut to several scRNA-seq datasets. In scRNA-seq breast cancer datasets, SCmut identifies a number of highly confident cell-level mutations that are recurrent in many cells and consistent in different samples. In a scRNA-seq glioblastoma dataset, we discover a recurrent cell-level mutation in the PDGFRA gene that is highly correlated with a well-known in-frame deletion in the gene. To conclude, this study contributes a novel method to discover cell-level mutation information from scRNA-seq that can facilitate investigation of cell-to-cell heterogeneity. AVAILABILITY AND IMPLEMENTATION: The source code and bioinformatics pipeline of SCmut are available at https://github.com/nghiavtr/SCmut. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Mutation; Gene Expression Profiling; Humans; Sequence Analysis, RNA; Single-Cell Analysis; Software
ABSTRACT
PURPOSE: Breast cancer is a common disease with a relatively good prognosis. Therefore, understanding the spectrum of diseases and mortality among breast cancer patients is important, though currently incomplete. We systematically examined the incidence and mortality of all diseases following a breast cancer diagnosis, as well as the sequential association of disease occurrences (trajectories). METHODS: In this national cohort study, 57,501 breast cancer patients (2001-2011) were compared to 564,703 matched women from the general Swedish population and followed until 2012. The matching criteria included year of birth, county of residence, and socioeconomic status. Based on information from the Swedish Patient and Cause of Death Registries, hazard ratios (HR) were estimated for disease incidence and mortality. Conditional logistic regression models were used to identify disease trajectories among breast cancer patients. RESULTS: Among 225 diseases, 45 had HRs > 1.5 and p < 0.0002 when comparing breast cancer patients with the general population. Diseases with highest HRs included lymphedema, radiodermatitis, and neutropenia, which are side effects of surgery, radiotherapy, and chemotherapy. Other than breast cancer, the only significantly increased cause of death was other solid cancers (HR = 1.16, 95% CI = 1.08-1.24). Two main groups of disease trajectories were identified, which suggest menopausal disorders as indicators for other solid cancers, and both neutropenia and dorsalgia as diseases and symptoms preceding death due to breast cancer. CONCLUSIONS: While an increased incidence of other diseases was found among breast cancer patients, increased mortality was only due to other solid cancers. Preventing death due to breast cancer should be a priority to prolong life in breast cancer patients, but closer surveillance of other solid cancers is also needed.
Subjects
Breast Neoplasms/epidemiology; Breast Neoplasms/mortality; Adult; Age of Onset; Aged; Aged, 80 and over; Breast Neoplasms/diagnosis; Female; Humans; Incidence; Middle Aged; Mortality; Odds Ratio; Population Surveillance; Proportional Hazards Models; Registries; Socioeconomic Factors; Sweden/epidemiology
ABSTRACT
Heritability is the most commonly used measure of genetic contribution to disease outcomes. Being the fraction of the variance of latent trait liability attributable to genetic factors, heritability of binary traits is a difficult technical concept that is sometimes misinterpreted as the more-easily understandable concept of attributable fraction. In this paper we use the liability threshold model to describe the analytical relationship between heritability and attributable fraction. Towards this end, we consider a hypothetical intervention that is aimed to reduce the genetic risk of the disease for a specified target group of the population. We show how the relation between the heritability and the attributable fraction depends on the disease prevalence, the intervention effect and the size of the target group. We use two real examples to illustrate the practical implications of our theoretical results.
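For orientation, the standard liability threshold model referred to above can be written as follows. The notation ($L$, $G$, $E$, $t$, $K$, $h^2$) is a generic textbook choice and an assumption here, not necessarily the notation used in the paper.

```latex
\begin{align}
L &= G + E, \qquad G \sim N(0,\, h^2), \quad E \sim N(0,\, 1 - h^2), \quad G \perp E, \\
D &= \mathbf{1}\{L > t\}, \qquad K = \Pr(D = 1) = 1 - \Phi(t), \\
\Pr(D = 1 \mid G = g) &= 1 - \Phi\!\left(\frac{t - g}{\sqrt{1 - h^2}}\right).
\end{align}
```

Here $h^2$ is the heritability on the liability scale, $K$ is the disease prevalence, and $\Phi$ is the standard normal distribution function. The last line shows that the same heritability implies different individual risks at different prevalences, one reason heritability should not be read directly as an attributable fraction.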
Subjects
Genetic Predisposition to Disease/epidemiology; Models, Genetic; Models, Statistical; Multifactorial Inheritance; Quantitative Trait, Heritable; Causality; Disease/etiology; Disease/genetics; Humans; Phenotype; Population Density; Prevalence; Risk Factors; Sample Size
ABSTRACT
Motivation: RNA sequencing of single cells enables characterization of transcriptional heterogeneity in seemingly homogeneous cell populations. Single-cell sequencing has been applied in a wide range of research fields. However, few studies have focused on characterization of isoform-level expression patterns at the single-cell level. In this study, we propose and apply a novel method, ISOform-Patterns (ISOP), based on mixture modeling, to characterize the expression patterns of isoform pairs from the same gene in single-cell isoform-level expression data. Results: We define six principal patterns of isoform expression relationships and describe a method for differential-pattern analysis. We demonstrate ISOP through analysis of single-cell RNA-sequencing data from a breast cancer cell line, with replication in three independent datasets. We assigned the pattern types to each of 16 562 isoform pairs from 4929 genes. Among those, 26% of the discovered patterns were significant (P<0.05), while the remaining patterns are possibly effects of transcriptional bursting, drop-out and stochastic biological heterogeneity. Furthermore, 32% of genes discovered through differential-pattern analysis were not detected by differential-expression analysis. Finally, the effects of drop-out events and expression levels of isoforms on ISOP's performance were investigated through simulated datasets. To conclude, ISOP provides a novel approach for characterization of isoform-level preference, commitment and heterogeneity in single-cell RNA-sequencing data. Availability and implementation: The ISOP method has been implemented as an R package and is available at https://github.com/nghiavtr/ISOP under a GPL-3 license. Supplementary information: Supplementary data are available at Bioinformatics online.
Subjects
Gene Expression Profiling/methods; Gene Expression; RNA Isoforms/genetics; Sequence Analysis, RNA/methods; Software; Breast Neoplasms/genetics; Cell Line, Tumor; Female; Humans
ABSTRACT
BACKGROUND: Fusion genes are known to be drivers of many common cancers, so they are potential markers for diagnosis, prognosis or therapy response. The advent of paired-end RNA sequencing enhances our ability to discover fusion genes. While there are available methods, routine analyses of large numbers of samples are still limited due to high computational demands. RESULTS: We develop FuSeq, a fast and accurate method to discover fusion genes based on quasi-mapping to quickly map the reads, extract initial candidates from split reads and fusion equivalence classes of mapped reads, and finally apply multiple filters and statistical tests to get the final candidates. We apply FuSeq to four validated datasets: breast cancer, melanoma and glioma datasets, and one spike-in dataset. The results reveal high sensitivity and specificity in all datasets, and compare well against other methods such as FusionMap, TRUP, TopHat-Fusion, SOAPfuse and JAFFA. In terms of computational time, FuSeq is two-fold faster than FusionMap and orders of magnitude faster than the other methods. CONCLUSIONS: With this advantage of lower computational demands, FuSeq makes it practical to investigate fusion genes in large numbers of samples. FuSeq is implemented in C++ and R, and available at https://github.com/nghiavtr/FuSeq for non-commercial use.
Subjects
Gene Fusion; RNA/genetics; Sequence Analysis, RNA; Algorithms; Cell Line, Tumor; Computational Biology/methods; Databases, Nucleic Acid; Gene Expression Profiling; High-Throughput Nucleotide Sequencing; Humans; Neoplasms/genetics; Oncogene Proteins, Fusion/genetics; Reproducibility of Results; Sequence Analysis, RNA/methods
ABSTRACT
It is a common causal inference problem that, even with theoretically infinite samples, we might be able to only provide bounds for the parameters of interest. This problem occurs naturally, for example, in estimating causal interaction between two risk factors and in estimating the average causal effect using the instrumental variable or Mendelian randomization method. Current procedures include linear programming to get the estimated bounds, plus bootstrapping to get confidence intervals. We describe a likelihood-based procedure that automatically yields the interval estimate from the flat likelihood region and show some theory that allows us to construct confidence intervals from this non-regular likelihood. Finally, we illustrate the procedure with examples from the estimation of causal interaction between two risk factors and the treatment effect under partial compliance.
Subjects
Causality; Likelihood Functions; Confidence Intervals; Data Interpretation, Statistical; Humans; Linear Models; Logistic Models; Models, Statistical; Patient Compliance/statistics & numerical data; Randomized Controlled Trials as Topic/methods; Risk Factors; Treatment Outcome
ABSTRACT
Several entropy-based measures for detecting gene-gene interaction have been proposed recently. It has been argued that the entropy-based measures are preferred because entropy can better capture the nonlinear relationships between genotypes and traits, so they can be useful to detect gene-gene interactions for complex diseases. These suggested measures look reasonable at an intuitive level, but so far there has been no detailed characterization of the interactions captured by them. Here we study analytically the properties of some entropy-based measures for detecting gene-gene interactions in detail. The relationship between interactions captured by the entropy-based measures and those of logistic regression models is clarified. In general we find that the entropy-based measures can suffer from a lack of specificity in terms of target parameters, i.e., they can detect uninteresting signals as interactions. Numerical studies are carried out to confirm the theoretical findings.
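The abstract does not name the specific measures it analyzes. As one illustration of what an entropy-based interaction measure can look like, the sketch below computes an information-gain style quantity, I(G1,G2;Y) - I(G1;Y) - I(G2;Y), from empirical frequencies. The function names and the choice of this measure are assumptions made for illustration.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from paired observations."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def interaction_gain(g1, g2, trait):
    """Information about the trait carried jointly by two genotypes,
    beyond what each genotype carries on its own."""
    joint = list(zip(g1, g2))
    return (mutual_information(joint, trait)
            - mutual_information(g1, trait)
            - mutual_information(g2, trait))
```

For a pure XOR pattern (`g1 = [0, 0, 1, 1]`, `g2 = [0, 1, 0, 1]`, `trait = [0, 1, 1, 0]`), neither genotype is informative alone but the pair determines the trait, so the gain is 1 bit. Whether a non-zero value corresponds to an interaction parameter in a logistic regression model is the specificity question the abstract raises.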
Subjects
Models, Genetic; Entropy; Genome-Wide Association Study; Genotype; Humans; Logistic Models; Phenotype
ABSTRACT
BACKGROUND: Most mammography screening programs are not individualized. To efficiently screen for breast cancer, the individual risk of the disease should be determined. We describe a model that could be used at most mammography screening units without adding substantial cost. METHODS: The study was based on the Karma cohort, which included 70,877 participants. Mammograms were collected up to 3 years following the baseline mammogram. A prediction protocol was developed using mammographic density, computer-aided detection of microcalcifications and masses, use of hormone replacement therapy (HRT), family history of breast cancer, menopausal status, age, and body mass index. Relative risks were calculated using conditional logistic regression. Absolute risks were calculated using the iCARE protocol. RESULTS: Comparing women at highest and lowest mammographic density yielded a fivefold higher risk of breast cancer for women at highest density. When adding microcalcifications and masses to the model, high-risk women had a nearly ninefold higher risk of breast cancer than those at lowest risk. In the full model, taking HRT use, family history of breast cancer, and menopausal status into consideration, the AUC reached 0.71. CONCLUSIONS: Measures of mammographic features and information on HRT use, family history of breast cancer, and menopausal status enabled early identification of women within the mammography screening program at such a high risk of breast cancer that additional examinations are warranted. In contrast, women at low risk could probably be screened less intensively.
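To make the reported AUC of 0.71 concrete: the AUC equals the probability that a randomly chosen case has a higher predicted risk than a randomly chosen non-case. The sketch below computes it by direct pairwise comparison. The function name and the risk scores in the usage note are hypothetical and unrelated to the Karma data.

```python
def auc(case_scores, control_scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (case, control) pairs in which the case has the
    higher score; ties count as half a win.
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))
```

For example, `auc([0.3, 0.7], [0.5])` gives 0.5, since one of the two cases outranks the single control. An AUC of 0.71 would mean the model ranks the case higher in roughly 71% of such pairs.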
Subjects
Breast Neoplasms/epidemiology; Models, Theoretical; Adult; Aged; Area Under Curve; Breast Density; Breast Neoplasms/diagnosis; Breast Neoplasms/prevention & control; Case-Control Studies; Computer Simulation; Early Detection of Cancer; Female; Hormone Replacement Therapy/adverse effects; Humans; Mammography; Mass Screening; Middle Aged; Risk; Risk Factors; Sweden/epidemiology
ABSTRACT
To date, genome-wide association studies (GWASs) have identified >100 loci with single variants associated with body mass index (BMI). This approach may miss loci with high allelic heterogeneity; therefore, the aim of the present study was to use gene-based meta-analysis to identify regions with high allelic heterogeneity to discover additional obesity susceptibility loci. We included GWAS data from 123 865 individuals of European descent from 46 cohorts in Stage 1 and Metabochip data from an additional 103 046 individuals from 43 cohorts in Stage 2, all within the Genetic Investigation of ANthropometric Traits (GIANT) consortium. Each cohort was tested for association between ~2.4 million (Stage 1) or ~200 000 (Stage 2) imputed or genotyped single variants and BMI, and summary statistics were subsequently meta-analyzed in 17 941 genes. We used the 'VErsatile Gene-based Association Study' (VEGAS) approach to assign variants to genes and to calculate gene-based P-values based on simulations. The VEGAS method was applied to each cohort separately before a gene-based meta-analysis was performed. In Stage 1, two known (FTO and TMEM18) and six novel (PEX2, MTFR2, SSFA2, IARS2, CEP295 and TXNDC12) loci were associated with BMI (P < 2.8 × 10⁻⁶ for 17 941 gene tests). We confirmed all loci, and six of them were gene-wide significant in Stage 2 alone. We provide biological support for the loci by pathway, expression and methylation analyses. Our results indicate that gene-based meta-analysis of GWAS provides a useful strategy to find loci of interest that were not identified in standard single-marker analyses due to high allelic heterogeneity.
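The gene-wide significance threshold quoted above is consistent with a Bonferroni correction over the 17 941 gene tests. The abstract does not state the family-wise error rate, so the value 0.05 used below is an assumption. A minimal sketch of the arithmetic:

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise error
    rate at or below `alpha` across `n_tests` tests."""
    return alpha / n_tests


# Assumed family-wise error rate of 0.05 over the 17 941 gene-based tests.
threshold = bonferroni_threshold(0.05, 17941)
print(f"{threshold:.2e}")
```

The result is approximately 2.8 × 10⁻⁶, which matches the reported cut-off.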