Search | Secretaria de Estado da Saúde

1.

Genetic and phenotypic links between obesity and extracellular vesicles.

Zhai, Ranran; Pan, Lu; Yang, Zhijian; Li, Ting; Ning, Zheng; Pawitan, Yudi; Wilson, James F; Wu, Di; Shen, Xia.

Hum Mol Genet ; 31(21): 3643-3651, 2022 10 28.

Article in English | MEDLINE | ID: mdl-35357430

ABSTRACT

Obesity has a highly complex genetic architecture, making it difficult to understand the genetic mechanisms, despite the large number of loci discovered via genome-wide association studies (GWAS). Omics techniques have provided a better resolution to view this problem. As a proxy of cell-level biology, extracellular vesicles (EVs) are useful for studying cellular regulation of complex phenotypes such as obesity. Here, in a well-established Scottish cohort, we utilized a novel technology to detect surface proteins across millions of single EVs in each individual's plasma sample. Integrating the results with established obesity GWAS, we inferred 78 types of EVs carrying one or two of 12 surface proteins to be associated with adiposity-related traits such as waist circumference. We then verified that the abundance of particular EVs is negatively correlated with body adiposity, while showing no association with lean body mass. We also revealed that genetic variants associated with protein-specific EVs capture 2-4-fold heritability enrichment for blood cholesterol levels. Our findings provide evidence that EVs with specific surface proteins have phenotypic and genetic links to obesity and blood lipids, respectively, guiding future EV biomarker research.

Subjects

Extracellular Vesicles; Obesity; Humans; Extracellular Vesicles/genetics; Genome-Wide Association Study; Membrane Proteins/genetics; Obesity/genetics; Phenotype

2.

Isoform-level quantification for single-cell RNA sequencing.

Pan, Lu; Dinh, Huy Q; Pawitan, Yudi; Vu, Trung Nghia.

Bioinformatics ; 38(5): 1287-1294, 2022 02 07.

Article in English | MEDLINE | ID: mdl-34864849

ABSTRACT

MOTIVATION: RNA expression at isoform level is biologically more informative than at gene level and can potentially reveal cellular subsets and corresponding biomarkers that are not visible at gene level. However, due to the strong 3' bias sequencing protocol, mRNA quantification for high-throughput single-cell RNA sequencing such as Chromium Single Cell 3' 10× Genomics is currently performed at the gene level. RESULTS: We have developed an isoform-level quantification method for high-throughput single-cell RNA sequencing by exploiting the concepts of transcription clusters and isoform paralogs. The method, called Scasa, compares well in simulations against competing approaches including Alevin, Cellranger, Kallisto, Salmon, Terminus and STARsolo at both isoform- and gene-level expression. The reanalysis of a CITE-Seq dataset with isoform-based Scasa reveals a subgroup of CD14 monocytes missed by gene-based methods. AVAILABILITY AND IMPLEMENTATION: Implementation of Scasa including source code, documentation, tutorials and test data supporting this study is available at Github: https://github.com/eudoraleer/scasa and Zenodo: https://doi.org/10.5281/zenodo.5712503. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Gene Expression Profiling; Software; Gene Expression Profiling/methods; Sequence Analysis, RNA/methods; Protein Isoforms/genetics; Protein Isoforms/metabolism; RNA, Messenger/genetics; RNA

3.

Gastrointestinal biopsy of normal mucosa or nonspecific inflammation and risk of neurodegenerative disease: Nationwide matched cohort study.

Sun, Jiangwei; Ludvigsson, Jonas F; Roelstraete, Bjorn; Pedersen, Nancy L; Pawitan, Yudi; Wirdefeldt, Karin; Fang, Fang.

Eur J Neurol ; 30(11): 3430-3439, 2023 11.

Article in English | MEDLINE | ID: mdl-36447380

ABSTRACT

BACKGROUND AND PURPOSE: Evidence has accumulated to support the early involvement of altered gastrointestinal (GI) function in neurodegenerative disease. However, risk of Alzheimer disease (AD) and Parkinson disease (PD) among individuals with a GI biopsy of normal mucosa or nonspecific inflammation is unknown. METHODS: This matched cohort study included all individuals in Sweden with a GI biopsy of normal mucosa (n = 480,346) or nonspecific inflammation (n = 655,937) during 1965-2016 (exposed group) as well as their individually matched population references and unexposed full siblings. A flexible parametric model and stratified Cox model were used to estimate hazard ratio (HR) and its 95% confidence interval (CI). RESULTS: Individuals with normal mucosa or nonspecific inflammation had a higher risk of AD and PD during the 20 years after biopsy. Compared with the population references, individuals with normal mucosa had an increased risk of AD (incidence rate [IR] difference = 13.53 per 100,000 person-years, HR [95% CI] = 1.15 [1.11-1.20]) and PD (IR difference = 6.72, HR [95% CI] = 1.16 [1.10-1.23]). Elevated risk was also observed for nonspecific inflammation regarding AD (IR difference = 13.28, HR [95% CI] = 1.11 [1.08-1.14]) and PD (IR difference = 6.83, HR [95% CI] = 1.10 [1.06-1.14]). Similar results were observed in subgroup and sensitivity analyses and when comparing with their unexposed siblings. CONCLUSIONS: Individuals with a GI biopsy of normal mucosa or nonspecific inflammation had an increased risk of AD and PD. This adds new evidence of the early involvement of GI dysfunction in neurodegenerative disease.

Subjects

Alzheimer Disease; Neurodegenerative Diseases; Parkinson Disease; Humans; Cohort Studies; Neurodegenerative Diseases/epidemiology; Inflammation; Biopsy; Mucous Membrane; Parkinson Disease/epidemiology; Sweden/epidemiology; Risk Factors

4.

The frequency of misattributed paternity in Sweden is low and decreasing: A nationwide cohort study.

Dahlén, Torsten; Zhao, Jingcheng; Magnusson, Patrik K E; Pawitan, Yudi; Lavröd, Jakob; Edgren, Gustaf.

J Intern Med ; 291(1): 95-100, 2022 01.

Article in English | MEDLINE | ID: mdl-34288189

ABSTRACT

BACKGROUND: The occurrence of misattributed paternity has consequences throughout society with implications ranging from inheritance and royal succession to transplantation. However, its frequency in Sweden is unknown. OBJECTIVE: To estimate the contemporary frequency of misattributed paternity in Sweden. METHODS: The study was based on nationwide ABO blood group data and a nationwide register of familial relationships in Sweden. These data were analysed using both a frequentist Poisson model and the Bayesian Gibbs model. The conduct of the study was approved by the regional ethics committee in Stockholm, Sweden (reference numbers 2018/167-31 and 2019-04656). RESULTS: Nearly two million mother-father-offspring family units were included. Overall, the frequency of misattributed paternity was estimated at 1.7% in both models. Misattributed paternity was more common among parents with low educational levels, and has decreased over time to a current 1%. CONCLUSIONS: The misattributed paternity rate is similar to the rates in other West European populations. Apart from widespread societal implications, studies on heritability may consider misattributed paternity as a minor source of error.
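
To illustrate why ABO blood group data carry information about paternity, the following minimal Python sketch checks whether a child's ABO phenotype is compatible with the registered parents under standard Mendelian inheritance. It is not the Poisson or Gibbs model used in the study; the function names are hypothetical, and rare exceptions (such as cis-AB or the Bombay phenotype) are ignored.

```python
from itertools import product

# Possible genotypes behind each ABO phenotype
# (O is recessive; A and B are codominant).
GENOTYPES = {
    "A": [("A", "A"), ("A", "O")],
    "B": [("B", "B"), ("B", "O")],
    "AB": [("A", "B")],
    "O": [("O", "O")],
}


def phenotype(allele1, allele2):
    """Return the ABO phenotype produced by a pair of alleles."""
    alleles = {allele1, allele2}
    if alleles == {"A", "B"}:
        return "AB"
    if "A" in alleles:
        return "A"
    if "B" in alleles:
        return "B"
    return "O"


def possible_child_phenotypes(mother, father):
    """Enumerate every child phenotype the two parental phenotypes allow."""
    children = set()
    for m_geno, f_geno in product(GENOTYPES[mother], GENOTYPES[father]):
        # The child inherits one allele from each parent.
        for m_allele, f_allele in product(m_geno, f_geno):
            children.add(phenotype(m_allele, f_allele))
    return children


def is_compatible(mother, father, child):
    """True if the child's phenotype is possible for this parental pair."""
    return child in possible_child_phenotypes(mother, father)
```

For example, two group O parents can only have group O children, so a group A child in such a family unit flags an inconsistency. Many misattributed paternities remain ABO-compatible by chance, which is why a statistical model is needed to turn the observed incompatibility rate into a frequency estimate.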

Subjects

Paternity; Truth Disclosure; Bayes Theorem; Cohort Studies; Humans; Male; Sweden/epidemiology

5.

Overall assessment for selected markers from high-throughput data.

Lee, Woojoo; Lee, Donghwan; Pawitan, Yudi.

Stat Med ; 41(30): 5830-5843, 2022 12 30.

Article in English | MEDLINE | ID: mdl-36270585

ABSTRACT

Reproducibility, a hallmark of science, is typically assessed in validation studies. We focus on high-throughput studies where a large number of biomarkers is measured in a training study, but only a subset of the most significant findings is selected and re-tested in a validation study. Our aim is to get the statistical measures of overall assessment for the selected markers, by integrating the information in both the training and validation studies. Naive statistical measures, such as the combined P-value by conventional meta-analysis, that ignore the non-random selection are clearly biased, producing over-optimistic significance. We use the false-discovery rate (FDR) concept to develop a selection-adjusted FDR (sFDR) as an overall assessment measure. We describe the link between the overall assessment and other concepts such as replicability and meta-analysis. Some simulation studies and two real metabolomic datasets are considered to illustrate the application of sFDR in high-throughput data analyses.
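
The selection bias described above can be seen in a small simulation. The sketch below is not the sFDR method; it only assumes a global null (no marker has any effect), one-sided tests, and Stouffer's equal-weight z-score combination as a stand-in for the "conventional meta-analysis". All names and parameter values are illustrative.

```python
import random
from statistics import NormalDist


def simulate_selection_bias(n_markers=10000, n_selected=100, alpha=0.05, seed=1):
    """Return (naive combined, validation-only) rejection rates among
    selected markers when every marker is truly null."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    # Training z-scores under the global null hypothesis.
    z_train = [rng.gauss(0.0, 1.0) for _ in range(n_markers)]
    # Keep only the most significant markers from the training study.
    selected = sorted(z_train, reverse=True)[:n_selected]
    naive_hits = 0
    validation_hits = 0
    for z_t in selected:
        z_v = rng.gauss(0.0, 1.0)  # independent validation z-score, still null
        z_combined = (z_t + z_v) / 2 ** 0.5  # Stouffer combination, ignores selection
        if 1.0 - std_normal.cdf(z_combined) < alpha:
            naive_hits += 1
        if 1.0 - std_normal.cdf(z_v) < alpha:
            validation_hits += 1
    return naive_hits / n_selected, validation_hits / n_selected


naive_rate, validation_rate = simulate_selection_bias()
print(f"naive combined: {naive_rate:.2f}, validation only: {validation_rate:.2f}")
```

Because the selected training z-scores are large by construction, the naive combined test rejects far more often than the nominal level even though no marker is real, while the validation-only test stays close to it. An overall assessment measure therefore has to account for the selection step.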

Subjects

Algorithms; Humans; Reproducibility of Results; Computer Simulation

6.

Circall: fast and accurate methodology for discovery of circular RNAs from paired-end RNA-sequencing data.

Nguyen, Dat Thanh; Trac, Quang Thinh; Nguyen, Thi-Hau; Nguyen, Ha-Nam; Ohad, Nir; Pawitan, Yudi; Vu, Trung Nghia.

BMC Bioinformatics ; 22(1): 495, 2021 Oct 13.

Article in English | MEDLINE | ID: mdl-34645386

ABSTRACT

BACKGROUND: Circular RNA (circRNA) is an emerging class of RNA molecules attracting researchers due to its potential for serving as markers for diagnosis, prognosis, or therapeutic targets of cancer, cardiovascular, and autoimmune diseases. Current methods for detection of circRNA from RNA sequencing (RNA-seq) focus mostly on improving mapping quality of reads supporting the back-splicing junction (BSJ) of a circRNA to eliminate false positives (FPs). We show that mapping information alone often cannot predict if a BSJ-supporting read is derived from a true circRNA or not, thus increasing the rate of FP circRNAs. RESULTS: We have developed Circall, a novel circRNA detection method from RNA-seq. Circall controls the FPs using a robust multidimensional local false discovery rate method based on the length and expression of circRNAs. It is computationally highly efficient by using a quasi-mapping algorithm for fast and accurate RNA read alignments. We applied Circall to two simulated datasets and three experimental datasets of human cell lines. The results show that Circall achieves high sensitivity and precision in the simulated data. In the experimental datasets it performs well against current leading methods. Circall is also substantially faster than the other methods, particularly for large datasets. CONCLUSIONS: With its better performance in both circRNA detection and computational time, Circall facilitates the analysis of circRNAs in large numbers of samples. Circall is implemented in C++ and R, and available for use at https://www.meb.ki.se/sites/biostatwiki/circall and https://github.com/datngu/Circall.

Subjects

RNA, Circular; RNA; Humans; RNA/genetics; RNA Splicing; RNA-Seq; Sequence Analysis, RNA

7.

Alternating EM algorithm for a bilinear model in isoform quantification from RNA-seq data.

Deng, Wenjiang; Mou, Tian; Kalari, Krishna R; Niu, Nifang; Wang, Liewei; Pawitan, Yudi; Vu, Trung Nghia.

Bioinformatics ; 36(3): 805-812, 2020 02 01.

Article in English | MEDLINE | ID: mdl-31400221

ABSTRACT

MOTIVATION: Estimation of isoform-level gene expression from RNA-seq data depends on simplifying assumptions, such as uniform read distribution, that are easily violated in real data. Such violations typically lead to biased estimates. Most existing methods provide bias correction step(s) that are based on biological considerations, such as GC content, and are applied to single samples separately. The main problem is that not all biases are known. RESULTS: We have developed a novel method called XAEM based on a more flexible and robust statistical model. Existing methods are essentially based on a linear model Xβ, where the design matrix X is known and is computed based on the simplifying assumptions. In contrast, XAEM considers Xβ as a bilinear model with both X and β unknown. Joint estimation of X and β is made possible by a simultaneous analysis of multi-sample RNA-seq data. Compared to existing methods, XAEM automatically performs empirical correction of potentially unknown biases. We use an alternating expectation-maximization (AEM) algorithm, alternating between estimation of X and β. For speed, XAEM utilizes quasi-mapping for read alignment, thus leading to a fast algorithm. Overall, XAEM performs favorably compared to recent advanced methods. For simulated datasets, XAEM obtains higher accuracy for multiple-isoform genes. In a differential-expression analysis of a real single-cell RNA-seq dataset, XAEM achieves substantially better rediscovery rates in independent validation sets. AVAILABILITY AND IMPLEMENTATION: The method and pipeline are implemented as a tool and freely available for use at http://fafner.meb.ki.se/biostatwiki/xaem/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Gene Expression Profiling; RNA-Seq; Algorithms; Protein Isoforms/genetics; Sequence Analysis, RNA; Software

8.

The transcriptome-wide landscape of molecular subtype-specific mRNA expression profiles in acute myeloid leukemia.

Mou, Tian; Pawitan, Yudi; Stahl, Matthias; Vesterlund, Mattias; Deng, Wenjiang; Jafari, Rozbeh; Bohlin, Anna; Österroos, Albin; Siavelis, Loannis; Bäckvall, Helena; Erkers, Tom; Kiviluoto, Santeri; Seashore-Ludlow, Brinton; Östling, Päivi; Orre, Lukas M; Kallioniemi, Olli; Lehmann, Sören; Lehtiö, Janne; Vu, Trung Nghia.

Am J Hematol ; 96(5): 580-588, 2021 05 01.

Article in English | MEDLINE | ID: mdl-33625756

ABSTRACT

Molecular classification of acute myeloid leukemia (AML) aids prognostic stratification and clinical management. Our aim in this study is to identify transcriptome-wide mRNAs that are specific to each of the molecular subtypes of AML. We analyzed RNA-sequencing data of 955 AML samples from three cohorts, including the BeatAML project, the Cancer Genome Atlas, and a cohort of Swedish patients to provide a comprehensive transcriptome-wide view of subtype-specific mRNA expression. We identified 729 subtype-specific mRNAs, discovered in the BeatAML project and validated in the other two cohorts. Using unique proteomics data, we also validated the presence of subtype-specific mRNAs at the protein level, yielding a rich collection of potential protein-based biomarkers for the AML community. To enable the exploration of subtype-specific mRNA expression by the broader scientific community, we provide an interactive resource to the public.

Subjects

Leukemia, Myeloid, Acute/genetics; RNA, Messenger/biosynthesis; RNA, Neoplasm/biosynthesis; Transcriptome; Biomarkers, Tumor; Genes, Neoplasm; Humans; Leukemia, Myeloid, Acute/classification; Leukemia, Myeloid, Acute/metabolism; Neoplasm Proteins/biosynthesis; Neoplasm Proteins/genetics; Oncogene Proteins, Fusion/biosynthesis; Oncogene Proteins, Fusion/genetics; Proteome; RNA, Messenger/genetics; RNA, Neoplasm/genetics; RNA-Seq; Retrospective Studies; Sweden

9.

A Selection Operator for Summary Association Statistics Reveals Allelic Heterogeneity of Complex Traits.

Ning, Zheng; Lee, Youngjo; Joshi, Peter K; Wilson, James F; Pawitan, Yudi; Shen, Xia.

Am J Hum Genet ; 101(6): 903-912, 2017 Dec 07.

Article in English | MEDLINE | ID: mdl-29198721

ABSTRACT

In recent years, as a secondary analysis in genome-wide association studies (GWASs), conditional and joint multiple-SNP analysis (GCTA-COJO) has been successful in allowing the discovery of additional association signals within detected loci. This suggests that many loci mapped in GWASs harbor more than a single causal variant. In order to interpret the underlying mechanism regulating a complex trait of interest in each discovered locus, researchers must assess the magnitude of allelic heterogeneity within the locus. We developed a penalized selection operator for jointly analyzing multiple variants (SOJO) within each mapped locus on the basis of LASSO (least absolute shrinkage and selection operator) regression derived from summary association statistics. We found that, compared to stepwise conditional multiple-SNP analysis, SOJO provided better sensitivity and specificity in predicting the number of alleles associated with complex traits in each locus. SOJO suggested causal variants potentially missed by GCTA-COJO. Compared to using top variants from genome-wide significant loci in GWAS, using SOJO increased the proportion of variance prediction for height by 65% without additional discovery samples or additional loci in the genome. Our empirical results indicate that human height is not only a highly polygenic trait, but also has high allelic heterogeneity within its established hundreds of loci.
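
The reason a LASSO can be fitted from summary statistics is that, for standardized genotypes and phenotype, its objective depends on the data only through the LD correlation matrix R and the vector of marginal effect estimates r: it reduces to minimising 0.5·b'Rb − b'r + λ‖b‖₁. The sketch below solves that objective by coordinate descent. It is an illustration under those assumptions, not the SOJO implementation; the function names, the toy LD matrix and the penalty value are all hypothetical.

```python
def soft_threshold(value, penalty):
    """Shrink a value toward zero by the penalty, clipping at zero."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_from_summary(ld_matrix, marginal_effects, penalty, n_iter=200, tol=1e-10):
    """Minimise 0.5*b'Rb - b'r + penalty*||b||_1 by coordinate descent,
    using only the LD matrix R and the marginal effects r."""
    n = len(marginal_effects)
    beta = [0.0] * n
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(n):
            # Marginal effect of variant j minus what the other
            # variants already explain through LD.
            partial = marginal_effects[j] - sum(
                ld_matrix[j][k] * beta[k] for k in range(n) if k != j
            )
            new_beta = soft_threshold(partial, penalty) / ld_matrix[j][j]
            max_change = max(max_change, abs(new_beta - beta[j]))
            beta[j] = new_beta
        if max_change < tol:
            break
    return beta


# Toy locus: two variants in strong LD (r = 0.9) with similar marginal effects.
ld = [[1.0, 0.9], [0.9, 1.0]]
effects = [0.50, 0.45]
joint = lasso_from_summary(ld, effects, penalty=0.1)
print(joint)
```

In this toy locus the second variant's marginal signal is almost entirely explained by LD with the first, so its joint effect is shrunk to zero and only one variant is selected. With uncorrelated variants the solution is simply the soft-thresholded marginal effects.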

Subjects

Body Height/genetics; Multifactorial Inheritance/genetics; Polymorphism, Single Nucleotide/genetics; Alleles; Body Mass Index; Genome-Wide Association Study; Humans; Quantitative Trait Loci

10.

Cell-level somatic mutation detection from single-cell RNA sequencing.

Vu, Trung Nghia; Nguyen, Ha-Nam; Calza, Stefano; Kalari, Krishna R; Wang, Liewei; Pawitan, Yudi.

Bioinformatics ; 35(22): 4679-4687, 2019 11 01.

Article in English | MEDLINE | ID: mdl-31028395

ABSTRACT

MOTIVATION: Both single-cell RNA sequencing (scRNA-seq) and DNA sequencing (scDNA-seq) have been applied for cell-level genomic profiling. For mutation profiling, the latter seems more natural. However, the task is highly challenging due to the limited input material from only two copies of DNA molecules, while whole-genome amplification generates biases and other technical noise. ScRNA-seq starts with a higher input amount, so it generally has better data quality. Various methods exist for mutation detection from DNA sequencing, but it is not clear whether these methods work for scRNA-seq data. RESULTS: Mutation detection methods developed for either bulk-cell sequencing data or scDNA-seq data do not work well for scRNA-seq data, as they produce substantial numbers of false positives. We develop a novel and robust statistical method, called SCmut, to identify specific cells that harbor mutations discovered in bulk-cell data. Statistically, SCmut controls the false positives using the 2D local false discovery rate method. We apply SCmut to several scRNA-seq datasets. In scRNA-seq breast cancer datasets, SCmut identifies a number of highly confident cell-level mutations that are recurrent in many cells and consistent in different samples. In a scRNA-seq glioblastoma dataset, we discover a recurrent cell-level mutation in the PDGFRA gene that is highly correlated with a well-known in-frame deletion in the gene. To conclude, this study contributes a novel method to discover cell-level mutation information from scRNA-seq that can facilitate investigation of cell-to-cell heterogeneity. AVAILABILITY AND IMPLEMENTATION: The source code and bioinformatics pipeline of SCmut are available at https://github.com/nghiavtr/SCmut. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Mutation; Gene Expression Profiling; Humans; Sequence Analysis, RNA; Single-Cell Analysis; Software

11.

Disease trajectories and mortality among women diagnosed with breast cancer.

Yang, Haomin; Pawitan, Yudi; He, Wei; Eriksson, Louise; Holowko, Natalie; Hall, Per; Czene, Kamila.

Breast Cancer Res ; 21(1): 95, 2019 08 16.

Article in English | MEDLINE | ID: mdl-31420051

ABSTRACT

PURPOSE: Breast cancer is a common disease with a relatively good prognosis. Therefore, understanding the spectrum of diseases and mortality among breast cancer patients is important, though currently incomplete. We systematically examined the incidence and mortality of all diseases following a breast cancer diagnosis, as well as the sequential association of disease occurrences (trajectories). METHODS: In this national cohort study, 57,501 breast cancer patients (2001-2011) were compared to 564,703 matched women from the general Swedish population and followed until 2012. The matching criteria included year of birth, county of residence, and socioeconomic status. Based on information from the Swedish Patient and Cause of Death Registries, hazard ratios (HR) were estimated for disease incidence and mortality. Conditional logistic regression models were used to identify disease trajectories among breast cancer patients. RESULTS: Among 225 diseases, 45 had HRs > 1.5 and p < 0.0002 when comparing breast cancer patients with the general population. Diseases with highest HRs included lymphedema, radiodermatitis, and neutropenia, which are side effects of surgery, radiotherapy, and chemotherapy. Other than breast cancer, the only significantly increased cause of death was other solid cancers (HR = 1.16, 95% CI = 1.08-1.24). Two main groups of disease trajectories were identified, which suggest menopausal disorders as indicators for other solid cancers, and both neutropenia and dorsalgia as diseases and symptoms preceding death due to breast cancer. CONCLUSIONS: While an increased incidence of other diseases was found among breast cancer patients, increased mortality was only due to other solid cancers. Preventing death due to breast cancer should be a priority to prolong life in breast cancer patients, but closer surveillance of other solid cancers is also needed.

Subjects

Breast Neoplasms/epidemiology; Breast Neoplasms/mortality; Adult; Age of Onset; Aged; Aged, 80 and over; Breast Neoplasms/diagnosis; Female; Humans; Incidence; Middle Aged; Mortality; Odds Ratio; Population Surveillance; Proportional Hazards Models; Registries; Socioeconomic Factors; Sweden/epidemiology

12.

On the relationship between the heritability and the attributable fraction.

Dahlqwist, Elisabeth; Magnusson, Patrik K E; Pawitan, Yudi; Sjölander, Arvid.

Hum Genet ; 138(4): 425-435, 2019 Apr.

Article in English | MEDLINE | ID: mdl-30941497

ABSTRACT

Heritability is the most commonly used measure of genetic contribution to disease outcomes. Being the fraction of the variance of latent trait liability attributable to genetic factors, heritability of binary traits is a difficult technical concept that is sometimes misinterpreted as the more-easily understandable concept of attributable fraction. In this paper we use the liability threshold model to describe the analytical relationship between heritability and attributable fraction. Towards this end, we consider a hypothetical intervention that is aimed to reduce the genetic risk of the disease for a specified target group of the population. We show how the relation between the heritability and the attributable fraction depends on the disease prevalence, the intervention effect and the size of the target group. We use two real examples to illustrate the practical implications of our theoretical results.
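
A small numerical sketch can show why heritability should not be read as an attributable fraction. It assumes the standard liability threshold model (total liability with variance 1, genetic variance equal to the heritability, disease when liability exceeds a threshold set by the prevalence) and one deliberately extreme hypothetical intervention that sets everyone's genetic liability to the population mean. The intervention, the function name and the parameter values are assumptions made for illustration and are not taken from the paper.

```python
from statistics import NormalDist


def attributable_fraction_sketch(heritability, prevalence):
    """Attributable fraction for a hypothetical intervention that removes
    all genetic variation in liability (illustrative assumption only)."""
    if not 0.0 <= heritability < 1.0:
        raise ValueError("heritability must lie in [0, 1)")
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must lie in (0, 1)")
    # Disease occurs when total liability (variance 1) exceeds this threshold.
    threshold = NormalDist().inv_cdf(1.0 - prevalence)
    # After the intervention only the environmental component remains,
    # with variance 1 - heritability.
    env_sd = (1.0 - heritability) ** 0.5
    risk_after = 1.0 - NormalDist(0.0, env_sd).cdf(threshold)
    # Fraction of cases that would be avoided by the intervention.
    return 1.0 - risk_after / prevalence


af = attributable_fraction_sketch(heritability=0.5, prevalence=0.01)
print(f"heritability 0.50 -> attributable fraction {af:.2f}")
```

Under these assumptions a heritability of 0.5 and a prevalence of 1% give an attributable fraction of roughly 0.95, far from 0.5. Changing the prevalence changes the result, which is consistent with the dependence on disease prevalence described in the abstract.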

Subjects

Genetic Predisposition to Disease/epidemiology; Models, Genetic; Models, Statistical; Multifactorial Inheritance; Quantitative Trait, Heritable; Causality; Disease/etiology; Disease/genetics; Humans; Phenotype; Population Density; Prevalence; Risk Factors; Sample Size

13.

Isoform-level gene expression patterns in single-cell RNA-sequencing data.

Vu, Trung Nghia; Wills, Quin F; Kalari, Krishna R; Niu, Nifang; Wang, Liewei; Pawitan, Yudi; Rantalainen, Mattias.

Bioinformatics ; 34(14): 2392-2400, 2018 07 15.

Article in English | MEDLINE | ID: mdl-29490015

ABSTRACT

Motivation: RNA sequencing of single cells enables characterization of transcriptional heterogeneity in seemingly homogeneous cell populations. Single-cell sequencing has been applied in a wide range of research fields. However, few studies have focused on characterization of isoform-level expression patterns at the single-cell level. In this study, we propose and apply a novel method, ISOform-Patterns (ISOP), based on mixture modeling, to characterize the expression patterns of isoform pairs from the same gene in single-cell isoform-level expression data. Results: We define six principal patterns of isoform expression relationships and describe a method for differential-pattern analysis. We demonstrate ISOP through analysis of single-cell RNA-sequencing data from a breast cancer cell line, with replication in three independent datasets. We assigned the pattern types to each of 16 562 isoform pairs from 4929 genes. Among those, 26% of the discovered patterns were significant (P<0.05), while the remaining patterns are possibly effects of transcriptional bursting, drop-out and stochastic biological heterogeneity. Furthermore, 32% of genes discovered through differential-pattern analysis were not detected by differential-expression analysis. Finally, the effects of drop-out events and expression levels of isoforms on ISOP's performance were investigated through simulated datasets. To conclude, ISOP provides a novel approach for characterization of isoform-level preference, commitment and heterogeneity in single-cell RNA-sequencing data. Availability and implementation: The ISOP method has been implemented as an R package and is available at https://github.com/nghiavtr/ISOP under a GPL-3 license. Supplementary information: Supplementary data are available at Bioinformatics online.

Subjects

Gene Expression Profiling/methods; Gene Expression; RNA Isoforms/genetics; Sequence Analysis, RNA/methods; Software; Breast Neoplasms/genetics; Cell Line, Tumor; Female; Humans

14.

A fast detection of fusion genes from paired-end RNA-seq data.

Vu, Trung Nghia; Deng, Wenjiang; Trac, Quang Thinh; Calza, Stefano; Hwang, Woochang; Pawitan, Yudi.

BMC Genomics ; 19(1): 786, 2018 Nov 01.

Article in English | MEDLINE | ID: mdl-30382840

ABSTRACT

BACKGROUND: Fusion genes are known to be drivers of many common cancers, so they are potential markers for diagnosis, prognosis or therapy response. The advent of paired-end RNA sequencing enhances our ability to discover fusion genes. While methods are available, routine analyses of large numbers of samples are still limited by high computational demands. RESULTS: We develop FuSeq, a fast and accurate method to discover fusion genes based on quasi-mapping to quickly map the reads, extract initial candidates from split reads and fusion equivalence classes of mapped reads, and finally apply multiple filters and statistical tests to get the final candidates. We apply FuSeq to four validated datasets: breast cancer, melanoma and glioma datasets, and one spike-in dataset. The results reveal high sensitivity and specificity in all datasets, and compare well against other methods such as FusionMap, TRUP, TopHat-Fusion, SOAPfuse and JAFFA. In terms of computational time, FuSeq is two-fold faster than FusionMap and orders of magnitude faster than the other methods. CONCLUSIONS: With its lower computational demands, FuSeq makes it practical to investigate fusion genes in large numbers of samples. FuSeq is implemented in C++ and R, and available at https://github.com/nghiavtr/FuSeq for non-commercial uses.

Subjects

Gene Fusion; RNA/genetics; Sequence Analysis, RNA; Algorithms; Cell Line, Tumor; Computational Biology/methods; Databases, Nucleic Acid; Gene Expression Profiling; High-Throughput Nucleotide Sequencing; Humans; Neoplasms/genetics; Oncogene Proteins, Fusion/genetics; Reproducibility of Results; Sequence Analysis, RNA/methods

15.

Likelihood-based inference for bounds of causal parameters.

Lee, Woojoo; Sjölander, Arvid; Larsson, Anton; Pawitan, Yudi.

Stat Med ; 37(30): 4695-4706, 2018 12 30.

Article in English | MEDLINE | ID: mdl-30155912

ABSTRACT

It is a common causal inference problem that, even with theoretically infinite samples, we might be able to only provide bounds for the parameters of interest. This problem occurs naturally, for example, in estimating causal interaction between two risk factors and in estimating the average causal effect using the instrumental variable or Mendelian randomization method. Current procedures include linear programming to get the estimated bounds, plus bootstrapping to get confidence intervals. We describe a likelihood-based procedure that automatically yields the interval estimate from the flat likelihood region and show some theory that allows us to construct confidence intervals from this non-regular likelihood. Finally, we illustrate the procedure with examples from the estimation of causal interaction between two risk factors and the treatment effect under partial compliance.

Subjects

Causality; Likelihood Functions; Confidence Intervals; Data Interpretation, Statistical; Humans; Linear Models; Logistic Models; Models, Statistical; Patient Compliance/statistics & numerical data; Randomized Controlled Trials as Topic/methods; Risk Factors; Treatment Outcome

16.

A Critical Look at Entropy-Based Gene-Gene Interaction Measures.

Lee, Woojoo; Sjölander, Arvid; Pawitan, Yudi.

Genet Epidemiol ; 40(5): 416-24, 2016 07.

Article in English | MEDLINE | ID: mdl-27229752

ABSTRACT

Several entropy-based measures for detecting gene-gene interaction have been proposed recently. It has been argued that the entropy-based measures are preferred because entropy can better capture the nonlinear relationships between genotypes and traits, so they can be useful to detect gene-gene interactions for complex diseases. These suggested measures look reasonable at an intuitive level, but so far there has been no detailed characterization of the interactions captured by them. Here we study analytically and in detail the properties of some entropy-based measures for detecting gene-gene interactions. The relationship between interactions captured by the entropy-based measures and those of logistic regression models is clarified. In general we find that the entropy-based measures can suffer from a lack of specificity in terms of target parameters, i.e., they can detect uninteresting signals as interactions. Numerical studies are carried out to confirm the theoretical findings.

Subjects

Models, Genetic; Entropy; Genome-Wide Association Study; Genotype; Humans; Logistic Models; Phenotype

17.

A clinical model for identifying the short-term risk of breast cancer.

Eriksson, Mikael; Czene, Kamila; Pawitan, Yudi; Leifland, Karin; Darabi, Hatef; Hall, Per.

Breast Cancer Res ; 19(1): 29, 2017 03 14.

Article in English | MEDLINE | ID: mdl-28288659

ABSTRACT

BACKGROUND: Most mammography screening programs are not individualized. To efficiently screen for breast cancer, the individual risk of the disease should be determined. We describe a model that could be used at most mammography screening units without adding substantial cost. METHODS: The study was based on the Karma cohort, which included 70,877 participants. Mammograms were collected up to 3 years following the baseline mammogram. A prediction protocol was developed using mammographic density, computer-aided detection of microcalcifications and masses, use of hormone replacement therapy (HRT), family history of breast cancer, menopausal status, age, and body mass index. Relative risks were calculated using conditional logistic regression. Absolute risks were calculated using the iCARE protocol. RESULTS: Comparing women at highest and lowest mammographic density yielded a fivefold higher risk of breast cancer for women at highest density. When adding microcalcifications and masses to the model, high-risk women had a nearly ninefold higher risk of breast cancer than those at lowest risk. In the full model, taking HRT use, family history of breast cancer, and menopausal status into consideration, the AUC reached 0.71. CONCLUSIONS: Measures of mammographic features and information on HRT use, family history of breast cancer, and menopausal status enabled early identification of women within the mammography screening program at such a high risk of breast cancer that additional examinations are warranted. In contrast, women at low risk could probably be screened less intensively.

Subjects

Breast Neoplasms/epidemiology , Theoretical Models , Adult , Aged , Area Under Curve , Breast Density , Breast Neoplasms/diagnosis , Breast Neoplasms/prevention & control , Case-Control Studies , Computer Simulation , Early Detection of Cancer , Female , Hormone Replacement Therapy/adverse effects , Humans , Mammography , Mass Screening , Middle Aged , Risk , Risk Factors , Sweden/epidemiology

18.

Gene-based meta-analysis of genome-wide association studies implicates new loci involved in obesity.

Hägg, Sara; Ganna, Andrea; Van Der Laan, Sander W; Esko, Tonu; Pers, Tune H; Locke, Adam E; Berndt, Sonja I; Justice, Anne E; Kahali, Bratati; Siemelink, Marten A; Pasterkamp, Gerard; Strachan, David P; Speliotes, Elizabeth K; North, Kari E; Loos, Ruth J F; Hirschhorn, Joel N; Pawitan, Yudi; Ingelsson, Erik.

Hum Mol Genet ; 24(23): 6849-60, 2015 Dec 01.

Article in English | MEDLINE | ID: mdl-26376864

ABSTRACT

To date, genome-wide association studies (GWASs) have identified >100 loci with single variants associated with body mass index (BMI). This approach may miss loci with high allelic heterogeneity; therefore, the aim of the present study was to use gene-based meta-analysis to identify regions with high allelic heterogeneity to discover additional obesity susceptibility loci. We included GWAS data from 123 865 individuals of European descent from 46 cohorts in Stage 1 and Metabochip data from additional 103 046 individuals from 43 cohorts in Stage 2, all within the Genetic Investigation of ANthropometric Traits (GIANT) consortium. Each cohort was tested for association between ~2.4 million (Stage 1) or ~200 000 (Stage 2) imputed or genotyped single variants and BMI, and summary statistics were subsequently meta-analyzed in 17 941 genes. We used the 'VErsatile Gene-based Association Study' (VEGAS) approach to assign variants to genes and to calculate gene-based P-values based on simulations. The VEGAS method was applied to each cohort separately before a gene-based meta-analysis was performed. In Stage 1, two known (FTO and TMEM18) and six novel (PEX2, MTFR2, SSFA2, IARS2, CEP295 and TXNDC12) loci were associated with BMI (P < 2.8 × 10^-6 for 17 941 gene tests). We confirmed all loci, and six of them were gene-wide significant in Stage 2 alone. We provide biological support for the loci by pathway, expression and methylation analyses. Our results indicate that gene-based meta-analysis of GWAS provides a useful strategy to find loci of interest that were not identified in standard single-marker analyses due to high allelic heterogeneity.

Subjects

Body Mass Index , Genetic Loci , Genetic Predisposition to Disease , Obesity/genetics , Female , Genome-Wide Association Study , Humans , Single Nucleotide Polymorphism , White People/genetics

19.

The ABC model of prostate cancer: A conceptual framework for the design and interpretation of prognostic studies.

Pettersson, Andreas; Gerke, Travis; Fall, Katja; Pawitan, Yudi; Holmberg, Lars; Giovannucci, Edward L; Kantoff, Philip W; Adami, Hans-Olov; Rider, Jennifer R; Mucci, Lorelei A.

Cancer ; 123(9): 1490-1496, 2017 05 01.

Article in English | MEDLINE | ID: mdl-28152172

ABSTRACT

There has been limited success in identifying prognostic biomarkers in prostate cancer. A partial explanation may be that insufficient emphasis has been put on clearly defining what type of marker or patient category a biomarker study aims to identify and how different cohort characteristics affect the ability to identify such a marker. In this article, the authors put forth the ABC model of prostate cancer, which defines 3 groups of patients with localized disease that an investigator may seek to identify: patients who, within a given time frame, will not develop metastases even if untreated (category A), will not develop metastases because of radical treatment (category B), or will develop metastases despite radical treatment (category C). The authors demonstrate that follow-up time and prostate-specific antigen screening intensity influence the prevalence of patients in categories A, B, and C in a study cohort, and that prognostic markers must be tested in both treated and untreated cohorts to accurately distinguish the 3 groups. The authors suggest that more emphasis should be put on considering these factors when planning, conducting, and interpreting the results from prostate cancer biomarker studies, and propose the ABC model as a framework to aid in that process. Cancer 2017;123:1490-1496. © 2017 American Cancer Society.

Subjects

Prostatectomy , Prostatic Neoplasms/therapy , Watchful Waiting , Biomarkers/metabolism , Disease-Free Survival , Humans , Male , Neoplasm Metastasis , Prognosis , Prostatic Neoplasms/metabolism

20.

Doubly robust methods for handling confounding by cluster.

Zetterqvist, Johan; Vansteelandt, Stijn; Pawitan, Yudi; Sjölander, Arvid.

Biostatistics ; 17(2): 264-76, 2016 Apr.

Article in English | MEDLINE | ID: mdl-26508769

ABSTRACT

In clustered designs such as family studies, the exposure-outcome association is usually confounded by both cluster-constant and cluster-varying confounders. The influence of cluster-constant confounders can be eliminated by studying the exposure-outcome association within (conditional on) clusters, but additional regression modeling is usually required to control for observed cluster-varying confounders. A problem is that the working regression model may be misspecified, in which case the estimated within-cluster association may be biased. To reduce sensitivity to model misspecification we propose to augment the standard working model for the outcome with an auxiliary working model for the exposure. We derive a doubly robust conditional generalized estimating equation (DRCGEE) estimator for the within-cluster association. This estimator combines the two models in such a way that it is consistent if either model is correct, not necessarily both. Thus, the DRCGEE estimator gives the researcher two chances instead of only one to make valid inference on the within-cluster association. We have implemented the estimator in an R package and we use it to examine the association between smoking during pregnancy and cognitive abilities in offspring, in a sample of siblings.

Subjects

Statistical Data Interpretation , Statistical Models , Observational Studies as Topic , Female , Humans , Pregnancy , Prenatal Exposure Delayed Effects/epidemiology , Smoking/adverse effects
