Search | PAHO/WHO Uruguay

1.

Genetic and phenotypic links between obesity and extracellular vesicles.

Zhai, Ranran; Pan, Lu; Yang, Zhijian; Li, Ting; Ning, Zheng; Pawitan, Yudi; Wilson, James F; Wu, Di; Shen, Xia.

Hum Mol Genet ; 31(21): 3643-3651, 2022 10 28.

Article in English | MEDLINE | ID: mdl-35357430

ABSTRACT

Obesity has a highly complex genetic architecture, making it difficult to understand the genetic mechanisms, despite the large number of discovered loci via genome-wide association studies (GWAS). Omics techniques have provided a better resolution to view this problem. As a proxy of cell-level biology, extracellular vesicles (EVs) are useful for studying cellular regulation of complex phenotypes such as obesity. Here, in a well-established Scottish cohort, we utilized a novel technology to detect surface proteins across millions of single EVs in each individual's plasma sample. Integrating the results with established obesity GWAS, we inferred 78 types of EVs carrying one or two of 12 surface proteins to be associated with adiposity-related traits such as waist circumference. We then verified that the abundance of particular EVs is negatively correlated with body adiposity, while showing no association with lean body mass. We also revealed that genetic variants associated with protein-specific EVs capture 2- to 4-fold heritability enrichment for blood cholesterol levels. Our findings provide evidence that EVs with specific surface proteins have phenotypic and genetic links to obesity and blood lipids, respectively, guiding future EV biomarker research.

Subject(s)

Extracellular Vesicles , Obesity , Humans , Extracellular Vesicles/genetics , Genome-Wide Association Study , Membrane Proteins/genetics , Obesity/genetics , Phenotype

2.

Isoform-level quantification for single-cell RNA sequencing.

Pan, Lu; Dinh, Huy Q; Pawitan, Yudi; Vu, Trung Nghia.

Bioinformatics ; 38(5): 1287-1294, 2022 02 07.

Article in English | MEDLINE | ID: mdl-34864849

ABSTRACT

MOTIVATION: RNA expression at isoform level is biologically more informative than at gene level and can potentially reveal cellular subsets and corresponding biomarkers that are not visible at gene level. However, due to the strong 3' bias sequencing protocol, mRNA quantification for high-throughput single-cell RNA sequencing such as Chromium Single Cell 3' 10× Genomics is currently performed at the gene level. RESULTS: We have developed an isoform-level quantification method for high-throughput single-cell RNA sequencing by exploiting the concepts of transcription clusters and isoform paralogs. The method, called Scasa, compares well in simulations against competing approaches including Alevin, Cellranger, Kallisto, Salmon, Terminus and STARsolo at both isoform- and gene-level expression. The reanalysis of a CITE-Seq dataset with isoform-based Scasa reveals a subgroup of CD14 monocytes missed by gene-based methods. AVAILABILITY AND IMPLEMENTATION: Implementation of Scasa including source code, documentation, tutorials and test data supporting this study is available at Github: https://github.com/eudoraleer/scasa and Zenodo: https://doi.org/10.5281/zenodo.5712503. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Gene Expression Profiling , Software , Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Protein Isoforms/genetics , Protein Isoforms/metabolism , RNA, Messenger/genetics , RNA

3.

Gastrointestinal biopsy of normal mucosa or nonspecific inflammation and risk of neurodegenerative disease: Nationwide matched cohort study.

Sun, Jiangwei; Ludvigsson, Jonas F; Roelstraete, Bjorn; Pedersen, Nancy L; Pawitan, Yudi; Wirdefeldt, Karin; Fang, Fang.

Eur J Neurol ; 30(11): 3430-3439, 2023 11.

Article in English | MEDLINE | ID: mdl-36447380

ABSTRACT

BACKGROUND AND PURPOSE: Evidence has accumulated to support the early involvement of altered gastrointestinal (GI) function in neurodegenerative disease. However, risk of Alzheimer disease (AD) and Parkinson disease (PD) among individuals with a GI biopsy of normal mucosa or nonspecific inflammation is unknown. METHODS: This matched cohort study included all individuals in Sweden with a GI biopsy of normal mucosa (n = 480,346) or nonspecific inflammation (n = 655,937) during 1965-2016 (exposed group) as well as their individually matched population references and unexposed full siblings. A flexible parametric model and stratified Cox model were used to estimate hazard ratio (HR) and its 95% confidence interval (CI). RESULTS: Individuals with normal mucosa or nonspecific inflammation had a higher risk of AD and PD during the 20 years after biopsy. Compared with the population references, individuals with normal mucosa had an increased risk of AD (incidence rate [IR] difference = 13.53 per 100,000 person-years, HR [95% CI] = 1.15 [1.11-1.20]) and PD (IR difference = 6.72, HR [95% CI] = 1.16 [1.10-1.23]). Elevated risk was also observed for nonspecific inflammation regarding AD (IR difference = 13.28, HR [95% CI] = 1.11 [1.08-1.14]) and PD (IR difference = 6.83, HR [95% CI] = 1.10 [1.06-1.14]). Similar results were observed in subgroup and sensitivity analyses and when comparing with their unexposed siblings. CONCLUSIONS: Individuals with a GI biopsy of normal mucosa or nonspecific inflammation had an increased risk of AD and PD. This adds new evidence of the early involvement of GI dysfunction in neurodegenerative disease.

Subject(s)

Alzheimer Disease , Neurodegenerative Diseases , Parkinson Disease , Humans , Cohort Studies , Neurodegenerative Diseases/epidemiology , Inflammation , Biopsy , Mucous Membrane , Parkinson Disease/epidemiology , Sweden/epidemiology , Risk Factors

4.

The frequency of misattributed paternity in Sweden is low and decreasing: A nationwide cohort study.

Dahlén, Torsten; Zhao, Jingcheng; Magnusson, Patrik K E; Pawitan, Yudi; Lavröd, Jakob; Edgren, Gustaf.

J Intern Med ; 291(1): 95-100, 2022 01.

Article in English | MEDLINE | ID: mdl-34288189

ABSTRACT

BACKGROUND: The occurrence of misattributed paternity has consequences throughout society with implications ranging from inheritance and royal succession to transplantation. However, its frequency in Sweden is unknown. OBJECTIVE: To estimate the contemporary frequency of misattributed paternity in Sweden. METHODS: The study was based on nationwide ABO blood group data and a nationwide register of familial relationships in Sweden. These data were analysed using both a frequentist Poisson model and the Bayesian Gibbs model. The conduct of the study was approved by the regional ethics committee in Stockholm, Sweden (reference numbers 2018/167-31 and 2019-04656). RESULTS: Nearly two million mother-father-offspring family units were included. Overall, the frequency of misattributed paternity was estimated at 1.7% in both models. Misattributed paternity was more common among parents with low educational levels, and has decreased over time to a current 1%. CONCLUSIONS: The misattributed paternity rate is similar to the rates in other West European populations. Apart from widespread societal implications, studies on heritability may consider misattributed paternity as a minor source of error.

Subject(s)

Paternity , Truth Disclosure , Bayes Theorem , Cohort Studies , Humans , Male , Sweden/epidemiology
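As an illustration of why ABO blood group data carry information about paternity (record 4 above), the following is a minimal sketch of a Mendelian compatibility check for a mother-father-offspring unit. It is not the Poisson or Bayesian model used in the study; the function names are hypothetical, and the sketch assumes the textbook three-allele ABO system, ignoring rare variants and typing errors.

```python
from itertools import product

# Textbook ABO genotypes behind each phenotype (assumption: three alleles A, B, O).
GENOTYPES = {
    "A": [("A", "A"), ("A", "O")],
    "B": [("B", "B"), ("B", "O")],
    "AB": [("A", "B")],
    "O": [("O", "O")],
}


def phenotype(alleles):
    """Map a pair of alleles to the ABO phenotype (O is recessive)."""
    expressed = set(alleles) - {"O"}
    return "".join(sorted(expressed)) if expressed else "O"


def possible_child_phenotypes(mother, father):
    """All child phenotypes consistent with the parental phenotypes."""
    children = set()
    for g_mother, g_father in product(GENOTYPES[mother], GENOTYPES[father]):
        # The child inherits one allele from each parent.
        for a_mother, a_father in product(g_mother, g_father):
            children.add(phenotype((a_mother, a_father)))
    return children


def is_incompatible(mother, father, child):
    """True if the child's phenotype cannot arise from the stated parents."""
    return child not in possible_child_phenotypes(mother, father)


print(sorted(possible_child_phenotypes("A", "B")))
print(is_incompatible("O", "AB", "O"))
```

Many parent combinations are compatible with almost any child phenotype, so an incompatibility count on its own would presumably understate the true frequency; a statistical model is needed to turn such counts into an estimate.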

5.

Overall assessment for selected markers from high-throughput data.

Lee, Woojoo; Lee, Donghwan; Pawitan, Yudi.

Stat Med ; 41(30): 5830-5843, 2022 12 30.

Article in English | MEDLINE | ID: mdl-36270585

ABSTRACT

Reproducibility, a hallmark of science, is typically assessed in validation studies. We focus on high-throughput studies where a large number of biomarkers is measured in a training study, but only a subset of the most significant findings is selected and re-tested in a validation study. Our aim is to get the statistical measures of overall assessment for the selected markers, by integrating the information in both the training and validation studies. Naive statistical measures, such as the combined P-value by conventional meta-analysis, that ignore the non-random selection are clearly biased, producing over-optimistic significance. We use the false-discovery rate (FDR) concept to develop a selection-adjusted FDR (sFDR) as an overall assessment measure. We describe the link between the overall assessment and other concepts such as replicability and meta-analysis. Some simulation studies and two real metabolomic datasets are considered to illustrate the application of sFDR in high-throughput data analyses.

Subject(s)

Algorithms , Humans , Reproducibility of Results , Computer Simulation

6.

Circall: fast and accurate methodology for discovery of circular RNAs from paired-end RNA-sequencing data.

Nguyen, Dat Thanh; Trac, Quang Thinh; Nguyen, Thi-Hau; Nguyen, Ha-Nam; Ohad, Nir; Pawitan, Yudi; Vu, Trung Nghia.

BMC Bioinformatics ; 22(1): 495, 2021 Oct 13.

Article in English | MEDLINE | ID: mdl-34645386

ABSTRACT

BACKGROUND: Circular RNA (circRNA) is an emerging class of RNA molecules attracting researchers due to its potential for serving as markers for diagnosis, prognosis, or therapeutic targets of cancer, cardiovascular, and autoimmune diseases. Current methods for detection of circRNA from RNA sequencing (RNA-seq) focus mostly on improving mapping quality of reads supporting the back-splicing junction (BSJ) of a circRNA to eliminate false positives (FPs). We show that mapping information alone often cannot predict if a BSJ-supporting read is derived from a true circRNA or not, thus increasing the rate of FP circRNAs. RESULTS: We have developed Circall, a novel circRNA detection method from RNA-seq. Circall controls the FPs using a robust multidimensional local false discovery rate method based on the length and expression of circRNAs. It is computationally highly efficient by using a quasi-mapping algorithm for fast and accurate RNA read alignments. We applied Circall on two simulated datasets and three experimental datasets of human cell-lines. The results show that Circall achieves high sensitivity and precision in the simulated data. In the experimental datasets it performs well against current leading methods. Circall is also substantially faster than the other methods, particularly for large datasets. CONCLUSIONS: With those better performances in the detection of circRNAs and in computational time, Circall facilitates the analyses of circRNAs in large numbers of samples. Circall is implemented in C++ and R, and available for use at https://www.meb.ki.se/sites/biostatwiki/circall and https://github.com/datngu/Circall.

Subject(s)

RNA, Circular , RNA , Humans , RNA/genetics , RNA Splicing , RNA-Seq , Sequence Analysis, RNA

7.

Alternating EM algorithm for a bilinear model in isoform quantification from RNA-seq data.

Deng, Wenjiang; Mou, Tian; Kalari, Krishna R; Niu, Nifang; Wang, Liewei; Pawitan, Yudi; Vu, Trung Nghia.

Bioinformatics ; 36(3): 805-812, 2020 02 01.

Article in English | MEDLINE | ID: mdl-31400221

ABSTRACT

MOTIVATION: Estimation of isoform-level gene expression from RNA-seq data depends on simplifying assumptions, such as uniform read distribution, that are easily violated in real data. Such violations typically lead to biased estimates. Most existing methods provide bias correction step(s), which are based on biological considerations, such as GC content, and applied to single samples separately. The main problem is that not all biases are known. RESULTS: We have developed a novel method called XAEM based on a more flexible and robust statistical model. Existing methods are essentially based on a linear model Xβ, where the design matrix X is known and is computed based on the simplifying assumptions. In contrast, XAEM considers Xβ as a bilinear model with both X and β unknown. Joint estimation of X and β is made possible by a simultaneous analysis of multi-sample RNA-seq data. Compared to existing methods, XAEM automatically performs empirical correction of potentially unknown biases. We use an alternating expectation-maximization (AEM) algorithm, alternating between estimation of X and β. For speed, XAEM utilizes quasi-mapping for read alignment, thus leading to a fast algorithm. Overall, XAEM performs favorably compared to recent advanced methods. For simulated datasets, XAEM obtains higher accuracy for multiple-isoform genes. In a differential-expression analysis of a real single-cell RNA-seq dataset, XAEM achieves substantially better rediscovery rates in independent validation sets. AVAILABILITY AND IMPLEMENTATION: The method and pipeline are implemented as a tool and freely available for use at http://fafner.meb.ki.se/biostatwiki/xaem/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Gene Expression Profiling , RNA-Seq , Algorithms , Protein Isoforms/genetics , Sequence Analysis, RNA , Software
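The bilinear idea in record 7 above (both factors of a product unknown, estimated in turn from multi-sample data) can be illustrated with a toy sketch. This is not the XAEM algorithm: it uses plain alternating least squares on a rank-one model y[i][j] ≈ x[i] * b[j] instead of expectation-maximization, and all names and numbers are hypothetical.

```python
def fit_rank_one(y, n_iter=20):
    """Alternating least squares for y[i][j] ~ x[i] * b[j].

    Each step holds one factor fixed and solves the least-squares
    problem for the other in closed form. Assumes y is non-zero.
    """
    n_rows, n_cols = len(y), len(y[0])
    x = [1.0] * n_rows  # arbitrary starting guess
    b = [0.0] * n_cols
    for _ in range(n_iter):
        # Update b with x fixed.
        sxx = sum(v * v for v in x)
        b = [sum(x[i] * y[i][j] for i in range(n_rows)) / sxx
             for j in range(n_cols)]
        # Update x with b fixed.
        sbb = sum(v * v for v in b)
        x = [sum(b[j] * y[i][j] for j in range(n_cols)) / sbb
             for i in range(n_rows)]
    return x, b


# Hypothetical noise-free data: 3 rows (e.g. read classes), 4 samples.
x_true = [1.0, 2.0, 0.5]
b_true = [3.0, 1.0, 4.0, 2.0]
y = [[xi * bj for bj in b_true] for xi in x_true]

x_hat, b_hat = fit_rank_one(y)
max_err = max(abs(x_hat[i] * b_hat[j] - y[i][j])
              for i in range(3) for j in range(4))
print(max_err)
```

Only the product of the two factors is identified here (x can be rescaled if b is rescaled inversely), which is one reason several samples are needed before both factors can be estimated jointly.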

8.

The transcriptome-wide landscape of molecular subtype-specific mRNA expression profiles in acute myeloid leukemia.

Mou, Tian; Pawitan, Yudi; Stahl, Matthias; Vesterlund, Mattias; Deng, Wenjiang; Jafari, Rozbeh; Bohlin, Anna; Österroos, Albin; Siavelis, Loannis; Bäckvall, Helena; Erkers, Tom; Kiviluoto, Santeri; Seashore-Ludlow, Brinton; Östling, Päivi; Orre, Lukas M; Kallioniemi, Olli; Lehmann, Sören; Lehtiö, Janne; Vu, Trung Nghia.

Am J Hematol ; 96(5): 580-588, 2021 05 01.

Article in English | MEDLINE | ID: mdl-33625756

ABSTRACT

Molecular classification of acute myeloid leukemia (AML) aids prognostic stratification and clinical management. Our aim in this study is to identify transcriptome-wide mRNAs that are specific to each of the molecular subtypes of AML. We analyzed RNA-sequencing data of 955 AML samples from three cohorts, including the BeatAML project, the Cancer Genome Atlas, and a cohort of Swedish patients to provide a comprehensive transcriptome-wide view of subtype-specific mRNA expression. We identified 729 subtype-specific mRNAs, discovered in the BeatAML project and validated in the other two cohorts. Using unique proteomics data, we also validated the presence of subtype-specific mRNAs at the protein level, yielding a rich collection of potential protein-based biomarkers for the AML community. To enable the exploration of subtype-specific mRNA expression by the broader scientific community, we provide an interactive resource to the public.

Subject(s)

Leukemia, Myeloid, Acute/genetics , RNA, Messenger/biosynthesis , RNA, Neoplasm/biosynthesis , Transcriptome , Biomarkers, Tumor , Genes, Neoplasm , Humans , Leukemia, Myeloid, Acute/classification , Leukemia, Myeloid, Acute/metabolism , Neoplasm Proteins/biosynthesis , Neoplasm Proteins/genetics , Oncogene Proteins, Fusion/biosynthesis , Oncogene Proteins, Fusion/genetics , Proteome , RNA, Messenger/genetics , RNA, Neoplasm/genetics , RNA-Seq , Retrospective Studies , Sweden

9.

A Selection Operator for Summary Association Statistics Reveals Allelic Heterogeneity of Complex Traits.

Ning, Zheng; Lee, Youngjo; Joshi, Peter K; Wilson, James F; Pawitan, Yudi; Shen, Xia.

Am J Hum Genet ; 101(6): 903-912, 2017 Dec 07.

Article in English | MEDLINE | ID: mdl-29198721

ABSTRACT

In recent years, as a secondary analysis in genome-wide association studies (GWASs), conditional and joint multiple-SNP analysis (GCTA-COJO) has been successful in allowing the discovery of additional association signals within detected loci. This suggests that many loci mapped in GWASs harbor more than a single causal variant. In order to interpret the underlying mechanism regulating a complex trait of interest in each discovered locus, researchers must assess the magnitude of allelic heterogeneity within the locus. We developed a penalized selection operator for jointly analyzing multiple variants (SOJO) within each mapped locus on the basis of LASSO (least absolute shrinkage and selection operator) regression derived from summary association statistics. We found that, compared to stepwise conditional multiple-SNP analysis, SOJO provided better sensitivity and specificity in predicting the number of alleles associated with complex traits in each locus. SOJO suggested causal variants potentially missed by GCTA-COJO. Compared to using top variants from genome-wide significant loci in GWAS, using SOJO increased the proportion of variance prediction for height by 65% without additional discovery samples or additional loci in the genome. Our empirical results indicate that human height is not only a highly polygenic trait, but also has high allelic heterogeneity within its established hundreds of loci.

Subject(s)

Body Height/genetics , Multifactorial Inheritance/genetics , Polymorphism, Single Nucleotide/genetics , Alleles , Body Mass Index , Genome-Wide Association Study , Humans , Quantitative Trait Loci

10.

Cell-level somatic mutation detection from single-cell RNA sequencing.

Vu, Trung Nghia; Nguyen, Ha-Nam; Calza, Stefano; Kalari, Krishna R; Wang, Liewei; Pawitan, Yudi.

Bioinformatics ; 35(22): 4679-4687, 2019 11 01.

Article in English | MEDLINE | ID: mdl-31028395

ABSTRACT

MOTIVATION: Both single-cell RNA sequencing (scRNA-seq) and DNA sequencing (scDNA-seq) have been applied for cell-level genomic profiling. For mutation profiling, the latter seems more natural. However, the task is highly challenging due to the limited input materials from only two copies of DNA molecules, while whole-genome amplification generates biases and other technical noise. ScRNA-seq starts with a higher input amount, so it generally has better data quality. Various methods exist for mutation detection from DNA sequencing, but it is not clear whether these methods work for scRNA-seq data. RESULTS: Mutation detection methods developed for either bulk-cell sequencing data or scDNA-seq data do not work well for the scRNA-seq data, as they produce substantial numbers of false positives. We develop a novel and robust statistical method, called SCmut, to identify specific cells that harbor mutations discovered in bulk-cell data. Statistically, SCmut controls the false positives using the 2D local false discovery rate method. We apply SCmut to several scRNA-seq datasets. In scRNA-seq breast cancer datasets, SCmut identifies a number of highly confident cell-level mutations that are recurrent in many cells and consistent in different samples. In a scRNA-seq glioblastoma dataset, we discover a recurrent cell-level mutation in the PDGFRA gene that is highly correlated with a well-known in-frame deletion in the gene. To conclude, this study contributes a novel method to discover cell-level mutation information from scRNA-seq that can facilitate investigation of cell-to-cell heterogeneity. AVAILABILITY AND IMPLEMENTATION: The source codes and bioinformatics pipeline of SCmut are available at https://github.com/nghiavtr/SCmut. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Mutation , Gene Expression Profiling , Humans , Sequence Analysis, RNA , Single-Cell Analysis , Software

11.

Disease trajectories and mortality among women diagnosed with breast cancer.

Yang, Haomin; Pawitan, Yudi; He, Wei; Eriksson, Louise; Holowko, Natalie; Hall, Per; Czene, Kamila.

Breast Cancer Res ; 21(1): 95, 2019 08 16.

Article in English | MEDLINE | ID: mdl-31420051

ABSTRACT

PURPOSE: Breast cancer is a common disease with a relatively good prognosis. Therefore, understanding the spectrum of diseases and mortality among breast cancer patients is important, though currently incomplete. We systematically examined the incidence and mortality of all diseases following a breast cancer diagnosis, as well as the sequential association of disease occurrences (trajectories). METHODS: In this national cohort study, 57,501 breast cancer patients (2001-2011) were compared to 564,703 matched women from the general Swedish population and followed until 2012. The matching criteria included year of birth, county of residence, and socioeconomic status. Based on information from the Swedish Patient and Cause of Death Registries, hazard ratios (HR) were estimated for disease incidence and mortality. Conditional logistic regression models were used to identify disease trajectories among breast cancer patients. RESULTS: Among 225 diseases, 45 had HRs > 1.5 and p < 0.0002 when comparing breast cancer patients with the general population. Diseases with highest HRs included lymphedema, radiodermatitis, and neutropenia, which are side effects of surgery, radiotherapy, and chemotherapy. Other than breast cancer, the only significantly increased cause of death was other solid cancers (HR = 1.16, 95% CI = 1.08-1.24). Two main groups of disease trajectories were identified, which suggest menopausal disorders as indicators for other solid cancers, and both neutropenia and dorsalgia as diseases and symptoms preceding death due to breast cancer. CONCLUSIONS: While an increased incidence of other diseases was found among breast cancer patients, increased mortality was only due to other solid cancers. Preventing death due to breast cancer should be a priority to prolong life in breast cancer patients, but closer surveillance of other solid cancers is also needed.

Subject(s)

Breast Neoplasms/epidemiology , Breast Neoplasms/mortality , Adult , Age of Onset , Aged , Aged, 80 and over , Breast Neoplasms/diagnosis , Female , Humans , Incidence , Middle Aged , Mortality , Odds Ratio , Population Surveillance , Proportional Hazards Models , Registries , Socioeconomic Factors , Sweden/epidemiology

12.

On the relationship between the heritability and the attributable fraction.

Dahlqwist, Elisabeth; Magnusson, Patrik K E; Pawitan, Yudi; Sjölander, Arvid.

Hum Genet ; 138(4): 425-435, 2019 Apr.

Article in English | MEDLINE | ID: mdl-30941497

ABSTRACT

Heritability is the most commonly used measure of genetic contribution to disease outcomes. Being the fraction of the variance of latent trait liability attributable to genetic factors, heritability of binary traits is a difficult technical concept that is sometimes misinterpreted as the more-easily understandable concept of attributable fraction. In this paper we use the liability threshold model to describe the analytical relationship between heritability and attributable fraction. Towards this end, we consider a hypothetical intervention that is aimed to reduce the genetic risk of the disease for a specified target group of the population. We show how the relation between the heritability and the attributable fraction depends on the disease prevalence, the intervention effect and the size of the target group. We use two real examples to illustrate the practical implications of our theoretical results.

Subject(s)

Genetic Predisposition to Disease/epidemiology , Models, Genetic , Models, Statistical , Multifactorial Inheritance , Quantitative Trait, Heritable , Causality , Disease/etiology , Disease/genetics , Humans , Phenotype , Population Density , Prevalence , Risk Factors , Sample Size
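The liability threshold model mentioned in record 12 above can be made concrete with a short sketch. It assumes the standard formulation in which liability is standard normal and disease occurs when liability exceeds a threshold, so the threshold follows from the prevalence. The second function is a deliberately crude, hypothetical intervention that lowers the mean liability of the whole population; it is not the relationship derived in the paper, which also involves the genetic component and the size of the target group.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def liability_threshold(prevalence):
    """Threshold t such that P(liability > t) equals the prevalence."""
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    return STANDARD_NORMAL.inv_cdf(1.0 - prevalence)


def prevalence_after_shift(prevalence, shift):
    """Prevalence if mean liability dropped by `shift` standard deviations.

    Hypothetical whole-population intervention, for illustration only.
    """
    t = liability_threshold(prevalence)
    return 1.0 - STANDARD_NORMAL.cdf(t + shift)


t = liability_threshold(0.01)
print(round(t, 3))
print(prevalence_after_shift(0.01, 0.5))
```

Because the normal tail is far from linear, the same shift in liability removes very different fractions of cases at different prevalences, which hints at why a variance-based measure such as heritability does not translate directly into a fraction of cases.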

13.

Isoform-level gene expression patterns in single-cell RNA-sequencing data.

Vu, Trung Nghia; Wills, Quin F; Kalari, Krishna R; Niu, Nifang; Wang, Liewei; Pawitan, Yudi; Rantalainen, Mattias.

Bioinformatics ; 34(14): 2392-2400, 2018 07 15.

Article in English | MEDLINE | ID: mdl-29490015

ABSTRACT

Motivation: RNA sequencing of single cells enables characterization of transcriptional heterogeneity in seemingly homogeneous cell populations. Single-cell sequencing has been applied in a wide range of research fields. However, few studies have focused on characterization of isoform-level expression patterns at the single-cell level. In this study, we propose and apply a novel method, ISOform-Patterns (ISOP), based on mixture modeling, to characterize the expression patterns of isoform pairs from the same gene in single-cell isoform-level expression data. Results: We define six principal patterns of isoform expression relationships and describe a method for differential-pattern analysis. We demonstrate ISOP through analysis of single-cell RNA-sequencing data from a breast cancer cell line, with replication in three independent datasets. We assigned the pattern types to each of 16 562 isoform-pairs from 4929 genes. Among those, 26% of the discovered patterns were significant (P<0.05), while the remaining patterns are possibly effects of transcriptional bursting, drop-out and stochastic biological heterogeneity. Furthermore, 32% of genes discovered through differential-pattern analysis were not detected by differential-expression analysis. Finally, the effects of drop-out events and expression levels of isoforms on ISOP's performance were investigated through simulated datasets. To conclude, ISOP provides a novel approach for characterization of isoform-level preference, commitment and heterogeneity in single-cell RNA-sequencing data. Availability and implementation: The ISOP method has been implemented as an R package and is available at https://github.com/nghiavtr/ISOP under a GPL-3 license. Supplementary information: Supplementary data are available at Bioinformatics online.

Subject(s)

Gene Expression Profiling/methods , Gene Expression , RNA Isoforms/genetics , Sequence Analysis, RNA/methods , Software , Breast Neoplasms/genetics , Cell Line, Tumor , Female , Humans

14.

A fast detection of fusion genes from paired-end RNA-seq data.

Vu, Trung Nghia; Deng, Wenjiang; Trac, Quang Thinh; Calza, Stefano; Hwang, Woochang; Pawitan, Yudi.

BMC Genomics ; 19(1): 786, 2018 Nov 01.

Article in English | MEDLINE | ID: mdl-30382840

ABSTRACT

BACKGROUND: Fusion genes are known to be drivers of many common cancers, so they are potential markers for diagnosis, prognosis or therapy response. The advent of paired-end RNA sequencing enhances our ability to discover fusion genes. While there are available methods, routine analyses of large numbers of samples are still limited due to high computational demands. RESULTS: We develop FuSeq, a fast and accurate method to discover fusion genes based on quasi-mapping to quickly map the reads, extract initial candidates from split reads and fusion equivalence classes of mapped reads, and finally apply multiple filters and statistical tests to get the final candidates. We apply FuSeq to four validated datasets: breast cancer, melanoma and glioma datasets, and one spike-in dataset. The results reveal high sensitivity and specificity in all datasets, and compare well against other methods such as FusionMap, TRUP, TopHat-Fusion, SOAPfuse and JAFFA. In terms of computational time, FuSeq is two-fold faster than FusionMap and orders of magnitude faster than the other methods. CONCLUSIONS: With this advantage of lower computational demands, FuSeq makes it practical to investigate fusion genes in large numbers of samples. FuSeq is implemented in C++ and R, and available at https://github.com/nghiavtr/FuSeq for non-commercial uses.

Subject(s)

Gene Fusion , RNA/genetics , Sequence Analysis, RNA , Algorithms , Cell Line, Tumor , Computational Biology/methods , Databases, Nucleic Acid , Gene Expression Profiling , High-Throughput Nucleotide Sequencing , Humans , Neoplasms/genetics , Oncogene Proteins, Fusion/genetics , Reproducibility of Results , Sequence Analysis, RNA/methods

15.

Likelihood-based inference for bounds of causal parameters.

Lee, Woojoo; Sjölander, Arvid; Larsson, Anton; Pawitan, Yudi.

Stat Med ; 37(30): 4695-4706, 2018 12 30.

Article in English | MEDLINE | ID: mdl-30155912

ABSTRACT

It is a common causal inference problem that, even with theoretically infinite samples, we might be able to only provide bounds for the parameters of interest. This problem occurs naturally, for example, in estimating causal interaction between two risk factors and in estimating the average causal effect using the instrumental variable or Mendelian randomization method. Current procedures include linear programming to get the estimated bounds, plus bootstrapping to get confidence intervals. We describe a likelihood-based procedure that automatically yields the interval estimate from the flat likelihood region and show some theory that allows us to construct confidence intervals from this non-regular likelihood. Finally, we illustrate the procedure with examples from the estimation of causal interaction between two risk factors and the treatment effect under partial compliance.

Subject(s)

Causality , Likelihood Functions , Confidence Intervals , Data Interpretation, Statistical , Humans , Linear Models , Logistic Models , Models, Statistical , Patient Compliance/statistics & numerical data , Randomized Controlled Trials as Topic/methods , Risk Factors , Treatment Outcome

16.

A Critical Look at Entropy-Based Gene-Gene Interaction Measures.

Lee, Woojoo; Sjölander, Arvid; Pawitan, Yudi.

Genet Epidemiol ; 40(5): 416-24, 2016 07.

Article in English | MEDLINE | ID: mdl-27229752

ABSTRACT

Several entropy-based measures for detecting gene-gene interaction have been proposed recently. It has been argued that the entropy-based measures are preferred because entropy can better capture the nonlinear relationships between genotypes and traits, so they can be useful to detect gene-gene interactions for complex diseases. These suggested measures look reasonable at an intuitive level, but so far there has been no detailed characterization of the interactions captured by them. Here we study analytically the properties of some entropy-based measures for detecting gene-gene interactions in detail. The relationship between interactions captured by the entropy-based measures and those of logistic regression models is clarified. In general we find that the entropy-based measures can suffer from a lack of specificity in terms of target parameters, i.e., they can detect uninteresting signals as interactions. Numerical studies are carried out to confirm theoretical findings.

Subject(s)

Models, Genetic , Entropy , Genome-Wide Association Study , Genotype , Humans , Logistic Models , Phenotype

17.

A clinical model for identifying the short-term risk of breast cancer.

Eriksson, Mikael; Czene, Kamila; Pawitan, Yudi; Leifland, Karin; Darabi, Hatef; Hall, Per.

Breast Cancer Res ; 19(1): 29, 2017 03 14.

Article in English | MEDLINE | ID: mdl-28288659

ABSTRACT

BACKGROUND: Most mammography screening programs are not individualized. To efficiently screen for breast cancer, the individual risk of the disease should be determined. We describe a model that could be used at most mammography screening units without adding substantial cost. METHODS: The study was based on the Karma cohort, which included 70,877 participants. Mammograms were collected up to 3 years following the baseline mammogram. A prediction protocol was developed using mammographic density, computer-aided detection of microcalcifications and masses, use of hormone replacement therapy (HRT), family history of breast cancer, menopausal status, age, and body mass index. Relative risks were calculated using conditional logistic regression. Absolute risks were calculated using the iCARE protocol. RESULTS: Comparing women at highest and lowest mammographic density yielded a fivefold higher risk of breast cancer for women at highest density. When adding microcalcifications and masses to the model, high-risk women had a nearly ninefold higher risk of breast cancer than those at lowest risk. In the full model, taking HRT use, family history of breast cancer, and menopausal status into consideration, the AUC reached 0.71. CONCLUSIONS: Measures of mammographic features and information on HRT use, family history of breast cancer, and menopausal status enabled early identification of women within the mammography screening program at such a high risk of breast cancer that additional examinations are warranted. In contrast, women at low risk could probably be screened less intensively.

Subject(s)

Breast Neoplasms/epidemiology; Models, Theoretical; Adult; Aged; Area Under Curve; Breast Density; Breast Neoplasms/diagnosis; Breast Neoplasms/prevention & control; Case-Control Studies; Computer Simulation; Early Detection of Cancer; Female; Hormone Replacement Therapy/adverse effects; Humans; Mammography; Mass Screening; Middle Aged; Risk; Risk Factors; Sweden/epidemiology

18.

Gene-based meta-analysis of genome-wide association studies implicates new loci involved in obesity.

Hägg, Sara; Ganna, Andrea; Van Der Laan, Sander W; Esko, Tonu; Pers, Tune H; Locke, Adam E; Berndt, Sonja I; Justice, Anne E; Kahali, Bratati; Siemelink, Marten A; Pasterkamp, Gerard; Strachan, David P; Speliotes, Elizabeth K; North, Kari E; Loos, Ruth J F; Hirschhorn, Joel N; Pawitan, Yudi; Ingelsson, Erik.

Hum Mol Genet ; 24(23): 6849-60, 2015 Dec 01.

Article in English | MEDLINE | ID: mdl-26376864

ABSTRACT

To date, genome-wide association studies (GWASs) have identified >100 loci with single variants associated with body mass index (BMI). This approach may miss loci with high allelic heterogeneity; therefore, the aim of the present study was to use gene-based meta-analysis to identify regions with high allelic heterogeneity to discover additional obesity susceptibility loci. We included GWAS data from 123 865 individuals of European descent from 46 cohorts in Stage 1 and Metabochip data from additional 103 046 individuals from 43 cohorts in Stage 2, all within the Genetic Investigation of ANthropometric Traits (GIANT) consortium. Each cohort was tested for association between ~2.4 million (Stage 1) or ~200 000 (Stage 2) imputed or genotyped single variants and BMI, and summary statistics were subsequently meta-analyzed in 17 941 genes. We used the 'VErsatile Gene-based Association Study' (VEGAS) approach to assign variants to genes and to calculate gene-based P-values based on simulations. The VEGAS method was applied to each cohort separately before a gene-based meta-analysis was performed. In Stage 1, two known (FTO and TMEM18) and six novel (PEX2, MTFR2, SSFA2, IARS2, CEP295 and TXNDC12) loci were associated with BMI (P < 2.8 × 10^-6 for 17 941 gene tests). We confirmed all loci, and six of them were gene-wide significant in Stage 2 alone. We provide biological support for the loci by pathway, expression and methylation analyses. Our results indicate that gene-based meta-analysis of GWAS provides a useful strategy to find loci of interest that were not identified in standard single-marker analyses due to high allelic heterogeneity.
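
The gene-wide significance threshold quoted in the abstract above is numerically consistent with a Bonferroni correction over the 17 941 gene tests. The abstract does not name the correction or the nominal level, so the 0.05 level and the helper function `bonferroni_threshold` below are assumptions made only for illustration. A minimal sketch of the arithmetic:

```python
# Sketch: per-test threshold under a Bonferroni-style correction.
# ALPHA = 0.05 is an assumption; the abstract reports only the
# threshold (2.8 x 10^-6) and the number of gene tests (17 941).
ALPHA = 0.05
N_GENE_TESTS = 17_941  # number of genes tested, taken from the abstract


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Return the per-test significance threshold alpha / n_tests."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


threshold = bonferroni_threshold(ALPHA, N_GENE_TESTS)
print(f"{threshold:.1e}")  # prints 2.8e-06
```

Under these assumptions the result rounds to the reported value, which suggests, but does not confirm, that the authors applied this correction.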

Subject(s)

Body Mass Index; Genetic Loci; Genetic Predisposition to Disease; Obesity/genetics; Female; Genome-Wide Association Study; Humans; Polymorphism, Single Nucleotide; White People/genetics

19.

The ABC model of prostate cancer: A conceptual framework for the design and interpretation of prognostic studies.

Pettersson, Andreas; Gerke, Travis; Fall, Katja; Pawitan, Yudi; Holmberg, Lars; Giovannucci, Edward L; Kantoff, Philip W; Adami, Hans-Olov; Rider, Jennifer R; Mucci, Lorelei A.

Cancer ; 123(9): 1490-1496, 2017 05 01.

Article in English | MEDLINE | ID: mdl-28152172

ABSTRACT

There has been limited success in identifying prognostic biomarkers in prostate cancer. A partial explanation may be that insufficient emphasis has been put on clearly defining what type of marker or patient category a biomarker study aims to identify and how different cohort characteristics affect the ability to identify such a marker. In this article, the authors put forth the ABC model of prostate cancer, which defines 3 groups of patients with localized disease that an investigator may seek to identify: patients who, within a given time frame, will not develop metastases even if untreated (category A), will not develop metastases because of radical treatment (category B), or will develop metastases despite radical treatment (category C). The authors demonstrate that follow-up time and prostate-specific antigen screening intensity influence the prevalence of patients in categories A, B, and C in a study cohort, and that prognostic markers must be tested in both treated and untreated cohorts to accurately distinguish the 3 groups. The authors suggest that more emphasis should be put on considering these factors when planning, conducting, and interpreting the results from prostate cancer biomarker studies, and propose the ABC model as a framework to aid in that process. Cancer 2017;123:1490-1496. © 2017 American Cancer Society.

Subject(s)

Prostatectomy; Prostatic Neoplasms/therapy; Watchful Waiting; Biomarkers/metabolism; Disease-Free Survival; Humans; Male; Neoplasm Metastasis; Prognosis; Prostatic Neoplasms/metabolism

20.

Doubly robust methods for handling confounding by cluster.

Zetterqvist, Johan; Vansteelandt, Stijn; Pawitan, Yudi; Sjölander, Arvid.

Biostatistics ; 17(2): 264-76, 2016 Apr.

Article in English | MEDLINE | ID: mdl-26508769

ABSTRACT

In clustered designs such as family studies, the exposure-outcome association is usually confounded by both cluster-constant and cluster-varying confounders. The influence of cluster-constant confounders can be eliminated by studying the exposure-outcome association within (conditional on) clusters, but additional regression modeling is usually required to control for observed cluster-varying confounders. A problem is that the working regression model may be misspecified, in which case the estimated within-cluster association may be biased. To reduce sensitivity to model misspecification we propose to augment the standard working model for the outcome with an auxiliary working model for the exposure. We derive a doubly robust conditional generalized estimating equation (DRCGEE) estimator for the within-cluster association. This estimator combines the two models in such a way that it is consistent if either model is correct, not necessarily both. Thus, the DRCGEE estimator gives the researcher two chances instead of only one to make valid inference on the within-cluster association. We have implemented the estimator in an R package and we use it to examine the association between smoking during pregnancy and cognitive abilities in offspring, in a sample of siblings.

Subject(s)

Data Interpretation, Statistical; Models, Statistical; Observational Studies as Topic; Female; Humans; Pregnancy; Prenatal Exposure Delayed Effects/epidemiology; Smoking/adverse effects

