1.
BMC Bioinformatics ; 24(1): 256, 2023 Jun 17.
Article in English | MEDLINE | ID: mdl-37330471

ABSTRACT

BACKGROUND: Modeling of single cell RNA-sequencing (scRNA-seq) data remains challenging due to a high percentage of zeros and data heterogeneity, so improved modeling has strong potential to benefit many downstream data analyses. The existing zero-inflated or over-dispersed models are based on aggregations at either the gene or the cell level. However, they typically lose accuracy due to a too crude aggregation at those two levels. RESULTS: We avoid the crude approximations entailed by such aggregation through proposing an independent Poisson distribution (IPD) particularly at each individual entry in the scRNA-seq data matrix. This approach naturally and intuitively models the large number of zeros as matrix entries with a very small Poisson parameter. The critical challenge of cell clustering is approached via a novel data representation as Departures from a simple homogeneous IPD (DIPD) to capture the per-gene-per-cell intrinsic heterogeneity generated by cell clusters. Our experiments using real data and crafted experiments show that using DIPD as a data representation for scRNA-seq data can uncover novel cell subtypes that are missed or can only be found by careful parameter tuning using conventional methods. CONCLUSIONS: This new method has multiple advantages, including (1) no need for prior feature selection or manual optimization of hyperparameters; (2) flexibility to combine with and improve upon other methods, such as Seurat. Another novel contribution is the use of crafted experiments as part of the validation of our newly developed DIPD-based clustering pipeline. This new clustering pipeline is implemented in the R (CRAN) package scpoisson.


Subjects
RNA , Single-Cell Analysis , Sequence Analysis, RNA/methods , Poisson Distribution , Single-Cell Analysis/methods , Cluster Analysis , RNA/genetics , Gene Expression Profiling/methods
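
The abstract's remark that zeros arise naturally from entries with a very small Poisson parameter can be illustrated with a minimal sketch. The rank-one mean (row total times column total over grand total) and the square-root residual used below are assumptions chosen for illustration only; they are not taken from the paper or from the scpoisson package, and the function names are hypothetical.

```python
import math

def poisson_zero_prob(lam):
    """P(X = 0) for X ~ Poisson(lam); close to 1 when lam is small."""
    return math.exp(-lam)

def homogeneous_means(counts):
    """Illustrative homogeneous model (an assumption, not the paper's
    definition): lambda_ij = row_total_i * col_total_j / grand_total."""
    row = [sum(r) for r in counts]
    col = [sum(c) for c in zip(*counts)]
    total = sum(row)
    return [[ri * cj / total for cj in col] for ri in row]

def departures(counts):
    """Hypothetical departure measure: sqrt(observed) - sqrt(expected)."""
    means = homogeneous_means(counts)
    return [[math.sqrt(x) - math.sqrt(m) for x, m in zip(xr, mr)]
            for xr, mr in zip(counts, means)]

# Toy genes-by-cells count matrix (made up for this sketch).
counts = [[0, 0, 5], [0, 1, 4], [6, 0, 0]]
dep = departures(counts)
```

Entries whose departures are far from zero are the ones a homogeneous model fails to explain, which is the kind of signal a clustering step could then work on.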
2.
Osteoarthritis Cartilage ; 27(7): 994-1001, 2019 07.
Article in English | MEDLINE | ID: mdl-31002938

ABSTRACT

OBJECTIVE: Knee osteoarthritis (KOA) is a heterogeneous condition representing a variety of potentially distinct phenotypes. The purpose of this study was to apply innovative machine learning approaches to KOA phenotyping in order to define progression phenotypes that are potentially more responsive to interventions. DESIGN: We used publicly available data from the Foundation for the National Institutes of Health (FNIH) osteoarthritis (OA) Biomarkers Consortium, where radiographic (medial joint space narrowing of ≥0.7 mm), and pain progression (increase of ≥9 Western Ontario and McMaster Universities Osteoarthritis Index [WOMAC] points) were defined at 48 months, as four mutually exclusive outcome groups (none, both, pain only, radiographic only), along with an extensive set of covariates. We applied distance weighted discrimination (DWD), direction-projection-permutation (DiProPerm) testing, and clustering methods to focus on the contrast (z-scores) between those progressing by both criteria ("progressors") and those progressing by neither ("non-progressors"). RESULTS: Using all observations (597 individuals, 59% women, mean age 62 years and BMI 31 kg/m2) and all 73 baseline variables available in the dataset, there was a clear separation among progressors and non-progressors (z = 10.1). Higher z-scores were seen for the magnetic resonance imaging (MRI)-based variables than for demographic/clinical variables or biochemical markers. Baseline variables with the greatest contribution to non-progression at 48 months included WOMAC pain, lateral meniscal extrusion, and serum N-terminal pro-peptide of collagen IIA (PIIANP), while those contributing to progression included bone marrow lesions, osteophytes, medial meniscal extrusion, and urine C-terminal crosslinked telopeptide type II collagen (CTX-II). 
CONCLUSIONS: Using methods that provide a way to assess numerous variables of different types and scalings simultaneously in relation to an outcome of interest enabled a data-driven approach that identified key variables associated with a progression phenotype.


Subjects
Biological Variation, Population/genetics , Cartilage, Articular/pathology , Machine Learning , Osteoarthritis, Knee/genetics , Osteoarthritis, Knee/pathology , Aged , Biomarkers/blood , Cartilage, Articular/diagnostic imaging , Cartilage, Articular/physiopathology , Collagen Type II/blood , Congresses as Topic , Databases, Factual , Disease Progression , Female , Humans , Male , Menisci, Tibial/pathology , Middle Aged , National Institutes of Health (U.S.) , Osteoarthritis, Knee/diagnostic imaging , Pain Measurement , Severity of Illness Index , United States
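
The direction-projection-permutation idea named in the abstract can be sketched in a few lines: pick a direction separating the two groups, project the data onto it, compute a two-sample statistic, and compare it with the same statistic under permuted labels. This sketch swaps in the plain mean-difference direction for DWD (which needs a cone-programming solver), uses synthetic data, and all function names are hypothetical; it is not the study's implementation.

```python
import random
import statistics

def mean_vec(rows):
    """Column-wise mean of a list of equal-length rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def projected_gap(x, labels):
    """Project onto the unit mean-difference direction (a stand-in for DWD)
    and return the difference between the projected class means."""
    a = [r for r, lab in zip(x, labels) if lab == 1]
    b = [r for r, lab in zip(x, labels) if lab == 0]
    diff = [p - q for p, q in zip(mean_vec(a), mean_vec(b))]
    norm = sum(d * d for d in diff) ** 0.5
    if norm == 0.0:
        return 0.0
    direction = [d / norm for d in diff]
    proj = [sum(v * w for v, w in zip(r, direction)) for r in x]
    pa = [s for s, lab in zip(proj, labels) if lab == 1]
    pb = [s for s, lab in zip(proj, labels) if lab == 0]
    return statistics.mean(pa) - statistics.mean(pb)

def diproperm_z(x, labels, n_perm=200, seed=0):
    """z-score of the observed statistic against a label-permutation null.
    The direction is re-estimated for every permutation."""
    rng = random.Random(seed)
    observed = projected_gap(x, labels)
    shuffled = list(labels)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null.append(projected_gap(x, shuffled))
    return (observed - statistics.mean(null)) / statistics.stdev(null)

# Synthetic data (not from the study): 20 rows shifted by 2 in each of 5 variables.
rng = random.Random(1)
x = [[rng.gauss(2.0 * g, 1.0) for _ in range(5)] for g in (1, 0) for _ in range(20)]
labels = [1] * 20 + [0] * 20
z = diproperm_z(x, labels)
```

Re-estimating the direction inside each permutation matters: it keeps the null distribution honest about how much separation the direction-finding step can manufacture from noise alone.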
3.
Biometrics ; 74(2): 439-447, 2018 06.
Article in English | MEDLINE | ID: mdl-28853138

ABSTRACT

Genotype eigenvectors are widely used as covariates for control of spurious stratification in genetic association. Significance testing for the accompanying eigenvalues has typically been based on a standard Tracy-Widom limiting distribution for the largest eigenvalue, derived under white-noise assumptions. It is known that even modest local correlation among markers inflates the largest eigenvalues, even in the absence of true stratification. In addition, a few sample eigenvalues may be extreme, creating further complications in accurate testing. We explore several methods to identify appropriate null eigenvalue thresholds, while remaining sensitive to eigenvalues corresponding to population stratification. We introduce a novel block permutation approach, designed to produce an appropriate null eigenvalue distribution by eliminating long-range genomic correlation while preserving local correlation. We also propose a fast approach based on eigenvalue distribution modeling, using a simple fit criterion and the general Marcenko-Pastur equation under a simple discrete eigenvalue model. Block permutation and the model-based approach work well for pure simulations and for data resampled from the 1000 Genomes project. In contrast, we find that the standard approach of computing an "effective" number of markers does not perform well. The performance of the methods is also demonstrated for a motivating example from the International Cystic Fibrosis Consortium.


Subjects
Genetic Association Studies/methods , Models, Statistical , Computer Simulation , Cystic Fibrosis/genetics , Data Interpretation, Statistical , Genomics/methods , Genotype , Humans , Models, Genetic
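
For orientation, the white-noise baseline that the abstract argues is too optimistic can be written down directly. The sketch below computes the edges of the standard Marchenko-Pastur bulk for a sample covariance of independent, equal-variance markers and flags eigenvalues above the upper edge. It is only the simplest special case, not the general equation or the discrete eigenvalue model used in the paper, and the function names are hypothetical.

```python
import math

def mp_bulk_edges(n_samples, n_markers, sigma2=1.0):
    """Support edges of the white-noise Marchenko-Pastur law for the
    eigenvalues of X^T X / n_samples, with gamma = n_markers / n_samples."""
    gamma = n_markers / n_samples
    lower = sigma2 * (1 - math.sqrt(gamma)) ** 2
    upper = sigma2 * (1 + math.sqrt(gamma)) ** 2
    return lower, upper

def above_bulk(eigenvalues, n_samples, n_markers, sigma2=1.0):
    """Eigenvalues exceeding the white-noise upper edge."""
    _, upper = mp_bulk_edges(n_samples, n_markers, sigma2)
    return [ev for ev in eigenvalues if ev > upper]

lower, upper = mp_bulk_edges(n_samples=100, n_markers=25)
flagged = above_bulk([0.5, 2.0, 3.1], n_samples=100, n_markers=25)
```

With locally correlated markers the true null bulk extends beyond this upper edge, which is why a threshold of this kind would flag eigenvalues that do not reflect stratification.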
4.
Neuroimage ; 152: 38-49, 2017 05 15.
Article in English | MEDLINE | ID: mdl-28246033

ABSTRACT

A major goal in neuroscience is to understand the neural pathways underlying human behavior. We introduce the recently developed Joint and Individual Variation Explained (JIVE) method to the neuroscience community to simultaneously analyze imaging and behavioral data from the Human Connectome Project. Motivated by recent computational and theoretical improvements in the JIVE approach, we simultaneously explore the joint and individual variation between and within imaging and behavioral data. In particular, we demonstrate that JIVE is an effective and efficient approach for integrating task fMRI and behavioral variables using three examples: one example where task variation is strong, one where task variation is weak and a reference case where the behavior is not directly related to the image. These examples are provided to visualize the different levels of signal found in the joint variation including working memory regions in the image data and accuracy and response time from the in-task behavioral variables. Joint analysis provides insights not available from conventional single block decomposition methods such as Singular Value Decomposition. Additionally, the joint variation estimated by JIVE appears to more clearly identify the working memory regions than Partial Least Squares (PLS), while Canonical Correlation Analysis (CCA) gives grossly overfit results. The individual variation in JIVE captures the behavior unrelated signals such as a background activation that is spatially homogeneous and activation in the default mode network. The information revealed by this individual variation is not examined in traditional methods such as CCA and PLS. We suggest that JIVE can be used as an alternative to PLS and CCA to improve estimation of the signal common to two or more datasets and reveal novel insights into the signal unique to each dataset.


Subjects
Brain/anatomy & histology , Brain/physiology , Connectome/methods , Adult , Humans , Image Processing, Computer-Assisted , Magnetic Resonance Imaging , Signal Processing, Computer-Assisted , Software , Young Adult
5.
Osteoarthritis Cartilage ; 24(4): 640-6, 2016 Apr.
Article in English | MEDLINE | ID: mdl-26620089

ABSTRACT

INTRODUCTION: Hip shape is a risk factor for the development of hip osteoarthritis (OA), and current methods to assess hip shape from radiographs are limited; therefore this study explored current and novel methods to assess hip shape. METHODS: Data from a prior case-control study nested in the Johnston County OA Project were used, including 382 hips (from 342 individuals). Hips were classified by radiographic hip OA (RHOA) status as RHOA cases (baseline Kellgren Lawrence grade [KLG] 0 or 1, follow-up [mean 6 years] KLG ≥ 2) or controls (KLG = 0 or 1 at both baseline and follow-up). Proximal femur shape was assessed using a 60-point model as previously described. The current analysis explored commonly used principal component analysis (PCA), as well as novel statistical methodologies suited to high dimension low sample size settings (Distance Weighted Discrimination [DWD] and Direction-Projection-Permutation [DiProPerm] hypothesis testing) to assess differences between cases and controls. RESULTS: Using these novel methodologies, we were able to better characterize morphologic differences by sex and race. In particular, the proximal femurs of African American women demonstrated significantly different shapes between cases and controls, implying an important role for sex and race in the development of RHOA. Notably, discrimination was improved with the use of DWD and DiProPerm compared to PCA. CONCLUSIONS: DWD with DiProPerm significance testing provides improved discrimination of variation in hip morphology between groups, and enables subgroup analyses even under small sample sizes.


Subjects
Black or African American/statistics & numerical data , Hip Joint/pathology , Osteoarthritis, Hip/ethnology , Osteoarthritis, Hip/pathology , Aged , Case-Control Studies , Data Interpretation, Statistical , Female , Femur/diagnostic imaging , Femur/pathology , Hip Joint/diagnostic imaging , Humans , Male , Middle Aged , North Carolina/epidemiology , Osteoarthritis, Hip/diagnostic imaging , Principal Component Analysis , Radiographic Image Interpretation, Computer-Assisted/methods , Radiography/methods , Risk Factors , Sex Factors
6.
Nucleic Acids Res ; 42(14): e113, 2014 Aug.
Article in English | MEDLINE | ID: mdl-25030904

ABSTRACT

High-throughput sequencing technologies, including RNA-seq, have made it possible to move beyond gene expression analysis to study transcriptional events including alternative splicing and gene fusions. Furthermore, recent studies in cancer have suggested the importance of identifying transcriptionally altered loci as biomarkers for improved prognosis and therapy. While many statistical methods have been proposed for identifying novel transcriptional events with RNA-seq, nearly all rely on contrasting known classes of samples, such as tumor and normal. Few tools exist for the unsupervised discovery of such events without class labels. In this paper, we present SigFuge for identifying genomic loci exhibiting differential transcription patterns across many RNA-seq samples. SigFuge combines clustering with hypothesis testing to identify genes exhibiting alternative splicing, or differences in isoform expression. We apply SigFuge to RNA-seq cohorts of 177 lung and 279 head and neck squamous cell carcinoma samples from the Cancer Genome Atlas, and identify several cases of differential isoform usage including CDKN2A, a tumor suppressor gene known to be inactivated in a majority of lung squamous cell tumors. By not restricting attention to known sample stratifications, SigFuge offers a novel approach to unsupervised screening of genetic loci across RNA-seq cohorts. SigFuge is available as an R package through Bioconductor.


Subjects
Gene Expression Profiling/methods , High-Throughput Nucleotide Sequencing/methods , Neoplasms/genetics , RNA Isoforms/metabolism , Sequence Analysis, RNA/methods , Software , Alternative Splicing , Carcinoma, Squamous Cell/genetics , Carrier Proteins/genetics , Cluster Analysis , Exons , Genes, p16 , Genetic Loci , Head and Neck Neoplasms/genetics , Intracellular Signaling Peptides and Proteins , Kallikreins/genetics , Lung Neoplasms/genetics , Nuclear Proteins , Squamous Cell Carcinoma of Head and Neck
7.
Stat Sin ; 26(4): 1747-1770, 2016 Oct.
Article in English | MEDLINE | ID: mdl-28018116

ABSTRACT

The aim of this paper is to establish several deep theoretical properties of principal component analysis for multiple-component spike covariance models. Our new results reveal an asymptotic conical structure in critical sample eigendirections under the spike models with distinguishable (or indistinguishable) eigenvalues, when the sample size and/or the number of variables (or dimension) tend to infinity. The consistency of the sample eigenvectors relative to their population counterparts is determined by the ratio between the dimension and the product of the sample size with the spike size. When this ratio converges to a nonzero constant, the sample eigenvector converges to a cone, with a certain angle to its corresponding population eigenvector. In the High Dimension, Low Sample Size case, the angle between the sample eigenvector and its population counterpart converges to a limiting distribution. Several generalizations of the multi-spike covariance models are also explored, and additional theoretical results are presented.
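
The dependence of eigenvector consistency on the ratio between the dimension and the product of sample size and spike size can be explored with a small simulation. The sketch below is an illustration under simple assumptions (a single spike along the first coordinate, unit-variance Gaussian noise, known zero mean, power iteration for the leading sample eigenvector); it does not reproduce the paper's results, and the function name is hypothetical.

```python
import math
import random

def top_eigvec_angle(d, n, spike, seed=0, iters=200):
    """Angle in degrees between the first coordinate axis (the population
    spike direction) and the leading eigenvector of the sample covariance."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        row = [rng.gauss(0.0, 1.0) for _ in range(d)]          # unit noise
        row[0] += math.sqrt(spike) * rng.gauss(0.0, 1.0)       # spike signal
        data.append(row)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    for _ in range(iters):
        # One power-iteration step on X^T X: v <- X^T (X v), then normalize.
        scores = [sum(a * b for a, b in zip(row, v)) for row in data]
        v = [sum(s * row[j] for s, row in zip(scores, data)) for j in range(d)]
        norm = math.sqrt(sum(c * c for c in v))
        v = [c / norm for c in v]
    return math.degrees(math.acos(min(1.0, abs(v[0]))))

# Same dimension and sample size, very different spike sizes.
strong = top_eigvec_angle(d=40, n=20, spike=400.0)
weak = top_eigvec_angle(d=40, n=20, spike=0.0)
```

A large spike makes the ratio small and pulls the sample eigenvector toward its population counterpart, while with no spike the leading sample direction is essentially arbitrary.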

8.
Nucleic Acids Res ; 41(19): e178, 2013 Oct.
Article in English | MEDLINE | ID: mdl-23935067

ABSTRACT

Identifying variants using high-throughput sequencing data is currently a challenge because true biological variants can be indistinguishable from technical artifacts. One source of technical artifact results from incorrectly aligning experimentally observed sequences to their true genomic origin ('mismapping') and inferring differences in mismapped sequences to be true variants. We developed BlackOPs, an open-source tool that simulates experimental RNA-seq and DNA whole exome sequences derived from the reference genome, aligns these sequences by custom parameters, detects variants and outputs a blacklist of positions and alleles caused by mismapping. Blacklists contain thousands of artifact variants that are indistinguishable from true variants and, for a given sample, are expected to be almost completely false positives. We show that these blacklist positions are specific to the alignment algorithm and read length used, and BlackOPs allows users to generate a blacklist specific to their experimental setup. We queried the dbSNP and COSMIC variant databases and found numerous variants indistinguishable from mapping errors. We demonstrate how filtering against blacklist positions reduces the number of potential false variants using an RNA-seq glioblastoma cell line data set. In summary, accounting for mapping-caused variants tuned to experimental setups reduces false positives and, therefore, improves genome characterization by high-throughput sequencing.


Subjects
Genetic Variation , High-Throughput Nucleotide Sequencing/methods , Sequence Alignment/methods , Software , Artifacts , Cell Line, Tumor , Chromosome Mapping , Databases, Nucleic Acid , Exome , Humans , Polymorphism, Single Nucleotide , Sequence Analysis, DNA/methods , Sequence Analysis, RNA/methods
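
The final filtering step described in the abstract, removing called variants that match a blacklist of positions and alleles, amounts to a set lookup. The sketch below is a generic illustration with made-up records; the `Variant` type and `filter_blacklisted` function are hypothetical and do not reflect BlackOPs's actual file formats or interface.

```python
from typing import NamedTuple

class Variant(NamedTuple):
    chrom: str
    pos: int
    alt: str

def filter_blacklisted(calls, blacklist):
    """Drop calls whose (chromosome, position, alternate allele) appears in
    the blacklist; call order is preserved."""
    blocked = set(blacklist)
    return [v for v in calls if v not in blocked]

# Made-up example records.
blacklist = [Variant("chr1", 1000, "T"), Variant("chr2", 500, "G")]
calls = [
    Variant("chr1", 1000, "T"),   # matches blacklist: removed
    Variant("chr1", 1000, "C"),   # same position, different allele: kept
    Variant("chr3", 42, "A"),     # not in blacklist: kept
]
kept = filter_blacklisted(calls, blacklist)
```

Matching on the allele as well as the position follows the abstract's description of a blacklist of "positions and alleles"; because the abstract notes that blacklists are specific to aligner and read length, a list built for one setup should not be reused for another.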
9.
Osteoarthritis Cartilage ; 22(10): 1657-67, 2014 Oct.
Article in English | MEDLINE | ID: mdl-25278075

ABSTRACT

OBJECTIVE: To assess 3D morphological variations and local and systemic biomarker profiles in subjects with a diagnosis of temporomandibular joint osteoarthritis (TMJ OA). DESIGN: Twenty-eight patients with long-term TMJ OA (39.9 ± 16 years), 12 patients at initial diagnosis of OA (47.4 ± 16.1 years), and 12 healthy controls (41.8 ± 12.2 years) were recruited. All patients were female and had cone beam CT scans taken. TMJ arthrocentesis and venipuncture were performed on 12 OA and 12 age-matched healthy controls. Serum and synovial fluid levels of 50 biomarkers of arthritic inflammation were quantified by protein microarrays. Shape Analysis MANCOVA tested statistical correlations between biomarker levels and variations in condylar morphology. RESULTS: Compared with healthy controls, the OA average condyle was significantly smaller in all dimensions except its anterior surface, with areas indicative of bone resorption along the articular surface, particularly in the lateral pole. Synovial fluid levels of ANG, GDF15, TIMP-1, CXCL16, MMP-3 and MMP-7 were significantly correlated with bone apposition of the condylar anterior surface. Serum levels of ENA-78, MMP-3, PAI-1, VE-Cadherin, VEGF, GM-CSF, TGFβ1, IFNγ, TNFα, IL-1α, and IL-6 were significantly correlated with flattening of the lateral pole. Expression levels of ANG were significantly correlated with the articular morphology in healthy controls. CONCLUSIONS: Bone resorption at the articular surface, particularly at the lateral pole was statistically significant at initial diagnosis of TMJ OA. Synovial fluid levels of ANG, GDF15, TIMP-1, CXCL16, MMP-3 and MMP-7 were correlated with bone apposition. Serum levels of ENA-78, MMP-3, PAI-1, VE-Cadherin, VEGF, GM-CSF, TGFβ1, IFNγ, TNFα, IL-1α, and IL-6 were correlated with bone resorption.


Subjects
Inflammation Mediators/metabolism , Osteoarthritis/diagnostic imaging , Synovial Fluid/metabolism , Temporomandibular Joint Disorders/diagnostic imaging , Temporomandibular Joint/diagnostic imaging , Adult , Biomarkers/metabolism , Bone Resorption/diagnostic imaging , Bone Resorption/etiology , Case-Control Studies , Cone-Beam Computed Tomography , Female , Humans , Imaging, Three-Dimensional , Middle Aged , Osteoarthritis/complications , Temporomandibular Joint Disorders/complications , Young Adult
10.
ArXiv ; 2024 May 16.
Article in English | MEDLINE | ID: mdl-38800658

ABSTRACT

Automated region of interest detection in histopathological image analysis is a challenging and important topic with tremendous potential impact on clinical practice. The deep-learning methods used in computational pathology may help us to reduce costs and increase the speed and accuracy of cancer diagnosis. We started with the UNC Melanocytic Tumor Dataset cohort that contains 160 hematoxylin and eosin whole-slide images of primary melanomas (86) and nevi (74). We randomly assigned 80% (134) as a training set and built an in-house deep-learning method to allow for classification, at the slide level, of nevi and melanomas. The proposed method performed well on the other 20% (26) test dataset; the accuracy of the slide classification task was 92.3% and our model also performed well in terms of predicting the region of interest annotated by the pathologists, showing excellent performance of our model on melanocytic skin tumors. Even though we tested the experiments on the skin tumor dataset, our work could also be extended to other medical image detection problems to benefit the clinical evaluation and diagnosis of different tumors.

11.
Cancers (Basel) ; 16(13), 2024 Jun 21.
Article in English | MEDLINE | ID: mdl-39001357

ABSTRACT

High intratumoral heterogeneity is thought to be a poor prognostic indicator. However, the source of heterogeneity may also be important, as genomic heterogeneity is not always reflected in histologic or 'visual' heterogeneity. We aimed to develop a predictor of histologic heterogeneity and evaluate its association with outcomes and molecular heterogeneity. We used VGG16 to train an image classifier to identify unique, patient-specific visual features in 1655 breast tumors (5907 core images) from the Carolina Breast Cancer Study (CBCS). Extracted features for images, as well as the epithelial and stromal image components, were hierarchically clustered, and visual heterogeneity was defined as a greater distance between images from the same patient. We assessed the association between visual heterogeneity, clinical features, and DNA-based molecular heterogeneity using generalized linear models, and we used Cox models to estimate the association between visual heterogeneity and tumor recurrence. Basal-like and ER-negative tumors were more likely to have low visual heterogeneity, as were the tumors from younger and Black women. Less heterogeneous tumors had a higher risk of recurrence (hazard ratio = 1.62, 95% confidence interval = 1.22-2.16), and were more likely to come from patients whose tumors were comprised of only one subclone or had a TP53 mutation. Associations were similar regardless of whether the image was based on stroma, epithelium, or both. Histologic heterogeneity adds complementary information to commonly used molecular indicators, with low heterogeneity predicting worse outcomes. Future work integrating multiple sources of heterogeneity may provide a more comprehensive understanding of tumor progression.

12.
Bioinformatics ; 28(8): 1182-3, 2012 Apr 15.
Article in English | MEDLINE | ID: mdl-22368246

ABSTRACT

R/DWD is an extensible package for classification. It is built based on a recently developed powerful classification method called distance weighted discrimination (DWD). DWD is related to, and has been shown to be superior to, the support vector machine in situations that are fundamental to bioinformatics, such as very high dimensional data. DWD has proven to be very useful for several fundamental bioinformatics tasks, including classification, data visualization and removal of biases, such as batch effects. Earlier DWD implementations, however, relied on Matlab, which is not free and requires a license. The major contribution of the R/DWD package is an implementation that is completely in R and thus can be used without any requirements for licensing or software purchase. In addition, R/DWD also provides efficient solvers for second-order-cone-programming and quadratic programming. AVAILABILITY AND IMPLEMENTATION: The package is freely available from cran.r-project.org.


Subjects
Computational Biology/methods , Oligonucleotide Array Sequence Analysis/methods , Software , Computer Simulation , Support Vector Machine
13.
Commun Biol ; 6(1): 179, 2023 02 16.
Article in English | MEDLINE | ID: mdl-36797360

ABSTRACT

Model systems are an essential resource in cancer research. They simulate effects that we can infer into humans, but come at a risk of inaccurately representing human biology. This inaccuracy can lead to inconclusive experiments or misleading results, urging the need for an improved process for translating model system findings into human-relevant data. We present a process for applying joint dimension reduction (jDR) to horizontally integrate gene expression data across model systems and human tumor cohorts. We then use this approach to combine human TCGA gene expression data with data from human cancer cell lines and mouse model tumors. By identifying the aspects of genomic variation joint-acting across cohorts, we demonstrate how predictive modeling and clinical biomarkers from model systems can be improved.


Subjects
Neoplasms , Transcriptome , Animals , Mice , Humans , Neoplasms/genetics , Neoplasms/pathology , Gene Expression Profiling , Biomarkers
14.
Res Sq ; 2023 Feb 06.
Article in English | MEDLINE | ID: mdl-36798423

ABSTRACT

Background: Modeling of single cell RNA-sequencing (scRNA-seq) data remains challenging due to a high percentage of zeros and data heterogeneity, so improved modeling has strong potential to benefit many downstream data analyses. The existing zero-inflated or over-dispersed models are based on aggregations at either the gene or the cell level. However, they typically lose accuracy due to a too crude aggregation at those two levels. Results: We avoid the crude approximations entailed by such aggregation through proposing an Independent Poisson Distribution (IPD) particularly at each individual entry in the scRNA-seq data matrix. This approach naturally and intuitively models the large number of zeros as matrix entries with a very small Poisson parameter. The critical challenge of cell clustering is approached via a novel data representation as Departures from a simple homogeneous IPD (DIPD) to capture the per-gene-per-cell intrinsic heterogeneity generated by cell clusters. Our experiments using real data and crafted experiments show that using DIPD as a data representation for scRNA-seq data can uncover novel cell subtypes that are missed or can only be found by careful parameter tuning using conventional methods. Conclusions: This new method has multiple advantages, including (1) no need for prior feature selection or manual optimization of hyperparameters; (2) flexibility to combine with and improve upon other methods, such as Seurat. Another novel contribution is the use of crafted experiments as part of the validation of our newly developed DIPD-based clustering pipeline. This new clustering pipeline is implemented in the R (CRAN) package scpoisson.

15.
Front Genet ; 14: 1093326, 2023.
Article in English | MEDLINE | ID: mdl-37007972

ABSTRACT

Advanced genomic and molecular profiling technologies have accelerated the elucidation of the regulatory mechanisms behind cancer development and progression, and of targeted therapies in patients. Along this line, intense studies with immense amounts of biological information have boosted the discovery of molecular biomarkers. Cancer has been one of the leading causes of death around the world in recent years. Elucidation of genomic and epigenetic factors in Breast Cancer (BRCA) can provide a roadmap to uncover the disease mechanisms. Accordingly, unraveling the possible systematic connections between -omics data types and their contribution to BRCA tumor progression is crucial. In this study, we have developed a novel machine learning (ML) based integrative approach for multi-omics data analysis. This integrative approach combines information from gene expression (mRNA), microRNA (miRNA) and methylation data. Due to the complexity of cancer, this integrated data is expected to improve the prediction, diagnosis and treatment of disease through patterns only available from the 3-way interactions between these three -omics datasets. In addition, the proposed method bridges the interpretation gap between the disease mechanisms that drive onset and progression. Our fundamental contribution is the 3 Multi-omics integrative tool (3Mint). This tool aims to perform grouping and scoring of groups using biological knowledge. Another major goal is improved gene selection via detection of novel groups of cross-omics biomarkers. Performance of 3Mint is assessed using different metrics. Our computational performance evaluations showed that 3Mint classifies the BRCA molecular subtypes with a lower number of genes than the miRcorrNet tool, which uses miRNA and mRNA gene expression profiles, at similar performance (95% accuracy). The incorporation of methylation data in 3Mint yields a much more focused analysis. The 3Mint tool and all other supplementary files are available at https://github.com/malikyousef/3Mint/.

16.
Ann Appl Stat ; 17(4): 2924-2943, 2023 Dec.
Article in English | MEDLINE | ID: mdl-38046186

ABSTRACT

In The Cancer Genome Atlas (TCGA) data set, there are many interesting nonlinear dependencies between pairs of genes that reveal important relationships and subtypes of cancer. Such genomic data analysis requires a rapid, powerful and interpretable detection process, especially in a high-dimensional environment. We study the nonlinear patterns among the expression of pairs of genes from TCGA using a powerful tool called Binary Expansion Testing. We find many nonlinear patterns, some of which are driven by known cancer subtypes, some of which are novel.

17.
NPJ Breast Cancer ; 9(1): 92, 2023 Nov 11.
Article in English | MEDLINE | ID: mdl-37952058

ABSTRACT

Approaches for rapidly identifying patients at high risk of early breast cancer recurrence are needed. Image-based methods for prescreening hematoxylin and eosin (H&E) stained tumor slides could offer temporal and financial efficiency. We evaluated a data set of 704 1-mm tumor core H&E images (2-4 cores per case), corresponding to 202 participants (101 who recurred; 101 non-recurrent matched on age and follow-up time) from breast cancers diagnosed between 2008-2012 in the Carolina Breast Cancer Study. We leveraged deep learning to extract image information and trained a model to identify recurrence. Cross-validation accuracy for predicting recurrence was 62.4% [95% CI: 55.7, 69.1], similar to grade (65.8% [95% CI: 59.3, 72.3]) and ER status (66.3% [95% CI: 59.8, 72.8]). Interestingly, 70% (19/27) of early-recurrent low-intermediate grade tumors were identified by our image model. Relative to existing markers, image-based analyses provide complementary information for predicting early recurrence.
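
As a reading aid, the reported confidence intervals can be compared with a textbook normal-approximation (Wald) interval for a proportion. The abstract does not say how the intervals were computed, and using the 202 participants as the denominator is an assumption made here; the sketch only shows that this simple formula gives matching numbers to one decimal place.

```python
import math

def wald_interval(p_hat, n, z=1.96):
    """Normal-approximation confidence interval for a proportion."""
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

n = 202  # assumed denominator: the number of participants in the abstract

# Point estimates as reported in the abstract.
image_lo, image_hi = wald_interval(0.624, n)
grade_lo, grade_hi = wald_interval(0.658, n)
er_lo, er_hi = wald_interval(0.663, n)
```

Because the three intervals overlap almost completely, the abstract's description of the image model as "similar to" grade and ER status is a statement about comparable, not superior, accuracy.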

18.
Osteoarthr Cartil Open ; 5(1): 100334, 2023 Mar.
Article in English | MEDLINE | ID: mdl-36817090

ABSTRACT

Objective: To employ novel methodologies to identify phenotypes in knee OA based on variation among three baseline data blocks: 1) femoral cartilage thickness, 2) tibial cartilage thickness, and 3) participant characteristics and clinical features. Methods: Baseline data were from 3321 Osteoarthritis Initiative (OAI) participants with available cartilage thickness maps (6265 knees) and 77 clinical features. Cartilage maps were obtained from 3D DESS MR images using a deep-learning based segmentation approach and an atlas-based analysis developed by our group. Angle-based Joint and Individual Variation Explained (AJIVE) was used to capture and quantify variation, both shared among multiple data blocks and individual to each block, and to determine statistical significance. Results: Three major modes of variation were shared across the three data blocks. Mode 1 reflected overall thicker cartilage among men, those with higher education, and greater knee forces; Mode 2 showed associations between worsening Kellgren-Lawrence Grade, medial cartilage thinning, and worsening symptoms; and Mode 3 contrasted lateral and medial-predominant cartilage loss associated with BMI and malalignment. Each data block also demonstrated individual, independent modes of variation consistent with the known discordance between symptoms and structure in knee OA and reflecting the importance of features such as physical function, symptoms, and comorbid conditions independent of structural damage. Conclusions: This exploratory analysis, combining the rich OAI dataset with novel methods for determining and visualizing cartilage thickness, reinforces known associations in knee OA while providing insights into the potential for data integration in knee OA phenotyping.

19.
J Clin Oncol ; 41(26): 4192-4199, 2023 Sep 10.
Article in English | MEDLINE | ID: mdl-37672882

ABSTRACT

PURPOSE: To improve on current standards for breast cancer prognosis and prediction of chemotherapy benefit by developing a risk model that incorporates the gene expression-based "intrinsic" subtypes luminal A, luminal B, HER2-enriched, and basal-like. METHODS: A 50-gene subtype predictor was developed using microarray and quantitative reverse transcriptase polymerase chain reaction data from 189 prototype samples. Test sets from 761 patients (no systemic therapy) were evaluated for prognosis, and 133 patients were evaluated for prediction of pathologic complete response (pCR) to a taxane and anthracycline regimen. RESULTS: The intrinsic subtypes as discrete entities showed prognostic significance (P = 2.26E-12) and remained significant in multivariable analyses that incorporated standard parameters (estrogen receptor status, histologic grade, tumor size, and node status). A prognostic model for node-negative breast cancer was built using intrinsic subtype and clinical information. The C-index estimate for the combined model (subtype and tumor size) was a significant improvement on either the clinicopathologic model or subtype model alone. The intrinsic subtype model predicted neoadjuvant chemotherapy efficacy with a negative predictive value for pCR of 97%. CONCLUSION: Diagnosis by intrinsic subtype adds significant prognostic and predictive information to standard parameters for patients with breast cancer. The prognostic properties of the continuous risk score will be of value for the management of node-negative breast cancers. The subtypes and risk score can also be used to assess the likelihood of efficacy from neoadjuvant chemotherapy.

20.
BMC Bioinformatics ; 13: 221, 2012 Sep 04.
Article in English | MEDLINE | ID: mdl-22946927

ABSTRACT

BACKGROUND: Next-generation sequencing technologies have become important tools for genome-wide studies. However, the quality scores that are assigned to each base have been shown to be inaccurate. If the quality scores are used in downstream analyses, these inaccuracies can have a significant impact on the results. RESULTS: Here we present ReQON, a tool that recalibrates the base quality scores from an input BAM file of aligned sequencing data using logistic regression. ReQON also generates diagnostic plots showing the effectiveness of the recalibration. We show that ReQON produces quality scores that are both more accurate, in the sense that they more closely correspond to the probability of a sequencing error, and do a better job of discriminating between sequencing errors and non-errors than the original quality scores. We also compare ReQON to other available recalibration tools and show that ReQON is less biased and performs favorably in terms of quality score accuracy. CONCLUSION: ReQON is an open source software package, written in R and available through Bioconductor, for recalibrating base quality scores for next-generation sequencing data. ReQON produces a new BAM file with more accurate quality scores, which can improve the results of downstream analysis, and produces several diagnostic plots showing the effectiveness of the recalibration.


Subjects
High-Throughput Nucleotide Sequencing/standards , Software , Calibration , Genome , Logistic Models , Sequence Alignment
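
The abstract's notion of accuracy, a quality score that corresponds to the probability of a sequencing error, rests on the standard Phred relation Q = -10 log10(p). The sketch below shows that conversion and how a fitted logistic model could map a reported score to a recalibrated one. The coefficients are invented for illustration and the function names are hypothetical; this is not ReQON's model or interface.

```python
import math

def phred_to_error_prob(q):
    """Error probability implied by a Phred quality score."""
    return 10.0 ** (-q / 10.0)

def error_prob_to_phred(p):
    """Phred quality score implied by an error probability."""
    return -10.0 * math.log10(p)

def recalibrate(q_reported, intercept, slope):
    """Map a reported score to a recalibrated one through a logistic model
    of the error probability: logit(p) = intercept + slope * q_reported."""
    logit = intercept + slope * q_reported
    p_error = 1.0 / (1.0 + math.exp(-logit))
    return error_prob_to_phred(p_error)

# Invented coefficients, standing in for values a regression would estimate.
q_new = recalibrate(q_reported=40, intercept=-1.0, slope=-0.15)
```

In practice the coefficients would be estimated from aligned reads, with mismatches to the reference serving as the error indicator, and the model would likely include more covariates than the reported score alone.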