1.
BMC Bioinformatics ; 24(1): 256, 2023 Jun 17.
Article in English | MEDLINE | ID: mdl-37330471

ABSTRACT

BACKGROUND: Modeling of single cell RNA-sequencing (scRNA-seq) data remains challenging due to a high percentage of zeros and data heterogeneity, so improved modeling has strong potential to benefit many downstream data analyses. The existing zero-inflated or over-dispersed models are based on aggregations at either the gene or the cell level. However, they typically lose accuracy due to a too crude aggregation at those two levels. RESULTS: We avoid the crude approximations entailed by such aggregation through proposing an independent Poisson distribution (IPD) particularly at each individual entry in the scRNA-seq data matrix. This approach naturally and intuitively models the large number of zeros as matrix entries with a very small Poisson parameter. The critical challenge of cell clustering is approached via a novel data representation as Departures from a simple homogeneous IPD (DIPD) to capture the per-gene-per-cell intrinsic heterogeneity generated by cell clusters. Our experiments using real data and crafted experiments show that using DIPD as a data representation for scRNA-seq data can uncover novel cell subtypes that are missed or can only be found by careful parameter tuning using conventional methods. CONCLUSIONS: This new method has multiple advantages, including (1) no need for prior feature selection or manual optimization of hyperparameters; (2) flexibility to combine with and improve upon other methods, such as Seurat. Another novel contribution is the use of crafted experiments as part of the validation of our newly developed DIPD-based clustering pipeline. This new clustering pipeline is implemented in the R (CRAN) package scpoisson.
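
Editorial sketch of the idea in the abstract above, not the scpoisson implementation. Under a Poisson model an entry with a small rate is zero with probability exp(-rate), which is how many zeros arise without a separate zero-inflation term. The rank-one rate estimate (row total times column total over grand total) and the Pearson-style departure used below are assumptions made for illustration; the abstract does not give the exact DIPD formula, and all function names are hypothetical.

```python
import math

def poisson_zero_prob(rate):
    """P(X = 0) for X ~ Poisson(rate)."""
    return math.exp(-rate)

def homogeneous_rates(counts):
    """Rank-one rate estimate: rate[i][j] = row_total[i] * col_total[j] / grand_total.

    Assumption: this stands in for the 'simple homogeneous' model in the abstract.
    """
    row = [sum(r) for r in counts]
    col = [sum(c) for c in zip(*counts)]
    total = sum(row)
    if total == 0:
        raise ValueError("count matrix is all zeros")
    return [[ri * cj / total for cj in col] for ri in row]

def departures(counts):
    """Pearson-style departure (x - rate) / sqrt(rate) per entry (0 where rate is 0).

    Assumption: an illustrative residual, not the published DIPD definition.
    """
    rates = homogeneous_rates(counts)
    return [
        [(x - lam) / math.sqrt(lam) if lam > 0 else 0.0 for x, lam in zip(xr, lr)]
        for xr, lr in zip(counts, rates)
    ]
```

A count matrix that is exactly rank one gives departures of zero everywhere, while block structure of the kind produced by cell clusters shows up as entries of opposite sign.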


Subjects
RNA; Single-Cell Analysis; Sequence Analysis, RNA/methods; Poisson Distribution; Single-Cell Analysis/methods; Cluster Analysis; RNA/genetics; Gene Expression Profiling/methods
2.
Osteoarthritis Cartilage ; 27(7): 994-1001, 2019 07.
Article in English | MEDLINE | ID: mdl-31002938

ABSTRACT

OBJECTIVE: Knee osteoarthritis (KOA) is a heterogeneous condition representing a variety of potentially distinct phenotypes. The purpose of this study was to apply innovative machine learning approaches to KOA phenotyping in order to define progression phenotypes that are potentially more responsive to interventions. DESIGN: We used publicly available data from the Foundation for the National Institutes of Health (FNIH) osteoarthritis (OA) Biomarkers Consortium, where radiographic (medial joint space narrowing of ≥0.7 mm), and pain progression (increase of ≥9 Western Ontario and McMaster Universities Osteoarthritis Index [WOMAC] points) were defined at 48 months, as four mutually exclusive outcome groups (none, both, pain only, radiographic only), along with an extensive set of covariates. We applied distance weighted discrimination (DWD), direction-projection-permutation (DiProPerm) testing, and clustering methods to focus on the contrast (z-scores) between those progressing by both criteria ("progressors") and those progressing by neither ("non-progressors"). RESULTS: Using all observations (597 individuals, 59% women, mean age 62 years and BMI 31 kg/m2) and all 73 baseline variables available in the dataset, there was a clear separation among progressors and non-progressors (z = 10.1). Higher z-scores were seen for the magnetic resonance imaging (MRI)-based variables than for demographic/clinical variables or biochemical markers. Baseline variables with the greatest contribution to non-progression at 48 months included WOMAC pain, lateral meniscal extrusion, and serum N-terminal pro-peptide of collagen IIA (PIIANP), while those contributing to progression included bone marrow lesions, osteophytes, medial meniscal extrusion, and urine C-terminal crosslinked telopeptide type II collagen (CTX-II). 
CONCLUSIONS: Using methods that provide a way to assess numerous variables of different types and scalings simultaneously in relation to an outcome of interest enabled a data-driven approach that identified key variables associated with a progression phenotype.
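
Editorial sketch of a direction-projection-permutation style test of the kind named in the abstract above. The abstract uses a DWD direction; this sketch swaps in the simpler mean-difference direction so that it stays self-contained, and that substitution, the function names, and the default of 200 permutations are all assumptions. The steps are: find a direction, project both groups onto it, take the difference of projected means, then repeat after shuffling the group labels to build a null distribution.

```python
import math
import random
import statistics

def mean_vector(rows):
    """Column-wise mean of a list of equal-length rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def separation(group_a, group_b):
    """Difference of projected means along the unit mean-difference direction.

    Assumption: the mean-difference direction replaces DWD here.
    """
    diff = [b - a for a, b in zip(mean_vector(group_a), mean_vector(group_b))]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0:
        return 0.0
    direction = [d / norm for d in diff]

    def project(row):
        return sum(x * w for x, w in zip(row, direction))

    proj_a = [project(r) for r in group_a]
    proj_b = [project(r) for r in group_b]
    return statistics.mean(proj_b) - statistics.mean(proj_a)

def permutation_test(group_a, group_b, n_perm=200, seed=0):
    """Return (z_score, p_value) of the observed separation against shuffled labels."""
    rng = random.Random(seed)
    observed = separation(group_a, group_b)
    pooled = group_a + group_b
    n_a = len(group_a)
    null = []
    for _ in range(n_perm):
        shuffled = pooled[:]
        rng.shuffle(shuffled)
        # The direction is re-estimated for every relabeling.
        null.append(separation(shuffled[:n_a], shuffled[n_a:]))
    z_score = (observed - statistics.mean(null)) / statistics.stdev(null)
    p_value = (1 + sum(s >= observed for s in null)) / (1 + n_perm)
    return z_score, p_value
```

The z-score summarizes how far the observed separation sits above the permutation null, which is the sense in which a value such as z = 10.1 indicates clear separation.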


Subjects
Biological Variation, Population/genetics; Cartilage, Articular/pathology; Machine Learning; Osteoarthritis, Knee/genetics; Osteoarthritis, Knee/pathology; Aged; Biomarkers/blood; Cartilage, Articular/diagnostic imaging; Cartilage, Articular/physiopathology; Collagen Type II/blood; Congresses as Topic; Databases, Factual; Disease Progression; Female; Humans; Male; Menisci, Tibial/pathology; Middle Aged; National Institutes of Health (U.S.); Osteoarthritis, Knee/diagnostic imaging; Pain Measurement; Severity of Illness Index; United States
3.
Biometrics ; 74(2): 439-447, 2018 06.
Article in English | MEDLINE | ID: mdl-28853138

ABSTRACT

Genotype eigenvectors are widely used as covariates for control of spurious stratification in genetic association. Significance testing for the accompanying eigenvalues has typically been based on a standard Tracy-Widom limiting distribution for the largest eigenvalue, derived under white-noise assumptions. It is known that even modest local correlation among markers inflates the largest eigenvalues, even in the absence of true stratification. In addition, a few sample eigenvalues may be extreme, creating further complications in accurate testing. We explore several methods to identify appropriate null eigenvalue thresholds, while remaining sensitive to eigenvalues corresponding to population stratification. We introduce a novel block permutation approach, designed to produce an appropriate null eigenvalue distribution by eliminating long-range genomic correlation while preserving local correlation. We also propose a fast approach based on eigenvalue distribution modeling, using a simple fit criterion and the general Marcenko-Pastur equation under a simple discrete eigenvalue model. Block permutation and the model-based approach work well for pure simulations and for data resampled from the 1000 Genomes project. In contrast, we find that the standard approach of computing an "effective" number of markers does not perform well. The performance of the methods is also demonstrated for a motivating example from the International Cystic Fibrosis Consortium.
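
Editorial sketch of the block permutation step described in the abstract above, under stated assumptions: markers are split into consecutive blocks of a fixed size, and the samples are shuffled independently within each block. Rows then stay intact inside a block, which preserves local correlation, while the link between blocks is broken, which removes long-range correlation. The fixed block size and the function name are hypothetical; computing eigenvalues of the permuted matrix to form the null distribution is not shown.

```python
import random

def block_permute(genotypes, block_size, rng):
    """Return a copy of a samples-by-markers matrix with samples shuffled per marker block."""
    n_samples = len(genotypes)
    n_markers = len(genotypes[0])
    permuted = [row[:] for row in genotypes]
    for start in range(0, n_markers, block_size):
        order = list(range(n_samples))
        rng.shuffle(order)  # a fresh sample permutation for every block
        for target, source in enumerate(order):
            permuted[target][start:start + block_size] = genotypes[source][start:start + block_size]
    return permuted
```

Each marker column keeps the same set of values, so allele frequencies are unchanged; only the pairing of samples across blocks differs.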


Subjects
Genetic Association Studies/methods; Models, Statistical; Computer Simulation; Cystic Fibrosis/genetics; Data Interpretation, Statistical; Genomics/methods; Genotype; Humans; Models, Genetic
4.
Neuroimage ; 152: 38-49, 2017 05 15.
Article in English | MEDLINE | ID: mdl-28246033

ABSTRACT

A major goal in neuroscience is to understand the neural pathways underlying human behavior. We introduce the recently developed Joint and Individual Variation Explained (JIVE) method to the neuroscience community to simultaneously analyze imaging and behavioral data from the Human Connectome Project. Motivated by recent computational and theoretical improvements in the JIVE approach, we simultaneously explore the joint and individual variation between and within imaging and behavioral data. In particular, we demonstrate that JIVE is an effective and efficient approach for integrating task fMRI and behavioral variables using three examples: one example where task variation is strong, one where task variation is weak and a reference case where the behavior is not directly related to the image. These examples are provided to visualize the different levels of signal found in the joint variation including working memory regions in the image data and accuracy and response time from the in-task behavioral variables. Joint analysis provides insights not available from conventional single block decomposition methods such as Singular Value Decomposition. Additionally, the joint variation estimated by JIVE appears to more clearly identify the working memory regions than Partial Least Squares (PLS), while Canonical Correlation Analysis (CCA) gives grossly overfit results. The individual variation in JIVE captures the behavior unrelated signals such as a background activation that is spatially homogeneous and activation in the default mode network. The information revealed by this individual variation is not examined in traditional methods such as CCA and PLS. We suggest that JIVE can be used as an alternative to PLS and CCA to improve estimation of the signal common to two or more datasets and reveal novel insights into the signal unique to each dataset.


Subjects
Brain/anatomy & histology; Brain/physiology; Connectome/methods; Adult; Humans; Image Processing, Computer-Assisted; Magnetic Resonance Imaging; Signal Processing, Computer-Assisted; Software; Young Adult
5.
Osteoarthritis Cartilage ; 24(4): 640-6, 2016 Apr.
Article in English | MEDLINE | ID: mdl-26620089

ABSTRACT

INTRODUCTION: Hip shape is a risk factor for the development of hip osteoarthritis (OA), and current methods to assess hip shape from radiographs are limited; therefore this study explored current and novel methods to assess hip shape. METHODS: Data from a prior case-control study nested in the Johnston County OA Project were used, including 382 hips (from 342 individuals). Hips were classified by radiographic hip OA (RHOA) status as RHOA cases (baseline Kellgren Lawrence grade [KLG] 0 or 1, follow-up [mean 6 years] KLG ≥ 2) or controls (KLG = 0 or 1 at both baseline and follow-up). Proximal femur shape was assessed using a 60-point model as previously described. The current analysis explored commonly used principal component analysis (PCA), as well as novel statistical methodologies suited to high-dimension, low-sample-size settings (Distance Weighted Discrimination [DWD] and Direction-Projection-Permutation [DiProPerm] hypothesis testing) to assess differences between cases and controls. RESULTS: Using these novel methodologies, we were able to better characterize morphologic differences by sex and race. In particular, the proximal femurs of African American women demonstrated significantly different shapes between cases and controls, implying an important role for sex and race in the development of RHOA. Notably, discrimination was improved with the use of DWD and DiProPerm compared to PCA. CONCLUSIONS: DWD with DiProPerm significance testing provides improved discrimination of variation in hip morphology between groups, and enables subgroup analyses even under small sample sizes.


Subjects
Black or African American/statistics & numerical data; Hip Joint/pathology; Osteoarthritis, Hip/ethnology; Osteoarthritis, Hip/pathology; Aged; Case-Control Studies; Data Interpretation, Statistical; Female; Femur/diagnostic imaging; Femur/pathology; Hip Joint/diagnostic imaging; Humans; Male; Middle Aged; North Carolina/epidemiology; Osteoarthritis, Hip/diagnostic imaging; Principal Component Analysis; Radiographic Image Interpretation, Computer-Assisted/methods; Radiography/methods; Risk Factors; Sex Factors
6.
Nucleic Acids Res ; 42(14): e113, 2014 Aug.
Article in English | MEDLINE | ID: mdl-25030904

ABSTRACT

High-throughput sequencing technologies, including RNA-seq, have made it possible to move beyond gene expression analysis to study transcriptional events including alternative splicing and gene fusions. Furthermore, recent studies in cancer have suggested the importance of identifying transcriptionally altered loci as biomarkers for improved prognosis and therapy. While many statistical methods have been proposed for identifying novel transcriptional events with RNA-seq, nearly all rely on contrasting known classes of samples, such as tumor and normal. Few tools exist for the unsupervised discovery of such events without class labels. In this paper, we present SigFuge for identifying genomic loci exhibiting differential transcription patterns across many RNA-seq samples. SigFuge combines clustering with hypothesis testing to identify genes exhibiting alternative splicing, or differences in isoform expression. We apply SigFuge to RNA-seq cohorts of 177 lung and 279 head and neck squamous cell carcinoma samples from the Cancer Genome Atlas, and identify several cases of differential isoform usage including CDKN2A, a tumor suppressor gene known to be inactivated in a majority of lung squamous cell tumors. By not restricting attention to known sample stratifications, SigFuge offers a novel approach to unsupervised screening of genetic loci across RNA-seq cohorts. SigFuge is available as an R package through Bioconductor.


Subjects
Gene Expression Profiling/methods; High-Throughput Nucleotide Sequencing/methods; Neoplasms/genetics; RNA Isoforms/metabolism; Sequence Analysis, RNA/methods; Software; Alternative Splicing; Carcinoma, Squamous Cell/genetics; Carrier Proteins/genetics; Cluster Analysis; Exons; Genes, p16; Genetic Loci; Head and Neck Neoplasms/genetics; Intracellular Signaling Peptides and Proteins; Kallikreins/genetics; Lung Neoplasms/genetics; Nuclear Proteins; Squamous Cell Carcinoma of Head and Neck
7.
Stat Sin ; 26(4): 1747-1770, 2016 Oct.
Article in English | MEDLINE | ID: mdl-28018116

ABSTRACT

The aim of this paper is to establish several deep theoretical properties of principal component analysis for multiple-component spike covariance models. Our new results reveal an asymptotic conical structure in critical sample eigendirections under the spike models with distinguishable (or indistinguishable) eigenvalues, when the sample size and/or the number of variables (or dimension) tend to infinity. The consistency of the sample eigenvectors relative to their population counterparts is determined by the ratio between the dimension and the product of the sample size with the spike size. When this ratio converges to a nonzero constant, the sample eigenvector converges to a cone, with a certain angle to its corresponding population eigenvector. In the High Dimension, Low Sample Size case, the angle between the sample eigenvector and its population counterpart converges to a limiting distribution. Several generalizations of the multi-spike covariance models are also explored, and additional theoretical results are presented.

8.
Nucleic Acids Res ; 41(19): e178, 2013 Oct.
Article in English | MEDLINE | ID: mdl-23935067

ABSTRACT

Identifying variants using high-throughput sequencing data is currently a challenge because true biological variants can be indistinguishable from technical artifacts. One source of technical artifact results from incorrectly aligning experimentally observed sequences to their true genomic origin ('mismapping') and inferring differences in mismapped sequences to be true variants. We developed BlackOPs, an open-source tool that simulates experimental RNA-seq and DNA whole exome sequences derived from the reference genome, aligns these sequences by custom parameters, detects variants and outputs a blacklist of positions and alleles caused by mismapping. Blacklists contain thousands of artifact variants that are indistinguishable from true variants and, for a given sample, are expected to be almost completely false positives. We show that these blacklist positions are specific to the alignment algorithm and read length used, and BlackOPs allows users to generate a blacklist specific to their experimental setup. We queried the dbSNP and COSMIC variant databases and found numerous variants indistinguishable from mapping errors. We demonstrate how filtering against blacklist positions reduces the number of potential false variants using an RNA-seq glioblastoma cell line data set. In summary, accounting for mapping-caused variants tuned to experimental setups reduces false positives and, therefore, improves genome characterization by high-throughput sequencing.
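
Editorial sketch of the filtering step mentioned in the abstract above: once a blacklist of mismapping-induced positions and alleles exists, candidate calls that match an entry are dropped. This is not BlackOPs code or its file format. The dictionary keys (`chrom`, `pos`, `alt`) and the function name are hypothetical, chosen only to show the set-membership idea.

```python
def filter_variants(calls, blacklist):
    """Keep only calls whose (chrom, pos, alt) key is absent from the blacklist.

    calls: list of dicts with hypothetical keys 'chrom', 'pos', 'alt'.
    blacklist: iterable of (chrom, pos, alt) tuples.
    """
    blocked = set(blacklist)  # set lookup keeps the filter linear in the number of calls
    return [c for c in calls if (c["chrom"], c["pos"], c["alt"]) not in blocked]
```

Because the abstract reports that blacklists depend on the aligner and read length, the blacklist passed in would need to match the pipeline that produced the calls.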


Subjects
Genetic Variation; High-Throughput Nucleotide Sequencing/methods; Sequence Alignment/methods; Software; Artifacts; Cell Line, Tumor; Chromosome Mapping; Databases, Nucleic Acid; Exome; Humans; Polymorphism, Single Nucleotide; Sequence Analysis, DNA/methods; Sequence Analysis, RNA/methods
9.
Osteoarthritis Cartilage ; 22(10): 1657-67, 2014 Oct.
Article in English | MEDLINE | ID: mdl-25278075

ABSTRACT

OBJECTIVE: To assess 3D morphological variations and local and systemic biomarker profiles in subjects with a diagnosis of temporomandibular joint osteoarthritis (TMJ OA). DESIGN: Twenty-eight patients with long-term TMJ OA (39.9 ± 16 years), 12 patients at initial diagnosis of OA (47.4 ± 16.1 years), and 12 healthy controls (41.8 ± 12.2 years) were recruited. All patients were female and had cone beam CT scans taken. TMJ arthrocentesis and venipuncture were performed on 12 OA and 12 age-matched healthy controls. Serum and synovial fluid levels of 50 biomarkers of arthritic inflammation were quantified by protein microarrays. Shape Analysis MANCOVA tested statistical correlations between biomarker levels and variations in condylar morphology. RESULTS: Compared with healthy controls, the OA average condyle was significantly smaller in all dimensions except its anterior surface, with areas indicative of bone resorption along the articular surface, particularly in the lateral pole. Synovial fluid levels of ANG, GDF15, TIMP-1, CXCL16, MMP-3 and MMP-7 were significantly correlated with bone apposition of the condylar anterior surface. Serum levels of ENA-78, MMP-3, PAI-1, VE-Cadherin, VEGF, GM-CSF, TGFβ1, IFNγ, TNFα, IL-1α, and IL-6 were significantly correlated with flattening of the lateral pole. Expression levels of ANG were significantly correlated with the articular morphology in healthy controls. CONCLUSIONS: Bone resorption at the articular surface, particularly at the lateral pole was statistically significant at initial diagnosis of TMJ OA. Synovial fluid levels of ANG, GDF15, TIMP-1, CXCL16, MMP-3 and MMP-7 were correlated with bone apposition. Serum levels of ENA-78, MMP-3, PAI-1, VE-Cadherin, VEGF, GM-CSF, TGFβ1, IFNγ, TNFα, IL-1α, and IL-6 were correlated with bone resorption.


Subjects
Inflammation Mediators/metabolism; Osteoarthritis/diagnostic imaging; Synovial Fluid/metabolism; Temporomandibular Joint Disorders/diagnostic imaging; Temporomandibular Joint/diagnostic imaging; Adult; Biomarkers/metabolism; Bone Resorption/diagnostic imaging; Bone Resorption/etiology; Case-Control Studies; Cone-Beam Computed Tomography; Female; Humans; Imaging, Three-Dimensional; Middle Aged; Osteoarthritis/complications; Temporomandibular Joint Disorders/complications; Young Adult
10.
J Comput Graph Stat ; 33(2): 736-748, 2024.
Article in English | MEDLINE | ID: mdl-39170642

ABSTRACT

The Population Difference Criterion is proposed to assess the statistical significance of visually observed subpopulation differences. It addresses the following challenges: in high-dimensional contexts, distributional models can be dubious; in high-signal contexts, conventional permutation tests give poor pairwise comparisons. We also make two other contributions. First, based on a careful analysis, we find that a balanced permutation approach is more powerful in high-signal contexts than conventional permutations. Second, we quantify the uncertainty due to permutation variation via a bootstrap confidence interval. The practical usefulness of these ideas is illustrated in the comparison of subpopulations of modern cancer data.
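
Editorial sketch of one reading of "balanced permutation", offered as an assumption because the abstract does not define the term: each relabeled group takes exactly half of its members from each original group, so no relabeling resembles the true split. The sketch is restricted to two groups of equal, even size, and the function name is hypothetical.

```python
import random

def balanced_relabel(group_size, rng):
    """Return 0/1 labels for 2 * group_size samples.

    Samples 0 .. group_size-1 are the original group A, the rest group B.
    The new group 0 receives half of A and half of B; the new group 1 gets the remainder.
    """
    if group_size % 2:
        raise ValueError("group_size must be even for an exact half-and-half split")
    a_idx = list(range(group_size))
    b_idx = list(range(group_size, 2 * group_size))
    rng.shuffle(a_idx)
    rng.shuffle(b_idx)
    half = group_size // 2
    labels = [1] * (2 * group_size)
    for i in a_idx[:half] + b_idx[:half]:
        labels[i] = 0
    return labels
```

Under an unrestricted shuffle, some relabelings stay close to the original grouping and inflate the null when the signal is strong; forcing the half-and-half mix avoids that, which is one plausible reason for the power gain the abstract reports.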

11.
Osteoarthr Cartil Open ; 6(3): 100508, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39238657

ABSTRACT

Objective: To investigate the relationship between measures of radiographic joint space width (JSW) loss and magnetic resonance imaging (MRI)-based cartilage thickness loss in the medial weight-bearing region of the tibiofemoral joint over 12-24 months. To stratify this relationship by clinically meaningful subgroups (sex and pain status). Design: We analyzed a subset of knees (n = 256) from the Osteoarthritis Initiative (OAI) likely in early stage OA based on joint space narrowing (JSN) measurements. Natural logarithm transformation was used to approximate near normal distributions for JSW loss. Pearson correlation coefficients described the relationship between ln-transformed JSW loss and several versions of deep learning-derived MRI-based cartilage thickness loss parameters (minimum, maximum, and mean) in subregions of the femoral condyle, tibial plateau, and combined femoral and tibial regions. Linear mixed-effects models evaluated the associations between the ln-transformed radiographic and MRI-derived measures including potential confounders. Results: We found weak correlations between ln-transformed JSW loss and MRI-based cartilage thickness ranging from R = -0.13 (p = 0.20) to R = 0.26 (p < 0.01). Correlations were higher (though still poor) among females compared to males and among painful compared to non-painful knees. Model results showed weak associations for nearly all MRI-based measures, ranging from no association to β (95% CI) = 0.25 (0.11, 0.39). Associations were higher among females compared to males, with minimal differences between painful and non-painful knees. Conclusions: Despite its recommended use in disease-modifying OA drug clinical trials, results suggest that JSW loss is an ineffective proxy measure of cartilage thickness loss over 12-24 months and within a localized region of the tibiofemoral joint.

12.
J Med Imaging (Bellingham) ; 11(4): 044006, 2024 Jul.
Article in English | MEDLINE | ID: mdl-39185474

ABSTRACT

Purpose: We address the need for effective stain domain adaptation methods in histopathology to enhance the performance of downstream computational tasks, particularly classification. Existing methods exhibit varying strengths and weaknesses, prompting the exploration of a different approach. The focus is on improving stain color consistency, expanding the stain domain scope, and minimizing the domain gap between image batches. Approach: We introduce a new domain adaptation method, Stain simultaneous augmentation and normalization (SAN), designed to adjust the distribution of stain colors to align with a target distribution. Stain SAN combines the merits of established methods, such as stain normalization, stain augmentation, and stain mix-up, while mitigating their inherent limitations. Stain SAN adapts stain domains by resampling stain color matrices from a well-structured target distribution. Results: Experimental evaluations of cross-dataset clinical estrogen receptor status classification demonstrate the efficacy of Stain SAN and its superior performance compared with existing stain adaptation methods. In one case, the area under the curve (AUC) increased by 11.4%. Overall, our results clearly show the improvements made over the history of the development of these methods culminating with substantial enhancement provided by Stain SAN. Furthermore, we show that Stain SAN achieves results comparable with the state-of-the-art generative adversarial network-based approach without requiring separate training for stain adaptation or access to the target domain during training. Stain SAN's performance is on par with HistAuGAN, proving its effectiveness and computational efficiency. Conclusions: Stain SAN emerges as a promising solution, addressing the potential shortcomings of contemporary stain adaptation methods. 
Its effectiveness is underscored by notable improvements in the context of clinical estrogen receptor status classification, where it achieves the best AUC performance. The findings endorse Stain SAN as a robust approach for stain domain adaptation in histopathology images, with implications for advancing computational tasks in the field.

13.
ArXiv ; 2024 May 16.
Article in English | MEDLINE | ID: mdl-38800658

ABSTRACT

Automated region of interest detection in histopathological image analysis is a challenging and important topic with tremendous potential impact on clinical practice. The deep-learning methods used in computational pathology may help us to reduce costs and increase the speed and accuracy of cancer diagnosis. We started with the UNC Melanocytic Tumor Dataset cohort that contains 160 hematoxylin and eosin whole-slide images of primary melanomas (86) and nevi (74). We randomly assigned 80% (134) as a training set and built an in-house deep-learning method to allow for classification, at the slide level, of nevi and melanomas. The proposed method performed well on the other 20% (26) test dataset; the accuracy of the slide classification task was 92.3% and our model also performed well in terms of predicting the region of interest annotated by the pathologists, showing excellent performance of our model on melanocytic skin tumors. Even though we tested the experiments on the skin tumor dataset, our work could also be extended to other medical image detection problems to benefit the clinical evaluation and diagnosis of different tumors.

14.
Cancers (Basel) ; 16(13), 2024 Jun 21.
Article in English | MEDLINE | ID: mdl-39001357

ABSTRACT

High intratumoral heterogeneity is thought to be a poor prognostic indicator. However, the source of heterogeneity may also be important, as genomic heterogeneity is not always reflected in histologic or 'visual' heterogeneity. We aimed to develop a predictor of histologic heterogeneity and evaluate its association with outcomes and molecular heterogeneity. We used VGG16 to train an image classifier to identify unique, patient-specific visual features in 1655 breast tumors (5907 core images) from the Carolina Breast Cancer Study (CBCS). Extracted features for images, as well as the epithelial and stromal image components, were hierarchically clustered, and visual heterogeneity was defined as a greater distance between images from the same patient. We assessed the association between visual heterogeneity, clinical features, and DNA-based molecular heterogeneity using generalized linear models, and we used Cox models to estimate the association between visual heterogeneity and tumor recurrence. Basal-like and ER-negative tumors were more likely to have low visual heterogeneity, as were the tumors from younger and Black women. Less heterogeneous tumors had a higher risk of recurrence (hazard ratio = 1.62, 95% confidence interval = 1.22-2.16), and were more likely to come from patients whose tumors were comprised of only one subclone or had a TP53 mutation. Associations were similar regardless of whether the image was based on stroma, epithelium, or both. Histologic heterogeneity adds complementary information to commonly used molecular indicators, with low heterogeneity predicting worse outcomes. Future work integrating multiple sources of heterogeneity may provide a more comprehensive understanding of tumor progression.

15.
Bioinformatics ; 28(8): 1182-3, 2012 Apr 15.
Article in English | MEDLINE | ID: mdl-22368246

ABSTRACT

UNLABELLED: R/DWD is an extensible package for classification. It is built based on a recently developed powerful classification method called distance weighted discrimination (DWD). DWD is related to, and has been shown to be superior to, the support vector machine in situations that are fundamental to bioinformatics, such as very high dimensional data. DWD has proven to be very useful for several fundamental bioinformatics tasks, including classification, data visualization and removal of biases, such as batch effects. Earlier DWD implementations, however, relied on Matlab, which is not free and requires a license. The major contribution of the R/DWD package is an implementation that is completely in R and thus can be used without any requirements for licensing or software purchase. In addition, R/DWD also provides efficient solvers for second-order-cone-programming and quadratic programming. AVAILABILITY AND IMPLEMENTATION: The package is freely available from cran.r-project.org.


Subjects
Computational Biology/methods; Oligonucleotide Array Sequence Analysis/methods; Software; Computer Simulation; Support Vector Machine
16.
Commun Biol ; 6(1): 179, 2023 02 16.
Article in English | MEDLINE | ID: mdl-36797360

ABSTRACT

Model systems are an essential resource in cancer research. They simulate effects that we can infer into humans, but come at a risk of inaccurately representing human biology. This inaccuracy can lead to inconclusive experiments or misleading results, urging the need for an improved process for translating model system findings into human-relevant data. We present a process for applying joint dimension reduction (jDR) to horizontally integrate gene expression data across model systems and human tumor cohorts. We then use this approach to combine human TCGA gene expression data with data from human cancer cell lines and mouse model tumors. By identifying the aspects of genomic variation joint-acting across cohorts, we demonstrate how predictive modeling and clinical biomarkers from model systems can be improved.


Subjects
Neoplasms; Transcriptome; Animals; Mice; Humans; Neoplasms/genetics; Neoplasms/pathology; Gene Expression Profiling; Biomarkers
17.
Front Genet ; 14: 1093326, 2023.
Article in English | MEDLINE | ID: mdl-37007972

ABSTRACT

Advanced genomic and molecular profiling technologies accelerated the enlightenment of the regulatory mechanisms behind cancer development and progression, and the targeted therapies in patients. Along this line, intense studies with immense amounts of biological information have boosted the discovery of molecular biomarkers. Cancer is one of the leading causes of death around the world in recent years. Elucidation of genomic and epigenetic factors in Breast Cancer (BRCA) can provide a roadmap to uncover the disease mechanisms. Accordingly, unraveling the possible systematic connections between -omics data types and their contribution to BRCA tumor progression is crucial. In this study, we have developed a novel machine learning (ML) based integrative approach for multi-omics data analysis. This integrative approach combines information from gene expression (mRNA), microRNA (miRNA) and methylation data. Due to the complexity of cancer, this integrated data is expected to improve the prediction, diagnosis and treatment of disease through patterns only available from the 3-way interactions between these three -omics datasets. In addition, the proposed method bridges the interpretation gap between the disease mechanisms that drive onset and progression. Our fundamental contribution is the 3 Multi-omics integrative tool (3Mint). This tool aims to perform grouping and scoring of groups using biological knowledge. Another major goal is improved gene selection via detection of novel groups of cross-omics biomarkers. Performance of 3Mint is assessed using different metrics. Our computational performance evaluations showed that 3Mint classifies the BRCA molecular subtypes with a lower number of genes than the miRcorrNet tool, which uses miRNA and mRNA gene expression profiles, at similar performance (95% accuracy). The incorporation of methylation data in 3Mint yields a much more focused analysis. 
The 3Mint tool and all other supplementary files are available at https://github.com/malikyousef/3Mint/.

18.
Res Sq ; 2023 Feb 06.
Article in English | MEDLINE | ID: mdl-36798423

ABSTRACT

Background: Modeling of single cell RNA-sequencing (scRNA-seq) data remains challenging due to a high percentage of zeros and data heterogeneity, so improved modeling has strong potential to benefit many downstream data analyses. The existing zero-inflated or over-dispersed models are based on aggregations at either the gene or the cell level. However, they typically lose accuracy due to a too crude aggregation at those two levels. Results: We avoid the crude approximations entailed by such aggregation through proposing an Independent Poisson Distribution (IPD) particularly at each individual entry in the scRNA-seq data matrix. This approach naturally and intuitively models the large number of zeros as matrix entries with a very small Poisson parameter. The critical challenge of cell clustering is approached via a novel data representation as Departures from a simple homogeneous IPD (DIPD) to capture the per-gene-per-cell intrinsic heterogeneity generated by cell clusters. Our experiments using real data and crafted experiments show that using DIPD as a data representation for scRNA-seq data can uncover novel cell subtypes that are missed or can only be found by careful parameter tuning using conventional methods. Conclusions: This new method has multiple advantages, including (1) no need for prior feature selection or manual optimization of hyperparameters; (2) flexibility to combine with and improve upon other methods, such as Seurat. Another novel contribution is the use of crafted experiments as part of the validation of our newly developed DIPD-based clustering pipeline. This new clustering pipeline is implemented in the R (CRAN) package scpoisson.

19.
Ann Appl Stat ; 17(4): 2924-2943, 2023 Dec.
Article in English | MEDLINE | ID: mdl-38046186

ABSTRACT

In The Cancer Genome Atlas (TCGA) data set, there are many interesting nonlinear dependencies between pairs of genes that reveal important relationships and subtypes of cancer. Such genomic data analysis requires a rapid, powerful and interpretable detection process, especially in a high-dimensional environment. We study the nonlinear patterns among the expression of pairs of genes from TCGA using a powerful tool called Binary Expansion Testing. We find many nonlinear patterns, some of which are driven by known cancer subtypes, some of which are novel.

20.
NPJ Breast Cancer ; 9(1): 92, 2023 Nov 11.
Article in English | MEDLINE | ID: mdl-37952058

ABSTRACT

Approaches for rapidly identifying patients at high risk of early breast cancer recurrence are needed. Image-based methods for prescreening hematoxylin and eosin (H&E) stained tumor slides could offer temporal and financial efficiency. We evaluated a data set of 704 1-mm tumor core H&E images (2-4 cores per case), corresponding to 202 participants (101 who recurred; 101 non-recurrent matched on age and follow-up time) from breast cancers diagnosed between 2008-2012 in the Carolina Breast Cancer Study. We leveraged deep learning to extract image information and trained a model to identify recurrence. Cross-validation accuracy for predicting recurrence was 62.4% [95% CI: 55.7, 69.1], similar to grade (65.8% [95% CI: 59.3, 72.3]) and ER status (66.3% [95% CI: 59.8, 72.8]). Interestingly, 70% (19/27) of early-recurrent low-intermediate grade tumors were identified by our image model. Relative to existing markers, image-based analyses provide complementary information for predicting early recurrence.
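
Editorial worked example for reading the intervals in the abstract above. The abstract does not say how its confidence intervals were computed; the sketch assumes a normal-approximation (Wald) interval for a proportion with n = 202 participants, and under that assumption the reported accuracy of 62.4% gives bounds that round to the published 55.7 and 69.1. The function name is hypothetical.

```python
import math

def wald_interval(p_hat, n, z=1.96):
    """Normal-approximation interval p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

# Assumption: accuracy 0.624 evaluated over 202 participants.
low, high = wald_interval(0.624, 202)
print(round(low, 3), round(high, 3))
```

The intervals for the image model, grade, and ER status overlap heavily, which is consistent with the abstract describing the three accuracies as similar.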
