Results 1 - 20 of 26
1.
J Comput Biol ; 30(4): 376-390, 2023 04.
Article in English | MEDLINE | ID: mdl-36445177

ABSTRACT

Testing and isolating infectious employees is one of the critical strategies many organizations use to keep the workplace safe during the pandemic. Adaptive testing frequency reduces cost while keeping the pandemic under control at the workplace. However, most models aimed at estimating test frequencies were structured for municipalities or large organizations, such as university campuses of highly mobile individuals. By contrast, the workplace exhibits distinct characteristics: the employee positivity rate may differ from that of the local community because of rigorous protective measures at the workplace or self-selection of co-workers with common behavioral tendencies for adherence to pandemic mitigation guidelines. Moreover, dual exposure to COVID-19 at work and at home complicates transmission modeling, as does transmission tracing at the workplace. Hence, we developed a bi-modal SEIR (Susceptible, Exposed, Infectious, and Removed) model and an R Shiny tool that account for these differentiating factors to adaptively estimate the testing frequency for a workplace. Our tool uses easily measurable parameters: community incidence rate, risks of acquiring infection from community and workplace, workforce size, and sensitivity of testing. Our model is best suited for moderate-sized organizations with low internal transmission rates, no outward-facing employees whose positions demand frequent in-person interactions with the public, and low to medium population positivity rates. Simulations revealed that employee adherence to protective measures at work and in the community, and the onsite workforce size, have large effects on testing frequency. Reducing the workplace transmission rate through workplace mitigation protocols and deploying a more sensitive test also affect testing frequency, although to a lesser extent. Furthermore, our simulations showed that sentinel testing leads to only a marginal increase in the number of infections even for high community incidence rates, suggesting that this may be a cost-effective approach in future pandemics. We used our model to accurately guide the testing regimen for three campuses of the Jackson Laboratory.
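
To make the role of the listed parameters concrete, here is a minimal sketch of a discrete-time SEIR simulation with two sources of exposure (community and workplace) and periodic testing. It is not the authors' model or their R Shiny tool: the function name, the update rule, and all default values (daily community risk, workplace transmission rate, incubation and infectious periods) are assumptions chosen only for illustration.

```python
def simulate_workplace_seir(
    workforce=200,
    community_incidence=0.001,  # assumed daily infection risk from the community
    workplace_beta=0.1,         # assumed workplace transmission rate per day
    incubation_days=3.0,        # assumed mean time spent in E
    infectious_days=7.0,        # assumed mean time spent in I
    test_interval_days=7,       # 0 disables testing
    test_sensitivity=0.9,
    days=90,
):
    """Return final compartment sizes and cumulative infections (deterministic)."""
    s, e, i, r = float(workforce), 0.0, 0.0, 0.0
    cumulative = 0.0
    for day in range(1, days + 1):
        # Daily infection hazard: community exposure plus workplace exposure.
        hazard = min(1.0, community_incidence + workplace_beta * i / workforce)
        new_exposed = hazard * s
        new_infectious = e / incubation_days
        new_removed = i / infectious_days
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_removed
        r += new_removed
        cumulative += new_exposed
        # On testing days, detected infectious employees are isolated (moved to R).
        if test_interval_days and day % test_interval_days == 0:
            detected = test_sensitivity * i
            i -= detected
            r += detected
    return {"S": s, "E": e, "I": i, "R": r, "cumulative_infections": cumulative}


if __name__ == "__main__":
    for interval in (0, 14, 7, 1):
        result = simulate_workplace_seir(test_interval_days=interval)
        print(interval, round(result["cumulative_infections"], 2))
```

Comparing runs with different `test_interval_days` or `test_sensitivity` values gives a rough feel for how testing frequency trades off against infections under these assumed parameters.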


Subjects
COVID-19, Humans, COVID-19/epidemiology, Pandemics/prevention & control, SARS-CoV-2, Workplace
2.
Cancer Res ; 82(22): 4126-4138, 2022 11 15.
Article in English | MEDLINE | ID: mdl-36069866

ABSTRACT

Patient-derived xenograft (PDX) models are an effective preclinical in vivo platform for testing the efficacy of novel drugs and drug combinations for cancer therapeutics. Here we describe a repository of 79 genomically and clinically annotated lung cancer PDXs available from The Jackson Laboratory that have been extensively characterized for histopathologic features, mutational profiles, gene expression, and copy-number aberrations. Most of the PDXs are models of non-small cell lung cancer (NSCLC), including 37 lung adenocarcinoma (LUAD) and 33 lung squamous cell carcinoma (LUSC) models. Other lung cancer models in the repository include four small cell carcinomas, two large cell neuroendocrine carcinomas, two adenosquamous carcinomas, and one pleomorphic carcinoma. Models with both de novo and acquired resistance to targeted therapies with tyrosine kinase inhibitors are available in the collection. The genomic profiles of the LUAD and LUSC PDX models are consistent with those observed in patient tumors from The Cancer Genome Atlas and previously characterized gene expression-based molecular subtypes. Clinically relevant mutations identified in the original patient tumors were confirmed in engrafted PDX tumors. Treatment studies performed in a subset of the models recapitulated the responses expected on the basis of the observed genomic profiles. These models therefore serve as a valuable preclinical platform for translational cancer research. SIGNIFICANCE: Patient-derived xenografts of lung cancer retain key features observed in the originating patient tumors and show expected responses to treatment with standard-of-care agents, providing experimentally tractable and reproducible models for preclinical investigations.


Subjects
Lung Adenocarcinoma, Non-Small-Cell Lung Carcinoma, Lung Neoplasms, Humans, Animals, Non-Small-Cell Lung Carcinoma/drug therapy, Non-Small-Cell Lung Carcinoma/genetics, Non-Small-Cell Lung Carcinoma/pathology, Lung Neoplasms/drug therapy, Lung Neoplasms/genetics, Lung Neoplasms/pathology, Heterografts, Xenograft Model Antitumor Assays, Lung Adenocarcinoma/drug therapy, Lung Adenocarcinoma/genetics, Animal Disease Models
3.
Alzheimers Dement (Amst) ; 13(1): e12140, 2021.
Article in English | MEDLINE | ID: mdl-34027015

ABSTRACT

INTRODUCTION: Genome-wide association studies (GWAS) for late onset Alzheimer's disease (AD) may miss genetic variants relevant for delineating disease stages when using clinically defined case/control as a phenotype due to its loose definition and heterogeneity. METHODS: We use a transfer learning technique to train three-dimensional convolutional neural network (CNN) models based on structural magnetic resonance imaging (MRI) from the screening stage in the Alzheimer's Disease Neuroimaging Initiative consortium to derive image features that reflect AD progression. RESULTS: CNN-derived image phenotypes are significantly associated with fasting metabolites related to early lipid metabolic changes as well as insulin resistance and with genetic variants mapped to candidate genes enriched for amyloid beta degradation, tau phosphorylation, calcium ion binding-dependent synaptic loss, APP-regulated inflammation response, and insulin resistance. DISCUSSION: This is the first attempt to show that non-invasive MRI biomarkers are linked to AD progression characteristics, reinforcing their use in early AD diagnosis and monitoring.

4.
EBioMedicine ; 61: 103030, 2020 Nov.
Article in English | MEDLINE | ID: mdl-33039710

ABSTRACT

BACKGROUND: Cancer of unknown primary (CUP), representing approximately 3-5% of all malignancies, is defined as metastatic cancer where a primary site of origin cannot be found despite a standard diagnostic workup. Because knowledge of a patient's primary cancer remains fundamental to their treatment, CUP patients are significantly disadvantaged and most have a poor survival outcome. Developing robust and accessible diagnostic methods for resolving cancer tissue of origin, therefore, has significant value for CUP patients. METHODS: We developed an RNA-based classifier called CUP-AI-Dx that utilizes a 1D Inception convolutional neural network (1D-Inception) model to infer a tumour's primary tissue of origin. CUP-AI-Dx was trained using the transcriptional profiles of 18,217 primary tumours representing 32 cancer types from The Cancer Genome Atlas project (TCGA) and International Cancer Genome Consortium (ICGC). Gene expression data was ordered by gene chromosomal coordinates as input to the 1D-CNN model, and the model utilizes multiple convolutional kernels with different configurations simultaneously to improve generality. The model was optimized through extensive hyperparameter tuning, including different max-pooling layers and dropout settings. For 11 tumour types, we also developed a random forest model that can classify the tumour's molecular subtype according to prior TCGA studies. The optimised CUP-AI-Dx tissue of origin classifier was tested on 394 metastatic samples from 11 tumour types from TCGA and 92 formalin-fixed paraffin-embedded (FFPE) samples representing 18 cancer types from two clinical laboratories. The CUP-AI-Dx molecular subtype classifier was also tested on independent ovarian and breast cancer microarray datasets. FINDINGS: CUP-AI-Dx identifies the primary site with an overall top-1-accuracy of 98.54% in cross-validation and 96.70% on a test dataset. When applied to two independent clinical-grade RNA-seq datasets generated from two different institutes from the US and Australia, our model predicted the primary site with a top-1-accuracy of 86.96% and 72.46%, respectively. INTERPRETATION: The CUP-AI-Dx predicts tumour primary site and molecular subtype with high accuracy and therefore can be used to assist the diagnostic work-up of cancers of unknown primary or uncertain origin using a common and accessible genomics platform. FUNDING: NIH R35 GM133562, NCI P30 CA034196, Victorian Cancer Agency Australia.
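
Two steps named in the METHODS and FINDINGS are easy to illustrate: ordering expression values by chromosomal coordinate before they are fed to a 1D convolutional model, and scoring predictions by top-1 accuracy. The sketch below is a minimal standard-library illustration under assumptions: the gene identifiers, coordinates, class scores, and helper names (`chromosome_rank`, `order_genes_by_position`, `reorder_profile`, `top_k_accuracy`) are hypothetical and are not taken from CUP-AI-Dx.

```python
def chromosome_rank(chrom):
    """Map a chromosome label such as 'chr1' or 'X' to a sortable integer."""
    name = chrom[3:] if chrom.lower().startswith("chr") else chrom
    if name.isdigit():
        return int(name)
    # Sex and mitochondrial chromosomes are placed after the autosomes.
    return {"X": 23, "Y": 24, "M": 25, "MT": 25}[name.upper()]


def order_genes_by_position(gene_positions):
    """gene_positions maps gene -> (chromosome, start); returns genes in genome order."""
    return sorted(
        gene_positions,
        key=lambda gene: (chromosome_rank(gene_positions[gene][0]), gene_positions[gene][1]),
    )


def reorder_profile(profile, gene_order):
    """Turn a gene -> expression mapping into a vector that follows gene_order."""
    return [profile[gene] for gene in gene_order]


def top_k_accuracy(class_scores, true_labels, k=1):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for scores, label in zip(class_scores, true_labels):
        best = sorted(scores, key=scores.get, reverse=True)[:k]
        hits += label in best
    return hits / len(true_labels)


if __name__ == "__main__":
    # Hypothetical genes and coordinates, for illustration only.
    positions = {"g1": ("chr2", 500), "g2": ("chr1", 900), "g3": ("chr1", 100)}
    order = order_genes_by_position(positions)
    print(order, reorder_profile({"g1": 1.0, "g2": 2.0, "g3": 3.0}, order))
```

Ordering by genomic position gives neighbouring input positions a shared meaning (adjacent genes), which is what lets a convolution kernel slide over the vector in a consistent way.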


Subjects
Artificial Intelligence, Tumor Biomarkers/genetics, Computational Biology/methods, Neoplasms of Unknown Primary/diagnosis, Neoplasms of Unknown Primary/genetics, RNA, Software, Algorithms, Computational Biology/standards, Genetic Databases, Genomics/methods, Humans, Machine Learning, Neoplasm Metastasis/diagnosis, Neoplasm Metastasis/genetics, Neural Networks (Computer), Reproducibility of Results, Workflow
5.
J Biomol Tech ; 31(2): 66-73, 2020 07.
Article in English | MEDLINE | ID: mdl-32382253

ABSTRACT

Over the last decade, the cost of -omics data creation has decreased 10-fold, whereas the need for analytical support for those data has increased exponentially. Consequently, bioinformaticians face a second wave of challenges: novel applications of existing approaches (e.g., single-cell RNA sequencing), integration of -omics data sets of differing size and scale (e.g., spatial transcriptomics), as well as novel computational and statistical methods, all of which require more sophisticated pipelines and data management. Nonetheless, bioinformatics cores are often asked to operate under primarily a cost-recovery model, with limited institutional support. Seeing the need to assess bioinformatics core operations, the Association of Biomolecular Resource Facilities Genomics Bioinformatics Research Group conducted a survey on staffing, services, financial models, and challenges to better understand the issues that bioinformatics core facilities currently face and will need to address going forward. Of the respondent groups, we chose to focus on the survey data from smaller cores, which made up the majority. Although all cores indicated similar challenges in terms of changing technologies and analysis needs, small cores tended to have the added challenge of funding their operations largely through cost-recovery models with heavy administrative burdens.


Subjects
Biomedical Research/standards, Computational Biology/standards, Genomics/standards, Humans, Single-Cell Analysis/standards
6.
BMC Med Genomics ; 12(1): 92, 2019 07 01.
Article in English | MEDLINE | ID: mdl-31262303

ABSTRACT

BACKGROUND: Patient-derived xenograft (PDX) models are in vivo models of human cancer that have been used for translational cancer research and therapy selection for individual patients. The Jackson Laboratory (JAX) PDX resource comprises 455 models originating from 34 different primary sites (as of 05/08/2019). The models undergo rigorous quality control and are genomically characterized to identify somatic mutations, copy number alterations, and transcriptional profiles. Bioinformatics workflows for analyzing genomic data obtained from human tumors engrafted in a mouse host (i.e., Patient-Derived Xenografts; PDXs) must address challenges such as discriminating between mouse and human sequence reads and accurately identifying somatic mutations and copy number alterations when paired non-tumor DNA from the patient is not available for comparison. RESULTS: We report here data analysis workflows and guidelines that address these challenges and achieve reliable identification of somatic mutations, copy number alterations, and transcriptomic profiles of tumors from PDX models that lack genomic data from paired non-tumor tissue for comparison. Our workflows incorporate commonly used software and public databases but are tailored to address the specific challenges of PDX genomics data analysis through parameter tuning and customized data filters and result in improved accuracy for the detection of somatic alterations in PDX models. We also report a gene expression-based classifier that can identify EBV-transformed tumors. We validated our analytical approaches using data simulations and demonstrated the overall concordance of the genomic properties of xenograft tumors with data from primary human tumors in The Cancer Genome Atlas (TCGA). CONCLUSIONS: The analysis workflows that we have developed to accurately predict somatic profiles of tumors from PDX models that lack normal tissue for comparison enable the identification of the key oncogenic genomic and expression signatures to support model selection and/or biomarker development in therapeutic studies. A reference implementation of our analysis recommendations is available at https://github.com/TheJacksonLaboratory/PDX-Analysis-Workflows .


Subjects
Neoplastic Cell Transformation, Genomics/methods, Neoplasms/genetics, Neoplasms/pathology, Workflow, Animals, DNA Copy Number Variations, Gene Expression Profiling, Humans, Lymphoma/genetics, Lymphoma/pathology, Mice, Point Mutation, Single Nucleotide Polymorphism
7.
BMC Bioinformatics ; 20(Suppl 11): 275, 2019 Jun 06.
Article in English | MEDLINE | ID: mdl-31167661

ABSTRACT

BACKGROUND: The advent of single cell RNA sequencing (scRNA-seq) enabled researchers to study transcriptomic activity within individual cells and identify inherent cell types in the sample. Although numerous computational tools have been developed to analyze single cell transcriptomes, there are no published studies or analytical packages available to guide experimental design and to devise a suitable analysis procedure for cell type identification. RESULTS: We have developed an empirical methodology to address this important gap in single cell experimental design and analysis and implemented it in an easy-to-use tool called SCEED (Single Cell Empirical Experimental Design and analysis). With SCEED, users can choose among a variety of combinations of tools for analysis, conduct performance analysis of analytical procedures and choose the best procedure, and estimate the sample size (number of cells to be profiled) required for a given analytical procedure at varying levels of cell type rarity and other experimental parameters. Using SCEED, we examined 3 single cell algorithms using 48 simulated single cell datasets that were generated for varying numbers of cell types and their proportions, number of genes expressed per cell, number of marker genes and their fold change, and number of single cells successfully profiled in the experiment. CONCLUSIONS: Based on our study, we found that when marker genes are expressed at a fold change of 4 or more, either the Seurat or the SIMLR algorithm can be used to analyze a single cell dataset for any number of single cells isolated (a minimum of 1000 single cells was tested). However, when marker genes are expected to be expressed at a fold change of only up to 2, the choice of the single cell algorithm depends on the number of single cells isolated and the rarity of the cell types to be identified. In conclusion, our work allows the assessment of various single cell methods and also aids in the design of single cell experiments.
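
The link between cell type rarity and the number of cells to profile can be illustrated with a back-of-the-envelope binomial calculation. This is not SCEED's empirical, simulation-based procedure: it only asks how many cells must be profiled so that, with a given confidence, at least a minimum number of cells of a rare type are captured. The function names and the example values are assumptions.

```python
from math import comb


def prob_at_least(n_cells, proportion, min_cells):
    """P(X >= min_cells) for X ~ Binomial(n_cells, proportion)."""
    below = sum(
        comb(n_cells, j) * proportion**j * (1.0 - proportion) ** (n_cells - j)
        for j in range(min_cells)
    )
    return 1.0 - below


def cells_needed(proportion, min_cells, confidence=0.95, max_cells=1_000_000):
    """Smallest number of profiled cells giving the requested capture probability."""
    if not 0.0 < proportion <= 1.0:
        raise ValueError("proportion must be in (0, 1]")
    n = min_cells
    while n <= max_cells:
        if prob_at_least(n, proportion, min_cells) >= confidence:
            return n
        n += 1
    raise ValueError("no sample size up to max_cells reaches the confidence level")


if __name__ == "__main__":
    # Hypothetical scenario: a cell type at 1% abundance, at least 10 cells wanted.
    print(cells_needed(proportion=0.01, min_cells=10))
```

Capturing enough cells is only a necessary condition; whether a given algorithm then separates the rare type also depends on marker fold change and the other parameters the abstract lists, which is why an empirical assessment is needed.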


Subjects
Computational Biology/methods, Research Design, RNA Sequence Analysis/methods, Single-Cell Analysis/methods, Algorithms, Computer Simulation, Genetic Databases, Gene Expression Profiling/methods, Humans, Sample Size
8.
Sci Rep ; 8(1): 17937, 2018 12 18.
Article in English | MEDLINE | ID: mdl-30560892

ABSTRACT

The processes by which tumors evolve are essential to the efficacy of treatment, but quantitative understanding of intratumoral dynamics has been limited. Although intratumoral heterogeneity is common, quantification of evolution is difficult from clinical samples because treatment replicates cannot be performed and because matched serial samples are infrequently available. To circumvent these problems we derived and assayed large sets of human triple-negative breast cancer xenografts and cell cultures from two patients, including 86 xenografts from cyclophosphamide, doxorubicin, cisplatin, docetaxel, or vehicle treatment cohorts as well as 45 related cell cultures. We assayed these samples via exome-seq and/or high-resolution droplet digital PCR, allowing us to distinguish complex therapy-induced selection and drift processes among endogenous cancer subclones with cellularity uncertainty <3%. For one patient, we discovered two predominant subclones that were granularly intermixed in all 48 co-derived xenograft samples. These two subclones exhibited differential chemotherapy sensitivity: when xenografts were treated with cisplatin for 3 weeks, the post-treatment volume change was proportional to the post-treatment ratio of subclones on a xenograft-to-xenograft basis. A subsequent cohort in which xenografts were treated with cisplatin, allowed a drug holiday, then treated a second time continued to exhibit this proportionality. In contrast, xenografts from other treatment cohorts, spatially dissected xenograft fragments, and cell cultures evolved in diverse ways but with substantial population bottlenecks. These results show that ecosystems susceptible to successive retreatment can arise spontaneously in breast cancer in spite of a background of irregular subclonal bottlenecks, and our work provides, to our knowledge, the first quantification of the population genetics of such a system. Intriguingly, in such an ecosystem the ratio of common subclones is predictive of the state of treatment susceptibility, showing how measurements of subclonal heterogeneity could guide treatment for some patients.


Subjects
Antineoplastic Agents/pharmacology, Breast Neoplasms/drug therapy, Alleles, Animals, Antineoplastic Agents/therapeutic use, Tumor Biomarkers, Breast Neoplasms/genetics, Breast Neoplasms/pathology, Tumor Cell Line, Clonal Evolution/drug effects, Clonal Evolution/genetics, DNA Copy Number Variations/drug effects, Animal Disease Models, Female, Gene Frequency, Humans, Mice, Mutation, Triple Negative Breast Neoplasms/drug therapy, Triple Negative Breast Neoplasms/genetics, Triple Negative Breast Neoplasms/pathology, Xenograft Model Antitumor Assays
9.
BMC Bioinformatics ; 18(Suppl 16): 576, 2017 12 28.
Article in English | MEDLINE | ID: mdl-29297310

ABSTRACT

BACKGROUND: Differential co-expression (DCX) signifies a change in the degree of co-expression of a set of genes among different biological conditions. It has been used to identify differential co-expression networks or interactomes. Many algorithms have been developed for single-factor differential co-expression analysis and applied in a variety of studies. However, in many studies, the samples are characterized by multiple factors such as genetic markers, clinical variables and treatments. No algorithm or methodology is available for multi-factor analysis of differential co-expression. RESULTS: We developed a novel formulation and a computationally efficient greedy search algorithm called MultiDCoX to perform multi-factor differential co-expression analysis. Simulated data analysis demonstrates that the algorithm can effectively elicit differentially co-expressed (DCX) gene sets and quantify the influence of each factor on co-expression. MultiDCoX analysis of a breast cancer dataset identified biologically meaningful DCX gene sets along with genetic and clinical factors that influenced the respective differential co-expression. CONCLUSIONS: MultiDCoX is a space- and time-efficient procedure that identifies differentially co-expressed gene sets and successfully identifies the influence of individual factors on differential co-expression.


Subjects
Algorithms, Factor Analysis (Statistical), Gene Expression Regulation, Gene Regulatory Networks, Breast Neoplasms/genetics, Chemokine CXCL13/genetics, Computer Simulation, Female, Gene Expression Profiling, Neoplastic Gene Expression Regulation, Humans, Matrix Metalloproteinase 1/genetics, Mutation/genetics, Estrogen Receptors/metabolism, Survival Analysis, Tumor Suppressor Protein p53/genetics
10.
Cancer Inform ; 15: 103-14, 2016.
Article in English | MEDLINE | ID: mdl-27330269

ABSTRACT

Clustering is carried out to identify patterns in transcriptomics profiles to determine clinically relevant subgroups of patients. Feature (gene) selection is a critical and integral part of the process. Currently, there are many feature selection and clustering methods to identify the relevant genes and perform clustering of samples. However, choosing an appropriate methodology is difficult. In addition, the available packages do not support an extensive range of feature selection methods. Hence, we developed an integrative R package called multiClust that allows researchers to experiment with the choice of combination of methods for gene selection and clustering with ease. Using multiClust, we identified the best performing clustering methodology in the context of clinical outcome. Our observations demonstrate that simple methods such as variance-based ranking perform well on the majority of data sets, provided that the appropriate number of genes is selected. However, different gene ranking and selection methods remain relevant as no methodology works for all studies.
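
Variance-based ranking, the simple method the abstract singles out, can be sketched in a few lines: rank genes by the variance of their expression across samples and keep the top ones. The sketch below is in Python and is not the multiClust R API; the function names and the toy expression values are assumptions.

```python
from statistics import variance


def rank_genes_by_variance(expression):
    """expression maps gene -> expression values across samples.

    Returns gene names sorted from most to least variable (sample variance).
    """
    return sorted(expression, key=lambda gene: variance(expression[gene]), reverse=True)


def select_top_genes(expression, n_genes):
    """Keep the n_genes most variable genes as features for clustering."""
    return rank_genes_by_variance(expression)[:n_genes]


if __name__ == "__main__":
    # Toy data: three hypothetical genes measured in four samples.
    toy = {
        "geneA": [5.0, 5.1, 4.9, 5.0],
        "geneB": [1.0, 9.0, 2.0, 8.0],
        "geneC": [3.0, 4.0, 3.5, 4.5],
    }
    print(select_top_genes(toy, n_genes=2))
```

The number of genes kept (`n_genes` here) is the tuning choice the abstract warns about: the ranking is cheap, but its usefulness depends on selecting an appropriate cutoff.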

11.
Genome Res ; 24(10): 1559-71, 2014 Oct.
Article in English | MEDLINE | ID: mdl-25186909

ABSTRACT

Chromosomal structural variations play an important role in determining the transcriptional landscape of human breast cancers. To assess the nature of these structural variations, we analyzed eight breast tumor samples with a focus on regions of gene amplification using mate-pair sequencing of long-insert genomic DNA with matched transcriptome profiling. We found that tandem duplications appear to be early events in tumor evolution, especially in the genesis of amplicons. In a detailed reconstruction of events on chromosome 17, we found that large unpaired inversions and deletions connect a tandemly duplicated ERBB2 with neighboring 17q21.3 amplicons while simultaneously deleting the intervening BRCA1 tumor suppressor locus. This series of events appeared to be unusually common when examined in larger genomic data sets of breast cancers, albeit using approaches with lesser resolution. Using siRNAs in breast cancer cell lines, we showed that the 17q21.3 amplicon harbored a significant number of weak oncogenes that appeared consistently coamplified in primary tumors. Down-regulation of BRCA1 expression augmented cell proliferation in ERBB2-transfected human normal mammary epithelial cells. Coamplification of other functionally tested oncogenic elements in other breast tumors examined, such as RIPK2 and MYC on chromosome 8, also parallels these findings. Our analyses suggest that structural variations efficiently orchestrate the gain and loss of cancer gene cassettes that engage many oncogenic pathways simultaneously and that such oncogenic cassettes are favored during the evolution of a cancer.


Subjects
BRCA1 Protein/genetics, Breast Neoplasms/genetics, Chromosome Aberrations, Human Chromosome Pair 17/genetics, ErbB-2 Receptor/genetics, Base Sequence, Tumor Cell Line, Female, Gene Amplification, Gene Duplication, Gene Expression Profiling, Neoplastic Gene Expression Regulation, Humans, MCF-7 Cells, Molecular Sequence Data, DNA Sequence Analysis
12.
Cancer Inform ; 13(Suppl 6): 35-48, 2014.
Article in English | MEDLINE | ID: mdl-25949096

ABSTRACT

Driver genes are directly responsible for oncogenesis and identifying them is essential in order to fully understand the mechanisms of cancer. However, it is difficult to delineate them from the larger pool of genes that are deregulated in cancer (ie, passenger genes). In order to address this problem, we developed an approach called TRIAngulating Gene Expression (TRIAGE through clinico-genomic intersects). Here, we present a refinement of this approach incorporating a new scoring methodology to identify putative driver genes that are deregulated in cancer. TRIAGE triangulates - or integrates - three levels of information: gene expression, gene location, and patient survival. First, TRIAGE identifies regions of deregulated expression (ie, expression footprints) by deriving a newly established measure called the Local Singular Value Decomposition (LSVD) score for each locus. Driver genes are then distinguished from passenger genes using dual survival analyses. Incorporating measurements of gene expression and weighting them according to the LSVD weight of each tumor, these analyses are performed using the genes located in significant expression footprints. Here, we first use simulated data to characterize the newly established LSVD score. We then present the results of our application of this refined version of TRIAGE to gene expression data from five cancer types. This refined version of TRIAGE not only allowed us to identify known prominent driver genes, such as MMP1, IL8, and COL1A2, but it also led us to identify several novel ones. These results illustrate that TRIAGE complements existing tools, allows for the identification of genes that drive cancer and could perhaps elucidate potential future targets of novel anticancer therapeutics.

13.
PLoS One ; 8(1): e53562, 2013.
Article in English | MEDLINE | ID: mdl-23349717

ABSTRACT

The liver is one of the most sex-dimorphic organs in both oviparous and viviparous animals. In order to understand the molecular basis of the difference between male and female livers, high-throughput RNA-SAGE (serial analysis of gene expression) sequencing was performed for zebrafish livers of both sexes and their transcriptomes were compared. Both sexes had abundantly expressed genes involved in translation, coagulation and lipid metabolism, consistent with the general function of the liver. For sex-biased transcripts, in addition to the high enrichment of vitellogenin transcripts in spawning female livers, which constituted nearly 80% of total mRNA, it is apparent that the female-biased genes were mostly involved in ribosome/translation, estrogen pathway, lipid transport, etc., while the male-biased genes were enriched for oxidation reduction, carbohydrate metabolism, coagulation, protein transport and localization, etc. Sexual dimorphism in xenobiotic metabolism and anti-oxidation was also noted, and it is likely that retinoid X receptor (RXR) and liver X receptor (LXR) play central roles in regulating the sexual differences in lipid and cholesterol metabolism. Consistent with high ribosomal/translational activities in the female liver, female-biased genes were significantly regulated by two important transcription factors, Myc and Mycn. In contrast, male livers showed activation of transcription factors Ppargc1b, Hnf4a, and Stat4, which regulate lipid and glucose metabolism and various cellular activities. The transcriptomic responses to sex hormones, 17β-estradiol (E2) or 11-keto testosterone (KT11), were also investigated in both male and female livers, and we found that female livers were relatively insensitive to sex hormone disturbance, while the male livers were readily affected. E2 feminized the male liver by up-regulating female-biased transcripts and down-regulating male-biased transcripts. The information obtained in this study provides comprehensive insights into the sexual dimorphism of the zebrafish liver transcriptome and will facilitate further development of the zebrafish as a human liver disease model.


Subjects
Gene Expression Profiling, Gonadal Steroid Hormones/pharmacology, Liver/drug effects, Liver/metabolism, Sex Characteristics, Zebrafish/genetics, Animals, Female, Gene Regulatory Networks/drug effects, Humans, Male, Messenger RNA/genetics, Messenger RNA/metabolism, RNA Sequence Analysis, Xenobiotics/metabolism, Zebrafish/metabolism
14.
BMC Res Notes ; 5: 232, 2012 May 14.
Article in English | MEDLINE | ID: mdl-22583621

ABSTRACT

BACKGROUND: In the field of mouse genetics the advent of technologies like microarray based expression profiling dramatically increased data availability and sensitivity, yet these advanced methods are often vulnerable to the unavoidable heterogeneity of in vivo material and might therefore reflect differentially expressed genes between mouse strains of no relevance to a targeted experiment. The aim of this study was not to elaborate on the usefulness of microarray analysis in general, but to expand our knowledge regarding this potential "background noise" for the widely used Illumina microarray platform surpassing existing data which focused primarily on the adult sensory and nervous system, by analyzing patterns of gene expression at different embryonic stages using wild type strains and modern transgenic models of often non-isogenic backgrounds. RESULTS: Wild type embryos of 11 mouse strains commonly used in transgenic and molecular genetic studies at three developmental time points were subjected to Illumina microarray expression profiling in a strain-by-strain comparison. Our data robustly reflects known gene expression patterns during mid-gestation development. Decreasing diversity of the input tissue and/or increasing strain diversity raised the sensitivity of the array towards the genetic background. Consistent strain sensitivity of some probes was attributed to genetic polymorphisms or probe design related artifacts. CONCLUSION: Our study provides an extensive reference list of gene expression profiling background noise of value to anyone in the field of developmental biology and transgenic research performing microarray expression profiling with the widely used Illumina microarray platform. Probes identified as strain specific background noise further allow for microarray expression profiling on its own to be a valuable tool for establishing genealogies of mouse inbred strains.


Subjects
Mammalian Embryo/metabolism, Gene Expression Profiling, Oligonucleotide Array Sequence Analysis, Animals, Base Sequence, Mice, Transgenic Mice, Species Specificity
15.
Biol Direct ; 6: 27, 2011 May 20.
Article in English | MEDLINE | ID: mdl-21595983

ABSTRACT

BACKGROUND: False discovery rate (FDR) control is commonly accepted as the most appropriate error control in multiple hypothesis testing problems. The accuracy of FDR estimation depends on the accuracy of the estimation of p-values from each test and the validity of the underlying assumptions of the distribution. However, in many practical testing problems such as in genomics, the p-values could be under-estimated or over-estimated for many known or unknown reasons. Consequently, FDR estimation would then be influenced and lose its veracity. RESULTS: We propose a new extrapolative method called Constrained Regression Recalibration (ConReg-R) to recalibrate the empirical p-values by modeling their distribution to improve the FDR estimates. Our ConReg-R method is based on the observation that accurately estimated p-values from true null hypotheses follow a uniform distribution and that the observed distribution of p-values is a mixture of the distributions of p-values from true null hypotheses and true alternative hypotheses. Hence, ConReg-R recalibrates the observed p-values so that they exhibit the properties of an ideal empirical p-value distribution. The proportion of true null hypotheses (π0) and FDR are estimated after the recalibration. CONCLUSIONS: ConReg-R provides an efficient way to improve the FDR estimates. It only requires the p-values from the tests and avoids permutation of the original test data. We demonstrate that the proposed method significantly improves FDR estimation on several gene expression datasets obtained from microarray and RNA-seq experiments.
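
The two quantities estimated after recalibration, π0 and the FDR, can be illustrated with standard textbook estimators. The sketch below is not ConReg-R and performs no recalibration: it uses a Storey-style estimate of π0 (which relies on null p-values being uniform, so p-values above a threshold `lam` are mostly nulls) and Benjamini-Hochberg adjusted p-values. The function names and the threshold `lam=0.5` are assumptions.

```python
def estimate_pi0(p_values, lam=0.5):
    """Storey-style estimate of the proportion of true null hypotheses.

    Under a uniform null, a fraction (1 - lam) of null p-values exceeds lam,
    so the count above lam, rescaled by m * (1 - lam), estimates pi0.
    """
    m = len(p_values)
    above = sum(1 for p in p_values if p > lam)
    return min(1.0, above / (m * (1.0 - lam)))


def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda idx: p_values[idx])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical p-values for illustration only.
    p = [0.001, 0.008, 0.02, 0.04, 0.3, 0.6, 0.75, 0.9]
    print(estimate_pi0(p))
    print([round(q, 3) for q in bh_adjust(p)])
```

Both estimators take the p-values at face value, which is exactly why under- or over-estimated p-values distort the resulting FDR and why a recalibration step can help.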


Subjects
Data Interpretation, Statistical , Regression Analysis , Algorithms , Gene Expression Profiling , Gene Expression Regulation, Fungal , Humans , Models, Statistical , Reproducibility of Results , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/physiology , Sample Size , Sequence Analysis, RNA
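
The abstract of record 15 rests on one observation: p-values from true null hypotheses are uniform on [0, 1], so the right-hand part of the p-value histogram is dominated by nulls. The sketch below is not ConReg-R, whose recalibration step the abstract does not specify. It only illustrates how π0 and FDR-adjusted values can be derived from that observation, using a standard Storey-type π0 estimate and the Benjamini-Hochberg step-up adjustment. The function names `estimate_pi0` and `bh_qvalues` and the threshold `lam=0.5` are assumptions chosen for illustration.

```python
def estimate_pi0(pvalues, lam=0.5):
    """Storey-type estimate of the proportion of true null hypotheses.

    Under the uniform-null assumption, a fraction (1 - lam) of the null
    p-values is expected to fall above lam, so the count above lam is
    rescaled by (1 - lam) * n. The estimate is capped at 1.
    """
    n = len(pvalues)
    above = sum(1 for p in pvalues if p > lam)
    return min(1.0, above / ((1.0 - lam) * n))


def bh_qvalues(pvalues, pi0=1.0):
    """Benjamini-Hochberg adjusted p-values, optionally scaled by pi0.

    Returns the adjusted values in the same order as the input.
    """
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    q = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so that monotonicity is enforced.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * pvalues[i] * n / rank)
        q[i] = running_min
    return q


# Hypothetical p-values, for illustration only.
pvals = [0.001, 0.002, 0.003, 0.004, 0.005, 0.006, 0.6, 0.9]
pi0 = estimate_pi0(pvals)
qvals = bh_qvalues(pvals, pi0=pi0)
```

If the input p-values are systematically too small or too large, both estimates inherit that bias, which is the problem the abstract says recalibration is meant to address.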
16.
Algorithms Mol Biol ; 5: 23, 2010 May 28.
Article in English | MEDLINE | ID: mdl-20507637

ABSTRACT

BACKGROUND: Biclustering is an important analysis procedure for understanding biological mechanisms from microarray gene expression data. Several algorithms have been proposed to identify biclusters, but very little effort has been made to compare the performance of different algorithms on real datasets and to combine the resulting biclusters into one unified ranking. RESULTS: In this paper we propose a differential co-expression framework and a differential co-expression scoring function to objectively quantify the quality of a bicluster of genes, based on the observation that genes in a bicluster are co-expressed in the conditions belonging to the bicluster and not co-expressed in the other conditions. Furthermore, we propose a scoring function to stratify biclusters into three types of co-expression. We used the proposed scoring functions to study the performance and behavior of four well-established biclustering algorithms on six real datasets from different domains by combining their output into one unified ranking. CONCLUSIONS: The differential co-expression framework provides a quantitative and objective assessment of the quality of biclusters of co-expressed genes and of the performance of biclustering algorithms in identifying co-expression biclusters. It also helps to combine the biclusters output by different algorithms into one unified ranking, i.e. meta-biclustering.
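
The abstract of record 16 does not give the scoring function itself. As a rough illustration of the idea it describes (genes co-expressed inside the bicluster's conditions and not outside them), the sketch below uses a stand-in score: the mean absolute Pearson correlation over gene pairs within the bicluster conditions minus the same quantity over the remaining conditions. The score definition, the function names, and the toy matrix are all assumptions, not the authors' method.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def mean_abs_corr(rows, cols):
    """Mean |correlation| over all gene pairs, restricted to columns `cols`."""
    pairs = list(combinations(rows, 2))
    total = sum(
        abs(pearson([g1[c] for c in cols], [g2[c] for c in cols]))
        for g1, g2 in pairs
    )
    return total / len(pairs)


def diff_coexpression_score(expr, genes, conditions):
    """Illustrative score: co-expression inside the bicluster minus outside.

    expr is a list of per-gene expression rows; genes and conditions are
    the row and column indices that make up the bicluster.
    """
    rows = [expr[g] for g in genes]
    inside = sorted(conditions)
    chosen = set(conditions)
    outside = [c for c in range(len(rows[0])) if c not in chosen]
    return mean_abs_corr(rows, inside) - mean_abs_corr(rows, outside)


# Toy expression matrix (2 genes x 6 conditions), for illustration only.
expr = [
    [1.0, 2.0, 3.0, 1.0, 2.0, 3.0],
    [2.0, 4.0, 6.0, 3.0, 1.0, 2.0],
]
score = diff_coexpression_score(expr, genes=[0, 1], conditions=[0, 1, 2])
```

Under this stand-in definition, a score near 1 means the genes are tightly co-expressed only within the bicluster, and a score near 0 means the bicluster conditions are no different from the rest.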

17.
BMC Bioinformatics ; 11: 247, 2010 May 13.
Article in English | MEDLINE | ID: mdl-20462459

ABSTRACT

BACKGROUND: DNA replication is a fundamental biological process during the S phase of cell division. It is initiated from several hundred origins along the whole chromosome, each with a different firing efficiency (or frequency of usage). Direct measurement of origin firing efficiency by techniques such as DNA combing is time-consuming and cannot measure all origins. Recent genome-wide studies of DNA replication approximated origin firing efficiency by indirectly measuring other quantities related to replication. However, these approximation methods do not reflect the properties of origin firing and may lead to inappropriate estimates. RESULTS: In this paper, we develop a probabilistic model, the Spanned Firing Time Model (SFTM), to characterize the DNA replication process. The proposed model reflects the current understanding of DNA replication: origins in an individual cell may initiate replication randomly within a time window, but the population average exhibits a temporal program in which some origins replicate early and others late. By estimating DNA origin firing time and fork moving velocity from genome-wide time-course S-phase copy number variation data, we could estimate the firing efficiency of all origins. The estimated firing efficiencies correlate well with previous studies in fission and budding yeasts. CONCLUSIONS: The new probabilistic model enables sensitive identification of origins as well as genome-wide estimation of origin firing efficiency. We have successfully estimated the firing efficiencies of all origins in S. cerevisiae, S. pombe and human chromosomes 21 and 22.


Subjects
DNA Copy Number Variations/genetics , DNA Replication/genetics , Genome , Genomics/methods , Models, Statistical , Genome, Fungal , Genome, Human , Humans , S Phase , Saccharomyces cerevisiae/genetics , Schizosaccharomyces/genetics
18.
Int J Data Min Bioinform ; 4(6): 617-38, 2010.
Article in English | MEDLINE | ID: mdl-21355498

ABSTRACT

Kostka and Spang proposed a statistic (KS-statistic) and an algorithm (KS algorithm) to elicit Differentially Co-expressed Gene Sets (DCEGSs) by minimising the KS-statistic. We prove that the null distributions of the KS-statistic in the variance un-normalised and variance-normalised data settings are central and doubly non-central F-distributions, respectively. Based on this analysis, we propose two alternative but equivalent statistics whose null distributions are easier to evaluate. Further, we propose to improve the algorithm by setting the search parameters objectively, via maximising the statistical significance of the resultant gene set, and by pre-filtering the genes with the Friendly Neighbours (FNs) algorithm.


Subjects
Algorithms , Disease/genetics , Gene Expression Profiling/methods , Oligonucleotide Array Sequence Analysis/methods , Humans
19.
Transgenic Res ; 19(2): 299-304, 2010 Apr.
Article in English | MEDLINE | ID: mdl-19662507

ABSTRACT

A tissue-specific transgenic model was employed to test the effects of intron and vector sequences on transgene expression in zebrafish after microinjection. In this model, the 2.3 kb promoter taken from the region 5' upstream of the transcription initiation site of keratin 4 (krt4) was used to drive the enhanced green fluorescent protein (EGFP) reporter gene in a transgenic vector. To assay the strength of EGFP expression, we tested the effects of including an intron before the EGFP coding region and of using different forms of DNA: circular plasmid, linear full-length plasmid, and the linear transgene coding region without any prokaryotic vector sequence. After microinjection, transgene expression was analyzed using transient assays. The resulting expression-strength data were then compared using Fisher's exact test. Inclusion of an intron in the construct increased transgene expression in the transient transgenic zebrafish assay. Furthermore, the circular plasmid containing the transgene produced the strongest EGFP expression.


Subjects
Base Sequence , Genetic Vectors/genetics , Plasmids/genetics , RNA Processing, Post-Transcriptional , Transgenes/physiology , Zebrafish/metabolism , Animals , Animals, Genetically Modified , Green Fluorescent Proteins/genetics , Green Fluorescent Proteins/metabolism , Introns/genetics , Keratin-4/genetics , Keratin-4/metabolism , Microinjections , Rabbits , Transgenes/genetics , Zebrafish/embryology , Zebrafish/genetics , beta-Globins/genetics
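
Record 19 reports comparing expression strength between constructs with Fisher's exact test. As a minimal sketch of how such a comparison works on a 2x2 table (for example, embryos with strong versus weak EGFP signal for two constructs), the function below computes the two-sided p-value from the hypergeometric distribution. The function name and the example counts are hypothetical, and the abstract does not report the actual tables or the test's sidedness.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    total = sum(
        prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7)
    )
    return min(1.0, total)


# Hypothetical counts: rows are constructs, columns are strong / weak signal.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
```

Because the test conditions on the table margins, it remains valid for the small group sizes typical of transient injection assays, where a chi-squared approximation would be unreliable.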
20.
Nature ; 462(7269): 58-64, 2009 Nov 05.
Article in English | MEDLINE | ID: mdl-19890323

ABSTRACT

Genomes are organized into high-level three-dimensional structures, and DNA elements separated by long genomic distances can in principle interact functionally. Many transcription factors bind to regulatory DNA elements distant from gene promoters. Although distal binding sites have been shown to regulate transcription by long-range chromatin interactions at a few loci, chromatin interactions and their impact on transcription regulation have not been investigated in a genome-wide manner. Here we describe the development of a new strategy, chromatin interaction analysis by paired-end tag sequencing (ChIA-PET) for the de novo detection of global chromatin interactions, with which we have comprehensively mapped the chromatin interaction network bound by oestrogen receptor alpha (ER-alpha) in the human genome. We found that most high-confidence remote ER-alpha-binding sites are anchored at gene promoters through long-range chromatin interactions, suggesting that ER-alpha functions by extensive chromatin looping to bring genes together for coordinated transcriptional regulation. We propose that chromatin interactions constitute a primary mechanism for regulating transcription in mammalian genomes.


Subjects
Chromatin/genetics , Chromatin/metabolism , Estrogen Receptor alpha/metabolism , Genome, Human/genetics , Binding Sites , Cell Line , Chromatin Immunoprecipitation , Cross-Linking Reagents , Formaldehyde , Humans , Promoter Regions, Genetic/genetics , Protein Binding , Reproducibility of Results , Sequence Analysis, DNA , Transcription, Genetic , Transcriptional Activation