1.
PLoS Comput Biol ; 16(2): e1007664, 2020 02.
Article in English | MEDLINE | ID: mdl-32097405

ABSTRACT

Correct annotation metadata is critical for reproducible and accurate RNA-seq analysis. When files are shared publicly or among collaborators with incorrect or missing annotation metadata, it becomes difficult or impossible to reproduce bioinformatic analyses from raw data. It also makes it more difficult to locate the transcriptomic features, such as transcripts or genes, in their proper genomic context, which is necessary for overlapping expression data with other datasets. We provide a solution in the form of an R/Bioconductor package tximeta that performs numerous annotation and metadata gathering tasks automatically on behalf of users during the import of transcript quantification files. The correct reference transcriptome is identified via a hashed checksum stored in the quantification output, and key transcript databases are downloaded and cached locally. The computational paradigm of automatically adding annotation metadata based on reference sequence checksums can greatly facilitate genomic workflows, by helping to reduce overhead during bioinformatic analyses, preventing costly bioinformatic mistakes, and promoting computational reproducibility. The tximeta package is available at https://bioconductor.org/packages/tximeta.
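The checksum idea described in this abstract can be illustrated with a minimal sketch. This is not tximeta's implementation or API: the registry, its field names, and the toy sequences below are hypothetical, and SHA-256 is assumed only for illustration. The sketch shows how a digest computed over reference sequences can serve as a lookup key for annotation metadata.

```python
import hashlib


def sequence_digest(sequences):
    """Return a SHA-256 hex digest computed over the sequences, in order."""
    h = hashlib.sha256()
    for seq in sequences:
        h.update(seq.encode("ascii"))
    return h.hexdigest()


# Hypothetical toy transcriptome and registry (not real reference data).
toy_transcriptome = ["ACGTACGT", "GGGTTTAA"]
KNOWN_REFERENCES = {
    sequence_digest(toy_transcriptome): {
        "source": "ToySource",
        "release": "1",
        "organism": "Toy organism",
    },
}


def lookup_reference(digest):
    """Return the metadata registered for this digest, or None if unknown."""
    return KNOWN_REFERENCES.get(digest)
```

A digest that is not in the registry returns `None`, which is the case where metadata would have to be supplied by hand.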


Subject(s)
Computational Biology/methods , Gene Expression Profiling , RNA-Seq , Algorithms , Animals , Drosophila melanogaster , Genomics , Humans , Mice , Models, Statistical , Pattern Recognition, Automated , Programming Languages , Reproducibility of Results , Software , Transcriptome
2.
Am J Epidemiol ; 188(6): 1023-1026, 2019 06 01.
Article in English | MEDLINE | ID: mdl-30649166

ABSTRACT

Phase 1 of the Human Microbiome Project (HMP) investigated 18 body subsites of 242 healthy American adults to produce the first comprehensive reference for the composition and variation of the "healthy" human microbiome. Publicly available data sets from amplicon sequencing of two 16S ribosomal RNA variable regions, with extensive controlled-access participant data, provide a reference for ongoing microbiome studies. However, utilization of these data sets can be hindered by the complex bioinformatic steps required to access, import, decrypt, and merge the various components in formats suitable for ecological and statistical analysis. The HMP16SData package provides count data for both 16S ribosomal RNA variable regions, integrated with phylogeny, taxonomy, public participant data, and controlled participant data for authorized researchers, using standard integrative Bioconductor data objects. By removing bioinformatic hurdles of data access and management, HMP16SData enables epidemiologists with only basic R skills to quickly analyze HMP data.


Subject(s)
Databases, Genetic/statistics & numerical data , Microbiota/physiology , RNA, Ribosomal, 16S/metabolism , Adolescent , Adult , Computational Biology , Female , Humans , Male , Young Adult
3.
Genes Chromosomes Cancer ; 51(12): 1067-78, 2012 Dec.
Article in English | MEDLINE | ID: mdl-22887771

ABSTRACT

Only a minority of intraductal carcinomas of the breast give rise to stromally invasive disease. We microdissected 206 paraffin blocks representing 116 different cases of low-grade ductal carcinoma in situ (DCIS). Fifty-five were pure DCIS (PD) cases without progression to invasive carcinoma. Sixty-one cases had a small invasive component. DNA was extracted from microdissected sections and hybridized to high-density bacterial artificial chromosome arrays. Array comparative genomic hybridization analysis of 118 hybridized DNA samples yielded data on 69 samples that were suitable for further statistical analysis. This cohort included 20 pure DCIS cases, 25 mixed DCIS (MD), and 24 mixed invasive carcinoma samples. PD cases had a higher frequency of DNA copy number changes than MD cases, and the latter had similar DNA profiles compared to paired invasive carcinomas. Copy number changes on 13 chromosomal arms occurred at different rates in PD versus MD lesions. Eight of 19 candidate genes residing at those loci were confirmed to have differential copy number changes by quantitative PCR. NCOR2/SMRT and NR4A1 (both on 12q), DYNLRB2 (16q), CELSR1, UPK3A, and ST13 (all on 22q) were more frequently amplified in PD. Moreover, NCOR2, NR4A1, and DYNLRB2 showed more frequent copy number losses in MD. GRAP2 (22q) was more often amplified in MD, whereas TAF1C (16q) was more commonly deleted in PD. A multigene model comprising these candidate genes discriminated between PD and MD lesions with high accuracy. These findings suggest that the propensity to invade the stroma may be encoded in the genome of intraductal carcinomas.


Subject(s)
Breast Neoplasms/genetics , Breast/pathology , Carcinoma, Ductal, Breast/genetics , Carcinoma, Intraductal, Noninfiltrating/genetics , DNA Copy Number Variations , Breast Neoplasms/pathology , Carcinoma, Ductal, Breast/pathology , Carcinoma, Intraductal, Noninfiltrating/pathology , Comparative Genomic Hybridization , Disease Progression , Female , Humans
4.
Cancer Invest ; 29(4): 300-7, 2011 May.
Article in English | MEDLINE | ID: mdl-21469979

ABSTRACT

We screened the whole tumor genome to identify DNA copy number gains and losses that discriminate between primary breast carcinomas (MP) and their nodal metastases (ML). Six candidate genes were confirmed by quantitative PCR to have differentially distributed copy number changes. Three of the genes (ERRγ, DDX6, and TIAM1) were more commonly amplified in nodal metastases. Principal component analysis revealed that MP-ML pairs varied markedly in their genomic divergence. The latter was larger in PR-negative tumors. Nodal metastases may form early or late in the development of breast carcinomas, and PR-negative tumors may metastasize earlier or be genomically less stable.


Subject(s)
Breast Neoplasms/genetics , Breast Neoplasms/pathology , Carcinoma, Ductal, Breast/genetics , Carcinoma, Ductal, Breast/secondary , DNA Copy Number Variations , Gene Expression Regulation, Neoplastic , Comparative Genomic Hybridization , Female , Gene Expression Profiling/methods , Genetic Association Studies , Humans , Lymphatic Metastasis , Polymerase Chain Reaction , Principal Component Analysis
5.
J Biomed Biotechnol ; 2011: 860732, 2011.
Article in English | MEDLINE | ID: mdl-21403910

ABSTRACT

The main focus in pin-tip (or print-tip) microarray analysis is determining which probes, genes, or oligonucleotides are differentially expressed. Specifically in array comparative genomic hybridization (aCGH) experiments, researchers search for chromosomal imbalances in the genome. To model these data, scientists apply statistical methods to the structure of the experiment and assume that the data consist of the signal plus random noise. In this paper we propose "SmoothArray", a new method to preprocess comparative genomic hybridization (CGH) bacterial artificial chromosome (BAC) arrays, and we show the effects on a cancer dataset. As part of our R software package "aCGHplus," this freely available algorithm removes the variation due to the intensity effects, pin/print-tip, the spatial location on the microarray chip, and the relative location from the well plate. Removal of this variation improves the downstream analysis and subsequent inferences made on the data. Further, we present measures to evaluate the quality of the dataset according to the arrayer pins, 384-well plates, plate rows, and plate columns. We compare our method against competing methods using several metrics to measure the biological signal. With this novel normalization algorithm and quality control measures, the user can improve their inferences on datasets and pinpoint problems that may arise in their BAC aCGH technology.


Subject(s)
Algorithms , Comparative Genomic Hybridization/standards , Quality Control , Chromosome Mapping/methods , Chromosomes, Artificial, Bacterial/genetics , Comparative Genomic Hybridization/statistics & numerical data , DNA Probes/genetics , Data Interpretation, Statistical , Genome, Human/genetics , Humans , Software
6.
Genes Chromosomes Cancer ; 49(9): 791-802, 2010 Sep.
Article in English | MEDLINE | ID: mdl-20607851

ABSTRACT

The goal of this study was to identify recurrent regions of genomic gain or loss in endometrial cancer of the endometrioid type in the context of racial disparities in mortality for this disease. Array comparative genomic hybridization (aCGH) analysis was performed on 80 frozen primary tumors from the Gynecologic Oncology Group (GOG)-210 bank using the RPCI 19K BAC arrays. The 80 patients included 20 African American (AA) Stage I, 20 White (W) Stage I, 20 African American (AA) Stage IIIC/IV, and 20 White (W) Stage IIIC/IV. A separate subset of 220 endometrial cancers with outcome data was used for validation. A 1.6-Mbp region of gain at 1q23 was identified by aCGH in all AA patients and high grade W patients, but not W low grade patients. In the validation arm of 220 patients copy number gain at this region was validated using FISH and locus specific BACs. The number of AA patients in the validation arm was too small to confirm the aCGH association with racial disparity. Kaplan-Meier curves for survival showed a significant difference for gain at 1q23 versus no gain (log rank P = 0.0014). When subdivided into various groups of risk by stage and grade the survival curves showed a decreased survival for high grade and/or stage tumors, but not for low grade and/or stage endometrioid tumors. Univariate analyses for gain at 1q23 showed a significant association (P = 0.009) with survival. Multivariate analysis for gain at 1q23 did not show a significant association with survival (P = 0.14).


Subject(s)
Black or African American/genetics , Comparative Genomic Hybridization , Endometrial Neoplasms/ethnology , Endometrial Neoplasms/genetics , White People/genetics , Adenocarcinoma, Clear Cell/ethnology , Adenocarcinoma, Clear Cell/genetics , Adenocarcinoma, Clear Cell/therapy , Carcinoma, Endometrioid/ethnology , Carcinoma, Endometrioid/genetics , Carcinoma, Endometrioid/therapy , Chromosomes, Human, Pair 1/genetics , Cystadenocarcinoma, Serous/ethnology , Cystadenocarcinoma, Serous/genetics , Cystadenocarcinoma, Serous/therapy , Endometrial Neoplasms/therapy , Female , Gene Amplification , Humans , In Situ Hybridization, Fluorescence , Middle Aged , Survival Rate , Treatment Outcome
7.
F1000Res ; 8: 752, 2019.
Article in English | MEDLINE | ID: mdl-31249680

ABSTRACT

Motivation: The Bioconductor project, a large collection of open source software for the comprehension of large-scale biological data, continues to grow with new packages added each week, motivating the development of software tools focused on exposing package metadata to developers and users. The resulting BiocPkgTools package facilitates access to extensive metadata in computable form covering the Bioconductor package ecosystem, facilitating downstream applications such as custom reporting, data and text mining of Bioconductor package text descriptions, graph analytics over package dependencies, and custom search approaches. Results: The BiocPkgTools package has been incorporated into the Bioconductor project, installs using standard procedures, and runs on any system supporting R. It provides functions to load detailed package metadata, longitudinal package download statistics, package dependencies, and Bioconductor build reports, all in "tidy data" form. BiocPkgTools can convert from tidy data structures to graph structures, enabling graph-based analytics and visualization. An end-user-friendly graphical package explorer aids in task-centric package discovery. Full documentation and example use cases are included. Availability: The BiocPkgTools software and complete documentation are available from Bioconductor (https://bioconductor.org/packages/BiocPkgTools).
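The conversion from tidy data to a graph mentioned in this abstract can be sketched in a few lines. This is an illustration in Python, not BiocPkgTools code (which is an R package): the package names and the two-column row layout are hypothetical. Each row of a tidy dependency table is one (package, dependency) pair, and the graph is an adjacency mapping built from those rows.

```python
from collections import defaultdict

# Hypothetical tidy dependency table: one (package, dependency) pair per row.
DEPENDENCY_ROWS = [
    ("pkgA", "pkgB"),
    ("pkgA", "pkgC"),
    ("pkgB", "pkgC"),
    ("pkgC", "pkgD"),
]


def build_graph(rows):
    """Turn (package, dependency) rows into an adjacency mapping."""
    graph = defaultdict(set)
    for package, dependency in rows:
        graph[package].add(dependency)
        graph.setdefault(dependency, set())  # keep leaf packages as nodes
    return dict(graph)


def all_dependencies(graph, package):
    """Return every direct and indirect dependency of a package."""
    seen = set()
    stack = [package]
    while stack:
        current = stack.pop()
        for dep in graph.get(current, ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen
```

Once the rows are in graph form, questions such as "which packages does this one depend on, directly or indirectly" reduce to a traversal.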


Subject(s)
Data Mining , Software , Metadata
8.
F1000Res ; 7: 1656, 2018.
Article in English | MEDLINE | ID: mdl-30473781

ABSTRACT

The importance of bioinformatics, computational biology, and data science in biomedical research continues to grow, driving a need for effective instruction and education. A workshop setting, with lectures and guided hands-on tutorials, is a common approach to teaching practical computational and analytical methods. Here, we detail the process we used to produce high-quality, community-authored educational materials that are available for public consumption and reuse. The coordinated efforts of 17 authors over 10 weeks resulted in 15 workshops available as a website and as a 388-page electronic book. We describe how we utilized cloud infrastructure, GitHub, and a literate programming approach to robustly deliver hands-on tutorials to participants of the annual Bioconductor conference. The scripts, raw and published workshop materials, and cloud machine image are all openly available. Our approach uses free services and software and can be adapted by workshop organizers and authors in other contexts with appropriate technical backgrounds.


Subject(s)
Computational Biology , Education
9.
Oncotarget ; 7(50): 83160-83176, 2016 Dec 13.
Article in English | MEDLINE | ID: mdl-27825120

ABSTRACT

Leveraging population-distinct linkage disequilibrium (LD) patterns, trans-ethnic follow-up of variants discovered from genome-wide association studies (GWAS) has proved to be useful in facilitating the identification of bona fide causal variants. We previously developed the preferential LD approach, a novel method that successfully identified causal variants driving the GWAS signals within European-descent populations even when the causal variants were only weakly linked with the GWAS-discovered variants. To evaluate the performance of our approach in a trans-ethnic setting, we applied it to follow up breast cancer GWAS hits identified mostly from populations of European ancestry in African Americans (AA). We evaluated 74 breast cancer GWAS variants in 8,315 AA women from the African American Breast Cancer Epidemiology and Risk (AMBER) consortium. Only 27% of them were associated with breast cancer risk at significance level α=0.05, suggesting race-specificity of the identified breast cancer risk loci. We followed up on those replicated GWAS hits in the AMBER consortium utilizing the preferential LD approach, to search for causal variants or better breast cancer markers from the 1000 Genomes variant catalog. Our approach identified stronger breast cancer markers for 80% of the GWAS hits with at least nominal breast cancer association, and in 81% of these cases, the marker identified was among the top 10 of all 1000 Genomes variants in the corresponding locus. The results support trans-ethnic application of the preferential LD approach in search for candidate causal variants, and may have implications for future genetic research of breast cancer in AA women.


Subject(s)
Biomarkers, Tumor/genetics , Black or African American/genetics , Breast Neoplasms/ethnology , Breast Neoplasms/genetics , Linkage Disequilibrium , Polymorphism, Single Nucleotide , Aged , Breast Neoplasms/pathology , Case-Control Studies , Female , Follow-Up Studies , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Middle Aged , Phenotype , Registries , Risk Assessment , Risk Factors , Time Factors , United States/epidemiology
10.
Int J Biol Sci ; 11(12): 1363-75, 2015.
Article in English | MEDLINE | ID: mdl-26681916

ABSTRACT

Genetic and epigenetic alterations have been identified as contributing directly or indirectly to the generation of transitional cell carcinoma of the urinary bladder (TCC-UB). We have previously found that amplification of chromosome 6p22 is significantly associated with the muscle-invasive rather than superficial TCC-UB. Here, we demonstrated that Sox4, one of the candidate oncogenes located within the chromosome 6p22 amplicon, confers bladder cancer stem cell (CSC) properties. Down-regulation of Sox4 led to the inhibition of cell migration, colony formation as well as mesenchymal-to-epithelial transition (MET). Interestingly, knockdown of Sox4 also reduced the sphere formation, enriched cell population with high levels of aldehyde dehydrogenase (ALDH (high)) and tumor formation potential. Using gene expression profiling, we further identified novel Sox4 target genes. Last, immunohistochemistry analysis of human bladder tumor tissue microarrays (TMAs) indicated that high Sox4 expression was correlated with advanced cancer stages and poor survival rate. In summary, our data show that Sox4 is an important regulator of the bladder CSC properties and it may serve as a biomarker of the aggressive phenotype in bladder cancer.


Subject(s)
Carcinoma, Transitional Cell/genetics , Neoplastic Stem Cells/pathology , SOXC Transcription Factors/genetics , Urinary Bladder Neoplasms/genetics , Biomarkers, Tumor/genetics , Carcinoma, Transitional Cell/pathology , Cell Line, Tumor , Chromosomes, Human, Pair 6 , Cohort Studies , Epithelial-Mesenchymal Transition , Humans , Prognosis , Urinary Bladder Neoplasms/pathology
11.
Cell Cycle ; 14(1): 146-56, 2015.
Article in English | MEDLINE | ID: mdl-25602524

ABSTRACT

The Hippo pathway is an evolutionarily conserved regulator of tissue growth and cell fate during development and regeneration. Conversely, deregulation of the Hippo pathway has been reported in several malignancies. Here, we used integrative functional genomics approaches to identify TAZ, a transcription co-activator and key downstream effector of the Hippo pathway, as an essential driver for the propagation of TNBC malignant phenotype. We further showed in non-transformed human mammary basal epithelial cells that expression of constitutively active TAZ confers cancer stem cell (CSC) traits that are dependent on the TAZ and TEAD interacting domains. In addition, to gain a better understanding of how TAZ functions, we performed genetic-function analysis of TAZ. Significantly, we identified that both the WW and transcriptional activation domains of TAZ are critical for the induced CSC properties as well as tumorigenic potential as manifested in vitro and in human breast cancer xenograft in vivo. Collectively, our data suggest that pharmacological inhibition of TAZ activity may provide a novel means of targeting and eliminating breast CSCs.


Subject(s)
Neoplastic Stem Cells/metabolism , Transcription Factors/metabolism , Triple Negative Breast Neoplasms/pathology , Animals , Cell Transformation, Neoplastic , Cells, Cultured , DNA-Binding Proteins/chemistry , DNA-Binding Proteins/metabolism , Epithelial Cells/cytology , Epithelial Cells/metabolism , Epithelial-Mesenchymal Transition , Female , Hippo Signaling Pathway , Humans , Mammary Glands, Human/cytology , Mice , Mice, Inbred NOD , Mice, SCID , Nuclear Proteins/chemistry , Nuclear Proteins/metabolism , Protein Interaction Domains and Motifs , Protein Serine-Threonine Kinases/metabolism , Protein Structure, Tertiary , RNA, Small Interfering/metabolism , TEA Domain Transcription Factors , Transcription Factors/antagonists & inhibitors , Transcription Factors/chemistry , Transcriptional Activation , Triple Negative Breast Neoplasms/metabolism
12.
Cancer Epidemiol Biomarkers Prev ; 24(8): 1207-13, 2015 Aug.
Article in English | MEDLINE | ID: mdl-25990554

ABSTRACT

BACKGROUND: Whole-exome sequencing (WES) has recently emerged as an appealing approach to systematically study coding variants. However, the requirement for a large amount of high-quality DNA poses a barrier that may limit its application in large cancer epidemiologic studies. We evaluated the performance of WES with low input amount and saliva DNA as an alternative source material. METHODS: Five breast cancer patients were randomly selected from the Pathways Study. From each patient, four samples, including 3 µg, 1 µg, and 0.2 µg blood DNA and 1 µg saliva DNA, were aliquoted for library preparation using the Agilent SureSelect Kit and sequencing using Illumina HiSeq2500. Quality metrics of sequencing and variant calling, as well as concordance of variant calls from the whole exome and 21 known breast cancer genes, were assessed by input amount and DNA source. RESULTS: There was little difference by input amount or DNA source on the quality of sequencing and variant calling. The concordance rate was about 98% for single-nucleotide variant calls and 83% to 86% for short insertion/deletion calls. For the 21 known breast cancer genes, WES based on low input amount and saliva DNA identified the same set of variants in samples from the same patient. CONCLUSIONS: Low DNA input amount, as well as saliva DNA, can be used to generate WES data of satisfactory quality. IMPACT: Our findings support the expansion of WES applications in cancer epidemiologic studies where only low DNA amount or saliva samples are available.


Subject(s)
DNA/genetics , Exome/genetics , Neoplasms/epidemiology , Sequence Analysis, DNA/methods , Genomics , Humans
13.
Adv Bioinformatics ; 2013: 790567, 2013.
Article in English | MEDLINE | ID: mdl-24223587

ABSTRACT

Introduction. The microarray datasets from the MicroArray Quality Control (MAQC) project have enabled the assessment of the precision, comparability of microarrays, and other various microarray analysis methods. However, to date no studies that we are aware of have reported the performance of missing value imputation schemes on the MAQC datasets. In this study, we use the MAQC Affymetrix datasets to evaluate several imputation procedures in Affymetrix microarrays. Results. We evaluated several cutting edge imputation procedures and compared them using different error measures. We randomly deleted 5% and 10% of the data and imputed the missing values using imputation tests. We performed 1000 simulations and averaged the results. The results for both 5% and 10% deletion are similar. Among the imputation methods, we observe the local least squares method with k = 4 is most accurate under the error measures considered. The k-nearest neighbor method with k = 1 has the highest error rate among imputation methods and error measures. Conclusions. We conclude for imputing missing values in Affymetrix microarray datasets, using the MAS 5.0 preprocessing scheme, the local least squares method with k = 4 has the best overall performance and k-nearest neighbor method with k = 1 has the worst overall performance. These results hold true for both 5% and 10% missing values.
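The evaluation loop described in this abstract (randomly delete a fraction of the data, impute, score against the known values, and average over repeated simulations) can be sketched as follows. This is a minimal illustration, not the authors' code: row-mean imputation stands in for the local least squares and k-nearest neighbor methods the study compares, root-mean-square error is an assumed error measure, and all function names are hypothetical.

```python
import math
import random


def mask_random(matrix, fraction, rng):
    """Copy the matrix, set about `fraction` of entries to None, return both."""
    cells = [(i, j) for i, row in enumerate(matrix) for j in range(len(row))]
    n_missing = max(1, round(fraction * len(cells)))
    missing = rng.sample(cells, n_missing)
    masked = [list(row) for row in matrix]
    for i, j in missing:
        masked[i][j] = None
    return masked, missing


def row_mean_impute(masked):
    """Fill each missing entry with the mean of the observed values in its row."""
    filled = []
    for row in masked:
        observed = [v for v in row if v is not None]
        mean = sum(observed) / len(observed) if observed else 0.0
        filled.append([mean if v is None else v for v in row])
    return filled


def rmse(truth, imputed, positions):
    """Root-mean-square error over the deleted positions only."""
    squared = [(truth[i][j] - imputed[i][j]) ** 2 for i, j in positions]
    return math.sqrt(sum(squared) / len(squared))


def average_error(matrix, fraction, n_sim, seed=0):
    """Average the imputation error over n_sim random deletions."""
    rng = random.Random(seed)
    errors = []
    for _ in range(n_sim):
        masked, missing = mask_random(matrix, fraction, rng)
        errors.append(rmse(matrix, row_mean_impute(masked), missing))
    return sum(errors) / len(errors)
```

Swapping `row_mean_impute` for another imputation function with the same input and output shape lets the same loop compare methods at, for example, `fraction=0.05` and `fraction=0.10`.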

14.
J Cancer Res Clin Oncol ; 137(5): 795-809, 2011 May.
Article in English | MEDLINE | ID: mdl-20680643

ABSTRACT

PURPOSE: We employed a whole genome tumor profiling approach in an attempt to identify DNA copy number alterations (CNAs) and new candidate genes that are correlated with the metastatic potential of a primary breast carcinoma and with progression at the metastatic site. METHODS: Fifty-four small (≤ 2 cm), high grade, ER-positive, formalin-fixed invasive ductal carcinomas were suitable for whole genome profiling analysis. Twenty-four of them did not form metastases within 5-10 years (unmatched primaries, UP). Thirty tumors had at least one synchronous axillary lymph node metastasis (matched primaries, MP; matched lymph node metastases, ML). Genomic DNA was hybridized to high density (19k) BAC arrays. Statistical analysis revealed differential distributions of CNAs between UP and MP and between MP and ML, respectively. We selected 27 candidate genes for validation experiments using quantitative (Q-)PCR of genomic DNA. For tetraspanin TSPAN1, we studied mRNA expression levels in a separate cohort of primary breast carcinomas and in breast cell lines. RESULTS: Matched primary (MP) tumors had a threefold higher rate of DNA copy number losses compared to UP tumors. In the UP-MP comparison, 186 BACs were differentially amplified or deleted. Most of them were localized to chromosomes 7p, 16q and 18q. In the MP-ML comparison, 131 BACs showed differential CNAs. Most of them were localized to chromosomes 1q and 20. By Q-PCR, seven candidate genes could be confirmed to show differential distributions of CNAs. TSPAN1 was amplified in UP and deleted in MP tumors. The gene was markedly downregulated in ER-negative and high-grade breast cancers. CONCLUSIONS: Metastasizing tumors had a higher rate of deletions, suggesting possible inactivation of metastasis suppressor genes. We provide preliminary evidence that TSPAN1 may be another important breast cancer suppressor gene belonging to the tetraspanin superfamily.


Subject(s)
Breast Neoplasms/genetics , Breast Neoplasms/pathology , Gene Dosage , Membrane Proteins/genetics , Cell Line, Tumor , Chromosomes, Artificial, Bacterial , Comparative Genomic Hybridization , Female , Genes, Tumor Suppressor , Humans , Lymphatic Metastasis , Polymerase Chain Reaction , RNA, Messenger/analysis , Tetraspanins
15.
Int J Bioinform Res Appl ; 6(6): 584-93, 2010.
Article in English | MEDLINE | ID: mdl-21354964

ABSTRACT

While the technologies for high dimensional data have been advancing, a lack of adequate visualisation tools to accommodate the results and an inability to integrate multiple sources of data have emerged. The move towards multi-disciplinary work and collaborative research underscores the need for visualisation and analysis tools that are platform independent and customisable. iGenomicViewer, through the use of customisable tool-tips that may include links and images, allows for a greater level of data integration for genomic data in a variety of formats. iGenomicViewer is a freely available R software package that allows users to generate interactive, platform-independent plots of genomic data.


Subject(s)
Genome , Genomics/methods , Software , Computer Graphics , Databases, Genetic , User-Computer Interface