1.
PLoS Comput Biol ; 16(2): e1007664, 2020 Feb.
Article in English | MEDLINE | ID: mdl-32097405

ABSTRACT

Correct annotation metadata is critical for reproducible and accurate RNA-seq analysis. When files are shared publicly or among collaborators with incorrect or missing annotation metadata, it becomes difficult or impossible to reproduce bioinformatic analyses from raw data. It also makes it more difficult to locate the transcriptomic features, such as transcripts or genes, in their proper genomic context, which is necessary for overlapping expression data with other datasets. We provide a solution in the form of an R/Bioconductor package tximeta that performs numerous annotation and metadata gathering tasks automatically on behalf of users during the import of transcript quantification files. The correct reference transcriptome is identified via a hashed checksum stored in the quantification output, and key transcript databases are downloaded and cached locally. The computational paradigm of automatically adding annotation metadata based on reference sequence checksums can greatly facilitate genomic workflows, by helping to reduce overhead during bioinformatic analyses, preventing costly bioinformatic mistakes, and promoting computational reproducibility. The tximeta package is available at https://bioconductor.org/packages/tximeta.


Subjects
Computational Biology/methods , Gene Expression Profiling , RNA-Seq , Algorithms , Animals , Drosophila melanogaster , Genomics , Humans , Mice , Models, Statistical , Pattern Recognition, Automated , Programming Languages , Reproducibility of Results , Software , Transcriptome
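
The checksum-based identification described in the abstract above can be illustrated with a minimal Python sketch. This is not tximeta's implementation: the hash function (SHA-256), the registry layout, and the names `sequence_digest`, `build_registry`, and `identify_reference` are assumptions made for illustration, and the sequences and metadata are toy values.

```python
import hashlib


def sequence_digest(sequences):
    """Return a SHA-256 hex digest over the transcript sequences, in order.

    Assumption: SHA-256 over concatenated upper-case sequences; the abstract
    does not say which hash or which input the real tools use.
    """
    h = hashlib.sha256()
    for seq in sequences:
        h.update(seq.upper().encode("ascii"))
    return h.hexdigest()


def build_registry(references):
    """Map each reference transcriptome's digest to its annotation metadata."""
    return {sequence_digest(ref["sequences"]): ref["metadata"] for ref in references}


def identify_reference(digest, registry):
    """Look up the metadata for a digest; None means the reference is unknown."""
    return registry.get(digest)


# Toy references (hypothetical source name and releases).
references = [
    {"sequences": ["ACGT", "GGCA"], "metadata": {"source": "ExampleDB", "release": "1"}},
    {"sequences": ["ACGT", "GGCC"], "metadata": {"source": "ExampleDB", "release": "2"}},
]
registry = build_registry(references)

# The digest that would be stored alongside the quantification output.
digest_in_output = sequence_digest(["ACGT", "GGCA"])
match = identify_reference(digest_in_output, registry)
unknown = identify_reference(sequence_digest(["TTTT"]), registry)
```

Because the lookup key is derived from the sequences themselves, a renamed or relocated file still resolves to the same metadata, while a single changed base yields a different digest and no match.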
2.
Am J Epidemiol ; 188(6): 1023-1026, 2019 Jun 1.
Article in English | MEDLINE | ID: mdl-30649166

ABSTRACT

Phase 1 of the Human Microbiome Project (HMP) investigated 18 body subsites of 242 healthy American adults to produce the first comprehensive reference for the composition and variation of the "healthy" human microbiome. Publicly available data sets from amplicon sequencing of two 16S ribosomal RNA variable regions, with extensive controlled-access participant data, provide a reference for ongoing microbiome studies. However, utilization of these data sets can be hindered by the complex bioinformatic steps required to access, import, decrypt, and merge the various components in formats suitable for ecological and statistical analysis. The HMP16SData package provides count data for both 16S ribosomal RNA variable regions, integrated with phylogeny, taxonomy, public participant data, and controlled participant data for authorized researchers, using standard integrative Bioconductor data objects. By removing bioinformatic hurdles of data access and management, HMP16SData enables epidemiologists with only basic R skills to quickly analyze HMP data.


Subjects
Databases, Genetic/statistics & numerical data , Microbiota/physiology , RNA, Ribosomal, 16S/metabolism , Adolescent , Adult , Computational Biology , Female , Humans , Male , Young Adult
3.
Genes Chromosomes Cancer ; 51(12): 1067-78, 2012 Dec.
Article in English | MEDLINE | ID: mdl-22887771

ABSTRACT

Only a minority of intraductal carcinomas of the breast give rise to stromally invasive disease. We microdissected 206 paraffin blocks representing 116 different cases of low-grade ductal carcinoma in situ (DCIS). Fifty-five were pure DCIS (PD) cases without progression to invasive carcinoma. Sixty-one cases had a small invasive component. DNA was extracted from microdissected sections and hybridized to high-density bacterial artificial chromosome arrays. Array comparative genomic hybridization analysis of 118 hybridized DNA samples yielded data on 69 samples that were suitable for further statistical analysis. This cohort included 20 pure DCIS cases, 25 mixed DCIS (MD), and 24 mixed invasive carcinoma samples. PD cases had a higher frequency of DNA copy number changes than MD cases, and the latter had similar DNA profiles compared to paired invasive carcinomas. Copy number changes on 13 chromosomal arms occurred at different rates in PD versus MD lesions. Eight of 19 candidate genes residing at those loci were confirmed to have differential copy number changes by quantitative PCR. NCOR2/SMRT and NR4A1 (both on 12q), DYNLRB2 (16q), CELSR1, UPK3A, and ST13 (all on 22q) were more frequently amplified in PD. Moreover, NCOR2, NR4A1, and DYNLRB2 showed more frequent copy number losses in MD. GRAP2 (22q) was more often amplified in MD, whereas TAF1C (16q) was more commonly deleted in PD. A multigene model comprising these candidate genes discriminated between PD and MD lesions with high accuracy. These findings suggest that the propensity to invade the stroma may be encoded in the genome of intraductal carcinomas.


Subjects
Breast Neoplasms/genetics , Breast/pathology , Carcinoma, Ductal, Breast/genetics , Carcinoma, Intraductal, Noninfiltrating/genetics , DNA Copy Number Variations , Breast Neoplasms/pathology , Carcinoma, Ductal, Breast/pathology , Carcinoma, Intraductal, Noninfiltrating/pathology , Comparative Genomic Hybridization , Disease Progression , Female , Humans
4.
Cancer Invest ; 29(4): 300-7, 2011 May.
Article in English | MEDLINE | ID: mdl-21469979

ABSTRACT

We screened the whole tumor genome to identify DNA copy number gains and losses that discriminate between primary breast carcinomas (MP) and their nodal metastases (ML). Six candidate genes were confirmed by quantitative PCR to have differentially distributed copy number changes. Three of the genes (ERRγ, DDX6, and TIAM1) were more commonly amplified in nodal metastases. Principal component analysis revealed that MP-ML pairs varied markedly in their genomic divergence. The latter was larger in PR-negative tumors. Nodal metastases may form early or late in the development of breast carcinomas and PR-negative tumors may metastasize earlier or are genomically less stable.


Subjects
Breast Neoplasms/genetics , Breast Neoplasms/pathology , Carcinoma, Ductal, Breast/genetics , Carcinoma, Ductal, Breast/secondary , DNA Copy Number Variations , Gene Expression Regulation, Neoplastic , Comparative Genomic Hybridization , Female , Gene Expression Profiling/methods , Genetic Association Studies , Humans , Lymphatic Metastasis , Polymerase Chain Reaction , Principal Component Analysis
5.
J Biomed Biotechnol ; 2011: 860732, 2011.
Article in English | MEDLINE | ID: mdl-21403910

ABSTRACT

The main focus in pin-tip (or print-tip) microarray analysis is determining which probes, genes, or oligonucleotides are differentially expressed. Specifically in array comparative genomic hybridization (aCGH) experiments, researchers search for chromosomal imbalances in the genome. To model these data, scientists apply statistical methods to the structure of the experiment and assume that the data consist of the signal plus random noise. In this paper we propose "SmoothArray", a new method to preprocess comparative genomic hybridization (CGH) bacterial artificial chromosome (BAC) arrays and we show the effects on a cancer dataset. As part of our R software package "aCGHplus," this freely available algorithm removes the variation due to the intensity effects, pin/print-tip, the spatial location on the microarray chip, and the relative location from the well plate. Removal of this variation improves the downstream analysis and subsequent inferences made on the data. Further, we present measures to evaluate the quality of the dataset according to the arrayer pins, 384-well plates, plate rows, and plate columns. We compare our method against competing methods using several metrics to measure the biological signal. With this novel normalization algorithm and quality control measures, the user can improve their inferences on datasets and pinpoint problems that may arise in their BAC aCGH technology.


Subjects
Algorithms , Comparative Genomic Hybridization/standards , Quality Control , Chromosome Mapping/methods , Chromosomes, Artificial, Bacterial/genetics , Comparative Genomic Hybridization/statistics & numerical data , DNA Probes/genetics , Data Interpretation, Statistical , Genome, Human/genetics , Humans , Software
6.
Genes Chromosomes Cancer ; 49(9): 791-802, 2010 Sep.
Article in English | MEDLINE | ID: mdl-20607851

ABSTRACT

The goal of this study was to identify recurrent regions of genomic gain or loss in endometrial cancer of the endometrioid type in the context of racial disparities in mortality for this disease. Array comparative genomic hybridization (aCGH) analysis was performed on 80 frozen primary tumors from the Gynecologic Oncology Group (GOG)-210 bank using the RPCI 19K BAC arrays. The 80 patients included 20 African American (AA) Stage I, 20 White (W) Stage I, 20 African American (AA) Stage IIIC/IV, and 20 White (W) Stage IIIC/IV. A separate subset of 220 endometrial cancers with outcome data was used for validation. A 1.6-Mbp region of gain at 1q23 was identified by aCGH in all AA patients and high grade W patients, but not W low grade patients. In the validation arm of 220 patients copy number gain at this region was validated using FISH and locus specific BACs. The number of AA patients in the validation arm was too small to confirm the aCGH association with racial disparity. Kaplan-Meier curves for survival showed a significant difference for gain at 1q23 versus no gain (log rank P = 0.0014). When subdivided into various groups of risk by stage and grade the survival curves showed a decreased survival for high grade and/or stage tumors, but not for low grade and/or stage endometrioid tumors. Univariate analyses for gain at 1q23 showed a significant association (P = 0.009) with survival. Multivariate analysis for gain at 1q23 did not show a significant association with survival (P = 0.14).


Subjects
Black or African American/genetics , Comparative Genomic Hybridization , Endometrial Neoplasms/ethnology , Endometrial Neoplasms/genetics , White People/genetics , Adenocarcinoma, Clear Cell/ethnology , Adenocarcinoma, Clear Cell/genetics , Adenocarcinoma, Clear Cell/therapy , Carcinoma, Endometrioid/ethnology , Carcinoma, Endometrioid/genetics , Carcinoma, Endometrioid/therapy , Chromosomes, Human, Pair 1/genetics , Cystadenocarcinoma, Serous/ethnology , Cystadenocarcinoma, Serous/genetics , Cystadenocarcinoma, Serous/therapy , Endometrial Neoplasms/therapy , Female , Gene Amplification , Humans , In Situ Hybridization, Fluorescence , Middle Aged , Survival Rate , Treatment Outcome
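
The abstract above reports Kaplan-Meier survival curves for tumors with and without gain at 1q23. As a reminder of what such a curve computes, here is a minimal Python sketch of the standard product-limit estimator. The function name `kaplan_meier` and the follow-up times are illustrative only; they are not data or code from the study.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  : follow-up time for each patient
    events : 1 if the event (death) was observed, 0 if the patient was censored
    Returns a list of (time, survival probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after t.
        at_risk -= removed
    return curve


# Made-up toy follow-up data (not from the study).
curve = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
```

Comparing two such curves (for example, gain versus no gain) is what the log-rank test in the abstract formalizes; that test is not implemented in this sketch.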
7.
F1000Res ; 8: 752, 2019.
Article in English | MEDLINE | ID: mdl-31249680

ABSTRACT

Motivation: The Bioconductor project, a large collection of open source software for the comprehension of large-scale biological data, continues to grow with new packages added each week, motivating the development of software tools focused on exposing package metadata to developers and users. The resulting BiocPkgTools package facilitates access to extensive metadata in computable form covering the Bioconductor package ecosystem, facilitating downstream applications such as custom reporting, data and text mining of Bioconductor package text descriptions, graph analytics over package dependencies, and custom search approaches. Results: The BiocPkgTools package has been incorporated into the Bioconductor project, installs using standard procedures, and runs on any system supporting R. It provides functions to load detailed package metadata, longitudinal package download statistics, package dependencies, and Bioconductor build reports, all in "tidy data" form. BiocPkgTools can convert from tidy data structures to graph structures, enabling graph-based analytics and visualization. An end-user-friendly graphical package explorer aids in task-centric package discovery. Full documentation and example use cases are included. Availability: The BiocPkgTools software and complete documentation are available from Bioconductor (https://bioconductor.org/packages/BiocPkgTools).


Subjects
Data Mining , Software , Metadata
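
The conversion from tidy data to graph structures mentioned in the abstract above can be sketched generically. The Python below is not the BiocPkgTools API (which is an R package); the column names `package` and `depends_on`, the package names, and the helper functions are hypothetical and only illustrate the idea of turning a one-row-per-edge table into a dependency graph.

```python
from collections import defaultdict

# Toy "tidy" dependency table: one row per (package, dependency) pair.
# Package names are placeholders, not real Bioconductor packages.
rows = [
    {"package": "pkgC", "depends_on": "pkgB"},
    {"package": "pkgB", "depends_on": "pkgA"},
    {"package": "pkgD", "depends_on": "pkgA"},
]


def to_graph(rows):
    """Build an adjacency map: package -> set of direct dependencies."""
    graph = defaultdict(set)
    for row in rows:
        graph[row["package"]].add(row["depends_on"])
    return graph


def all_dependencies(graph, package):
    """Collect transitive dependencies with an iterative depth-first search."""
    seen = set()
    stack = [package]
    while stack:
        node = stack.pop()
        for dep in graph.get(node, ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


graph = to_graph(rows)
deps_of_c = all_dependencies(graph, "pkgC")
```

Once the table is in graph form, questions such as "what does this package ultimately depend on?" become simple traversals rather than repeated table joins.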
8.
F1000Res ; 7: 1656, 2018.
Article in English | MEDLINE | ID: mdl-30473781

ABSTRACT

The importance of bioinformatics, computational biology, and data science in biomedical research continues to grow, driving a need for effective instruction and education. A workshop setting, with lectures and guided hands-on tutorials, is a common approach to teaching practical computational and analytical methods. Here, we detail the process we used to produce high-quality, community-authored educational materials that are available for public consumption and reuse. The coordinated efforts of 17 authors over 10 weeks resulted in 15 workshops available as a website and as a 388-page electronic book. We describe how we utilized cloud infrastructure, GitHub, and a literate programming approach to robustly deliver hands-on tutorials to participants of the annual Bioconductor conference. The scripts, raw and published workshop materials, and cloud machine image are all openly available. Our approach uses free services and software and can be adapted by workshop organizers and authors in other contexts with appropriate technical backgrounds.


Subjects
Computational Biology , Education
9.
Oncotarget ; 7(50): 83160-83176, 2016 Dec 13.
Article in English | MEDLINE | ID: mdl-27825120

ABSTRACT

Leveraging population-distinct linkage disequilibrium (LD) patterns, trans-ethnic follow-up of variants discovered from genome-wide association studies (GWAS) has proved to be useful in facilitating the identification of bona fide causal variants. We previously developed the preferential LD approach, a novel method that successfully identified causal variants driving the GWAS signals within European-descent populations even when the causal variants were only weakly linked with the GWAS-discovered variants. To evaluate the performance of our approach in a trans-ethnic setting, we applied it to follow up breast cancer GWAS hits identified mostly from populations of European ancestry in African Americans (AA). We evaluated 74 breast cancer GWAS variants in 8,315 AA women from the African American Breast Cancer Epidemiology and Risk (AMBER) consortium. Only 27% of them were associated with breast cancer risk at significance level α=0.05, suggesting race-specificity of the identified breast cancer risk loci. We followed up on those replicated GWAS hits in the AMBER consortium utilizing the preferential LD approach, to search for causal variants or better breast cancer markers from the 1000 Genomes variant catalog. Our approach identified stronger breast cancer markers for 80% of the GWAS hits with at least nominal breast cancer association, and in 81% of these cases, the marker identified was among the top 10 of all 1000 Genomes variants in the corresponding locus. The results support trans-ethnic application of the preferential LD approach in search for candidate causal variants, and may have implications for future genetic research of breast cancer in AA women.


Subjects
Biomarkers, Tumor/genetics , Black or African American/genetics , Breast Neoplasms/ethnology , Breast Neoplasms/genetics , Linkage Disequilibrium , Polymorphism, Single Nucleotide , Aged , Breast Neoplasms/pathology , Case-Control Studies , Female , Follow-Up Studies , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Middle Aged , Phenotype , Registries , Risk Assessment , Risk Factors , Time Factors , United States/epidemiology
10.
Int J Biol Sci ; 11(12): 1363-75, 2015.
Article in English | MEDLINE | ID: mdl-26681916

ABSTRACT

Genetic and epigenetic alterations have been identified as contributing directly or indirectly to the generation of transitional cell carcinoma of the urinary bladder (TCC-UB). We have previously found that amplification of chromosome 6p22 is significantly associated with the muscle-invasive rather than superficial TCC-UB. Here, we demonstrated that Sox4, one of the candidate oncogenes located within the chromosome 6p22 amplicon, confers bladder cancer stem cell (CSC) properties. Down-regulation of Sox4 led to the inhibition of cell migration, colony formation as well as mesenchymal-to-epithelial transition (MET). Interestingly, knockdown of Sox4 also reduced the sphere formation, enriched cell population with high levels of aldehyde dehydrogenase (ALDH (high)) and tumor formation potential. Using gene expression profiling, we further identified novel Sox4 target genes. Last, immunohistochemistry analysis of human bladder tumor tissue microarrays (TMAs) indicated that high Sox4 expression was correlated with advanced cancer stages and poor survival rate. In summary, our data show that Sox4 is an important regulator of the bladder CSC properties and it may serve as a biomarker of the aggressive phenotype in bladder cancer.


Subjects
Carcinoma, Transitional Cell/genetics , Neoplastic Stem Cells/pathology , SOXC Transcription Factors/genetics , Urinary Bladder Neoplasms/genetics , Biomarkers, Tumor/genetics , Carcinoma, Transitional Cell/pathology , Cell Line, Tumor , Chromosomes, Human, Pair 6 , Cohort Studies , Epithelial-Mesenchymal Transition , Humans , Prognosis , Urinary Bladder Neoplasms/pathology
11.
Cell Cycle ; 14(1): 146-56, 2015.
Article in English | MEDLINE | ID: mdl-25602524

ABSTRACT

The Hippo pathway is an evolutionarily conserved regulator of tissue growth and cell fate during development and regeneration. Conversely, deregulation of the Hippo pathway has been reported in several malignancies. Here, we used integrative functional genomics approaches to identify TAZ, a transcription co-activator and key downstream effector of the Hippo pathway, as an essential driver for the propagation of TNBC malignant phenotype. We further showed in non-transformed human mammary basal epithelial cells that expression of constitutively active TAZ confers cancer stem cell (CSC) traits that are dependent on the TAZ and TEAD interacting domains. In addition, to gain a better understanding of how TAZ functions, we performed genetic-function analysis of TAZ. Significantly, we identified that both the WW and transcriptional activation domains of TAZ are critical for the induced CSC properties as well as tumorigenic potential as manifested in vitro and in human breast cancer xenograft in vivo. Collectively, our data suggest that pharmacological inhibition of TAZ activity may provide a novel means of targeting and eliminating breast CSCs.


Subjects
Neoplastic Stem Cells/metabolism , Transcription Factors/metabolism , Triple Negative Breast Neoplasms/pathology , Animals , Cell Transformation, Neoplastic , Cells, Cultured , DNA-Binding Proteins/chemistry , DNA-Binding Proteins/metabolism , Epithelial Cells/cytology , Epithelial Cells/metabolism , Epithelial-Mesenchymal Transition , Female , Hippo Signaling Pathway , Humans , Mammary Glands, Human/cytology , Mice , Mice, Inbred NOD , Mice, SCID , Nuclear Proteins/chemistry , Nuclear Proteins/metabolism , Protein Interaction Domains and Motifs , Protein Serine-Threonine Kinases/metabolism , Protein Structure, Tertiary , RNA, Small Interfering/metabolism , TEA Domain Transcription Factors , Transcription Factors/antagonists & inhibitors , Transcription Factors/chemistry , Transcriptional Activation , Triple Negative Breast Neoplasms/metabolism
12.
Cancer Epidemiol Biomarkers Prev ; 24(8): 1207-13, 2015 Aug.
Article in English | MEDLINE | ID: mdl-25990554

ABSTRACT

BACKGROUND: Whole-exome sequencing (WES) has recently emerged as an appealing approach to systematically study coding variants. However, the requirement for a large amount of high-quality DNA poses a barrier that may limit its application in large cancer epidemiologic studies. We evaluated the performance of WES with low input amount and saliva DNA as an alternative source material. METHODS: Five breast cancer patients were randomly selected from the Pathways Study. From each patient, four samples, including 3 µg, 1 µg, and 0.2 µg blood DNA and 1 µg saliva DNA, were aliquoted for library preparation using the Agilent SureSelect Kit and sequencing using Illumina HiSeq2500. Quality metrics of sequencing and variant calling, as well as concordance of variant calls from the whole exome and 21 known breast cancer genes, were assessed by input amount and DNA source. RESULTS: There was little difference by input amount or DNA source on the quality of sequencing and variant calling. The concordance rate was about 98% for single-nucleotide variant calls and 83% to 86% for short insertion/deletion calls. For the 21 known breast cancer genes, WES based on low input amount and saliva DNA identified the same set of variants in samples from the same patient. CONCLUSIONS: Low DNA input amount, as well as saliva DNA, can be used to generate WES data of satisfactory quality. IMPACT: Our findings support the expansion of WES applications in cancer epidemiologic studies where only low DNA amount or saliva samples are available.


Subjects
DNA/genetics , Exome/genetics , Neoplasms/epidemiology , Sequence Analysis, DNA/methods , Genomics , Humans
13.
Adv Bioinformatics ; 2013: 790567, 2013.
Article in English | MEDLINE | ID: mdl-24223587

ABSTRACT

Introduction. The microarray datasets from the MicroArray Quality Control (MAQC) project have enabled the assessment of the precision, comparability of microarrays, and other various microarray analysis methods. However, to date no studies that we are aware of have reported the performance of missing value imputation schemes on the MAQC datasets. In this study, we use the MAQC Affymetrix datasets to evaluate several imputation procedures in Affymetrix microarrays. Results. We evaluated several cutting edge imputation procedures and compared them using different error measures. We randomly deleted 5% and 10% of the data and imputed the missing values using imputation tests. We performed 1000 simulations and averaged the results. The results for both 5% and 10% deletion are similar. Among the imputation methods, we observe the local least squares method with k = 4 is most accurate under the error measures considered. The k-nearest neighbor method with k = 1 has the highest error rate among imputation methods and error measures. Conclusions. We conclude for imputing missing values in Affymetrix microarray datasets, using the MAS 5.0 preprocessing scheme, the local least squares method with k = 4 has the best overall performance and k-nearest neighbor method with k = 1 has the worst overall performance. These results hold true for both 5% and 10% missing values.
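
The evaluation design in the abstract above (randomly delete a fraction of values, impute them, and score the imputed values against the truth) can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: it implements only a simple k-nearest-neighbor imputer (the local least squares method is not shown), uses root-mean-square error as a stand-in for the unspecified error measures, and runs on a synthetic toy matrix rather than MAQC data.

```python
import math
import random


def mask_random(matrix, fraction, rng):
    """Copy the matrix with `fraction` of its entries set to None."""
    cells = [(i, j) for i in range(len(matrix)) for j in range(len(matrix[0]))]
    hidden = rng.sample(cells, round(fraction * len(cells)))
    masked = [row[:] for row in matrix]
    for i, j in hidden:
        masked[i][j] = None
    return masked, hidden


def distance(a, b):
    """RMS difference over columns observed in both rows (inf if none)."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in pairs) / len(pairs))


def knn_impute(masked, k):
    """Fill each missing cell with the mean of that column in the k nearest rows.

    A cell stays None if no other row has the column observed.
    """
    imputed = [row[:] for row in masked]
    for i, row in enumerate(masked):
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = sorted(
                (distance(row, other), other[j])
                for r, other in enumerate(masked)
                if r != i and other[j] is not None
            )[:k]
            if donors:
                imputed[i][j] = sum(v for _, v in donors) / len(donors)
    return imputed


def rmse(truth, imputed, hidden):
    """Root-mean-square error over the deliberately hidden cells only."""
    total = sum((truth[i][j] - imputed[i][j]) ** 2 for i, j in hidden)
    return math.sqrt(total / len(hidden))


# Synthetic 6 x 4 toy matrix; 10% of the cells are hidden, as in one of the
# two deletion rates in the abstract.
rng = random.Random(0)
truth = [[float(i + j) for j in range(4)] for i in range(6)]
masked, hidden = mask_random(truth, 0.10, rng)
errors = {k: rmse(truth, knn_impute(masked, k), hidden) for k in (1, 4)}
```

In a full study this loop would be repeated many times (the abstract reports 1000 simulations) and the errors averaged per method; a single run on a toy matrix says nothing about which k performs better.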

14.
J Cancer Res Clin Oncol ; 137(5): 795-809, 2011 May.
Article in English | MEDLINE | ID: mdl-20680643

ABSTRACT

PURPOSE: We employed a whole genome tumor profiling approach in an attempt to identify DNA copy number alterations (CNAs) and new candidate genes that are correlated with the metastatic potential of a primary breast carcinoma and with progression at the metastatic site. METHODS: Fifty-four small (≤ 2 cm), high grade, ER-positive, formalin-fixed invasive ductal carcinomas were suitable for whole genome profiling analysis. Twenty-four of them did not form metastases within 5-10 years (unmatched primaries, UP). Thirty tumors had at least one synchronous axillary lymph node metastasis (matched primaries, MP; matched lymph node metastases, ML). Genomic DNA was hybridized to high density (19k) BAC arrays. Statistical analysis revealed differential distributions of CNAs between UP and MP and between MP and ML, respectively. We selected 27 candidate genes for validation experiments using quantitative (Q-)PCR of genomic DNA. For tetraspanin TSPAN1, we studied mRNA expression levels in a separate cohort of primary breast carcinomas and in breast cell lines. RESULTS: Matched primary (MP) tumors had a threefold higher rate of DNA copy number losses compared to UP tumors. In the UP-MP comparison, 186 BACs were differentially amplified or deleted. Most of them were localized to chromosomes 7p, 16q and 18q. In the MP-ML comparison, 131 BACs showed differential CNAs. Most of them were localized to chromosomes 1q and 20. By Q-PCR, seven candidate genes could be confirmed to show differential distributions of CNAs. TSPAN1 was amplified in UP and deleted in MP tumors. The gene was markedly downregulated in ER-negative and high-grade breast cancers. CONCLUSIONS: Metastasizing tumors had a higher rate of deletions, suggesting possible inactivation of metastasis suppressor genes. We provide preliminary evidence that TSPAN1 may be another important breast cancer suppressor gene belonging to the tetraspanin superfamily.


Subjects
Breast Neoplasms/genetics , Breast Neoplasms/pathology , Gene Dosage , Membrane Proteins/genetics , Cell Line, Tumor , Chromosomes, Artificial, Bacterial , Comparative Genomic Hybridization , Female , Genes, Tumor Suppressor , Humans , Lymphatic Metastasis , Polymerase Chain Reaction , RNA, Messenger/analysis , Tetraspanins
15.
Int J Bioinform Res Appl ; 6(6): 584-93, 2010.
Article in English | MEDLINE | ID: mdl-21354964

ABSTRACT

While the technologies for high dimensional data have been advancing, a lack of adequate visualisation tools to accommodate the results and an inability to integrate multiple sources of data have emerged. The move towards multi-disciplinary work and collaborative research underscores the need for visualisation and analysis tools that are platform independent and customisable. iGenomicViewer, through the use of customisable tool-tips that may include links and images, allows for a greater level of data integration for genomic data in a variety of formats. iGenomicViewer is a freely available R software package which allows users to generate interactive, platform-independent plots of genomic data.


Subjects
Genome , Genomics/methods , Software , Computer Graphics , Databases, Genetic , User-Computer Interface