ABSTRACT
BACKGROUND: Single-nucleotide polymorphisms (SNPs) are the form of molecular genetic variation most widely used in genetic studies. As reference genomes and resequencing data sets expand exponentially, tools must be in place to call SNPs at a similar pace. The genome analysis toolkit (GATK) is one of the most widely used SNP calling software tools publicly available, but unfortunately, high-performance computing versions of this tool have yet to become widely available and affordable. RESULTS: Here we report an open-source high-performance computing genome variant calling workflow (HPC-GVCW) for GATK that can run on multiple computing platforms from supercomputers to desktop machines. We benchmarked HPC-GVCW on multiple crop species for performance and accuracy, obtaining results comparable to previously published reports (using GATK alone). Finally, we used HPC-GVCW in production mode to call SNPs on a "subpopulation aware" 16-genome rice reference panel with ~3000 resequenced rice accessions. The entire process took ~16 weeks and resulted in the identification of an average of 27.3 M SNPs/genome and the discovery of ~2.3 million novel SNPs that were not present in the flagship reference genome for rice (i.e., IRGSP RefSeq). CONCLUSIONS: This study developed an open-source pipeline (HPC-GVCW) to run GATK on HPC platforms, which significantly improved the speed at which SNPs can be called. The workflow is widely applicable as demonstrated successfully for four major crop species with genomes ranging in size from 400 Mb to 2.4 Gb. Using HPC-GVCW in production mode to call SNPs on a 25 multi-crop-reference genome data set produced over 1.1 billion SNPs that were publicly released for functional and breeding studies. For rice, many novel SNPs were identified and were found to reside within genes and open chromatin regions that are predicted to have functional consequences.
Combined, our results demonstrate the usefulness of combining a high-performance SNP calling architecture solution with a subpopulation-aware reference genome panel for rapid SNP discovery and public deployment.
Subjects
Plant Genome , Single Nucleotide Polymorphism , Workflow , Plant Breeding , Software , High-Throughput Nucleotide Sequencing/methods
ABSTRACT
Tropical oligotrophic oceanic areas are characterized by high water transparency and annual solar radiation. Under these conditions, a large number of phylogenetically diverse mesozooplankton species living in the surface waters (neuston) are found to be blue pigmented. In the present study, we focused on understanding the metabolic and genetic basis of the observed blue phenotype functional equivalence between the blue-pigmented organisms from the phylum Arthropoda, subclass Copepoda (Acartia fossae) and the phylum Chordata, class Appendicularia (Oikopleura dioica) in the Red Sea. Previous studies have shown that carotenoid-protein complexes are responsible for blue coloration in crustaceans. Therefore, we performed carotenoid metabolic profiling using both targeted and nontargeted (high-resolution mass spectrometry) approaches in four different blue-pigmented genera of copepods and one blue-pigmented species of appendicularian. Astaxanthin was found to be the principal carotenoid in all the species. The pathway analysis showed that all the species can synthesize astaxanthin from β-carotene, ingested from dietary sources, via 3-hydroxyechinenone, canthaxanthin, zeaxanthin, adonirubin or adonixanthin. Further, using a de novo assembled transcriptome of blue A. fossae (subclass Copepoda), we identified highly expressed homologous β-carotene hydroxylase enzymes and putative carotenoid-binding proteins responsible for astaxanthin formation and the blue phenotype. In blue O. dioica (class Appendicularia), corresponding putative genes were identified from the reference genome. Collectively, our data provide molecular evidence for the bioconversion and accumulation of blue astaxanthin-protein complexes underpinning the observed ecological functional equivalence and adaptive convergence among neustonic mesozooplankton.
Subjects
Copepoda/genetics , Metabolome , Transcriptome , Urochordata/genetics , Amino Acid Sequence , Animals , Copepoda/chemistry , Indian Ocean , Lipocalins/chemistry , Mixed Function Oxygenases/chemistry , Molecular Sequence Data , Phylogeny , Pigmentation , Urochordata/chemistry , Xanthophylls/chemistry
ABSTRACT
Understanding and exploiting genetic diversity is a key factor for the productive and stable production of rice. Here, we utilize 73 high-quality genomes that encompass the subpopulation structure of Asian rice (Oryza sativa), plus the genomes of two wild relatives (O. rufipogon and O. punctata), to build a pan-genome inversion index of 1769 non-redundant inversions that span an average of ~29% of the O. sativa cv. Nipponbare reference genome sequence. Using this index, we estimate an inversion rate of ~700 inversions per million years in Asian rice, which is 16 to 50 times higher than previously estimated for plants. Detailed analyses of these inversions show evidence of their effects on gene expression, recombination rate, and linkage disequilibrium. Our study uncovers the prevalence and scale of large inversions (≥100 bp) across the pan-genome of Asian rice and hints at their largely unexplored role in functional biology and crop performance.
Subjects
Oryza , Oryza/genetics , DNA Sequence Analysis , Plant Genome/genetics , Biological Evolution , Phylogeny
ABSTRACT
The InterPro database (http://www.ebi.ac.uk/interpro/) integrates predictive models or 'signatures' representing protein domains, families and functional sites from multiple, diverse source databases: Gene3D, PANTHER, Pfam, PIRSF, PRINTS, ProDom, PROSITE, SMART, SUPERFAMILY and TIGRFAMs. Integration is performed manually, and approximately half of the roughly 58,000 signatures available in the source databases belong to an InterPro entry. Recently, we have started to also display the remaining un-integrated signatures via our web interface. Other developments include the provision of non-signature data, such as structural data, in new XML files on our FTP site, as well as the inclusion of matchless UniProtKB proteins in the existing match XML files. The web interface has been extended and now links out to the ADAN predicted protein-protein interaction database and the SPICE and Dasty viewers. The latest public release (v18.0) covers 79.8% of UniProtKB (v14.1) and consists of 16 549 entries. InterPro data may be accessed either via the web address above, via web services, by downloading files by anonymous FTP or by using the InterProScan search software (http://www.ebi.ac.uk/Tools/InterProScan/).
Subjects
Protein Databases , Protein Sequence Analysis , Proteins/chemistry , Proteins/classification , Systems Integration
ABSTRACT
BACKGROUND: Merging stem cells with biomimetic materials represents an attractive approach to tissue engineering. The development of an alternative scaffold with the ability to mimic the extracellular matrix, and the 3D gradient preventing any alteration in cell metabolism or in their gene expression patterns, would have many medical applications. OBJECTIVE: In this study, we introduced the use of RGD (Arg-Gly-Asp) bio-conjugated cotton to promote the growth and proliferation of mesenchymal stem cells (MSCs). METHODS: We measured the expression of stem cell markers and adhesion markers with Q-PCR and analyzed the transcriptome. The results obtained showed that the MSCs, when cultured with bio-conjugated cotton fibers, form aggregates around the fibers while proliferating. The MSCs seeded with cotton fibers proliferated in a similar fashion to the cells seeded on the monolayer (population doubling level 1.88 and 2.19 respectively). RESULTS: The whole genome sequencing of cells adhering to these cotton fibers and cells adhering to the cell culture dish showed differentially expressed genes and pathways in both populations. However, the expression of the stem cell markers (Oct4, cKit, CD105) and cell adhesion markers (CD29, HSPG2 and CD138), when examined with quantitative RT-PCR, was maintained in both cell populations. CONCLUSION: These results clearly show the ability of the cotton fibers to promote MSC growth and proliferation in a 3D structure mimicking the in vivo environment without losing their stem cell phenotype.
Subjects
Cell Proliferation , Cotton Fiber , Cell Differentiation , Cultured Cells , Mesenchymal Stem Cells , Oligopeptides , Tissue Scaffolds
ABSTRACT
BACKGROUND: While programmed cell death receptor 1 (PD-1) blockade has revolutionized the treatment of patients with melanoma, clinical outcomes are highly variable, and only a fraction of patients show durable responses. Therefore, there is a clear need for predictive biomarkers to select patients who will benefit from the treatment. METHOD: To identify potential predictive markers for response to PD-1 checkpoint blockade immunotherapy, we conducted single-cell RNA sequencing analyses of peripheral blood mononuclear cells (PBMC) (n=8), as well as an in-depth immune monitoring study (n=20) by flow cytometry in patients with advanced melanoma undergoing treatment with nivolumab at Karolinska University Hospital. Blood samples were collected before the start of treatment and at the time of the second dose. RESULTS: Unbiased single-cell RNA sequencing of PBMC in patients with melanoma uncovered that a higher frequency of monocytes and a lower ratio of CD4+ T cells to monocytes were inversely associated with overall survival. Similarly, S100A9 expression in the monocytic subset was correlated inversely with overall survival. These results were confirmed by a flow cytometry-based analysis in an independent patient cohort. CONCLUSION: Our results suggest that monocytic cell populations can critically determine the outcome of PD-1 blockade, particularly the subset expressing S100A9, which should be further explored as a possible predictive biomarker. Detailed knowledge of the biological role of S100A9+ monocytes is of high translational relevance.
Subjects
Calgranulin B/blood , Immune Checkpoint Inhibitors/therapeutic use , Melanoma/drug therapy , Monocytes/metabolism , Nivolumab/therapeutic use , Programmed Cell Death 1 Receptor/antagonists & inhibitors , Skin Neoplasms/drug therapy , Adult , Aged , Aged 80 and over , Calgranulin B/genetics , Female , Flow Cytometry , Humans , Immune Checkpoint Inhibitors/adverse effects , Male , Melanoma/blood , Melanoma/immunology , Middle Aged , Monocytes/immunology , Nivolumab/adverse effects , Predictive Value of Tests , Programmed Cell Death 1 Receptor/metabolism , RNA-Seq , Single-Cell Analysis , Skin Neoplasms/blood , Skin Neoplasms/genetics , Skin Neoplasms/immunology , Sweden , Time Factors , Treatment Outcome
ABSTRACT
In mammals, LINE-1 (L1) retrotransposons constitute between 15% and 20% of the genome. Although only a few copies have retained the ability to retrotranspose, evidence in brain and differentiating pluripotent cells indicates that L1 retrotransposition occurs and creates mosaics in normal somatic tissues. The function of de novo insertions remains to be understood. The transdifferentiation of mouse embryonic fibroblasts to dopaminergic neuronal fate provides a suitable model for studying L1 dynamics in a defined genomic and unaltered epigenomic background. We found that L1 elements are specifically re-expressed and mobilized during the initial stages of reprogramming and that their insertions into specific acceptor loci coincide with higher chromatin accessibility and creation of new transcribed units. Those events accompany the maturation of neuronal committed cells. We conclude that L1 retrotransposition is a non-random process correlating with chromatin opening and lncRNA production that accompanies direct somatic cell reprogramming.
Subjects
Cell Transdifferentiation/genetics , Dopaminergic Neurons/cytology , Dopaminergic Neurons/metabolism , Fibroblasts/cytology , Fibroblasts/metabolism , Long Interspersed Nucleotide Elements , Animals , Biomarkers , Cell Culture Techniques , Cell Line , Computational Biology/methods , Fluorescent Antibody Technique , Gene Expression Profiling , Developmental Gene Expression Regulation , Genome , Mice , Retroelements , Whole Genome Sequencing
ABSTRACT
The impact of mammalian RNA interference components, particularly, Argonaute proteins, on chromatin organization is unexplored. Recent reports indicate that AGO1 association with chromatin appears to influence gene expression. To uncover the role of AGO1 in the nucleus, we used a combination of genome-wide approaches in control and AGO1-depleted HepG2 cells. We found that AGO1 strongly associates with active enhancers and RNA being produced at those sites. Hi-C analysis revealed AGO1 enrichment at the boundaries of topologically associated domains (TADs). By Hi-C in AGO1 knockdown cells, we observed changes in chromatin organization, including TADs and A/B compartment mixing, specifically in AGO1-bound regions. Distinct groups of genes and especially eRNA transcripts located within differentially interacting loci showed altered expression upon AGO1 depletion. Moreover, AGO1 association with enhancers is dependent on eRNA transcription. Collectively, our data suggest that enhancer-associated AGO1 contributes to the fine-tuning of chromatin architecture and gene expression in human cells.
Subjects
Argonaute Proteins/genetics , Argonaute Proteins/metabolism , Eukaryotic Initiation Factors/genetics , Eukaryotic Initiation Factors/metabolism , Gene Expression Regulation/genetics , Cell Nucleus/genetics , Chromatin/genetics , Chromatin Assembly and Disassembly/genetics , Genetic Enhancer Elements/genetics , Gene Expression/genetics , Gene Expression Regulation/physiology , Human Genome/genetics , HEK293 Cells , Hep G2 Cells , Humans
ABSTRACT
BACKGROUND: Despite considerable efforts within the microarray community for standardising data format, content and description, microarray technologies present major challenges in managing, sharing, analysing and re-using the large amount of data generated locally or internationally. Additionally, it is recognised that inconsistent and low quality experimental annotation in public data repositories significantly compromises the re-use of microarray data for meta-analysis. MiMiR, the Microarray data Mining Resource, was designed to tackle some of these limitations and challenges. Here we present new software components and enhancements to the original infrastructure that increase accessibility, utility and opportunities for large scale mining of experimental and clinical data. RESULTS: A user friendly Online Annotation Tool allows researchers to submit detailed experimental information via the web at the time of data generation rather than at the time of publication. This ensures the easy access and high accuracy of meta-data collected. Experiments are programmatically built in the MiMiR database from the submitted information and details are systematically curated and further annotated by a team of trained annotators using a new Curation and Annotation Tool. Clinical information can be annotated and coded with a clinical Data Mapping Tool within an appropriate ethical framework. Users can visualise experimental annotation, assess data quality, download and share data via a web-based experiment browser called MiMiR Online. All requests to access data in MiMiR are routed through a sophisticated middleware security layer thereby allowing secure data access and sharing amongst MiMiR registered users prior to publication. Data in MiMiR can be mined and analysed using the integrated EMAAS open source analysis web portal or via export of data and meta-data into the Rosetta Resolver data analysis package.
CONCLUSION: The new MiMiR suite of software enables systematic and effective capture of extensive experimental and clinical information with the highest MIAME score, and secure data sharing prior to publication. MiMiR currently contains more than 150 experiments corresponding to over 3000 hybridisations and supports the Microarray Centre's large microarray user community and two international consortia. The MiMiR flexible and scalable hardware and software architecture enables secure warehousing of thousands of datasets, including clinical studies, from microarray and potentially other -omics technologies.
Subjects
Database Management Systems , Information Storage and Retrieval/methods , Microarray Analysis , User-Computer Interface , Information Dissemination/methods , Internet/organization & administration , Microarray Analysis/methods , Microarray Analysis/statistics & numerical data , Research Design
ABSTRACT
Changes in the environment, such as those caused by climate change, can exert stress on plant growth, diversity and ultimately global food security. Thus, focused efforts to fully understand plant response to stress are urgently needed in order to develop strategies to cope with the effects of climate change. Because Physcomitrella patens holds a key evolutionary position bridging the gap between green algae and higher plants, and because it exhibits a well-developed stress tolerance, it is an excellent model for such exploration. Here, we have used Physcomitrella patens to study genome-wide responses to abiotic stress through transcriptomic analysis by a high-throughput sequencing platform. We report a comprehensive analysis of transcriptome dynamics, defining profiles of elicited gene regulation responses to the abiotic stress-associated hormone abscisic acid (ABA), cold, drought, and salt treatments. We identified more than 20,000 genes expressed under each of the aforementioned stress treatments, of which 9,668 display differential expression in response to stress. The comparison of Physcomitrella patens stress regulated genes with unicellular algae, vascular and flowering plants revealed genomic delineation concomitant with the evolutionary movement to land, including a general gene family complexity and loss of genes associated with different functional groups.
Subjects
Biological Evolution , Bryopsida/genetics , Plant Gene Expression Regulation , Genome-Wide Association Study , Physiological Stress/genetics , Abscisic Acid/pharmacology , Chromosome Mapping , Cluster Analysis , Computational Biology/methods , Gene Expression Profiling , Plant Gene Expression Regulation/drug effects , Gene Ontology , Plant Genome , Reproducibility of Results , Transcriptome
ABSTRACT
The development of computational resources to visualize and explore data from combined genome-wide expression and linkage studies is essential for the development of testable hypotheses. eQTL Explorer stores expression profiles, linkage data and information from external sources in a relational database and enables simultaneous visualization and intuitive interpretation of the combined data via a Java graphical interface. eQTL Explorer provides a new and powerful tool to interrogate these very large and complex datasets.