1.
Brief Bioinform ; 21(1): 96-105, 2020 Jan 17.
Article in English | MEDLINE | ID: mdl-30462158

ABSTRACT

The paper reviews the use of the Hadoop platform in structural bioinformatics applications. For structural bioinformatics, Hadoop provides a new framework to analyse large fractions of the Protein Data Bank, which is key for high-throughput studies of, for example, protein-ligand docking, clustering of protein-ligand complexes and structural alignment. Specifically, we review a number of Hadoop implementations of high-throughput analyses reported in the literature, together with their scalability. We find that these deployments for the most part use known executables called from MapReduce rather than rewriting the algorithms. Scalability relative to other batch schedulers is variable and hard to assess, because direct comparisons of Hadoop with batch schedulers on the same platform are generally absent from the literature; we note, however, that there is some evidence that Message Passing Interface implementations scale better than Hadoop. A significant barrier to the use of the Hadoop ecosystem is the difficulty of the interface and of configuring a resource to use Hadoop. This will improve over time as interfaces to Hadoop (e.g. Spark) improve, usage of cloud platforms (e.g. Azure and Amazon Web Services (AWS)) increases and standardised approaches such as Workflow Languages (i.e. Workflow Definition Language, Common Workflow Language and Nextflow) are taken up.
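
The pattern described above, in which an existing executable is called from MapReduce rather than rewritten, can be illustrated with a mapper in the style of Hadoop Streaming, where a mapper reads records line by line and emits tab-separated key/value pairs. This is only a sketch: the names `run_tool` and `map_records` and the stand-in command are assumptions made for illustration and are not taken from any of the reviewed implementations.

```python
import subprocess
import sys


def run_tool(command, record):
    """Run an existing executable on one input record and return its stdout."""
    result = subprocess.run(
        command + [record],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def map_records(lines, command):
    """Yield one tab-separated key/value pair per non-empty input line."""
    for line in lines:
        record = line.strip()
        if not record:
            continue
        yield f"{record}\t{run_tool(command, record)}"


if __name__ == "__main__":
    # Hypothetical stand-in for a docking or alignment binary: a Python
    # one-liner that reports the length of its argument.
    stand_in = [sys.executable, "-c", "import sys; print(len(sys.argv[1]))"]
    # Under Hadoop Streaming the records would arrive on sys.stdin; a small
    # in-memory list is used here so the sketch runs on its own.
    for pair in map_records(["1abc", "2xyz9"], stand_in):
        print(pair)
```

Keeping the call to the external tool in one small function means the wrapped binary can be swapped without touching the mapper logic, which is presumably part of the appeal of this approach over rewriting the algorithm.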

3.
Sci Eng Ethics ; 26(4): 2189-2213, 2020 Aug.
Article in English | MEDLINE | ID: mdl-32067185

ABSTRACT

Data science skills are rapidly becoming a necessity in modern science. In response to this need, institutions and organizations around the world are developing research data science curricula to teach the programming and computational skills that are needed to build and maintain data infrastructures and maximize the use of available data. To date, however, few of these courses have included an explicit ethics component, and developing such components can be challenging. This paper describes a novel approach to teaching data ethics on short courses developed for the CODATA-RDA Schools for Research Data Science. The ethics content of these schools is centred on the concept of open and responsible (data) science citizenship that draws on virtue ethics to promote ethics of practice. Although formal teaching time is limited, this concept of citizenship is made central to the course by distributing ethics content across technical modules. Ethics instruction uses a wide range of techniques, including stand-alone lectures, group discussions and mini-exercises linked to technical modules. This multi-level approach enables students to develop an understanding both of "responsible and open (data) science citizenship", and of how such responsibilities are implemented in daily research practices within their home environment. This approach successfully locates ethics within daily data science practice, and allows students to see how small actions build into larger ethical concerns. This emphasises that ethics are not something "removed from daily research" or the remit of data generators/end users, but rather are a vital concern for all data scientists.


Subject(s)
Curriculum; Ethics, Medical; Humans; Teaching; Virtues
4.
Nucleic Acids Res ; 42(5): 3028-43, 2014 Mar.
Article in English | MEDLINE | ID: mdl-24357409

ABSTRACT

Our knowledge of the role of higher-order chromatin structures in transcription of microRNA genes (MIRs) is evolving rapidly. Here we investigate the effect of 3D architecture of chromatin on the transcriptional regulation of MIRs. We demonstrate that MIRs have transcriptional features that are similar to protein-coding genes. RNA polymerase II-associated ChIA-PET data reveal that many groups of MIRs and protein-coding genes are organized into functionally compartmentalized chromatin communities and undergo coordinated expression when their genomic loci are spatially colocated. We observe that MIRs display widespread communication in those transcriptionally active communities. Moreover, miRNA-target interactions are significantly enriched among communities with functional homogeneity while depleted from the same community from which they originated, suggesting that MIRs coordinate function-related pathways at the posttranscriptional level. Further investigation demonstrates the existence of spatial MIR-MIR chromatin interacting networks. We show that groups of spatially coordinated MIRs are frequently from the same family and involved in the same disease category. The spatial interaction network possesses both common and cell-specific subnetwork modules that result from the spatial organization of chromatin within different cell types. Together, our study unveils an entirely unexplored layer of MIR regulation throughout the human genome that links the spatial coordination of MIRs to their co-expression and function.


Subject(s)
Chromatin/metabolism; Gene Expression Regulation; MicroRNAs/genetics; Chromatin/chemistry; Humans; K562 Cells; MCF-7 Cells; MicroRNAs/biosynthesis; Transcription, Genetic
5.
Plant Physiol ; 161(4): 1930-51, 2013 Apr.
Article in English | MEDLINE | ID: mdl-23439917

ABSTRACT

Phytohormones regulate plant growth from cell division to organ development. Jasmonates (JAs) are signaling molecules that have been implicated in stress-induced responses. They have also been shown to inhibit plant growth, although the mechanisms are not well understood. The effects of methyl jasmonate (MeJA) on leaf growth regulation were investigated in Arabidopsis (Arabidopsis thaliana) mutants altered in JA synthesis and perception, allene oxide synthase and coi1-16B (for coronatine insensitive1), respectively. We show that MeJA inhibits leaf growth through the JA receptor COI1 by reducing both cell number and size. Further investigations using flow cytometry analyses allowed us to evaluate ploidy levels and to monitor cell cycle progression in leaves and cotyledons of Arabidopsis and/or Nicotiana benthamiana at different stages of development. Additionally, a novel global transcription profiling analysis involving continuous treatment with MeJA was carried out to identify the molecular players whose expression is regulated during leaf development by this hormone and COI1. The results of these studies revealed that MeJA delays the switch from the mitotic cell cycle to the endoreduplication cycle, which accompanies cell expansion, in a COI1-dependent manner and inhibits the mitotic cycle itself, arresting cells in G1 phase prior to the S-phase transition. Significantly, we show that MeJA activates critical regulators of endoreduplication and affects the expression of key determinants of DNA replication. Our discoveries also suggest that MeJA may contribute to the maintenance of a cellular "stand-by mode" by keeping the expression of ribosomal genes at an elevated level. Finally, we propose a novel model for MeJA-regulated COI1-dependent leaf growth inhibition.


Subject(s)
Acetates/pharmacology; Arabidopsis/cytology; Arabidopsis/genetics; Cyclopentanes/pharmacology; Endoreduplication/drug effects; Oxylipins/pharmacology; Plant Leaves/cytology; Plant Leaves/growth & development; Arabidopsis/drug effects; Arabidopsis/growth & development; Arabidopsis Proteins/genetics; Arabidopsis Proteins/metabolism; Cell Count; Cell Nucleus/drug effects; Cell Nucleus/metabolism; Cell Nucleus Size/drug effects; Cell Proliferation/drug effects; Cell Size/drug effects; Cluster Analysis; Cotyledon/drug effects; Cotyledon/growth & development; DNA Replication/drug effects; DNA, Plant/metabolism; Down-Regulation/drug effects; Endoreduplication/genetics; Gene Expression Profiling; Gene Expression Regulation, Plant/drug effects; Meristem/cytology; Meristem/drug effects; Mitosis/drug effects; Mitosis/genetics; Models, Biological; Phenotype; Plant Leaves/drug effects; Ribosomal Proteins/genetics; Ribosomal Proteins/metabolism
6.
Nucleic Acids Res ; 40(8): 3307-15, 2012 Apr.
Article in English | MEDLINE | ID: mdl-22199258

ABSTRACT

Probes with runs of four or more guanines (G-stacks) in their sequences can exhibit a level of hybridization that is unrelated to the expression levels of the mRNA that they are intended to measure. This is most likely caused by the formation of G-quadruplexes, which probes with G-stacks are capable of forming and in which guanines on neighbouring probes form Hoogsteen hydrogen bonds. We demonstrate that for a specific microarray data set using the Human HG_U133A Affymetrix GeneChip and RMA normalization there is significant bias in the expression levels, the fold change and the correlations between expression levels. These effects grow more pronounced as the number of G-stack probes in a probe set increases. Approximately 14% of the probe sets are directly affected. The analysis was repeated for a number of other normalization pipelines and two, FARMS and PLIER, minimized the bias to some extent. We estimate that ∼15% of the data sets deposited in the GEO database are susceptible to the effect. The inclusion of G-stack probes in the affected data sets can bias key parameters used in the selection and clustering of genes. The benefit of eliminating these probes from any analysis of such affected data sets outweighs the resulting increase in noise in the signal.
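
The definition of a G-stack used above (a run of four or more guanines in a probe sequence) is simple enough to sketch in code. The following is a minimal illustration, assuming probe sequences are available as plain strings; the function names and the example sequences are hypothetical and do not come from the paper.

```python
import re

# A G-stack is a run of four or more consecutive guanines.
G_STACK = re.compile(r"G{4,}")


def has_g_stack(probe_seq):
    """Return True if the probe sequence contains a run of >= 4 guanines."""
    return G_STACK.search(probe_seq.upper()) is not None


def count_g_stack_probes(probe_set):
    """Count how many probes in a probe set contain a G-stack."""
    return sum(has_g_stack(seq) for seq in probe_set)


def filter_probes(probe_set):
    """Return the probes of a probe set that do not contain a G-stack."""
    return [seq for seq in probe_set if not has_g_stack(seq)]


if __name__ == "__main__":
    # Made-up sequences for illustration only.
    probe_set = ["ACGGGGTACCA", "ACGGGTACGGG", "ttggggccaat"]
    print(count_g_stack_probes(probe_set))
    print(filter_probes(probe_set))
```

Counting affected probes per probe set matters here because the abstract reports that the bias grows with the number of G-stack probes in a probe set.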


Subject(s)
DNA Probes/chemistry; G-Quadruplexes; Gene Expression Profiling; Oligonucleotide Array Sequence Analysis; Humans
7.
J Comput Biol ; 30(2): 131-148, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36689201

ABSTRACT

Given the wide variability in the quality of next-generation sequencing data submitted to public repositories, it is essential to identify methods that can perform quality control on these data sets when additional quality control data, such as mean tile data, are missing from public repositories. In this study, we present evidence that correlating counts of reads corresponding to pairs of motifs separated over specific distances on individual exons can be used as a proxy for mean tile data in the data sets we analyzed and hence could be used when mean tile data are not available. As test data sets we use the Homo sapiens in vitro transcribed (IVT) data set, and a Drosophila melanogaster data set comprising wild and mutant types. We find that a FastQC analysis of the available parts of these data sets demonstrates that the per-tile sequencing quality is good for all the data sets apart from the mutant-type data where the mutant-r3 data are worse than the mutant-r2 data. Correspondingly, intra-exon motif correlations are reasonably large for all data sets except this latter case where the mutant-r2 correlations are low and the mutant-r3 correlations close to zero. We propose that these extremely low correlations are indicative of bias of technical origin, such as flowcell errors. In addition, the intra-exon motif correlations as a function of both guanosine-cytosine (GC) content parameters are somewhat higher and less dependent on the GC content parameters in the IVT-Plasmids RNA-Seq sample (the control, prepared without messenger RNA (mRNA) selection) than in the other RNA-Seq samples, which did undergo mRNA selection by either ribosomal depletion (IVT-Only) or PolyA selection (IVT-PolyA, wild type, and mutant).


Subject(s)
Drosophila Proteins; Drosophila melanogaster; Animals; RNA-Seq; Drosophila melanogaster/genetics; RNA, Messenger/genetics; Exons/genetics; High-Throughput Nucleotide Sequencing/methods; Sequence Analysis, RNA/methods; Gene Expression Profiling/methods; Drosophila Proteins/genetics; Period Circadian Proteins/genetics
8.
Front Res Metr Anal ; 7: 912456, 2022.
Article in English | MEDLINE | ID: mdl-35965666

ABSTRACT

The FAIR data principles are rapidly becoming a standard through which to assess responsible and reproducible research. In contrast to the requirements associated with the Interoperability principle, the requirements associated with the Accessibility principle are often assumed to be relatively straightforward to implement. Indeed, a variety of different tools assessing FAIR rely on the data being deposited in a trustworthy digital repository. In this paper we note that there is an implicit assumption that access to a repository is independent of where the user is geographically located. Using a virtual private network (VPN) service we find that access to a set of web sites that underpin Open Science is variable across a set of 14 countries, either through connectivity issues (i.e., connections to download HTML being dropped) or through direct blocking (i.e., web servers sending 403 error codes). Many of the countries included in this study are already marginalized from Open Science discussions due to political issues or infrastructural challenges. This study clearly indicates that access to FAIR data resources is influenced by a range of geo-political factors. Given the volatile nature of politics and the slow pace of infrastructural investment, this is likely to continue to be an issue and indeed may grow. We propose that it is essential for discussions and implementations of FAIR to include awareness of these issues of accessibility. Without this awareness, the expansion of FAIR data may unintentionally reinforce current access inequities and research inequalities around the globe.
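
The two failure modes distinguished above (dropped connections versus servers returning 403) can be told apart programmatically. The sketch below uses only the Python standard library and is an assumption about how such a check might be written, not the authors' tooling; the function names and the returned labels are hypothetical, and routing the request through a VPN exit in a given country is outside its scope.

```python
import urllib.error
import urllib.request


def classify_status(status):
    """Map an HTTP status code to a coarse access label."""
    if status == 403:
        return "blocked"
    if 200 <= status < 400:
        return "ok"
    return f"http_{status}"


def classify_access(url, opener=urllib.request.urlopen, timeout=10):
    """Try to fetch a URL and report 'ok', 'blocked', 'dropped' or 'http_<code>'."""
    try:
        with opener(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 403.
        status = err.code
    except OSError:
        # Covers URLError, connection resets and timeouts: no usable answer.
        return "dropped"
    return classify_status(status)
```

The `opener` parameter exists only so that the function can be exercised without network access; in normal use the default `urllib.request.urlopen` would be kept.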

9.
J Exp Bot ; 62(8): 2973-87, 2011 May.
Article in English | MEDLINE | ID: mdl-21398429

ABSTRACT

The shade avoidance syndrome (SAS) allows plants to anticipate and avoid shading by neighbouring plants by initiating an elongation growth response. The phytochrome photoreceptors are able to detect a reduction in the red:far red ratio in incident light, the result of selective absorption of red and blue wavelengths by proximal vegetation. A shade-responsive luciferase reporter line (PHYB::LUC) was used to carry out a high-throughput screen to identify novel SAS mutants. The dracula 1 (dra1) mutant, which showed no avoidance of shade for the PHYB::LUC response, was the result of a mutation in the PHYA gene. Like previously characterized phyA mutants, dra1 showed a long hypocotyl in far red light and an enhanced hypocotyl elongation response to shade. However, dra1 additionally showed a long hypocotyl in red light. Since phyB levels are relatively unaffected in dra1, this gain-of-function red light phenotype strongly suggests a disruption of phyB signalling. The dra1 mutation, G773E within the phyA PAS2 domain, occurs at a residue absolutely conserved among phyA sequences. The equivalent residue in phyB is absolutely conserved as a threonine. PAS domains are structurally conserved domains involved in molecular interaction. Structural modelling of the dra1 mutation within the phyA PAS2 domain shows some similarity with the structure of the phyB PAS2 domain, suggesting that the interference with phyB signalling may be the result of non-functional mimicry. Hence, it was hypothesized that this PAS2 residue forms a key distinction between the phyA and phyB phytochrome species.


Subject(s)
Arabidopsis Proteins/genetics; Arabidopsis/genetics; Arabidopsis/physiology; High-Throughput Screening Assays/methods; Mutation/genetics; Phytochrome A/genetics; Alleles; Arabidopsis/radiation effects; Arabidopsis Proteins/metabolism; Chromosome Segregation/genetics; Chromosome Segregation/radiation effects; Cloning, Molecular; Gene Expression Regulation, Plant/radiation effects; Genes, Plant/genetics; Genes, Reporter/genetics; Hypocotyl/growth & development; Hypocotyl/radiation effects; Light; Luciferases/metabolism; Models, Molecular; Phenotype; Phytochrome B/metabolism; RNA, Messenger/genetics; RNA, Messenger/metabolism; Signal Transduction/radiation effects
10.
Patterns (N Y) ; 2(10): 100324, 2021 Oct 08.
Article in English | MEDLINE | ID: mdl-34693369

ABSTRACT

We evaluate recent efforts to further the effective teaching of FAIR data principles by examining existing and developing educational frameworks focused upon FAIR, training initiatives that have informed teaching on FAIR skills' topics, and a number of key sources for discovering FAIR training materials and how much those sources provide descriptive information about the materials. FAIR4S, providing a coherent description of skills and competencies, is analyzed by target audience using the description of actors found in a European Open Science Cloud ecosystem report and by comparison of the coverage and extent of description of educational and training materials available from the list of sources for finding such materials. Our analysis elucidates the importance of linking resources to FAIR-related educational frameworks, providing consistent descriptions of them using a community-based metadata scheme, and developing an instructor community of practice where ideas and methods can be shared on how to teach FAIR data skills.

11.
Wellcome Open Res ; 5: 267, 2020.
Article in English | MEDLINE | ID: mdl-33501381

ABSTRACT

The systemic challenges of the COVID-19 pandemic require cross-disciplinary collaboration in a global and timely fashion. Such collaboration needs open research practices and the sharing of research outputs, such as data and code, thereby facilitating research and research reproducibility and timely collaboration beyond borders. The Research Data Alliance COVID-19 Working Group recently published a set of recommendations and guidelines on data sharing and related best practices for COVID-19 research. These guidelines include recommendations for clinicians, researchers, policy- and decision-makers, funders, publishers, public health experts, disaster preparedness and response experts, infrastructure providers from the perspective of different domains (Clinical Medicine, Omics, Epidemiology, Social Sciences, Community Participation, Indigenous Peoples, Research Software, Legal and Ethical Considerations), and other potential users. Several overarching themes have emerged from this document, such as the need to balance the creation of data adherent to FAIR principles (findable, accessible, interoperable and reusable) with the need for quick data release; the use of trustworthy research data repositories; the use of well-annotated data with meaningful metadata; and practices of documenting methods and software. The resulting document marks an unprecedented cross-disciplinary, cross-sectoral, and cross-jurisdictional effort authored by over 160 experts from around the globe. This letter summarises key points of the Recommendations and Guidelines, highlights the relevant findings, shines a spotlight on the process, and suggests how these developments can be leveraged by the wider scientific community.

12.
J Integr Bioinform ; 14(3)2017 Sep 23.
Article in English | MEDLINE | ID: mdl-28941355

ABSTRACT

Detecting sources of bias in transcriptomic data is essential to determine signals of biological significance. We outline a novel method to detect sequence-specific bias in short-read Next Generation Sequencing data. This is based on determining intra-exon correlations between specific motifs. This requires a mild assumption that short reads sampled from specific regions from the same exon will be correlated with each other. This has been implemented on Apache Spark and used to analyse two D. melanogaster eye-antennal disc data sets generated at the same laboratory. The wild type data set in Drosophila indicates a variation due to motif GC content that is more significant than that found due to exon GC content. The software is available online and could be applied for cross-experiment transcriptome data analysis in eukaryotes.
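
The idea of an intra-exon motif correlation can be illustrated on toy data. The abstract does not specify the statistic or the data layout, so the sketch below makes two loud assumptions: read counts for each motif are already tabulated per exon in a dictionary, and the correlation is a plain Pearson coefficient computed across exons. It is not the Apache Spark implementation referred to above, and all names in it are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def motif_pair_correlation(exon_counts, motif_a, motif_b):
    """Correlate read counts of two motifs across the exons that contain both."""
    pairs = [
        (counts[motif_a], counts[motif_b])
        for counts in exon_counts.values()
        if motif_a in counts and motif_b in counts
    ]
    if len(pairs) < 2:
        raise ValueError("need at least two exons containing both motifs")
    xs, ys = zip(*pairs)
    return pearson(xs, ys)


if __name__ == "__main__":
    # Made-up counts: exon identifier -> {motif: number of reads at that motif}.
    toy_counts = {
        "exon1": {"ACGT": 10, "GGCC": 20},
        "exon2": {"ACGT": 20, "GGCC": 40},
        "exon3": {"ACGT": 30, "GGCC": 60},
    }
    print(motif_pair_correlation(toy_counts, "ACGT", "GGCC"))
```

Under the mild assumption stated in the abstract, reads sampled from the same exon should give a high correlation, so a value near zero would flag a data set for closer inspection.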


Subject(s)
High-Throughput Nucleotide Sequencing/methods; High-Throughput Nucleotide Sequencing/standards; Software; Animals; Bias; Drosophila melanogaster/genetics; Exons/genetics; Gene Expression Profiling; Transcriptome/genetics
13.
Nucleic Acids Res ; 31(24): 7189-98, 2003 Dec 15.
Article in English | MEDLINE | ID: mdl-14654694

ABSTRACT

A method to detect DNA-binding sites on the surface of a protein structure is important for functional annotation. This work describes the analysis of residue patches on the surface of DNA-binding proteins and the development of a method of predicting DNA-binding sites using a single feature of these surface patches. Surface patches and the DNA-binding sites were initially analysed for accessibility, electrostatic potential, residue propensity, hydrophobicity and residue conservation. From this, it was observed that the DNA-binding sites were, in general, amongst the top 10% of patches with the largest positive electrostatic scores. This knowledge led to the development of a prediction method in which patches of surface residues were selected such that they excluded residues with negative electrostatic scores. This method was used to make predictions for a data set of 56 non-homologous DNA-binding proteins. Correct predictions were made for 68% of the data set.
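
The observation that binding sites fall among the top 10% of patches by positive electrostatic score suggests a simple ranking step, sketched below. This is an illustration only: it assumes each patch already has a single precomputed electrostatic score, and the function name, the dictionary layout and the rounding rule are assumptions rather than details from the paper.

```python
import math


def top_positive_patches(patch_scores, fraction=0.10):
    """Return names of the highest-scoring patches, keeping only positive scores.

    patch_scores maps a patch name to its electrostatic score. The top
    `fraction` of patches is taken (at least one), rounding up.
    """
    ranked = sorted(patch_scores.items(), key=lambda item: item[1], reverse=True)
    keep = max(1, math.ceil(len(ranked) * fraction))
    return [name for name, score in ranked[:keep] if score > 0]


if __name__ == "__main__":
    # Made-up scores for ten hypothetical surface patches.
    scores = {f"patch{i}": float(i - 4) for i in range(10)}
    print(top_positive_patches(scores))
```

Filtering on `score > 0` after ranking means a protein whose best patches are all negatively charged yields no candidate at all, which seems closer to the spirit of the method than forcing a prediction.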


Subject(s)
Computational Biology/methods; DNA-Binding Proteins/chemistry; DNA-Binding Proteins/metabolism; DNA/metabolism; Binding Sites; Conserved Sequence; Databases, Genetic; Hydrophobic and Hydrophilic Interactions; Static Electricity
14.
Nucleic Acids Res ; 32(16): 4732-41, 2004.
Article in English | MEDLINE | ID: mdl-15356290

ABSTRACT

Robust methods to detect DNA-binding proteins from structures of unknown function are important for structural biology. This paper describes a method for identifying such proteins that have (i) a solvent-accessible structural motif necessary for DNA binding and (ii) a positive electrostatic potential in the binding region. We focus on three structural motifs: helix-turn-helix (HTH), helix-hairpin-helix (HhH) and helix-loop-helix (HLH). We find that the combination of these variables detects 78% of proteins with an HTH motif, which is a substantial improvement over previous work based purely on structural templates and is comparable to more complex methods of identifying DNA-binding proteins. Similar true positive fractions are achieved for the HhH and HLH motifs. We see evidence of wide evolutionary diversity for DNA-binding proteins with an HTH motif, and much smaller diversity for those with an HhH or HLH motif.


Subject(s)
Computational Biology/methods; DNA-Binding Proteins/chemistry; Amino Acid Motifs; Binding Sites; DNA-Binding Proteins/metabolism; Databases, Protein; Genomics; Helix-Loop-Helix Motifs; Helix-Turn-Helix Motifs; Models, Molecular; Static Electricity
15.
Gigascience ; 4: 23, 2015.
Article in English | MEDLINE | ID: mdl-25960871

ABSTRACT

BACKGROUND: The workflow for the production of high-throughput sequencing data from nucleic acid samples is complex. There is a series of protocol steps to be followed in the preparation of samples for next-generation sequencing. The bias introduced by a number of protocol steps, namely DNA fractionation, blunting, phosphorylation, adapter ligation and library enrichment, remains to be quantified. RESULTS: We examined the experimental metadata of the public repository Sequence Read Archive (SRA) in order to ascertain the level of annotation of important sequencing steps in submissions to the database. Using SQL relational database queries (using the SRAdb SQLite database generated by the Bioconductor consortium) to search for keywords commonly occurring in key preparatory protocol steps partitioned over studies, we found that 7.10%, 5.84% and 7.57% of all records (fragmentation, ligation and enrichment, respectively), had at least one keyword corresponding to one of the three protocol steps. Only 4.06% of all records, partitioned over studies, had keywords for all three steps in the protocol (5.58% of all SRA records). CONCLUSIONS: The current level of annotation in the SRA inhibits systematic studies of bias due to these protocol steps. Downstream from this, meta-analyses and comparative studies based on these data will have a source of bias that cannot be quantified at present.
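
The keyword search described above can be sketched with SQLite from Python. Everything specific in the sketch is an assumption made for illustration: the `records` table and its `protocol_text` column are a toy schema and not the real SRAdb layout, and the keyword lists are guesses, since the abstract does not list the terms that were searched.

```python
import sqlite3

# Hypothetical keyword lists; the actual search terms are not given above.
STEP_KEYWORDS = {
    "fragmentation": ["fragment", "sonic", "shear"],
    "ligation": ["ligat"],
    "enrichment": ["enrich", "amplif"],
}


def build_demo_db():
    """Create an in-memory toy table; this is NOT the real SRAdb schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE records (record_id INTEGER PRIMARY KEY, protocol_text TEXT)"
    )
    conn.executemany(
        "INSERT INTO records (protocol_text) VALUES (?)",
        [
            ("DNA was fragmented by sonication, adapters were ligated, "
             "library enriched by PCR",),
            ("Adapters were ligated to cDNA",),
            ("Standard protocol",),
            ("",),
        ],
    )
    return conn


def step_clause(keywords):
    """Return an SQL fragment and parameters matching any keyword of one step."""
    clause = " OR ".join("lower(protocol_text) LIKE ?" for _ in keywords)
    return f"({clause})", [f"%{kw.lower()}%" for kw in keywords]


def percent_with_steps(conn, steps):
    """Percentage of records whose text matches every step named in `steps`."""
    clauses, params = [], []
    for step in steps:
        clause, step_params = step_clause(STEP_KEYWORDS[step])
        clauses.append(clause)
        params.extend(step_params)
    where = " AND ".join(clauses)
    matched = conn.execute(
        f"SELECT COUNT(*) FROM records WHERE {where}", params
    ).fetchone()[0]
    total = conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]
    return 100.0 * matched / total


if __name__ == "__main__":
    conn = build_demo_db()
    for step in STEP_KEYWORDS:
        print(step, percent_with_steps(conn, [step]))
    print("all three", percent_with_steps(conn, list(STEP_KEYWORDS)))
```

Keywords are passed as bound parameters rather than interpolated into the SQL text, so the same query shape can be reused for any keyword list.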


Subject(s)
Computational Biology; Internet; User-Computer Interface
16.
PLoS One ; 9(7): e102642, 2014.
Article in English | MEDLINE | ID: mdl-25050811

ABSTRACT

We discuss the applicability of the Microsoft cloud computing platform, Azure, for bioinformatics. We focus on the usability of the resource rather than its performance. We provide an example of how R can be used on Azure to analyse a large amount of microarray expression data deposited at the public database ArrayExpress. In Appendix S1 we provide a walk-through that demonstrates explicitly how Azure can be used to perform these analyses, and we offer a comparison with a local computation. We note that the use of the Platform as a Service (PaaS) offering of Azure can represent a steep learning curve for bioinformatics developers who will usually have a Linux and scripting language background. On the other hand, the presence of an additional set of libraries makes it easier to deploy software in a parallel (scalable) fashion and explicitly manage such a production run with only a few hundred lines of code, most of which can be incorporated from a template. We propose that this environment is best suited for running stable bioinformatics software by users not involved with its development.


Subject(s)
Software; Computational Biology; Databases, Genetic; Humans; Internet; Oligonucleotide Array Sequence Analysis
18.
Plant Cell ; 20(4): 947-68, 2008 Apr.
Article in English | MEDLINE | ID: mdl-18424613

ABSTRACT

In darkness, shoot apex growth is repressed, but it becomes rapidly activated by light. We show that phytochromes and cryptochromes play largely redundant roles in this derepression in Arabidopsis thaliana. We examined the light activation of transcriptional changes in a finely resolved time course, comparing the shoot apex (meristem and leaf primordia) and the cotyledon and found >5700 differentially expressed genes. Early events specific to the shoot apices included the repression of genes for Really Interesting New Gene finger proteins and basic domain/leucine zipper and basic helix-loop-helix transcription factors. The downregulation of auxin and ethylene and the upregulation of cytokinin and gibberellin hormonal responses were also characteristic of shoot apices. In the apex, genes involved in ribosome biogenesis and protein translation were rapidly and synchronously induced, simultaneously with cell proliferation genes, preceding visible organ growth. Subsequently, the activation of signaling genes and transcriptional signatures of cell wall expansion, turgor generation, and plastid biogenesis were apparent. Furthermore, light regulates the forms and protein levels of two transcription factors with opposing functions in cell proliferation, E2FB and E2FC, through the Constitutively Photomorphogenic1 (COP1), COP9-Signalosome5, and Deetiolated1 light signaling molecules. These data provide the basis for reconstruction of the regulatory networks for light-regulated meristem, leaf, and cotyledon development.


Subject(s)
Arabidopsis/radiation effects; Cell Cycle/radiation effects; Cotyledon/cytology; Gene Expression/radiation effects; Light; Plant Shoots/cytology; Amino Acid Sequence; Arabidopsis/cytology; Arabidopsis/genetics; Genes, Plant; Multigene Family; Photosynthetic Reaction Center Complex Proteins/physiology; Polymerase Chain Reaction; Transcription, Genetic/radiation effects