Results 1 - 18 of 18
1.
J Comput Biol ; 30(2): 131-148, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36689201

ABSTRACT

Given the wide variability in the quality of next-generation sequencing data submitted to public repositories, it is essential to identify methods that can perform quality control on these data sets when additional quality control data, such as mean tile data, are missing. In this study, we present evidence that correlating counts of reads corresponding to pairs of motifs separated over specific distances on individual exons can serve as a proxy for mean tile data in the data sets we analyzed, and hence could be used when mean tile data are not available. As test data sets we use the Homo sapiens in vitro transcribed (IVT) data set and a Drosophila melanogaster data set comprising wild and mutant types. A FastQC analysis of the available parts of these data sets demonstrates that the per-tile sequencing quality is good for all the data sets apart from the mutant-type data, where the mutant-r3 data are worse than the mutant-r2 data. Correspondingly, intra-exon motif correlations are reasonably large for all data sets except this latter case, where the mutant-r2 correlations are low and the mutant-r3 correlations are close to zero. We propose that these extremely low correlations are indicative of bias of technical origin, such as flowcell errors. In addition, the intra-exon motif correlations as a function of both guanosine-cytosine (GC) content parameters are somewhat higher and less dependent on the GC content parameters in the IVT-Plasmids sample (the control, an RNA-Seq sample free of messenger RNA (mRNA) selection) than in the other RNA-Seq samples that did undergo mRNA selection: both ribosomal depletion (IVT-Only) and PolyA selection (IVT-PolyA, wild type, and mutant).


Subjects
Drosophila Proteins, Drosophila melanogaster, Animals, RNA-Seq, Drosophila melanogaster/genetics, Messenger RNA/genetics, Exons/genetics, High-Throughput Nucleotide Sequencing/methods, RNA Sequence Analysis/methods, Gene Expression Profiling/methods, Drosophila Proteins/genetics, Period Circadian Proteins/genetics
2.
Front Res Metr Anal ; 7: 912456, 2022.
Article in English | MEDLINE | ID: mdl-35965666

ABSTRACT

The FAIR data principles are rapidly becoming a standard through which to assess responsible and reproducible research. In contrast to the requirements associated with the Interoperability principle, the requirements associated with the Accessibility principle are often assumed to be relatively straightforward to implement. Indeed, a variety of different tools assessing FAIR rely on the data being deposited in a trustworthy digital repository. In this paper we note that there is an implicit assumption that access to a repository is independent of where the user is geographically located. Using a virtual private network (VPN) service we find that access to a set of web sites that underpin Open Science is variable from a set of 14 countries, either through connectivity issues (i.e., connections to download HTML being dropped) or through direct blocking (i.e., web servers sending 403 error codes). Many of the countries included in this study are already marginalized from Open Science discussions due to political issues or infrastructural challenges. This study clearly indicates that access to FAIR data resources is influenced by a range of geo-political factors. Given the volatile nature of politics and the slow pace of infrastructural investment, this is likely to continue to be an issue and indeed may grow. We propose that it is essential for discussions and implementations of FAIR to include awareness of these issues of accessibility. Without this awareness, the expansion of FAIR data may unintentionally reinforce current access inequities and research inequalities around the globe.
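
The two failure modes named in the abstract (dropped connections versus servers answering with a 403 code) can be told apart programmatically. The following is a minimal sketch using only the Python standard library; the function name `classify_access`, the injectable `opener` parameter and the category labels are illustrative assumptions, not the tooling used in the study, and routing the request through a VPN is left to the operating system.

```python
import urllib.error
import urllib.request


def classify_access(url, opener=urllib.request.urlopen, timeout=10):
    """Classify one access attempt (hypothetical helper, not from the paper).

    Returns 'ok', 'blocked' (HTTP 403), 'http_error' (any other HTTP
    status) or 'connection_failed' (no usable answer from the server).
    """
    try:
        with opener(url, timeout=timeout) as response:
            return "ok" if response.status == 200 else "http_error"
    except urllib.error.HTTPError as err:
        # The server answered but refused the request, e.g. 403 Forbidden.
        # HTTPError must be caught before OSError, which it subclasses.
        return "blocked" if err.code == 403 else "http_error"
    except OSError:
        # URLError, timeouts and reset connections all derive from OSError.
        return "connection_failed"
```

Running the same list of repository URLs through such a function from each VPN exit country would give one category per site and country, which is the kind of table the abstract summarises.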

3.
Patterns (N Y) ; 2(10): 100324, 2021 Oct 08.
Article in English | MEDLINE | ID: mdl-34693369

ABSTRACT

We evaluate recent efforts to further the effective teaching of FAIR data principles by examining existing and developing educational frameworks focused upon FAIR, training initiatives that have informed teaching on FAIR skills topics, and a number of key sources for discovering FAIR training materials, including the extent to which those sources provide descriptive information about the materials. FAIR4S, which provides a coherent description of skills and competencies, is analyzed by target audience using the description of actors found in a European Open Science Cloud ecosystem report, and by comparing the coverage and extent of description of educational and training materials available from the listed sources. Our analysis elucidates the importance of linking resources to FAIR-related educational frameworks, providing consistent descriptions of them using a community-based metadata scheme, and developing an instructor community of practice where ideas and methods can be shared on how to teach FAIR data skills.

5.
Sci Eng Ethics ; 26(4): 2189-2213, 2020 Aug.
Article in English | MEDLINE | ID: mdl-32067185

ABSTRACT

Data science skills are rapidly becoming a necessity in modern science. In response to this need, institutions and organizations around the world are developing research data science curricula to teach the programming and computational skills that are needed to build and maintain data infrastructures and maximize the use of available data. To date, however, few of these courses have included an explicit ethics component, and developing such components can be challenging. This paper describes a novel approach to teaching data ethics on short courses developed for the CODATA-RDA Schools for Research Data Science. The ethics content of these schools is centred on the concept of open and responsible (data) science citizenship that draws on virtue ethics to promote ethics of practice. Despite having little formal teaching time, this concept of citizenship is made central to the course by distributing ethics content across technical modules. Ethics instruction consists of a wide range of techniques, including stand-alone lectures, group discussions and mini-exercises linked to technical modules. This multi-level approach enables students to develop an understanding both of "responsible and open (data) science citizenship", and of how such responsibilities are implemented in daily research practices within their home environment. This approach successfully locates ethics within daily data science practice, and allows students to see how small actions build into larger ethical concerns. This emphasises that ethics are not something "removed from daily research" or the remit of data generators/end users, but rather are a vital concern for all data scientists.


Subjects
Curriculum, Medical Ethics, Humans, Teaching, Virtues
6.
Wellcome Open Res ; 5: 267, 2020.
Article in English | MEDLINE | ID: mdl-33501381

ABSTRACT

The systemic challenges of the COVID-19 pandemic require cross-disciplinary collaboration in a global and timely fashion. Such collaboration needs open research practices and the sharing of research outputs, such as data and code, thereby facilitating research and research reproducibility and timely collaboration beyond borders. The Research Data Alliance COVID-19 Working Group recently published a set of recommendations and guidelines on data sharing and related best practices for COVID-19 research. These guidelines include recommendations for clinicians, researchers, policy- and decision-makers, funders, publishers, public health experts, disaster preparedness and response experts, infrastructure providers, and other potential users, from the perspective of different domains (Clinical Medicine, Omics, Epidemiology, Social Sciences, Community Participation, Indigenous Peoples, Research Software, Legal and Ethical Considerations). Several overarching themes have emerged from this document, such as the need to balance the creation of data adherent to FAIR principles (findable, accessible, interoperable and reusable) with the need for quick data release; the use of trustworthy research data repositories; the use of well-annotated data with meaningful metadata; and practices of documenting methods and software. The resulting document marks an unprecedented cross-disciplinary, cross-sectoral, and cross-jurisdictional effort authored by over 160 experts from around the globe. This letter summarises key points of the Recommendations and Guidelines, highlights the relevant findings, shines a spotlight on the process, and suggests how these developments can be leveraged by the wider scientific community.

7.
Brief Bioinform ; 21(1): 96-105, 2020 Jan 17.
Article in English | MEDLINE | ID: mdl-30462158

ABSTRACT

The paper reviews the use of the Hadoop platform in structural bioinformatics applications. For structural bioinformatics, Hadoop provides a new framework to analyse large fractions of the Protein Data Bank that is key for high-throughput studies of, for example, protein-ligand docking, clustering of protein-ligand complexes and structural alignment. Specifically, we review a number of Hadoop implementations of high-throughput analyses reported in the literature, and their scalability. We find that these deployments for the most part use known executables called from MapReduce rather than rewriting the algorithms. The scalability exhibits a variable behaviour in comparison with other batch schedulers, particularly as direct comparisons on the same platform are generally not available. Direct comparisons of Hadoop with batch schedulers are absent in the literature, but we note there is some evidence that Message Passing Interface implementations scale better than Hadoop. A significant barrier to the use of the Hadoop ecosystem is the difficulty of the interface and of configuring a resource to use Hadoop. This will improve over time as interfaces to Hadoop (e.g. Spark) improve, usage of cloud platforms (e.g. Azure and Amazon Web Services (AWS)) increases, and standardised approaches such as workflow languages (i.e. Workflow Definition Language, Common Workflow Language and Nextflow) are taken up.

9.
J Integr Bioinform ; 14(3), 2017 Sep 23.
Article in English | MEDLINE | ID: mdl-28941355

ABSTRACT

Detecting sources of bias in transcriptomic data is essential to determine signals of biological significance. We outline a novel method to detect sequence-specific bias in short-read next-generation sequencing data. The method is based on determining intra-exon correlations between specific motifs. It requires the mild assumption that short reads sampled from specific regions of the same exon will be correlated with each other. The method has been implemented on Apache Spark and used to analyse two D. melanogaster eye-antennal disc data sets generated at the same laboratory. The wild-type Drosophila data set indicates a variation due to motif GC content that is more significant than that found due to exon GC content. The software is available online and could be applied for cross-experiment transcriptome data analysis in eukaryotes.
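
The core quantity in the abstract, a correlation between read counts for two motifs taken over exons, can be illustrated with a short sketch. This is a plain-Python illustration and not the authors' Apache Spark implementation; the input layout (a dictionary mapping a hypothetical exon identifier to per-motif read counts) and the function names are assumptions made for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation signal
    return cov / sqrt(var_x * var_y)


def motif_pair_correlation(counts, motif_a, motif_b):
    """Correlate read counts of two motifs across exons.

    counts maps an exon id to {motif: read count}; a motif missing
    from an exon is treated as a count of zero (an assumption).
    """
    exons = sorted(counts)
    xs = [counts[exon].get(motif_a, 0) for exon in exons]
    ys = [counts[exon].get(motif_b, 0) for exon in exons]
    return pearson(xs, ys)
```

Under the abstract's assumption that reads from the same exon are correlated, a value near zero for many motif pairs would be the kind of signal that points to technical bias rather than biology.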


Subjects
High-Throughput Nucleotide Sequencing/methods, High-Throughput Nucleotide Sequencing/standards, Software, Animals, Bias, Drosophila melanogaster/genetics, Exons/genetics, Gene Expression Profiling, Transcriptome/genetics
10.
Gigascience ; 4: 23, 2015.
Article in English | MEDLINE | ID: mdl-25960871

ABSTRACT

BACKGROUND: The workflow for the production of high-throughput sequencing data from nucleic acid samples is complex. There are a series of protocol steps to be followed in the preparation of samples for next-generation sequencing. The quantification of bias in a number of protocol steps, namely DNA fractionation, blunting, phosphorylation, adapter ligation and library enrichment, remains to be determined. RESULTS: We examined the experimental metadata of the public repository Sequence Read Archive (SRA) in order to ascertain the level of annotation of important sequencing steps in submissions to the database. Using SQL relational database queries (using the SRAdb SQLite database generated by the Bioconductor consortium) to search for keywords commonly occurring in key preparatory protocol steps partitioned over studies, we found that 7.10%, 5.84% and 7.57% of all records (fragmentation, ligation and enrichment, respectively), had at least one keyword corresponding to one of the three protocol steps. Only 4.06% of all records, partitioned over studies, had keywords for all three steps in the protocol (5.58% of all SRA records). CONCLUSIONS: The current level of annotation in the SRA inhibits systematic studies of bias due to these protocol steps. Downstream from this, meta-analyses and comparative studies based on these data will have a source of bias that cannot be quantified at present.
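
The keyword search described in the RESULTS can be sketched with Python's built-in `sqlite3` module. The table name `experiment`, the column name `protocol` and the keyword lists below are hypothetical stand-ins chosen for illustration; they are not taken from the paper or from the SRAdb schema, and the sketch does not reproduce the partitioning over studies.

```python
import sqlite3

# Hypothetical keyword stems for each protocol step (assumption).
KEYWORDS = {
    "fragmentation": ["fragment", "shear", "sonic"],
    "ligation": ["ligat"],
    "enrichment": ["enrich", "amplif", "pcr"],
}


def step_clause(words):
    """Build an OR-ed LIKE clause with one placeholder per keyword."""
    return "(" + " OR ".join("lower(protocol) LIKE ?" for _ in words) + ")"


def annotation_percentages(conn):
    """Percentage of records mentioning each step, and all three steps."""
    total = conn.execute("SELECT COUNT(*) FROM experiment").fetchone()[0]
    if total == 0:
        return {}
    result, clauses, all_params = {}, [], []
    for step, words in KEYWORDS.items():
        clause = step_clause(words)
        params = [f"%{word}%" for word in words]
        hits = conn.execute(
            f"SELECT COUNT(*) FROM experiment WHERE {clause}", params
        ).fetchone()[0]
        result[step] = 100.0 * hits / total
        clauses.append(clause)
        all_params.extend(params)
    # Records annotated for every step: AND the per-step clauses together.
    hits = conn.execute(
        "SELECT COUNT(*) FROM experiment WHERE " + " AND ".join(clauses),
        all_params,
    ).fetchone()[0]
    result["all_three"] = 100.0 * hits / total
    return result
```

Records whose protocol field is empty (NULL) never match a LIKE clause, so they count toward the total but not toward any step, which mirrors how missing annotation lowers the reported percentages.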


Subjects
Computational Biology, Internet, User-Computer Interface
11.
PLoS One ; 9(7): e102642, 2014.
Article in English | MEDLINE | ID: mdl-25050811

ABSTRACT

We discuss the applicability of the Microsoft cloud computing platform, Azure, for bioinformatics. We focus on the usability of the resource rather than its performance. We provide an example of how R can be used on Azure to analyse a large amount of microarray expression data deposited at the public database ArrayExpress. We provide a walk through to demonstrate explicitly how Azure can be used to perform these analyses in Appendix S1 and we offer a comparison with a local computation. We note that the use of the Platform as a Service (PaaS) offering of Azure can represent a steep learning curve for bioinformatics developers who will usually have a Linux and scripting language background. On the other hand, the presence of an additional set of libraries makes it easier to deploy software in a parallel (scalable) fashion and explicitly manage such a production run with only a few hundred lines of code, most of which can be incorporated from a template. We propose that this environment is best suited for running stable bioinformatics software by users not involved with its development.


Subjects
Software, Computational Biology, Genetic Databases, Humans, Internet, Oligonucleotide Array Sequence Analysis
12.
Nucleic Acids Res ; 42(5): 3028-43, 2014 Mar.
Article in English | MEDLINE | ID: mdl-24357409

ABSTRACT

Our knowledge of the role of higher-order chromatin structures in transcription of microRNA genes (MIRs) is evolving rapidly. Here we investigate the effect of 3D architecture of chromatin on the transcriptional regulation of MIRs. We demonstrate that MIRs have transcriptional features that are similar to protein-coding genes. RNA polymerase II-associated ChIA-PET data reveal that many groups of MIRs and protein-coding genes are organized into functionally compartmentalized chromatin communities and undergo coordinated expression when their genomic loci are spatially colocated. We observe that MIRs display widespread communication in those transcriptionally active communities. Moreover, miRNA-target interactions are significantly enriched among communities with functional homogeneity while depleted from the same community from which they originated, suggesting MIRs coordinating function-related pathways at posttranscriptional level. Further investigation demonstrates the existence of spatial MIR-MIR chromatin interacting networks. We show that groups of spatially coordinated MIRs are frequently from the same family and involved in the same disease category. The spatial interaction network possesses both common and cell-specific subnetwork modules that result from the spatial organization of chromatin within different cell types. Together, our study unveils an entirely unexplored layer of MIR regulation throughout the human genome that links the spatial coordination of MIRs to their co-expression and function.


Subjects
Chromatin/metabolism, Gene Expression Regulation, MicroRNAs/genetics, Chromatin/chemistry, Humans, K562 Cells, MCF-7 Cells, MicroRNAs/biosynthesis, Genetic Transcription
13.
Plant Physiol ; 161(4): 1930-51, 2013 Apr.
Article in English | MEDLINE | ID: mdl-23439917

ABSTRACT

Phytohormones regulate plant growth from cell division to organ development. Jasmonates (JAs) are signaling molecules that have been implicated in stress-induced responses. However, they have also been shown to inhibit plant growth, but the mechanisms are not well understood. The effects of methyl jasmonate (MeJA) on leaf growth regulation were investigated in Arabidopsis (Arabidopsis thaliana) mutants altered in JA synthesis and perception, allene oxide synthase and coi1-16B (for coronatine insensitive1), respectively. We show that MeJA inhibits leaf growth through the JA receptor COI1 by reducing both cell number and size. Further investigations using flow cytometry analyses allowed us to evaluate ploidy levels and to monitor cell cycle progression in leaves and cotyledons of Arabidopsis and/or Nicotiana benthamiana at different stages of development. Additionally, a novel global transcription profiling analysis involving continuous treatment with MeJA was carried out to identify the molecular players whose expression is regulated during leaf development by this hormone and COI1. The results of these studies revealed that MeJA delays the switch from the mitotic cell cycle to the endoreduplication cycle, which accompanies cell expansion, in a COI1-dependent manner and inhibits the mitotic cycle itself, arresting cells in G1 phase prior to the S-phase transition. Significantly, we show that MeJA activates critical regulators of endoreduplication and affects the expression of key determinants of DNA replication. Our discoveries also suggest that MeJA may contribute to the maintenance of a cellular "stand-by mode" by keeping the expression of ribosomal genes at an elevated level. Finally, we propose a novel model for MeJA-regulated COI1-dependent leaf growth inhibition.


Subjects
Acetates/pharmacology, Arabidopsis/cytology, Arabidopsis/genetics, Cyclopentanes/pharmacology, Endoreduplication/drug effects, Oxylipins/pharmacology, Plant Leaves/cytology, Plant Leaves/growth & development, Arabidopsis/drug effects, Arabidopsis/growth & development, Arabidopsis Proteins/genetics, Arabidopsis Proteins/metabolism, Cell Count, Cell Nucleus/drug effects, Cell Nucleus/metabolism, Cell Nucleus Size/drug effects, Cell Proliferation/drug effects, Cell Size/drug effects, Cluster Analysis, Cotyledon/drug effects, Cotyledon/growth & development, DNA Replication/drug effects, Plant DNA/metabolism, Down-Regulation/drug effects, Endoreduplication/genetics, Gene Expression Profiling, Plant Gene Expression Regulation/drug effects, Meristem/cytology, Meristem/drug effects, Mitosis/drug effects, Mitosis/genetics, Biological Models, Phenotype, Plant Leaves/drug effects, Ribosomal Proteins/genetics, Ribosomal Proteins/metabolism
14.
Nucleic Acids Res ; 40(8): 3307-15, 2012 Apr.
Article in English | MEDLINE | ID: mdl-22199258

ABSTRACT

Probes with runs of four or more guanines (G-stacks) in their sequences can exhibit a level of hybridization that is unrelated to the expression levels of the mRNA that they are intended to measure. This is most likely caused by the formation of G-quadruplexes, where inter-probe guanines form Hoogsteen hydrogen bonds, which probes with G-stacks are capable of forming. We demonstrate that for a specific microarray data set using the Human HG_U133A Affymetrix GeneChip and RMA normalization there is significant bias in the expression levels, the fold change and the correlations between expression levels. These effects grow more pronounced as the number of G-stack probes in a probe set increases. Approximately 14% of the probe sets are directly affected. The analysis was repeated for a number of other normalization pipelines and two, FARMS and PLIER, minimized the bias to some extent. We estimate that ∼15% of the data sets deposited in the GEO database are susceptible to the effect. The inclusion of G-stack probes in the affected data sets can bias key parameters used in the selection and clustering of genes. The elimination of these probes from any analysis in such affected data sets outweighs the increase of noise in the signal.
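
The abstract's definition of a G-stack, a run of four or more guanines in a probe sequence, translates directly into a pattern match. The sketch below is illustrative only; the function names are assumptions, and probe sequences are taken to be plain strings over A, C, G and T.

```python
import re

# A G-stack is a run of four or more consecutive guanines.
G_STACK = re.compile(r"G{4,}")


def has_g_stack(sequence):
    """Return True if the probe sequence contains a G-stack."""
    return G_STACK.search(sequence.upper()) is not None


def count_g_stack_probes(probe_set):
    """Count the probes in a probe set that contain a G-stack."""
    return sum(has_g_stack(probe) for probe in probe_set)
```

Counting G-stack probes per probe set makes it possible to flag affected probe sets, or to drop the offending probes before normalization, in line with the abstract's recommendation to eliminate them.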


Subjects
DNA Probes/chemistry, G-Quadruplexes, Gene Expression Profiling, Oligonucleotide Array Sequence Analysis, Humans
15.
J Exp Bot ; 62(8): 2973-87, 2011 May.
Article in English | MEDLINE | ID: mdl-21398429

ABSTRACT

The shade avoidance syndrome (SAS) allows plants to anticipate and avoid shading by neighbouring plants by initiating an elongation growth response. The phytochrome photoreceptors are able to detect a reduction in the red:far red ratio in incident light, the result of selective absorption of red and blue wavelengths by proximal vegetation. A shade-responsive luciferase reporter line (PHYB::LUC) was used to carry out a high-throughput screen to identify novel SAS mutants. The dracula 1 (dra1) mutant, which showed no avoidance of shade for the PHYB::LUC response, was the result of a mutation in the PHYA gene. Like previously characterized phyA mutants, dra1 showed a long hypocotyl in far red light and an enhanced hypocotyl elongation response to shade. However, dra1 additionally showed a long hypocotyl in red light. Since phyB levels are relatively unaffected in dra1, this gain-of-function red light phenotype strongly suggests a disruption of phyB signalling. The dra1 mutation, G773E within the phyA PAS2 domain, occurs at a residue absolutely conserved among phyA sequences. The equivalent residue in phyB is absolutely conserved as a threonine. PAS domains are structurally conserved domains involved in molecular interaction. Structural modelling of the dra1 mutation within the phyA PAS2 domain shows some similarity with the structure of the phyB PAS2 domain, suggesting that the interference with phyB signalling may be the result of non-functional mimicry. Hence, it was hypothesized that this PAS2 residue forms a key distinction between the phyA and phyB phytochrome species.


Subjects
Arabidopsis Proteins/genetics, Arabidopsis/genetics, Arabidopsis/physiology, High-Throughput Screening Assays/methods, Mutation/genetics, Phytochrome A/genetics, Alleles, Arabidopsis/radiation effects, Arabidopsis Proteins/metabolism, Chromosome Segregation/genetics, Chromosome Segregation/radiation effects, Molecular Cloning, Plant Gene Expression Regulation/radiation effects, Plant Genes/genetics, Reporter Genes/genetics, Hypocotyl/growth & development, Hypocotyl/radiation effects, Light, Luciferases/metabolism, Molecular Models, Phenotype, Phytochrome B/metabolism, Messenger RNA/genetics, Messenger RNA/metabolism, Signal Transduction/radiation effects
16.
Plant Cell ; 20(4): 947-68, 2008 Apr.
Article in English | MEDLINE | ID: mdl-18424613

ABSTRACT

In darkness, shoot apex growth is repressed, but it becomes rapidly activated by light. We show that phytochromes and cryptochromes play largely redundant roles in this derepression in Arabidopsis thaliana. We examined the light activation of transcriptional changes in a finely resolved time course, comparing the shoot apex (meristem and leaf primordia) and the cotyledon and found >5700 differentially expressed genes. Early events specific to the shoot apices included the repression of genes for Really Interesting New Gene finger proteins and basic domain/leucine zipper and basic helix-loop-helix transcription factors. The downregulation of auxin and ethylene and the upregulation of cytokinin and gibberellin hormonal responses were also characteristic of shoot apices. In the apex, genes involved in ribosome biogenesis and protein translation were rapidly and synchronously induced, simultaneously with cell proliferation genes, preceding visible organ growth. Subsequently, the activation of signaling genes and transcriptional signatures of cell wall expansion, turgor generation, and plastid biogenesis were apparent. Furthermore, light regulates the forms and protein levels of two transcription factors with opposing functions in cell proliferation, E2FB and E2FC, through the Constitutively Photomorphogenic1 (COP1), COP9-Signalosome5, and Deetiolated1 light signaling molecules. These data provide the basis for reconstruction of the regulatory networks for light-regulated meristem, leaf, and cotyledon development.


Subjects
Arabidopsis/radiation effects, Cell Cycle/radiation effects, Cotyledon/cytology, Gene Expression/radiation effects, Light, Plant Shoots/cytology, Amino Acid Sequence, Arabidopsis/cytology, Arabidopsis/genetics, Plant Genes, Multigene Family, Photosynthetic Reaction Center Complex Proteins/physiology, Polymerase Chain Reaction, Genetic Transcription/radiation effects
17.
Nucleic Acids Res ; 32(16): 4732-41, 2004.
Article in English | MEDLINE | ID: mdl-15356290

ABSTRACT

Robust methods to detect DNA-binding proteins from structures of unknown function are important for structural biology. This paper describes a method for identifying such proteins that have (i) a solvent-accessible structural motif necessary for DNA binding and (ii) a positive electrostatic potential in the region of the binding site. We focus on three structural motifs: helix-turn-helix (HTH), helix-hairpin-helix (HhH) and helix-loop-helix (HLH). We find that the combination of these variables detects 78% of proteins with an HTH motif, which is a substantial improvement over previous work based purely on structural templates and is comparable to more complex methods of identifying DNA-binding proteins. Similar true positive fractions are achieved for the HhH and HLH motifs. We see evidence of wide evolutionary diversity for DNA-binding proteins with an HTH motif, and much smaller diversity for those with an HhH or HLH motif.


Subjects
Computational Biology/methods, DNA-Binding Proteins/chemistry, Amino Acid Motifs, Binding Sites, DNA-Binding Proteins/metabolism, Protein Databases, Genomics, Helix-Loop-Helix Motifs, Helix-Turn-Helix Motifs, Molecular Models, Static Electricity
18.
Nucleic Acids Res ; 31(24): 7189-98, 2003 Dec 15.
Article in English | MEDLINE | ID: mdl-14654694

ABSTRACT

A method to detect DNA-binding sites on the surface of a protein structure is important for functional annotation. This work describes the analysis of residue patches on the surface of DNA-binding proteins and the development of a method of predicting DNA-binding sites using a single feature of these surface patches. Surface patches and the DNA-binding sites were initially analysed for accessibility, electrostatic potential, residue propensity, hydrophobicity and residue conservation. From this, it was observed that the DNA-binding sites were, in general, amongst the top 10% of patches with the largest positive electrostatic scores. This knowledge led to the development of a prediction method in which patches of surface residues were selected such that they excluded residues with negative electrostatic scores. This method was used to make predictions for a data set of 56 non-homologous DNA-binding proteins. Correct predictions were made for 68% of the data set.
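
The observation that binding sites tend to fall within the top 10% of patches by positive electrostatic score suggests a simple ranking step, sketched below. This is not the published prediction method; the patch identifiers, the score dictionary and the function name `top_patches` are assumptions made for illustration, and how the electrostatic scores are computed is outside the sketch.

```python
def top_patches(scores, percent=10):
    """Return ids of the highest-scoring patches.

    scores maps a patch id to its electrostatic score (hypothetical
    layout). At least one patch is returned when any are present;
    integer arithmetic avoids floating-point rounding of the cutoff.
    """
    keep = max(1, (len(scores) * percent + 99) // 100)  # ceiling division
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:keep]
```

A known binding-site patch can then be checked for membership in the returned list, which is one way to reproduce the kind of "top 10%" statement made in the abstract on a new structure.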


Subjects
Computational Biology/methods, DNA-Binding Proteins/chemistry, DNA-Binding Proteins/metabolism, DNA/metabolism, Binding Sites, Conserved Sequence, Genetic Databases, Hydrophobic and Hydrophilic Interactions, Static Electricity