Results 1-12 of 12

1.
Nucleic Acids Res ; 50(D1): D11-D19, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34850134

ABSTRACT

The European Bioinformatics Institute (EMBL-EBI) maintains a comprehensive range of freely available and up-to-date molecular data resources, which includes over 40 resources covering every major data type in the life sciences. This year's service update for EMBL-EBI includes new resources, PGS Catalog and AlphaFold DB, and updates on existing resources, including the COVID-19 Data Platform, trRosetta and RoseTTAFold models introduced in Pfam and InterPro, and the launch of Genome Integrations with Function and Sequence by UniProt and Ensembl. Furthermore, we highlight projects through which EMBL-EBI has contributed to the development of community-driven data standards and guidelines, including the Recommended Metadata for Biological Images (REMBI), and the BioModels Reproducibility Scorecard. Training is one of EMBL-EBI's core missions and a key component of the provision of bioinformatics services to users: this year's update includes many of the improvements that have been developed to EMBL-EBI's online training offering.


Subjects
Computational Biology/education; Computational Biology/methods; Databases, Factual; Academies and Institutes; Artificial Intelligence; COVID-19; Databases, Factual/economics; Databases, Factual/statistics & numerical data; Databases, Pharmaceutical; Databases, Protein; Europe; Genome, Human; Humans; Information Storage and Retrieval; RNA, Untranslated/genetics; SARS-CoV-2/genetics
2.
Am J Hum Genet ; 104(1): 13-20, 2019 01 03.
Article in English | MEDLINE | ID: mdl-30609404

ABSTRACT

Genomic sequencing is rapidly transitioning into clinical practice, and implementation into healthcare systems has been supported by substantial government investment, totaling over US$4 billion, in at least 14 countries. These national genomic-medicine initiatives are driving transformative change under real-life conditions while simultaneously addressing barriers to implementation and gathering evidence for wider adoption. We review the diversity of approaches and current progress made by national genomic-medicine initiatives in the UK, France, Australia, and US and provide a roadmap for sharing strategies, standards, and data internationally to accelerate implementation.


Subjects
Delivery of Health Care/methods; Delivery of Health Care/organization & administration; Genetics, Medical/methods; Genetics, Medical/organization & administration; Genomics/trends; International Cooperation; Australia; Delivery of Health Care/economics; Delivery of Health Care/trends; Evidence-Based Medicine; France; Genetics, Medical/economics; Genetics, Medical/trends; Genomics/economics; Humans; Information Dissemination; Private Sector; United Kingdom; United States
3.
Nat Rev Drug Discov ; 13(4): 239-40, 2014 Apr.
Article in English | MEDLINE | ID: mdl-24687050

ABSTRACT

Information technologies already have a key role in pharmaceutical research and development (R&D), but achieving substantial advances in their use and effectiveness will depend on overcoming current challenges in sharing, integrating and jointly analysing the range of data generated at different stages of the R&D process.


Subjects
Drug Industry/organization & administration; Knowledge Management; Research/organization & administration; Cooperative Behavior; Humans; Information Management; Technology, Pharmaceutical
4.
Nature ; 494(7435): 77-80, 2013 Feb 07.
Article in English | MEDLINE | ID: mdl-23354052

ABSTRACT

Digital production, transmission and storage have revolutionized how we access and use information but have also made archiving an increasingly complex task that requires active, continuing maintenance of digital media. This challenge has focused some interest on DNA as an attractive target for information storage because of its capacity for high-density information encoding, longevity under easily achieved conditions and proven track record as an information bearer. Previous DNA-based information storage approaches have encoded only trivial amounts of information or were not amenable to scaling-up, and used no robust error-correction and lacked examination of their cost-efficiency for large-scale information archival. Here we describe a scalable method that can reliably store more information than has been handled before. We encoded computer files totalling 739 kilobytes of hard-disk storage and with an estimated Shannon information of 5.2 × 10^6 bits into a DNA code, synthesized this DNA, sequenced it and reconstructed the original files with 100% accuracy. Theoretical analysis indicates that our DNA-based storage scheme could be scaled far beyond current global information volumes and offers a realistic technology for large-scale, long-term and infrequently accessed digital archiving. In fact, current trends in technological advances are reducing DNA synthesis costs at a pace that should make our scheme cost-effective for sub-50-year archiving within a decade.
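The abstract does not spell out the encoding itself. As a hypothetical illustration of the general idea of writing binary data into DNA while avoiding runs of the same base (homopolymers, which are generally more error-prone to synthesize and sequence), the sketch below converts bytes to base-3 digits and maps each digit to one of the three bases that differ from the previously written base. The fixed six-digits-per-byte conversion, the base ordering and all function names are assumptions for illustration; the sketch has no error correction and is not the paper's published scheme.

```python
BASES = "ACGT"


def bytes_to_trits(data: bytes) -> list[int]:
    """Fixed-width conversion: six base-3 digits per byte (3**6 = 729 >= 256)."""
    trits: list[int] = []
    for byte in data:
        digits = []
        for _ in range(6):
            digits.append(byte % 3)
            byte //= 3
        trits.extend(reversed(digits))  # most significant trit first
    return trits


def trits_to_dna(trits: list[int], prev: str = "A") -> str:
    """Each trit selects one of the three bases that differ from the previous base."""
    out = []
    for t in trits:
        choices = [b for b in BASES if b != prev]
        prev = choices[t]
        out.append(prev)
    return "".join(out)


def dna_to_trits(dna: str, prev: str = "A") -> list[int]:
    """Invert trits_to_dna by recovering each base's index among the allowed three."""
    trits = []
    for base in dna:
        choices = [b for b in BASES if b != prev]
        trits.append(choices.index(base))
        prev = base
    return trits


def trits_to_bytes(trits: list[int]) -> bytes:
    """Regroup trits six at a time and read each group as a base-3 number."""
    out = bytearray()
    for i in range(0, len(trits), 6):
        value = 0
        for t in trits[i:i + 6]:
            value = value * 3 + t
        out.append(value)
    return bytes(out)


dna = trits_to_dna(bytes_to_trits(b"DNA"))
print(dna)
```

Because each base depends on the one before it, decoding must walk the strand in order from the same starting context (`prev="A"` here, an arbitrary choice).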


Subjects
Archives; DNA/chemistry; DNA/chemical synthesis; Information Management/methods; Base Sequence; Computers; DNA/economics; Information Management/economics; Molecular Sequence Data; Sequence Analysis, DNA/economics; Synthetic Biology/economics; Synthetic Biology/methods
7.
Genome Res ; 20(5): 685-92, 2010 May.
Article in English | MEDLINE | ID: mdl-20194951

ABSTRACT

We have produced an evolutionary model for promoters, analogous to the commonly used synonymous/nonsynonymous mutation models for protein-coding sequences. Although our model, called Sunflower, relies on some simple assumptions, it captures enough of the biology of transcription factor action to show clear correlation with other biological features. Sunflower predicts a binding profile of transcription factors to DNA sequences, in which different factors compete for the same potential binding sites. The parametrized model simultaneously estimates a continuous measurement of binding occupancy across the genomic sequence for each factor. We can then introduce a localized mutation, rerun the binding model, and record the difference in binding profiles. A single mutation can alter interactions both upstream and downstream of its position due to potential overlapping binding sites, and our statistic captures this domino effect. Over evolutionary time, we observe a clear excess of low-scoring mutations fixed in promoters, consistent with most changes being neutral. However, this is not consistent across all promoters, and some promoters show more rapid divergence. This divergence often occurs in the presence of relatively constant protein-coding divergence. Interestingly, different classes of promoters show different sensitivity to mutations, with phosphorylation-related genes having promoters inherently more sensitive to mutations than immune genes. Although there have previously been a number of models attempting to handle transcription factor binding, Sunflower provides a richer biological model, incorporating weak binding sites and the possibility of competition. The results show the first clear correlations between such a model and evolutionary processes.


Subjects
Dogs/genetics; Genome, Human; Models, Genetic; Promoter Regions, Genetic/genetics; Selection, Genetic/genetics; Transcription Factors/genetics; Algorithms; Animals; Base Sequence; Binding Sites; Evolution, Molecular; Gene Expression Regulation; Genome/genetics; Genome, Human/genetics; Humans; Markov Chains; Mutation; Transcription Factors/metabolism
8.
Genome Res ; 20(2): 249-56, 2010 Feb.
Article in English | MEDLINE | ID: mdl-20123915

ABSTRACT

We have developed a novel approach for using massively parallel short-read sequencing to generate fast and inexpensive de novo genomic assemblies comparable to those generated by capillary-based methods. The ultrashort (<100 base) sequences generated by this technology pose specific biological and computational challenges for de novo assembly of large genomes. To account for this, we devised a method for experimentally partitioning the genome using reduced representation (RR) libraries prior to assembly. We use two restriction enzymes independently to create a series of overlapping fragment libraries, each containing a tractable subset of the genome. Together, these libraries allow us to reassemble the entire genome without the need for a reference sequence. As proof of concept, we applied this approach to sequence and assemble the majority of the 125-Mb Drosophila melanogaster genome. We subsequently demonstrate the accuracy of our assembly method with meaningful comparisons against the currently available D. melanogaster reference genome (dm3). The ease of assembly and accuracy for comparative genomics suggest that our approach will scale to future mammalian genome-sequencing efforts, saving both time and money without sacrificing quality.
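The abstract explains the partitioning idea but names neither the enzymes nor the fragment-size windows. The in silico sketch below shows how a restriction digest followed by size selection yields a reduced-representation subset of a genome, and how a second enzyme yields a different subset whose fragments overlap the first. EcoRI (G^AATTC) and BamHI (G^GATCC) are used only as familiar examples, the 5-20 bp window is arbitrary, and only the top strand is scanned; none of these choices come from the paper.

```python
def digest(genome: str, site: str, cut_offset: int) -> list[str]:
    """Cut `cut_offset` bases into every occurrence of `site` (top strand only)."""
    cuts = []
    start = genome.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = genome.find(site, start + 1)
    bounds = [0] + cuts + [len(genome)]
    return [genome[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def size_select(fragments: list[str], min_len: int, max_len: int) -> list[str]:
    """Keep fragments inside the size window, as a gel size-selection step would."""
    return [f for f in fragments if min_len <= len(f) <= max_len]


genome = "TTGAATTCAAGGATCCTTTTGAATTCGGATCCAA"  # toy sequence
library_a = size_select(digest(genome, "GAATTC", 1), 5, 20)  # EcoRI, G^AATTC
library_b = size_select(digest(genome, "GGATCC", 1), 5, 20)  # BamHI, G^GATCC
print(library_a)
print(library_b)
```

Each library can then be assembled on its own; because the two enzymes cut at different places, fragments from one library span the junctions between fragments of the other.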


Subjects
Genomic Library; Sequence Analysis, DNA/methods; Animals; Base Sequence; DNA Restriction Enzymes/chemistry; Drosophila melanogaster/genetics; Mutation; Sequence Alignment; Sequence Analysis, DNA/economics
9.
Bioinformatics ; 23(13): i195-204, 2007 Jul 01.
Article in English | MEDLINE | ID: mdl-17646297

ABSTRACT

MOTIVATION: Recent advances in microarray technologies have made it feasible to interrogate whole genomes with tiling arrays and this technique is rapidly becoming one of the most important high-throughput functional genomics assays. For large mammalian genomes, analyzing oligonucleotide tiling array data is complicated by the presence of non-unique sequences on the array, which increases the overall noise in the data and may lead to false positive results due to cross-hybridization. The ability to create custom microarrays using maskless array synthesis has led us to consider ways to optimize array design characteristics for improving data quality and analysis. We have identified a number of design parameters to be optimized including uniqueness of the probe sequences within the whole genome, melting temperature and self-hybridization potential. RESULTS: We introduce the uniqueness score, U, a novel quality measure for oligonucleotide probes and present a method to quickly compute it. We show that U is equivalent to the number of shortest unique substrings in the probe and describe an efficient greedy algorithm to design mammalian whole genome tiling arrays using probes that maximize U. Using the mouse genome, we demonstrate how several optimizations influence the tiling array design characteristics. With a sensible set of parameters, our designs cover 78% of the mouse genome including many regions previously considered 'untilable' due to the presence of repetitive sequence. Finally, we compare our whole genome tiling array designs with commercially available designs. AVAILABILITY: Source code is available under an open source license from http://www.ebi.ac.uk/~graef/arraydesign/.
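The abstract states that U equals the number of shortest unique substrings in the probe but does not give the full definition. The naive sketch below adopts one reading: for each start position in the probe, find the shortest substring beginning there that occurs exactly once in the genome, and count the start positions that have one. It scans only one strand and uses brute-force string search on a toy genome; the paper's fast method for computing U is not reproduced, and the exact definition here is an assumption.

```python
def count_occurrences(text: str, pattern: str) -> int:
    """Count occurrences of `pattern` in `text`, including overlapping ones."""
    count, start = 0, text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def uniqueness_score(probe: str, genome: str) -> int:
    """Count start positions i for which some probe[i:j] occurs exactly once.

    Only the shortest such substring per start position is counted, so the
    score is the number of left-anchored shortest unique substrings in the probe.
    """
    score = 0
    for i in range(len(probe)):
        for j in range(i + 1, len(probe) + 1):
            if count_occurrences(genome, probe[i:j]) == 1:
                score += 1
                break  # extending a unique substring cannot make it repeat
    return score


genome = "ACGTACGTTT"  # toy genome with a repeated ACGT
print(uniqueness_score("ACGT", genome), uniqueness_score("CGTTT", genome))  # 0 3
```

A probe drawn entirely from a repeat scores 0, which matches the intuition that such probes cross-hybridize and should be avoided.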


Subjects
Algorithms; Chromosome Mapping/instrumentation; Computer-Aided Design; DNA Probes/chemistry; Microarray Analysis/instrumentation; Microarray Analysis/methods; Sequence Analysis, DNA/methods; Chromosome Mapping/methods; Equipment Design; Equipment Failure Analysis; Quality Control
10.
Genome Res ; 17(3): 320-7, 2007 Mar.
Article in English | MEDLINE | ID: mdl-17284679

ABSTRACT

Peptide hormones are small, processed, and secreted peptides that signal via membrane receptors and play critical roles in normal and pathological physiology. The search for novel peptide hormones has been hampered by their small size, low or restricted expression, and lack of sequence similarity. To overcome these difficulties, we developed a bioinformatics search tool based on the hidden Markov model formalism that uses several peptide hormone sequence features to estimate the likelihood that a protein contains a processed and secreted peptide of this class. Application of this tool to an alignment of mammalian proteomes ranked 90% of known peptide hormones among the top 300 proteins. An analysis of the top scoring hypothetical and poorly annotated human proteins identified two novel candidate peptide hormones. Biochemical analysis of the two candidates, which we called spexin and augurin, showed that both were localized to secretory granules in a transfected pancreatic cell line and were recovered from the cell supernatant. Spexin was expressed in the submucosal layer of the mouse esophagus and stomach, and a predicted peptide from the spexin precursor induced muscle contraction in a rat stomach explant assay. Augurin was specifically expressed in mouse endocrine tissues, including pituitary and adrenal gland, choroid plexus, and the atrio-ventricular node of the heart. Our findings demonstrate the utility of a bioinformatics approach to identify novel biologically active peptides. Peptide hormones and their receptors are important diagnostic and therapeutic targets, and our results suggest that spexin and augurin are novel peptide hormones likely to be involved in physiological homeostasis.
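The abstract says the tool is built on the hidden Markov model formalism and estimates a likelihood. The standard way to compute the likelihood of a sequence under an HMM is the forward algorithm, which sums over all hidden state paths. In the sketch below the two-state model, the reduced alphabet ("B" for a basic residue, "O" for any other), every probability and the i.i.d. background comparison are invented for illustration and do not describe the published tool's states or sequence features.

```python
import math

# Hypothetical two-state model over a reduced alphabet.
STATES = ("cleavage", "body")
START = {"cleavage": 0.1, "body": 0.9}
TRANS = {
    "cleavage": {"cleavage": 0.5, "body": 0.5},
    "body": {"cleavage": 0.05, "body": 0.95},
}
EMIT = {
    "cleavage": {"B": 0.9, "O": 0.1},
    "body": {"B": 0.1, "O": 0.9},
}


def _log(p: float) -> float:
    return math.log(p) if p > 0 else -math.inf


def _logsumexp(values: list[float]) -> float:
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_log_likelihood(seq, states=STATES, start=START, trans=TRANS, emit=EMIT):
    """log P(seq | model), summed over every hidden state path (forward algorithm)."""
    alpha = {s: _log(start[s]) + _log(emit[s][seq[0]]) for s in states}
    for symbol in seq[1:]:
        alpha = {
            s: _logsumexp([alpha[r] + _log(trans[r][s]) for r in states])
            + _log(emit[s][symbol])
            for s in states
        }
    return _logsumexp(list(alpha.values()))


def log_odds(seq, background_b=0.12):
    """log P(seq | model) - log P(seq | i.i.d. background); > 0 favours the model."""
    background = sum(_log(background_b if x == "B" else 1 - background_b) for x in seq)
    return forward_log_likelihood(seq) - background


print(log_odds("OOOBBOOOOBBOOO"))
```

Working in log space with a log-sum-exp step keeps the recursion numerically stable for protein-length sequences, where raw path probabilities would underflow.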


Subjects
Algorithms; Computational Biology/methods; Peptide Hormones/genetics; Peptide Hormones/metabolism; Proteome/genetics; Proteomics/methods; Amino Acid Sequence; Animals; Cell Line; DNA Primers; Enteroendocrine Cells/metabolism; Humans; Immunohistochemistry; Likelihood Functions; Markov Chains; Mice; Models, Genetic; Molecular Sequence Data; Rats
11.
Genome Biol ; 7 Suppl 1: S2.1-31, 2006.
Article in English | MEDLINE | ID: mdl-16925836

ABSTRACT

BACKGROUND: We present the results of EGASP, a community experiment to assess the state-of-the-art in genome annotation within the ENCODE regions, which span 1% of the human genome sequence. The experiment had two major goals: the assessment of the accuracy of computational methods to predict protein coding genes; and the overall assessment of the completeness of the current human genome annotations as represented in the ENCODE regions. For the computational prediction assessment, eighteen groups contributed gene predictions. We evaluated these submissions against each other based on a 'reference set' of annotations generated as part of the GENCODE project. These annotations were not available to the prediction groups prior to the submission deadline, so that their predictions were blind and an external advisory committee could perform a fair assessment. RESULTS: The best methods had at least one gene transcript correctly predicted for close to 70% of the annotated genes. Nevertheless, the multiple transcript accuracy, taking into account alternative splicing, reached only approximately 40% to 50% accuracy. At the coding nucleotide level, the best programs reached an accuracy of 90% in both sensitivity and specificity. Programs relying on mRNA and protein sequences were the most accurate in reproducing the manually curated annotations. Experimental validation shows that only a very small percentage (3.2%) of the selected 221 computationally predicted exons outside of the existing annotation could be verified. CONCLUSION: This is the first such experiment in human DNA, and we have followed the standards established in a similar experiment, GASP1, in Drosophila melanogaster. We believe the results presented here contribute to the value of ongoing large-scale annotation projects and should guide further experimental methods when being scaled up to the entire human genome sequence.
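The abstract reports coding-nucleotide accuracy as sensitivity and specificity. Assuming the convention common in gene-prediction benchmarks, where sensitivity is TP/(TP + FN) and specificity is TP/(TP + FP) (what many other fields call precision), a nucleotide-level comparison of predicted and reference exons can be sketched as follows. The exon coordinates are made up, intervals are taken as half-open, and the function names are hypothetical.

```python
def coding_positions(exons: list[tuple[int, int]]) -> set[int]:
    """Expand half-open (start, end) exon intervals into a set of coding positions."""
    return {p for start, end in exons for p in range(start, end)}


def nucleotide_accuracy(predicted: set[int], reference: set[int]) -> tuple[float, float]:
    """Return (sensitivity, specificity) at the nucleotide level."""
    tp = len(predicted & reference)  # coding in both
    fn = len(reference - predicted)  # reference coding bases that were missed
    fp = len(predicted - reference)  # predicted coding bases absent from the reference
    sensitivity = tp / (tp + fn) if reference else 0.0
    specificity = tp / (tp + fp) if predicted else 0.0
    return sensitivity, specificity


reference = coding_positions([(100, 200), (300, 400)])
predicted = coding_positions([(150, 250), (300, 400)])
print(nucleotide_accuracy(predicted, reference))  # (0.75, 0.75)
```

Nucleotide-level scores are forgiving of boundary errors, which is consistent with the abstract's gap between about 90% nucleotide accuracy and the much lower accuracy for complete transcripts.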


Subjects
Computational Biology/standards; Genome, Human; Genomics/standards; Alternative Splicing; Animals; Computational Biology/methods; Databases, Genetic; Genes; Genomics/methods; Humans; Mice; RNA, Messenger/analysis; Sequence Analysis, DNA; Sequence Analysis, RNA
12.
Nucleic Acids Res ; 30(1): 276-80, 2002 Jan 01.
Article in English | MEDLINE | ID: mdl-11752314

ABSTRACT

Pfam is a large collection of protein multiple sequence alignments and profile hidden Markov models. Pfam is available on the World Wide Web in the UK at http://www.sanger.ac.uk/Software/Pfam/, in Sweden at http://www.cgb.ki.se/Pfam/, in France at http://pfam.jouy.inra.fr/ and in the US at http://pfam.wustl.edu/. The latest version (6.6) of Pfam contains 3071 families, which match 69% of proteins in SWISS-PROT 39 and TrEMBL 14. Structural data, where available, have been utilised to ensure that Pfam families correspond with structural domains, and to improve domain-based annotation. Predictions of non-domain regions are now also included. In addition to secondary structure, Pfam multiple sequence alignments now contain active site residue mark-up. New search tools, including taxonomy search and domain query, greatly add to the functionality and usability of the Pfam resource.


Subjects
Databases, Protein; Proteins/chemistry; Animals; Binding Sites; Computer Graphics; Evolution, Molecular; Genome; Humans; Information Storage and Retrieval; Internet; Macromolecular Substances; Markov Chains; Phylogeny; Protein Structure, Secondary; Protein Structure, Tertiary; Proteins/genetics; Proteins/physiology; Sequence Alignment