Results 1 - 20 of 21
1.
BMC Genomics ; 24(1): 408, 2023 Jul 19.
Article in English | MEDLINE | ID: mdl-37468834

ABSTRACT

BACKGROUND: The group of >40 cryptic whitefly species called Bemisia tabaci sensu lato are amongst the world's worst agricultural pests and plant-virus vectors. Outbreaks of B. tabaci s.l. and the associated plant-virus diseases continue to contribute to global food insecurity and social instability, particularly in sub-Saharan Africa and Asia. Published B. tabaci s.l. genomes have limited use for studying African cassava B. tabaci SSA1 species, due to the high genetic divergences between them. Genomic annotations presented here were performed using the 'Ensembl gene annotation system', to ensure that comparative analyses and conclusions reflect biological differences, as opposed to arising from different methodologies underpinning transcript model identification. RESULTS: We present here six new B. tabaci s.l. genomes from Africa and Asia, and two re-annotated previously published genomes, to provide evolutionary insights into these globally distributed pests. Genome sizes ranged between 616 and 658 Mb and exhibited some of the highest coverage of transposable elements reported within Arthropoda. Many fewer total protein coding genes (PCG) were recovered compared to the previously published B. tabaci s.l. genomes and structural annotations generated via the uniform methodology strongly supported a repertoire of between 12.8 and 13.2 × 10³ PCG. An integrative systematics approach incorporating phylogenomic analysis of nuclear and mitochondrial markers supported a monophyletic Aleyrodidae and the basal positioning of B. tabaci Uganda-1 to the sub-Saharan group of species. Reciprocal cross-mating data and the co-cladogenesis pattern of the primary obligate endosymbiont 'Candidatus Portiera aleyrodidarum' from 11 Bemisia genomes further supported the phylogenetic reconstruction to show that African cassava B. tabaci populations consist of just three biological species. We include comparative analyses of gene families related to detoxification, sugar metabolism and vector competency, and evaluate the presence and function of horizontally transferred genes, essential for understanding the evolution and unique biology of constituent B. tabaci s.l. species. CONCLUSIONS: These genomic resources have provided new and critical insights into the genetics underlying B. tabaci s.l. biology. They also provide a rich foundation for post-genomic research, including the selection of candidate gene-targets for innovative whitefly and virus-control strategies.


Subjects
Hemiptera; Plant Viruses; Animals; Phylogeny; Africa; Asia
2.
Hum Mutat ; 43(8): 986-997, 2022 08.
Article in English | MEDLINE | ID: mdl-34816521

ABSTRACT

The Ensembl Variant Effect Predictor (VEP) is a freely available, open-source tool for the annotation and filtering of genomic variants. It predicts variant molecular consequences using the Ensembl/GENCODE or RefSeq gene sets. It also reports phenotype associations from databases such as ClinVar, allele frequencies from studies including gnomAD, and predictions of deleteriousness from tools such as Sorting Intolerant From Tolerant and Combined Annotation Dependent Depletion. Ensembl VEP includes filtering options to customize variant prioritization. It is well supported and updated roughly quarterly to incorporate the latest gene, variant, and phenotype association information. Ensembl VEP analysis can be performed using a highly configurable, extensible command-line tool, a Representational State Transfer application programming interface, and a user-friendly web interface. These access methods are designed to suit different levels of bioinformatics experience and meet different needs in terms of data size, visualization, and flexibility. In this tutorial, we will describe performing variant annotation using the Ensembl VEP web tool, which enables sophisticated analysis through a simple interface.
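The REST access route mentioned in the abstract can be illustrated with a minimal sketch. It assumes the public Ensembl REST server at `https://rest.ensembl.org` and its `vep/:species/hgvs/:hgvs_notation` endpoint; the helper name `build_vep_hgvs_url` and the example HGVS string are illustrative choices, not taken from the article. The function only builds the request URL, so it can be checked without network access.

```python
from urllib.parse import quote

# Assumption: the public Ensembl REST server; adjust if a mirror or archive is used.
REST_SERVER = "https://rest.ensembl.org"


def build_vep_hgvs_url(species: str, hgvs_notation: str) -> str:
    """Build a VEP request URL for one variant given in HGVS notation.

    `build_vep_hgvs_url` is a hypothetical helper for illustration; it is not
    part of any Ensembl client library.
    """
    # Percent-encode characters such as ':' and '>' so the path stays a valid URL.
    encoded = quote(hgvs_notation, safe="")
    return f"{REST_SERVER}/vep/{species}/hgvs/{encoded}?content-type=application/json"


if __name__ == "__main__":
    # Illustrative transcript-level HGVS string; the response would be JSON
    # describing the predicted consequences for this variant.
    print(build_vep_hgvs_url("human", "ENST00000366667:c.803C>T"))
```

The resulting URL could then be fetched with any HTTP client; for larger batches of variants the command-line tool described in the abstract is likely the better fit.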


Subjects
Genomics; Software; Computational Biology; Databases, Genetic; Gene Frequency; Humans; Molecular Sequence Annotation; Phenotype
3.
Nucleic Acids Res ; 50(D1): D988-D995, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34791404

ABSTRACT

Ensembl (https://www.ensembl.org) is unique in its flexible infrastructure for access to genomic data and annotation. It has been designed to efficiently deliver annotation at scale for all eukaryotic life, and it also provides deep comprehensive annotation for key species. Genomes representing a greater diversity of species are increasingly being sequenced. In response, we have focussed our recent efforts on expediting the annotation of new assemblies. Here, we report the release of the greatest annual number of newly annotated genomes in the history of Ensembl via our dedicated Ensembl Rapid Release platform (http://rapid.ensembl.org). We have also developed a new method to generate comparative analyses at scale for these assemblies and, for the first time, we have annotated non-vertebrate eukaryotes. Meanwhile, we continually improve, extend and update the annotation for our high-value reference vertebrate genomes and report the details here. We have a range of specific software tools for specific tasks, such as the Ensembl Variant Effect Predictor (VEP) and the newly developed interface for the Variant Recoder. All Ensembl data, software and tools are freely available for download and are accessible programmatically.


Subjects
Databases, Genetic; Genome/genetics; Molecular Sequence Annotation; Software; Animals; Computational Biology/classification; Humans
5.
Mol Genet Genomic Med ; 9(12): e1786, 2021 12.
Article in English | MEDLINE | ID: mdl-34435752

ABSTRACT

BACKGROUND: Variant interpretation is dependent on transcript annotation and remains time consuming and challenging. There are major obstacles for historical data reuse and for interpretation of new variants. First, both RefSeq and Ensembl/GENCODE produce transcript sets in common use, but there is currently no easy way to translate between the two. Second, the resources often used for variant interpretation (e.g. ClinVar, gnomAD, UniProt) do not use the same transcript set, nor default transcript or protein sequence. METHOD: Ensembl ran a survey in 2018 to sample attitudes to choosing one default transcript per locus, and to gather data on reference sequences used by the scientific community. This was publicised on the Ensembl and UCSC genome browsers, by email and on social media. RESULTS: The survey had 788 responses from 32 different countries, the results of which we report here. CONCLUSIONS: We present our roadmap to create an effective default set of transcripts for resources, and for reporting interpretation of clinical variants.


Subjects
Biomarkers; Computational Biology; Genomics; RNA, Messenger/genetics; Animals; Computational Biology/methods; Databases, Genetic; Genomics/methods; Humans; Software; Web Browser
8.
Nucleic Acids Res ; 49(D1): D884-D891, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33137190

ABSTRACT

The Ensembl project (https://www.ensembl.org) annotates genomes and disseminates genomic data for vertebrate species. We create detailed and comprehensive annotation of gene structures, regulatory elements and variants, and enable comparative genomics by inferring the evolutionary history of genes and genomes. Our integrated genomic data are made available in a variety of ways, including genome browsers, search interfaces, specialist tools such as the Ensembl Variant Effect Predictor, download files and programmatic interfaces. Here, we present recent Ensembl developments including two new website portals. Ensembl Rapid Release (http://rapid.ensembl.org) is designed to provide core tools and services for genomes as soon as possible and has been deployed to support large biodiversity sequencing projects. Our SARS-CoV-2 genome browser (https://covid-19.ensembl.org) integrates our own annotation with publicly available genomic data from numerous sources to facilitate the use of genomics in the international scientific response to the COVID-19 pandemic. We also report on other updates to our annotation resources, tools and services. All Ensembl data and software are freely available without restriction.


Subjects
Computational Biology/methods; Databases, Nucleic Acid; Genomics/methods; SARS-CoV-2/genetics; Vertebrates/genetics; Animals; COVID-19/epidemiology; COVID-19/virology; Humans; Internet; Molecular Sequence Annotation/methods; Pandemics; Vertebrates/classification
9.
Nature ; 581(7809): 434-443, 2020 05.
Article in English | MEDLINE | ID: mdl-32461654

ABSTRACT

Genetic variants that inactivate protein-coding genes are a powerful source of information about the phenotypic consequences of gene disruption: genes that are crucial for the function of an organism will be depleted of such variants in natural populations, whereas non-essential genes will tolerate their accumulation. However, predicted loss-of-function variants are enriched for annotation errors, and tend to be found at extremely low frequencies, so their analysis requires careful variant annotation and very large sample sizes [1]. Here we describe the aggregation of 125,748 exomes and 15,708 genomes from human sequencing studies into the Genome Aggregation Database (gnomAD). We identify 443,769 high-confidence predicted loss-of-function variants in this cohort after filtering for artefacts caused by sequencing and annotation errors. Using an improved model of human mutation rates, we classify human protein-coding genes along a spectrum that represents tolerance to inactivation, validate this classification using data from model organisms and engineered human cells, and show that it can be used to improve the power of gene discovery for both common and rare diseases.


Subjects
Exome/genetics; Genes, Essential/genetics; Genetic Variation/genetics; Genome, Human/genetics; Adult; Brain/metabolism; Cardiovascular Diseases/genetics; Cohort Studies; Databases, Genetic; Female; Genetic Predisposition to Disease/genetics; Genome-Wide Association Study; Humans; Loss of Function Mutation/genetics; Male; Mutation Rate; Proprotein Convertase 9/genetics; RNA, Messenger/genetics; Reproducibility of Results; Exome Sequencing; Whole Genome Sequencing
10.
Nat Med ; 26(6): 869-877, 2020 06.
Article in English | MEDLINE | ID: mdl-32461697

ABSTRACT

Human genetic variants predicted to cause loss-of-function of protein-coding genes (pLoF variants) provide natural in vivo models of human gene inactivation and can be valuable indicators of gene function and the potential toxicity of therapeutic inhibitors targeting these genes [1,2]. Gain-of-kinase-function variants in LRRK2 are known to significantly increase the risk of Parkinson's disease [3,4], suggesting that inhibition of LRRK2 kinase activity is a promising therapeutic strategy. While preclinical studies in model organisms have raised some on-target toxicity concerns [5-8], the biological consequences of LRRK2 inhibition have not been well characterized in humans. Here, we systematically analyze pLoF variants in LRRK2 observed across 141,456 individuals sequenced in the Genome Aggregation Database (gnomAD) [9], 49,960 exome-sequenced individuals from the UK Biobank and over 4 million participants in the 23andMe genotyped dataset. After stringent variant curation, we identify 1,455 individuals with high-confidence pLoF variants in LRRK2. Experimental validation of three variants, combined with previous work [10], confirmed reduced protein levels in 82.5% of our cohort. We show that heterozygous pLoF variants in LRRK2 reduce LRRK2 protein levels but that these are not strongly associated with any specific phenotype or disease state. Our results demonstrate the value of large-scale genomic databases and phenotyping of human loss-of-function carriers for target validation in drug discovery.


Subjects
Leucine-Rich Repeat Serine-Threonine Protein Kinase-2/genetics; Loss of Function Mutation/genetics; Adult; Aged; Aged, 80 and over; Biological Specimen Banks; Cell Line; Embryonic Stem Cells/metabolism; Female; Gain of Function Mutation/genetics; Heterozygote; Humans; Leucine-Rich Repeat Serine-Threonine Protein Kinase-2/antagonists & inhibitors; Leucine-Rich Repeat Serine-Threonine Protein Kinase-2/metabolism; Longevity/genetics; Lymphocytes/metabolism; Male; Middle Aged; Myocytes, Cardiac/metabolism; Parkinson Disease/drug therapy; Parkinson Disease/genetics; Phenotype
11.
F1000Res ; 9, 2020.
Article in English | MEDLINE | ID: mdl-34367618

ABSTRACT

Copy number variations (CNVs) are major causative contributors to the genesis of both genetic diseases and human neoplasias. While high-throughput sequencing technologies are increasingly becoming the primary choice for genomic screening analysis, their ability to efficiently detect CNVs is still heterogeneous and remains to be developed. The aim of this white paper is to provide a guiding framework for the future contributions of ELIXIR's recently established human CNV Community, with implications beyond human disease diagnostics and population genomics. This white paper is the direct result of a strategy meeting that took place in September 2018 in Hinxton (UK) and involved representatives of 11 ELIXIR Nodes. The meeting led to the definition of priority objectives and tasks, to address a wide range of CNV-related challenges ranging from detection and interpretation to sharing and training. Here, we provide suggestions on how to align these tasks within the ELIXIR Platforms strategy, and on how to frame the activities of this new ELIXIR Community in the international context.


Subjects
Computational Biology; DNA Copy Number Variations; DNA Copy Number Variations/genetics; High-Throughput Nucleotide Sequencing; Humans
12.
Nucleic Acids Res ; 48(D1): D682-D688, 2020 01 08.
Article in English | MEDLINE | ID: mdl-31691826

ABSTRACT

Ensembl (https://www.ensembl.org) is a system for generating and distributing genome annotation such as genes, variation, regulation and comparative genomics across the vertebrate subphylum and key model organisms. The Ensembl annotation pipeline is capable of integrating experimental and reference data from multiple providers into a single integrated resource. Here, we present 94 newly annotated and re-annotated genomes, bringing the total number of genomes offered by Ensembl to 227. This represents the single largest expansion of the resource since its inception. We also detail our continued efforts to improve human annotation, developments in our epigenome analysis and display, a new tool for imputing causal genes from genome-wide association studies and visualisation of variation within a 3D protein model. Finally, we present information on our new website. Both software and data are made available without restriction via our website, online tools platform and programmatic interfaces (available under an Apache 2.0 license), and data updates are made available four times a year.


Subjects
Computational Biology/methods; Databases, Genetic; Epigenome; Molecular Sequence Annotation; Algorithms; Animals; Computer Graphics; Databases, Protein; Genetic Variation; Genome-Wide Association Study; Genomics; Histones/metabolism; Humans; Imaging, Three-Dimensional; Internet; Ligands; Search Engine; Software; Species Specificity; Transcriptome; User-Computer Interface; Web Browser
13.
Nucleic Acids Res ; 47(D1): D745-D751, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30407521

ABSTRACT

The Ensembl project (https://www.ensembl.org) makes key genomic data sets available to the entire scientific community without restrictions. Ensembl seeks to be a fundamental resource driving scientific progress by creating, maintaining and updating reference genome annotation and comparative genomics resources. This year we describe our new and expanded gene, variant and comparative annotation capabilities, which led to a 50% increase in the number of vertebrate genomes we support. We have also doubled the number of available human variants and added regulatory regions for many mouse cell types and developmental stages. Our data sets and tools are available via the Ensembl website as well as through a RESTful web service, a Perl application programming interface and as data files for download.


Subjects
Databases, Genetic; Genome/genetics; Genomics; Vertebrates/genetics; Animals; Computational Biology/trends; Humans; Mice; Molecular Sequence Annotation; Software
14.
Bioinformatics ; 35(13): 2315-2317, 2019 07 01.
Article in English | MEDLINE | ID: mdl-30475984

ABSTRACT

SUMMARY: Assessing the pathogenicity of genetic variants can be a complex and challenging task. Spliceogenic variants, which alter mRNA splicing, may yield mature transcripts that encode non-functional protein products, an important predictor of Mendelian disease risk. However, most variant annotation tools do not adequately assess spliceogenicity outside the native splice site and thus the disease-causing potential of variants in other intronic and exonic regions is often overlooked. Here, we present a plugin for the Ensembl Variant Effect Predictor that packages MaxEntScan and extends its functionality to provide splice site predictions using a maximum entropy model. The plugin incorporates a sliding window algorithm to predict splice site loss or gain for any variant that overlaps a transcript feature. We also demonstrate the utility of the plugin by comparing our predictions to two mRNA splicing datasets containing several cancer-susceptibility genes. AVAILABILITY AND IMPLEMENTATION: Source code is freely available under the Apache License, Version 2.0: https://github.com/Ensembl/VEP_plugins. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
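The sliding-window idea described in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the plugin's code: `toy_score` is a hypothetical stand-in for a real maximum entropy model, and the function names are invented for this example. The only borrowed detail is the window length of 9 bases, which matches the 9-mer input of MaxEntScan's 5' splice-site model.

```python
from typing import Callable, List, Tuple

WINDOW = 9  # 9-mer windows, as used by MaxEntScan's 5' splice-site model


def toy_score(kmer: str) -> float:
    """Hypothetical scorer: 1.0 if the window has GT at offsets 3-4, else 0.0.

    A real analysis would call a trained maximum entropy model here.
    """
    return 1.0 if kmer[3:5] == "GT" else 0.0


def sliding_window_scores(
    seq: str,
    pos: int,
    alt: str,
    score: Callable[[str], float] = toy_score,
    window: int = WINDOW,
) -> List[Tuple[int, float, float]]:
    """Score every window that overlaps 0-based position `pos`.

    Returns (window_start, reference_score, alternate_score) tuples for a
    single-base substitution of seq[pos] by `alt`.
    """
    mutated = seq[:pos] + alt + seq[pos + 1:]
    first = max(0, pos - window + 1)      # leftmost window still covering pos
    last = min(pos, len(seq) - window)    # rightmost window that fits in seq
    results = []
    for start in range(first, last + 1):
        ref_kmer = seq[start:start + window]
        alt_kmer = mutated[start:start + window]
        results.append((start, score(ref_kmer), score(alt_kmer)))
    return results


def largest_change(results: List[Tuple[int, float, float]]) -> Tuple[int, float, float]:
    """Pick the window whose score changes most (a predicted gain or loss)."""
    return max(results, key=lambda r: abs(r[2] - r[1]))


if __name__ == "__main__":
    # A C>T substitution at position 8 creates a GT dinucleotide.
    hits = sliding_window_scores("CCCCCCCGCCCCCCCCCC", 8, "T")
    print(largest_change(hits))
```

A positive difference between the alternate and reference score suggests a gained site and a negative one a lost site; with a real model the threshold for calling either would need calibration.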


Subjects
RNA Splicing; Software; Algorithms; Exons; Introns
15.
Database (Oxford) ; 2018, 2018 01 01.
Article in English | MEDLINE | ID: mdl-30576484

ABSTRACT

The major goal of sequencing humans and many other species is to understand the link between genomic variation, phenotype and disease. There are numerous valuable and well-established variation resources, but collating and making sense of non-homogeneous, often large-scale data sets from disparate sources remains a challenge. Without a systematic catalogue of these data and appropriate query and annotation tools, understanding the genome sequence of an individual and assessing their disease risk is impossible. In Ensembl, we substantially solve this problem: we develop methods to facilitate data integration and broad access; aggregate information in a consistent manner and make it available in a variety of standard formats, both visually and programmatically; build analysis pipelines to compare variants to comprehensive genomic annotation sets; and make all tools and data publicly available.


Subjects
Database Management Systems; Databases, Genetic; Genomics/methods; Molecular Sequence Annotation/methods; Algorithms; Humans; Sequence Analysis, DNA; User-Computer Interface
16.
BMC Genomics ; 19(1): 624, 2018 Aug 22.
Article in English | MEDLINE | ID: mdl-30134833

ABSTRACT

BACKGROUND: New genomic technologies have provided novel insights into the genetics of interactions between vectors, viruses and hosts, which are leading to advances in the control of arboviruses of medical importance. However, the development of tools and resources available for vectors of non-zoonotic arboviruses remains neglected. Biting midges of the genus Culicoides transmit some of the most important arboviruses of wildlife and livestock worldwide, with a global impact on economic productivity, health and welfare. The absence of a suitable reference genome has hindered genomic analyses to date in this important genus of vectors. In the present study, the genome of Culicoides sonorensis, a vector of bluetongue virus (BTV) in the USA, has been sequenced to provide the first reference genome for these vectors. In this study, we also report the use of the reference genome to perform initial transcriptomic analyses of vector competence for BTV. RESULTS: Our analyses reveal that the genome is 189 Mb, assembled in 7974 scaffolds. Its annotation using the transcriptomic data generated in this study and in a previous study has identified 15,612 genes. Gene expression analyses of C. sonorensis females infected with BTV performed in this study revealed 165 genes that were differentially expressed between vector competent and refractory females. Two candidate genes, glutathione S-transferase (gst) and the antiviral helicase ski2, previously recognized as involved in vector competence for BTV in C. sonorensis (gst) and repressing dsRNA virus propagation (ski2), were confirmed in this study. CONCLUSIONS: The reference genome of C. sonorensis has enabled preliminary analyses of the gene expression profiles of vector competent and refractory individuals. The genome and transcriptomes generated in this study provide suitable tools for future research on arbovirus transmission. These provide a valuable resource for this vector lineage, which diverged from other major Dipteran vector families over 200 million years ago. The genome will be a valuable source of comparative data for other important Dipteran vector families including mosquitoes (Culicidae) and sandflies (Psychodidae), and together with the transcriptomic data can yield potential targets for transgenic modification in vector control and functional studies.


Subjects
Bluetongue virus/physiology; Bluetongue/transmission; Ceratopogonidae/genetics; Ceratopogonidae/virology; Genome, Insect; Insect Vectors; Animals; Bluetongue/immunology; Bluetongue/virology; Bluetongue virus/immunology; Ceratopogonidae/immunology; Evolution, Molecular; Gene Expression Profiling; Host-Pathogen Interactions/genetics; Host-Pathogen Interactions/immunology; Immunity, Innate/genetics; Insect Vectors/genetics; Insect Vectors/physiology; Molecular Sequence Annotation; Sequence Analysis, DNA; Transcriptome/genetics
17.
Bioinformatics ; 34(11): 1884-1892, 2018 06 01.
Article in English | MEDLINE | ID: mdl-29390084

ABSTRACT

Motivation: Protein-protein interactions (PPI) play a crucial role in our understanding of protein function and biological processes. Experimental findings are increasingly standardized and recorded in ontologies, with the Gene Ontology (GO) being one of the most successful projects. Several PPI evaluation algorithms have been based on the application of probabilistic frameworks or machine learning algorithms to GO properties. Here, we introduce a new training set design and machine learning based approach that combines dependent heterogeneous protein annotations from the entire ontology to evaluate putative co-complex protein interactions determined by empirical studies. Results: PPI annotations are built combinatorically using corresponding GO terms and InterPro annotation. We use a S. cerevisiae high-confidence complex dataset as a positive training set. A series of classifiers based on Maximum Entropy and support vector machines (SVMs), each with a composite counterpart algorithm, are trained on a series of training sets. These achieve a high area under the ROC curve of up to 0.97, outperforming go2ppi, a previously established prediction tool for protein-protein interactions based on Gene Ontology annotations. Availability and implementation: https://github.com/ima23/maxent-ppi. Contact: sbh11@cl.cam.ac.uk. Supplementary information: Supplementary data are available at Bioinformatics online.
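For readers unfamiliar with the performance measure quoted above, the following minimal sketch shows one standard way to compute the area under the ROC curve from classifier scores. It uses the pairwise ranking formulation (the fraction of positive/negative pairs in which the positive example scores higher, counting ties as half); the function name and the toy data are illustrative and do not come from the article or its repository.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    `labels` holds 1 for true interactions and 0 for non-interactions;
    `scores` holds the classifier output for each candidate pair.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Toy data: three of the four positive/negative pairs are ranked correctly.
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

The quadratic loop is fine for small illustrations; for large evaluation sets a rank-based implementation from an established statistics library would be the usual choice.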


Subjects
Computational Biology/methods; Gene Ontology; Molecular Sequence Annotation; Support Vector Machine; Entropy
18.
Nature ; 544(7649): 235-239, 2017 04 12.
Article in English | MEDLINE | ID: mdl-28406212

ABSTRACT

A major goal of biomedicine is to understand the function of every gene in the human genome. Loss-of-function mutations can disrupt both copies of a given gene in humans and phenotypic analysis of such 'human knockouts' can provide insight into gene function. Consanguineous unions are more likely to result in offspring carrying homozygous loss-of-function mutations. In Pakistan, consanguinity rates are notably high. Here we sequence the protein-coding regions of 10,503 adult participants in the Pakistan Risk of Myocardial Infarction Study (PROMIS), designed to understand the determinants of cardiometabolic diseases in individuals from South Asia. We identified individuals carrying homozygous predicted loss-of-function (pLoF) mutations, and performed phenotypic analysis involving more than 200 biochemical and disease traits. We enumerated 49,138 rare (<1% minor allele frequency) pLoF mutations. These pLoF mutations are estimated to knock out 1,317 genes, each in at least one participant. Homozygosity for pLoF mutations at PLA2G7 was associated with absent enzymatic activity of soluble lipoprotein-associated phospholipase A2; at CYP2F1, with higher plasma interleukin-8 concentrations; at TREH, with lower concentrations of apoB-containing lipoprotein subfractions; at either A3GALT2 or NRG4, with markedly reduced plasma insulin C-peptide concentrations; and at SLC9A3R1, with mediators of calcium and phosphate signalling. Heterozygous deficiency of APOC3 has been shown to protect against coronary heart disease; we identified APOC3 homozygous pLoF carriers in our cohort. We recruited these human knockouts and challenged them with an oral fat load. Compared with family members lacking the mutation, individuals with APOC3 knocked out displayed marked blunting of the usual post-prandial rise in plasma triglycerides. Overall, these observations provide a roadmap for a 'human knockout project', a systematic effort to understand the phenotypic consequences of complete disruption of genes in humans.


Subjects
Consanguinity; DNA Mutational Analysis; Gene Deletion; Genes/genetics; Genetic Association Studies/methods; Homozygote; Phenotype; 1-Alkyl-2-acetylglycerophosphocholine Esterase/deficiency; 1-Alkyl-2-acetylglycerophosphocholine Esterase/genetics; Apolipoprotein C-III/deficiency; Apolipoprotein C-III/genetics; Cohort Studies; Coronary Disease/blood; Coronary Disease/genetics; Cytochrome P450 Family 2/genetics; Dietary Fats/pharmacology; Exome/genetics; Fasting/blood; Female; Gene Frequency; Humans; Interleukin-8/blood; Male; Middle Aged; Myocardial Infarction/blood; Myocardial Infarction/genetics; Neuregulins/genetics; Pakistan; Pedigree; Phosphoproteins/genetics; Postprandial Period; RNA Splice Sites/genetics; Reverse Genetics/methods; Sodium-Hydrogen Exchangers/genetics; Triglycerides/blood
19.
Development ; 141(20): 3994-4005, 2014 Oct.
Article in English | MEDLINE | ID: mdl-25294943

ABSTRACT

Although we now have a wealth of information on the transcription patterns of all the genes in the Drosophila genome, much less is known about the properties of the encoded proteins. To provide information on the expression patterns and subcellular localisations of many proteins in parallel, we have performed a large-scale protein trap screen using a hybrid piggyBac vector carrying an artificial exon encoding yellow fluorescent protein (YFP) and protein affinity tags. From screening 41 million embryos, we recovered 616 verified independent YFP-positive lines representing protein traps in 374 genes, two-thirds of which had not been tagged in previous P element protein trap screens. Over 20 different research groups then characterized the expression patterns of the tagged proteins in a variety of tissues and at several developmental stages. In parallel, we purified many of the tagged proteins from embryos using the affinity tags and identified co-purifying proteins by mass spectrometry. The fly stocks are publicly available through the Kyoto Drosophila Genetics Resource Center. All our data are available via an open access database (Flannotator), which provides comprehensive information on the expression patterns, subcellular localisations and in vivo interaction partners of the trapped proteins. Our resource substantially increases the number of available protein traps in Drosophila and identifies new markers for cellular organelles and structures.


Subjects
Drosophila Proteins/metabolism; Drosophila melanogaster/physiology; Gene Expression Profiling; Gene Expression Regulation, Developmental; Membrane Proteins/metabolism; Animals; Bacterial Proteins/chemistry; Crosses, Genetic; Exons; Female; Genetic Techniques; Genome; Luminescent Proteins/chemistry; Male; Ovary/metabolism; Sex Factors; Testis/metabolism; Transcription, Genetic
20.
Mol Cell Proteomics ; 12(1): 1-13, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23071097

ABSTRACT

Advances in sensitivity, resolution, mass accuracy, and throughput have considerably increased the number of protein identifications made via mass spectrometry. Despite these advances, state-of-the-art experimental methods for the study of protein-protein interactions yield more candidate interactions than may be expected biologically owing to biases and limitations in the experimental methodology. In silico methods, which distinguish between true and false interactions, have been developed and applied successfully to reduce the number of false positive results yielded by physical interaction assays. Such methods may be grouped according to: (1) the type of data used: methods based on experiment-specific measurements (e.g., spectral counts or identification scores) versus methods that extract knowledge encoded in external annotations (e.g., public interaction and functional categorisation databases); (2) the type of algorithm applied: the statistical description and estimation of physical protein properties versus predictive supervised machine learning or text-mining algorithms; (3) the type of protein relation evaluated: direct (binary) interaction of two proteins in a cocomplex versus probability of any functional relationship between two proteins (e.g., co-occurrence in a pathway, subcellular compartment); and (4) initial motivation: elucidation of experimental data by evaluation versus prediction of novel protein-protein interaction, to be experimentally validated a posteriori. This work reviews several popular computational scoring methods and software platforms for protein-protein interaction evaluation according to their methodology, comparative strengths and weaknesses, data representation, accessibility, and availability. The scoring methods and platforms described include: CompPASS, SAINT, Decontaminator, MINT, IntAct, STRING, and FunCoup. References to related work are provided throughout in order to provide a concise but thorough introduction to a rapidly growing interdisciplinary field of investigation.


Subjects
Computational Biology/methods; Multiprotein Complexes/analysis; Algorithms; Animals; Bacteria/metabolism; Chromatography, Affinity; Databases, Protein; Humans; Mass Spectrometry; Methionine/metabolism; Protein Interaction Mapping; Saccharomyces cerevisiae/metabolism