Results 1 - 20 of 44
1.
Proc Biol Sci ; 290(2011): 20231401, 2023 Nov 29.
Article in English | MEDLINE | ID: mdl-37989245

ABSTRACT

Flowering phenology is important in the adaptation of many plants to their local environment, but its adaptive value has not been extensively studied in herbaceous perennials. We used Arabis alpina as a model system to determine the importance of flowering phenology to fitness of a herbaceous perennial with a wide geographical range. Individual plants representative of local genetic diversity (accessions) were collected across Europe, including in Spain, the Alps and Scandinavia. The flowering behaviour of these accessions was documented in controlled conditions, in common-garden experiments at native sites and in situ in natural populations. Accessions from the Alps and Scandinavia varied in whether they required exposure to cold (vernalization) to induce flowering, and in the timing and duration of flowering. By contrast, all Spanish accessions obligately required vernalization and had a short duration of flowering. Using experimental gardens at native sites, we show that an obligate requirement for vernalization increases survival in Spain. Based on our analyses of genetic diversity and flowering behaviour across Europe, we propose that in the model herbaceous perennial A. alpina, an obligate requirement for vernalization, which is correlated with short duration of flowering, is favoured by selection in Spain where the plants experience a long growing season.


Subjects
Arabis; Arabis/genetics; Flowers/genetics; Geography; Scandinavian and Nordic Countries; Europe
2.
Bioinformatics ; 37(2): 162-170, 2021 04 19.
Article in English | MEDLINE | ID: mdl-32797179

ABSTRACT

MOTIVATION: Protein function prediction is a difficult bioinformatics problem. Many recent methods use deep neural networks to learn complex sequence representations and predict function from these. Deep supervised models require a lot of labeled training data which are not available for this task. However, a very large amount of protein sequences without functional labels is available. RESULTS: We applied an existing deep sequence model that had been pretrained in an unsupervised setting on the supervised task of protein molecular function prediction. We found that this complex feature representation is effective for this task, outperforming hand-crafted features such as one-hot encoding of amino acids, k-mer counts, secondary structure and backbone angles. Also, it partly negates the need for complex prediction models, as a two-layer perceptron was enough to achieve competitive performance in the third Critical Assessment of Functional Annotation benchmark. We also show that combining this sequence representation with protein 3D structure information does not lead to performance improvement, hinting that 3D structure is also potentially learned during the unsupervised pretraining. AVAILABILITY AND IMPLEMENTATION: Implementations of all used models can be found at https://github.com/stamakro/GCN-for-Structure-and-Function. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Proteins; Software; Amino Acid Sequence; Neural Networks, Computer; Proteins/genetics
3.
Bioinformatics ; 36(4): 1182-1190, 2020 02 15.
Article in English | MEDLINE | ID: mdl-31562759

ABSTRACT

MOTIVATION: Co-expression of two genes across different conditions is indicative of their involvement in the same biological process. However, when using RNA-Seq datasets with many experimental conditions from diverse sources, only a subset of the experimental conditions is expected to be relevant for finding genes related to a particular Gene Ontology (GO) term. Therefore, we hypothesize that when the purpose is to find similarly functioning genes, the co-expression of genes should not be determined on all samples but only on those samples informative for the GO term of interest. RESULTS: To address this, we developed Metric Learning for Co-expression (MLC), a fast algorithm that assigns a GO-term-specific weight to each expression sample. The goal is to obtain a weighted co-expression measure that is more suitable than the unweighted Pearson correlation for applying Guilt-By-Association-based function predictions. More specifically, if two genes are annotated with a given GO term, MLC tries to maximize their weighted co-expression and, in addition, if one of them is not annotated with that term, the weighted co-expression is minimized. Our experiments on publicly available Arabidopsis thaliana RNA-Seq data demonstrate that MLC outperforms standard Pearson correlation in term-centric performance. Moreover, our method is particularly good at more specific terms, which are the most interesting. Finally, by observing the sample weights for a particular GO term, one can identify which experiments are important for learning that term and potentially identify novel conditions that are relevant, as demonstrated by experiments in both A. thaliana and Pseudomonas aeruginosa. AVAILABILITY AND IMPLEMENTATION: MLC is available as a Python package at www.github.com/stamakro/MLC. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
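
The MLC abstract contrasts the unweighted Pearson correlation with a co-expression measure in which every expression sample carries a GO-term-specific weight. The sketch below only illustrates what such a sample-weighted Pearson correlation computes, assuming the non-negative weights are already known; how MLC learns the weights is not shown, and `weighted_pearson` is a hypothetical helper, not part of the MLC package.

```python
import math
from typing import Sequence


def weighted_pearson(x: Sequence[float], y: Sequence[float], w: Sequence[float]) -> float:
    """Pearson correlation of expression profiles x and y under per-sample weights w.

    Hypothetical helper: samples with weight 0 are effectively ignored, and
    equal weights reduce to the ordinary (unweighted) Pearson correlation.
    """
    if not (len(x) == len(y) == len(w)):
        raise ValueError("x, y and w must have the same length")
    total = sum(w)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Weighted means of both expression profiles
    mx = sum(wi * xi for wi, xi in zip(w, x)) / total
    my = sum(wi * yi for wi, yi in zip(w, y)) / total
    # Weighted covariance and variances (the common 1/total factor cancels in the ratio)
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    vx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    vy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    if vx == 0 or vy == 0:
        # Correlation is undefined for a constant profile; treated here as no co-expression
        return 0.0
    return cov / math.sqrt(vx * vy)


# Toy profiles (hypothetical): two genes agree on three samples but diverge on a fourth.
gene_a = [1.0, 2.0, 3.0, 10.0]
gene_b = [1.0, 2.0, 3.0, -10.0]
print(weighted_pearson(gene_a, gene_b, [1, 1, 1, 1]))  # negative, about -0.94
print(weighted_pearson(gene_a, gene_b, [1, 1, 1, 0]))  # 1.0
```

Down-weighting the one discordant sample turns a strongly negative correlation into a perfect positive one, which is the kind of effect a term-specific sample weighting is meant to exploit.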


Subjects
Algorithms; RNA-Seq; Gene Ontology; Phenotype
4.
Bioinformatics ; 35(7): 1116-1124, 2019 04 01.
Article in English | MEDLINE | ID: mdl-30169569

ABSTRACT

MOTIVATION: Most automatic functional annotation methods assign Gene Ontology (GO) terms to proteins based on annotations of highly similar proteins. We advocate that proteins that are less similar are still informative. Also, despite their simplicity and structure, GO terms seem to be hard for computers to learn, in particular the Biological Process ontology, which has the most terms (>29 000). We propose to use Label-Space Dimensionality Reduction (LSDR) techniques to exploit the redundancy of GO terms and transform them into a more compact latent representation that is easier to predict. RESULTS: We compare proteins using a sequence similarity profile (SSP) to a set of annotated training proteins. We introduce two new LSDR methods, one based on the structure of the GO, and one based on semantic similarity of terms. We show that these LSDR methods, as well as three existing ones, improve the Critical Assessment of Functional Annotation performance of several function prediction algorithms. Cross-validation experiments on Arabidopsis thaliana proteins pinpoint the superiority of our GO-aware LSDR over generic LSDR. Our experiments on A. thaliana proteins show that the SSP representation in combination with a kNN classifier outperforms state-of-the-art and baseline methods in terms of cross-validated F-measure. AVAILABILITY AND IMPLEMENTATION: Source code for the experiments is available at https://github.com/stamakro/SSP-LSDR. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
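
The SSP abstract represents each protein by its similarity to a set of annotated training proteins and combines that representation with a kNN classifier. The sketch below shows one plausible reading of that combination, under stated assumptions: similarity scores are precomputed, neighbours are found by Euclidean distance between SSP vectors, and a GO term's score is the fraction of the k neighbours annotated with it. The function name, the distance choice and the toy terms (`termA` and so on) are hypothetical, and the LSDR step of the paper is not reproduced.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Set


def knn_go_scores(
    query_profile: Sequence[float],
    train_profiles: List[Sequence[float]],
    train_terms: List[Set[str]],
    k: int = 3,
) -> Dict[str, float]:
    """Score GO terms for one query protein with k-nearest neighbours.

    Each profile is a sequence similarity profile (SSP): the protein's
    similarity scores to every training protein, in a fixed order.
    Euclidean distance and fraction-of-neighbours scoring are assumptions.
    """
    if len(train_profiles) != len(train_terms):
        raise ValueError("one GO-term set is needed per training profile")
    if k < 1:
        raise ValueError("k must be at least 1")
    # Rank training proteins by Euclidean distance between SSP vectors
    ranked = sorted(
        range(len(train_profiles)),
        key=lambda i: math.dist(query_profile, train_profiles[i]),
    )
    neighbours = ranked[:k]
    # Count how many of the k nearest training proteins carry each term
    counts = Counter(term for i in neighbours for term in train_terms[i])
    return {term: n / len(neighbours) for term, n in counts.items()}


# Toy data (hypothetical): three training proteins, profiles include self-similarity 1.0.
train_profiles = [[1.0, 0.2, 0.1], [0.2, 1.0, 0.1], [0.1, 0.1, 1.0]]
train_terms = [{"termA"}, {"termA", "termB"}, {"termC"}]
print(knn_go_scores([0.9, 0.3, 0.1], train_profiles, train_terms, k=2))
```

With k = 2 the query's nearest neighbours are the first two training proteins, so `termA` scores 1.0, `termB` scores 0.5 and `termC` is not predicted. In practice the scores would be thresholded to produce final annotations; that threshold is not specified in the abstract.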


Subjects
Computational Biology; Software; Algorithms; Amino Acid Sequence; Gene Ontology; Molecular Sequence Annotation
5.
Plant J ; 80(1): 136-48, 2014 Oct.
Article in English | MEDLINE | ID: mdl-25039268

ABSTRACT

We explored genetic variation by sequencing a selection of 84 tomato accessions and related wild species representative of the Lycopersicon, Arcanum, Eriopersicon and Neolycopersicon groups, which has yielded a huge amount of precious data on sequence diversity in the tomato clade. Three new reference genomes were reconstructed to support our comparative genome analyses. Comparative sequence alignment revealed group-, species- and accession-specific polymorphisms, explaining characteristic fruit traits and growth habits in the various cultivars. Using gene models from the annotated Heinz 1706 reference genome, we observed differences in the ratio between non-synonymous and synonymous SNPs (dN/dS) in fruit diversification and plant growth genes compared to a random set of genes, indicating positive selection and differences in selection pressure between crop accessions and wild species. In wild species, the number of single-nucleotide polymorphisms (SNPs) exceeds 10 million, i.e. 20-fold higher than found in most of the crop accessions, indicating dramatic genetic erosion of crop and heirloom tomatoes. In addition, the highest levels of heterozygosity were found for allogamous self-incompatible wild species, while facultative and autogamous self-compatible species display a lower heterozygosity level. Using whole-genome SNP information for maximum-likelihood analysis, we achieved complete tree resolution, whereas maximum-likelihood trees based on SNPs from ten fruit and growth genes show incomplete resolution for the crop accessions, partly due to the effect of heterozygous SNPs. Finally, results suggest that phylogenetic relationships are correlated with habitat, indicating the occurrence of geographical races within these groups, which is of practical importance for Solanum genome evolution studies.


Subjects
Genetic Variation; Genome, Plant/genetics; Solanum lycopersicum/genetics; Breeding; Chromosome Mapping; DNA, Plant/chemistry; DNA, Plant/genetics; Fruit/genetics; High-Throughput Nucleotide Sequencing; Molecular Sequence Data; Phenotype; Phylogeny; Polymorphism, Single Nucleotide; Sequence Alignment; Sequence Analysis, DNA; Species Specificity
6.
BMC Genomics ; 16: 374, 2015 May 10.
Article in English | MEDLINE | ID: mdl-25958312

ABSTRACT

BACKGROUND: In flowering plants it has been shown that de novo genome assemblies of different species and genera show a significant drop in the proportion of alignable sequence. Within a plant species, however, it is assumed that different haplotypes of the same chromosome align well. In this paper we have compared three de novo assemblies of potato chromosome 5 and report on the sequence variation and the proportion of sequence that can be aligned. RESULTS: For the diploid potato clone RH89-039-16 (RH) we produced two linkage phase controlled and haplotype-specific assemblies of chromosome 5 based on BAC-by-BAC sequencing, which were aligned to each other and compared to the 52 Mb chromosome 5 reference sequence of the doubled monoploid clone DM 1-3 516 R44 (DM). We identified 17.0 Mb of non-redundant sequence scaffolds derived from euchromatic regions of RH and 38.4 Mb from the pericentromeric heterochromatin. For 32.7 Mb of the RH sequences the correct position and order on chromosome 5 were determined, using genetic markers, fluorescence in situ hybridisation and alignment to the DM reference genome. This ordered fraction of the RH sequences is situated in the euchromatic arms and in the heterochromatin borders. In the euchromatic regions, the sequence collinearity between the three chromosomal homologs is good, but interruption of collinearity occurs at nine gene clusters. Towards and into the heterochromatin borders, absence of collinearity due to structural variation was more extensive and was caused by hemizygous and poorly aligning regions of up to 450 kb in length. In the most central heterochromatin, a total of 22.7 Mb sequence from both RH haplotypes remained unordered. These RH sequences have very few syntenic regions and represent a non-alignable region between the RH and DM heterochromatin haplotypes of chromosome 5.
CONCLUSIONS: Our results show that among homologous potato chromosomes large regions are present with dramatic loss of sequence collinearity. This stresses the need for more de novo reference assemblies in order to capture genome diversity in this crop. The discovery of three highly diverged pericentric heterochromatin haplotypes within one species is a novelty in plant genome analysis. The possible origin and cytogenetic implication of this heterochromatin haplotype diversity are discussed.


Subjects
Chromosomes, Plant; Euchromatin/genetics; Heterochromatin/genetics; Solanum tuberosum/genetics; Chromosome Mapping; Chromosomes, Artificial, Bacterial; Euchromatin/metabolism; Genetic Linkage; Genotype; Haplotypes; Heterochromatin/metabolism; In Situ Hybridization, Fluorescence; Polymorphism, Genetic
7.
PLoS Genet ; 8(11): e1003088, 2012.
Article in English | MEDLINE | ID: mdl-23209441

ABSTRACT

We sequenced and compared the genomes of the Dothideomycete fungal plant pathogens Cladosporium fulvum (Cfu) (syn. Passalora fulva) and Dothistroma septosporum (Dse) that are closely related phylogenetically, but have different lifestyles and hosts. Although both fungi grow extracellularly in close contact with host mesophyll cells, Cfu is a biotroph infecting tomato, while Dse is a hemibiotroph infecting pine. The genomes of these fungi have a similar set of genes (70% of gene content in both genomes are homologs), but differ significantly in size (Cfu >61.1-Mb; Dse 31.2-Mb), which is mainly due to the difference in repeat content (47.2% in Cfu versus 3.2% in Dse). Recent adaptation to different lifestyles and hosts is suggested by diverged sets of genes. Cfu contains an α-tomatinase gene that we predict might be required for detoxification of tomatine, while this gene is absent in Dse. Many genes encoding secreted proteins are unique to each species and the repeat-rich areas in Cfu are enriched for these species-specific genes. In contrast, conserved genes suggest common host ancestry. Homologs of Cfu effector genes, including Ecp2 and Avr4, are present in Dse and induce a Cf-Ecp2- and Cf-4-mediated hypersensitive response, respectively. Strikingly, genes involved in production of the toxin dothistromin, a likely virulence factor for Dse, are conserved in Cfu, but their expression differs markedly with essentially no expression by Cfu in planta. Likewise, Cfu has a carbohydrate-degrading enzyme catalog that is more similar to that of necrotrophs or hemibiotrophs and a larger pectinolytic gene arsenal than Dse, but many of these genes are not expressed in planta or are pseudogenized. Overall, comparison of their genomes suggests that these closely related plant pathogens had a common ancestral host but since adapted to different hosts and lifestyles by a combination of differentiated gene content, pseudogenization, and gene regulation.


Subjects
Adaptation, Physiological/genetics; Cladosporium/genetics; Genome; Host-Pathogen Interactions; Base Sequence; Fungal Proteins/genetics; Gene Expression Regulation, Fungal; Solanum lycopersicum/genetics; Solanum lycopersicum/parasitology; Phylogeny; Pinus/genetics; Pinus/parasitology; Plant Diseases/genetics
8.
PLoS Genet ; 7(6): e1002070, 2011 Jun.
Article in English | MEDLINE | ID: mdl-21695235

ABSTRACT

The plant-pathogenic fungus Mycosphaerella graminicola (asexual stage: Septoria tritici) causes septoria tritici blotch, a disease that greatly reduces the yield and quality of wheat. This disease is economically important in most wheat-growing areas worldwide and threatens global food production. Control of the disease has been hampered by a limited understanding of the genetic and biochemical bases of pathogenicity, including mechanisms of infection and of resistance in the host. Unlike most other plant pathogens, M. graminicola has a long latent period during which it evades host defenses. Although this type of stealth pathogenicity occurs commonly in Mycosphaerella and other Dothideomycetes, the largest class of plant-pathogenic fungi, its genetic basis is not known. To address this problem, the genome of M. graminicola was sequenced completely. The finished genome contains 21 chromosomes, eight of which could be lost with no visible effect on the fungus and thus are dispensable. This eight-chromosome dispensome is dynamic in field and progeny isolates, is different from the core genome in gene and repeat content, and appears to have originated by ancient horizontal transfer from an unknown donor. Synteny plots of the M. graminicola chromosomes versus those of the only other sequenced Dothideomycete, Stagonospora nodorum, revealed conservation of gene content but not order or orientation, suggesting a high rate of intra-chromosomal rearrangement in one or both species. This observed "mesosynteny" is very different from synteny seen between other organisms. A surprising feature of the M. graminicola genome compared to other sequenced plant pathogens was that it contained very few genes for enzymes that break down plant cell walls, which was more similar to endophytes than to pathogens. The stealth pathogenesis of M. graminicola probably involves degradation of proteins rather than carbohydrates to evade host defenses during the biotrophic stage of infection and may have evolved from endophytic ancestors.


Subjects
Ascomycota/genetics; Chromosomes, Fungal/genetics; Genome, Fungal/genetics; Ascomycota/metabolism; Ascomycota/pathogenicity; Gene Rearrangement; Plant Diseases/microbiology; Synteny; Triticum/microbiology
9.
Nucleic Acids Res ; 39(Web Server issue): W524-7, 2011 Jul.
Article in English | MEDLINE | ID: mdl-21609962

ABSTRACT

Although several tools for the analysis of ChIP-seq data have been published recently, there is a growing demand, in particular in the plant research community, for computational resources with which such data can be processed, analyzed, stored, visualized and integrated within a single, user-friendly environment. To accommodate this demand, we have developed PRI-CAT (Plant Research International ChIP-seq analysis tool), a web-based workflow tool for the management and analysis of ChIP-seq experiments. PRI-CAT is currently focused on Arabidopsis, but will be extended with other plant species in the near future. Users can directly submit their sequencing data to PRI-CAT for automated analysis. A QuickLoad server compatible with genome browsers is implemented for the storage and visualization of DNA-binding maps. Submitted datasets and results can be made publicly available through PRI-CAT, a feature that will enable community-based integrative analysis and visualization of ChIP-seq experiments. Secondary analysis of data can be performed with the aid of GALAXY, an external framework for tool and data integration. PRI-CAT is freely available at http://www.ab.wur.nl/pricat. No login is required.


Subjects
Arabidopsis/genetics; Chromatin Immunoprecipitation/methods; Plant Proteins/metabolism; Software; Transcription Factors/metabolism; Binding Sites; Computer Graphics; DNA-Binding Proteins/metabolism; High-Throughput Nucleotide Sequencing; Internet; Promoter Regions, Genetic
11.
Plant Physiol ; 155(1): 271-81, 2011 Jan.
Article in English | MEDLINE | ID: mdl-21098674

ABSTRACT

Although Arabidopsis (Arabidopsis thaliana) is the best studied plant species, the biological role of one-third of its proteins is still unknown. We developed a probabilistic protein function prediction method that integrates information from sequences, protein-protein interactions, and gene expression. The method was applied to proteins from Arabidopsis. Evaluation of prediction performance showed that our method has improved performance compared with single source-based prediction approaches and two existing integration approaches. An innovative feature of our method is that it enables transfer of functional information between proteins that are not directly associated with each other. We provide novel function predictions for 5,807 proteins. Recent experimental studies confirmed several of the predictions. We highlight these in detail for proteins predicted to be involved in flowering and floral organ development.


Subjects
Arabidopsis Proteins/genetics; Arabidopsis Proteins/metabolism; Arabidopsis/genetics; Computational Biology/methods; Databases, Genetic; Genome, Plant/genetics; Animals; Area Under Curve; Bayes Theorem; Flowers/embryology; Flowers/genetics; Markov Chains; Models, Genetic; Molecular Sequence Annotation; Organogenesis/genetics; Reproducibility of Results
12.
Microb Cell Fact ; 11: 36, 2012 Mar 26.
Article in English | MEDLINE | ID: mdl-22448915

ABSTRACT

Saccharomyces cerevisiae CEN.PK 113-7D is widely used for metabolic engineering and systems biology research in industry and academia. We sequenced, assembled, annotated and analyzed its genome. Single-nucleotide variations (SNV), insertions/deletions (indels) and differences in genome organization compared to the reference strain S. cerevisiae S288C were analyzed. In addition to a few large deletions and duplications, nearly 3000 indels were identified in the CEN.PK113-7D genome relative to S288C. These differences were overrepresented in genes whose functions are related to transcriptional regulation and chromatin remodelling. Some of these variations were caused by unstable tandem repeats, suggesting an innate evolvability of the corresponding genes. Besides a previously characterized mutation in adenylate cyclase, the CEN.PK113-7D genome sequence revealed a significant enrichment of non-synonymous mutations in genes encoding components of the cAMP signalling pathway. Some phenotypic characteristics of the CEN.PK113-7D strains were explained by the presence of additional specific metabolic genes relative to S288C. In particular, the presence of the BIO1 and BIO6 genes correlated with biotin prototrophy of CEN.PK113-7D. Furthermore, the copy number, chromosomal location and sequences of the MAL loci were resolved. The assembled sequence reveals that CEN.PK113-7D has a mosaic genome that combines characteristics of laboratory strains and wild-industrial strains.


Subjects
Genome, Fungal; Saccharomyces cerevisiae Proteins/genetics; Saccharomyces cerevisiae/genetics; Biotechnology; DNA Copy Number Variations; DNA, Fungal/genetics; Genes, Fungal; Metabolic Engineering/methods; Open Reading Frames; Saccharomyces cerevisiae Proteins/metabolism; Sequence Analysis, DNA
13.
BMC Bioinformatics ; 12: 444, 2011 Nov 14.
Article in English | MEDLINE | ID: mdl-22082126

ABSTRACT

BACKGROUND: In addition to sequence conservation, protein multiple sequence alignments contain evolutionary signal in the form of correlated variation among amino acid positions. This signal indicates positions in the sequence that influence each other, and can be applied for the prediction of intra- or intermolecular contacts. Although various approaches exist for the detection of such correlated mutations, in general these methods utilize only pairwise correlations. Hence, they tend to conflate direct and indirect dependencies. RESULTS: We propose RMRCM, a method for Regularized Multinomial Regression in order to obtain Correlated Mutations from protein multiple sequence alignments. Importantly, our method is not restricted to pairwise (column-column) comparisons only, but takes into account the network nature of relationships between protein residues in order to predict residue-residue contacts. The use of regularization ensures that the number of predicted links between columns in the multiple sequence alignment remains limited, preventing overprediction. Using simulated datasets we analyzed the performance of our approach in predicting residue-residue contacts, and studied how it is influenced by various types of noise. For various biological datasets, validation with protein structure data indicates a good performance of the proposed algorithm for the prediction of residue-residue contacts, in comparison to previous results. RMRCM can also be applied to predict interactions (in addition to only predicting interaction sites or contact sites), as demonstrated by predicting PDZ-peptide interactions. CONCLUSIONS: A novel method is presented, which uses regularized multinomial regression in order to obtain correlated mutations from protein multiple sequence alignments. AVAILABILITY: R-code of our implementation is available via http://www.ab.wur.nl/rmrcm.


Subjects
Algorithms; Mutation; Regression Analysis; Amino Acid Sequence; Arabidopsis/metabolism; Arabidopsis Proteins/chemistry; Arabidopsis Proteins/genetics; Arabidopsis Proteins/metabolism; Conserved Sequence; MADS Domain Proteins/chemistry; MADS Domain Proteins/genetics; MADS Domain Proteins/metabolism; Models, Molecular; Protein Interaction Maps; Sequence Analysis, Protein
14.
Trends Genet ; 24(11): 539-51, 2008 Nov.
Article in English | MEDLINE | ID: mdl-18819722

ABSTRACT

Orthology is a key evolutionary concept in many areas of genomic research. It provides a framework for subjects as diverse as the evolution of genomes, gene functions, cellular networks and functional genome annotation. Although orthologous proteins usually perform equivalent functions in different species, establishing true orthologous relationships requires a phylogenetic approach, which combines both trees and graphs (networks) using reliable species phylogeny and available genomic data from more than two species, and an insight into the processes of molecular evolution. Here, we evaluate the available bioinformatics tools and provide a set of guidelines to aid researchers in choosing the most appropriate tool for any situation.


Subjects
Evolution, Molecular; Genomics/methods; Phylogeny; Sequence Homology; Animals; Databases, Genetic; Genome; Humans; Proteins/chemistry
15.
BMC Plant Biol ; 11(1): 82, 2011 May 16.
Article in English | MEDLINE | ID: mdl-21575182

ABSTRACT

BACKGROUND: Large-scale analyses of genomics and transcriptomics data have revealed that alternative splicing (AS) substantially increases the complexity of the transcriptome in higher eukaryotes. However, the extent to which this complexity is reflected at the level of the proteome remains unclear. On the basis of a lack of conservation of AS between species, we previously concluded that AS does not frequently serve as a mechanism that enables the production of multiple functional proteins from a single gene. Following this conclusion, we hypothesized that the extent to which AS events contribute to the proteome diversity in Arabidopsis thaliana would be lower than expected on the basis of transcriptomics data. Here, we test this hypothesis by analyzing two large-scale proteomics datasets from Arabidopsis thaliana. RESULTS: A total of only 60 AS events could be confirmed using the proteomics data. However, for about 60% of the loci that, based on transcriptomics data, were predicted to produce multiple protein isoforms through AS, no isoform-specific peptides were found. We therefore performed in silico AS detection experiments to assess how well AS events were represented in the experimental datasets. The results of these in silico experiments indicated that the low number of confirmed AS events was the consequence of a limited sampling depth rather than in vivo under-representation of AS events in these datasets. CONCLUSION: Although the impact of AS on the functional properties of the proteome remains to be uncovered, the results of this study indicate that AS-induced diversity at the transcriptome level is also expressed at the proteome level.


Subjects
Arabidopsis Proteins/genetics; Arabidopsis/genetics; Proteome/genetics; Alternative Splicing; Arabidopsis/metabolism; Arabidopsis Proteins/chemistry; Arabidopsis Proteins/metabolism; Comparative Genomic Hybridization; DNA, Plant; Gene Expression Profiling; Gene Expression Regulation, Plant; Genes, Plant; Genome, Plant; Genomics; Polymorphism, Genetic; Protein Isoforms; Proteome/metabolism; Proteomics/methods
16.
Theor Appl Genet ; 123(3): 493-508, 2011 Aug.
Article in English | MEDLINE | ID: mdl-21590328

ABSTRACT

Like all plants, potato has evolved a surveillance system consisting of a large array of genes encoding immune receptors that confer resistance to pathogens and pests. The majority of these so-called resistance or R proteins belong to the superfamily whose members harbour a nucleotide binding and a leucine-rich-repeat domain (NB-LRR). Here, sequence information of the conserved NB domain was used to investigate the genome-wide genetic distribution of the NB-LRR resistance gene loci in potato. We analysed the sequences of 288 unique BAC clones selected using filter hybridisation screening of a BAC library of the diploid potato clone RH89-039-16 (S. tuberosum ssp. tuberosum) and a physical map of this BAC library. This resulted in the identification of 738 partial and full-length NB-LRR sequences. Based on homology of these sequences with known resistance genes, 280 and 448 sequences were classified as TIR-NB-LRR (TNL) and CC-NB-LRR (CNL) sequences, respectively. Genetic mapping revealed the presence of 15 TNL and 32 CNL loci. Thirty-six are novel, while three TNL loci and eight CNL loci are syntenic with previously identified functional resistance genes. The genetic map was complemented with 68 universal CAPS markers and 82 disease resistance trait loci described in literature, providing an excellent template for genetic studies and applied research in potato.


Subjects
Chromosome Mapping/methods; Plant Diseases/genetics; Quantitative Trait Loci; Solanum tuberosum/genetics; Cloning, Molecular; Disease Resistance; Gene Expression Profiling; Gene Library; Genes, Plant; Genetic Linkage; Plant Diseases/immunology; Plant Immunity; Plant Proteins/chemistry; Plant Proteins/genetics; Sequence Analysis, DNA; Solanum tuberosum/immunology
17.
PLoS Comput Biol ; 6(11): e1001017, 2010 Nov 24.
Article in English | MEDLINE | ID: mdl-21124869

ABSTRACT

Protein sequences encompass tertiary structures and contain information about specific molecular interactions, which in turn determine biological functions of proteins. Knowledge about how protein sequences define interaction specificity is largely missing, in particular for paralogous protein families with high sequence similarity, such as the plant MADS domain transcription factor family. In comparison to the situation in mammalian species, this important family of transcription regulators has expanded enormously in plant species and contains over 100 members in the model plant species Arabidopsis thaliana. Here, we provide insight into the mechanisms that determine protein-protein interaction specificity for the Arabidopsis MADS domain transcription factor family, using an integrated computational and experimental approach. Plant MADS proteins have highly similar amino acid sequences, but their dimerization patterns vary substantially. Our computational analysis uncovered small sequence regions that explain observed differences in dimerization patterns with reasonable accuracy. Furthermore, we show the usefulness of the method for prediction of MADS domain transcription factor interaction networks in other plant species. Introduction of mutations in the predicted interaction motifs demonstrated that single amino acid mutations can have a large effect and lead to loss or gain of specific interactions. In addition, various bioinformatics analyses shed light on the way evolution has shaped MADS domain transcription factor interaction specificity. Identified protein-protein interaction motifs appeared to be strongly conserved among orthologs, indicating their evolutionary importance. We also provide evidence that mutations in these motifs can be a source for sub- or neo-functionalization. The analyses presented here take us a step forward in understanding protein-protein interactions and the interplay between protein sequences and network evolution.


Subjects
Amino Acid Motifs; MADS Domain Proteins/chemistry; Protein Interaction Domains and Motifs; Protein Interaction Mapping/methods; Amino Acid Sequence; Arabidopsis Proteins/chemistry; Arabidopsis Proteins/genetics; Arabidopsis Proteins/metabolism; Databases, Protein; Evolution, Molecular; MADS Domain Proteins/genetics; MADS Domain Proteins/metabolism; Models, Molecular; Models, Statistical; Molecular Sequence Data; Mutation; Reproducibility of Results; Sequence Alignment
18.
Plant J ; 58(5): 857-69, 2009 Jun.
Article in English | MEDLINE | ID: mdl-19207213

ABSTRACT

We studied the physical and genetic organization of chromosome 6 of tomato (Solanum lycopersicum) cv. Heinz 1706 by combining bacterial artificial chromosome (BAC) sequence analysis, high-information-content fingerprinting, genetic analysis, and BAC-fluorescent in situ hybridization (FISH) mapping data. The chromosome positions of 81 anchored seed and extension BACs corresponded in most cases with the linear marker order on the high-density EXPEN 2000 linkage map. We assembled 25 BAC contigs and eight singleton BACs spanning 2.0 Mb of the short-arm euchromatin, 1.8 Mb of the pericentromeric heterochromatin and 6.9 Mb of the long-arm euchromatin. Sequence data were combined with their corresponding genetic and pachytene chromosome positions into an integrated map that covers approximately a third of the chromosome 6 euchromatin and a small part of the pericentromeric heterochromatin. We then compared physical length (Mb), genetic (cM) and chromosome distances (µm) for determining gap sizes between contigs, revealing relative hot and cold spots of recombination. Through sequence annotation we identified several clusters of functionally related genes and an uneven distribution of both gene and repeat sequences between heterochromatin and euchromatin domains. Although a greater number of the non-transposon genes were located in the euchromatin, the highly repetitive (22.4%) pericentromeric heterochromatin displayed an unexpectedly high gene content of one gene per 36.7 kb. Surprisingly, the short-arm euchromatin was relatively rich in repeats as well, with a repeat content of 13.4%, yet the ratio of Ty3/Gypsy and Ty1/Copia retrotransposable elements across the chromosome clearly distinguished euchromatin (2:3) from heterochromatin (3:2).


Subjects
Chromosomes, Plant/genetics , Genes, Plant , Retroelements , Solanum lycopersicum/genetics , Chromosome Walking , Chromosomes, Artificial, Bacterial , Contig Mapping , DNA Fingerprinting , DNA, Plant/genetics , Euchromatin , Heterochromatin , In Situ Hybridization, Fluorescence , Sequence Analysis, DNA
19.
BMC Genomics ; 11: 607, 2010 Oct 28.
Article in English | MEDLINE | ID: mdl-20979667

ABSTRACT

BACKGROUND: Plant MADS domain proteins are involved in a variety of developmental processes for which their ability to form various interactions is a key requirement. However, little is known about the structure of these proteins or their complexes, whereas such knowledge would be valuable for a better understanding of their function. Here, we analyze these proteins and the complexes they form using a correlated mutation approach in combination with available structural, bioinformatics and experimental data. RESULTS: Correlated mutations are affected by several types of noise, which is difficult to disentangle from the real signal. In our analysis of the MADS domain proteins, we apply, for the first time, correlated mutation analysis to a family of interacting proteins. This provides a unique way to investigate the amount of signal present in correlated mutations, because it allows direct comparison of mutations in various family members and assessment of their conservation. We show that correlated mutations are in general conserved within the various family members, and if not, the variability at the respective positions is lower in the proteins in which the correlated mutation does not occur. Also, intermolecular correlated mutation signals for interacting pairs of proteins display clear overlap with other bioinformatics data, which is not the case for non-interacting protein pairs, an observation that validates the intermolecular correlated mutations. Having validated the correlated mutation results, we apply them to infer the structural organization of the MADS domain proteins. CONCLUSION: Our analysis enables understanding of the structural organization of the MADS domain proteins, including support for predicted helices based on correlated mutation patterns, and evidence for a specific interaction site in these proteins.


Subjects
Conserved Sequence/genetics , MADS Domain Proteins/genetics , Mutation/genetics , Plant Proteins/genetics , Base Sequence , DNA Mutational Analysis , MADS Domain Proteins/chemistry , Molecular Sequence Data , Plant Proteins/chemistry , Polymorphism, Single Nucleotide/genetics , Protein Binding , Protein Structure, Secondary , Protein Structure, Tertiary , Reproducibility of Results
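Correlated mutation analysis, as in the BMC Genomics entry (item 19), looks for pairs of alignment positions whose residues vary together. As a generic illustration only (not necessarily the scoring used in that study), one common score is the mutual information between two columns of a multiple sequence alignment. The function names, the gap symbol `-`, and the toy alignment are assumptions. For intermolecular signals, the same score could in principle be applied to a concatenated alignment of paired sequences from interacting proteins.

```python
import math
from collections import Counter


def mutual_information(alignment, i, j, gap="-"):
    """Mutual information (nats) between alignment columns i and j.

    Rows with a gap in either column are skipped. This is a plain,
    uncorrected score; real analyses usually add corrections for
    phylogenetic bias and low sequence counts.
    """
    pairs = [(s[i], s[j]) for s in alignment if s[i] != gap and s[j] != gap]
    n = len(pairs)
    if n == 0:
        return 0.0
    freq_i = Counter(a for a, _ in pairs)
    freq_j = Counter(b for _, b in pairs)
    freq_ij = Counter(pairs)
    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((freq_i[a] / n) * (freq_j[b] / n)))
    return mi


def ranked_pairs(alignment):
    """All column pairs (i, j, score), highest mutual information first."""
    length = len(alignment[0])
    scores = [
        (i, j, mutual_information(alignment, i, j))
        for i in range(length)
        for j in range(i + 1, length)
    ]
    return sorted(scores, key=lambda t: t[2], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy alignment: columns 0 and 1 co-vary (A with K, V with E),
    # while column 2 varies independently of both.
    toy = ["AKL", "AKM", "VEL", "VEM"]
    for i, j, score in ranked_pairs(toy):
        print(i, j, round(score, 3))
```

In the toy alignment, the co-varying pair (0, 1) ranks first with a score of ln 2, and the independent pairs score zero. The noise the abstract mentions shows up in practice as non-zero scores for pairs that are related only through shared ancestry.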
20.
PLoS One ; 15(11): e0242723, 2020.
Article in English | MEDLINE | ID: mdl-33237964

ABSTRACT

Physical interaction between two proteins is strong evidence that the proteins are involved in the same biological process, making Protein-Protein Interaction (PPI) networks a valuable data resource for predicting the cellular functions of proteins. However, PPI networks are largely incomplete for non-model species. Here, we tested to what extent these incomplete networks are still useful for genome-wide function prediction. We used two network-based classifiers to predict Biological Process Gene Ontology terms from protein interaction data in four species: Saccharomyces cerevisiae, Escherichia coli, Arabidopsis thaliana and Solanum lycopersicum (tomato). The classifiers had reasonable performance in the well-studied yeast, but performed poorly in the other species. We showed that this poor performance can be considerably improved by adding edges predicted from various data sources, such as text mining, and that associations from the STRING database are more useful than interactions predicted by a neural network from sequence-based features.


Subjects
Arabidopsis Proteins , Arabidopsis , Escherichia coli Proteins , Escherichia coli , Molecular Sequence Annotation , Protein Interaction Maps/physiology , Saccharomyces cerevisiae Proteins , Saccharomyces cerevisiae , Solanum lycopersicum , Arabidopsis/genetics , Arabidopsis/metabolism , Arabidopsis Proteins/genetics , Arabidopsis Proteins/metabolism , Escherichia coli/genetics , Escherichia coli/metabolism , Escherichia coli Proteins/genetics , Escherichia coli Proteins/metabolism , Solanum lycopersicum/genetics , Solanum lycopersicum/metabolism , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae Proteins/metabolism
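The PLoS One entry (item 20) predicts Gene Ontology terms from protein interaction networks with network-based classifiers. The abstract does not name them, so the sketch below uses a generic "guilt by association" neighbor-voting scheme as a stand-in, not the classifiers from that study. The function name, the edge-list format, and the toy proteins `p1` to `p4` are hypothetical. The sketch also illustrates the abstract's main point: a protein with no known interactions receives no prediction until an edge from another source (for example, text mining) is added.

```python
from collections import Counter


def neighbor_vote(edges, annotations, protein):
    """Score GO terms for `protein` by the fraction of its direct neighbors
    annotated with each term (a simple guilt-by-association classifier).

    edges: iterable of undirected (protein_a, protein_b) pairs
    annotations: dict mapping protein -> set of GO term IDs
    Returns an empty dict when the protein has no neighbors.
    """
    neighbors = set()
    for a, b in edges:
        if a == protein:
            neighbors.add(b)
        elif b == protein:
            neighbors.add(a)
    if not neighbors:
        return {}
    votes = Counter()
    for n in neighbors:
        for term in annotations.get(n, ()):
            votes[term] += 1
    return {term: count / len(neighbors) for term, count in votes.items()}


if __name__ == "__main__":
    # Hypothetical toy network and annotations.
    experimental = [("p1", "p2"), ("p1", "p3")]
    annotations = {"p2": {"GO:0006412"}, "p3": {"GO:0006412", "GO:0008152"}}

    # p1 scores GO:0006412 at 1.0 and GO:0008152 at 0.5.
    print(neighbor_vote(experimental, annotations, "p1"))
    # p4 has no experimental interactions, so it gets no prediction.
    print(neighbor_vote(experimental, annotations, "p4"))
    # Adding a predicted edge (e.g. from text mining) makes p4 predictable.
    predicted = [("p4", "p2")]
    print(neighbor_vote(experimental + predicted, annotations, "p4"))
```

Neighbor voting was chosen here because it is the simplest network-based classifier and makes the dependence on network coverage easy to see; it says nothing about the relative value of STRING associations versus sequence-based predicted edges reported in the abstract.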