Results 1 - 7 of 7
1.
BMC Genomics ; 18(1): 326, 2017 04 26.
Article in English | MEDLINE | ID: mdl-28441938

ABSTRACT

BACKGROUND: Mitochondrial dysfunction is linked to numerous pathological states, in particular related to metabolism, brain health and ageing. Nuclear-encoded gene polymorphisms implicated in mitochondrial functions can be analyzed in the context of classical genome-wide association studies. By contrast, mitochondrial DNA (mtDNA) variants are more challenging to identify and analyze for several reasons. First, contrary to the diploid nuclear genome, each cell carries several hundred copies of the circular mitochondrial genome. Mutations can therefore be present in only a subset of the mtDNA molecules, resulting in a heterogeneous pool of mtDNA, a situation referred to as heteroplasmy. Consequently, detection and quantification of variants requires extremely accurate tools, especially when this proportion is small. Additionally, the mitochondrial genome has pseudogenized into numerous copies within the nuclear genome over the course of evolution. These nuclear pseudogenes, named NUMTs, must be distinguished from genuine mtDNA sequences and excluded from the analysis.

RESULTS: Here we describe a novel method, named MitoRS, in which the entire mitochondrial genome is amplified in a single reaction using rolling circle amplification. This approach is easier to set up and of higher throughput when compared to classical PCR amplification. Sequencing libraries are generated at high throughput exploiting a tagmentation-based method. Fine-tuned parameters are finally applied in the analysis to allow detection of variants even of low-frequency heteroplasmy. The method was thoroughly benchmarked in a set of experiments designed to demonstrate its robustness, accuracy and sensitivity. The MitoRS method requires 5 ng total DNA as starting material. More than 96 samples can be processed in less than a day of laboratory work and sequenced in a single lane of an Illumina HiSeq flow cell. The lower limit for accurate quantification of single nucleotide variants has been measured at 1% frequency.

CONCLUSIONS: The MitoRS method enables the robust, accurate, and sensitive analysis of a large number of samples. Because it is cost effective and simple to set up, we anticipate this method will promote the analysis of mtDNA variants in large cohorts, and may help assess the impact of mtDNA heteroplasmy on metabolic health, brain function, cancer progression, or ageing.
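
To make the notion of heteroplasmy quantification concrete, the following is a minimal sketch of how a variant fraction could be computed from per-site read counts and compared with the 1% limit quoted in the abstract. It is not the MitoRS pipeline: the `SiteCounts` container, the function names, and the example positions and counts are all hypothetical, and a real analysis would also need base-quality filtering and NUMT exclusion.

```python
from dataclasses import dataclass

# Assumed reporting floor, taken from the 1% lower limit for accurate
# quantification of single nucleotide variants stated in the abstract.
MIN_FRACTION = 0.01


@dataclass
class SiteCounts:
    """Read counts at one mtDNA position (hypothetical container)."""
    position: int
    ref_reads: int
    alt_reads: int


def heteroplasmy_fraction(site: SiteCounts) -> float:
    """Fraction of reads at this site that carry the alternate allele."""
    depth = site.ref_reads + site.alt_reads
    if depth == 0:
        raise ValueError(f"no coverage at position {site.position}")
    return site.alt_reads / depth


def quantifiable_sites(sites, min_fraction=MIN_FRACTION):
    """Return (position, fraction) for covered sites at or above the floor."""
    results = []
    for site in sites:
        if site.ref_reads + site.alt_reads == 0:
            continue  # skip uncovered positions instead of raising
        fraction = heteroplasmy_fraction(site)
        if fraction >= min_fraction:
            results.append((site.position, fraction))
    return results


# Illustrative, made-up counts: 2%, 0.1%, and 50% alternate-allele reads.
example_sites = [
    SiteCounts(position=100, ref_reads=9800, alt_reads=200),
    SiteCounts(position=200, ref_reads=9990, alt_reads=10),
    SiteCounts(position=300, ref_reads=5000, alt_reads=5000),
]
print(quantifiable_sites(example_sites))
```

With these made-up counts, only the 2% and 50% sites pass the floor; the 0.1% site falls below the range in which the abstract reports accurate quantification.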


Subject(s)
DNA, Mitochondrial/analysis , Nucleic Acid Amplification Techniques/methods , DNA, Mitochondrial/metabolism , High-Throughput Nucleotide Sequencing , Humans , INDEL Mutation , Polymorphism, Single Nucleotide , Real-Time Polymerase Chain Reaction , Sequence Analysis, DNA
2.
Plant J ; 69(3): 475-88, 2012 Feb.
Article in English | MEDLINE | ID: mdl-21967390

ABSTRACT

Sireviruses are one of the three genera of Copia long terminal repeat (LTR) retrotransposons, exclusive to and highly abundant in plants, and with a genome structure that is unique among retrotransposons. Yet, perhaps due to the few references to the Sirevirus origin of some families, compounded by the difficulty in correctly assigning retrotransposon families into genera, Sireviruses have hardly featured in recent research. As a result, analysis at this key level of classification and details of their colonization and impact on plant genomes are currently lacking. Recently, however, it became possible to accurately assign elements from diverse families to this genus in one step, based on highly conserved sequence motifs. Hence, Sirevirus dynamics in the relatively obese maize genome can now be comprehensively studied. Overall, we identified >10 600 intact and approximately 28 000 degenerate Sirevirus elements from a plethora of families, some brought into the genus for the first time. Sireviruses make up approximately 90% of the Copia population and it is the only genus that has successfully infiltrated the genome, possibly by experiencing intense amplification during the last 600 000 years, while being constantly recycled by host mechanisms. They accumulate in chromosome-distal gene-rich areas, where they insert in between gene islands, mainly in preferred zones within their own genomes. Sirevirus LTRs are heavily methylated, while there is evidence for a palindromic consensus target sequence. This work brings Sireviruses into the spotlight, elucidating their lifestyle and history, and suggesting their crucial role in the current genomic make-up of maize, and possibly other plant hosts.


Subject(s)
Evolution, Molecular , Genome, Plant , Retroelements , Zea mays/genetics , Algorithms , DNA Methylation , DNA, Plant/genetics , Phylogeny , Sequence Analysis, DNA
3.
Mol Biol Evol ; 29(11): 3371-84, 2012 Nov.
Article in English | MEDLINE | ID: mdl-22628532

ABSTRACT

Most fungal plant pathogens secrete effector proteins during pathogenesis to manipulate their host's defense and promote disease. These are so diverse in sequence and distribution that they are essentially considered species-specific. However, we have recently shown the presence of homologous effectors in fungal species of the Dothideomycetes class. One such example is Ecp2, an effector originally described in the tomato pathogen Cladosporium fulvum but later detected in the plant pathogenic fungi Mycosphaerella fijiensis and Mycosphaerella graminicola as well. Here, using in silico sequence-similarity searches against a database of 135 fungal genomes and GenBank, we extend our queries for homologs of Ecp2 to the fungal kingdom and beyond, and further study their history of diversification. Our analyses show that Ecp2 homologs are members of an ancient and widely distributed superfamily of putative fungal effectors, which we term Hce2 for Homologs of C. fulvum Ecp2. Molecular evolutionary analyses show that the superfamily originated and diversified within the fungal kingdom, experiencing multiple lineage-specific expansions and losses that are consistent with the birth-and-death model of gene family evolution. Newly formed paralogs appear to be subject to diversification early after gene duplication events, whereas at later stages purifying selection acts to preserve diversity and the newly evolved putative functions. Some members of the Hce2 superfamily are fused to fungal Glycoside Hydrolase family 18 chitinases that show high similarity to the Zymocin killer toxin from the dairy yeast Kluyveromyces lactis, suggesting an analogous role in antagonistic interactions. The observed high rates of gene duplication and loss in the Hce2 superfamily, combined with diversification in both sequence and possibly functions within and between species, suggest that Hce2s are involved in adaptation to stresses and new ecological niches. Such findings address the need to rationalize effector biology and evolution beyond the perspective of solely host-microbe interactions.


Subject(s)
Computational Biology/methods , Evolution, Molecular , Fungal Proteins/genetics , Multigene Family , Amino Acid Sequence , Fungal Proteins/chemistry , Fungi/classification , Fungi/genetics , Gene Duplication/genetics , Genetic Speciation , Genome, Fungal/genetics , Models, Genetic , Molecular Sequence Annotation , Molecular Sequence Data , Phylogeny , Protein Structure, Tertiary , Species Specificity
4.
Plant Physiol ; 155(1): 271-81, 2011 Jan.
Article in English | MEDLINE | ID: mdl-21098674

ABSTRACT

Although Arabidopsis (Arabidopsis thaliana) is the best studied plant species, the biological role of one-third of its proteins is still unknown. We developed a probabilistic protein function prediction method that integrates information from sequences, protein-protein interactions, and gene expression. The method was applied to proteins from Arabidopsis. Evaluation of prediction performance showed that our method has improved performance compared with single source-based prediction approaches and two existing integration approaches. An innovative feature of our method is that it enables transfer of functional information between proteins that are not directly associated with each other. We provide novel function predictions for 5,807 proteins. Recent experimental studies confirmed several of the predictions. We highlight these in detail for proteins predicted to be involved in flowering and floral organ development.


Subject(s)
Arabidopsis Proteins/genetics , Arabidopsis Proteins/metabolism , Arabidopsis/genetics , Computational Biology/methods , Databases, Genetic , Genome, Plant/genetics , Animals , Area Under Curve , Bayes Theorem , Flowers/embryology , Flowers/genetics , Markov Chains , Models, Genetic , Molecular Sequence Annotation , Organogenesis/genetics , Reproducibility of Results
5.
PLoS One ; 5(12): e14147, 2010 Dec 20.
Article in English | MEDLINE | ID: mdl-21188141

ABSTRACT

A major challenge in the field of systems biology consists of predicting gene regulatory networks based on different training data. Within the DREAM4 initiative, we took part in the multifactorial sub-challenge that aimed to predict gene regulatory networks of size 100 from training data consisting of steady-state levels obtained after applying multifactorial perturbations to the original in silico network. Due to the static character of the challenge data, we tackled the problem via a sparse Gaussian Markov Random Field, which relates network topology to the inverse covariance matrix of the gene measurements. For the computations, we used the Graphical Lasso algorithm, which provided a large range of candidate network topologies. The main task was to select the optimal network topology, and for that, different model selection criteria were explored. The selected networks were compared with the gold standards and the results ranked using the scoring metrics applied in the challenge, giving better insight into our submission and ways to improve it. Our approach provides an easy statistical and computational framework to infer gene regulatory networks that is suitable for large networks, even if the number of the observations (perturbations) is greater than the number of variables (genes).
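
For reference, the Graphical Lasso mentioned in this abstract is commonly stated as an L1-penalized Gaussian log-likelihood over the precision (inverse covariance) matrix. The notation below is an assumption introduced here, not taken from the abstract: S is the sample covariance of the gene measurements, Θ the precision matrix, and λ ≥ 0 the penalty weight. Some formulations penalize only the off-diagonal entries of Θ.

```latex
\hat{\Theta}(\lambda)
  = \arg\max_{\Theta \succ 0}
    \Bigl\{ \log\det\Theta \;-\; \operatorname{tr}(S\,\Theta)
            \;-\; \lambda \sum_{i,j} \lvert \Theta_{ij} \rvert \Bigr\},
\qquad
\text{edge } (i,j) \text{ is present} \iff \hat{\Theta}_{ij}(\lambda) \neq 0,\ i \neq j .
```

Under this reading, each value of λ yields one candidate topology (larger λ gives a sparser network), which is why a model selection criterion is needed to pick among them.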


Subject(s)
Gene Regulatory Networks , Algorithms , Bayes Theorem , Computational Biology/methods , Gene Expression Profiling , Markov Chains , Models, Statistical , Multivariate Analysis , Normal Distribution , Reproducibility of Results
6.
PLoS One ; 5(2): e9293, 2010 Feb 24.
Article in English | MEDLINE | ID: mdl-20195360

ABSTRACT

Inference of protein functions is one of the most important aims of modern biology. To fully exploit the large volumes of genomic data typically produced in modern-day genomic experiments, automated computational methods for protein function prediction are urgently needed. Established methods use sequence or structure similarity to infer functions but those types of data do not suffice to determine the biological context in which proteins act. Current high-throughput biological experiments produce large amounts of data on the interactions between proteins. Such data can be used to infer interaction networks and to predict the biological process that the protein is involved in. Here, we develop a probabilistic approach for protein function prediction using network data, such as protein-protein interaction measurements. We take a Bayesian approach to an existing Markov Random Field method by performing simultaneous estimation of the model parameters and prediction of protein functions. We use an adaptive Markov Chain Monte Carlo algorithm that leads to more accurate parameter estimates and consequently to improved prediction performance compared to the standard Markov Random Fields method. We tested our method using a high quality S. cerevisiae validation network with 1622 proteins against 90 Gene Ontology terms of different levels of abstraction. Compared to three other protein function prediction methods, our approach shows very good prediction performance. Our method can be directly applied to protein-protein interaction or coexpression networks, but also can be extended to use multiple data sources. We apply our method to physical protein interaction data from S. cerevisiae and provide novel predictions, using 340 Gene Ontology terms, for 1170 unannotated proteins and we evaluate the predictions using the available literature.


Subject(s)
Bayes Theorem , Markov Chains , Protein Interaction Mapping/methods , Proteins/metabolism , Algorithms , Databases, Protein , Gene Regulatory Networks , Monte Carlo Method , Protein Binding , Proteins/genetics , Reproducibility of Results , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae Proteins/metabolism
7.
In Silico Biol ; 7(6): 575-82, 2007.
Article in English | MEDLINE | ID: mdl-18467770

ABSTRACT

The Gene Ontology (GO) is a widely used controlled vocabulary for the description of gene function. In this study we quantify the usage of multiple and hierarchically independent GO terms in the curated genome annotations of seven well-studied species. In most genomes, significant proportions (6-60%) of genes have been annotated with multiple and hierarchically independent terms. This may be necessary to attain adequate specificity of description. One noticeable exception is Arabidopsis thaliana, in which genes are much less frequently annotated with multiple terms (6-14%). In contrast, an analysis of the occurrence of InterPro hits in the proteomes of the seven species, followed by a mapping of the hits to GO terms, did not reveal an aberrant pattern for the A. thaliana genome. This study shows the widespread usage of multiple hierarchically independent GO terms in the functional annotation of genes. Consequently, probabilistic methods that aim to predict gene function automatically through integration of diverse genomic datasets, and that employ the GO, must be able to predict such multiple terms. We attribute the low frequency with which multiple GO terms are used in Arabidopsis to deviating practices in the genome annotation and curation process between communities of annotators. This may bias genome-scale comparisons of gene function between different species. GO term assignment should therefore be performed according to strictly similar rules and standards.
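
As an illustration of what "multiple and hierarchically independent" terms could mean in practice, here is a minimal sketch that assumes two terms are independent when neither is an ancestor of the other in the ontology graph. That definition, the function names, and the toy ontology with made-up term names are assumptions for illustration; the abstract does not specify the study's exact procedure.

```python
def ancestors(term, parents):
    """Return every ancestor of `term` in a DAG given a child -> parents map."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def independent_terms(annotated, parents):
    """Drop annotated terms that are ancestors of another annotated term.

    The remaining terms are pairwise unrelated by ancestry, i.e. none of
    them lies on another's path to the root.
    """
    annotated = set(annotated)
    implied = set()
    for term in annotated:
        implied |= ancestors(term, parents)
    return annotated - implied


def has_multiple_independent_terms(annotated, parents):
    """True if a gene carries at least two hierarchically independent terms."""
    return len(independent_terms(annotated, parents)) > 1


# Toy ontology with hypothetical term names (child -> list of parents).
toy_parents = {
    "binding": ["root"],
    "dna_binding": ["binding"],
    "catalysis": ["root"],
    "kinase": ["catalysis"],
}

gene_a = {"binding", "dna_binding"}   # one lineage only
gene_b = {"dna_binding", "kinase"}    # two unrelated lineages
print(has_multiple_independent_terms(gene_a, toy_parents))
print(has_multiple_independent_terms(gene_b, toy_parents))
```

Counting the genes for which this check is true, divided by the number of annotated genes, would give a per-genome proportion comparable in spirit to the percentages reported in the abstract.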


Subject(s)
Gene Expression Regulation , Genes/physiology , Genome , Models, Genetic , Animals , Arabidopsis/genetics , Genome, Plant , Humans