1.
BMC Bioinformatics ; 25(1): 54, 2024 Feb 01.
Article in English | MEDLINE | ID: mdl-38302873

ABSTRACT

BACKGROUND: Transcriptome assembly from RNA-sequencing data in species without a reliable reference genome has to be performed de novo, but studies have shown that de novo methods often have inadequate ability to reconstruct transcript isoforms. We address this issue by constructing an assembly pipeline whose main purpose is to produce a comprehensive set of transcript isoforms. RESULTS: We present the de novo transcript isoform assembler ClusTrast, which takes short read RNA-seq data as input, assembles a primary assembly, clusters a set of guiding contigs, aligns the short reads to the guiding contigs, assembles each clustered set of short reads individually, and merges the primary and clusterwise assemblies into the final assembly. We tested ClusTrast on real datasets from six eukaryotic species, and showed that ClusTrast reconstructed more expressed known isoforms than any of the other tested de novo assemblers, at a moderate reduction in precision. For recall, ClusTrast was on top in the lower end of expression levels (<15th percentile) for all tested datasets, and over the entire range for almost all datasets. Reference transcripts were often (35-69% for the six datasets) reconstructed to at least 95% of their length by ClusTrast, and more than half of reference transcripts (58-81%) were reconstructed with contigs that exhibited polymorphism, measuring on a subset of reliably predicted contigs. ClusTrast recall increased when using a union of assembled transcripts from more than one assembly tool as primary assembly. CONCLUSION: We suggest that ClusTrast can be a useful tool for studying isoforms in species without a reliable reference genome, in particular when the goal is to produce a comprehensive transcriptome set with polymorphic variants.
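
The "reconstructed to at least 95% of their length" measure in the abstract above can be illustrated with a minimal sketch. The function name, the dictionaries, and the toy numbers are hypothetical and are not taken from ClusTrast; the sketch only assumes that, for each reference transcript, the number of bases covered by its best-matching contig is already known.

```python
def recall_at_length_fraction(reference_lengths, covered_lengths, min_fraction=0.95):
    """Fraction of reference transcripts covered to at least `min_fraction` of their length.

    reference_lengths: dict of transcript ID -> transcript length in bases.
    covered_lengths:   dict of transcript ID -> bases covered by the best contig
                       (missing IDs count as zero coverage).
    """
    if not reference_lengths:
        raise ValueError("no reference transcripts given")
    hits = 0
    for transcript_id, length in reference_lengths.items():
        covered = covered_lengths.get(transcript_id, 0)
        if covered >= min_fraction * length:
            hits += 1
    return hits / len(reference_lengths)


# Toy data (hypothetical): tx1 and tx3 pass the 95% threshold, tx2 and tx4 do not.
reference_lengths = {"tx1": 1000, "tx2": 2000, "tx3": 500, "tx4": 800}
covered_lengths = {"tx1": 960, "tx2": 1500, "tx3": 500}
print(recall_at_length_fraction(reference_lengths, covered_lengths))  # 0.5
```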


Subject(s)
High-Throughput Nucleotide Sequencing , Transcriptome , Sequence Analysis , RNA-Seq , Sequence Analysis, RNA , Protein Isoforms/genetics , High-Throughput Nucleotide Sequencing/methods
2.
New Phytol ; 236(5): 1951-1963, 2022 Dec.
Article in English | MEDLINE | ID: mdl-36076311

ABSTRACT

Reproductive phase change is well characterized in angiosperm model species, but less studied in gymnosperms. We utilize the early cone-setting acrocona mutant to study reproductive phase change in the conifer Picea abies (Norway spruce), a gymnosperm. The acrocona mutant frequently initiates cone-like structures, called transition shoots, in positions where wild-type P. abies always produces vegetative shoots. We collect acrocona and wild-type samples, and RNA-sequence their messenger RNA (mRNA) and microRNA (miRNA) fractions. We establish gene expression patterns and then use allele-specific transcript assembly to identify mutations in acrocona. We genotype a segregating population of inbred acrocona trees. A member of the SQUAMOSA BINDING PROTEIN-LIKE (SPL) gene family, PaSPL1, is active in reproductive meristems, whereas two putative negative regulators of PaSPL1, miRNA156 and the conifer specific miRNA529, are upregulated in vegetative and transition shoot meristems. We identify a mutation in a putative miRNA156/529 binding site of the acrocona PaSPL1 allele and show that the mutation renders the acrocona allele tolerant to these miRNAs. We show co-segregation between the early cone-setting phenotype and trees homozygous for the acrocona mutation. In conclusion, we demonstrate evolutionary conservation of the age-dependent flowering pathway and involvement of this pathway in regulating reproductive phase change in the conifer P. abies.


Subject(s)
Picea , Tracheophyta , Picea/genetics , Gene Expression Regulation, Plant , Plant Proteins/genetics , Plant Proteins/metabolism , Meristem/metabolism , Reproduction/genetics , Tracheophyta/metabolism
3.
Nucleic Acids Res ; 45(6): 3253-3265, 2017 Apr 07.
Article in English | MEDLINE | ID: mdl-28175342

ABSTRACT

Co-expression of physically linked genes occurs surprisingly frequently in eukaryotes. Such chromosomal clustering may confer a selective advantage as it enables coordinated gene regulation at the chromatin level. We studied the chromosomal organization of genes involved in male reproductive development in Arabidopsis thaliana. We developed an in-silico tool to identify physical clusters of co-regulated genes from gene expression data. We identified 17 clusters (96 genes) involved in stamen development and acting downstream of the transcriptional activator MS1 (MALE STERILITY 1), which contains a PHD domain associated with chromatin re-organization. The clusters exhibited little gene homology or promoter element similarity, and largely overlapped with reported repressive histone marks. Experiments on a subset of the clusters suggested a link between expression activation and chromatin conformation: qRT-PCR and mRNA in situ hybridization showed that the clustered genes were up-regulated within 48 h after MS1 induction; out of 14 chromatin-remodeling mutants studied, expression of clustered genes was consistently down-regulated only in hta9/hta11, previously associated with metabolic cluster activation; DNA fluorescence in situ hybridization confirmed that transcriptional activation of the clustered genes was correlated with open chromatin conformation. Stamen development thus appears to involve transcriptional activation of physically clustered genes through chromatin de-condensation.
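
As an illustration of what "identifying physical clusters of co-regulated genes" can mean computationally, here is a minimal sketch. It is not the in-silico tool described in the abstract; the function name, the `min_size` and `max_gap` parameters, and the toy gene IDs are all assumptions made for the example.

```python
def find_physical_clusters(genes_in_order, coregulated, min_size=3, max_gap=1):
    """Return runs of co-regulated genes that lie close together on one chromosome.

    genes_in_order: gene IDs sorted by chromosomal position.
    coregulated:    set of gene IDs flagged as co-regulated (e.g. from expression data).
    min_size:       minimum number of co-regulated genes in a reported cluster.
    max_gap:        number of consecutive non-co-regulated genes tolerated inside a cluster.
    """
    clusters, current, gap = [], [], 0
    for gene in genes_in_order:
        if gene in coregulated:
            current.append(gene)
            gap = 0
        elif current:
            gap += 1
            if gap > max_gap:
                # The run is broken: keep it only if it is large enough.
                if len(current) >= min_size:
                    clusters.append(current)
                current, gap = [], 0
    if len(current) >= min_size:
        clusters.append(current)
    return clusters


# Toy chromosome (hypothetical IDs): g2, g3 and g5 form one cluster, g9 is isolated.
genes = [f"g{i}" for i in range(1, 11)]
print(find_physical_clusters(genes, {"g2", "g3", "g5", "g9"}))  # [['g2', 'g3', 'g5']]
```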


Subject(s)
Arabidopsis/genetics , Chromatin/metabolism , Gene Expression Regulation, Plant , Arabidopsis/growth & development , Arabidopsis/metabolism , Arabidopsis Proteins/genetics , Flowers/genetics , Flowers/growth & development , Gene Duplication , Genes, Plant , Genome, Plant , Histone Code , Promoter Regions, Genetic , Transcription Factors/genetics , Transcriptional Activation
4.
Nucleic Acids Res ; 42(5): 3330-45, 2014 Mar.
Article in English | MEDLINE | ID: mdl-24369430

ABSTRACT

Dictyostelium intermediate repeat sequence 1 (DIRS-1) is the founding member of a poorly characterized class of retrotransposable elements that contain inverse long terminal repeats and tyrosine recombinase instead of DDE-type integrase enzymes. In Dictyostelium discoideum, DIRS-1 forms clusters that adopt the function of centromeres, rendering tight retrotransposition control critical to maintaining chromosome integrity. We report that in deletion strains of the RNA-dependent RNA polymerase RrpC, full-length and shorter DIRS-1 messenger RNAs are strongly enriched. Shorter versions of a hitherto unknown long non-coding RNA in DIRS-1 antisense orientation are also enriched in rrpC- strains. Concurrent with the accumulation of long transcripts, the vast majority of small (21 mer) DIRS-1 RNAs vanish in rrpC- strains. RNASeq reveals an asymmetric distribution of the DIRS-1 small RNAs, both along DIRS-1 and with respect to sense and antisense orientation. We show that RrpC is required for post-transcriptional DIRS-1 silencing and also for spreading of RNA silencing signals. Finally, DIRS-1 mis-regulation in the absence of RrpC leads to retrotransposon mobilization. In summary, our data reveal RrpC as a key player in the silencing of centromeric retrotransposon DIRS-1. RrpC acts at the post-transcriptional level and is involved in spreading of RNA silencing signals, both in the 5' and 3' directions.


Subject(s)
Dictyostelium/genetics , RNA Interference , RNA-Dependent RNA Polymerase/physiology , Retroelements , Cell Nucleus/genetics , Dictyostelium/enzymology , Genome , Promoter Regions, Genetic , RNA, Antisense/metabolism , RNA, Messenger/metabolism , RNA, Small Untranslated/metabolism , RNA-Dependent RNA Polymerase/genetics , Terminal Repeat Sequences
5.
BMC Genomics ; 15: 631, 2014 Jul 28.
Article in English | MEDLINE | ID: mdl-25070246

ABSTRACT

BACKGROUND: Strand specific RNA sequencing is rapidly replacing conventional cDNA sequencing as an approach for assessing information about the transcriptome. Alongside improved laboratory protocols the development of bioinformatical tools is steadily progressing. In the current procedure the Illumina TruSeq library preparation kit is used, along with additional reagents, to make stranded libraries in an automated fashion which are then sequenced on Illumina HiSeq 2000. By the use of freely available bioinformatical tools we show, through quality metrics, that the protocol is robust and reproducible. We further highlight the practicality of strand specific libraries by comparing expression of strand specific libraries to non-stranded libraries, by looking at known antisense transcription of pseudogenes and by identifying novel transcription. Furthermore, two ribosomal depletion kits, RiboMinus and RiboZero, are compared and two sequence aligners, Tophat2 and STAR, are also compared. RESULTS: The non-stranded Illumina TruSeq kit can be adapted to generate strand specific libraries and can be used to access detailed information on the transcriptome. The RiboZero kit is very effective in removing ribosomal RNA from total RNA and the STAR aligner produces high mapping yield in a short time. Strand specific data gives more detailed and correct results than does non-stranded data as we show when estimating expression values and in assembling transcripts. Even well annotated genomes need improvements and corrections which can be achieved using strand specific data. CONCLUSIONS: Researchers in the field should strive to use strand specific data; it allows for more confidence in the data analysis and is less likely to lead to false conclusions. If faced with analysing non-stranded data, researchers should be well aware of the caveats of that approach.


Subject(s)
Sequence Analysis, RNA , Automation , Cell Line , Chromosome Mapping , Cluster Analysis , Computational Biology , Gene Library , Humans , Quality Control , RNA, Antisense/metabolism , Sequence Analysis, RNA/standards , Transcriptome
6.
Plant Physiol ; 161(2): 813-23, 2013 Feb.
Article in English | MEDLINE | ID: mdl-23221834

ABSTRACT

Conifers normally go through a long juvenile period, for Norway spruce (Picea abies) around 20 to 25 years, before developing male and female cones. We have grown plants from inbred crosses of a naturally occurring spruce mutant (acrocona). One-fourth of the segregating acrocona plants initiate cones already in their second growth cycle, suggesting control by a single locus. The early cone-setting properties of the acrocona mutant were utilized to identify candidate genes involved in vegetative-to-reproductive phase change in Norway spruce. Poly(A)+ RNA samples from apical and basal shoots of cone-setting and non-cone-setting plants were subjected to high-throughput sequencing (RNA-seq). We assembled and investigated 33,383 expressed putative protein-coding acrocona transcripts. Eight transcripts were differentially expressed between selected sample pairs. One of these (Acr42124_1) was significantly up-regulated in apical shoot samples from cone-setting acrocona plants, and the encoded protein belongs to the MADS box gene family of transcription factors. Using quantitative real-time polymerase chain reaction with independently derived plant material, we confirmed that the MADS box gene is up-regulated in both needles and buds of cone-inducing shoots when reproductive identity is determined. Our results constitute important steps for the development of a rapid cycling model system that can be used to study gene function in conifers. In addition, our data suggest the involvement of a MADS box transcription factor in the vegetative-to-reproductive phase change in Norway spruce.


Subject(s)
Gene Expression Profiling , MADS Domain Proteins/genetics , Picea/genetics , Plant Proteins/genetics , Crosses, Genetic , Gene Expression Regulation, Developmental , Gene Expression Regulation, Plant , High-Throughput Nucleotide Sequencing , MADS Domain Proteins/classification , MADS Domain Proteins/metabolism , Mutation , Norway , Phenotype , Phylogeny , Picea/growth & development , Picea/metabolism , Plant Leaves/genetics , Plant Leaves/growth & development , Plant Leaves/metabolism , Plant Proteins/classification , Plant Proteins/metabolism , Plant Shoots/genetics , Plant Shoots/growth & development , Plant Shoots/metabolism , Reverse Transcriptase Polymerase Chain Reaction , Sweden
7.
Genes Chromosomes Cancer ; 52(4): 378-84, 2013 Apr.
Article in English | MEDLINE | ID: mdl-23341325

ABSTRACT

Melanoma of the eye is a rare and distinct subtype of melanoma, which is only rarely familial. However, cases of uveal melanoma (UM) have been found in families with mixed cancer syndromes. Here, we describe a comprehensive search for inherited genetic variation in a family with multiple cases of UM but no aggregation of other cancer diagnoses. The proband is a woman diagnosed with UM at 16 years who within 6 months developed liver metastases. We also identified two older paternal relatives of the proband who had died from UM. We performed exome sequencing of germline DNA from members of the affected family. Exome-wide analysis identified a novel loss-of-function mutation in the BAP1 gene, previously suggested as a tumor suppressor. The mutation segregated with the UM phenotype in this family, and we detected a loss of the wild-type allele in the UM tumor of the proband, strongly supporting a causative association with UM. Screening of BAP1 germline mutations in families predisposed for UM may be used to identify individuals at increased risk of disease. Such individuals may then be enrolled in preventive programs and regular screenings to facilitate early detection and thereby improve prognosis.


Subject(s)
Germ-Line Mutation , Melanoma/genetics , Tumor Suppressor Proteins/genetics , Ubiquitin Thiolesterase/genetics , Uveal Neoplasms/genetics , Adolescent , DNA Mutational Analysis , Family Health , Female , Genetic Predisposition to Disease/genetics , Humans , Male , Melanoma/pathology , Pedigree , Risk Factors , Uveal Neoplasms/pathology
8.
bioRxiv ; 2023 Jun 13.
Article in English | MEDLINE | ID: mdl-37398033

ABSTRACT

Muscular atrophy is a mortality risk factor that happens with disuse, chronic disease, and aging. Recovery from atrophy requires changes in several cell types including muscle fibers, and satellite and immune cells. Here we show that Zfp697/ZNF697 is a damage-induced regulator of muscle regeneration, during which its expression is transiently elevated. Conversely, sustained Zfp697 expression in mouse muscle leads to a gene expression signature of chemokine secretion, immune cell recruitment, and extracellular matrix remodeling. Myofiber-specific Zfp697 ablation hinders the inflammatory and regenerative response to muscle injury, compromising functional recovery. We uncover Zfp697 as an essential interferon gamma mediator in muscle cells, interacting primarily with ncRNAs such as the pro-regenerative miR-206. In sum, we identify Zfp697 as an integrator of cell-cell communication necessary for tissue regeneration.

9.
BMC Genomics ; 11: 684, 2010 Dec 02.
Article in English | MEDLINE | ID: mdl-21126332

ABSTRACT

BACKGROUND: An interesting field of research in genomics and proteomics is to compare the overlap between the transcriptome and the proteome. Recently, the tools to analyse gene and protein expression on a whole-genome scale have been improved, including the availability of the new generation sequencing instruments and high-throughput antibody-based methods to analyze the presence and localization of proteins. In this study, we used massive transcriptome sequencing (RNA-seq) to investigate the transcriptome of a human osteosarcoma cell line and compared the expression levels with protein data obtained in situ from antibody-based immunohistochemistry (IHC) and immunofluorescence microscopy (IF). RESULTS: A large-scale analysis based on 2749 genes was performed, corresponding to approximately 13% of the protein coding genes in the human genome. We found both RNA and protein to be present for a large fraction of the analyzed genes, with 60% of the analyzed human genes detected by all three methods. Only 34 genes (1.2%) were not detected on the transcriptional or protein level with any method. Our data suggest that the majority of the human genes are expressed at detectable transcript or protein levels in this cell line. Since the reliability of antibodies depends on possible cross-reactivity, we compared the RNA and protein data using antibodies with different reliability scores based on various criteria, including Western blot analysis. Gene products detected in all three platforms generally have good antibody validation scores, while those detected only by antibodies, but not by RNA sequencing, generally consist of more low-scoring antibodies. CONCLUSION: This suggests that some antibodies are staining the cells in an unspecific manner, and that assessment of transcript presence by RNA-seq can provide guidance for validation of the corresponding antibodies.


Subject(s)
Neoplasm Proteins/metabolism , Osteosarcoma/genetics , Osteosarcoma/metabolism , RNA, Messenger/genetics , Blotting, Western , Cell Line, Tumor , Gene Expression Regulation, Neoplastic , Genes, Neoplasm/genetics , Humans , Immunohistochemistry , Neoplasm Proteins/genetics , RNA, Messenger/metabolism , RNA, Neoplasm/genetics , RNA, Neoplasm/metabolism , Reproducibility of Results , Transcription, Genetic
10.
Life Sci Alliance ; 2(5), 2019 Oct.
Article in English | MEDLINE | ID: mdl-31570514

ABSTRACT

In bioinformatics, machine learning methods have been used to predict features embedded in the sequences. In contrast to what is generally assumed, machine learning approaches can also provide new insights into the underlying biology. Here, we demonstrate this by presenting TargetP 2.0, a novel state-of-the-art method to identify N-terminal sorting signals, which direct proteins to the secretory pathway, mitochondria, and chloroplasts or other plastids. By examining the strongest signals from the attention layer in the network, we find that the second residue in the protein, that is, the one following the initial methionine, has a strong influence on the classification. We observe that two-thirds of chloroplast and thylakoid transit peptides have an alanine in position 2, compared with 20% in other plant proteins. We also note that in fungi and single-celled eukaryotes, less than 30% of the targeting peptides have an amino acid that allows the removal of the N-terminal methionine compared with 60% for the proteins without targeting peptide. The importance of this feature for predictions has not been highlighted before.
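
The position-2 statistic quoted in the abstract above (the share of sequences with alanine directly after the initial methionine) is straightforward to compute. The sketch below is only an illustration on made-up sequences; the function name and the toy peptides are hypothetical and have nothing to do with the TargetP 2.0 data or code.

```python
def fraction_with_residue_at_position_2(sequences, residue="A"):
    """Share of sequences whose second residue equals `residue`.

    Only sequences that start with methionine (M) and have at least
    two residues are counted in the denominator.
    """
    usable = [seq for seq in sequences if len(seq) >= 2 and seq[0] == "M"]
    if not usable:
        raise ValueError("no sequence starts with M and has a second residue")
    matches = sum(1 for seq in usable if seq[1] == residue)
    return matches / len(usable)


# Toy protein N-termini (hypothetical): 3 of the 4 usable sequences have A at position 2.
toy_sequences = ["MASL", "MAAT", "MKLV", "MATS", "LAAA", "M"]
print(fraction_with_residue_at_position_2(toy_sequences))  # 0.75
```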


Subject(s)
Computational Biology/methods , Peptides/analysis , Peptides/genetics , Amino Acid Sequence , Chloroplasts/genetics , Chloroplasts/metabolism , Deep Learning , Fungi/genetics , Fungi/metabolism , Methionine/metabolism , Protein Sorting Signals , Thylakoids/genetics , Thylakoids/metabolism
11.
Front Plant Sci ; 9: 1625, 2018.
Article in English | MEDLINE | ID: mdl-30483285

ABSTRACT

Recent efforts to sequence the genomes and transcriptomes of several gymnosperm species have revealed an increased complexity in certain gene families in gymnosperms as compared to angiosperms. One example of this is the gymnosperm sister clade to angiosperm TM3-like MADS-box genes, which at least in the conifer lineage has expanded in number of genes. We have previously identified a member of this sub-clade, the conifer gene DEFICIENS AGAMOUS LIKE 19 (DAL19), as being specifically upregulated in cone-setting shoots. Here, we show through Sanger sequencing of mRNA-derived cDNA and mapping to assembled conifer genomic sequences that DAL19 produces six mature mRNA splice variants in Picea abies. These splice variants use alternate first and last exons, while their four central exons constitute a core region present in all six transcripts. Thus, they are likely to be transcript isoforms. Quantitative Real-Time PCR revealed that two mutually exclusive first DAL19 exons are differentially expressed across meristems that will form either male or female cones, or vegetative shoots. Furthermore, mRNA in situ hybridization revealed that two mutually exclusive last DAL19 exons were expressed in a cell-specific pattern within bud meristems. Based on these findings in DAL19, we developed a sensitive approach to transcript isoform assembly from short-read sequencing of mRNA. We applied this method to 42 putative MADS-box core regions in P. abies, from which we assembled 1084 putative transcripts. We manually curated these transcripts to arrive at 933 assembled transcript isoforms of 38 putative MADS-box genes. 152 of these isoforms, which we assign to 28 putative MADS-box genes, were differentially expressed across eight female, male, and vegetative buds. We further provide evidence of the expression of 16 out of the 38 putative MADS-box genes by mapping PacBio Iso-Seq circular consensus reads derived from pooled sample sequencing to assembled transcripts. 
In summary, our analyses reveal the use of mutually exclusive exons of MADS-box gene isoforms during early bud development in P. abies, and we find that the large number of identified MADS-box transcripts in P. abies results not only from expansion of the gene family through gene duplication events but also from the generation of numerous splice variants.

12.
Methods Enzymol ; 411: 282-311, 2006.
Article in English | MEDLINE | ID: mdl-16939796

ABSTRACT

A credit to microarray technology is its broad application. Two experiments--the tiling microarray experiment and the protein microarray experiment--are exemplars of the versatility of the microarrays. With the technology's expanding list of uses, the corresponding bioinformatics must evolve in step. There currently exists a rich literature developing statistical techniques for analyzing traditional gene-centric DNA microarrays, so the first challenge in analyzing the advanced technologies is to identify which of the existing statistical protocols are relevant and where and when revised methods are needed. A second challenge is making these often very technical ideas accessible to the broader microarray community. The aim of this chapter is to present some of the most widely used statistical techniques for normalizing and scoring traditional microarray data and indicate their potential utility for analyzing the newer protein and tiling microarray experiments. In so doing, we will assume little or no prior training in statistics of the reader. Areas covered include background correction, intensity normalization, spatial normalization, and the testing of statistical significance.
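
Two of the areas named in the abstract above, background correction and intensity normalization, can be sketched for a two-channel array as follows. This is one common textbook variant (background subtraction followed by global median centering of log2 ratios), chosen here as an assumption; it is not necessarily the procedure the chapter recommends, and the function name, `floor` parameter, and intensities are invented for the example.

```python
from math import log2
from statistics import median


def normalized_log_ratios(fg_red, bg_red, fg_green, bg_green, floor=1.0):
    """Background-correct two channels and median-center the log2(red/green) ratios.

    fg_* are foreground spot intensities, bg_* the local background estimates.
    Corrected intensities are clipped at `floor` so the logarithm stays defined.
    """
    ratios = []
    for fr, br, fg, bg in zip(fg_red, bg_red, fg_green, bg_green):
        red = max(fr - br, floor)      # background correction, red channel
        green = max(fg - bg, floor)    # background correction, green channel
        ratios.append(log2(red / green))
    center = median(ratios)            # global intensity normalization
    return [r - center for r in ratios]


# Toy spots (hypothetical): raw log2 ratios are 0, 1 and 2, so centering gives -1, 0, 1.
print(normalized_log_ratios([150, 450, 1650], [50, 50, 50],
                            [150, 250, 450], [50, 50, 50]))  # [-1.0, 0.0, 1.0]
```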


Subject(s)
Oligonucleotide Array Sequence Analysis/methods , Oligonucleotide Array Sequence Analysis/statistics & numerical data , Protein Array Analysis/methods , Protein Array Analysis/statistics & numerical data , Animals , Data Interpretation, Statistical , Humans
13.
Sci Rep ; 6: 21134, 2016 Feb 18.
Article in English | MEDLINE | ID: mdl-26887787

ABSTRACT

Allele-specific expression (ASE) is the imbalance in transcription between maternal and paternal alleles at a locus and can be probed in single individuals using massively parallel DNA sequencing technology. Assessing ASE within a single sample provides a static picture of the ASE, but the magnitude of ASE for a given transcript may vary between different biological conditions in an individual. Such condition-dependent ASE could indicate a genetic variation with a functional role in the phenotypic difference. We investigated ASE through RNA-sequencing of primary white blood cells from eight human individuals before and after the controlled induction of an inflammatory response, and detected condition-dependent and static ASE at 211 and 13021 variants, respectively. We developed a method, GeneiASE, to detect genes exhibiting static or condition-dependent ASE in single individuals. GeneiASE performed consistently over a range of read depths and ASE effect sizes, and did not require phasing of variants to estimate haplotypes. We observed condition-dependent ASE related to the inflammatory response in 19 genes, and static ASE in 1389 genes. Allele-specific expression was confirmed by validation of variants through real-time quantitative RT-PCR, with RNA-seq and RT-PCR ASE effect-size correlations r = 0.67 and r = 0.94 for static and condition-dependent ASE, respectively.
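
To make the notion of allelic imbalance at a single variant concrete, the sketch below runs an exact two-sided binomial test of the reference and alternative read counts against a 50:50 expectation. This is a generic textbook test and explicitly not the GeneiASE method, whose statistical model is not described in the abstract; the function name and read counts are hypothetical.

```python
from math import comb


def binomial_two_sided_p(ref_count, alt_count, p=0.5):
    """Exact two-sided binomial p-value for allelic imbalance at one variant.

    Sums the probabilities of all outcomes that are at most as likely as the
    observed reference count under the null hypothesis of proportion `p`.
    """
    n = ref_count + alt_count
    if n == 0:
        raise ValueError("no reads cover this variant")
    probs = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    observed = probs[ref_count]
    # Small tolerance so that ties are not lost to floating-point rounding.
    p_value = sum(q for q in probs if q <= observed * (1 + 1e-9))
    return min(1.0, p_value)


print(binomial_two_sided_p(10, 0))  # 0.001953125 (strong imbalance)
print(binomial_two_sided_p(5, 5))   # balanced counts give a p-value of about 1
```

Real ASE analyses usually also have to account for overdispersion and reference-mapping bias, which a plain binomial test ignores.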


Subject(s)
Alleles , Allelic Imbalance , Gene Expression Regulation , Female , Humans , Leukocytes , Male
14.
J Mol Biol ; 330(2): 443-56, 2003 Jul 04.
Article in English | MEDLINE | ID: mdl-12823981

ABSTRACT

In an attempt to improve our abilities to predict peroxisomal proteins, we have combined machine-learning techniques for analyzing peroxisomal targeting signals (PTS1) with domain-based cross-species comparisons between eight eukaryotic genomes. Our results indicate that this combined approach has a significantly higher specificity than earlier attempts to predict peroxisomal localization, without a loss in sensitivity. This allowed us to predict 430 peroxisomal proteins that almost completely lack a localization annotation. These proteins can be grouped into 29 families covering most of the known steps in all known peroxisomal pathways. In general, plants have the highest number of predicted peroxisomal proteins, and fungi the smallest number.


Subject(s)
Peroxisomes/genetics , Proteome , Amino Acid Sequence , Animals , Computer Simulation , Databases, Protein , Fungal Proteins/chemistry , Fungal Proteins/genetics , Molecular Sequence Data , Oxidation-Reduction , Peroxisomes/metabolism , Plant Proteins/chemistry , Plant Proteins/genetics , Proteomics
15.
Protein Sci ; 12(10): 2360-6, 2003 Oct.
Article in English | MEDLINE | ID: mdl-14500894

ABSTRACT

We report the development of LumenP, a new neural network-based predictor for the identification of proteins targeted to the thylakoid lumen of plant chloroplasts and prediction of their cleavage sites. When used together with the previously developed TargetP predictor, LumenP reaches a significantly better performance than what has been recorded for previous attempts at predicting thylakoid lumen location, mostly due to a lower false positive rate. The combination of TargetP and LumenP predicts around 1.5%-3% of all proteins encoded in the genomes of Arabidopsis thaliana and Oryza sativa to be located in the lumen of the thylakoid.


Subject(s)
Computational Biology/methods , Neural Networks, Computer , Plant Proteins/analysis , Thylakoids/chemistry , Arabidopsis/genetics , Arabidopsis/physiology , Arabidopsis Proteins/analysis , Arabidopsis Proteins/chemistry , Arabidopsis Proteins/physiology , Artificial Intelligence , Chloroplast Proteins , Chloroplasts/chemistry , Chloroplasts/physiology , Databases, Protein , Genome, Plant , Genomic Library , Oryza/genetics , Oryza/physiology , Pisum sativum/genetics , Pisum sativum/physiology , Plant Proteins/chemistry , Plant Proteins/physiology , Protein Sorting Signals/physiology , Protein Transport , Proteome , Sensitivity and Specificity , Sequence Alignment , Thylakoids/metabolism
16.
PLoS One ; 9(3): e91851, 2014.
Article in English | MEDLINE | ID: mdl-24632678

ABSTRACT

RNA sequencing has become widely used in gene expression profiling experiments. Prior to any RNA sequencing experiment the quality of the RNA must be measured to assess whether or not it can be used for further downstream analysis. The RNA integrity number (RIN) is a scale used to measure the quality of RNA that runs from 1 (completely degraded) to 10 (intact). Ideally, samples with high RIN (> 8) are used in RNA sequencing experiments. RNA, however, is a fragile molecule which is susceptible to degradation and obtaining high quality RNA is often hard, or even impossible when extracting RNA from certain clinical tissues. Thus, occasionally, working with low quality RNA is the only option the researcher has. Here we investigate the effects of RIN on RNA sequencing and suggest a computational method to handle data from samples with low quality RNA which also enables reanalysis of published datasets. Using RNA from a human cell line we generated and sequenced samples with varying RINs and illustrate what effect the RIN has on the basic procedure of RNA sequencing; both quality aspects and differential expression. We show that the RIN has systematic effects on gene coverage, false positives in differential expression and the quantification of duplicate reads. We introduce 3' tag counting (3TC) as a computational approach to reliably estimate differential expression for samples with low RIN. We show that using the 3TC method in differential expression analysis significantly reduces false positives when comparing samples with different RIN, while retaining reasonable sensitivity.
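
The idea behind counting only reads near the 3' end of a transcript can be sketched as below. The abstract does not give the details of 3TC, so the window length, the coordinate convention, and the function name are all assumptions made for illustration rather than the published method.

```python
def three_prime_tag_count(read_starts, transcript_length, window=1000):
    """Count reads that start within the last `window` bases of a transcript.

    read_starts:       0-based read start positions in transcript coordinates (5' to 3').
    transcript_length: transcript length in bases.
    window:            size of the 3' region to count in (arbitrary placeholder value).
    """
    window_start = max(0, transcript_length - window)
    return sum(1 for pos in read_starts if window_start <= pos < transcript_length)


# Toy reads (hypothetical) on a 3000-base transcript: three of them start in the last 1000 bases.
print(three_prime_tag_count([10, 500, 2100, 2900, 2999, 3000], 3000))  # 3
```

Restricting the count to the 3' region is intended to make samples with different degrees of degradation more comparable, since degraded poly(A)-selected RNA tends to lose coverage toward the 5' end.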


Subject(s)
RNA Stability , RNA/chemistry , RNA/genetics , Sequence Analysis, RNA/methods , Cell Line, Tumor , Humans , Transcriptome
17.
PLoS One ; 9(7): e103610, 2014.
Article in English | MEDLINE | ID: mdl-25076135

ABSTRACT

Sepsis is a severe medical condition characterized by a systemic inflammatory response of the body caused by pathogenic microorganisms in the bloodstream. Blood or plasma is typically used for diagnosis, both of which contain a large amount of human DNA, greatly exceeding the DNA of microbial origin. In order to enrich bacterial DNA, we applied the C0t effect to reduce human DNA background: a model system was set up with human and Escherichia coli (E. coli) DNA to mimic the conditions of bloodstream infections; and this system was adapted to plasma and blood samples from septic patients. As a consequence of the C0t effect, abundant DNA hybridizes faster than rare DNA. Following denaturation and re-hybridization, the amount of abundant DNA can be decreased with the application of double strand specific nucleases, leaving the non-hybridized rare DNA intact. Our experiments show that human DNA concentration can be reduced approximately 100,000-fold without affecting the E. coli DNA concentration in a model system with similarly sized amplicons. With clinical samples, the human DNA background was decreased 100-fold, as bacterial genomes are approximately 1,000-fold smaller compared to the human genome. According to our results, background suppression can be a valuable tool to enrich rare DNA in clinical samples where a high amount of background DNA can be found.
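
Why abundant DNA re-hybridizes faster than rare DNA follows from ideal second-order renaturation kinetics, where the single-stranded fraction after time t is C/C0 = 1 / (1 + k * C0 * t). The sketch below evaluates that textbook formula; the rate constant, concentrations, and time are arbitrary placeholder values in unspecified units and are not taken from the study.

```python
def fraction_single_stranded(c0, t, k=1.0):
    """Ideal second-order renaturation: C/C0 = 1 / (1 + k * C0 * t).

    c0: initial concentration of the single-stranded species.
    t:  re-hybridization time.
    k:  rate constant (placeholder value; real values depend on conditions).
    """
    return 1.0 / (1.0 + k * c0 * t)


# Placeholder scenario: the abundant species is 1000 times more concentrated than the rare one.
abundant = fraction_single_stranded(c0=1000.0, t=1.0)
rare = fraction_single_stranded(c0=1.0, t=1.0)
print(f"abundant still single-stranded: {abundant:.6f}")  # 0.000999
print(f"rare still single-stranded:     {rare:.6f}")      # 0.500000
```

In this idealized picture, a nuclease specific for double strands would then remove nearly all of the abundant species while leaving about half of the rare one.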


Subject(s)
DNA/blood , Deoxyribonucleases/chemistry , Escherichia coli Infections/diagnosis , Escherichia coli/genetics , Sepsis/diagnosis , Systemic Inflammatory Response Syndrome/diagnosis , Calibration , DNA/chemistry , Escherichia coli Infections/blood , Escherichia coli Infections/microbiology , Genes, Bacterial , Humans , Molecular Diagnostic Techniques/standards , Real-Time Polymerase Chain Reaction/standards , Reference Standards , Sensitivity and Specificity , Sepsis/blood , Sepsis/microbiology , Systemic Inflammatory Response Syndrome/blood , Systemic Inflammatory Response Syndrome/microbiology
18.
PLoS One ; 7(2): e32306, 2012.
Article in English | MEDLINE | ID: mdl-22384210

ABSTRACT

Macrophages play a critical role in innate immunity, and the expression of early response genes orchestrates much of the initial response of the immune system. Macrophages undergo extensive transcriptional reprogramming in response to inflammatory stimuli such as lipopolysaccharide (LPS). To identify gene transcription regulation patterns involved in early innate immune responses, we used two genome-wide approaches: gene expression profiling and chromatin immunoprecipitation-sequencing (ChIP-seq) analysis. We examined the effect of 2 h of LPS stimulation on early gene expression and its relation to chromatin remodeling (H3 acetylation; H3Ac) and promoter binding of Sp1 and RNA polymerase II phosphorylated at serine 5 (S5P RNAPII), which is a marker for transcriptional initiation. Our results indicate novel and alternative gene regulatory mechanisms for certain proinflammatory genes. We identified two groups of up-regulated inflammatory genes with respect to chromatin modification and promoter features. One group, including highly up-regulated genes such as tumor necrosis factor (TNF), was characterized by H3Ac, high CpG content and lack of TATA boxes. The second group, containing inflammatory mediators (interleukins and CCL chemokines), was up-regulated upon LPS stimulation despite lacking H3Ac in their annotated promoters, which were low in CpG content but did contain TATA boxes. Genome-wide analysis showed that few H3Ac peaks were unique to either the +LPS or the -LPS condition. However, within these, an unpacking/expansion of already existing H3Ac peaks was observed upon LPS stimulation. In contrast, a significant proportion of S5P RNAPII peaks (approximately 40%) was unique to either condition. Furthermore, the data indicated a large proportion of previously unannotated transcription start sites (TSSs), particularly in LPS-stimulated macrophages, where only 28% of unique S5P RNAPII peaks overlap annotated promoters.
The regulation of the inflammatory response appears to occur in a very specific manner at the chromatin level for specific genes and this study highlights the level of fine-tuning that occurs in the immune response.
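
A figure such as "28% of unique S5P RNAPII peaks overlap annotated promoters" comes down to an interval-overlap count. The following is a minimal sketch of that calculation, not the pipeline used in the study: the coordinates are invented, all intervals are assumed to lie on a single chromosome, and the function names are hypothetical.

```python
def overlaps(a, b):
    """True if half-open intervals a=(start, end) and b=(start, end) share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def fraction_overlapping(peaks, promoters):
    """Fraction of peaks that overlap at least one promoter interval."""
    if not peaks:
        return 0.0
    hits = sum(1 for p in peaks if any(overlaps(p, q) for q in promoters))
    return hits / len(peaks)


# Invented coordinates on one hypothetical chromosome.
peaks = [(100, 200), (500, 650), (900, 1000), (1500, 1600)]
promoters = [(150, 400), (2000, 2500)]

frac = fraction_overlapping(peaks, promoters)
print(f"{frac:.0%} of peaks overlap an annotated promoter")
```

For genome-scale peak sets, the quadratic scan shown here would normally be replaced by sorted intervals or an interval tree, and the comparison would be made per chromosome.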


Subject(s)
Chromatin/chemistry; Cytokines/metabolism; Gene Expression Profiling; Macrophages/metabolism; Cell Differentiation; Chromatin Immunoprecipitation; CpG Islands; Genome-Wide Association Study; Histones/chemistry; Humans; Immune System; Immunity, Innate; Inflammation/genetics; Macrophages/cytology; Models, Biological; Monocytes/cytology; Multigene Family; Oligonucleotide Array Sequence Analysis; Promoter Regions, Genetic; Protein Binding; RNA, Messenger/metabolism; Serine/chemistry
19.
PLoS One ; 5(3): e9762, 2010 Mar 25.
Article in English | MEDLINE | ID: mdl-20360838

ABSTRACT

Several recent studies have indicated that transcription is pervasive in regions outside of protein coding genes and that short antisense transcripts can originate from the promoter and terminator regions of genes. Here we investigate transcription of fragments longer than 200 nucleotides, focusing on antisense transcription for known protein coding genes and intergenic transcription. We find that roughly 12% to 16% of all reads that originate from promoter and terminator regions, respectively, map antisense to the gene in question. Furthermore, we detect a high number of novel transcriptionally active regions (TARs) that are generally expressed at a lower level than protein coding genes. We find that the correlation between RNA-seq data and microarray data is dependent on the gene length, with longer genes showing a better correlation. We detect high antisense transcriptional activity from promoter, terminator and intron regions of protein-coding genes and identify a vast number of previously unidentified TARs, including putative novel EGFR transcripts. This shows that in-depth analysis of the transcriptome using RNA-seq is a valuable tool for understanding complex transcriptional events. Furthermore, the development of new algorithms for estimation of gene expression from RNA-seq data is necessary to minimize length bias.


Subject(s)
Oligonucleotides, Antisense/genetics; Transcription, Genetic; Cell Line, Tumor; ErbB Receptors/genetics; Gene Expression Regulation, Neoplastic; Genome, Human; Humans; Introns; Models, Genetic; Nucleotides/chemistry; Oligonucleotide Array Sequence Analysis; Oligonucleotides, Antisense/chemistry
20.
PLoS One ; 3(4): e1994, 2008 Apr 23.
Article in English | MEDLINE | ID: mdl-18431481

ABSTRACT

Characterization of the chloroplast proteome is needed to understand the essential contribution of the chloroplast to plant growth and development. Here we present a large-scale analysis by nanoLC-Q-TOF and nanoLC-LTQ-Orbitrap mass spectrometry (MS) of ten independent chloroplast preparations from Arabidopsis thaliana, which unambiguously identified 1325 proteins. Novel proteins include various kinases and putative nucleotide-binding proteins. Based on repeated and independent MS-based protein identifications requiring multiple matched peptide sequences, as well as literature, 916 nuclear-encoded proteins were assigned with high confidence to the plastid, of which 86% had a predicted chloroplast transit peptide (cTP). The protein abundance of soluble stromal proteins was calculated from normalized spectral counts from LTQ-Orbitrap analysis and was found to cover four orders of magnitude. Comparison to gel-based quantification demonstrates that 'spectral counting' can provide large-scale protein quantification for Arabidopsis. This quantitative information was used to determine possible biases for protein targeting prediction by TargetP and also to understand the significance of protein contaminants. The abundance data for 550 stromal proteins were used to understand the abundance of metabolic pathways and chloroplast processes. We highlight the abundance of 48 stromal proteins involved in post-translational proteome homeostasis (including aminopeptidases, proteases, deformylases, chaperones, protein sorting components) and discuss the biological implications. N-terminal modifications were identified for a subset of nuclear- and chloroplast-encoded proteins and a novel N-terminal acetylation motif was discovered. Analysis of cTPs and their cleavage sites of Arabidopsis chloroplast proteins, as well as their predicted rice homologues, identified new species-dependent features, which will facilitate improved subcellular localization prediction.
No evidence was found for suggested targeting via the secretory system. This study provides the most comprehensive chloroplast proteome analysis to date and an expanded Plant Proteome Database (PPDB) in which all MS data are projected on identified gene models.
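
The abstract does not state how the spectral counts were normalized. As one common possibility only, the sketch below uses a length-normalized scheme (the normalized spectral abundance factor, NSAF), in which each protein's spectral count is divided by its length and then by the sum of those ratios over all proteins. The protein names, counts, and lengths are hypothetical and are not taken from the study.

```python
def nsaf(spectral_counts, lengths):
    """Normalized spectral abundance factor for each protein.

    spectral_counts: dict of protein -> number of matched spectra
    lengths:         dict of protein -> protein length in residues
    Returns a dict of protein -> NSAF; the values sum to 1.
    """
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    return {p: value / total for p, value in saf.items()}


# Hypothetical proteins with invented counts and lengths.
counts = {"protein_A": 500, "protein_B": 5}
lengths = {"protein_A": 500, "protein_B": 250}

abundance = nsaf(counts, lengths)
for protein, value in abundance.items():
    print(f"{protein}: {value:.4f}")
```

Dividing by length compensates for the fact that longer proteins yield more peptides, and therefore more spectra, at the same molar abundance.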


Subject(s)
Arabidopsis/chemistry; Chloroplasts/chemistry; Protein Processing, Post-Translational; Protein Sorting Signals; Proteome/chemistry; Proteome/metabolism; Amino Acid Sequence; Arabidopsis Proteins/analysis; Arabidopsis Proteins/chemistry; Arabidopsis Proteins/isolation & purification; Chloroplast Proteins; Consensus Sequence; Homeostasis; Molecular Sequence Data; Oryza/chemistry; Proteome/analysis; Species Specificity; Tandem Mass Spectrometry