ABSTRACT
BACKGROUND: Conventional differential gene expression analysis pipelines for non-model organisms require computationally expensive transcriptome assembly. We recently proposed an alternative strategy of directly aligning RNA-seq reads to a protein database, and demonstrated drastic improvements in speed, memory usage, and accuracy in identifying differentially expressed genes. RESULTS: Here we report a further speed-up achieved by replacing DNA-protein alignment with quasi-mapping, which makes our pipeline more than 1000× faster than the assembly-based approach while remaining more accurate. We also compare quasi-mapping with other mapping techniques and show that it is faster, but at the cost of sensitivity. CONCLUSION: We provide a quick-and-dirty differential gene expression analysis pipeline for non-model organisms without a reference transcriptome; it quasi-maps RNA-seq reads directly to a reference protein database, avoiding computationally expensive transcriptome assembly.
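To make the idea of mapping nucleotide reads against protein sequences concrete, the sketch below translates each read in all six frames, looks up its amino-acid k-mers in an index built from the reference proteins, and assigns the read to the protein sharing the most k-mers, producing a per-protein count table. This is a simplified k-mer lookup under stated assumptions, not the authors' quasi-mapping implementation; the k-mer length `K`, all function names, and the toy database in the usage note are hypothetical.

```python
from collections import Counter
from typing import Optional

# Standard genetic code; codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")
K = 5  # hypothetical amino-acid k-mer length


def translate(seq: str) -> str:
    """Translate DNA codon by codon; codons with non-ACGT bases become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frames(read: str) -> list[str]:
    """Three forward and three reverse-complement translations of a read."""
    rc = read.translate(COMPLEMENT)[::-1]
    return [translate(strand[f:]) for strand in (read, rc) for f in range(3)]


def build_index(proteins: dict[str, str]) -> dict[str, set[str]]:
    """Map every amino-acid k-mer to the set of reference proteins containing it."""
    index: dict[str, set[str]] = {}
    for name, seq in proteins.items():
        for i in range(len(seq) - K + 1):
            index.setdefault(seq[i:i + K], set()).add(name)
    return index


def assign_read(read: str, index: dict[str, set[str]]) -> Optional[str]:
    """Assign a read to the protein sharing the most k-mers across all six frames."""
    hits = Counter()
    for frame in six_frames(read):
        for i in range(len(frame) - K + 1):
            for name in index.get(frame[i:i + K], ()):
                hits[name] += 1
    return hits.most_common(1)[0][0] if hits else None


def count_reads(reads: list[str], proteins: dict[str, str]) -> Counter:
    """Per-protein read counts, i.e. the table a differential expression test would use."""
    index = build_index(proteins)
    counts = Counter()
    for read in reads:
        name = assign_read(read, index)
        if name is not None:
            counts[name] += 1
    return counts
```

For example, with a toy database `{"P1": "MKWVTFISLLFLFSSAYS", "P2": "GGGGGGGGGG"}`, the read `ATGAAATGGGTTACTTTTATTTCTCTTCTT` (which encodes `MKWVTFISLL`) and its reverse complement are both assigned to `P1`. A production quasi-mapper would use a far more compact index and handle multi-mapping reads explicitly; here ties are not treated specially.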
Subject(s)
Gene Expression Profiling , Gene Expression Profiling/methods , Transcriptome/genetics , DNA/genetics , DNA/metabolism , Sequence Alignment/methods , Proteins/genetics , Proteins/metabolism
ABSTRACT
In recent years, microRNAs (miRNAs) and tRNA-derived RNA fragments (tRFs) have been reported extensively using different identification and analysis approaches. After comprehensively analyzing the existing approaches to overcome the variation among them, we developed a benchmarking methodology for each: miRNA Prediction Methodology (miRPreM) for the identification of miRNAs, and tRNA-induced small non-coding RNA Prediction Methodology (tiRPreM) for the identification of tRFs. We emphasized mapping reads to the genome of the organism under study, using sample data with at least two biological replicates, requiring normalized read count support, and predicting novel miRNAs with two standard tools over multiple runs. The performance of these methodologies was evaluated using Oryza coarctata, a wild rice species, as a case study for model and non-model organisms. With the organism-specific reference genome approach, 98 miRNAs and 60 tRFs were found exclusively. When these genome-specific miRNAs were tested, we observed high accuracy (13 out of 15), supporting the analysis of data against the respective organism's genome. Owing to the strong impact of miRPreM, we predicted more than twice as many miRNAs (186) as the traditional approaches (79), and with tiRPreM we predicted all known classes of tRFs within the same small RNA data. Moreover, the methodologies are presented in a standardized form so that they can be extended to different organisms rather than being restricted to plants. Hence, miRPreM and tiRPreM can fulfill the need for a comprehensive methodology for miRNA prediction and tRF identification, respectively, in model and non-model organisms.
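The requirement for normalized read-count support across at least two biological replicates can be pictured as a simple filter. The sketch below normalizes each library's raw counts to counts per million (CPM) and keeps a candidate only if it reaches a threshold in at least two replicates. The 1 CPM threshold, the function names, and the example counts are hypothetical assumptions, not the cut-offs used by miRPreM or tiRPreM.

```python
def cpm(counts: dict[str, int]) -> dict[str, float]:
    """Convert one library's raw counts to counts per million (library must be non-empty)."""
    total = sum(counts.values())
    return {name: 1e6 * n / total for name, n in counts.items()}


def replicate_supported(libraries: list[dict[str, int]],
                        min_cpm: float = 1.0, min_reps: int = 2) -> set[str]:
    """Return candidates whose CPM is >= min_cpm in at least min_reps libraries.

    min_cpm and min_reps are hypothetical defaults chosen for illustration.
    """
    normalized = [cpm(lib) for lib in libraries]
    names = set().union(*(lib.keys() for lib in libraries))
    return {
        name for name in names
        # A candidate absent from a library counts as 0 CPM there.
        if sum(lib.get(name, 0.0) >= min_cpm for lib in normalized) >= min_reps
    }
```

Normalizing before thresholding matters because replicate libraries rarely have the same sequencing depth: a candidate with 30 reads in a 2-million-read library (15 CPM) is less abundant than one with 30 reads in a 1-million-read library (30 CPM).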
Subject(s)
MicroRNAs , MicroRNAs/genetics , Plants/genetics , RNA, Transfer/genetics
ABSTRACT
Geobacillus thermoglucosidasius NCIMB 11955 offers advantages such as high-temperature tolerance, a rapid growth rate, and low contamination risk. It also has efficient gene-editing tools, making it one of the most promising next-generation cell factories. However, as a non-model microorganism, it lacks metabolic information, which significantly hampers the construction of high-precision metabolic flux models. Here, we propose a BioIntelliModel (BIM) strategy based on artificial intelligence for the automated construction of enzyme-constrained models. (1) BIM utilises the Contrastive Learning Enabled Enzyme Annotation (CLEAN) prediction tool to analyse the entire genome sequence of G. thermoglucosidasius NCIMB 11955, uncovering potential functional proteins in non-model strains. (2) The MetaPatchM module of BIM automates the repair of the metabolic network model. (3) The Tianjin University of Science and Technology-kcat (TUST-kcat) module predicts the kcat values of enzymes within the model. (4) The Enzyme-insert procedure constructs an enzyme-constrained model and performs a global scan to address overconstraint issues. Enzymatic data were automatically integrated into the metabolic flux model to create an enzyme-constrained model, ec_G-ther11955. To validate model accuracy, we used both the p-thermo and ec_G-ther11955 models to predict riboflavin production strategies; the ec_G-ther11955 model was significantly more accurate. To further verify its efficacy, we used ec_G-ther11955 to guide the rational design of L-valine-producing strains. Using the Optimisation Procedure for Identifying All Genetic Manipulations Leading to Targeted Overproductions (OptForce), Predictive Knockout Targeting (PKT), and Flux Scanning based on Enforced Objective Flux (FSEOF) algorithms, we identified 24 knockout and overexpression targets with an accuracy rate of 87.5%, ultimately increasing the L-valine titre by 664.04%. This study provides a novel strategy for rapidly constructing models of non-model strains and demonstrates the considerable potential of artificial intelligence in metabolic engineering.
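For readers less familiar with enzyme-constrained models, a common generic formulation (used, for example, in GECKO-style models) adds enzyme limits to flux balance analysis as shown below, assuming reversible reactions have been split into irreversible forward and backward directions. Here S is the stoichiometric matrix, v the flux vector, c the objective weights, e_j the amount of enzyme allocated to reaction j, k_cat,j its turnover number, MW_j its molecular weight, and P_tot the total enzyme mass available. This is an illustrative sketch; the exact constraints built by BIM's Enzyme-insert procedure may differ.

```latex
\begin{aligned}
\max_{v,\,e}\quad & c^{\top} v \\
\text{s.t.}\quad & S\,v = 0, \\
& v_j^{\min} \le v_j \le v_j^{\max} \quad \text{for all reactions } j, \\
& 0 \le v_j \le k_{\mathrm{cat},j}\, e_j \quad \text{for enzyme-catalysed reactions } j, \\
& \textstyle\sum_j \mathrm{MW}_j\, e_j \le P_{\mathrm{tot}}, \qquad e_j \ge 0 .
\end{aligned}
```

Because each flux needs at least v_j / k_cat,j of enzyme, the last two constraints imply the combined budget sum_j MW_j v_j / k_cat,j ≤ P_tot. An underestimated k_cat therefore tightens the budget, which is one way such a model can become over-constrained.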
ABSTRACT
Corynebacterium glutamicum ATCC 13032 is a promising microbial chassis for the industrial production of valuable compounds, including aromatic amino acids derived from the shikimate pathway. In this work, we developed two whole-cell, transcription factor-based fluorescent biosensors to track cis,cis-muconic acid (ccMA) and chorismate in C. glutamicum. Chorismate is a key intermediate in the shikimate pathway from which value-added chemicals can be produced, and a shunt from the shikimate pathway can divert carbon to ccMA, a high-value chemical. We transferred a ccMA-inducible transcription factor, CatM, from Acinetobacter baylyi ADP1 into C. glutamicum and screened a promoter library for variants with high sensitivity and dynamic range to ccMA by supplying benzoate, which is converted to ccMA intracellularly. The biosensor also detected exogenously supplied ccMA, suggesting the presence of a putative ccMA transporter in C. glutamicum, although the external ccMA concentration needed to elicit a response was 100-fold higher than the benzoate concentration needed to do so through intracellular ccMA production. We then developed a chorismate biosensor in which a chorismate-inducible promoter regulated by natively expressed QsuR was optimized to exhibit a dose-dependent response to exogenously supplemented quinate (a chorismate precursor). A chorismate-pyruvate lyase-encoding gene, ubiC, was introduced into C. glutamicum to lower the intracellular chorismate pool, which abolished the dose dependence on quinate. Further, a knockout strain that blocked the conversion of quinate to chorismate also showed no dose dependence on quinate, validating that the chorismate biosensor is specific to the intracellular chorismate pool. The ccMA and chorismate biosensors were both inserted into C. glutamicum to simultaneously detect intracellularly produced chorismate and ccMA. Biosensors such as those developed in this study can be applied in C. glutamicum for multiplex sensing to expedite pathway design and optimization through metabolic engineering in this promising chassis organism. ONE-SENTENCE SUMMARY: High-throughput screening of promoter libraries in Corynebacterium glutamicum to establish transcription factor-based biosensors for key metabolic intermediates in the shikimate and β-ketoadipate pathways.
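Biosensor characterisations of this kind are often summarised with a dose-response curve and a dynamic range (the fold change between fully induced and uninduced output). The sketch below uses an activating Hill function for that purpose; the functions, parameter names, and example values are illustrative assumptions, not parameters reported for the CatM- or QsuR-based sensors.

```python
def hill_response(conc: float, basal: float, maximal: float,
                  k_half: float, n: float) -> float:
    """Output predicted by an activating Hill function at inducer concentration conc.

    basal:   output with no inducer
    maximal: output at saturating inducer
    k_half:  concentration giving half-maximal induction
    n:       Hill coefficient (steepness)
    """
    if conc <= 0:
        return basal
    frac = conc**n / (k_half**n + conc**n)  # fraction of maximal induction
    return basal + (maximal - basal) * frac


def dynamic_range(basal: float, maximal: float) -> float:
    """Fold change between fully induced and uninduced output."""
    return maximal / basal
```

With hypothetical parameters `basal=100`, `maximal=1000`, `k_half=5`, `n=2`, the output at 5 units of inducer is 550 (half-way between basal and maximal) and the dynamic range is 10-fold. Raising `k_half` 100-fold shifts the whole curve 100-fold along the concentration axis, one way to picture a sensor that needs a 100-fold higher external concentration to respond.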
Subject(s)
Biosensing Techniques , Chorismic Acid , Corynebacterium glutamicum , Sorbic Acid , Corynebacterium glutamicum/metabolism , Corynebacterium glutamicum/genetics , Biosensing Techniques/methods , Sorbic Acid/metabolism , Sorbic Acid/analogs & derivatives , Chorismic Acid/metabolism , Transcription Factors/genetics , Transcription Factors/metabolism , Promoter Regions, Genetic , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , Acinetobacter/metabolism , Acinetobacter/genetics
ABSTRACT
This review describes the development of evolutionary studies of sex based on the volvocine lineage of green algae, which has been facilitated by whole-genome analyses of both model and non-model species. Volvocine algae, which include Chlamydomonas and Volvox species, have long been considered a model group for experimental studies of the evolution of sex. Thus, whole-genome information on the sex-determining regions of volvocine algal sex chromosomes has been sought to elucidate the molecular genetic basis of sex evolution. By 2010, whole genomes had been published for two model species in this group, Chlamydomonas reinhardtii and Volvox carteri. Recent improvements in sequencing technology, particularly next-generation sequencing, have allowed our studies to obtain complete genomes for non-model but evolutionarily important volvocine algal species. These genomes have provided critical details about sex-determining regions that will contribute to our understanding of the diversity and evolution of sex.
Subject(s)
Evolution, Molecular , Volvox , Whole Genome Sequencing , Whole Genome Sequencing/methods , Volvox/genetics , Volvox/classification , Sex Chromosomes/genetics , Genome, Plant , Chlorophyta/genetics , Chlorophyta/classification , Genetic Variation
ABSTRACT
The innate immune protection provided by cationic antimicrobial peptides (CAMPs) has been shown to extend to antiviral activity, with putative mechanisms of action including direct interaction with host cells or pathogen membranes. The lack of therapeutics for viruses such as Venezuelan equine encephalitis virus (VEEV) underscores the urgent need for novel antiviral discovery strategies. American alligator plasma has been shown to exhibit strong in vitro antibacterial activity, and functionalized hydrogel particles have been successfully used to identify specific CAMPs from alligator plasma. Here, we implemented a novel bait strategy, in which particles were encapsulated in membranes from either healthy or VEEV-infected cells, to identify peptides that preferentially target infected cells for subsequent evaluation of antiviral activity. Statistical analysis of the peptide identification results was used to select five candidate peptides for testing; one of them exhibited dose-dependent inhibition of VEEV and also significantly reduced infectious titers. The results suggest that our bioprospecting strategy provides a versatile platform that may be adapted for antiviral peptide identification from complex biological samples.
Subject(s)
Alligators and Crocodiles , Encephalitis Virus, Venezuelan Equine , Encephalomyelitis, Venezuelan Equine , Animals , Horses , Encephalitis Virus, Venezuelan Equine/physiology , Antiviral Agents/pharmacology , Antiviral Agents/therapeutic use , Encephalomyelitis, Venezuelan Equine/drug therapy , Encephalomyelitis, Venezuelan Equine/prevention & control , Bioprospecting , Virus Replication , Peptides
ABSTRACT
Thanks to genetics, biochemistry, and structural biology, many features of the ribosome's life cycles in model bacteria, eukaryotes, and some organelles have been revealed in near-atomic detail. Collectively, these studies have provided a very detailed understanding of what are now well-established prototypes for ribosome biogenesis and function, as viewed from the perspective of 'classical' model organisms. However, important challenges remain in exploring the functional and structural diversity of ribosome biogenesis and function across the biological diversity on Earth. In particular, the 'third domain of life', the archaea, as well as many non-model bacterial and eukaryotic organisms, have been comparatively neglected. Importantly, characterizing these additional biological systems will not only open a yet untapped window onto the evolution of ribosome biogenesis and function but will also help to unravel fundamental principles of molecular adaptation in these central cellular processes.
Subject(s)
Archaea , Ribosomes , Animals , Archaea/genetics , Bacteria/genetics , Eukaryota/genetics , Life Cycle Stages , Ribosomes/genetics
ABSTRACT
The California Conservation Genomics Project (CCGP) is a unique, critically important step forward in the use of comprehensive landscape genetic data to modernize natural resource management at a regional scale. We describe the CCGP, including all aspects of project administration, data collection, current progress, and future challenges. The CCGP will generate, analyze, and curate a single high-quality reference genome and 100-150 resequenced genomes for each of 153 species projects (representing 235 individual species) that span the ecological and phylogenetic breadth of California's marine, freshwater, and terrestrial ecosystems. The resulting portfolio of roughly 20 000 resequenced genomes will be analyzed with identical informatic and landscape genomic pipelines, providing a comprehensive overview of hotspots of within-species genomic diversity, potential and realized corridors connecting these hotspots, regions of reduced diversity requiring genetic rescue, and the distribution of variation critical for rapid climate adaptation. After 2 years of concerted effort, full funding ($12M USD) has been secured, species identified, and funds distributed to 68 laboratories and 114 investigators drawn from all 10 University of California campuses. The remaining phases of the CCGP include completion of data collection and analyses, and delivery of the resulting genomic data and inferences to state and federal regulatory agencies to help stabilize species declines. The aspirational goals of the CCGP are to identify geographic regions that are critical to long-term preservation of California biodiversity, prioritize those regions based on defensible genomic criteria, and provide foundational knowledge that informs management strategies at both the individual species and ecosystem levels.
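The headline figure of roughly 20 000 resequenced genomes follows from the per-project plan. Assuming each of the 153 species projects receives between 100 and 150 resequenced genomes, as stated (and leaving aside the 153 reference genomes), the total falls between the first and last products below, and the midpoint of 125 genomes per project lands close to the quoted figure:

```latex
\begin{aligned}
153 \times 100 &= 15\,300, \\
153 \times 125 &= 19\,125, \\
153 \times 150 &= 22\,950 .
\end{aligned}
```

Put the other way round, a portfolio of about 20 000 genomes corresponds to roughly 130 resequenced genomes per species project on average.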
Subject(s)
Biodiversity , Ecosystem , Phylogeny , Genomics , Fresh Water , California , Conservation of Natural Resources
ABSTRACT
Recent progress in sequencing technology has allowed gene lists to be compiled for a large number of organisms, although many of these organisms are far less experimentally tractable than well-established model organisms. One popular approach to further characterizing genes identified in a poorly tractable organism is to express them in a model organism and then ask what the protein does in this system, or whether the gene can replace the homologous endogenous gene when the latter is mutated. While this is a valid approach for certain questions, I argue that the results of such experiments are frequently misinterpreted. If, for example, a gene from a parasitic nematode can replace its homologous gene in the model nematode Caenorhabditis elegans, it is often concluded that the gene is most likely involved in the same biological process in its own organism as the C. elegans gene is in C. elegans. This conclusion is not valid. All this experiment tells us is that the chemical properties of the parasite protein are similar enough to those of the C. elegans protein for it to perform the function of the C. elegans protein in C. elegans. Here I discuss this misconception and illustrate it using the analogy of similar electric switches (components) controlling various devices (processes).
Subject(s)
Caenorhabditis elegans Proteins , Caenorhabditis elegans , Animals , Caenorhabditis elegans/genetics , Caenorhabditis elegans Proteins/genetics
ABSTRACT
There is increasing demand for elucidating the biosynthetic pathways of medicinal plants, which can produce several metabolites with great potential for industrial drug production. Digitalis species are important medicinal plants for the production of cardenolide compounds. Advances in culture techniques are closely linked to our understanding of the genomic background of the species, yet only a limited number of genomic studies have been conducted on Digitalis species. The goal of this study is to contribute to the genomic data on Digitalis ferruginea subsp. schischkinii by presenting a transcriptome annotation. Digitalis ferruginea subsp. schischkinii has a limited distribution in Turkey and Transcaucasia and contains a high level of lanatoside C, an important cardenolide. In this study, we sequenced a cDNA library prepared from RNA pools of D. ferruginea subsp. schischkinii tissues treated with various stress conditions. Comprehensive bioinformatics approaches were used for de novo assembly and functional annotation of the D. ferruginea subsp. schischkinii transcriptome sequence data, along with transcription factor (TF) family prediction and phylogenetic analysis. In total, 58,369 unigenes were predicted, and they were annotated by analyzing the sequence data against the non-redundant (NR) protein database, the non-redundant nucleotide (NT) database, Gene Ontology (GO), EuKaryotic Orthologous Groups (KOG), Kyoto Encyclopedia of Genes and Genomes (KEGG), SwissProt, and InterPro databases. This study provides the first transcriptome data for D. ferruginea subsp. schischkinii.
Subject(s)
Biosynthetic Pathways/genetics , Digitalis/genetics , Microsatellite Repeats/genetics , Transcriptome/genetics , Gene Expression Profiling , High-Throughput Nucleotide Sequencing , Molecular Sequence Annotation , Phylogeny , Plants, Medicinal/chemistry
ABSTRACT
BACKGROUND: Transcriptome analysis by next-generation sequencing has become a popular technique in recent years. It is well suited to studies of non-model organisms, because de novo assembly does not depend on prior genomic sequences. De novo sequencing has benefited many studies on commercially important fish species. However, to understand the functions of these assembled sequences, they still need to be annotated against existing sequence databases. By combining the Basic Local Alignment Search Tool (BLAST) and Gene Ontology analysis, we were able to identify homologous sequences of assembled sequences and describe their characteristics using pre-defined tags for each gene; however, such conventional annotation of non-model assembled sequences remained limited by missing pre-defined tags and poorly documented database records. RESULTS: We introduce Blast2Fish, a novel approach for performing functional enrichment analysis on non-model teleost fish transcriptome data. The Blast2Fish pipeline was designed as a reference-based enrichment method. Instead of annotating the single top BLAST hit with a pre-defined gene-to-tag database, we used 500 hits to search related PubMed articles and parse biological terms. These descriptive terms were then sorted and recorded as annotations for the query. The results showed that Blast2Fish was capable of providing meaningful annotations on immunology topics for non-model fish transcriptome analysis. CONCLUSION: Blast2Fish provides a novel approach for annotating sequences of non-model fish. The reference-based strategy allows annotation to be performed without pre-defined tags for each gene. This method strongly benefits non-model teleost fish studies for gene functional enrichment analysis.
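The core of the reference-based idea, gathering literature text for many BLAST hits and ranking the biological terms it mentions, can be sketched as a term-frequency count. In this sketch the list of hit texts, the controlled vocabulary, and the function name are hypothetical stand-ins; Blast2Fish itself retrieves PubMed articles for 500 hits and parses terms with its own method.

```python
import re
from collections import Counter


def rank_terms(hit_texts: list[str], vocabulary: set[str],
               top: int = 5) -> list[tuple[str, int]]:
    """Count how many hit texts mention each vocabulary term; return the most frequent."""
    counts = Counter()
    for text in hit_texts:
        lowered = text.lower()
        # Count each term at most once per text, so one verbose article cannot dominate.
        for term in vocabulary:
            if re.search(r"\b" + re.escape(term.lower()) + r"\b", lowered):
                counts[term] += 1
    return counts.most_common(top)
```

For instance, given the texts "Interleukin signaling in teleost immunity", "Toll-like receptor and interleukin expression", and "Muscle growth in carp" with a vocabulary of `interleukin`, `toll-like receptor`, `muscle growth`, and `apoptosis`, the top term is `interleukin` (mentioned in two texts), and `apoptosis` does not appear at all. Counting documents rather than raw occurrences is a design choice made here for robustness, not necessarily the one Blast2Fish uses.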
Subject(s)
Computational Biology/methods , Fish Proteins/genetics , Fishes/genetics , Molecular Sequence Annotation/methods , Animals , Databases, Nucleic Acid , Fish Proteins/chemistry , Fish Proteins/metabolism , Fishes/metabolism , Gene Expression Profiling , Genomics , High-Throughput Nucleotide Sequencing , Internet , Software , Transcriptome
ABSTRACT
BACKGROUND: Pteropods are planktonic gastropods that are considered bio-indicators for monitoring the impacts of ocean acidification on marine ecosystems. To gain insight into their adaptive potential under future environmental change, it is critical to use adequate molecular tools to delimit species and population boundaries and to assess their genetic connectivity. We developed a set of target capture probes to investigate genetic variation across their large genomes using a population genomics approach. Target capture is less limited by DNA amount and quality than other reduced-representation genome protocols, and probes designed from one species can potentially be applied to closely related species. RESULTS: We generated the first draft genome of a pteropod, Limacina bulimoides, resulting in a fragmented assembly of 2.9 Gbp. Using this assembly and a transcriptome as references, we designed a set of 2899 genome-wide target capture probes for L. bulimoides. The probe set includes 2812 single-copy nuclear targets, the 28S rDNA sequence, ten mitochondrial genes, 35 candidate biomineralisation genes, and 41 non-coding regions. The capture reaction performed with these probes was highly efficient, with 97% of the targets recovered in the focal species. A total of 137,938 single nucleotide polymorphism markers were obtained from the captured sequences across a test panel of nine individuals. The probe set was also tested on four related species (L. trochiformis, L. lesueurii, L. helicina, and Heliconoides inflatus), showing an exponential decrease in capture efficiency with increasing genetic distance from the focal species. Sixty-two targets were sufficiently conserved to be recovered consistently across all five species. CONCLUSION: The target capture protocol used in this study was effective in capturing genome-wide variation in the focal species L. bulimoides, suitable for population genomic analyses, while providing insights into conserved genomic regions in related species. The present study provides new genomic resources for pteropods and supports the use of target capture-based protocols to efficiently characterise genomic variation in small non-model organisms with large genomes.
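An exponential decrease in capture efficiency with genetic distance can be quantified with a log-linear least-squares fit, ln(efficiency) = a + b·distance, where the decay rate is -b. The sketch below fits that model in plain Python; the function name and any numbers used with it are illustrative assumptions, not the study's measurements or its fitting procedure.

```python
import math


def fit_exponential_decay(distances: list[float],
                          efficiencies: list[float]) -> tuple[float, float]:
    """Fit efficiency = A * exp(-r * distance) by least squares on log(efficiency).

    Returns (A, r). Efficiencies must be strictly positive, and at least two
    distinct distances are needed.
    """
    n = len(distances)
    logs = [math.log(e) for e in efficiencies]
    mean_x = sum(distances) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in distances)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(distances, logs))
    slope = sxy / sxx                 # b in ln(eff) = a + b * distance
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope
```

Fitting on the log scale treats proportional deviations equally, which suits efficiencies that span orders of magnitude across species. With real data, the distance axis could be, for example, pairwise sequence divergence from the focal species; that choice is an assumption here.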
Subject(s)
Gastropoda/genetics , Genome/genetics , Marine Biology , Oceans and Seas , Animals , Gastropoda/metabolism , Genomics/trends , Hydrogen-Ion Concentration , Phylogeny , Polymorphism, Single Nucleotide/genetics , Seawater/chemistry , Species Specificity , Transcriptome/genetics
ABSTRACT
BACKGROUND: Advances in high-throughput sequencing technologies are enabling de novo assembly of transcriptomes from more and more new organisms. Some degree of automation and evaluation is required to ensure reproducibility and repeatability and to select the best possible transcriptome. Workflows and pipelines are becoming an absolute requirement for this purpose, but evaluating de novo transcriptome assemblies in organisms lacking a sequenced genome remains an unsolved problem. Here we describe TransFlow, an automated, reproducible, and flexible framework for this task. RESULTS: TransFlow, with its five independent modules, was designed to build different workflows depending on the nature of the original reads. This architecture supports different combinations of Illumina and Roche/454 sequencing data and can be extended to other sequencing platforms. Its capabilities are illustrated by the selection of reliable plant reference transcriptomes and the assembly of six transcriptomes (three case studies for grapevine leaves, olive tree pollen, and chestnut stem, and three others for haustorium, epiphytic structures, and their combination in the phytopathogenic fungus Podosphaera xanthii). Arabidopsis and poplar transcriptomes proved to be the best references. A common result across the de novo assemblies is that Illumina paired-end reads of 100 nt assembled with OASES can provide reliable transcriptomes, whereas the contribution of longer reads is noticeable only when they complement a set of short single-end reads. CONCLUSIONS: TransFlow can handle up to 181 different assembly strategies. Evaluation based on principal component analysis allows it to self-adapt to different sets of reads and provide a suitable transcriptome for each combination of reads and assemblers. As a result, each case study behaves differently and prioritises its own evaluation parameters, and the framework provides an objective and automated way of detecting the best transcriptome within a pool of candidates. Sequencing data type and quantity (preferably several hundred million 2×100 nt reads or longer), assemblers (OASES for Illumina, MIRA4 and EULER-SR reconciled with CAP3 for Roche/454), and strategy (preferably scaffolding with OASES, and probably merging with Roche/454 data when available) emerge as the most influential factors.
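The exact PCA-based evaluation in TransFlow is not detailed in this abstract. As a hedged illustration of the general idea, the sketch below standardizes several assembly-quality metrics, extracts the first principal component by power iteration, and ranks assemblies by their score on it. The metric columns, the orientation rule (flip the component so that a chosen reference metric loads positively), and all names are assumptions.

```python
import math


def standardize(rows: list[list[float]]) -> list[list[float]]:
    """Center each metric (column) and scale it to unit sample variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A constant metric has zero spread; use 1.0 so it simply contributes zeros.
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / (len(c) - 1)) or 1.0
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


def first_component(z: list[list[float]], iterations: int = 200) -> list[float]:
    """Leading eigenvector of the metrics' covariance matrix, via power iteration."""
    p = len(z[0])
    cov = [[sum(r[i] * r[j] for r in z) / (len(z) - 1) for j in range(p)]
           for i in range(p)]
    vec = [1.0] * p
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    return vec


def rank_assemblies(names: list[str], metrics: list[list[float]],
                    reference_metric: int = 0) -> list[str]:
    """Rank assemblies by PC1 score, best first.

    PC1's sign is arbitrary, so it is flipped if needed to make the
    reference metric (assumed "higher is better") load positively.
    """
    z = standardize(metrics)
    pc1 = first_component(z)
    if pc1[reference_metric] < 0:
        pc1 = [-x for x in pc1]
    scores = {n: sum(a * b for a, b in zip(row, pc1)) for n, row in zip(names, z)}
    return sorted(names, key=lambda n: scores[n], reverse=True)
```

Each row holds one assembly's metrics, for example N50, the fraction of reads mapping back, and ortholog completeness, all assumed here to be "higher is better"; a metric where lower is better (such as the number of chimeric contigs) would need its sign flipped first. At least two assemblies are required for the sample variance.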
Subject(s)
Sequence Analysis, RNA , Software , Transcriptome/genetics , Base Pairing/genetics , Fungi/genetics , Gene Expression Profiling , Plants/genetics , Principal Component Analysis , Reproducibility of Results , Workflow
ABSTRACT
More than five thousand genes annotated in the recently published Xenopus laevis and Xenopus tropicalis genomes do not have a candidate orthologous counterpart in other vertebrate species. To determine whether these sequences represent genuine amphibian-specific genes or annotation errors, it is necessary to analyze them alongside sequences from other amphibian species. However, due to large genome sizes and an abundance of repeat sequences, there are limited numbers of gene sequences available from amphibian species other than Xenopus. AmphiBase is a new genomic resource covering non-model amphibian species, based on public domain transcriptome data and computational methods developed during the X. laevis genome project. Here, I review the current status of AmphiBase, including amphibian species with available transcriptome data or biological samples, and describe the challenges of building a comprehensive amphibian genomic resource in the absence of genomes. This mini-review will be informative for researchers interested in functional genomic experiments using amphibian model organisms, such as Xenopus and axolotl, and will assist in interpretation of results implicating "orphan genes." Additionally, this study highlights an opportunity for researchers working on non-model amphibian species to collaborate in their future efforts and develop amphibian genomic resources as a community.
Subject(s)
Databases, Genetic , Genomics , Transcriptome/genetics , Xenopus/genetics , Animals , Genome
ABSTRACT
BACKGROUND: Cornu aspersum is an intriguing species from the standpoint of ecology and evolution and for its potential use in medical and environmental applications. It is also economically important, since it is farmed and used for culinary purposes. However, the genomic tools that would allow thorough insight into the ecology, evolution, and nutritional and medical properties of this highly adaptable organism are missing. In this work, we used next-generation sequencing (NGS) to characterize a significant portion of the transcriptome of this non-model organism. RESULTS: Of the 9445 de novo assembled contigs, 2886 (30.6%) returned significant hits, and Gene Ontology (GO) terms associated with the hits were retrieved for 2261 contigs (24%). A high percentage of the contigs (69.4%) produced no BLASTx hits. The GO terms were grouped into biological processes, molecular functions, and cellular components, and certain GO terms were dominant in all groups. Scanning the assembled transcriptome for microsatellites (simple sequence repeats, SSRs) recovered a total of 563 SSRs, among which trinucleotide repeats were predominant, followed by tetranucleotide and dinucleotide repeats. CONCLUSION: The annotation success for the C. aspersum transcriptome was relatively low, probably because of the very limited number of annotated reference genomes for mollusc species, especially terrestrial ones. Several biological processes active in the aestivating species were revealed by associating the transcripts with enzymes in the relevant pathways. The genomic tools provided here will aid the study of the species' global genomic diversity and the investigation of the ecology, evolution, behavior, and nutritional and medical properties of this highly adaptable organism.
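Microsatellite scanning of an assembled transcriptome is usually done with dedicated tools, but the underlying idea is a search for short motifs repeated in tandem. The sketch below finds di-, tri-, and tetranucleotide SSRs with regular expressions and discards motifs that are themselves repeats of a shorter unit (so a poly-A run is not reported as an `AA` dinucleotide SSR). The minimum repeat counts in `MIN_REPEATS` are hypothetical thresholds, not necessarily those used in this study.

```python
import re

# Hypothetical minimum number of tandem copies for each motif length.
MIN_REPEATS = {2: 6, 3: 5, 4: 5}


def is_primitive(motif: str) -> bool:
    """True if motif is not itself a repeat of a shorter unit ('ACAC' is not)."""
    return all(motif != motif[:d] * (len(motif) // d)
               for d in range(1, len(motif)) if len(motif) % d == 0)


def find_ssrs(seq: str) -> list[tuple[int, str, int]]:
    """Return sorted (start, motif, copies) tuples for SSRs found in seq."""
    seq = seq.upper()
    found = []
    for k, min_rep in MIN_REPEATS.items():
        # e.g. k=2, min_rep=6 gives ([ACGT]{2})\1{5,}: a motif plus >= 5 more copies.
        pattern = re.compile(rf"([ACGT]{{{k}}})\1{{{min_rep - 1},}}")
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):
                found.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(found)
```

For example, scanning `"TT" + "ACG" * 5 + "TT" + "AG" * 6 + "C"` reports a five-copy `ACG` trinucleotide SSR at position 2 and a six-copy `AG` dinucleotide SSR at position 19. Note that the reported motif phase depends on where the leftmost match starts, so the same repeat may be reported as `ACG`, `CGA`, or `GAC`; dedicated tools typically normalize motifs before tallying repeat classes.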
Subject(s)
Gene Expression Profiling , Helix, Snails/genetics , Molecular Sequence Annotation , Animals , Gene Ontology , Microsatellite Repeats/genetics
ABSTRACT
BACKGROUND: Aedes aegypti is a vector for the (re-)emerging human pathogens dengue, chikungunya, yellow fever, and Zika viruses. Almost half of the Ae. aegypti genome consists of transposable elements (TEs). Transposons have been linked to diverse cellular processes, including the establishment of viral persistence in insects, an essential step in the transmission of vector-borne viruses. However, until now it has not been possible to study the overall proteome derived from an organism's mobile genetic elements, partly because of the highly divergent nature of TEs. Furthermore, as for many non-model organisms, incomplete genome annotation has hampered proteomic studies on Ae. aegypti. RESULTS: We analysed the Ae. aegypti proteome using our new proteomics informed by transcriptomics (PIT) technique, which bypasses the need for genome annotation by identifying proteins through matched transcriptomic (rather than genomic) data. Our data vastly increase the number of experimentally confirmed Ae. aegypti proteins. The PIT analysis also identified hotspots of incomplete genome annotation and showed that poor sequence and assembly quality do not explain all annotation gaps. Finally, in a proof-of-principle study, we developed criteria for characterising proteomically active TEs. Protein expression did not correlate with a TE's genomic abundance at different levels of classification. Most notably, long terminal repeat (LTR) retrotransposons were markedly enriched compared with other elements. PIT outperformed 'conventional' proteomic approaches in both our transposon and genome annotation analyses. CONCLUSIONS: We present the first proteomic characterisation of an organism's repertoire of mobile genetic elements, which will open new avenues of research into the function of transposon proteins in health and disease. Furthermore, our study provides a proof of concept that PIT can be used to evaluate a genome's annotation and guide annotation efforts, which could improve the efficiency of annotation projects in non-model organisms. PIT therefore represents a valuable new tool for studying the biology of the important vector species Ae. aegypti, including its role in transmitting emerging viruses of global public health concern.
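PIT searches mass spectra against proteins predicted from matched transcriptome data instead of genome annotation. A minimal sketch of building such a search database is shown below, assuming that open reading frames (ORFs) are taken as stop-free stretches starting at ATG on either strand of each transcript, above a hypothetical minimum length. The function names and the `min_aa` threshold are illustrative, and the actual PIT pipeline may predict ORFs differently.

```python
# Standard genetic code; codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {a + b + c: AMINO[16 * i + 4 * j + k]
          for i, a in enumerate(BASES)
          for j, b in enumerate(BASES)
          for k, c in enumerate(BASES)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def orfs(transcript: str, min_aa: int = 30) -> list[str]:
    """Peptides from ATG-initiated ORFs of >= min_aa residues on both strands.

    Stretches at a frame's 3' end may lack a stop codon (partial ORFs from
    incomplete transcripts); they are kept.
    """
    seq = transcript.upper()
    peptides = []
    for strand in (seq, seq.translate(COMPLEMENT)[::-1]):
        for frame in range(3):
            protein = "".join(CODONS.get(strand[i:i + 3], "X")
                              for i in range(frame, len(strand) - 2, 3))
            # Split the translated frame at stop codons; each chunk is stop-free.
            for chunk in protein.split("*"):
                start = chunk.find("M")  # first Met gives the longest ORF
                if start != -1 and len(chunk) - start >= min_aa:
                    peptides.append(chunk[start:])
    return peptides


def to_fasta(records: dict[str, str], min_aa: int = 30) -> str:
    """Write ORF peptides as FASTA entries named <transcript>_orf<N>."""
    lines = []
    for name, transcript in records.items():
        for n, pep in enumerate(orfs(transcript, min_aa), start=1):
            lines.append(f">{name}_orf{n}\n{pep}")
    return "\n".join(lines) + "\n" if lines else ""
```

The resulting FASTA file would then serve as the protein database for a standard spectrum search. Keeping all sufficiently long ORFs rather than one per transcript trades a larger search space for fewer missed proteins, a trade-off a real pipeline would need to tune.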
Subject(s)
Aedes/metabolism , DNA Transposable Elements/genetics , Genome , Proteome/analysis , Proteomics/methods , Aedes/genetics , Animals , Cell Line , Chromatography, High Pressure Liquid , Contig Mapping , Insect Proteins/analysis , Insect Proteins/isolation & purification , RNA/isolation & purification , RNA/metabolism , Sequence Analysis, RNA , Tandem Mass Spectrometry
ABSTRACT
Staphylococcus aureus (S. aureus) is a prominent human and livestock pathogen that has been investigated widely using omic technologies. Critically, because of limited availability, low visibility, or scattered resources, robust network-based and statistical contextualisation of the resulting data is generally under-represented. Here, we present novel meta-analyses of freely accessible molecular network and gene ontology annotation resources for interpreting S. aureus omics data. Furthermore, by applying the gene ontology annotation resources to publicly available data, we demonstrate their value and their ability (or lack thereof) to summarise and statistically interpret the emergent properties of gene expression and protein abundance changes. This analysis provides simple metrics for network selection and demonstrates the impact that the availability and selection of gene ontology annotation can have on the contextualisation of bacterial omics data.
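Statistical interpretation of expression or abundance changes against a gene ontology annotation commonly rests on an over-representation test. A minimal sketch of the one-sided hypergeometric test is shown below; the function name and example counts are illustrative, and the specific statistics used in this study are not stated here. In practice the p-values would also be corrected for testing many GO terms.

```python
from math import comb


def go_enrichment_p(population: int, annotated: int,
                    selected: int, overlap: int) -> float:
    """P(X >= overlap) for X ~ Hypergeometric(population, annotated, selected).

    population: genes with any GO annotation (the background)
    annotated:  background genes carrying the GO term of interest
    selected:   genes with changed expression or protein abundance
    overlap:    selected genes carrying the term
    """
    upper = min(annotated, selected)
    # math.comb returns 0 when k > n, so impossible tables contribute nothing.
    tail = sum(comb(annotated, k) * comb(population - annotated, selected - k)
               for k in range(overlap, upper + 1))
    return tail / comb(population, selected)
```

For a background of 10 genes, 5 of which carry a term, drawing 5 changed genes that all carry it has probability 1/252 (about 0.004). For the 2 × 2 table of genes inside or outside the selected set versus with or without the term, this tail probability equals the one-sided Fisher's exact test. The result depends directly on which annotation source defines `annotated` and `population`, which is why the choice of GO annotation matters.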
Subject(s)
Computational Biology/methods , Gene Ontology , Gene Regulatory Networks , Molecular Sequence Annotation , Staphylococcus aureus/genetics , Humans
ABSTRACT
The distribution of genomic variation across landscapes can provide insights into the complex interactions between the environment and the genome that influence the distribution of species and mediate phenotypic adaptation to local conditions. High-throughput sequencing technologies now offer unprecedented power to explore these interactions, allowing robust inferences about historical processes of colonization, gene flow and divergence, as well as the identification of loci that mediate local adaptation. These 'landscape genomic' approaches have been validated in model species and are now being applied to non-model organisms, including foundation species that have substantial effects on ecosystem processes. Here we review the growing field of landscape genomics from a broad perspective. In particular, we describe the inferential power gained by taking a genome-wide view of genetic variation, strategies for study design that best capture adaptive variation, and how to apply this information to practical challenges such as restoration.
Subject(s)
Genetic Variation , Genomics/methods , Models, Biological , Species Specificity
ABSTRACT
Insects are the most diverse group of organisms on the planet. Variation in gene expression lies at the heart of this biodiversity, and recent advances in sequencing technology have spawned a revolution in researchers' ability to survey tissue-specific transcriptional complexity across a wide range of insect taxa. Increasingly, studies are using a comparative approach (across species, sexes and life stages) that examines the transcriptional basis of phenotypic diversity within an evolutionary context. In the present review, we summarize much of this research, focusing in particular on three critical aspects of insect biology: morphological development and plasticity; physiological response to the environment; and sexual dimorphism. A common theme emerging from these investigations is the dynamic nature of transcriptome evolution, as indicated by rapid changes in the overall pattern of gene expression, the differential expression of numerous genes of unknown function, and the incorporation of novel, lineage-specific genes into the transcriptional profile.
Subject(s)
Insects/genetics , Transcriptome/genetics , Animals , Base Sequence , Biological Evolution , Female , Gene Expression , Gene Expression Regulation, Developmental , Genetic Variation , Insects/growth & development , Insects/physiology , Male , Phenotype , RNA/genetics , Sex Characteristics , Stress, Physiological
ABSTRACT
BACKGROUND: Advances in sequencing technologies have led to the release of hundreds of new genome assemblies each year, providing unprecedented resources for the study of genome evolution. Within this context, in-depth analyses of repetitive elements, transposable elements (TEs) in particular, are increasingly recognized as significant for understanding genome evolution. Despite the plethora of available bioinformatic tools for identifying and annotating TEs, the phylogenetic distance of the target species from a curated and classified database of repetitive element sequences constrains any automated annotation effort. Moreover, manual curation of raw repeat libraries is deemed essential because automatically generated consensus sequences are frequently incomplete. RESULTS: Here, we present an example of a crowd-sourcing effort aimed at curating and annotating the TE libraries of two non-model species, built around a collaborative, peer-reviewed teaching process. Manual curation and classification are time-consuming processes that offer limited short-term academic rewards and are typically confined to a few research groups where the methods are taught through hands-on experience. Crowd-sourcing efforts could therefore offer a significant opportunity to bridge the gap between learning curation methods effectively and providing the scientific community with high-quality, reusable repeat libraries. CONCLUSIONS: The collaborative manual curation of TEs from two tardigrade species, for which no TE libraries were previously available, resulted in the successful characterization of hundreds of new and diverse TEs in a reasonable time frame. Our crowd-sourcing setting can be used as a teaching reference guide for similar projects: a hidden treasure awaits discovery within non-model organisms.