1.
BMJ Glob Health ; 9(10)2024 Oct 30.
Article in English | MEDLINE | ID: mdl-39477336

ABSTRACT

Microbial data sharing underlies evidence-based microbial research, as well as pathogen surveillance and analysis essential to public health. While the need for data sharing was highlighted during the SARS-CoV-2 pandemic, some concerns regarding secondary data use have also surfaced. Although general guidelines are available for data sharing, we note the absence of a set of established, universal, unambiguous and accessible principles to guide the secondary use of microbial data. Here, we propose the Public Health Alliance for Genomic Epidemiology (PHA4GE) Microbial Data-Sharing Accord to consolidate consensus norms and accepted practices for the secondary use of microbial data. The Accord provides a set of seven simple, baseline principles to address key concerns that may arise for researchers providing microbial datasets for secondary use and to guide responsible use by data users. By providing clear rules for secondary use of microbial data, the Accord can increase confidence in sharing by data providers and protect against data mis-use during secondary analyses.


Subjects
COVID-19 , Information Dissemination , Humans , Consensus , SARS-CoV-2 , Public Health , Pandemics
3.
PLoS Comput Biol ; 19(1): e1010752, 2023 01.
Article in English | MEDLINE | ID: mdl-36622853

ABSTRACT

There is an ongoing explosion of scientific datasets being generated, brought on by recent technological advances in many areas of the natural sciences. As a result, the life sciences have become increasingly computational in nature, and bioinformatics has taken on a central role in research studies. However, basic computational skills, data analysis, and stewardship are still rarely taught in life science educational programs, resulting in a skills gap in many of the researchers tasked with analysing these big datasets. In order to address this skills gap and empower researchers to perform their own data analyses, the Galaxy Training Network (GTN) has previously developed the Galaxy Training Platform (https://training.galaxyproject.org), an open access, community-driven framework for the collection of FAIR (Findable, Accessible, Interoperable, Reusable) training materials for data analysis utilizing the user-friendly Galaxy framework as its primary data analysis platform. Since its inception, this training platform has thrived, with the number of tutorials and contributors growing rapidly, and the range of topics extending beyond life sciences to include topics such as climatology, cheminformatics, and machine learning. While initially aimed at supporting researchers directly, the GTN framework has proven to be an invaluable resource for educators as well. We have focused our efforts in recent years on adding increased support for this growing community of instructors. We have added new features to facilitate the use of the materials in a classroom setting, simplified the contribution flow for new materials, and added a set of train-the-trainer lessons. Here, we present the latest developments in the GTN project, aimed at facilitating the use of the Galaxy Training materials by educators, and its usage in different learning environments.


Subjects
Computational Biology , Software , Humans , Computational Biology/methods , Data Analysis , Research Personnel
4.
Plants (Basel) ; 11(16)2022 Aug 19.
Article in English | MEDLINE | ID: mdl-36015459

ABSTRACT

While plant genome analysis is gaining speed worldwide, few plant genomes have been sequenced and analyzed on the African continent. Yet, this information holds the potential to transform diverse industries as it unlocks medicinally and industrially relevant biosynthesis pathways for bioprospecting. Considering that South Africa is home to the highly diverse Cape Floristic Region, local establishment of methods for plant genome analysis is essential. Long-read sequencing is becoming standard procedure for plant genome research, as these reads can span repetitive regions of the DNA, substantially facilitating reassembly of a contiguous genome. With the MinION, Oxford Nanopore offers a cost-efficient sequencing method to generate long reads; however, DNA purification protocols must be adapted for each plant species to generate ultra-pure DNA, essential for these analyses. Here, we describe a cost-effective procedure for the extraction and purification of plant DNA and evaluate diverse genome assembly approaches for the reconstruction of the genome of rooibos (Aspalathus linearis), an endemic South African medicinal plant widely used for tea production. We discuss the pros and cons of nine tested assembly programs, specifically Redbean and NextDenovo, which generated the most contiguous assemblies, and Flye, which produced an assembly closest to the predicted genome size.

5.
BMC Genomics ; 23(1): 520, 2022 Jul 18.
Article in English | MEDLINE | ID: mdl-35850574

ABSTRACT

Genetic evolution of Rift Valley fever virus (RVFV) in Africa has been shaped mainly by environmental changes such as abnormal rainfall patterns and climate change that has occurred over the last few decades. These gradual environmental changes are believed to have effected gene migration from macro (geographical) to micro (reassortment) levels. Presently, 15 lineages of RVFV have been identified to be circulating within the Sub-Saharan Africa. International trade in livestock and movement of mosquitoes are thought to be responsible for the outbreaks occurring outside endemic or enzootic regions. Virus spillover events contribute to outbreaks as was demonstrated by the largest epidemic of 1977 in Egypt. Genomic surveillance of the virus evolution is crucial in developing intervention strategies. Therefore, we have developed a computational tool for rapidly classifying and assigning lineages of the RVFV isolates. The computational method is presented both as a command line tool and a web application hosted at https://www.genomedetective.com/app/typingtool/rvfv/ . Validation of the tool has been performed on a large dataset using glycoprotein gene (Gn) and whole genome sequences of the Large (L), Medium (M) and Small (S) segments of the RVFV retrieved from the National Center for Biotechnology Information (NCBI) GenBank database. Using the Gn nucleotide sequences, the RVFV typing tool was able to correctly classify all 234 RVFV sequences at species level with 100% specificity, sensitivity and accuracy. All the sequences in lineages A (n = 10), B (n = 1), C (n = 88), D (n = 1), E (n = 3), F (n = 2), G (n = 2), H (n = 105), I (n = 2), J (n = 1), K (n = 4), L (n = 8), M (n = 1), N (n = 5) and O (n = 1) were also correctly classified at phylogenetic level. Lineage assignment using whole RVFV genome sequences (L, M and S-segments) did not achieve 100% specificity, sensitivity and accuracy for all the sequences analyzed. 
We further tested our tool using genomic data that we generated by sequencing 5 samples collected following a recent RVF outbreak in Kenya. All 5 samples were assigned to lineage C by both the partial (Gn) and whole genome sequence classifiers. The tool is useful in tracing the origin of outbreaks and supporting surveillance efforts. Availability: https://github.com/ajodeh-juma/rvfvtyping.
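The abstract above reports specificity, sensitivity and accuracy without defining them. The following is a minimal sketch of the standard definitions, assuming a one-vs-rest evaluation per lineage; the function name and the evaluation scheme are illustrative assumptions, not the authors' implementation. The example counts for lineage C (88 of 234 Gn sequences, all correctly classified) are taken from the abstract.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts.

    tp/fn: sequences of the lineage that were / were not assigned to it.
    tn/fp: sequences of other lineages that were not / were assigned to it.
    """
    nan = float("nan")
    sensitivity = tp / (tp + fn) if (tp + fn) else nan
    specificity = tn / (tn + fp) if (tn + fp) else nan
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else nan
    return sensitivity, specificity, accuracy


# Lineage C in the Gn dataset: 88 sequences of 234, none misclassified
# (one-vs-rest counts assumed for illustration).
print(classification_metrics(tp=88, fp=0, tn=146, fn=0))  # (1.0, 1.0, 1.0)
```

With any misclassification, the three values diverge, which is why the abstract reports them separately for the whole-genome classifiers.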


Subjects
Rift Valley Fever , Rift Valley fever virus , Animals , Commerce , Genomics , Internationality , Kenya , Phylogeny , Rift Valley Fever/epidemiology , Rift Valley fever virus/genetics
6.
Bioinform Adv ; 2(1): vbac030, 2022.
Article in English | MEDLINE | ID: mdl-35669346

ABSTRACT

Summary: Properly and effectively managing reference datasets is an important task for many bioinformatics analyses. Refgenie is a reference asset management system that allows users to easily organize, retrieve and share such datasets. Here, we describe the integration of refgenie into the Galaxy platform. Server administrators are able to configure Galaxy to make use of reference datasets made available on a refgenie instance. In addition, a Galaxy Data Manager tool has been developed to provide a graphical interface to refgenie's remote reference retrieval functionality. A large collection of reference datasets has also been made available using the CVMFS (CernVM File System) repository from GalaxyProject.org, with mirrors across the USA, Canada, Europe and Australia, enabling easy use outside of Galaxy. Availability and implementation: The ability of Galaxy to use refgenie assets was added to the core Galaxy framework in version 22.01, which is available from https://github.com/galaxyproject/galaxy under the Academic Free License version 3.0. The refgenie Data Manager tool can be installed via the Galaxy ToolShed, with source code managed at https://github.com/BlankenbergLab/galaxy-tools-blankenberg/tree/main/data_managers/data_manager_refgenie_pull and released using an MIT license. Access to existing data is also available through CVMFS, with instructions at https://galaxyproject.org/admin/reference-data-repo/. No new data were generated or analyzed in support of this research.

7.
PLoS One ; 17(3): e0265492, 2022.
Article in English | MEDLINE | ID: mdl-35298540

ABSTRACT

The growing demands on protein producers and the dwindling available resources have made Hermetia illucens (the black soldier fly, BSF) an economically important species. Insights into the genome of this insect will better allow for robust breeding protocols, and more efficient production to be used as a replacement of animal feed protein. The use of microRNA as a method to understand how gene regulation allows insect species to adapt to changes in their environment, has been established in multiple species. The baseline and life stage expression levels established in this study, allow for insight into the development and sex-linked microRNA regulation in BSF. To accomplish this, microRNA was extracted and sequenced from 15 different libraries with each life stage in triplicate. Of the total 192 microRNAs found, 168 were orthologous to known arthropod microRNAs and 24 microRNAs were unique to BSF. Twenty-six of the 168 microRNAs conserved across arthropods had a statistically significant (p < 0.05) differential expression between Egg to Larval stages. The development from larva to pupa was characterized by 16 statistically significant differentially expressed microRNA. Seven and 9 microRNA were detected as statistically significant between pupa to adult female and pupa to adult male, respectively. All life stages had a nearly equal split between up and down regulated microRNAs. Ten of the unique 24 miRNA were detected exclusively in one life stage. The egg life stage expressed five microRNA (hil-miR-m, hil-miR-p, hil-miR-r, hil-miR-s, and hil-miR-u) not seen in any other life stages. The female adult and pupa life stages expressed one miRNA each hil-miR-h and hil-miR-ac respectively. Both male and female adult life stages expressed hil-miR-a, hil-miR-b, and hil-miR-y. There were no unique microRNAs found only in the larva stage. Twenty-two microRNAs with 56 experimentally validated target genes in the closely related Drosophila melanogaster were identified. 
Thus, the microRNAs found display the unique evolution of BSF, along with the life stages and potential genes to target for robust mass rearing. Understanding microRNA expression in BSF will further the use of this species in the crucial search for alternative and sustainable protein sources.


Subjects
Diptera , MicroRNAs , Animal Feed/analysis , Animals , Drosophila melanogaster , Female , Larva , Male , MicroRNAs/genetics , MicroRNAs/metabolism , Pupa
8.
BMJ Glob Health ; 7(2)2022 02.
Article in English | MEDLINE | ID: mdl-35144922

ABSTRACT

There is an increasing recognition of the importance of including benefit sharing in research programmes in order to ensure equitable and just distribution of the benefits arising from research. Whilst there are global efforts to promote benefit sharing when using non-human biological resources, benefit sharing plans and implementation do not yet feature prominently in research programmes, funding applications or requirements by ethics review boards. Whilst many research stakeholders may agree with the concept of benefit sharing, it can be difficult to operationalise benefit sharing within research programmes. We present a framework designed to assist with identifying benefit sharing opportunities in research programmes. The framework has two dimensions: the first represents microlevel, mesolevel and macrolevel stakeholders as defined using a socioecological model; and the second identifies nine different types of benefit sharing that might be achieved during a research programme. We provide an example matrix identifying different types of benefit sharing that might be undertaken during genomics research, and present a case study evaluating benefit sharing in Africa during the SARS-CoV-2 pandemic. This framework, with examples, is intended as a practical tool to assist research stakeholders with identifying opportunities for benefit sharing, and inculcating intentional benefit sharing in their research programmes from inception.


Subjects
Biomedical Research , COVID-19 , Africa , Humans , SARS-CoV-2
9.
mSphere ; 7(1): e0099121, 2022 02 23.
Article in English | MEDLINE | ID: mdl-35138128

ABSTRACT

Whole-genome sequencing (WGS) is a powerful method for detecting drug resistance, genetic diversity, and transmission dynamics of Mycobacterium tuberculosis. Implementation of WGS in public health microbiology laboratories is impeded by a lack of user-friendly, automated, and semiautomated pipelines. We present the COMBAT-TB Workbench, a modular, easy-to-install application that provides a web-based environment for Mycobacterium tuberculosis bioinformatics. The COMBAT-TB Workbench is built using two main software components: the IRIDA platform for its web-based user interface and data management capabilities and the Galaxy bioinformatics workflow platform for workflow execution. These components are combined into a single easy-to-install application using Docker container technology. We implemented two workflows, for M. tuberculosis sample analysis and phylogeny, in Galaxy. Building our workflows involved updating some Galaxy tools (Trimmomatic, snippy, and snp-sites) and writing new Galaxy tools (snp-dists, TB-Profiler, tb_variant_filter, and TB Variant Report). The irida-wf-ga2xml tool was updated to be able to work with recent versions of Galaxy and was further developed into IRIDA plugins for both workflows. In the case of the M. tuberculosis sample analysis, an interface was added to update the metadata stored for each sequence sample with results gleaned from the Galaxy workflow output. Data can be loaded into the COMBAT-TB Workbench via the web interface or via the command line IRIDA uploader tool. The COMBAT-TB Workbench application deploys IRIDA, the COMBAT-TB IRIDA plugins, the MariaDB database, and Galaxy using Docker containers (https://github.com/COMBAT-TB/irida-galaxy-deploy). 
IMPORTANCE: While the reduction in the cost of WGS is making sequencing more affordable in lower- and middle-income countries (LMICs), public health laboratories in these countries seldom have access to bioinformaticians and system support engineers adept at using the Linux command line and complex bioinformatics software. The COMBAT-TB Workbench provides an open-source, modular, easy-to-deploy and easy-to-use environment for managing and analyzing M. tuberculosis WGS data and thereby makes WGS usable in practice in the LMIC context.


Subjects
Mycobacterium tuberculosis , Tuberculosis , Computational Biology/methods , Humans , Mycobacterium tuberculosis/genetics , Software , Workflow
10.
Microbiol Resour Announc ; 9(27)2020 Jul 02.
Article in English | MEDLINE | ID: mdl-32616644

ABSTRACT

As a contribution to the global efforts to track and trace the ongoing coronavirus pandemic, here we present the sequence, phylogenetic analysis, and modeling of nonsynonymous mutations for a severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genome that was detected in a South African patient with coronavirus disease 2019 (COVID-19).

11.
Bioinformatics ; 36(3): 982-983, 2020 02 01.
Article in English | MEDLINE | ID: mdl-31504165

ABSTRACT

MOTIVATION: Recent advancements in genomic technologies have enabled high throughput cost-effective generation of 'omics' data from M.tuberculosis (M.tb) isolates, which then gets shared via a number of heterogeneous publicly available biological databases. Albeit useful, fragmented curation negatively impacts the researcher's ability to leverage the data via federated queries. RESULTS: We present Combat-TB-NeoDB, an integrated M.tb 'omics' knowledge-base. Combat-TB-NeoDB is based on Neo4j and was created by binding the labeled property graph model to a suitable ontology namely Chado. Combat-TB-NeoDB enables researchers to execute complex federated queries by linking prominent biological databases, and supplementary M.tb variants data from published literature. AVAILABILITY AND IMPLEMENTATION: The Combat-TB-NeoDB (https://neodb.sanbi.ac.za) repository and all tools mentioned in this manuscript are freely available at https://github.com/COMBAT-TB. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Mycobacterium tuberculosis , Tuberculosis , Databases, Factual , Genome , Genomics , Humans , Software
12.
Bioinform Biol Insights ; 13: 1177932219882347, 2019.
Article in English | MEDLINE | ID: mdl-35173421

ABSTRACT

Next-generation sequencing (NGS) technologies have revolutionized biological research by generating genomic data that were once unaffordable by traditional first-generation sequencing technologies. These sequencing methodologies provide an opportunity for in-depth analyses of host and pathogen genomes as they are able to sequence millions of templates at a time. However, these large datasets can only be efficiently explored using bioinformatics analyses requiring huge data storage and computational resources adapted for high-performance processing. High-performance computing allows for efficient handling of large data and tasks that may require multi-threading and prolonged computational times, which is not feasible with ordinary computers. However, high-performance computing resources are costly and therefore not always readily available in low-income settings. We describe the establishment of an affordable high-performance computing bioinformatics cluster consisting of 3 nodes, constructed using ordinary desktop computers and open-source software including Linux Fedora, SLURM Workload Manager, and the Conda package manager. For the analysis of large antibody sequence datasets and for complex viral phylodynamic analyses, the cluster out-performed desktop computers. This has demonstrated that it is possible to construct high-performance computing capacity capable of analyzing large NGS data from relatively low-cost hardware and entirely free (open-source) software, even in resource-limited settings. Such a cluster design has broad utility beyond bioinformatics to other studies that require high-performance computing.

13.
AAS Open Res ; 1: 9, 2018.
Article in English | MEDLINE | ID: mdl-32382696

ABSTRACT

The need for portable and reproducible genomics analysis pipelines is growing globally as well as in Africa, especially with the growth of collaborative projects like the Human Health and Heredity in Africa Consortium (H3Africa). The Pan-African H3Africa Bioinformatics Network (H3ABioNet) recognized the need for portable, reproducible pipelines adapted to heterogeneous compute environments, and for the nurturing of technical expertise in workflow languages and containerization technologies. To address this need, in 2016 H3ABioNet arranged its first Cloud Computing and Reproducible Workflows Hackathon, with the purpose of building key genomics analysis pipelines able to run on heterogeneous computing environments and meeting the needs of H3Africa research projects. This paper describes the preparations for this hackathon and reflects upon the lessons learned about its impact on building the technical and scientific expertise of African researchers. The workflows developed were made publicly available in GitHub repositories and deposited as container images on quay.io.

14.
BMC Genet ; 18(1): 119, 2017 12 22.
Article in English | MEDLINE | ID: mdl-29273003

ABSTRACT

BACKGROUND: Drought is the most disastrous abiotic stress that severely affects agricultural productivity worldwide. Understanding the biological basis of drought-regulated traits, requires identification and an in-depth characterization of genetic determinants using model organisms and high-throughput technologies. However, studies on drought tolerance have generally been limited to traditional candidate gene approach that targets only a single gene in a pathway that is related to a trait. In this study, we used sorghum, one of the model crops that is well adapted to arid regions, to mine genes and define determinants for drought tolerance using drought expression libraries and RNA-seq data. RESULTS: We provide an integrated and comparative in silico candidate gene identification, characterization and annotation approach, with an emphasis on genes playing a prominent role in conferring drought tolerance in sorghum. A total of 470 non-redundant functionally annotated drought responsive genes (DRGs) were identified using experimental data from drought responses by employing pairwise sequence similarity searches, pathway and interpro-domain analysis, expression profiling and orthology relation. Comparison of the genomic locations between these genes and sorghum quantitative trait loci (QTLs) showed that 40% of these genes were co-localized with QTLs known for drought tolerance. The genome reannotation conducted using the Program to Assemble Spliced Alignment (PASA), resulted in 9.6% of existing single gene models being updated. In addition, 210 putative novel genes were identified using AUGUSTUS and PASA based analysis on expression dataset. Among these, 50% were single exonic, 69.5% represented drought responsive and 5.7% were complete gene structure models. Analysis of biochemical metabolism revealed 14 metabolic pathways that are related to drought tolerance and also had a strong biological network, among categories of genes involved. 
Identification of these pathways signifies the interplay of biochemical reactions that make up the metabolic network, constituting a fundamental interface for the sorghum defence mechanism against drought stress. CONCLUSIONS: This study suggests untapped natural variability in sorghum that could be used for developing drought tolerance. The data presented here may be regarded as an initial reference point in functional and comparative genomics in the Gramineae family.


Subjects
Genes, Plant , Molecular Sequence Annotation , Sorghum/genetics , Sorghum/physiology , Computer Simulation , Droughts , Exons , Metabolic Networks and Pathways , Quantitative Trait Loci , Transcriptome
15.
Front Microbiol ; 8: 13, 2017.
Article in English | MEDLINE | ID: mdl-28167933

ABSTRACT

Sequencing, assembly, and annotation of environmental virome samples is challenging. Methodological biases and differences in species abundance result in fragmentary read coverage; sequence reconstruction is further complicated by the mosaic nature of viral genomes. In this paper, we focus on biocomputational aspects of virome analysis, emphasizing latent pitfalls in sequence annotation. Using simulated viromes that mimic environmental data challenges we assessed the performance of five assemblers (CLC-Workbench, IDBA-UD, SPAdes, RayMeta, ABySS). Individual analyses of relevant scaffold length fractions revealed shortcomings of some programs in reconstruction of viral genomes with excessive read coverage (IDBA-UD, RayMeta), and in accurate assembly of scaffolds ≥50 kb (SPAdes, RayMeta, ABySS). The CLC-Workbench assembler performed best in terms of genome recovery (including highly covered genomes) and correct reconstruction of large scaffolds; and was used to assemble a virome from a copper rich site in the Namib Desert. We found that scaffold network analysis and cluster-specific read reassembly improved reconstruction of sequences with excessive read coverage, and that strict data filtering for non-viral sequences prior to downstream analyses was essential. In this study we describe novel viral genomes identified in the Namib Desert copper site virome. Taxonomic affiliations of diverse proteins in the dataset and phylogenetic analyses of circovirus-like proteins indicated links to the marine habitat. Considering additional evidence from this dataset we hypothesize that viruses may have been carried from the Atlantic Ocean into the Namib Desert by fog and wind, highlighting the impact of the extended environment on an investigated niche in metagenome studies.

17.
PLoS Genet ; 12(4): e1005954, 2016 Apr.
Article in English | MEDLINE | ID: mdl-27082250

ABSTRACT

We report here the ~670 Mb genome assembly of the Asian seabass (Lates calcarifer), a tropical marine teleost. We used long-read sequencing augmented by transcriptomics, optical and genetic mapping along with shared synteny from closely related fish species to derive a chromosome-level assembly with a contig N50 size over 1 Mb and scaffold N50 size over 25 Mb that span ~90% of the genome. The population structure of L. calcarifer species complex was analyzed by re-sequencing 61 individuals representing various regions across the species' native range. SNP analyses identified high levels of genetic diversity and confirmed earlier indications of a population stratification comprising three clades with signs of admixture apparent in the South-East Asian population. The quality of the Asian seabass genome assembly far exceeds that of any other fish species, and will serve as a new standard for fish genomics.
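The contig and scaffold N50 values quoted in this abstract follow the usual assembly statistic: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembled length. The sketch below shows that calculation on made-up lengths; the function name and the example data are illustrative assumptions and are not taken from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            # This sequence pushes cumulative coverage past 50% of the total.
            return length


# Hypothetical contig lengths: total 290, half 145; 80 + 70 = 150 >= 145.
print(n50([80, 70, 50, 40, 30, 20]))  # 70
```

A higher N50 indicates a more contiguous assembly, but it says nothing about correctness, so it is usually reported alongside other quality measures.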


Subjects
Bass/genetics , Chromosome Mapping , Animals , Bass/classification , Genome , In Situ Hybridization, Fluorescence , Phylogeny
18.
BMC Res Notes ; 9: 144, 2016 Mar 05.
Article in English | MEDLINE | ID: mdl-26945860

ABSTRACT

BACKGROUND: The National Institutes of Health (USA) has committed 5 years of funding to the Bioinformatics Network of the Human Heredity and Health in Africa initiative. This pan-African network aims to develop capacity for bioinformatics research, in order to provide support to human health genomics research programs ongoing on the continent. Over the 5 years of funding, it is imperative to track changes in bioinformatics capacity at the funded centres and to document how the funding has translated into capacity development during this time frame. RESULTS: The Network capacity database, NetCapDB, is a relational database that captures quantitative metrics for bioinformatics capacity, and tracks the changes in these metrics over time. A graphical user interface allows for straight-forward, browser-based data entry by users across Africa; and for visual and graph-based exploration of captured data. A reporting interface allows for semi-automated generation of standardized reports for monitoring and evaluation purposes.


Subjects
Computational Biology/economics , Genome, Human , National Institutes of Health (U.S.)/economics , Program Evaluation/statistics & numerical data , Africa , Capital Financing , Computational Biology/instrumentation , Computational Biology/methods , Databases, Factual , Humans , United States , User-Computer Interface
19.
BMC Bioinformatics ; 16: 58, 2015 Feb 21.
Article in English | MEDLINE | ID: mdl-25880035

ABSTRACT

BACKGROUND: De novo transcriptome assembly of short transcribed fragments (transfrags) produced from sequencing-by-synthesis technologies often results in redundant datasets with differing levels of unassembled, partially assembled or mis-assembled transcripts. Post-assembly processing intended to reduce redundancy typically involves reassembly or clustering of assembled sequences. However, these approaches are mostly based on common word heuristics and often create clusters of biologically unrelated sequences, resulting in loss of unique transfrags annotations and propagation of mis-assemblies. RESULTS: Here, we propose a structured framework that consists of a few steps in pipeline architecture for Inferring Functionally Relevant Assembly-derived Transcripts (IFRAT). IFRAT combines 1) removal of identical subsequences, 2) error tolerant CDS prediction, 3) identification of coding potential, and 4) complements BLAST with a multiple domain architecture annotation that reduces non-specific domain annotation. We demonstrate that independent of the assembler, IFRAT selects bona fide transfrags (with CDS and coding potential) from the transcriptome assembly of a model organism without relying on post-assembly clustering or reassembly. The robustness of IFRAT is inferred on RNA-Seq data of Neurospora crassa assembled using de Bruijn graph-based assemblers, in single (Trinity and Oases-25) and multiple (Oases-Merge and additive or pooled) k-mer modes. Single k-mer assemblies contained fewer transfrags compared to the multiple k-mer assemblies. However, Trinity identified a comparable number of predicted coding sequence and gene loci to Oases pooled assembly. IFRAT selects bona fide transfrags representing over 94% of cumulative BLAST-derived functional annotations of the unfiltered assemblies. Between 4-6% are lost when orphan transfrags are excluded and this represents only a tiny fraction of annotation derived from functional transference by sequence similarity. 
The median length of bona fide transfrags ranged from 1.5 kb (Trinity) to 2 kb (Oases), which is consistent with the average coding sequence length in fungi. The fraction of transfrags that could be associated with gene ontology terms ranged from 33% to 50%, which is also high for domain-based annotation. We showed that unselected transfrags were mostly truncated and represent sequences from intronic, untranslated (5' and 3') regions and non-coding gene loci. CONCLUSIONS: IFRAT simplifies post-assembly processing, providing a reference transcriptome enriched with functionally relevant assembly-derived transcripts for non-model organisms.
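The first IFRAT step named in this abstract, removal of identical subsequences, can be illustrated with a small sketch. This is not IFRAT's implementation: the function name is hypothetical, the comparison is exact and forward-strand only (reverse complements are ignored), and the quadratic scan is only suitable for small inputs.

```python
def drop_contained_sequences(sequences):
    """Drop duplicates and sequences that are exact substrings of a kept one."""
    kept = []
    # Visit longest sequences first so that any container is already kept;
    # the secondary sort key makes the output order deterministic.
    for seq in sorted(set(sequences), key=lambda s: (-len(s), s)):
        if not any(seq in longer for longer in kept):
            kept.append(seq)
    return kept


# Hypothetical transfrags: one duplicate and one fully contained fragment.
transfrags = ["ATGCCGTA", "GCCG", "ATGCCGTA", "TTTT"]
print(drop_contained_sequences(transfrags))  # ['ATGCCGTA', 'TTTT']
```

Unlike word-heuristic clustering, an exact containment check cannot merge biologically unrelated sequences, which matches the motivation given in the abstract for avoiding post-assembly clustering.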


Subjects
Algorithms , Genome, Fungal , High-Throughput Nucleotide Sequencing/methods , Molecular Sequence Annotation , Neurospora crassa/genetics , Sequence Analysis, RNA/methods , Transcriptome , Cluster Analysis , Software
20.
J Exp Zool B Mol Dev Evol ; 322(6): 403-14, 2014 Sep.
Article in English | MEDLINE | ID: mdl-24106203

ABSTRACT

G-protein coupled chemosensory receptors (GPCR-CRs) aid in the perception of odors and tastes in vertebrates. So far, six GPCR-CR families have been identified that are conserved in most vertebrate species. Phylogenetic analyses indicate differing evolutionary dynamics between teleost fish and tetrapods. The coelacanth Latimeria chalumnae belongs to the lobe-finned fishes, which represent a phylogenetic link between these two groups. We searched the genome of L. chalumnae for GPCR-CRs and found that coelacanth taste receptors are more similar to those in tetrapods than in teleost fish: two coelacanth T1R2s co-segregate with the tetrapod T1R2s that recognize sweet substances, and our phylogenetic analyses indicate that the teleost T1R2s are closer related to T1R1s (umami taste receptors) than to tetrapod T1R2s. Furthermore, coelacanths are the first fish with a large repertoire of bitter taste receptors (58 T2Rs). Considering current knowledge on feeding habits of coelacanths the question arises if perception of bitter taste is the only function of these receptors. Similar to teleost fish, coelacanths have a variety of olfactory receptors (ORs) necessary for perception of water-soluble substances. However, they also have seven genes in the two tetrapod OR subfamilies predicted to recognize airborne molecules. The two coelacanth vomeronasal receptor families are larger than those in teleost fish, and similar to tetrapods and form V1R and V2R monophyletic clades. This may point to an advanced development of the vomeronasal organ as reported for lungfish. Our results show that the intermediate position of Latimeria in the phylogeny is reflected in its GPCR-CR repertoire.


Subjects
Fishes/genetics , Receptors, Odorant/genetics , Taste/genetics , Animals , Evolution, Molecular , Phylogeny , Receptors, G-Protein-Coupled/genetics , Vertebrates/genetics , Vomeronasal Organ