ABSTRACT
Copepods encompass numerous ecological roles, including parasites, detritivores and phytoplankton grazers. Nonetheless, copepod genome assemblies remain scarce. Lepeophtheirus salmonis is an economically and ecologically important ectoparasitic copepod found on salmonid fish. We present the 695.4 Mbp L. salmonis genome assembly containing ≈60% repetitive regions and 13,081 annotated protein-coding genes. The genome comprises 14 autosomes and a ZZ-ZW sex chromosome system. Assembly assessment identified 92.4% of the expected arthropod genes. Transcriptomics supported annotation and indicated a marked shift in gene expression after host attachment, including apparent downregulation of genes related to circadian rhythm coinciding with abandoning diurnal migration. The genome shows evolutionary signatures including loss of genes needed for peroxisome biogenesis, presence of numerous FNII domains, and an incomplete heme homeostasis pathway suggesting heme proteins to be obtained from the host. Despite repeated development of resistance against chemical treatments, L. salmonis exhibits low numbers of many genes involved in detoxification.
Subjects
Copepoda , Fish Diseases , Parasites , Acclimatization , Animals , Copepoda/genetics , Copepoda/parasitology , Fish Diseases/genetics , Parasites/genetics , Transcriptome
ABSTRACT
BACKGROUND: Marine fish populations are often characterized by high levels of gene flow and correspondingly low genetic divergence. This presents a challenge to define management units. Goldsinny wrasse (Ctenolabrus rupestris) is a heavily exploited species due to its importance as a cleaner-fish in commercial salmonid aquaculture. However, at present, the population genetic structure of this species is still largely unresolved. Here, full-genome sequencing was used to produce the first genomic reference for this species, to study population-genomic divergence among four geographically distinct populations, and to identify informative SNP markers for future studies. RESULTS: After construction of a de novo assembly, the genome was estimated to be highly polymorphic and ~600 Mbp in size. 33,235 SNPs were thereafter selected to assess genomic diversity and differentiation among four populations collected from Scandinavia, Scotland, and Spain. Global FST among these populations was 0.015-0.092. Approximately 4% of the investigated loci were identified as putative global outliers, and ~1% within Scandinavia. SNPs showing large divergence (FST > 0.15) were picked as candidate diagnostic markers for population assignment. One hundred seventy-three of the most diagnostic SNPs between the two Scandinavian populations were validated by genotyping 47 individuals from each end of the species' Scandinavian distribution range. Sixty-nine of these SNPs were significantly (p < 0.05) differentiated (mean FST_173_loci = 0.065, FST_69_loci = 0.140). Using these validated SNPs, individuals were assigned with high probability (≥ 94%) to their populations of origin. CONCLUSIONS: Goldsinny wrasse displays a highly polymorphic genome, and substantial population genomic structure. Diversifying selection likely affects population structuring globally and within Scandinavia. 
The diagnostic loci identified now provide a promising and cost-efficient tool to investigate goldsinny wrasse populations further.
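As a rough illustration of the FST-based marker selection described in this abstract, the sketch below computes a per-locus FST for a biallelic SNP using the simple Wright formulation FST = (HT - HS) / HT with equally weighted populations, and then keeps loci above the 0.15 cut-off mentioned in the text. The abstract does not state which FST estimator was used, so the formula, the function names and the allele frequencies are all assumptions made for illustration only.

```python
def fst_biallelic(freqs):
    """FST = (H_T - H_S) / H_T for one biallelic SNP.

    freqs: frequency of the same allele in each population.
    All populations are weighted equally (an assumption).
    """
    if not freqs:
        raise ValueError("need at least one population")
    p_bar = sum(freqs) / len(freqs)
    # Expected heterozygosity of the pooled population.
    h_total = 2 * p_bar * (1 - p_bar)
    # Mean expected heterozygosity within populations.
    h_within = sum(2 * p * (1 - p) for p in freqs) / len(freqs)
    if h_total == 0:
        return 0.0  # monomorphic locus: no differentiation to measure
    return (h_total - h_within) / h_total


def candidate_markers(loci, threshold=0.15):
    """Names of loci whose FST exceeds the threshold."""
    return [name for name, freqs in loci.items()
            if fst_biallelic(freqs) > threshold]


# Hypothetical allele frequencies in two populations.
loci = {"snp_a": [0.50, 0.52], "snp_b": [0.20, 0.80]}
print(candidate_markers(loci))
```

A production analysis would normally use a sample-size-corrected estimator rather than this plain ratio of heterozygosities.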
Subjects
Genetic Drift , Population Genetics , Perciformes/genetics , Single Nucleotide Polymorphism , Animals , Genome , Scandinavian and Nordic Countries , Scotland , Spain
ABSTRACT
Liver samples of two gadoid species, Atlantic cod (Gadus morhua) and haddock (Melanogrammus aeglefinus), sampled in the southern Barents Sea in the period 1992-2015, were studied for the levels of six types of persistent organic pollutants (POPs): polychlorinated biphenyls (PCBs), chlorinated organic pesticides (DDTs, hexachlorocyclohexanes (HCHs), hexachlorobenzene (HCB), trans-nonachlor (TNC)), and polybrominated diphenyl ethers (PBDEs). Higher average levels were found in cod than in haddock. Sampling approximately every third year allowed studies of temporal trends for all the compound groups except PBDEs. Time series are reported for 1992-2015 for Atlantic cod and for 1998-2015 for haddock. Decreasing temporal trends have been modeled in cod for the analyzed POPs for this time period. The decrease seems to be slowing down in the later years. HCB levels showed least decrease with time among all the contaminants, with the poorest fit to the proposed model. Similar time trends were found in haddock, but the decrease is less apparent due to shorter time series. The observed time trends of legacy POPs document the effectiveness of efforts during the 1990s to reduce the levels of these contaminants in the marine environment but question the possibility to eliminate them altogether from the marine environment in the foreseeable future.
Subjects
Environmental Monitoring , Gadiformes , Gadus morhua , Aromatic Hydrocarbons , Chemical Water Pollutants , Animals , Gadiformes/metabolism , Gadus morhua/metabolism , Aromatic Hydrocarbons/analysis , Aromatic Hydrocarbons/metabolism , Oceans and Seas , Seafood/analysis , Seasons , Chemical Water Pollutants/analysis , Chemical Water Pollutants/metabolism
ABSTRACT
BACKGROUND: In fish, morphological colour changes occur from variations in pigment concentrations and in the morphology, density, and distribution of chromatophores in the skin. However, the underlying mechanisms remain unresolved in most species. Here, we describe the first investigation into the genetic and environmental basis of spot pattern development in one of the world's most studied fishes, the Atlantic salmon. We reared 920 salmon from 64 families of domesticated, F1-hybrid and wild origin in two contrasting environments (Hatchery; tanks for the freshwater stage and sea cages for the marine stage, and River; a natural river for the freshwater stage and tanks for the marine stage). Fish were measured, photographed and spot patterns evaluated. RESULTS: In the Hatchery experiment, significant but modest differences in spot density were observed among domesticated, F1-hybrid (1.4-fold spottier than domesticated) and wild salmon (1.7-fold spottier than domesticated). A heritability of 6% was calculated for spot density, and a significant QTL on linkage group SSA014 was detected. In the River experiment, significant but modest differences in spot density were also observed among domesticated, F1-hybrid (1.2-fold spottier than domesticated) and wild salmon (1.8-fold spottier than domesticated). Domesticated salmon were sevenfold spottier in the Hatchery vs. River experiment. While different wild populations were used for the two experiments, on average, these were 6.2-fold spottier in the Hatchery vs. River experiment. Fish in the Hatchery experiment displayed scattered to random spot patterns while fish in the River experiment displayed clustered spot patterns. CONCLUSIONS: These data demonstrate that while genetics plays an underlying role, environmental variation represents the primary determinant of spot pattern development in Atlantic salmon.
Subjects
Environment , Pigmentation/physiology , Salmo salar/physiology , Animals , Pigmentation/genetics , Salmo salar/genetics
ABSTRACT
BACKGROUND: In the marine environment, where there are few absolute physical barriers, contemporary contact between previously isolated species can occur across great distances, and in some cases, may be inter-oceanic. An example of this can be seen in the minke whale species complex. Antarctic minke whales are genetically and morphologically distinct from the common minke found in the north Atlantic and Pacific oceans, and the two species are estimated to have been isolated from each other for 5 million years or more. Recent atypical migrations from the southern to the northern hemisphere have been documented and fertile hybrids and back-crossed individuals between both species have also been identified. However, it is not known whether this represents a contemporary event, potentially driven by ecosystem changes in the Antarctic, or a sporadic occurrence happening over an evolutionary time-scale. We successfully used whole genome resequencing to identify a panel of diagnostic SNPs which now enable us to address this evolutionary question. RESULTS: A large number of SNPs displaying fixed or nearly fixed allele frequency differences among the minke whale species were identified from the sequence data. Five panels of putatively diagnostic markers were established on a genotyping platform for validation of allele frequencies; two panels (26 and 24 SNPs) separating the two species of minke whale, and three panels (22, 23, and 24 SNPs) differentiating the three subspecies of common minke whale. The panels were validated against a set of reference samples, demonstrating the ability to accurately identify back-crossed whales up to three generations. CONCLUSIONS: This work has resulted in the development of a panel of novel diagnostic genetic markers to address inter-oceanic and global contact among the genetically isolated minke whale species and sub-species. 
These markers, including a globally relevant genetic reference data set for this species complex, are now openly available for researchers interested in identifying other potential whale hybrids in the world's oceans. The approach used here, combining whole genome resequencing and high-throughput genotyping, represents a universal approach to develop similar tools for other species and population complexes.
Subjects
Animal Migration , Genetic Markers , Genome , Genomics , High-Throughput Nucleotide Sequencing , Genetic Hybridization , Minke Whale/genetics , Alleles , Animals , Chromosome Mapping , Genetic Crosses , Gene Frequency , Population Genetics , Genomics/methods , Genotype , Single Nucleotide Polymorphism , Population Dynamics , Reproducibility of Results
ABSTRACT
Atlantic cod (Gadus morhua) is a large, cold-adapted teleost that sustains long-standing commercial fisheries and incipient aquaculture. Here we present the genome sequence of Atlantic cod, showing evidence for complex thermal adaptations in its haemoglobin gene cluster and an unusual immune architecture compared to other sequenced vertebrates. The genome assembly was obtained exclusively by 454 sequencing of shotgun and paired-end libraries, and automated annotation identified 22,154 genes. The major histocompatibility complex (MHC) II is a conserved feature of the adaptive immune system of jawed vertebrates, but we show that Atlantic cod has lost the genes for MHC II, CD4 and invariant chain (Ii) that are essential for the function of this pathway. Nevertheless, Atlantic cod is not exceptionally susceptible to disease under natural conditions. We find a highly expanded number of MHC I genes and a unique composition of its Toll-like receptor (TLR) families. This indicates how the Atlantic cod immune system has evolved compensatory mechanisms in both adaptive and innate immunity in the absence of MHC II. These observations affect fundamental assumptions about the evolution of the adaptive immune system and its components in vertebrates.
Subjects
Gadus morhua/genetics , Gadus morhua/immunology , Genome/genetics , Immune System/immunology , Immunity/genetics , Animals , Molecular Evolution , Genomics , Hemoglobins/genetics , Immunity/immunology , Major Histocompatibility Complex/genetics , Major Histocompatibility Complex/immunology , Male , Genetic Polymorphism/genetics , Synteny/genetics , Toll-Like Receptors/genetics
ABSTRACT
BACKGROUND: Nuclear receptors have crucial roles in all metazoan animals as regulators of gene transcription. A wide range of studies have elucidated the molecular and biological significance of nuclear receptors, but there are still a large number of animals where the knowledge is very limited. In the present study we have identified an RXR type of nuclear receptor in the salmon louse (Lepeophtheirus salmonis) (i.e. LsRXR). RXR is one of the two partners of the Ecdysteroid receptor in arthropods, the receptor for the main molting hormone 20-hydroxyecdysone (E20) with a wide array of effects in arthropods. RESULTS: Five different LsRXR transcripts were identified by RACE showing large differences in domain structure. The largest isoforms contained a complete DNA binding domain (DBD) and ligand binding domain (LBD), whereas some variants had incomplete or no DBD. LsRXR is transcribed in several tissues in the salmon louse including ovary, subcuticular tissue, intestine and glands. By using Q-PCR it is evident that the LsRXR mRNA levels vary throughout the L. salmonis life cycle. We also show that the truncated LsRXR transcript comprises about 50% in all examined samples. We used RNAi to knock down the transcription in adult reproducing female lice. This resulted in close to zero viable offspring. We also assessed the LsRXR RNAi effects using a L. salmonis microarray and saw significant effects on transcription in the female lice. Transcription of the major yolk proteins was strongly reduced by knock-down of LsRXR. Genes involved in lipid metabolism and transport were also down regulated. Furthermore, different types of growth processes were up regulated and many cuticle proteins were present in this group. CONCLUSIONS: The present study demonstrates the significance of LsRXR in adult female L. salmonis and discusses the functional aspects in relation to other arthropods. LsRXR has a unique structure that should be elucidated in the future.
Subjects
Copepoda/genetics , Host-Parasite Interactions/genetics , Retinoic Acid Receptors/genetics , Animals , Copepoda/pathogenicity , DNA-Binding Proteins/genetics , Ecdysterone/genetics , Ecdysterone/metabolism , Female , Life Cycle Stages , Lipid Metabolism/genetics , Molecular Sequence Data , Ovary/growth & development , Ovary/parasitology , Protein Tertiary Structure , Messenger RNA/biosynthesis , Messenger RNA/genetics , Retinoic Acid Receptors/metabolism , Reproduction/genetics , Salmon/parasitology
ABSTRACT
BACKGROUND: High-throughput sequencing is a cost-effective method for identifying genetic variation, and it is currently in use on a large scale across the field of biology, including ecology and population genetics. Correctly identifying variable sites and allele frequencies from sequencing data remains challenging, in large part due to artifacts and biases inherent in the sequencing process. Selecting variants that are diagnostic is commonly done using diversity statistics like FST, but these measures are not ideal for the task. RESULTS: Here, we develop a method that directly calculates the expected amount of information gained from observing each variant site. We then develop and implement a conservative estimator that takes into account uncertainty introduced by sampling bias and sequencing error. This estimator is applied to simulated and real sequencing data, and we discuss how it performs compared to the commonly used existing methods for identifying diagnostic polymorphisms. CONCLUSION: The expected information content gives an easy-to-interpret measure for the usefulness of variant sites. The results show that we achieve a clear separation between true variants and noise, allowing us to select candidate sites with a high degree of confidence.
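One plausible reading of "expected information gained from observing a variant site" is the mutual information between a sample's population of origin and the allele observed at that site. The sketch below computes that quantity in bits for a biallelic site and two equally likely populations. It is an assumption-laden illustration, not the estimator from the article: the function names are hypothetical, the uniform prior is assumed, and the conservative correction for sampling bias and sequencing error mentioned in the abstract is not included.

```python
import math


def binary_entropy(p):
    """Shannon entropy (bits) of a two-outcome distribution (p, 1 - p)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def expected_information(p1, p2):
    """Mutual information (bits) between population label and observed allele.

    p1, p2: frequency of the same allele in population 1 and population 2.
    Assumes both populations are equally likely a priori.
    """
    # H(allele) for the pooled sample minus the mean H(allele | population).
    pooled = binary_entropy((p1 + p2) / 2)
    within = (binary_entropy(p1) + binary_entropy(p2)) / 2
    return pooled - within


# A fixed difference is fully diagnostic; identical frequencies carry nothing.
print(expected_information(0.0, 1.0))
print(expected_information(0.3, 0.3))
print(expected_information(0.2, 0.8))
```

Unlike FST, the result has a direct interpretation: a value of 1 bit means a single observed allele identifies the population with certainty, and values near 0 mean the site is uninformative.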
Subjects
Computational Biology/methods , Genomics/methods , Genetic Polymorphism , Software , Algorithms , Animals , Datasets as Topic , Gene Frequency , Single Nucleotide Polymorphism
ABSTRACT
BACKGROUND: The salmon louse, Lepeophtheirus salmonis, is an ectoparasite of salmonids that causes huge economic losses in salmon farming, and has also been causatively linked with declines of wild salmonid populations. Lice control on farms is reliant upon a few groups of pesticides that have all shown time-limited efficiency due to resistance development. However, to date, this example of human-induced evolution is poorly documented at the population level due to the lack of molecular tools. As such, important evolutionary and management questions, linked to the development and dispersal of pesticide resistance in this parasite, remain unanswered. Here, we introduce the first Single Nucleotide Polymorphism (SNP) array for the salmon louse, which includes 6000 markers, and present a population genomic scan using this array on 576 lice from twelve farms distributed across the North Atlantic. RESULTS: Our results support the hypothesis of a single panmictic population of lice in the Atlantic, and importantly, revealed very strong selective sweeps on linkage groups 1 and 5. These sweeps included candidate genes potentially connected to pesticide resistance. After genotyping a further 576 lice from 12 full sibling families, a genome-wide association analysis established a highly significant association between the major sweep on linkage group 5 and resistance to emamectin benzoate, the most widely used pesticide in salmonid aquaculture for more than a decade. CONCLUSIONS: The analysis of conserved haplotypes across samples from the Atlantic strongly suggests that emamectin benzoate resistance developed at a single source, and rapidly spread across the Atlantic within the period from 1999, when the chemical was first introduced, to 2010, when samples for the present study were obtained. 
These results provide unique insights into the development and spread of pesticide resistance in the marine environment, and identify a small genomic region strongly linked to emamectin benzoate resistance. Finally, these results have highly significant implications for the way pesticide resistance is considered and managed within the aquaculture industry.
Subjects
Copepoda/drug effects , Copepoda/genetics , Drug Resistance , Single Nucleotide Polymorphism , Animals , Atlantic Ocean , Copepoda/classification , Molecular Evolution , Humans , Insecticides/pharmacology , Ivermectin/analogs & derivatives , Ivermectin/pharmacology , Salmon/parasitology
ABSTRACT
MOTIVATION: Throughout the recent years, 454 pyrosequencing has emerged as an efficient alternative to traditional Sanger sequencing and is widely used in both de novo whole-genome sequencing and metagenomics. Especially the latter application is extremely sensitive to sequencing errors and artificially duplicated reads. Both are common in 454 pyrosequencing and can create a strong bias in the estimation of diversity and composition of a sample. To date, there are several tools that aim to remove both sequencing noise and duplicates. Nevertheless, duplicate removal is often based on nucleotide sequences rather than on the underlying flow values, which contain additional information. RESULTS: With the novel tool JATAC, we present an approach towards a more accurate duplicate removal by analysing flow values directly. Making use of previous findings on 454 flow data characteristics, we combine read clustering with Bayesian distance measures. Finally, we provide a benchmark with an existing algorithm. AVAILABILITY: JATAC is freely available under the General Public License from http://malde.org/ketil/jatac/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
DNA Sequence Analysis/methods , Software , Algorithms , Bayes Theorem , Chromosome Mapping , Metagenomics
ABSTRACT
BACKGROUND: Zygotic transcription in fish embryos initiates around the time of gastrulation, and all prior development is initiated and controlled by maternally derived messenger RNAs. Atlantic cod egg and embryo viability is variable, and it is hypothesized that the early development depends upon the features of these maternal RNAs. Both the length and the presence of specific motifs in the 3'UTR of maternal RNAs are believed to regulate expression and stability of the maternal transcripts. Therefore, the aim of this study was to characterize the overall composition and 3'UTR structure of the most common maternal RNAs found in cod eggs and pre-zygotic embryos. RESULTS: 22229 Sanger sequences were obtained from 3'-end sequenced cDNA libraries prepared from oocyte, 1-2 cell, blastula and gastrula stages. Quantitative PCR revealed that EST copy numbers below 9 did not reflect the gene expression profile. Consequently, genes represented by fewer than 9 ESTs were excluded from downstream analyses, in addition to sequences with low-quality gene hits. This provided 12764 EST sequences, encoding 257 unique genes, for further analysis. Mitochondrial transcripts accounted for 45.9-50.6% of the transcripts isolated from the maternal stages, but only 12.2% of those present at the onset of zygotic transcription. 3'UTR length was predicted in nuclear sequences with poly-A tail, which identified 191 3'UTRs. Their characteristics indicated a more complex regulation of transcripts that are abundant prior to the onset of zygotic transcription. Maternal and stable transcripts had longer 3'UTRs (mean 187.1 and 208.8 bp) and more 3'UTR isoforms (45.7 and 34.6%) compared to zygotic transcripts, where 15.4% had 3'UTR isoforms and the mean 3'UTR length was 76 bp. Also, the diversity and the number of putative polyadenylation motifs were higher in both maternal and stable transcripts. CONCLUSIONS: We report on the most pronounced processes in the maternally transferred cod transcriptome. 
Maternal stages are characterized by a rich abundance of mitochondrial transcripts. Maternal and stable transcripts display longer 3'UTRs with more variation of both polyadenylation motifs and 3'UTR isoforms. These data suggest that cod eggs possess a complex array of maternal RNAs which likely act to tightly regulate early developmental processes in the newly fertilized egg.
Subjects
3' Untranslated Regions/genetics , Nonmammalian Embryo/metabolism , Gadus morhua/genetics , Animals , Expressed Sequence Tags , Developmental Gene Expression Regulation , Polymerase Chain Reaction , Zygote/metabolism
ABSTRACT
UNLABELLED: The SFF file format produced by Roche's 454 sequencing technology is a compact, binary format that contains the flow values that are used for base and quality calling of the reads. Applications, e.g. in metagenomics, often depend on accurate sequence information, and access to flow values is important to estimate the probability of errors. Unfortunately, the programs supplied by Roche for accessing this information are not publicly available. Flower is a program that can extract the information contained in SFF files, and convert it to various textual output formats. AVAILABILITY: Flower is freely available under the General Public License.
Subjects
DNA Sequence Analysis , Software , Base Sequence
ABSTRACT
MOTIVATION: 454 pyrosequencing, by Roche Diagnostics, has emerged as an alternative to Sanger sequencing when it comes to read lengths, performance and cost, but shows higher per-base error rates. Although there are several tools available for noise removal, targeting different application fields, data interpretation would benefit from a better understanding of the different error types. RESULTS: By exploring 454 raw data, we quantify to what extent different factors account for sequencing errors. In addition to the well-known homopolymer length inaccuracies, we have identified errors likely to originate from other stages of the sequencing process. We use our findings to extend the flowsim pipeline with functionalities to simulate these errors, and thus enable a more realistic simulation of 454 pyrosequencing data with flowsim. AVAILABILITY: The flowsim pipeline is freely available under the General Public License from http://biohaskell.org/Applications/FlowSim. CONTACT: susanne.balzer@imr.no.
Subjects
High-Throughput Nucleotide Sequencing/methods , DNA Sequence Analysis/methods , Animals , Bass/genetics , Computer Simulation , Polymerase Chain Reaction
ABSTRACT
The genome size of organisms impacts their evolution and biology and is often assumed to be characteristic of a species. Here we present the first published estimates of genome size of the ecologically and economically important ectoparasite, Lepeophtheirus salmonis (Copepoda, Caligidae). Four independent L. salmonis genome assemblies of the North Atlantic subspecies Lepeophtheirus salmonis salmonis, including two chromosome-level assemblies, range in size from 665 to 790 Mbp. These genome assemblies are congruent in their findings, and appear very complete, with Benchmarking Universal Single-Copy Orthologs analyses finding > 92% of expected genes and transcriptome datasets routinely mapping > 90% of reads. However, two cytometric techniques, flow cytometry and Feulgen image analysis densitometry, yield measurements of 1.3-1.6 Gb in the haploid genome. Interestingly, earlier cytometric measurements reported genome sizes of 939 and 567 Mbp in L. salmonis salmonis samples from Bay of Fundy and Norway, respectively. Available data thus suggest that the genome sizes of salmon lice are variable. Current understanding of eukaryotic genome dynamics suggests that the most likely explanation for such variability involves repetitive DNA, which for L. salmonis makes up ≈ 60% of the genome assemblies.
Subjects
Copepoda , Fish Diseases , Animals , Copepoda/genetics , Fish Diseases/genetics , Genome , Norway , Transcriptome
ABSTRACT
MOTIVATION: The commercial launch of 454 pyrosequencing in 2005 was a milestone in genome sequencing in terms of performance and cost. Throughout the three available releases, average read lengths have increased to approximately 500 base pairs and are thus approaching read lengths obtained from traditional Sanger sequencing. Study design of sequencing projects would benefit from being able to simulate experiments. RESULTS: We explore 454 raw data to investigate its characteristics and derive empirical distributions for the flow values generated by pyrosequencing. Based on our findings, we implement Flowsim, a simulator that generates realistic pyrosequencing data files of arbitrary size from a given set of input DNA sequences. We finally use our simulator to examine the impact of sequence lengths on the results of concrete whole-genome assemblies, and we suggest its use in planning of sequencing projects, benchmarking of assembly methods and other fields. AVAILABILITY: Flowsim is freely available under the General Public License from http://blog.malde.org/index.php/flowsim/.
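To make the idea of simulating pyrosequencing flow values concrete, the sketch below turns a DNA sequence into an ideal flowgram (one homopolymer length per nucleotide flow), perturbs it with noise, and calls bases back by rounding. It is a toy model and not Flowsim itself: the cyclic flow order "TACG", the Gaussian noise with a fixed standard deviation, and all function names are assumptions, whereas the abstract states that Flowsim draws on empirical distributions derived from real 454 data.

```python
import random

FLOW_ORDER = "TACG"  # assumed cyclic nucleotide flow order


def ideal_flowgram(seq, flow_order=FLOW_ORDER):
    """Noise-free flow values: the homopolymer length seen in each flow."""
    seq = seq.upper()
    if any(base not in flow_order for base in seq):
        raise ValueError("sequence contains a base outside the flow order")
    flows, pos, i = [], 0, 0
    while pos < len(seq):
        base = flow_order[i % len(flow_order)]
        run = 0
        # Consume the whole homopolymer run that matches this flow.
        while pos < len(seq) and seq[pos] == base:
            run += 1
            pos += 1
        flows.append(run)
        i += 1
    return flows


def noisy_flowgram(flows, sd=0.15, rng=None):
    """Add placeholder Gaussian noise; flow values cannot go below zero."""
    rng = rng or random.Random(0)
    return [max(0.0, rng.gauss(n, sd)) for n in flows]


def call_bases(flows, flow_order=FLOW_ORDER):
    """Base calling by rounding each flow value to a homopolymer length."""
    return "".join(flow_order[i % len(flow_order)] * round(v)
                   for i, v in enumerate(flows))


flows = ideal_flowgram("TTACGGA")
print(flows)
print(call_bases(noisy_flowgram(flows, sd=0.3, rng=random.Random(1))))
```

With larger noise, rounded flow values start to over- or under-call homopolymer runs, which is the characteristic 454 error type a simulator of this kind needs to reproduce.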
Subjects
Computer Simulation , Genomics/methods , DNA Sequence Analysis/methods , Algorithms , Animals , Base Sequence , Bass/genetics , Chromosome Mapping , Bacterial DNA , Electronic Data Processing , Escherichia coli/genetics , DNA Sequence Analysis/instrumentation
ABSTRACT
The notochord functions as the midline structural element of all vertebrate embryos, and allows movement and growth at early developmental stages. Moreover, during embryonic development, notochord cells produce secreted factors that provide positional and fate information to a broad variety of cells within adjacent tissues, for instance those of the vertebrae, central nervous system and somites. Due to the large size of the embryo, the salmon notochord is useful to study as a model for exploring notochord development. To investigate factors that might be involved in notochord development, a normalized cDNA library was constructed from a mix of notochords from ~500 to ~800 day°. From the 1968 Sanger-sequenced transcripts, 22 genes were identified to be predominantly expressed in the notochord compared to other organs of salmon. Twelve of these genes were found to show expressional regulation around mineralization of the notochord sheath; 11 genes were up-regulated and one gene was down-regulated. Two genes were found to be specifically expressed in the notochord; these genes showed similarity to vimentin (acc. no GT297094) and elastin (acc. no GT297478). In-situ results showed that the vimentin-like transcript was expressed in both chordocytes and chordoblasts, whereas the elastin-like transcript was uniquely expressed in the chordoblasts lining the notochordal sheath. In salmon aquaculture, vertebral deformities are a common problem, and some malformations have been linked to the notochord. The expression of identified transcripts provides further insight into processes taking place in the developing notochord, prior to and during the early mineralization period.
Subjects
Elastin/genetics , Notochord/embryology , Notochord/metabolism , Salmo salar/embryology , Salmo salar/genetics , Vimentin/genetics , Animals , Elastin/metabolism , Developmental Gene Expression Regulation , Gene Library , Microdissection , Molecular Sequence Annotation , Notochord/cytology , Notochord/ultrastructure , Open Reading Frames/genetics , Organ Specificity/genetics , Messenger RNA/genetics , Messenger RNA/metabolism , Vimentin/metabolism
ABSTRACT
MOTIVATION: The nucleotide sequencing process produces not only the sequence of nucleotides, but also associated quality values. Quality values provide valuable information, but are primarily used only for trimming sequences and generally ignored in subsequent analyses. RESULTS: This article describes how the scoring schemes of standard alignment algorithms can be modified to take into account quality values to produce improved alignments and statistically more accurate scores. A prototype implementation is also provided, and used to post-process a set of BLAST results. Quality-adjusted alignment is a natural extension of standard alignment methods, and can be implemented with only a small constant factor performance penalty. The method can also be applied to related methods including heuristic search algorithms like BLAST and FASTA. AVAILABILITY: http://malde.org/~ketil/qaa.
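The general idea of letting quality values influence an alignment score can be sketched as follows. Each Phred quality Q is converted to an error probability e = 10^(-Q/10), and the usual match/mismatch score for a column is replaced by its expectation over what the true base might be. This expected-score variant, the uniform split of errors across the three other bases, the match/mismatch values and the function names are assumptions for illustration; the article's own scoring scheme may differ (for example, a log-odds formulation).

```python
def phred_to_error(q):
    """Convert a Phred quality value to a base-call error probability."""
    return 10 ** (-q / 10)


def adjusted_score(read_base, ref_base, q, match=1.0, mismatch=-1.0):
    """Expected column score given the quality of the read base.

    Assumes a miscalled base is equally likely to be any of the other three.
    """
    e = phred_to_error(q)
    if read_base == ref_base:
        p_match = 1 - e   # the call is correct and agrees with the reference
    else:
        p_match = e / 3   # the call is wrong and the true base is the reference base
    return p_match * match + (1 - p_match) * mismatch


def ungapped_score(read, quals, ref):
    """Quality-adjusted score of an ungapped, equal-length alignment."""
    if not (len(read) == len(quals) == len(ref)):
        raise ValueError("read, quals and ref must have equal length")
    return sum(adjusted_score(b, r, q) for b, q, r in zip(read, quals, ref))


# A mismatch at a low-quality position costs less than one at a high-quality position.
print(adjusted_score("A", "C", 10), adjusted_score("A", "C", 40))
print(ungapped_score("ACGT", [40, 40, 10, 40], "ACTT"))
```

Because the adjustment is a per-column arithmetic change, it adds only a small constant cost to each cell of a standard dynamic-programming alignment, in line with the performance claim in the abstract.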
Subjects
Algorithms , Sequence Alignment/methods , DNA Sequence Analysis/methods , Base Sequence , Molecular Sequence Data , Quality Control , Reproducibility of Results , Sensitivity and Specificity
ABSTRACT
To identify and characterize genes and proteins of the Atlantic halibut (Hippoglossus hippoglossus) immune system, six cDNA libraries were constructed from liver, kidney, spleen, peripheral blood, and thymus. Halibut were injected with nodavirus, infectious pancreatic necrosis virus (IPNV), or vibriosis vaccine and tissue samples were collected at various time points. Leukocytes from peripheral blood and spleen from stimulated and mock-injected fish were isolated and further in vitro activated with the mitogens, concanavalin A (Con A) and phorbol myristate acetate (PMA) to facilitate activation and proliferation. A total of 5117 high quality expressed sequence tags (ESTs) were identified and assembled into 781 contigs and 2796 singletons. Amongst these ESTs, 147 different putative immune related genes were identified. Several genes involved in innate and adaptive immune responses such as complement proteins, immunoglobulins, cell surface receptors, and cytokines and chemokines were identified. Of the immune related genes identified in this study, 44% had no match against any of the publicly available sequence data for halibut and thus can be considered as novel identification in halibut species. The approach of combining in vivo antigenic with in vitro mitogen stimulation, in addition to preparation of cDNA libraries from thymus enabled identification of many of the interesting genes including those involved in T-cell receptor complex.
Subjects
Flounder/genetics , Flounder/immunology , Animals , Base Sequence , Expressed Sequence Tags , Gene Library , Infectious Pancreatic Necrosis Virus/immunology , Mitogens/immunology , Molecular Sequence Data , Nodaviridae/immunology , Sequence Alignment , DNA Sequence Analysis , Vibrio/immunology
ABSTRACT
BACKGROUND: Repeat masking is an important step in the EST analysis pipeline. For new species, genomic knowledge is scarce and good repeat libraries are typically unavailable. In these cases it is common practice to mask against known repeats from other species (i.e., model organisms). There are few studies that investigate the effectiveness of this approach, or attempt to evaluate the different methods for identifying and masking repeats. RESULTS: Using zebrafish and medaka as example organisms, we show that accurate repeat masking is an important factor for obtaining a high-quality clustering. Furthermore, we show that masking with standard repeat libraries based on curated genomic information from other species has little or no positive effect on the quality of the resulting EST clustering. Library-based repeat masking, which often constitutes a computational bottleneck in the EST analysis pipeline, can therefore be reduced to species-specific repeat libraries, or perhaps eliminated entirely. In contrast, substantially improved results can be achieved by applying a repeat library derived from a partial reference clustering (e.g., from mapping sequences against a partially sequenced genome). CONCLUSION: Of the methods explored, we find that the best EST clustering is achieved after masking with repeat libraries that are species-specific. In the absence of such libraries, library-less masking gives results superior to the current practice of using cross-species, genome-based libraries.
Subjects
Expressed Sequence Tags , Repetitive Nucleic Acid Sequences , DNA Sequence Analysis/methods , Animals , Chromosome Mapping , Cluster Analysis , Gene Library , Genome , Oryzias/genetics , Zebrafish/genetics
ABSTRACT
The age structure of a fish population has important implications for recruitment processes and population fluctuations, and is a key input to fisheries-assessment models. The current method of determining age structure relies on manually reading age from otoliths, and the process is labor intensive and dependent on specialist expertise. Recent advances in machine learning have provided methods that have been remarkably successful in a variety of settings, with potential to automate analysis that previously required manual curation. Machine learning models have previously been successfully applied to object recognition and similar image analysis tasks. Here we investigate whether deep learning models can also be used for estimating the age of otoliths from images. We adapt a pre-trained convolutional neural network designed for object recognition, to estimate the age of fish from otolith images. The model is trained and validated on a large collection of images of Greenland halibut otoliths. We show that the model works well, and that its precision is comparable to documented precision obtained by human experts. Automating this analysis may help to improve consistency, lower cost, and increase the extent of age estimation. Given that adequate data are available, this method could also be used to estimate age of other species using images of otoliths or fish scales.