Results 1 - 18 of 18
1.
Nat Commun ; 15(1): 372, 2024 Jan 08.
Article in English | MEDLINE | ID: mdl-38191463

ABSTRACT

Homing-based gene drives are recently proposed interventions promising the area-wide, species-specific genetic control of harmful insect populations. Here we characterise a first set of gene drives in a tephritid agricultural pest species, the Mediterranean fruit fly Ceratitis capitata (medfly). Our results show that the medfly is highly amenable to homing-based gene drive strategies. By targeting the medfly transformer gene, we also demonstrate how CRISPR-Cas9 gene drive can be coupled to sex conversion, whereby genetic females are transformed into fertile and harmless XX males. Given this unique malleability of sex determination, we modelled gene drive interventions that couple sex conversion and female sterility and found that such approaches could be effective and tolerant of resistant allele selection in the target population. Our results open the door for developing gene drive strains for the population suppression of the medfly and related tephritid pests by co-targeting female reproduction and shifting the reproductive sex ratio towards males. They demonstrate the untapped potential for gene drives to tackle agricultural pests in an environmentally friendly and economical way.


Subject(s)
Ceratitis capitata , Gene Drive Technology , Female , Male , Animals , Ceratitis capitata/genetics , Agriculture , Alleles , Electric Power Supplies
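
To make the idea of a homing-based drive in the abstract above concrete, here is a minimal sketch of how a drive allele can spread. It is not the model used in the paper: it assumes random mating, no fitness cost, no resistance alleles and no sex conversion, and the function names and the `homing_efficiency` parameter are hypothetical. Under these assumptions a heterozygote transmits the drive allele with probability (1 + e) / 2 instead of the Mendelian 1/2.

```python
from typing import List


def next_drive_frequency(q: float, homing_efficiency: float) -> float:
    """One generation of a deterministic homing-drive recursion.

    Illustrative assumptions: random mating, no fitness cost, no resistance
    alleles. Heterozygotes pass on the drive with probability (1 + e) / 2.
    """
    if not (0.0 <= q <= 1.0 and 0.0 <= homing_efficiency <= 1.0):
        raise ValueError("q and homing_efficiency must lie in [0, 1]")
    homozygotes = q * q                    # always transmit the drive
    heterozygotes = 2.0 * q * (1.0 - q)    # transmit it at a biased rate
    return homozygotes + heterozygotes * (1.0 + homing_efficiency) / 2.0


def simulate(q0: float, homing_efficiency: float, generations: int) -> List[float]:
    """Return the drive allele frequency for generation 0..generations."""
    freqs = [q0]
    for _ in range(generations):
        freqs.append(next_drive_frequency(freqs[-1], homing_efficiency))
    return freqs
```

With `homing_efficiency = 0` the recursion reduces to Mendelian inheritance and the frequency stays constant; any positive efficiency pushes the allele towards fixation in this idealised setting.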
2.
Mem Inst Oswaldo Cruz ; 118: e230122, 2023.
Article in English | MEDLINE | ID: mdl-37937604

ABSTRACT

BACKGROUND: Epstein-Barr virus (EBV) is a human gammaherpesvirus etiologically linked to several benign and malignant diseases. EBV-associated malignancies exhibit an unusual global distribution that might be partly attributed to virus and host genetic backgrounds. OBJECTIVES: To assemble a new genome of EBV (CEMO3) from a paediatric Burkitt's lymphoma from Rio de Janeiro State (Southeast Brazil). In addition, to perform global phylogenetic analysis using complete EBV genomes, including CEMO3, and investigate the genetic relationship of some South American (SA) genomes through EBV subgenomic targets. METHODS: CEMO3 was sequenced through next generation sequencing and its coverage and gaps were corrected through the Sanger method. CEMO3 and 67 EBV genomes representing diverse geographic regions were evaluated through maximum likelihood phylogenetic analysis. Further, the polymorphism of subgenomic regions of some SA EBV genomes was assessed. FINDINGS: The whole bulk tumour sequencing yielded 23,217 reads related to EBV, from which the 172,713 base pairs of the new EBV genome CEMO3 were assembled. CEMO3 and most SA EBV genomes clustered within the SA subclade closely related to the African Raji strain, forming the South American/Raji clade. Notably, these Raji-related genomes exhibit significant genetic diversity, characterised by distinctive synapomorphies at some gene levels absent in the original Raji strain. CONCLUSION: CEMO3 represents a newly assembled South American EBV genome. Although the majority of EBV genomes from SA are Raji-related, they harbour high diversity that distinguishes them from the original Raji strain.


Subject(s)
Epstein-Barr Virus Infections , Herpesvirus 4, Human , Child , Humans , Herpesvirus 4, Human/genetics , Epstein-Barr Virus Infections/genetics , Epstein-Barr Virus Infections/pathology , Phylogeny , Genome, Viral/genetics , Brazil
3.
DNA Res ; 30(1)2023 Feb 01.
Article in English | MEDLINE | ID: mdl-36370138

ABSTRACT

The New World Screwworm, Cochliomyia hominivorax (Calliphoridae), is the most important myiasis-causing species in America. Screwworm myiasis is a zoonosis that can cause severe lesions in livestock, domesticated and wild animals, and occasionally in people. Beyond the sanitary problems associated with this species, these infestations negatively impact economic sectors, such as the cattle industry. Here, we present a chromosome-scale assembly of C. hominivorax's genome, organized in 6 chromosome-length and 515 unplaced scaffolds spanning 534 Mb. There was a clear correspondence between the D. melanogaster linkage groups A-E and the chromosomal-scale scaffolds. Chromosome quotient (CQ) analysis identified a single scaffold from the X chromosome that contains most of the orthologs of genes that are on the D. melanogaster fourth chromosome (linkage group F or dot chromosome). CQ analysis also identified potential X and Y unplaced scaffolds and genes. Y-linkage for selected regions was confirmed by PCR with male and female DNA. Some of the long chromosome-scale scaffolds include Y-linked sequences, suggesting misassembly of these regions. These resources will provide a basis for future studies aiming at understanding the biology and evolution of this devastating obligate parasite.


Subject(s)
Myiasis , Screw Worm Infection , Animals , Male , Female , Cattle , Calliphoridae , Drosophila melanogaster , Myiasis/veterinary , Screw Worm Infection/veterinary , Chromosomes
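
The chromosome quotient (CQ) analysis mentioned in the abstract above can be sketched as a female-to-male ratio of aligned read counts per scaffold. The sketch below is illustrative only: it assumes the two read sets were sequenced to comparable depth, and the function names and the thresholds `y_max` and `x_min` are hypothetical choices, not values taken from the paper. In an XY system, Y-linked scaffolds are expected near 0, autosomal scaffolds near 1 and X-linked scaffolds near 2.

```python
from typing import Optional


def chromosome_quotient(female_reads: int, male_reads: int) -> Optional[float]:
    """Female-to-male ratio of aligned read counts for one scaffold."""
    if female_reads < 0 or male_reads < 0:
        raise ValueError("read counts must be non-negative")
    if male_reads == 0:
        return None  # ratio undefined without male coverage
    return female_reads / male_reads


def classify_scaffold(cq: Optional[float],
                      y_max: float = 0.3,
                      x_min: float = 1.5) -> str:
    """Label a scaffold from its CQ using hypothetical cut-offs."""
    if cq is None:
        return "undetermined"
    if cq <= y_max:
        return "Y-candidate"      # almost no female reads align
    if cq >= x_min:
        return "X-candidate"      # roughly double coverage in females
    return "autosomal-like"
```

Candidates flagged this way would still need independent confirmation, such as the PCR with male and female DNA that the abstract describes.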
4.
Mem. Inst. Oswaldo Cruz ; 118: e230122, 2023. tab, graf
Article in English | LILACS-Express | LILACS | ID: biblio-1521242

ABSTRACT

BACKGROUND Epstein-Barr virus (EBV) is a human gammaherpesvirus etiologically linked to several benign and malignant diseases. EBV-associated malignancies exhibit an unusual global distribution that might be partly attributed to virus and host genetic backgrounds. OBJECTIVES To assemble a new genome of EBV (CEMO3) from a paediatric Burkitt's lymphoma from Rio de Janeiro State (Southeast Brazil). In addition, to perform global phylogenetic analysis using complete EBV genomes, including CEMO3, and investigate the genetic relationship of some South American (SA) genomes through EBV subgenomic targets. METHODS CEMO3 was sequenced through next generation sequencing and its coverage and gaps were corrected through the Sanger method. CEMO3 and 67 EBV genomes representing diverse geographic regions were evaluated through maximum likelihood phylogenetic analysis. Further, the polymorphism of subgenomic regions of some SA EBV genomes was assessed. FINDINGS The whole bulk tumour sequencing yielded 23,217 reads related to EBV, from which the 172,713 base pairs of the new EBV genome CEMO3 were assembled. CEMO3 and most SA EBV genomes clustered within the SA subclade closely related to the African Raji strain, forming the South American/Raji clade. Notably, these Raji-related genomes exhibit significant genetic diversity, characterised by distinctive synapomorphies at some gene levels absent in the original Raji strain. CONCLUSION CEMO3 represents a newly assembled South American EBV genome. Although the majority of EBV genomes from SA are Raji-related, they harbour high diversity that distinguishes them from the original Raji strain.

5.
Sci Rep ; 12(1): 7619, 2022 05 10.
Article in English | MEDLINE | ID: mdl-35538127

ABSTRACT

Nucleic-acid barcoding is an enabling technique for many applications, but its use remains limited in emerging long-read sequencing technologies with intrinsically low raw accuracy. Here, we apply so-called NS-watermark barcodes, whose error correction capability was previously validated in silico, in a proof of concept where we synthesize 3840 NS-watermark barcodes and use them to asymmetrically tag and simultaneously sequence amplicons from two evolutionarily distant species (namely Bordetella pertussis and Drosophila mojavensis) on the ONT MinION platform. To our knowledge, this is the largest number of distinct, non-random tags ever sequenced in parallel and the first report of microarray-based synthesis as a source for large oligonucleotide pools for barcoding. We recovered the identity of more than 86% of the barcodes, with a crosstalk rate of 0.17% (i.e., one misassignment every 584 reads). This falls in the range of the index hopping rate of established, high-accuracy Illumina sequencing, despite the increased number of tags and the relatively low accuracy of both microarray-based synthesis and long-read sequencing. The robustness of NS-watermark barcodes, together with their scalable design and compatibility with low-cost massive synthesis, makes them promising for present and future sequencing applications requiring massive labeling, such as long-read single-cell RNA-Seq.


Subject(s)
DNA Barcoding, Taxonomic , High-Throughput Nucleotide Sequencing , DNA Barcoding, Taxonomic/methods , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods
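
The abstract above quotes the crosstalk rate both as a percentage and as "one misassignment every N reads". A small sketch of that conversion follows; the function name is hypothetical. Note that the rounded figure of 0.17% gives roughly one in 588, so the quoted one in 584 presumably comes from an unrounded rate close to 0.171%.

```python
def reads_per_misassignment(crosstalk_rate: float) -> float:
    """Average number of reads per misassigned read, for a rate in (0, 1]."""
    if not 0.0 < crosstalk_rate <= 1.0:
        raise ValueError("crosstalk_rate must be in (0, 1]")
    return 1.0 / crosstalk_rate


def rate_from_interval(reads_between_errors: float) -> float:
    """Inverse conversion: one misassignment every N reads -> rate."""
    if reads_between_errors < 1.0:
        raise ValueError("reads_between_errors must be at least 1")
    return 1.0 / reads_between_errors


rounded = reads_per_misassignment(0.0017)   # from the rounded 0.17%
implied = rate_from_interval(584) * 100.0   # percent implied by 1 in 584
```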
7.
BMC Biol ; 19(1): 78, 2021 04 16.
Article in English | MEDLINE | ID: mdl-33863334

ABSTRACT

BACKGROUND: Genetic sex ratio distorters are systems aimed at effecting a bias in the reproductive sex ratio of a population and could be applied for the area-wide control of sexually reproducing insects that vector disease or disrupt agricultural production. One example of such a system leading to male bias is X-shredding, an approach that interferes with the transmission of the X-chromosome by inducing multiple DNA double-strand breaks during male meiosis. Endonucleases targeting the X-chromosome and whose activity is restricted to male gametogenesis have recently been pioneered as a means to engineer such traits. RESULTS: Here, we enabled endogenous CRISPR/Cas9 and CRISPR/Cas12a activity during spermatogenesis of the Mediterranean fruit fly Ceratitis capitata, a worldwide agricultural pest of extensive economic significance. In the absence of a chromosome-level assembly, we analysed long- and short-read genome sequencing data from males and females to identify two clusters of abundant and X-chromosome-specific sequence repeats. When targeted by gRNAs in conjunction with Cas9, cleavage of these repeats yielded a significant and consistent distortion of the sex ratio towards males in independent transgenic strains, while the combination of distinct distorters induced a strong bias (~ 80%). CONCLUSION: We provide a first demonstration of CRISPR-based sex distortion towards male bias in a non-model organism, the global pest insect Ceratitis capitata. Although the sex ratio bias reached in our study would require improvement, possibly through the generation and combination of additional transgenic lines, to result in a system with realistic applicability in the field, our results suggest that strains with characteristics suitable for field application can now be developed for a range of medically or agriculturally relevant insect species.


Subject(s)
Ceratitis capitata , Animals , Animals, Genetically Modified , CRISPR-Cas Systems/genetics , Ceratitis capitata/genetics , Female , Male , RNA, Guide, Kinetoplastida , Sex Ratio , X Chromosome/genetics
8.
Genome Biol ; 21(1): 215, 2020 08 26.
Article in English | MEDLINE | ID: mdl-32847630

ABSTRACT

BACKGROUND: The Asian tiger mosquito Aedes albopictus is globally expanding and has become the main vector for human arboviruses in Europe. With limited antiviral drugs and vaccines available, vector control is the primary approach to prevent mosquito-borne diseases. A reliable and accurate DNA sequence of the Ae. albopictus genome is essential to develop new approaches that involve genetic manipulation of mosquitoes. RESULTS: We use long-read sequencing methods and modern scaffolding techniques (PacBio, 10X, and Hi-C) to produce AalbF2, a dramatically improved assembly of the Ae. albopictus genome. AalbF2 reveals widespread viral insertions, novel microRNAs and piRNA clusters, the sex-determining locus, and new immunity genes, and enables genome-wide studies of geographically diverse Ae. albopictus populations and analyses of the developmental and stage-dependent network of expression data. Additionally, we build the first physical map for this species with 75% of the assembled genome anchored to the chromosomes. CONCLUSION: The AalbF2 genome assembly represents the most up-to-date collective knowledge of the Ae. albopictus genome. These resources represent a foundation to improve understanding of the adaptation potential and the epidemiological relevance of this species and foster the development of innovative control measures.


Subject(s)
Aedes/genetics , Arboviruses/genetics , Genome , Mosquito Vectors/genetics , Aedes/immunology , Aedes/virology , Animals , Chromosome Mapping , Chromosomes , Genome Size , Immunity , Insect Vectors , Mosquito Vectors/immunology , Mosquito Vectors/virology , RNA, Small Interfering/genetics , Transcriptome
9.
BMC Genomics ; 19(Suppl 8): 860, 2018 Dec 11.
Article in English | MEDLINE | ID: mdl-30537925

ABSTRACT

BACKGROUND: In living organisms, small heat shock proteins (sHSPs) are triggered in response to stress situations. This family of proteins is large in plants and, in the case of tomato (Solanum lycopersicum), 33 genes have been identified, most of them related to heat stress response and to the ripening process. Transcriptomic and proteomic studies have revealed complex patterns of expression for these genes. In this work, we investigate the coregulation of these genes by performing a computational analysis of their promoter architecture to find regulatory motifs known as heat shock elements (HSEs). We leverage the presence of sHSP members that originated from tandem duplication events and analyze the promoter architecture diversity of the whole sHSP family, focusing on the identification of HSEs. RESULTS: We performed a search for conserved genomic sequences in the promoter regions of the sHSPs of tomato, plus several other proteins (mainly HSPs) that are functionally related to heat stress situations or to ripening. Several computational analyses were performed to build multiple sequence motifs and identify transcription factor binding sites (TFBS) homologous to HSF1AE and HSF21 in Arabidopsis. We also investigated the expression and interaction of these proteins under two heat stress situations in whole tomato plants and in protoplast cells, both in the presence and in the absence of heat shock transcription factor A2 (HsfA2). The results of these analyses indicate that different sHSPs are up-regulated depending on the activation or repression of HsfA2, a key regulator of HSPs. Further, the analysis of protein-protein interaction between the sHSP protein family and other heat shock response proteins (Hsp70, Hsp90 and MBF1c) suggests that several sHSPs are mediating alternative stress response through a regulatory subnetwork that is not dependent on HsfA2. 
CONCLUSIONS: Overall, this study identifies two regulatory motifs (HSF1AE and HSF21) associated with the sHSP family in tomato which are considered genomic HSEs. The study also suggests that, despite the apparent redundancy of these proteins, which has been linked to gene duplication, tomato sHSPs showed different up-regulation and different interaction patterns when analyzed under different stress situations.


Subject(s)
Gene Expression Regulation, Plant , Heat-Shock Proteins, Small/genetics , Nucleotide Motifs , Plant Proteins/genetics , Regulatory Sequences, Nucleic Acid , Solanum lycopersicum/genetics , Gene Duplication , Heat-Shock Proteins, Small/metabolism , Heat-Shock Response , Solanum lycopersicum/growth & development , Solanum lycopersicum/metabolism , Plant Proteins/metabolism , Promoter Regions, Genetic , Protein Interaction Maps
10.
PLoS Genet ; 14(11): e1007770, 2018 11.
Article in English | MEDLINE | ID: mdl-30388103

ABSTRACT

Y chromosomes are widely believed to evolve from a normal autosome through a process of massive gene loss (with preservation of some male genes), shaped by sex-antagonistic selection and complemented by occasional gains of male-related genes. The net result of these processes is a male-specialized chromosome. This might be expected to be an irreversible process, but it was found in 2005 that the Drosophila pseudoobscura Y chromosome was incorporated into an autosome. Y chromosome incorporations have important consequences: a formerly male-restricted chromosome reverts to autosomal inheritance, and the species may shift from an XY/XX to X0/XX sex-chromosome system. In order to assess the frequency and causes of this phenomenon we searched for Y chromosome incorporations in 400 species from Drosophila and related genera. We found one additional large scale event of Y chromosome incorporation, affecting the whole montium subgroup (40 species in our sample); overall 13% of the sampled species (52/400) have Y incorporations. While previous data indicated that after the Y incorporation the ancestral Y disappeared as a free chromosome, the much larger data set analyzed here indicates that a copy of the Y survived as a free chromosome both in montium and pseudoobscura species, and that the current Y of the pseudoobscura lineage results from a fusion between this free Y and the neoY. The 400 species sample also showed that the previously suggested causal connection between X-autosome fusions and Y incorporations is, at best, weak: the new case of Y incorporation (montium) does not have X-autosome fusion, whereas nine independent cases of X-autosome fusions were not followed by Y incorporations. Y incorporation is an underappreciated mechanism affecting Y chromosome evolution; our results show that at least in Drosophila it plays a relevant role and highlight the need of similar studies in other groups.


Subject(s)
Drosophila/classification , Drosophila/genetics , Y Chromosome/genetics , Animals , Evolution, Molecular , Female , Gene Duplication , Genes, Insect , Genetic Linkage , Male , Models, Genetic , Phylogeny , Selection, Genetic , Species Specificity , Translocation, Genetic , X Chromosome/genetics
11.
Sci Rep ; 8(1): 7757, 2018 05 17.
Article in English | MEDLINE | ID: mdl-29773825

ABSTRACT

The GO-Cellular Component (GO-CC) ontology provides a controlled vocabulary for the consistent description of the subcellular compartments or macromolecular complexes where proteins may act. Current machine learning-based methods used for the automated GO-CC annotation of proteins suffer from the inconsistency of individual GO-CC term predictions. Here, we present FGGA-CC+, a class of hierarchical graph-based classifiers for the consistent GO-CC annotation of protein coding genes at the subcellular compartment or macromolecular complex levels. Aiming to boost the accuracy of GO-CC predictions, we make use of the protein localization knowledge in the GO-Biological Process (GO-BP) annotations. As a result, FGGA-CC+ classifiers are built from annotation data in both the GO-CC and GO-BP ontologies. Due to their graph-based design, FGGA-CC+ classifiers are fully interpretable and their predictions amenable to expert analysis. Promising results on protein annotation data from five model organisms were obtained. Additionally, successful validation results in the annotation of a challenging subset of tandem duplicated genes in the tomato non-model organism were accomplished. Overall, these results suggest that FGGA-CC+ classifiers can indeed be useful for satisfying the huge demand of GO-CC annotation arising from ubiquitous high-throughput sequencing and proteomic projects.


Subject(s)
Arabidopsis/metabolism , Computational Biology/methods , Drosophila melanogaster/metabolism , Gene Ontology , Proteins/metabolism , Saccharomyces cerevisiae/metabolism , Solanum lycopersicum/metabolism , Animals , Databases, Protein , Molecular Sequence Annotation , Proteins/analysis , Proteomics , Software
12.
Int J Mol Sci ; 19(3)2018 Mar 13.
Article in English | MEDLINE | ID: mdl-29534015

ABSTRACT

Classical Hodgkin lymphoma (cHL) cells overexpress heat-shock protein 90 (HSP90), an important intracellular signaling hub regulating cell survival, which is emerging as a promising therapeutic target. Here, we report the antitumor effect of celastrol, an anti-inflammatory compound and a recognized HSP90 inhibitor, in Hodgkin and Reed-Sternberg cell lines. Two disparate responses were recorded. In KM-H2 cells, celastrol inhibited cell proliferation, induced G0/G1 arrest, and triggered apoptosis through the activation of caspase-3/7. Conversely, L428 cells exhibited resistance to the compound. A proteomic screening identified a total of 262 differentially expressed proteins in sensitive KM-H2 cells and revealed that celastrol's toxicity involved the suppression of the MAPK/ERK (mitogen-activated protein kinase/extracellular signal-regulated kinase) pathway. The apoptotic effects were preceded by a decrease in RAS (proto-oncogene protein Ras), p-ERK1/2 (phospho-extracellular signal-regulated kinase-1/2), and c-Fos (proto-oncogene protein c-Fos) protein levels, as validated by immunoblot analysis. The L428 resistant cells exhibited a marked induction of HSP27 mRNA and protein after celastrol treatment. Our results provide the first evidence that celastrol has antitumor effects in cHL cells through the suppression of the MAPK/ERK pathway. Resistance to celastrol has rarely been described, and our results suggest that in cHL it may be mediated by the upregulation of HSP27. The antitumor properties of celastrol against cHL and whether the disparate responses observed in vitro have clinical correlates deserve further research.


Subject(s)
Antineoplastic Agents/pharmacology , Drug Resistance, Neoplasm , HSP90 Heat-Shock Proteins/antagonists & inhibitors , Hodgkin Disease/metabolism , Reed-Sternberg Cells/metabolism , Triterpenes/pharmacology , Apoptosis , Cell Line, Tumor , Cell Proliferation , Humans , MAP Kinase Signaling System , Mitogen-Activated Protein Kinase 1/metabolism , Mitogen-Activated Protein Kinase 3/metabolism , Pentacyclic Triterpenes , Proteome , Proto-Oncogene Mas , Reed-Sternberg Cells/drug effects , ras Proteins/metabolism
13.
Bioinformatics ; 33(6): 807-813, 2017 03 15.
Article in English | MEDLINE | ID: mdl-27259539

ABSTRACT

Motivation: To attain acceptable sample misassignment rates, current approaches to multiplex single-molecule real-time sequencing require upstream quality improvement, which is obtained from multiple passes over the sequenced insert and significantly reduces the effective read length. In order to fully exploit the raw read length on multiplex applications, robust barcodes capable of dealing with the full single-pass error rates are needed. Results: We present a method for designing sequencing barcodes that can withstand a large number of insertion, deletion and substitution errors and are suitable for use in multiplex single-molecule real-time sequencing. The manuscript focuses on the design of barcodes for full-length single-pass reads, impaired by challenging error rates in the order of 11%. The proposed barcodes can multiplex hundreds or thousands of samples while achieving sample misassignment probabilities as low as 10(-7) under the above conditions, and are designed to be compatible with chemical constraints imposed by the sequencing process. Availability and Implementation: Software tools for constructing watermark barcode sets and demultiplexing barcoded reads, together with example sets of barcodes and synthetic barcoded reads, are freely available at www.cifasis-conicet.gov.ar/ezpeleta/NS-watermark . Contact: ezpeleta@cifasis-conicet.gov.ar.


Subject(s)
High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Software , Computer Simulation
14.
G3 (Bethesda) ; 6(10): 3027-3034, 2016 10 13.
Article in English | MEDLINE | ID: mdl-27565886

ABSTRACT

In plants, fruit maturation and oxidative stress can induce small heat shock protein (sHSP) synthesis to maintain cellular homeostasis. Although the tomato reference genome was published in 2012, the actual number and functionality of sHSP genes remain unknown. Using a transcriptomic (RNA-seq) and evolutionary genomic approach, putative sHSP genes in the Solanum lycopersicum (cv. Heinz 1706) genome were investigated. A sHSP gene family of 33 members was established. Remarkably, roughly half of the members of this family can be explained by nine independent tandem duplication events that determined, evolutionarily, their functional fates. Within a mitochondrial class subfamily, only one duplicated member, Solyc08g078700, retained its ancestral chaperone function, while the others, Solyc08g078710 and Solyc08g078720, likely degenerated under neutrality and lack ancestral chaperone function. Functional conservation occurred within a cytosolic class I subfamily, whose four members, Solyc06g076570, Solyc06g076560, Solyc06g076540, and Solyc06g076520, support ∼57% of the total sHSP mRNA in the red ripe fruit. Subfunctionalization occurred within a new subfamily, whose two members, Solyc04g082720 and Solyc04g082740, show heterogeneous differential expression profiles during fruit ripening. These findings, involving the birth/death of some genes or the preferential/plastic expression of some others during fruit ripening, highlight the importance of tandem duplication events in the expansion of the sHSP gene family in the tomato genome. Despite its evolutionary diversity, the sHSP gene family in the tomato genome seems to be endowed with a core set of four homeostasis genes: Solyc05g014280, Solyc03g082420, Solyc11g020330, and Solyc06g076560, which appear to provide a baseline protection during both fruit ripening and heat shock stress in different tomato tissues.


Subject(s)
Gene Duplication , Genes, Plant , Heat-Shock Proteins, Small/genetics , Multigene Family , Solanum lycopersicum/genetics , Tandem Repeat Sequences , Computational Biology/methods , Gene Expression Profiling , Gene Expression Regulation, Plant , Heat-Shock Proteins, Small/classification , Heat-Shock Proteins, Small/metabolism , Solanum lycopersicum/metabolism , Molecular Sequence Annotation , Phylogeny , Protein Transport , Transcriptome
15.
PLoS One ; 11(1): e0146986, 2016.
Article in English | MEDLINE | ID: mdl-26771463

ABSTRACT

As the volume of genomic data grows, computational methods become essential for providing a first glimpse into gene annotations. Automated Gene Ontology (GO) annotation methods based on hierarchical ensemble classification techniques are particularly interesting when interpretability of annotation results is a main concern. In these methods, raw GO-term predictions computed by base binary classifiers are leveraged by checking the consistency of predefined GO relationships. Both formal leveraging strategies, with main focus on annotation precision, and heuristic alternatives, with main focus on scalability issues, have been described in the literature. In this contribution, a factor graph approach to the hierarchical ensemble formulation of the automated GO annotation problem is presented. In this formal framework, a core factor graph is first built based on the GO structure and then enriched to take into account the noisy nature of GO-term predictions. Hence, starting from raw GO-term predictions, an iterative message passing algorithm between nodes of the factor graph is used to compute marginal probabilities of target GO-terms. Evaluations on Saccharomyces cerevisiae, Arabidopsis thaliana and Drosophila melanogaster protein sequences from the GO Molecular Function domain showed significant improvements over competing approaches, even when protein sequences were naively characterized by their physicochemical and secondary structure properties or when loose noisy annotation datasets were considered. Based on these promising results and using Arabidopsis thaliana annotation data, we extend our approach to the identification of most promising molecular function annotations for a set of proteins of unknown function in Solanum lycopersicum.


Subject(s)
Drosophila melanogaster/genetics , Gene Ontology , Algorithms , Animals , Arabidopsis/genetics , Computational Biology , Solanum lycopersicum/genetics , Saccharomyces cerevisiae/genetics , Software
16.
PLoS One ; 10(10): e0140459, 2015.
Article in English | MEDLINE | ID: mdl-26492348

ABSTRACT

For many parallel applications of Next-Generation Sequencing (NGS) technologies short barcodes able to accurately multiplex a large number of samples are demanded. To address these competitive requirements, the use of error-correcting codes is advised. Current barcoding systems are mostly built from short random error-correcting codes, a feature that strongly limits their multiplexing accuracy and experimental scalability. To overcome these problems on sequencing systems impaired by mismatch errors, the alternative use of binary BCH and pseudo-quaternary Hamming codes has been proposed. However, these codes either fail to provide a fine-scale with regard to size of barcodes (BCH) or have intrinsic poor error correcting abilities (Hamming). Here, the design of barcodes from shortened binary BCH codes and quaternary Low Density Parity Check (LDPC) codes is introduced. Simulation results show that although accurate barcoding systems of high multiplexing capacity can be obtained with any of these codes, using quaternary LDPC codes may be particularly advantageous due to the lower rates of read losses and undetected sample misidentification errors. Even at mismatch error rates of 10(-2) per base, 24-nt LDPC barcodes can be used to multiplex roughly 2000 samples with a sample misidentification error rate in the order of 10(-9) at the expense of a rate of read losses just in the order of 10(-6).


Subject(s)
DNA Barcoding, Taxonomic/methods , Probability
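
The trade-off described in the abstract above, between read losses and undetected sample misidentification on a channel with mismatch errors, can be illustrated with a generic nearest-codeword demultiplexer. This is a toy sketch, not the BCH or LDPC decoders from the paper: the four 5-nt barcodes, the function names and the bounded-distance rule are all assumptions made for illustration, and only substitutions are handled.

```python
from itertools import combinations
from typing import Optional, Sequence

# Hypothetical toy barcode set; its minimum pairwise Hamming distance is 3.
TOY_BARCODES = ["AAAAA", "CCCAA", "AACCC", "GGGGG"]


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_distance(barcodes: Sequence[str]) -> int:
    """Smallest pairwise Hamming distance in the barcode set."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))


def demultiplex(read_tag: str, barcodes: Sequence[str]) -> Optional[str]:
    """Assign a read tag to its nearest barcode, or reject it.

    A tag is accepted only within the guaranteed correction radius
    (d_min - 1) // 2; anything further away is counted as a read loss
    rather than risking a sample misidentification.
    """
    radius = (min_distance(barcodes) - 1) // 2
    best = min(barcodes, key=lambda b: hamming(read_tag, b))
    return best if hamming(read_tag, best) <= radius else None
```

For example, `demultiplex("AAAAC", TOY_BARCODES)` recovers `"AAAAA"` after one mismatch, while a tag three mismatches away from every barcode is rejected. Widening the acceptance radius would lose fewer reads at the cost of more undetected misassignments.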
17.
G3 (Bethesda) ; 5(6): 1145-50, 2015 Apr 09.
Article in English | MEDLINE | ID: mdl-25858959

ABSTRACT

The autosomal gene Mst77F of Drosophila melanogaster is essential for male fertility. In 2010, Krsticevic et al. (Genetics 184: 295-307) found 18 Y-linked copies of Mst77F ("Mst77Y"), which collectively account for 20% of the functional Mst77F-like mRNA. The Mst77Y genes were severely misassembled in the then-available genome assembly and were identified by cloning and sequencing polymerase chain reaction products. The genomic structure of the Mst77Y region and the possible existence of additional copies remained unknown. The recent publication of two long-read assemblies of D. melanogaster prompted us to reinvestigate this challenging region of the Y chromosome. We found that the Illumina Synthetic Long Reads assembly failed in the Mst77Y region, most likely because of its tandem duplication structure. The PacBio MHAP assembly of the Mst77Y region seems to be very accurate, as revealed by comparisons with the previously found Mst77Y genes, a bacterial artificial chromosome sequence, and Illumina reads of the same strain. We found that the Mst77Y region spans 96 kb and originated from a 3.4-kb transposition from chromosome 3L to the Y chromosome, followed by tandem duplications inside the Y chromosome and invasion of transposable elements, which account for 48% of its length. Twelve of the 18 Mst77Y genes found in 2010 were confirmed in the PacBio assembly, the remaining six being polymerase chain reaction-induced artifacts. There are several identical copies of some Mst77Y genes, coincidentally bringing the total copy number to 18. Besides providing a detailed picture of the Mst77Y region, our results highlight the utility of PacBio technology in assembling difficult genomic regions such as tandemly repeated genes.


Subject(s)
Drosophila melanogaster/genetics , Gene Dosage , Genes, Insect , Sequence Analysis, DNA/methods , Y Chromosome/genetics , Algorithms , Animals , Evolution, Molecular , Molecular Sequence Data , Reproducibility of Results
18.
Genetics ; 184(1): 295-307, 2010 Jan.
Article in English | MEDLINE | ID: mdl-19897751

ABSTRACT

The Y chromosome of Drosophila melanogaster has <20 protein-coding genes. These genes originated from the duplication of autosomal genes and have male-related functions. In 1993, Russell and Kaiser found three Y-linked pseudogenes of the Mst77F gene, which is a testis-expressed autosomal gene that is essential for male fertility. We did a thorough search using experimental and computational methods and found 18 Y-linked copies of this gene (named Mst77Y-1 to Mst77Y-18). Ten Mst77Y genes encode defective proteins and the other eight are potentially functional. These eight genes produce approximately 20% of the functional Mst77F-like mRNA, and molecular evolutionary analysis shows that they evolved under purifying selection. Hence several Mst77Y genes have all the features of functional genes. Mst77Y genes are present only in D. melanogaster, and phylogenetic analysis confirmed that the duplication is a recent event. The identification of functional Mst77Y genes reinforces the previous finding that gene gains play a prominent role in the evolution of the Drosophila Y chromosome.


Subject(s)
Drosophila Proteins/genetics , Drosophila melanogaster/genetics , Gene Dosage , Genes, Insect/genetics , Histones/genetics , Y Chromosome/genetics , Animals , DNA Restriction Enzymes/metabolism , Drosophila Proteins/metabolism , Evolution, Molecular , Female , Genes, Y-Linked/genetics , Histones/metabolism , Male , Sequence Analysis, DNA , Transcription, Genetic