Results 1 - 20 of 59
1.
Sci Rep ; 14(1): 12983, 2024 06 06.
Article in English | MEDLINE | ID: mdl-38839808

ABSTRACT

Some of the most metabolically diverse species of bacteria (e.g., Actinobacteria) have higher GC content in their DNA, differ substantially in codon usage, and have distinct protein folding environments compared to tractable expression hosts like Escherichia coli. Consequently, expressing biosynthetic gene clusters (BGCs) from these bacteria in E. coli often results in a myriad of unpredictable issues with regard to protein expression and folding, delaying the biochemical characterization of new natural products. Current strategies to achieve soluble, active expression of these enzymes in tractable hosts can be a lengthy trial-and-error process. Cell-free expression (CFE) has emerged as a valuable expression platform and as a testbed for rapidly prototyping expression parameters. Here, we use a type III polyketide synthase from Streptomyces griseus, RppA, which catalyzes the formation of the red pigment flaviolin, as a reporter to investigate BGC refactoring techniques. We applied a library of constructs with different combinations of promoters and rppA coding sequences to investigate the synergies between promoter and codon usage. Subsequently, we assess the utility of cell-free systems for prototyping these refactoring tactics prior to their implementation in cells. Overall, codon harmonization improves natural product synthesis more than traditional codon optimization across cell-free and cellular environments. More importantly, the choice of coding sequences and promoters impacts protein expression synergistically, which should be considered for future efforts to use CFE for high-yield protein expression. The promoter strategy when applied to RppA was not completely correlated with that observed with GFP, indicating that different promoter strategies should be applied for different proteins. In vivo experiments suggest that there is correlation, but not complete alignment, between expression in cell-free systems and in vivo. Refactoring promoters and/or coding sequences via CFE can be a valuable strategy to rapidly screen for catalytically functional production of enzymes from BGCs, which advances CFE as a tool for natural product research.


Subject(s)
Cell-Free System, Genetic Promoter Regions, Streptomyces griseus/enzymology, Streptomyces griseus/genetics, Streptomyces griseus/metabolism, Escherichia coli/genetics, Escherichia coli/metabolism, Multigene Family, Bacterial Proteins/genetics, Bacterial Proteins/metabolism, Polyketide Synthases/genetics, Polyketide Synthases/metabolism, Codon/genetics, Acyltransferases
2.
bioRxiv ; 2023 Dec 01.
Article in English | MEDLINE | ID: mdl-38077034

ABSTRACT

Some of the most metabolically diverse species of bacteria (e.g., Actinobacteria) have higher GC content in their DNA, differ substantially in codon usage, and have distinct protein folding environments compared to tractable expression hosts like Escherichia coli. Consequently, expressing biosynthetic gene clusters (BGCs) from these bacteria in E. coli frequently results in a myriad of unpredictable issues with protein expression and folding, delaying the biochemical characterization of new natural products. Current strategies to achieve soluble, active expression of these enzymes in tractable hosts, such as BGC refactoring, can be a lengthy trial-and-error process. Cell-free expression (CFE) has emerged as 1) a valuable expression platform for enzymes that are challenging to synthesize in vivo, and as 2) a testbed for rapid prototyping that can improve cellular expression. Here, we use a type III polyketide synthase from Streptomyces griseus, RppA, which catalyzes the formation of the red pigment flaviolin, as a reporter to investigate BGC refactoring techniques. We synergistically tune promoter and codon usage to improve flaviolin production from cell-free expressed RppA. We then assess the utility of cell-free systems for prototyping these refactoring tactics prior to their implementation in cells. Overall, codon harmonization improves natural product synthesis more than traditional codon optimization across cell-free and cellular environments. Refactoring promoters and/or coding sequences via CFE can be a valuable strategy to rapidly screen for catalytically functional production of enzymes from BGCs. By showing the correspondence between CFE and in vivo expression, this work advances CFE as a tool for natural product research.

3.
J Econ Entomol ; 116(3): 935-944, 2023 06 13.
Article in English | MEDLINE | ID: mdl-37311017

ABSTRACT

The fall armyworm, Spodoptera frugiperda (J. E. Smith), is a highly polyphagous pest native to the tropical Americas that has recently spread to become a global super-pest threatening food and fiber production. Transgenic crops producing insecticidal Cry and Vip3Aa proteins from Bacillus thuringiensis (Bt) are used for control of this pest in its native range. The evolution of practical resistance represents the greatest threat to sustainability of this technology and its potential efficacy in the S. frugiperda invasive range. Monitoring for resistance is vital to management approaches delaying S. frugiperda resistance to Bt crops. DNA-based resistance screening provides higher sensitivity and cost-effectiveness than currently used bioassay-based monitoring. So far, practical S. frugiperda resistance to Bt corn producing Cry1F has been genetically linked to mutations in the SfABCC2 gene, providing a model to develop and test monitoring tools. In this study, we performed targeted SfABCC2 sequencing followed by Sanger sequencing to confirm the detection of known and candidate resistance alleles to Cry1F corn in field-collected S. frugiperda from continental USA, Puerto Rico, Africa (Ghana, Togo, and South Africa), and Southeast Asia (Myanmar). Results confirm that the distribution of a previously characterized resistance allele (SfABCC2mut) is limited to Puerto Rico and identify 2 new candidate SfABCC2 alleles for resistance to Cry1F, one of them potentially spreading along the S. frugiperda migratory route in North America. No candidate resistance alleles were found in samples from the invasive S. frugiperda range. These results provide support for the potential use of targeted sequencing in Bt resistance monitoring programs.


Subject(s)
Bacillus thuringiensis, Heteroptera, Animals, Spodoptera/genetics, Alleles, Agricultural Crops
4.
PLoS Negl Trop Dis ; 17(4): e0010862, 2023 04.
Article in English | MEDLINE | ID: mdl-37043542

ABSTRACT

Phlebotomine sand flies are of global significance as important vectors of human disease, transmitting bacterial, viral, and protozoan pathogens, including the kinetoplastid parasites of the genus Leishmania, the causative agents of devastating diseases collectively termed leishmaniasis. More than 40 pathogenic Leishmania species are transmitted to humans by approximately 35 sand fly species in 98 countries with hundreds of millions of people at risk around the world. No approved efficacious vaccine exists for leishmaniasis and available therapeutic drugs are either toxic and/or expensive, or the parasites are becoming resistant to the more recently developed drugs. Therefore, sand fly and/or reservoir control are currently the most effective strategies to break transmission. To better understand the biology of sand flies, including the mechanisms involved in their vectorial capacity, insecticide resistance, and population structures we sequenced the genomes of two geographically widespread and important sand fly vector species: Phlebotomus papatasi, a vector of Leishmania parasites that cause cutaneous leishmaniasis, (distributed in Europe, the Middle East and North Africa) and Lutzomyia longipalpis, a vector of Leishmania parasites that cause visceral leishmaniasis (distributed across Central and South America). We categorized and curated genes involved in processes important to their roles as disease vectors, including chemosensation, blood feeding, circadian rhythm, immunity, and detoxification, as well as mobile genetic elements. We also defined gene orthology and observed micro-synteny among the genomes. Finally, we present the genetic diversity and population structure of these species in their respective geographical areas. These genomes will be a foundation on which to base future efforts to prevent vector-borne transmission of Leishmania parasites.


Subject(s)
Leishmania, Cutaneous Leishmaniasis, Phlebotomus, Psychodidae, Animals, Humans, Phlebotomus/parasitology, Psychodidae/parasitology, Leishmania/genetics, Genomics
5.
Proc Natl Acad Sci U S A ; 120(11): e2219835120, 2023 03 14.
Article in English | MEDLINE | ID: mdl-36881629

ABSTRACT

Species distributed across heterogeneous environments often evolve locally adapted ecotypes, but understanding of the genetic mechanisms involved in their formation and maintenance in the face of gene flow is incomplete. In Burkina Faso, the major African malaria mosquito Anopheles funestus comprises two strictly sympatric and morphologically indistinguishable yet karyotypically differentiated forms reported to differ in ecology and behavior. However, knowledge of the genetic basis and environmental determinants of An. funestus diversification was impeded by lack of modern genomic resources. Here, we applied deep whole-genome sequencing and analysis to test the hypothesis that these two forms are ecotypes differentially adapted to breeding in natural swamps versus irrigated rice fields. We demonstrate genome-wide differentiation despite extensive microsympatry, synchronicity, and ongoing hybridization. Demographic inference supports a split only ~1,300 y ago, closely following the massive expansion of domesticated African rice cultivation ~1,850 y ago. Regions of highest divergence, concentrated in chromosomal inversions, were under selection during lineage splitting, consistent with local adaptation. The origin of nearly all variations implicated in adaptation, including chromosomal inversions, substantially predates the ecotype split, suggesting that rapid adaptation was fueled mainly by standing genetic variation. Sharp inversion frequency differences likely facilitated adaptive divergence between ecotypes by suppressing recombination between opposing chromosomal orientations of the two ecotypes, while permitting free recombination within the structurally monomorphic rice ecotype. Our results align with growing evidence from diverse taxa that rapid ecological diversification can arise from evolutionarily old structural genetic variants that modify genetic recombination.


Subject(s)
Anopheles, Malaria, Oryza, Animals, Chromosome Inversion, Ecotype, Plant Breeding, Anopheles/genetics, Oryza/genetics
6.
Proteins ; 90(9): 1721-1731, 2022 09.
Article in English | MEDLINE | ID: mdl-35441395

ABSTRACT

Protein structural classification (PSC) is a supervised problem of assigning proteins into pre-defined structural (e.g., CATH or SCOPe) classes based on the proteins' sequence or 3D structural features. We recently proposed PSC approaches that model protein 3D structures as protein structure networks (PSNs) and analyze PSN-based protein features, which performed better than or comparable to state-of-the-art sequence or other 3D structure-based PSC approaches. However, existing PSN-based PSC approaches model the whole 3D structure of a protein as a static (i.e., single-layer) PSN. Because folding of a protein is a dynamic process, where some parts (i.e., sub-structures) of a protein fold before others, modeling the 3D structure of a protein as a PSN that captures the sub-structures might further help improve the existing PSC performance. Here, we propose to model 3D structures of proteins as multi-layer sequential PSNs that approximate 3D sub-structures of proteins, with the hypothesis that this will improve upon the current state-of-the-art PSC approaches that are based on single-layer PSNs (and thus upon the existing state-of-the-art sequence and other 3D structural approaches). Indeed, we confirm this on 72 datasets spanning ~44 000 CATH and SCOPe protein domains.


Subject(s)
Proteins, Amino Acid Sequence, Proteins/chemistry, Sequence Alignment
7.
Protein Sci ; 31(1): 221-231, 2022 01.
Article in English | MEDLINE | ID: mdl-34738275

ABSTRACT

There is a growing appreciation that synonymous codon usage, although historically regarded as phenotypically silent, can instead alter a wide range of mechanisms related to functional protein production, a term we use here to describe the net effect of transcription (mRNA synthesis), mRNA half-life, translation (protein synthesis) and the probability of a protein folding correctly to its active, functional structure. In particular, recent discoveries have highlighted the important role that sub-optimal codons can play in modifying co-translational protein folding. These results have drawn increased attention to the patterns of synonymous codon usage within coding sequences, particularly in light of the discovery that these patterns can be conserved across evolution for homologous proteins. Because synonymous codon usage differs between organisms, for heterologous gene expression it can be desirable to make synonymous codon substitutions to match the codon usage pattern from the original organism in the heterologous expression host. Here we present CHARMING (for Codon HARMonizING), a robust and versatile algorithm to design mRNA sequences for heterologous gene expression and other related codon harmonization tasks. CHARMING can be run as a downloadable Python script or via a web portal at http://www.codons.org.


Subject(s)
Codon Usage, Protein Biosynthesis, Protein Folding, Proteins, Messenger RNA/genetics, Software, Proteins/genetics, Proteins/metabolism
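
Note on entry 7: the abstract describes codon harmonization as making synonymous substitutions so that the codon usage pattern of the original organism is reproduced in the expression host. The sketch below illustrates that general idea only. It is not the CHARMING algorithm, whose details are not given in the abstract. The usage tables, the two-amino-acid codon set, and the function name `harmonize` are all hypothetical and chosen for illustration.

```python
# Toy codon-harmonization sketch. The usage tables below are invented
# illustrative numbers, not measured frequencies for any real organism.

# Synonymous codons for two amino acids (standard genetic code).
SYNONYMS = {
    "K": ["AAA", "AAG"],
    "A": ["GCT", "GCC", "GCA", "GCG"],
}

# Hypothetical relative usage (fractions sum to 1 within each amino acid).
SOURCE_USAGE = {
    "AAA": 0.10, "AAG": 0.90,
    "GCT": 0.05, "GCC": 0.55, "GCA": 0.05, "GCG": 0.35,
}
HOST_USAGE = {
    "AAA": 0.75, "AAG": 0.25,
    "GCT": 0.15, "GCC": 0.25, "GCA": 0.20, "GCG": 0.40,
}

CODON_TO_AA = {codon: aa for aa, codons in SYNONYMS.items() for codon in codons}


def harmonize(cds: str) -> str:
    """Swap each codon for the host synonym whose relative usage in the host
    is closest to the original codon's relative usage in the source organism."""
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    out = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        aa = CODON_TO_AA[codon]          # amino acid is never changed
        target = SOURCE_USAGE[codon]     # how common the codon was at home
        best = min(SYNONYMS[aa], key=lambda c: abs(HOST_USAGE[c] - target))
        out.append(best)
    return "".join(out)


if __name__ == "__main__":
    print(harmonize("AAGGCAGCC"))
```

Unlike codon optimization, which would always pick the most frequent host codon, this keeps rare codons rare and common codons common, which is the property the abstract links to co-translational folding.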
8.
Insects ; 12(7)2021 Jul 08.
Article in English | MEDLINE | ID: mdl-34357278

ABSTRACT

Evolution of practical resistance is the main threat to the sustainability of transgenic crops producing insecticidal proteins from Bacillus thuringiensis (Bt crops). Monitoring of resistance to Cry and Vip3A proteins produced by Bt crops is critical to mitigate the development of resistance. Currently, Cry/Vip3A resistance allele monitoring is based on bioassays with larvae from inbreeding field-collected moths. As an alternative, DNA-based monitoring tools should increase sensitivity and reduce overall costs compared to bioassay-based screening methods. Here, we evaluated targeted sequencing as a method allowing detection of known and novel candidate resistance alleles to Cry proteins. As a model, we sequenced a Cry1F receptor gene (SfABCC2) in fall armyworm (Spodoptera frugiperda) moths from Puerto Rico, a location reporting continued practical field resistance to Cry1F-producing corn. Targeted sequencing detected a previously reported Cry1F resistance allele (SfABCC2mut), in addition to a resistance allele originally described in S. frugiperda populations from Brazil. Moreover, targeted sequencing detected mutations in SfABCC2 as novel candidate resistance alleles. These results support further development of targeted sequencing for monitoring resistance to Bt crops and provide unexpected evidence for common resistance alleles in S. frugiperda from Brazil and Puerto Rico.

10.
Commun Biol ; 4(1): 734, 2021 06 14.
Article in English | MEDLINE | ID: mdl-34127785

ABSTRACT

Genetic crosses are most powerful for linkage analysis when progeny numbers are high, parental alleles segregate evenly and numbers of inbred progeny are minimized. We previously developed a novel genetic crossing platform for the human malaria parasite Plasmodium falciparum, an obligately sexual, hermaphroditic protozoan, using mice carrying human hepatocytes (the human liver-chimeric FRG NOD huHep mouse) as the vertebrate host. We report on two genetic crosses: (1) an allopatric cross between a laboratory-adapted parasite (NF54) of African origin and a recently patient-derived Asian parasite, and (2) a sympatric cross between two recently patient-derived Asian parasites. We generated 144 unique recombinant clones from the two crosses, doubling the number of unique recombinant progeny generated in the previous 30 years. The allopatric African/Asian cross has minimal levels of inbreeding and extreme segregation distortion, while in the sympatric Asian cross, inbred progeny predominate and parental alleles segregate evenly. Using simulations, we demonstrate that these progeny provide the power to map small-effect mutations and epistatic interactions. The segregation distortion in the allopatric cross slightly erodes power to detect linkage in several genome regions. We greatly increase the power and the precision to map biomedically important traits with these new large progeny panels.


Subject(s)
Chromosome Mapping/methods, Genetic Crosses, Hepatocytes/parasitology, Plasmodium falciparum/genetics, Animals, Genetic Association Studies, Hepatocytes/transplantation, Humans, Mice, Transplantation Chimera
11.
BMC Biol ; 19(1): 41, 2021 03 10.
Article in English | MEDLINE | ID: mdl-33750380

ABSTRACT

BACKGROUND: The stable fly, Stomoxys calcitrans, is a major blood-feeding pest of livestock that has near worldwide distribution, causing an annual cost of over $2 billion for control and product loss in the USA alone. Control of these flies has been limited to increased sanitary management practices and insecticide application for suppressing larval stages. Few genetic and molecular resources are available to help in developing novel methods for controlling stable flies. RESULTS: This study examines stable fly biology by utilizing a combination of high-quality genome sequencing and RNA-Seq analyses targeting multiple developmental stages and tissues. In conjunction, 1600 genes were manually curated to characterize genetic features related to stable fly reproduction, vector host interactions, host-microbe dynamics, and putative targets for control. Most notable was characterization of genes associated with reproduction and identification of expanded gene families with functional associations to vision, chemosensation, immunity, and metabolic detoxification pathways. CONCLUSIONS: The combined sequencing, assembly, and curation of the male stable fly genome followed by RNA-Seq and downstream analyses provide insights necessary to understand the biology of this important pest. These resources and new data will provide the groundwork for expanding the tools available to control stable fly infestations. The close relationship of Stomoxys to other blood-feeding (horn flies and Glossina) and non-blood-feeding flies (house flies, medflies, Drosophila) will facilitate understanding of the evolutionary processes associated with development of blood feeding among the Cyclorrhapha.


Subject(s)
Insect Genome, Host-Parasite Interactions/genetics, Insect Control, Muscidae/genetics, Animals, Reproduction/genetics
12.
BMC Genomics ; 22(1): 179, 2021 Mar 12.
Article in English | MEDLINE | ID: mdl-33711916

ABSTRACT

BACKGROUND: The fall armyworm (Spodoptera frugiperda (J.E. Smith)) is a highly polyphagous agricultural pest with long-distance migratory behavior threatening food security worldwide. This pest has a host range of >80 plant species, but two host strains are recognized based on their association with corn (C-strain) or rice and smaller grasses (R-strain). The population genomics of the United States (USA) fall armyworm remains poorly characterized to date despite its agricultural threat. RESULTS: In this study, the population structure and genetic diversity in 55 S. frugiperda samples from Argentina, Brazil, Kenya, Puerto Rico and USA were surveyed to further our understanding of whole genome nuclear diversity. Comparisons at the genomic level suggest a panmictic S. frugiperda population, with only a minor reduction in gene flow between the two overwintering populations in the continental USA, also corresponding to distinct host strains at the mitochondrial level. Two maternal lines were detected from analysis of mitochondrial genomes. We found members from the Eastern Hemisphere interspersed within both continental USA overwintering subpopulations, suggesting multiple individuals were likely introduced to Africa. CONCLUSIONS: Our research presents the largest diverse collection of United States S. frugiperda whole genome sequences characterized to date, covering eight continental states and a USA territory (Puerto Rico). The genomic resources presented provide foundational information to understand gene flow at the whole genome level among S. frugiperda populations. Based on the genomic similarities found between host strains and laboratory vs. field samples, our findings validate the experimental use of laboratory strains. The host strain differentiation based on mitochondria and sex-linked genetic markers extends to minor genome-wide differences, with some exceptions showing that mixture between host strains is likely occurring in field populations.


Subject(s)
Gene Flow, Zea mays, Animals, Brazil, Humans, Kenya, Spodoptera, Zea mays/genetics
13.
Hereditas ; 158(1): 7, 2021 Jan 28.
Article in English | MEDLINE | ID: mdl-33509290

ABSTRACT

BACKGROUND: The Aedes aegypti mosquito is a threat to human health across the globe. The A. aegypti genome was recently re-sequenced and re-assembled. Due to a combination of long-read PacBio and Hi-C sequencing, the AaegL5 assembly is chromosome complete and significantly improves the assembly in key areas such as the M/m sex-determining locus. Release of the updated genome assembly has precipitated the need to reprocess historical functional genomic data sets, including cis-regulatory element (CRE) maps that had previously been generated for A. aegypti. RESULTS: We re-processed and re-analyzed the A. aegypti whole embryo FAIRE seq data to create an updated embryonic CRE map for the AaegL5 genome. We validated that the new CRE map recapitulates key features of the original AaegL3 CRE map. Further, we built on the improved assembly in the M/m locus to analyze overlaps of open chromatin regions with genes. To support the validation, we created a new method (PeakMatcher) for matching peaks from the same experimental data set across genome assemblies. CONCLUSION: Use of PeakMatcher software, which is available publicly under an open-source license, facilitated the release of an updated and validated CRE map, which is available through the NIH GEO. These findings demonstrate that PeakMatcher software will be a useful resource for validation and transferring of previous annotations to updated genome assemblies.


Subject(s)
Aedes/genetics, Transcriptional Regulatory Elements, Aedes/embryology, Animals, Insect Genome, Molecular Sequence Annotation
14.
PLoS One ; 15(10): e0240429, 2020.
Article in English | MEDLINE | ID: mdl-33119626

ABSTRACT

Chromosomal inversions can lead to reproductive isolation and adaptation in insects such as Drosophila melanogaster and the non-model malaria vector Anopheles gambiae. Inversions can be detected and characterized using principal component analysis (PCA) of single nucleotide polymorphisms (SNPs). To aid in developing such methods, we formed a new benchmark derived from three publicly available insect data sets. We then used this benchmark to perform an extended validation of our software for inversion analysis (Asaph). Through that process, we identified and characterized several problematic test cases liable to misinterpretation that can help guide PCA-based inversion detection. Lastly, we re-analyzed the 2R chromosome arm of 150 An. gambiae and An. coluzzii samples and observed two inversions (2Rc and 2Rd) that were previously known but not annotated in these particular individuals. The resulting benchmark data set and methods will be useful for future inversion detection based solely on SNP data.


Subject(s)
Anopheles/genetics, Insect Chromosomes/genetics, Computational Biology/methods, Drosophila melanogaster/genetics, Animals, Chromosome Inversion, Datasets as Topic, Single Nucleotide Polymorphism, Principal Component Analysis, Software
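
Note on entry 14: the abstract states that inversions can be detected with PCA of SNPs. A minimal sketch of why that works follows, assuming a toy genotype matrix in which four SNPs inside an inversion are perfectly linked and a fifth SNP is unlinked. It is not the Asaph implementation. The data, the function name `first_pc_scores`, and the use of power iteration instead of a library eigensolver are illustrative assumptions.

```python
import math


def first_pc_scores(genotypes, iterations=100):
    """Return each sample's (unit-normalized) score on the first principal component.

    genotypes: list of samples, each a list of alternate-allele counts (0, 1, 2).
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Center each SNP (column) on its mean across samples.
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    centered = [[row[j] - means[j] for j in range(m)] for row in genotypes]
    # Sample-by-sample Gram matrix; its top eigenvector is proportional
    # to the PC1 scores of the samples.
    gram = [[sum(a * b for a, b in zip(r1, r2)) for r2 in centered]
            for r1 in centered]
    # Power iteration from a deterministic, non-uniform start vector.
    v = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        w = [sum(gram[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("genotype matrix has no variance")
        v = [x / norm for x in w]
    return v


# Hypothetical data: columns 0-3 lie inside an inversion, column 4 does not.
TOY_GENOTYPES = [
    [0, 0, 0, 0, 1],  # homozygous standard arrangement
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 0],  # inversion heterozygotes
    [1, 1, 1, 1, 1],
    [2, 2, 2, 2, 1],  # homozygous inverted arrangement
    [2, 2, 2, 2, 0],
]

if __name__ == "__main__":
    for sample, score in enumerate(first_pc_scores(TOY_GENOTYPES)):
        print(sample, round(score, 3))
```

Because recombination is suppressed inside an inversion, the linked SNPs dominate the first component and samples fall into three groups along it (two homozygous classes at the extremes, heterozygotes in between). The sign of the axis is arbitrary.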
15.
Sci Rep ; 10(1): 13455, 2020 Aug 10.
Article in English | MEDLINE | ID: mdl-32778675

ABSTRACT

An amendment to this paper has been published and can be accessed via a link at the top of the paper.

16.
Bioinformatics ; 36(19): 4876-4884, 2020 12 08.
Article in English | MEDLINE | ID: mdl-32609328

ABSTRACT

MOTIVATION: Most amino acids are encoded by multiple synonymous codons, some of which are used more rarely than others. Analyses of positions of such rare codons in protein sequences revealed that rare codons can impact co-translational protein folding and that positions of some rare codons are evolutionarily conserved. Analyses of their positions in protein 3-dimensional structures, which are richer in biochemical information than sequences alone, might further explain the role of rare codons in protein folding. RESULTS: We model protein structures as networks and use network centrality to measure the structural position of an amino acid. We first validate that amino acids buried within the structural core are network-central, and those on the surface are not. Then, we study potential differences between network centralities and thus structural positions of amino acids encoded by conserved rare, non-conserved rare and commonly used codons. We find that in 84% of proteins, the three codon categories occupy significantly different structural positions. We examine protein groups showing different codon centrality trends, i.e. different relationships between structural positions of the three codon categories. We see several cases of all proteins from our data with some structural or functional property being in the same group. Also, we see a case of all proteins in some group having the same property. Our work shows that codon usage is linked to the final protein structure and thus possibly to co-translational protein folding. AVAILABILITY AND IMPLEMENTATION: https://nd.edu/~cone/CodonUsage/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Codon Usage, Protein Folding, Amino Acid Sequence, Codon/genetics, Proteins/genetics
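
Note on entry 16: the abstract models protein structures as networks and uses centrality to measure how buried an amino acid is. The sketch below shows one way this can be set up, under stated assumptions: residues are nodes, an edge joins residues whose representative atoms lie within a distance cutoff, and closeness centrality is the measure. The 7.5 angstrom cutoff, the choice of closeness, the toy coordinates, and the function names are illustrative and are not taken from the paper.

```python
import math
from collections import deque


def contact_network(coords, cutoff=7.5):
    """Adjacency lists linking residues whose coordinates lie within `cutoff`."""
    n = len(coords)
    adj = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                adj[i].append(j)
                adj[j].append(i)
    return adj


def closeness(adj, source):
    """Closeness centrality: reachable nodes divided by summed path lengths."""
    dist = {source: 0}
    queue = deque([source])
    while queue:  # breadth-first search gives shortest paths in hops
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


# Hypothetical coordinates: residue 0 sits in the core, residues 1-6 on the surface.
TOY_COORDS = [
    (0.0, 0.0, 0.0),
    (6.0, 0.0, 0.0), (-6.0, 0.0, 0.0),
    (0.0, 6.0, 0.0), (0.0, -6.0, 0.0),
    (0.0, 0.0, 6.0), (0.0, 0.0, -6.0),
]

if __name__ == "__main__":
    network = contact_network(TOY_COORDS)
    for residue in network:
        print(residue, round(closeness(network, residue), 3))
```

In this toy layout the core residue contacts every other residue while the surface residues contact only the core, so the core residue receives the highest centrality, mirroring the validation step the abstract describes.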
17.
PLoS One ; 15(4): e0232003, 2020.
Article in English | MEDLINE | ID: mdl-32352987

ABSTRACT

Improved computational modeling of protein translation rates, including better prediction of where translational slowdowns along an mRNA sequence may occur, is critical for understanding co-translational folding. Because codons within a synonymous codon group are translated at different rates, many computational translation models rely on analyzing synonymous codons. Some models rely on genome-wide codon usage bias (CUB), believing that globally rare and common codons are the most informative of slow and fast translation, respectively. Others use the CUB observed only in highly expressed genes, which should be under selective pressure to be translated efficiently (and whose CUB may therefore be more indicative of translation rates). No prior work has analyzed these models for their ability to predict translational slowdowns. Here, we evaluate five models for their association with slowly translated positions as denoted by two independent ribosome footprint (RFP) count experiments from S. cerevisiae, because RFP data is often considered as a "ground truth" for translation rates across mRNA sequences. We show that all five considered models strongly associate with the RFP data and therefore have potential for estimating translational slowdowns. However, we also show that there is a weak correlation between RFP counts for the same genes originating from independent experiments, even when their experimental conditions are similar. This raises concerns about the efficacy of using current RFP experimental data for estimating translation rates and highlights a potential advantage of using computational models to understand translation rates instead.


Subject(s)
Codon Usage/genetics, Computational Biology/methods, Protein Biosynthesis/physiology, Codon/genetics, Genetic Databases, Theoretical Models, Protein Biosynthesis/genetics, Messenger RNA/genetics, Ribosomes/genetics, Saccharomyces cerevisiae/genetics
18.
Methods Protoc ; 3(1)2020 Mar 12.
Article in English | MEDLINE | ID: mdl-32178466

ABSTRACT

The advent of next-generation sequencing has allowed for higher-throughput determination of which species live within a specific location. Here we establish that three analysis methods for estimating diversity within samples, namely Operational Taxonomic Units, the newer Amplicon Sequence Variants, and minhash (a method commonly found in sequence analysis), are affected by various properties of these sequence data. Using simulations we show that the presence of Single Nucleotide Polymorphisms and the depth of coverage from each species affect the correlations between these approaches. Through this analysis, we provide insights which would affect the decisions on the application of each method. Specifically, the presence of sequence read errors and variability in sequence read coverage differentially affects these processing methods.
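
Note on entry 18: of the three methods named in the abstract, minhash is the most compact to illustrate. The sketch below is a generic bottom-k MinHash estimate of Jaccard similarity between the k-mer sets of two sequences. It is not the pipeline used in the paper. The k-mer length, sketch size, SHA-1 hash, and function names are illustrative assumptions.

```python
import hashlib


def kmers(seq, k):
    """Set of all overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def minhash_sketch(seq, k=4, size=8):
    """Bottom-`size` sketch: the smallest hash values among the k-mers of seq."""
    hashes = {
        int.from_bytes(hashlib.sha1(km.encode()).digest()[:8], "big")
        for km in kmers(seq, k)
    }
    return sorted(hashes)[:size]


def jaccard_estimate(sketch_a, sketch_b, size=8):
    """Estimate Jaccard similarity from two bottom-k sketches.

    Take the `size` smallest hashes of the merged sketches and count
    how many of them occur in both inputs.
    """
    merged = sorted(set(sketch_a) | set(sketch_b))[:size]
    if not merged:
        return 0.0
    shared = set(sketch_a) & set(sketch_b)
    return sum(1 for h in merged if h in shared) / len(merged)


if __name__ == "__main__":
    a = minhash_sketch("ACGTACGTTGCAACGT")
    b = minhash_sketch("ACGTACGTTGCATCGT")  # one substitution near the end
    print(jaccard_estimate(a, b))
```

A single substitution changes up to k overlapping k-mers, which is one reason SNPs and read errors shift minhash-based similarity, consistent with the sensitivity the abstract reports.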

19.
BMC Biol ; 18(1): 1, 2020 01 02.
Article in English | MEDLINE | ID: mdl-31898513

ABSTRACT

BACKGROUND: New sequencing technologies have lowered financial barriers to whole genome sequencing, but resulting assemblies are often fragmented and far from 'finished'. Updating multi-scaffold drafts to chromosome-level status can be achieved through experimental mapping or re-sequencing efforts. Avoiding the costs associated with such approaches, comparative genomic analysis of gene order conservation (synteny) to predict scaffold neighbours (adjacencies) offers a potentially useful complementary method for improving draft assemblies. RESULTS: We evaluated and employed 3 gene synteny-based methods applied to 21 Anopheles mosquito assemblies to produce consensus sets of scaffold adjacencies. For subsets of the assemblies, we integrated these with additional supporting data to confirm and complement the synteny-based adjacencies: 6 with physical mapping data that anchor scaffolds to chromosome locations, 13 with paired-end RNA sequencing (RNAseq) data, and 3 with new assemblies based on re-scaffolding or long-read data. Our combined analyses produced 20 new superscaffolded assemblies with improved contiguities: 7 for which assignments of non-anchored scaffolds to chromosome arms span more than 75% of the assemblies, and a further 7 with chromosome anchoring including an 88% anchored Anopheles arabiensis assembly and, respectively, 73% and 84% anchored assemblies with comprehensively updated cytogenetic photomaps for Anopheles funestus and Anopheles stephensi. CONCLUSIONS: Experimental data from probe mapping, RNAseq, or long-read technologies, where available, all contribute to successful upgrading of draft assemblies. Our evaluations show that gene synteny-based computational methods represent a valuable alternative or complementary approach. Our improved Anopheles reference assemblies highlight the utility of applying comparative genomics approaches to improve community genomic resources.


Subject(s)
Anopheles/genetics, Biological Evolution, Chromosomes, Genetic Techniques/instrumentation, Genomics/methods, Synteny, Animals, Chromosome Mapping
20.
IEEE Trans Nanobioscience ; 18(3): 316-323, 2019 07.
Article in English | MEDLINE | ID: mdl-31180865

ABSTRACT

As a specific type of structural variation, inversions are enjoying particular traction as a result of their established role in evolution. Using third-generation sequencing technology to predict inversions is growing in interest, but many such methods focus on improving sensitivity, giving rise to either too many false positives or very long running times. In this paper, we propose a new framework for inversion detection based on a combination of two novel theoretical models: rectangle clustering and representative rectangle prediction. This combination can automatically filter out false positive inversion predictions while retaining correct ones, leading to a method that has both high sensitivity and high positive prediction values (PPV). Further, this new framework can run very fast on available data. Our software can be freely obtained at https://github.com/UTbioinf/RigInv.


Subject(s)
Chromosome Inversion/genetics, Genomics/methods, Statistical Models, Sequence Alignment/methods, DNA Sequence Analysis/methods, Algorithms, Animals, Anopheles/genetics, Cluster Analysis