Results 1 - 20 of 43
1.
PLoS Genet ; 18(3): e1009815, 2022 03.
Article in English | MEDLINE | ID: mdl-35255079

ABSTRACT

Many fungal species utilize hydroxyderivatives of benzene and benzoic acid as carbon sources. The yeast Candida parapsilosis metabolizes these compounds via the 3-oxoadipate and gentisate pathways, whose components are encoded by two metabolic gene clusters. In this study, we determine the chromosome-level assembly of the C. parapsilosis strain CLIB214 and use it for transcriptomic and proteomic investigation of cells cultivated on hydroxyaromatic substrates. We demonstrate that the genes coding for enzymes and plasma membrane transporters involved in the 3-oxoadipate and gentisate pathways are highly upregulated and their expression is controlled in a substrate-specific manner. However, regulatory proteins involved in this process are not known. Using the knockout mutants, we show that putative transcriptional factors encoded by the genes OTF1 and GTF1 located within these gene clusters function as transcriptional activators of the 3-oxoadipate and gentisate pathway, respectively. We also show that the activation of both pathways is accompanied by upregulation of genes for the enzymes involved in β-oxidation of fatty acids, glyoxylate cycle, amino acid metabolism, and peroxisome biogenesis. Transcriptome and proteome profiles of the cells grown on 4-hydroxybenzoate and 3-hydroxybenzoate, which are metabolized via the 3-oxoadipate and gentisate pathway, respectively, reflect their different connection to central metabolism. Yet we find that the expression profiles differ also in the cells assimilating 4-hydroxybenzoate and hydroquinone, which are both metabolized in the same pathway. This finding is consistent with the phenotype of the Otf1p-lacking mutant, which exhibits impaired growth on hydroxybenzoates, but still utilizes hydroxybenzenes, thus indicating that an additional, as yet unidentified transcription factor could be involved in the 3-oxoadipate pathway regulation. Moreover, we propose that bicarbonate ions resulting from decarboxylation of hydroxybenzoates also contribute to differences in the cell responses to hydroxybenzoates and hydroxybenzenes. Finally, our phylogenetic analysis highlights evolutionary paths leading to metabolic adaptations of yeast cells assimilating hydroxyaromatic substrates.


Subjects
Candida parapsilosis , Gentisates , Candida parapsilosis/metabolism , Carbon , Gentisates/metabolism , Hydroxybenzoates/metabolism , Phylogeny , Proteome/genetics , Proteomics , Saccharomyces cerevisiae/metabolism , Transcriptome/genetics
2.
Bioinformatics ; 39(39 Suppl 1): i288-i296, 2023 06 30.
Article in English | MEDLINE | ID: mdl-37387134

ABSTRACT

MOTIVATION: The analysis of bacterial isolates to detect plasmids is important due to their role in the propagation of antimicrobial resistance. In short-read sequence assemblies, both plasmids and bacterial chromosomes are typically split into several contigs of various lengths, making identification of plasmids a challenging problem. In plasmid contig binning, the goal is to distinguish short-read assembly contigs based on their origin into plasmid and chromosomal contigs and subsequently sort plasmid contigs into bins, each bin corresponding to a single plasmid. Previous works on this problem consist of de novo approaches and reference-based approaches. De novo methods rely on contig features such as length, circularity, read coverage, or GC content. Reference-based approaches compare contigs to databases of known plasmids or plasmid markers from finished bacterial genomes. RESULTS: Recent developments suggest that leveraging information contained in the assembly graph improves the accuracy of plasmid binning. We present PlasBin-flow, a hybrid method that defines contig bins as subgraphs of the assembly graph. PlasBin-flow identifies such plasmid subgraphs through a mixed integer linear programming model that relies on the concept of network flow to account for sequencing coverage, while also accounting for the presence of plasmid genes and the GC content that often distinguishes plasmids from chromosomes. We demonstrate the performance of PlasBin-flow on a real dataset of bacterial samples. AVAILABILITY AND IMPLEMENTATION: https://github.com/cchauve/PlasBin-flow.


Subjects
Algorithms , Genome, Bacterial , Plasmids/genetics , Cell Movement , Databases, Factual
3.
Bioinformatics ; 39(6)2023 06 01.
Article in English | MEDLINE | ID: mdl-37326967

ABSTRACT

MOTIVATION: Short tandem repeats (STRs) are regions of a genome containing many consecutive copies of the same short motif, possibly with small variations. Analysis of STRs has many clinical uses but is limited by technology mainly due to STRs surpassing the used read length. Nanopore sequencing, as one of long-read sequencing technologies, produces very long reads, thus offering more possibilities to study and analyze STRs. Basecalling of nanopore reads is however particularly unreliable in repeating regions, and therefore direct analysis from raw nanopore data is required. RESULTS: Here, we present WarpSTR, a novel method for characterizing both simple and complex tandem repeats directly from raw nanopore signals using a finite-state automaton and a search algorithm analogous to dynamic time warping. By applying this approach to determine the lengths of 241 STRs, we demonstrate that our approach decreases the mean absolute error of the STR length estimate compared to basecalling and STRique. AVAILABILITY AND IMPLEMENTATION: WarpSTR is freely available at https://github.com/fmfi-compbio/warpstr.


Subjects
Nanopores , High-Throughput Nucleotide Sequencing/methods , Genome , Algorithms , Microsatellite Repeats , Sequence Analysis, DNA
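
Illustrative note: the abstract above mentions a search algorithm analogous to dynamic time warping (DTW). The following is a minimal sketch of the classic DTW distance between two numeric sequences, given only to show the underlying idea. It is not the WarpSTR algorithm, which additionally uses a finite-state automaton and operates on raw nanopore signals; the function name `dtw_distance` and the absolute-difference cost are assumptions made for this example.

```python
import math


def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences.

    Illustrative only: uses the absolute difference as the local cost.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: stretch b, stretch a, or advance both
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]
```

Because DTW allows one sample to be matched against several consecutive samples of the other sequence, a signal and a locally stretched copy of it align at zero cost, which is why warping-style alignment suits signals whose duration per base varies.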
4.
Bioinformatics ; 38(Suppl 1): i203-i211, 2022 06 24.
Article in English | MEDLINE | ID: mdl-35758770

ABSTRACT

MOTIVATION: Genome annotations are a common way to represent genomic features such as genes, regulatory elements or epigenetic modifications. The amount of overlap between two annotations is often used to ascertain if there is an underlying biological connection between them. In order to distinguish between true biological association and overlap by pure chance, a robust measure of significance is required. One common way to do this is to determine if the number of intervals in the reference annotation that intersect the query annotation is statistically significant. However, currently employed statistical frameworks are often either inefficient or inaccurate when computing P-values on the scale of the whole human genome. RESULTS: We show that finding the P-values under the typically used 'gold' null hypothesis is NP-hard. This motivates us to reformulate the null hypothesis using Markov chains. To be able to measure the fidelity of our Markovian null hypothesis, we develop a fast direct sampling algorithm to estimate the P-value under the gold null hypothesis. We then present an open-source software tool MCDP that computes the P-values under the Markovian null hypothesis in O(m² + n) time and O(m) memory, where m and n are the numbers of intervals in the reference and query annotations, respectively. Notably, MCDP runtime and memory usage are independent of the genome length, allowing it to outperform previous approaches in runtime and memory usage by orders of magnitude on human genome annotations, while maintaining the same level of accuracy. AVAILABILITY AND IMPLEMENTATION: The software is available at https://github.com/fmfi-compbio/mc-overlaps. All data for reproducibility are available at https://github.com/fmfi-compbio/mc-overlaps-reproducibility. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Genome, Human , Software , Gold , Humans , Markov Chains , Reproducibility of Results
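
Illustrative note: the test statistic named in the abstract above is the number of reference intervals that intersect the query annotation. A minimal sketch of computing that count follows. It covers only the statistic, not the P-value computation of MCDP. The function name `count_overlapping` and the input convention (each annotation is a sorted list of non-overlapping half-open `(start, end)` pairs on one chromosome) are assumptions made for this example.

```python
def count_overlapping(reference, query):
    """Count reference intervals that intersect at least one query interval.

    Assumes both lists are sorted, internally non-overlapping, and use
    half-open (start, end) coordinates on the same chromosome.
    """
    count = 0
    j = 0  # index of the first query interval that may still overlap
    for r_start, r_end in reference:
        # Query intervals ending at or before r_start cannot overlap this
        # or any later reference interval, so they are skipped for good.
        while j < len(query) and query[j][1] <= r_start:
            j += 1
        # The current query interval overlaps if it starts before r_end.
        if j < len(query) and query[j][0] < r_end:
            count += 1
    return count
```

With m reference and n query intervals, this two-pointer sweep runs in O(m + n) time, independent of the genome length.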
5.
BMC Bioinformatics ; 23(1): 551, 2022 Dec 19.
Article in English | MEDLINE | ID: mdl-36536300

ABSTRACT

BACKGROUND: The genomes of SARS-CoV-2 are classified into variants, some of which are monitored as variants of concern (e.g. the Delta variant B.1.617.2 or Omicron variant B.1.1.529). Proportions of these variants circulating in a human population are typically estimated by large-scale sequencing of individual patient samples. Sequencing a mixture of SARS-CoV-2 RNA molecules from wastewater provides a cost-effective alternative, but requires methods for estimating variant proportions in a mixed sample. RESULTS: We propose a new method based on a probabilistic model of sequencing reads, capturing sequence diversity present within individual variants, as well as sequencing errors. The algorithm is implemented in an open source Python program called VirPool. We evaluate the accuracy of VirPool on several simulated and real sequencing data sets from both Illumina and nanopore sequencing platforms, including wastewater samples from Austria and France monitoring the onset of the Alpha variant. CONCLUSIONS: VirPool is a versatile tool for wastewater and other mixed-sample analysis that can handle both short- and long-read sequencing data. Our approach does not require pre-selection of characteristic mutations for variant profiles, it is able to use the entire length of reads instead of just the most informative positions, and can also capture haplotype dependencies within a single read.


Subjects
COVID-19 , SARS-CoV-2 , Wastewater , Humans , RNA, Viral , SARS-CoV-2/genetics , SARS-CoV-2/isolation & purification , Wastewater/virology
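
Illustrative note: estimating variant proportions in a mixed sample, as described in the abstract above, can be framed as fitting the weights of a mixture model. The sketch below uses a generic expectation-maximization (EM) loop and assumes that the likelihood of each read under each variant has already been computed. This is a simplification chosen for illustration, not the VirPool implementation; the function name `estimate_proportions` and its input layout are hypothetical.

```python
def estimate_proportions(likelihoods, iterations=100):
    """Estimate mixture weights by EM.

    likelihoods[i][k] is the (assumed precomputed) probability of read i
    under variant k. Returns one weight per variant, summing to 1.
    """
    n_variants = len(likelihoods[0])
    weights = [1.0 / n_variants] * n_variants  # start from uniform weights
    for _ in range(iterations):
        totals = [0.0] * n_variants
        for row in likelihoods:
            # E-step: posterior responsibility of each variant for this read
            joint = [w * p for w, p in zip(weights, row)]
            norm = sum(joint)
            if norm == 0.0:
                continue  # read not explained by any variant; ignored here
            for k in range(n_variants):
                totals[k] += joint[k] / norm
        used = sum(totals)
        if used == 0.0:
            break  # no read carries information; keep current weights
        # M-step: new weights are the average responsibilities
        weights = [t / used for t in totals]
    return weights
```

For example, four reads of which three match only the first variant and one matches only the second yield weights of 0.75 and 0.25.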
6.
Bioinformatics ; 37(24): 4661-4667, 2021 12 11.
Article in English | MEDLINE | ID: mdl-34314502

ABSTRACT

MOTIVATION: MinION is a portable nanopore sequencing device that can be easily operated in the field with features including monitoring of run progress and selective sequencing. To fully exploit these features, real-time base calling is required. To date, this has only been achieved at the cost of high computing requirements that pose limitations in terms of hardware availability in common laptops and energy consumption. RESULTS: We developed a new base caller DeepNano-coral for nanopore sequencing, which is optimized to run on the Coral Edge Tensor Processing Unit, a small USB-attached hardware accelerator. To achieve this goal, we have designed new versions of two key components used in convolutional neural networks for speech recognition and base calling. In our components, we propose a new way of factorization of a full convolution into smaller operations, which decreases memory access operations, memory access being a bottleneck on this device. DeepNano-coral achieves real-time base calling during sequencing with accuracy slightly better than the fast mode of the Guppy base caller and is extremely energy efficient, using only 10 W of power. AVAILABILITY AND IMPLEMENTATION: https://github.com/fmfi-compbio/coral-basecaller. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Nanopores , Software , Sequence Analysis, DNA , High-Throughput Nucleotide Sequencing , Neural Networks, Computer
7.
Bioinformatics ; 36(14): 4191-4192, 2020 08 15.
Article in English | MEDLINE | ID: mdl-32374816

ABSTRACT

MOTIVATION: Oxford Nanopore MinION is a portable DNA sequencer that is marketed as a device that can be deployed anywhere. Current base callers, however, require a powerful GPU to analyze data produced by MinION in real time, which hampers field applications. RESULTS: We have developed a fast base caller DeepNano-blitz that can analyze the stream from up to two MinION runs in real time using a common laptop CPU (i7-7700HQ), with no GPU requirements. The base caller settings allow trading accuracy for speed and the results can be used for real-time run monitoring (e.g. sample composition, barcode balance, species identification) or prefiltering of results for more detailed analysis (e.g. filtering out human DNA from human-pathogen runs). AVAILABILITY AND IMPLEMENTATION: DeepNano-blitz has been developed and tested on Linux and Intel processors and is available under MIT license at https://github.com/fmfi-compbio/deepnano-blitz. CONTACT: vladimir.boza@fmph.uniba.sk. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Nanopores , DNA , High-Throughput Nucleotide Sequencing , Humans , Sequence Analysis, DNA , Software
8.
Virus Genes ; 57(6): 556-560, 2021 Dec.
Article in English | MEDLINE | ID: mdl-34448987

ABSTRACT

SARS-CoV-2 mutants carrying the ∆H69/∆V70 deletion in the amino-terminal domain of the Spike protein emerged independently in at least six lineages of the virus (namely, B.1.1.7, B.1.1.298, B.1.160, B.1.177, B.1.258, B.1.375). We analyzed SARS-CoV-2 samples collected from various regions of Slovakia between November and December 2020 that were presumed to contain B.1.1.7 variant due to drop-out of the Spike gene target in an RT-qPCR test caused by this deletion. Sequencing of these samples revealed that although in some cases the samples were indeed confirmed as B.1.1.7, a substantial fraction of samples contained another ∆H69/∆V70 carrying mutant belonging to the lineage B.1.258, which has been circulating in Central Europe since August 2020, long before the import of B.1.1.7. Phylogenetic analysis shows that the early sublineage of B.1.258 acquired the N439K substitution in the receptor-binding domain (RBD) of the Spike protein and, later on, also the deletion ∆H69/∆V70 in the Spike N-terminal domain (NTD). This variant was particularly common in several European countries including the Czech Republic and Slovakia but has been quickly replaced by B.1.1.7 early in 2021.


Subjects
COVID-19/epidemiology , COVID-19/virology , Phylogeny , SARS-CoV-2/genetics , SARS-CoV-2/isolation & purification , Sequence Deletion , Spike Glycoprotein, Coronavirus/genetics , Europe/epidemiology , Humans , SARS-CoV-2/classification , Time Factors
9.
Bioinformatics ; 35(8): 1310-1317, 2019 04 15.
Article in English | MEDLINE | ID: mdl-30203023

ABSTRACT

MOTIVATION: Short tandem repeats (STRs) are stretches of repetitive DNA in which short sequences, typically made of 2-6 nucleotides, are repeated several times. Since STRs have many important biological roles and also belong to the most polymorphic parts of the human genome, they became utilized in several molecular-genetic applications. Precise genotyping of STR alleles, therefore, was of high relevance during the last decades. Despite this, massively parallel sequencing (MPS) still lacks the analysis methods to fully utilize the information value of STRs in genome scale assays. RESULTS: We propose an alignment-free algorithm, called Dante, for genotyping and characterization of STR alleles at user-specified known loci based on sequence reads originating from STR loci of interest. The method accounts for natural deviations from the expected sequence, such as variation in the repeat count, sequencing errors, ambiguous bases and complex loci containing several different motifs. In addition, we implemented a correction for copy number defects caused by the polymerase induced stutter effect as well as a prediction of STR expansions that, according to the conventional view, cannot be fully captured by inherently short MPS reads. We tested Dante on simulated datasets and on datasets obtained by targeted sequencing of protein coding parts of thousands of selected clinically relevant genes. In both these datasets, Dante outperformed HipSTR and GATK genotyping tools. Furthermore, Dante was able to predict allele expansions in all tested clinical cases. AVAILABILITY AND IMPLEMENTATION: Dante is open source software, freely available for download at https://github.com/jbudis/dante. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
High-Throughput Nucleotide Sequencing , Microsatellite Repeats , Alleles , Genotype , Humans , Sequence Analysis, DNA
10.
Proc Natl Acad Sci U S A ; 111(16): 5926-31, 2014 Apr 22.
Article in English | MEDLINE | ID: mdl-24711422

ABSTRACT

Programmed translational bypassing is a process whereby ribosomes "ignore" a substantial interval of mRNA sequence. Although discovered 25 y ago, the only experimentally confirmed example of this puzzling phenomenon is expression of the bacteriophage T4 gene 60. Bypassing requires translational blockage at a "takeoff codon" immediately upstream of a stop codon followed by a hairpin, which causes peptidyl-tRNA dissociation and reassociation with a matching "landing triplet" 50 nt downstream, where translation resumes. Here, we report 81 translational bypassing elements (byps) in mitochondria of the yeast Magnusiomyces capitatus and demonstrate in three cases, by transcript analysis and proteomics, that byps are retained in mitochondrial mRNAs but not translated. Although mitochondrial byps resemble the bypass sequence in the T4 gene 60, they utilize unused codons instead of stops for translational blockage and have relaxed matching rules for takeoff/landing sites. We detected byp-like sequences also in mtDNAs of several Saccharomycetales, indicating that byps are mobile genetic elements. These byp-like sequences lack bypassing activity and are tolerated when inserted in-frame in variable protein regions. We hypothesize that byp-like elements have the potential to contribute to evolutionary diversification of proteins by adding new domains that allow exploration of new structures and functions.


Subjects
Mitochondria/genetics , Protein Biosynthesis/genetics , Yeasts/genetics , Carbon/pharmacology , DNA, Mitochondrial/metabolism , Fermentation/drug effects , Fermentation/genetics , Genes, Fungal/genetics , Genes, Mitochondrial/genetics , Molecular Sequence Data , Mutagenesis, Insertional/genetics , Open Reading Frames/genetics , Phylogeny , RNA Processing, Post-Transcriptional/drug effects , RNA Processing, Post-Transcriptional/genetics , RNA, Messenger/genetics , RNA, Messenger/metabolism , Yeasts/drug effects , Yeasts/growth & development
11.
BMC Bioinformatics ; 17(1): 216, 2016 May 18.
Article in English | MEDLINE | ID: mdl-27188396

ABSTRACT

BACKGROUND: In this paper, we study the problem of RNA motif search in long genomic sequences. This approach uses a combination of sequence and structure constraints to uncover new distant homologs of known functional RNAs. The problem is NP-hard and is traditionally solved by backtracking algorithms. RESULTS: We have designed a new algorithm for RNA motif search and implemented a new motif search tool RNArobo. The tool enhances the RNAbob descriptor language, allowing insertions in helices, which enables better characterization of ribozymes and aptamers. A typical RNA motif consists of multiple elements and the running time of the algorithm is highly dependent on their ordering. By approaching the element ordering problem in a principled way, we demonstrate more than 100-fold speedup of the search for complex motifs compared to previously published tools. CONCLUSIONS: We have developed a new method for RNA motif search that allows for a significant speedup of the search of complex motifs that include pseudoknots. Such speed improvements are crucial at a time when the rate of DNA sequencing outpaces growth in computing. RNArobo is available at http://compbio.fmph.uniba.sk/rnarobo .


Subjects
Nucleotide Motifs , RNA/chemistry , Sequence Analysis, RNA/methods , Algorithms , Entropy , Humans
12.
Curr Genet ; 60(1): 49-59, 2014 Feb.
Article in English | MEDLINE | ID: mdl-24071901

ABSTRACT

Jaminaea angkorensis is an anamorphic basidiomycetous yeast species originally isolated from decaying leaves in Cambodia. Taxonomically, J. angkorensis is affiliated with Microstromatales (Exobasidiomycetes, Ustilaginomycotina, Basidiomycota) and represents a basal phylogenetic lineage of this fungal order. To perform a comparative analysis of J. angkorensis with other basidiomycetes, we determined and analyzed its complete mitochondrial DNA sequence. The mitochondrial genome is represented by 29,999 base pairs long, circular DNA containing 32 % guanine and cytosine residues. Its genetic organization is relatively compact and comprises typical genes for 15 conserved proteins involved in oxidative phosphorylation (atp6, 8, and 9; cob; cox1, 2, and 3; and nad1, 2, 3, 4, 4L, 5, and 6) and translation (rps3), two ribosomal RNAs (rnl and rns) and twenty-two transfer RNAs (trnA-Y). Although the gene content is similar to other basidiomycetes, the gene orders in the examined species exhibit only a limited synteny, reflecting their phylogenetic distances and extensive genome rearrangements. In addition, a comparative analysis of basidiomycete mitochondrial genomes indicates that stop-to-tryptophan reassignment of the UGA codon was accompanied by structural alterations of tRNA-Trp(CCA). These results provide an insight into the evolution of the genetic code in fungal mitochondria.


Subjects
Basidiomycota/genetics , Genes, Fungal , Genome, Mitochondrial , Codon , Gene Order , Genes, rRNA , Molecular Sequence Annotation , Molecular Sequence Data , Phylogeny
13.
Front Bioinform ; 4: 1391086, 2024.
Article in English | MEDLINE | ID: mdl-39011297

ABSTRACT

We generalize a problem of finding maximum-scoring segment sets, previously studied by Csurös (IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2004, 1, 139-150), from sequences to graphs. Namely, given a vertex-weighted graph G and a non-negative startup penalty c, we can find a set of vertex-disjoint paths in G with maximum total score when each path's score is its vertices' total weight minus c. We call this new problem maximum-scoring path sets (MSPS). We present an algorithm that has a linear-time complexity for graphs with a constant treewidth. Generalization from sequences to graphs allows the algorithm to be used on pangenome graphs representing several related genomes and can be seen as a common abstraction for several biological problems on pangenomes, including searching for CpG islands, ChIP-seq data analysis, analysis of region enrichment for functional elements, or simple chaining problems.
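
Illustrative note: the scoring rule in the abstract above (each selected path scores the total weight of its vertices minus a startup penalty c) can be shown on the original sequence version of the problem, where the graph is a single path and the selected paths are disjoint segments. The sketch below is a simple two-state dynamic program for that special case only. It is not the treewidth-based algorithm for general graphs presented in the paper, and the function name `max_scoring_segments` is an assumption made for this example.

```python
import math


def max_scoring_segments(weights, c):
    """Best total score of disjoint segments of a sequence.

    Each chosen segment scores the sum of its weights minus the
    non-negative startup penalty c. Choosing no segment scores 0.
    """
    inside = -math.inf  # best score so far with the last position in a segment
    outside = 0.0       # best score so far with the last position unused
    for w in weights:
        # Either extend the running segment or pay c to open a new one here.
        new_inside = max(inside, outside - c) + w
        # Either stay outside or close the segment that ended just before.
        new_outside = max(outside, inside)
        inside, outside = new_inside, new_outside
    return max(inside, outside)
```

For weights 2, -1, 3, -5, 4 and c = 1, the optimum is 6, reached for instance by the segments covering the first three positions (score 3) and the last position (score 3). The loop takes linear time, consistent with a sequence being a graph of constant treewidth.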

14.
DNA Res ; 31(3)2024 Jun 01.
Article in English | MEDLINE | ID: mdl-38686638

ABSTRACT

Lodderomyces beijingensis is an ascosporic ascomycetous yeast. In contrast to related species Lodderomyces elongisporus, which is a recently emerging human pathogen, L. beijingensis is associated with insects. To provide an insight into its genetic makeup, we investigated the genome of its type strain, CBS 14171. We demonstrate that this yeast is diploid and describe the high contiguity nuclear genome assembly consisting of eight chromosome-sized contigs with a total size of about 15.1 Mbp. We find that the genome sequence contains multiple copies of the mating type loci and codes for essential components of the mating pheromone response pathway, however, the missing orthologs of several genes involved in the meiotic program raise questions about the mode of sexual reproduction. We also show that L. beijingensis genome codes for the 3-oxoadipate pathway enzymes, which allow the assimilation of protocatechuate. In contrast, the GAL gene cluster underwent a decay resulting in an inability of L. beijingensis to utilize galactose. Moreover, we find that the 56.5 kbp long mitochondrial DNA is structurally similar to known linear mitochondrial genomes terminating on both sides with covalently closed single-stranded hairpins. Finally, we discovered a new double-stranded RNA mycovirus from the Totiviridae family and characterized its genome sequence.


Subjects
Chromosomes, Fungal , Genes, Mating Type, Fungal , Genome, Fungal , Chromosomes, Fungal/genetics , Saccharomycetales/genetics , Saccharomycetales/metabolism
15.
Nucleic Acids Res ; 39(10): 4202-19, 2011 May.
Article in English | MEDLINE | ID: mdl-21266473

ABSTRACT

Mitochondrial genome diversity in closely related species provides an excellent platform for investigation of chromosome architecture and its evolution by means of comparative genomics. In this study, we determined the complete mitochondrial DNA sequences of eight Candida species and analyzed their molecular architectures. Our survey revealed a puzzling variability of genome architecture, including circular- and linear-mapping and multipartite linear forms. We propose that the arrangement of large inverted repeats identified in these genomes plays a crucial role in alterations of their molecular architectures. In specific arrangements, the inverted repeats appear to function as resolution elements, allowing genome conversion among different topologies, eventually leading to genome fragmentation into multiple linear DNA molecules. We suggest that molecular transactions generating linear mitochondrial DNA molecules with defined telomeric structures may parallel the evolutionary emergence of linear chromosomes and multipartite genomes in general and may provide clues for the origin of telomeres and pathways implicated in their maintenance.


Subjects
Candida/genetics , Chromosomes, Fungal , DNA, Mitochondrial/chemistry , Evolution, Molecular , Genome, Fungal , Genome, Mitochondrial , Base Sequence , Candida/classification , Chromosome Mapping , Electrophoresis, Gel, Pulsed-Field , Gene Order , Inverted Repeat Sequences , Molecular Sequence Data , Phylogeny
16.
bioRxiv ; 2023 Nov 22.
Article in English | MEDLINE | ID: mdl-38045397

ABSTRACT

An annotation is a set of genomic intervals sharing a particular function or property. Examples include genes, conserved elements, and epigenetic modifications. A common task is to compare two annotations to determine if one is enriched or depleted in the regions covered by the other. We study the problem of assigning statistical significance to such a comparison based on a null model representing two random unrelated annotations. Previous approaches to this problem remain too slow or inaccurate. To incorporate more background information into such analyses and avoid biased results, we propose a new null model based on a Markov chain which differentiates among several genomic contexts. These contexts can capture various confounding factors, such as GC content or sequencing gaps. We then develop a new algorithm for estimating p-values by computing the exact expectation and variance of the test statistics and then estimating the p-value using a normal approximation. Compared to the previous algorithm by Gafurov et al., the new algorithm provides three advances: (1) the running time is improved from quadratic to linear or quasi-linear, (2) the algorithm can handle two different test statistics, and (3) the algorithm can handle both simple and context-dependent Markov chain null models. We demonstrate the efficiency and accuracy of our algorithm on synthetic and real data sets, including the recent human telomere-to-telomere assembly. In particular, our algorithm computed p-values for 450 pairs of human genome annotations using 24 threads in under three hours. The use of genomic contexts to correct for GC-bias also resulted in the reversal of some previously published findings. Availability: The software is freely available at https://github.com/fmfi-compbio/mcdp2 under the MIT licence. All data for reproducibility are available at https://github.com/fmfi-compbio/mcdp2-reproducibility.
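
Illustrative note: the final step described in the abstract above, estimating a p-value from the expectation and variance of the test statistic by a normal approximation, can be sketched as follows. Computing the exact expectation and variance under the Markov chain null model is the substance of the paper and is not reproduced here; they are taken as given inputs. The function name `normal_p_value` and the choice of a one-sided upper-tail (enrichment) test are assumptions made for this example.

```python
import math


def normal_p_value(observed, mean, variance):
    """Upper-tail p-value P(X >= observed) for X ~ Normal(mean, variance)."""
    if variance <= 0:
        raise ValueError("variance must be positive")
    z = (observed - mean) / math.sqrt(variance)
    # Survival function of the standard normal via the complementary
    # error function: P(Z >= z) = 0.5 * erfc(z / sqrt(2))
    return 0.5 * math.erfc(z / math.sqrt(2))
```

A depletion test would use the lower tail instead, which equals one minus this value for a continuous distribution.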

17.
Front Microbiol ; 14: 1267695, 2023.
Article in English | MEDLINE | ID: mdl-37869681

ABSTRACT

Identification of plasmids from sequencing data is an important and challenging problem related to antimicrobial resistance spread and other One-Health issues. We provide a new architecture for identifying plasmid contigs in fragmented genome assemblies built from short-read data. We employ graph neural networks (GNNs) and the assembly graph to propagate the information from nearby nodes, which leads to more accurate classification, especially for short contigs that are difficult to classify based on sequence features or database searches alone. We trained plASgraph2 on a data set of samples from the ESKAPEE group of pathogens. plASgraph2 either outperforms or performs on par with a wide range of state-of-the-art methods on testing sets of independent ESKAPEE samples and samples from related pathogens. On one hand, our study provides a new accurate and easy to use tool for contig classification in bacterial isolates; on the other hand, it serves as a proof-of-concept for the use of GNNs in genomics. Our software is available at https://github.com/cchauve/plasgraph2 and the training and testing data sets are available at https://github.com/fmfi-compbio/plasgraph2-datasets.

18.
Microbiol Resour Announc ; 12(3): e0000523, 2023 Mar 16.
Article in English | MEDLINE | ID: mdl-36840572

ABSTRACT

Candida verbasci is an anamorphic ascomycetous yeast. We report the genome sequence of its type strain, 11-1055 (CBS 12699). The nuclear genome assembly consists of seven chromosome-sized contigs with a total size of 12.1 Mbp and has a relatively low G+C content (28.1%).

19.
BMC Genomics ; 13: 382, 2012 Aug 09.
Article in English | MEDLINE | ID: mdl-22876864

ABSTRACT

BACKGROUND: The fungus Marssonina brunnea is a causal pathogen of Marssonina leaf spot that devastates poplar plantations by defoliating susceptible trees before normal fall leaf drop. RESULTS: We sequence the genome of M. brunnea with a size of 52 Mb assembled into 89 scaffolds, representing the first sequenced Dermateaceae genome. By inoculating this fungus onto a poplar hybrid clone, we investigate how M. brunnea interacts and co-evolves with its host to colonize poplar leaves. While a handful of virulence genes in M. brunnea, mostly from the LysM family, are found to be up-regulated during infection, the poplar down-regulates its resistance genes, such as nucleotide binding site domains and leucine rich repeats, in response to infection. From 10,027 predicted proteins of M. brunnea in a comparison with those from poplar, we identify four poplar transferases that stimulate the host to resist M. brunnea. These transferase-encoding genes may have driven the co-evolution of M. brunnea and Populus during the process of infection and anti-infection. CONCLUSIONS: Our results from the draft sequence of the M. brunnea genome provide evidence for genome-genome interactions that play an important role in poplar-pathogen co-evolution. This knowledge could help to design effective strategies for controlling Marssonina leaf spot in poplar.


Subjects
Ascomycota/genetics , Biological Evolution , Genome, Fungal , Host-Pathogen Interactions , Populus/microbiology , Ascomycota/pathogenicity , Gene Expression Profiling , Molecular Sequence Annotation , Phylogeny , Plant Diseases/genetics , Plant Diseases/microbiology , Populus/genetics , RNA, Fungal/genetics , Sequence Analysis, DNA
20.
Genome Res ; 19(12): 2324-33, 2009 Dec.
Article in English | MEDLINE | ID: mdl-19767417

ABSTRACT

Since its start, the Mammalian Gene Collection (MGC) has sought to provide at least one full-protein-coding sequence cDNA clone for every human and mouse gene with a RefSeq transcript, and at least 6200 rat genes. The MGC cloning effort initially relied on random expressed sequence tag screening of cDNA libraries. Here, we summarize our recent progress using directed RT-PCR cloning and DNA synthesis. The MGC now contains clones with the entire protein-coding sequence for 92% of human and 89% of mouse genes with curated RefSeq (NM-accession) transcripts, and for 97% of human and 96% of mouse genes with curated RefSeq transcripts that have one or more PubMed publications, in addition to clones for more than 6300 rat genes. These high-quality MGC clones and their sequences are accessible without restriction to researchers worldwide.


Subjects
Cloning, Molecular/methods , Computational Biology/methods , DNA, Complementary/genetics , Gene Library , Genes/genetics , Mammals/genetics , Animals , DNA/biosynthesis , Humans , Mice , National Institutes of Health (U.S.) , Rats , Reverse Transcriptase Polymerase Chain Reaction , United States