ABSTRACT
Alternative splicing (AS) is an essential post-transcriptional mechanism that regulates many biological processes. However, identifying all types of AS events without guidance from a reference genome is still a challenge. Here, we propose a novel method, MkcDBGAS, to identify all seven types of AS events using the transcriptome alone, without a reference genome. MkcDBGAS, modeled on full-length transcripts of human and Arabidopsis thaliana, consists of three modules. In the first module, MkcDBGAS, for the first time, uses a colored de Bruijn graph with dynamic and mixed k-mers to identify bubbles generated by AS with precision higher than 98.17% and to detect AS types overlooked by other tools. In the second module, to further classify types of AS, MkcDBGAS adds the motifs of exons to construct the feature matrix, followed by an XGBoost-based classifier with a classification accuracy greater than 93.40%, which outperforms other widely used machine learning models and the state-of-the-art methods. Highly scalable, MkcDBGAS performed well when applied to Iso-Seq data of Amborella and the transcriptome of mouse. In the third module, MkcDBGAS provides the analysis of differential splicing across multiple biological conditions when RNA-sequencing data are available. MkcDBGAS is the first accurate and scalable method for detecting all seven types of AS events using the transcriptome alone, which will greatly empower the studies of AS in a wider field.
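The bubble idea in the first module can be illustrated with a toy example. The sketch below is not the MkcDBGAS implementation: it uses a single fixed k (the abstract describes dynamic and mixed k-mers), and the exon sequences, function names and transcript labels are all invented for illustration. It builds a colored de Bruijn graph from two transcripts, one of which skips the middle exon, and traces the resulting bubble.

```python
from collections import defaultdict


def build_colored_dbg(transcripts, k):
    """Return (edges, colors): successor k-mers and the transcripts containing each k-mer."""
    edges = defaultdict(set)
    colors = defaultdict(set)
    for name, seq in transcripts.items():
        kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
        for kmer in kmers:
            colors[kmer].add(name)
        for left, right in zip(kmers, kmers[1:]):
            edges[left].add(right)
    return edges, colors


def branch_points(edges):
    """k-mers with more than one successor, i.e. candidate bubble openings."""
    return sorted(kmer for kmer, succ in edges.items() if len(succ) > 1)


def trace_bubble(edges, start, max_steps=10_000):
    """Follow both branches leaving `start` along unbranched paths.

    Returns (merge_kmer, arm_a, arm_b), or None if the node does not have
    exactly two branches or the branches never meet.
    """
    branches = []
    for first in sorted(edges[start]):
        path = [first]
        while len(edges.get(path[-1], ())) == 1 and len(path) < max_steps:
            path.append(next(iter(edges[path[-1]])))
        branches.append(path)
    if len(branches) != 2:
        return None
    a, b = branches
    for j, kmer in enumerate(b):
        if kmer in a:
            i = a.index(kmer)
            return kmer, a[:i], b[:j]
    return None


# Hypothetical exons: the "skip" isoform lacks EXON2 (an exon-skipping event).
EXON1, EXON2, EXON3 = "ATGGCGTAC", "TTTGGGCCC", "CAGTCATGA"
transcripts = {"full": EXON1 + EXON2 + EXON3, "skip": EXON1 + EXON3}

edges, colors = build_colored_dbg(transcripts, k=5)
opening = branch_points(edges)[0]
merge, short_arm, long_arm = trace_bubble(edges, opening)
print(opening, merge, len(long_arm) - len(short_arm))  # CGTAC CCAGT 9
```

The two arms differ in length by exactly the length of the skipped exon, and the colors of the first k-mer on each arm tell which transcript took which path. Classifying the event type from such a bubble is the job of the second module and is not attempted here.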
Subjects
Alternative Splicing , Arabidopsis , Animals , Humans , Mice , Transcriptome , RNA Splicing , Sequence Analysis, RNA/methods , RNA , Arabidopsis/genetics , Gene Expression Profiling/methods
ABSTRACT
BACKGROUND: Autopolyploidy is a valuable model for studying whole-genome duplication (WGD) without hybridization, yet little is known about the genomic structural and functional changes that occur in autopolyploids after WGD. Cyclocarya paliurus (Juglandaceae) is a natural diploid-autotetraploid species. We generated an allele-aware autotetraploid genome, a chimeric chromosome-level diploid genome, and whole-genome resequencing data for 106 autotetraploid individuals at an average depth of 60 × per individual, along with 12 diploid individuals at an average depth of 90 × per individual. RESULTS: Autotetraploid C. paliurus had 64 chromosomes clustered into 16 homologous groups, and the majority of homologous chromosomes demonstrated similar chromosome length, gene numbers, and expression. The regions of synteny, structural variation and nonalignment to the diploid genome accounted for 81.3%, 8.8% and 9.9% of the autotetraploid genome, respectively. Our analyses identified 20,626 genes (69.18%) with four alleles and 9191 genes (30.82%) with one, two, or three alleles, suggesting post-polyploid allelic loss. Genes with allelic loss were found to occur more often in proximity to or within structural variations and exhibited a marked overlap with transposable elements. Additionally, such genes showed a reduced tendency to interact with other genes. We also found 102 genes with more than four copies in the autotetraploid genome, and their expression levels were significantly higher than their diploid counterparts. These genes were enriched in enzymes involved in stress response and plant defense, potentially contributing to the evolutionary success of autotetraploids. Our population genomic analyses suggested a single origin of autotetraploids and recent divergence (~ 0.57 Mya) from diploids, with minimal interploidy admixture. 
CONCLUSIONS: Our results indicate the potential for genomic and functional reorganization, which may contribute to evolutionary success in autotetraploid C. paliurus.
Subjects
Gene Duplication , Tetraploidy , Humans , Alleles , Polyploidy , Genomics
ABSTRACT
Although hybridization plays a large role in speciation, some unknown fraction of hybrid individuals never reproduces, instead remaining as genetic dead-ends. We investigated a morphologically distinct and culturally important Chinese walnut, Juglans hopeiensis, suspected to have arisen from hybridization of Persian walnut (J. regia) with Asian butternuts (J. cathayensis, J. mandshurica, and hybrids between J. cathayensis and J. mandshurica). Based on 151 whole-genome sequences of the relevant taxa, we discovered that all J. hopeiensis individuals are first-generation hybrids, with the time for the onset of gene flow estimated as 370,000 years, implying both strong postzygotic barriers and the presence of J. regia in China by that time. Six inversion regions enriched for genes associated with pollen germination and pollen tube growth may be involved in the postzygotic barriers that prevent sexual reproduction in the hybrids. Despite its long-recurrent origination and distinct traits, J. hopeiensis does not appear on the way to speciation.
Subjects
Juglans , Gene Flow , Genomics , Humans , Hybridization, Genetic , Juglans/genetics , Trees
ABSTRACT
BACKGROUND: Structural variants (SVs) play important roles in adaptive evolution and species diversification. In plants especially, many phenotypes of response to the environment have been found to be associated with SVs. Despite the prevalence and significance of SVs, long insertions remain poorly detected and studied in all but model species. RESULTS: We used whole-genome resequencing of paired reads from 80 Asian butternuts to detect long insertions and further analyse their characteristics and potential functional effects. By combining mapping-based and de novo assembly-based methods, we obtained a pangenome of multiple related species representing higher taxonomic groups. We obtained 89,312 distinct contigs totaling 147,773,999 base pairs (bp) of new sequence, of which 347 were putative long insertions placed in the reference genome. Most of the putative long insertions appeared in multiple species; in contrast, only 62 putative long insertions appeared in one species, and these may be involved in the response to the environment. Sixty-five putative long insertions fell into 61 distinct protein-coding genes involved in plant development, and 105 putative long insertions fell upstream of 106 distinct protein-coding genes involved in cellular respiration. In total, 3,367 genes were annotated in 2,606 contigs. We propose PLAINS ( https://github.com/CMB-BNU/PLAINS.git ), a streamlined, comprehensive pipeline for the prediction and analysis of long insertions using whole-genome resequencing. CONCLUSIONS: Our study lays an important foundation for further whole-genome long insertion studies, allowing the investigation of their effects by experiments.
Subjects
Asian People , Genome , Humans , Sequence Analysis, DNA/methods
ABSTRACT
What kind of genetic variation contributes the most to adaptation is a fundamental question in evolutionary biology. By resequencing genomes of 80 individuals, we inferred the origin of genomic variants associated with a complex adaptive syndrome involving multiple quantitative traits, namely, adaptation between high and low altitudes, in the vinous-throated parrotbill (Sinosuthora webbiana) in Taiwan. By comparing these variants with those in the Asian mainland population, we revealed standing variation in 24 noncoding genomic regions to be the predominant genetic source of adaptation. Parrotbills at both high and low altitudes exhibited signatures of recent selection, suggesting that not only the front but also the trailing edges of postglacial expanding populations could be subjected to environmental stresses. This study verifies and quantifies the importance of standing variation in adaptation in a cohort of genes, illustrating that the evolutionary potential of a population depends significantly on its preexisting genetic diversity. These findings provide important context for understanding adaptation and conservation of species in the Anthropocene.
Subjects
Adaptation, Biological , Biological Evolution , Genetic Variation , Songbirds/genetics , Animals , Environment , Genetics, Population , Genome , Genomics/methods , Polymorphism, Single Nucleotide , RNA, Untranslated , Selection, Genetic , Taiwan
ABSTRACT
BACKGROUND: With the rapid development of accurate sequencing and assembly technologies, an increasing number of high-quality chromosome-level and haplotype-resolved genome assemblies have been produced, which offer great opportunities for computational pangenomics. Although genome graphs are among the most useful models for pangenome representation, their structural complexity makes it difficult to present genome information as intuitively as a linear reference genome does. Thus, efficiently and accurately analyzing the spatial structure of a genome graph and coordinating its information remains a substantial challenge. RESULTS: We developed a new method, a colored superbubble (cSupB), that can overcome the complexity of graphs and organize a set of species- or population-specific haplotype sequences of interest. Based on this model, we propose a tri-tuple coordinate system that combines an offset value, topological structure and sample information. Additionally, cSupB provides a novel method that utilizes complete topological information and efficiently detects small indels (< 50 bp) for highly similar samples, which can be validated by simulated datasets. Moreover, we demonstrated that cSupB can adapt to the complex cycle structure. CONCLUSIONS: Although the solution is made suitable for increasingly complex genome graphs by relaxing the constraint, the directed acyclic graph, the motif cSupB and the cSupB method can be extended to any colored directed acyclic graph. We anticipate that our method will facilitate the analysis of individual haplotype variants and population genomic diversity. We have developed a C++ program implementing our method that is available at https://github.com/eggleader/cSupB .
Subjects
Algorithms , Genomics , Genome , Metagenomics , Sequence Analysis, DNA
ABSTRACT
BACKGROUND: Alternative splicing (AS) is an important mechanism of posttranscriptional modification and dynamically regulates multiple physiological processes in plants, including fruit ripening. However, little is known about alternative splicing during fruit development in fleshy fruits. RESULTS: We studied alternative splicing at the immature and ripe stages of fruit development in cucumber, melon, papaya and peach. We found that 14.96-17.48% of multiexon genes exhibited alternative splicing. Intron retention was not always the most frequent event, indicating that the alternative splicing pattern differs between developmental processes. Alternative splicing was significantly more prevalent at the ripe stage than at the immature stage in cucumber and melon, while the opposite trend was shown in papaya and peach, implying that developmental stages adopt different alternative splicing strategies for their specific functions. Some genes involved in fruit ripening underwent stage-specific alternative splicing, indicating that alternative splicing regulates fruit ripening. Conserved alternative splicing events did not appear to be stage-specific. Clustering fruit developmental stages across the four species based on alternative splicing profiles resulted in species-specific clustering, suggesting that diversification of alternative splicing contributes to lineage-specific evolution in fleshy fruits. CONCLUSIONS: We obtained high-quality transcriptomes and alternative splicing events during fruit development across the four species. Dynamic and nonconserved alternative splicing was discovered. The candidate stage-specific AS genes involved in fruit ripening will provide valuable insight into the roles of alternative splicing during the developmental processes of fleshy fruits.
Subjects
Fruit , Prunus persica , Alternative Splicing , Fruit/genetics , Gene Expression Regulation, Plant , Plants , Transcriptome
ABSTRACT
BACKGROUND: Alternative splicing (AS) plays a critical regulatory role in modulating transcriptome and proteome diversity. In particular, it increases the functional diversity of proteins. Recent genome-wide analysis of AS using RNA-Seq has revealed that AS is highly pervasive in plants. Furthermore, it has been suggested that most AS events are subject to tissue-specific regulation. DESCRIPTION: To reveal the functional characteristics induced by AS and tissue-specific splicing events, a database for exploring these characteristics is needed, especially in plants. To address these goals, we constructed a database of annotated transcripts generated by alternative splicing in cucumbers (CuAS: http://cmb.bnu.edu.cn/alt_iso/index.php) that integrates genomic annotations, isoform-level functions, isoform-level features, and tissue-specific AS events among multiple tissues. CuAS supports a retrieval system that identifies unique IDs (gene ID, isoform ID, UniProt ID, and gene name), chromosomal positions, and gene families, and a browser for visualization of each gene. CONCLUSION: We believe that CuAS could be helpful for revealing the novel functional characteristics induced by AS and tissue-specific AS events in cucumbers. CuAS is freely available at http://cmb.bnu.edu.cn/alt_iso/index.php.
Subjects
Alternative Splicing , Cucumis sativus/genetics , Databases, Genetic , Genes, Plant , Transcriptome
ABSTRACT
Bats can perceive the world by using a wide range of sensory systems, and some of the systems have become highly specialized, such as auditory sensory perception. Among bat species, the Old World leaf-nosed bats and horseshoe bats (rhinolophoid bats) possess the most sophisticated echolocation systems. Here, we report the whole-genome sequencing and de novo assemblies of two rhinolophoid bats: the great leaf-nosed bat (Hipposideros armiger) and the Chinese rufous horseshoe bat (Rhinolophus sinicus). Comparative genomic analyses revealed the adaptation of auditory sensory perception in the rhinolophoid bat lineages, probably resulting from the extreme selectivity used in auditory processing by these bats. Pseudogenization of some vision-related genes in rhinolophoid bats was observed, suggesting that these genes have undergone relaxed natural selection. An extensive contraction of olfactory receptor gene repertoires was observed in the lineage leading to the common ancestor of bats. Further extensive gene contractions can be observed in the branch leading to the rhinolophoid bats. Such concordance suggests that molecular changes at one sensory gene might have direct consequences for genes controlling other sensory modalities. To characterize the population genetic structure and patterns of evolution, we re-sequenced the genomes of 20 great leaf-nosed bats from four different geographical locations in China. The results showed similar sequence diversity values and little differentiation among populations. Moreover, evidence of genetic adaptations to high altitudes in the great leaf-nosed bats was observed. Taken together, our work provides a useful resource for future research on the evolution of bats.
Subjects
Chiroptera/genetics , Echolocation/physiology , Genome , Adaptation, Physiological/genetics , Amino Acid Sequence , Animals , Base Sequence , Biological Evolution , China , Comparative Genomic Hybridization/methods , Evolution, Molecular , Female , Phylogeny , Selection, Genetic
ABSTRACT
BACKGROUND: Alternative splicing (AS) is an important post-transcriptional process. It has been suggested that most AS events are subject to tissue-specific regulation. However, the global dynamics of AS in different tissues are poorly explored. RESULTS: To analyse global changes in AS in multiple tissues, we identified the AS events and constructed a comprehensive catalogue of AS events within each tissue based on genome-wide RNA-seq reads from ten tissues in cucumber. First, we found that 58% of the multi-exon genes underwent AS. We further obtained 565 genes with significantly more AS events compared with random genes. These genes were significantly enriched in biological processes related to the regulation of actin filament length. Second, significantly different AS event profiles among the ten tissues were found. Tissues with the same developmental origin are more likely to have a relatively similar AS profile. Moreover, 7370 genes showed tissue-specific AS events and were highly enriched in biological processes related to the positive regulation of cellular component organization. Root-specific AS genes were related to the cellular response to DNA damage stimulus. Third, the genes with different intron retention (IR) patterns among the ten tissues showed significant differences in the GC percentage of the retained intron, and in the number of exons and FPKM of the major transcripts. CONCLUSIONS: Our study provides a comprehensive view of AS in multiple tissues. We reveal novel insights into the patterns of AS in multiple tissues and the tissue-specific AS in cucumber.
Subjects
Alternative Splicing , Cucumis sativus/genetics , Transcriptome , Cucumis sativus/metabolism , Gene Expression Profiling
ABSTRACT
Nuclei of arbuscular endomycorrhizal fungi have been described as highly diverse due to their asexual nature and the absence of a single-cell stage with only one nucleus. This has raised fundamental questions concerning speciation, selection and transmission of the genetic make-up to the next generations. Although this concept has become textbook knowledge, it is based only on studying a few loci, including 45S rDNA. To provide a more comprehensive insight into the genetic makeup of arbuscular endomycorrhizal fungi, we applied de novo genome sequencing of individual nuclei of Rhizophagus irregularis. This revealed a surprisingly low level of polymorphism between nuclei. In contrast, within a nucleus, the 45S rDNA repeat unit turned out to be highly diverged. This finding demystifies a long-lasting hypothesis on the complex genetic makeup of arbuscular endomycorrhizal fungi. Subsequent genome assembly resulted in the first draft reference genome sequence of an arbuscular endomycorrhizal fungus. Its length is 141 Mbp, representing over 27,000 protein-coding gene models. We used the genomic sequence to reinvestigate the phylogenetic relationships of Rhizophagus irregularis with other fungal phyla. This unambiguously demonstrated that Glomeromycota are more closely related to Mucoromycotina than to their postulated sister Dikarya.
Subjects
Cell Nucleus/genetics , DNA, Ribosomal/genetics , Genome, Fungal , Phylogeny , Base Sequence , High-Throughput Nucleotide Sequencing , Molecular Sequence Data , Mycorrhizae/genetics , Open Reading Frames/genetics , Spores, Fungal/genetics
ABSTRACT
BACKGROUND: The identification, description and understanding of protein-protein networks are important in cell biology and medicine, especially for the study of systems biology, where the focus is the interaction of biomolecules. Hubs and bottlenecks refer to the important proteins of a protein interaction network. Until now, very little attention has been paid to differentiating these two protein groups. RESULTS: By integrating human protein-protein interaction networks and human genome-wide variations across populations, we described the differences between hubs and bottlenecks in this study. Our findings showed that, similar to interspecies comparisons, hubs and bottlenecks changed significantly more slowly than non-hubs and non-bottlenecks. To distinguish hubs from bottlenecks, we extracted their special members: hub-non-bottlenecks and non-hub-bottlenecks. The differences between these two groups represent the differences between hubs and bottlenecks. We found that the variation rate of hubs was significantly lower than that of bottlenecks. In addition, we verified that stronger constraint is exerted on hubs than on bottlenecks. We further observed fewer non-synonymous sites in the domains of hubs than in those of bottlenecks, and different molecular functions between them. CONCLUSIONS: Based on these results, we conclude that in recent human history, different variation patterns exist in hubs and bottlenecks in protein interaction networks. By revealing the difference between hubs and bottlenecks, our results might provide further insights into the relationship between evolution and biological structure.
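The split into hub-non-bottlenecks and non-hub-bottlenecks can be sketched on a toy network. The abstract does not state how the two groups were defined; the sketch below assumes the common network-biology convention that hubs are high-degree nodes and bottlenecks are high-betweenness nodes. The cutoffs, node names and network are invented, and betweenness is computed with Brandes' algorithm for unweighted graphs.

```python
from collections import deque
from itertools import combinations


def make_graph(edge_list):
    """Undirected adjacency sets from a list of (u, v) pairs."""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def betweenness(adj):
    """Unnormalized betweenness centrality (Brandes, unweighted, undirected)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:  # breadth-first search counting shortest paths
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}  # each pair was counted twice


def classify(adj, degree_cutoff, betweenness_cutoff):
    """Group nodes by (hub?, bottleneck?) using the assumed cutoffs."""
    btw = betweenness(adj)
    groups = {}
    for node, nbrs in adj.items():
        hub = "hub" if len(nbrs) >= degree_cutoff else "non-hub"
        neck = "bottleneck" if btw[node] >= betweenness_cutoff else "non-bottleneck"
        groups.setdefault(f"{hub}-{neck}", set()).add(node)
    return groups


# Toy network: two 5-node cliques joined through a single bridge protein "b".
clique_k = [f"k{i}" for i in range(1, 6)]
clique_m = [f"m{i}" for i in range(1, 6)]
edge_list = list(combinations(clique_k, 2)) + list(combinations(clique_m, 2))
edge_list += [("k1", "b"), ("b", "m1")]

adj = make_graph(edge_list)
groups = classify(adj, degree_cutoff=4, betweenness_cutoff=20)
for label in sorted(groups):
    print(label, sorted(groups[label]))
```

In this toy network the bridge node has only two partners but sits on every path between the cliques, so it is a non-hub-bottleneck, while most clique members are well connected yet lie on no shortest path between other nodes, making them hub-non-bottlenecks.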
Subjects
Protein Interaction Maps , Proteins/chemistry , Evolution, Molecular , Genetic Variation , Genome, Human , Humans , Protein Domains , Protein Interaction Mapping , Proteins/genetics
ABSTRACT
Protein evolution plays an important role in the evolution of each genome. Because of their functional nature, in general, most of their parts or sites are differently constrained selectively, particularly by purifying selection. Most previous studies on protein evolution considered individual proteins in their entirety or compared protein-coding sequences with non-coding sequences. Less attention has been paid to the evolution of different parts within each protein of a given genome. To this end, based on PfamA annotation of all human proteins, each protein sequence can be split into two parts: domains or unassigned regions. Using this rationale, single nucleotide polymorphisms (SNPs) in protein-coding sequences from the 1000 Genomes Project were mapped according to two classifications: SNPs occurring within protein domains and those within unassigned regions. With these classifications, we found: the density of synonymous SNPs within domains is significantly greater than that of synonymous SNPs within unassigned regions; however, the density of non-synonymous SNPs shows the opposite pattern. We also found there are signatures of purifying selection on both the domain and unassigned regions. Furthermore, the selective strength on domains is significantly greater than that on unassigned regions. In addition, among all of the human protein sequences, there are 117 PfamA domains in which no SNPs are found. Our results highlight an important aspect of protein domains and may contribute to our understanding of protein evolution.
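The density comparison described above reduces to counting SNPs per base in each region class. The sketch below is only an illustration of that arithmetic: the coding length, domain intervals and SNP positions are invented, domains are assumed not to overlap, and no significance test is performed.

```python
def snp_densities(coding_length, domains, snps):
    """SNPs per base, keyed by (region, kind).

    coding_length: length of the coding sequence in bases
    domains: non-overlapping half-open (start, end) intervals
    snps: (position, kind) pairs, kind being "synonymous" or "non-synonymous"
    """
    domain_length = sum(end - start for start, end in domains)
    lengths = {"domain": domain_length, "unassigned": coding_length - domain_length}
    counts = {}
    for position, kind in snps:
        in_domain = any(start <= position < end for start, end in domains)
        region = "domain" if in_domain else "unassigned"
        counts[(region, kind)] = counts.get((region, kind), 0) + 1
    return {key: count / lengths[key[0]] for key, count in counts.items()}


# Hypothetical protein-coding sequence of 1000 bases with two annotated domains.
domains = [(100, 400), (600, 700)]
snps = [
    (150, "synonymous"), (250, "synonymous"), (650, "synonymous"), (50, "synonymous"),
    (120, "non-synonymous"), (500, "non-synonymous"),
    (800, "non-synonymous"), (900, "non-synonymous"),
]

density = snp_densities(1000, domains, snps)
for key in sorted(density):
    print(key, round(density[key], 4))
```

The toy data were chosen to mirror the direction of the reported pattern (denser synonymous SNPs and sparser non-synonymous SNPs inside domains); they carry no evidential weight.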
Subjects
Polymorphism, Single Nucleotide , Proteins/chemistry , Proteins/genetics , Amino Acid Sequence , Chromosome Mapping , Evolution, Molecular , Gene Frequency , Humans , Open Reading Frames , Protein Domains , Selection, Genetic
ABSTRACT
Anthocyanin is the main pigment forming floral diversity. Several transcription factors that regulate the expression of anthocyanin biosynthetic genes belong to the R2R3-MYB family. Here we examined the transcriptomes of inflorescence buds of Scutellaria species (skullcaps), identified the expressed R2R3-MYBs, and detected the genetic signatures of positive selection for adaptive divergence across the rapidly evolving skullcaps. In the inflorescence buds, seven R2R3-MYBs were identified. MYB11 and MYB16 were detected to be positively selected. The signature of positive selection on MYB genes indicated that species diversification could be affected by transcriptional regulation, rather than at the translational level. When comparing among the background lineages of Arabidopsis, tomato, rice, and Amborella, heterogeneous evolutionary rates were detected among MYB paralogs, especially between MYB13 and MYB19. Significantly different evolutionary rates were also evidenced by type-I functional divergence between MYB13 and MYB19, and the accelerated evolutionary rates in MYB19 implied the acquisition of novel functions. Another paralogous pair, MYB2/7 and MYB11, revealed significant radical amino acid changes, indicating divergence in the regulation of different anthocyanin-biosynthetic enzymes. Our findings not only showed that Scutellaria R2R3-MYBs are functionally divergent and positively selected, but also indicated the adaptive relevance of regulatory genes in floral diversification.
Subjects
Gene Expression Regulation, Plant , Genes, Plant , Plant Proteins/genetics , Scutellaria/genetics , Transcription Factors/genetics , Amino Acid Sequence , Evolution, Molecular , Inflorescence/genetics , Inflorescence/metabolism , Molecular Sequence Data , Phylogeny , Plant Proteins/metabolism , Scutellaria/classification , Scutellaria/metabolism , Selection, Genetic , Sequence Alignment , Transcription Factors/metabolism
ABSTRACT
The mechanisms underlying the organization and evolution of the telencephalic pallium are not yet clear. To address this issue, we first performed comparative analysis of genes critical for the development of the pallium (Emx1/2 and Pax6) and subpallium (Dlx2 and Nkx1/2) among 500 vertebrate species. We found that these genes have no obvious variations in chromosomal duplication/loss, gene locus synteny or Darwinian selection. However, there is an additional fragment of approximately 20 amino acids in mammalian Emx1 and a poly-(Ala)6-7 in Emx2. Lentiviruses expressing mouse or chick Emx2 (m-Emx2 or c-Emx2 Lv) were injected into the ventricle of the chick telencephalon at embryonic Day 3 (E3), and the embryos were allowed to develop to E12-14 or to the posthatching stage. After transfection with m-Emx2 Lv, the cells expressing Reelin, Vimentin or GABA increased, and neurogenesis of calbindin cells changed towards the mammalian inside-out pattern in the dorsal pallium and mesopallium. In addition, a behavior test for posthatched chicks indicated that the passive avoidance ratio increased significantly. The study suggests that the acquisition of an additional fragment in mammalian Emx2 is associated with the organization and evolution of the mammalian pallium.
Subjects
Cerebral Cortex , Telencephalon , Mice , Animals , Telencephalon/metabolism , Cerebral Cortex/metabolism , Brain/metabolism , Mammals/metabolism , Homeodomain Proteins/genetics , Homeodomain Proteins/metabolism , Gene Expression Regulation, Developmental
ABSTRACT
Ferroptosis is an iron-dependent cell death that occurs due to the peroxidation of phospholipids in the cell membrane. In this study, we find that the protein level of NSUN2 is significantly decreased in hepatocyte ferroptosis. This is attributed to STUB1-mediated ubiquitination of NSUN2 at lysines 457 and 654, promoting NSUN2 degradation in ferroptosis. Selenoprotein glutathione peroxidase 4 (GPX4) is a prominent suppressor of ferroptosis. We find that downregulation of NSUN2 diminishes m5C methylation of Gpx4 mRNA 3' UTR. The reduction of NSUN2-mediated Gpx4 mRNA m5C methylation abrogates the interaction between SBP2 and the selenocysteine insertion sequence (SECIS) and leads to inhibition of GPX4 protein expression. Lower GPX4 expression promotes hepatocyte ferroptosis in vivo and in vitro, which is reversed by restoration of NSUN2. These findings shed light on the mechanism of NSUN2 degradation and also indicate that the STUB1-NSUN2-GPX4 axis plays a regulatory role in hepatocyte ferroptosis.
ABSTRACT
Cis-regulatory elements regulate gene expression and play an essential role in the development and physiology of organisms. Many conserved non-coding sequences (CNSs) function as cis-regulatory elements. They control the development of various lineages. However, predicting clade-wide cis-regulatory elements across several closely related species remains challenging. Based on the relationship between CNSs and cis-regulatory elements, we present a computational approach that predicts the clade-wide putative cis-regulatory elements in 12 Cucurbitaceae genomes. Using 12-way whole-genome alignment, we first obtained 632 112 CNSs in Cucurbitaceae. Next, we identified 16 552 Cucurbitaceae-wide cis-regulatory elements based on collinearity among all 12 Cucurbitaceae plants. Furthermore, we predicted 3 271 potential regulatory pairs in the cucumber genome, of which 98 were verified using integrative RNA sequencing and ChIP sequencing datasets from samples collected during various fruit development stages. The CNSs, Cucurbitaceae-wide cis-regulatory elements, and their target genes are accessible at http://cmb.bnu.edu.cn/cisRCNEs_cucurbit/. These elements are valuable resources for functionally annotating CNSs and their regulatory roles in Cucurbitaceae genomes.
ABSTRACT
Polyploidy is ubiquitous and its consequences are complex and variable. A change of ploidy level generally influences genetic diversity and results in morphological, physiological and ecological differences between cells or organisms with different ploidy levels. To avoid cumbersome experiments and take advantage of the less biased information provided by the vast amounts of genome sequencing data, computational tools for ploidy estimation are urgently needed. Until now, although a few such tools have been developed, many aspects of this estimation, such as the requirement of a reference genome, the lack of informative results and objective inferences, and the influence of false positives from errors and repeats, need further improvement. We have developed ploidyfrost, a de Bruijn graph-based method, to estimate ploidy levels from whole genome sequencing data sets without a reference genome. ploidyfrost provides a visual representation of allele frequency distribution generated using the ggplot2 package as well as quantitative results using the Gaussian mixture model. In addition, it takes advantage of colouring information encoded in coloured de Bruijn graphs to analyse multiple samples simultaneously and to flexibly filter putative false positives. We evaluated the performance of ploidyfrost by analysing highly heterozygous or repetitive samples of Cyclocarya paliurus and a complex allooctoploid sample of Fragaria × ananassa. Moreover, we demonstrated that the accuracy of analysis results can be improved by constraining a threshold such as Cramér's V coefficient on variant features, which may significantly reduce the side effects of sequencing errors and annoying repeats on the graphical structure constructed.
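Cramér's V, mentioned above as a filtering threshold, is a standard measure of association between two categorical variables, defined as V = sqrt(chi2 / (n * (min(r, c) - 1))) for an r-by-c contingency table with n observations. The sketch below computes it from a count table in plain Python. How ploidyfrost builds the table from variant features is not described in the abstract, so the example tables are hypothetical and serve only to show the range of the statistic.

```python
import math


def cramers_v(table):
    """Cramér's V for a contingency table given as a list of equal-length rows of counts."""
    rows, cols = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(rows)) for j in range(cols)]
    chi2 = 0.0
    for i in range(rows):
        for j in range(cols):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:  # empty rows or columns contribute nothing
                chi2 += (table[i][j] - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * (min(rows, cols) - 1)))


# Hypothetical count tables (rows and columns stand for two categorical features).
perfectly_associated = [[10, 0], [0, 10]]
independent = [[5, 5], [5, 5]]

print(cramers_v(perfectly_associated), cramers_v(independent))
```

V ranges from 0 (no association) to 1 (complete association), which is what makes it usable as a single cutoff regardless of table size.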
Subjects
Algorithms , Ploidies , Sequence Analysis, DNA/methods , Whole Genome Sequencing , Alleles , High-Throughput Nucleotide Sequencing/methods , Software
ABSTRACT
Alternative splicing is crucial for a wide range of biological processes. However, limited by the availability of reference genomes, genome-wide patterns of alternative splicing remain unknown in most nonmodel organisms. We present an attention-based convolutional neural network model, DeepASmRNA, for predicting alternative splicing events using only transcriptomic data. DeepASmRNA consists of two parts, identification of alternatively spliced transcripts and classification of alternative splicing events, and outperforms the state-of-the-art method, AStrap, and other deep learning models. We then utilize transfer learning to increase the performance in species with limited training data and use an interpretation method to decipher splicing codes. Finally, when applied to Amborella, DeepASmRNA identifies more AS events than AStrap while maintaining the same level of precision, suggesting that DeepASmRNA has superior sensitivity for identifying alternative splicing events. In summary, DeepASmRNA is scalable and interpretable for detecting genome-wide patterns of alternative splicing in species without a reference genome.