Results 1 - 20 of 28
1.
Brief Bioinform ; 24(6)2023 09 22.
Article in English | MEDLINE | ID: mdl-37833843

ABSTRACT

Alternative splicing (AS) is an essential post-transcriptional mechanism that regulates many biological processes. However, identifying comprehensive types of AS events without guidance from a reference genome is still a challenge. Here, we proposed a novel method, MkcDBGAS, to identify all seven types of AS events using the transcriptome alone, without a reference genome. MkcDBGAS, modeled by full-length transcripts of human and Arabidopsis thaliana, consists of three modules. In the first module, MkcDBGAS, for the first time, uses a colored de Bruijn graph with dynamic and mixed k-mers to identify bubbles generated by AS with precision higher than 98.17% and detect AS types overlooked by other tools. In the second module, to further classify types of AS, MkcDBGAS adds the motifs of exons to construct the feature matrix, followed by an XGBoost-based classifier with a classification accuracy greater than 93.40%, which outperformed other widely used machine learning models and the state-of-the-art methods. Highly scalable, MkcDBGAS performed well when applied to Iso-Seq data of Amborella and the transcriptome of mouse. In the third module, MkcDBGAS provides the analysis of differential splicing across multiple biological conditions when RNA-sequencing data are available. MkcDBGAS is the first accurate and scalable method for detecting all seven types of AS events using the transcriptome alone, which will greatly empower the studies of AS in a wider field.
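
The bubble idea behind the first module can be illustrated with a minimal sketch. It is not the MkcDBGAS implementation: it assumes a single fixed k instead of dynamic and mixed k-mers, treats each transcript name as a color, and the function names `build_colored_graph` and `color_specific_runs` are hypothetical. A run of k-mers carried by only one transcript and flanked by k-mers shared by all transcripts corresponds to one branch of a bubble.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the k-mers of seq in order of appearance."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_colored_graph(transcripts, k):
    """Map every k-mer (node) to the set of transcripts (colors) containing it."""
    colors = defaultdict(set)
    for name, seq in transcripts.items():
        for km in kmers(seq, k):
            colors[km].add(name)
    return colors


def color_specific_runs(transcripts, name, k):
    """Half-open k-mer index ranges of `name` that are not shared by all
    transcripts and are flanked on both sides by shared k-mers."""
    colors = build_colored_graph(transcripts, k)
    all_names = set(transcripts)
    runs, start = [], None
    for i, km in enumerate(kmers(transcripts[name], k)):
        shared = colors[km] == all_names
        if not shared and start is None:
            start = i
        elif shared and start is not None:
            if start > 0:  # require a shared k-mer on the left flank
                runs.append((start, i))
            start = None
    return runs


# Toy isoforms (invented sequences): the short one skips the middle exon.
exon1, exon2, exon3 = "ATGGCGTACC", "TTGACAGGCT", "CAAGTCCGAT"
isoforms = {"long": exon1 + exon2 + exon3, "short": exon1 + exon3}
print(color_specific_runs(isoforms, "long", 5))
print(color_specific_runs(isoforms, "short", 5))
```

In this toy case the long branch holds ten more k-mers than the short branch, which equals the length of the skipped exon; classifying the event type from such branches is the job of the second module and is not attempted here.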


Subject(s)
Alternative Splicing, Arabidopsis, Animals, Humans, Mice, Transcriptome, RNA Splicing, Sequence Analysis, RNA/methods, RNA, Arabidopsis/genetics, Gene Expression Profiling/methods
2.
BMC Biol ; 21(1): 168, 2023 08 08.
Article in English | MEDLINE | ID: mdl-37553642

ABSTRACT

BACKGROUND: Autopolyploidy is a valuable model for studying whole-genome duplication (WGD) without hybridization, yet little is known about the genomic structural and functional changes that occur in autopolyploids after WGD. Cyclocarya paliurus (Juglandaceae) is a natural diploid-autotetraploid species. We generated an allele-aware autotetraploid genome, a chimeric chromosome-level diploid genome, and whole-genome resequencing data for 106 autotetraploid individuals at an average depth of 60 × per individual, along with 12 diploid individuals at an average depth of 90 × per individual. RESULTS: Autotetraploid C. paliurus had 64 chromosomes clustered into 16 homologous groups, and the majority of homologous chromosomes demonstrated similar chromosome length, gene numbers, and expression. The regions of synteny, structural variation and nonalignment to the diploid genome accounted for 81.3%, 8.8% and 9.9% of the autotetraploid genome, respectively. Our analyses identified 20,626 genes (69.18%) with four alleles and 9191 genes (30.82%) with one, two, or three alleles, suggesting post-polyploid allelic loss. Genes with allelic loss were found to occur more often in proximity to or within structural variations and exhibited a marked overlap with transposable elements. Additionally, such genes showed a reduced tendency to interact with other genes. We also found 102 genes with more than four copies in the autotetraploid genome, and their expression levels were significantly higher than their diploid counterparts. These genes were enriched in enzymes involved in stress response and plant defense, potentially contributing to the evolutionary success of autotetraploids. Our population genomic analyses suggested a single origin of autotetraploids and recent divergence (~ 0.57 Mya) from diploids, with minimal interploidy admixture. 
CONCLUSIONS: Our results indicate the potential for genomic and functional reorganization, which may contribute to evolutionary success in autotetraploid C. paliurus.


Subject(s)
Gene Duplication, Tetraploidy, Humans, Alleles, Polyploidy, Genomics
3.
Mol Biol Evol ; 39(1)2022 01 07.
Article in English | MEDLINE | ID: mdl-34687315

ABSTRACT

Although hybridization plays a large role in speciation, some unknown fraction of hybrid individuals never reproduces, instead remaining as genetic dead-ends. We investigated a morphologically distinct and culturally important Chinese walnut, Juglans hopeiensis, suspected to have arisen from hybridization of Persian walnut (J. regia) with Asian butternuts (J. cathayensis, J. mandshurica, and hybrids between J. cathayensis and J. mandshurica). Based on 151 whole-genome sequences of the relevant taxa, we discovered that all J. hopeiensis individuals are first-generation hybrids, with the time for the onset of gene flow estimated as 370,000 years, implying both strong postzygotic barriers and the presence of J. regia in China by that time. Six inversion regions enriched for genes associated with pollen germination and pollen tube growth may be involved in the postzygotic barriers that prevent sexual reproduction in the hybrids. Despite its long-recurrent origination and distinct traits, J. hopeiensis does not appear on the way to speciation.


Subject(s)
Juglans, Gene Flow, Genomics, Humans, Hybridization, Genetic, Juglans/genetics, Trees
4.
BMC Genomics ; 23(1): 732, 2022 Oct 28.
Article in English | MEDLINE | ID: mdl-36307757

ABSTRACT

BACKGROUND: Structural variants (SVs) play important roles in adaptive evolution and species diversification. In plants especially, many phenotypes of response to the environment were found to be associated with SVs. Despite the prevalence and significance of SVs, long insertions remain poorly detected and studied in all but model species. RESULTS: We used whole-genome resequencing of paired reads from 80 Asian butternuts to detect long insertions and further analyse their characteristics and potential functional effects. By combining mapping-based and de novo assembly-based methods, we obtained a pangenome of multiple related species representing higher taxonomic groups. We obtained 89,312 distinct contigs totaling 147,773,999 base pairs (bp) of new sequences, of which 347 were putative long insertions placed in the reference genome. Most of the putative long insertions appeared in multiple species; in contrast, only 62 putative long insertions appeared in one species, which may be involved in the response to the environment. 65 putative long insertions fell into 61 distinct protein-coding genes involved in plant development, and 105 putative long insertions fell upstream of 106 distinct protein-coding genes involved in cellular respiration. 3,367 genes were annotated in 2,606 contigs. We propose PLAINS (https://github.com/CMB-BNU/PLAINS.git), a streamlined, comprehensive pipeline for the prediction and analysis of long insertions using whole-genome resequencing. CONCLUSIONS: Our study lays down an important foundation for further whole-genome long insertion studies, allowing the investigation of their effects by experiments.


Subject(s)
Asian People, Genome, Humans, Sequence Analysis, DNA/methods
5.
Proc Natl Acad Sci U S A ; 116(6): 2152-2157, 2019 02 05.
Article in English | MEDLINE | ID: mdl-30659151

ABSTRACT

What kind of genetic variation contributes the most to adaptation is a fundamental question in evolutionary biology. By resequencing genomes of 80 individuals, we inferred the origin of genomic variants associated with a complex adaptive syndrome involving multiple quantitative traits, namely, adaptation between high and low altitudes, in the vinous-throated parrotbill (Sinosuthora webbiana) in Taiwan. By comparing these variants with those in the Asian mainland population, we revealed standing variation in 24 noncoding genomic regions to be the predominant genetic source of adaptation. Parrotbills at both high and low altitudes exhibited signatures of recent selection, suggesting that not only the front but also the trailing edges of postglacial expanding populations could be subjected to environmental stresses. This study verifies and quantifies the importance of standing variation in adaptation in a cohort of genes, illustrating that the evolutionary potential of a population depends significantly on its preexisting genetic diversity. These findings provide important context for understanding adaptation and conservation of species in the Anthropocene.


Subject(s)
Adaptation, Biological, Biological Evolution, Genetic Variation, Songbirds/genetics, Animals, Environment, Genetics, Population, Genome, Genomics/methods, Polymorphism, Single Nucleotide, RNA, Untranslated, Selection, Genetic, Taiwan
6.
BMC Bioinformatics ; 22(1): 282, 2021 May 27.
Article in English | MEDLINE | ID: mdl-34044757

ABSTRACT

BACKGROUND: With the rapid development of accurate sequencing and assembly technologies, an increasing number of high-quality chromosome-level and haplotype-resolved assemblies of genomic sequences have been derived, from which there will be great opportunities for computational pangenomics. Although genome graphs are among the most useful models for pangenome representation, their structural complexity makes it difficult to present genome information intuitively, such as the linear reference genome. Thus, efficiently and accurately analyzing the genome graph spatial structure and coordinating the information remains a substantial challenge. RESULTS: We developed a new method, a colored superbubble (cSupB), that can overcome the complexity of graphs and organize a set of species- or population-specific haplotype sequences of interest. Based on this model, we propose a tri-tuple coordinate system that combines an offset value, topological structure and sample information. Additionally, cSupB provides a novel method that utilizes complete topological information and efficiently detects small indels (<50 bp) for highly similar samples, which can be validated by simulated datasets. Moreover, we demonstrated that cSupB can adapt to the complex cycle structure. CONCLUSIONS: Although the solution is made suitable for increasingly complex genome graphs by relaxing the constraint, the directed acyclic graph, the motif cSupB and the cSupB method can be extended to any colored directed acyclic graph. We anticipate that our method will facilitate the analysis of individual haplotype variants and population genomic diversity. We have developed a C++ program for implementing our method that is available at https://github.com/eggleader/cSupB.


Subject(s)
Algorithms, Genomics, Genome, Metagenomics, Sequence Analysis, DNA
7.
BMC Genomics ; 22(1): 762, 2021 Oct 26.
Article in English | MEDLINE | ID: mdl-34702184

ABSTRACT

BACKGROUND: Alternative splicing (AS) is an important mechanism of posttranscriptional modification and dynamically regulates multiple physiological processes in plants, including fruit ripening. However, little is known about alternative splicing during fruit development in fleshy fruits. RESULTS: We studied alternative splicing at the immature and ripe stages during fruit development in cucumber, melon, papaya and peach. We found that 14.96-17.48% of multiexon genes exhibited alternative splicing. Intron retention was not always the most frequent event, indicating that the alternative splicing pattern differs among developmental processes. Alternative splicing was significantly more prevalent at the ripe stage than at the immature stage in cucumber and melon, while the opposite trend was shown in papaya and peach, implying that developmental stages adopt different alternative splicing strategies for their specific functions. Some genes involved in fruit ripening underwent stage-specific alternative splicing, indicating that alternative splicing regulates fruit ripening. Conserved alternative splicing events did not appear to be stage-specific. Clustering fruit developmental stages across the four species based on alternative splicing profiles resulted in species-specific clustering, suggesting that diversification of alternative splicing contributes to lineage-specific evolution in fleshy fruits. CONCLUSIONS: We obtained high-quality transcriptomes and alternative splicing events during fruit development across the four species. Dynamic and nonconserved alternative splicing was discovered. The candidate stage-specific AS genes involved in fruit ripening will provide valuable insight into the roles of alternative splicing during the developmental processes of fleshy fruits.


Subject(s)
Fruit, Prunus persica, Alternative Splicing, Fruit/genetics, Gene Expression Regulation, Plant, Plants, Transcriptome
8.
BMC Plant Biol ; 20(1): 119, 2020 Mar 18.
Article in English | MEDLINE | ID: mdl-32183712

ABSTRACT

BACKGROUND: Alternative splicing (AS) plays a critical regulatory role in modulating transcriptome and proteome diversity. In particular, it increases the functional diversity of proteins. Recent genome-wide analysis of AS using RNA-Seq has revealed that AS is highly pervasive in plants. Furthermore, it has been suggested that most AS events are subject to tissue-specific regulation. DESCRIPTION: To reveal the functional characteristics induced by AS and tissue-specific splicing events, a database for exploring these characteristics is needed, especially in plants. To address these goals, we constructed a database of annotated transcripts generated by alternative splicing in cucumbers (CuAS: http://cmb.bnu.edu.cn/alt_iso/index.php) that integrates genomic annotations, isoform-level functions, isoform-level features, and tissue-specific AS events among multiple tissues. CuAS supports a retrieval system that identifies unique IDs (gene ID, isoform ID, UniProt ID, and gene name), chromosomal positions, and gene families, and a browser for visualization of each gene. CONCLUSION: We believe that CuAS could be helpful for revealing the novel functional characteristics induced by AS and tissue-specific AS events in cucumbers. CuAS is freely available at http://cmb.bnu.edu.cn/alt_iso/index.php.


Subject(s)
Alternative Splicing, Cucumis sativus/genetics, Databases, Genetic, Genes, Plant, Transcriptome
10.
Mol Biol Evol ; 34(1): 20-34, 2017 01.
Article in English | MEDLINE | ID: mdl-27803123

ABSTRACT

Bats can perceive the world by using a wide range of sensory systems, and some of the systems have become highly specialized, such as auditory sensory perception. Among bat species, the Old World leaf-nosed bats and horseshoe bats (rhinolophoid bats) possess the most sophisticated echolocation systems. Here, we reported the whole-genome sequencing and de novo assemblies of two rhinolophoid bats: the great leaf-nosed bat (Hipposideros armiger) and the Chinese rufous horseshoe bat (Rhinolophus sinicus). Comparative genomic analyses revealed the adaptation of auditory sensory perception in the rhinolophoid bat lineages, probably resulting from the extreme selectivity used in the auditory processing by these bats. Pseudogenization of some vision-related genes in rhinolophoid bats was observed, suggesting that these genes have undergone relaxed natural selection. An extensive contraction of olfactory receptor gene repertoires was observed in the lineage leading to the common ancestor of bats. Further extensive gene contractions can be observed in the branch leading to the rhinolophoid bats. Such concordance suggested that molecular changes at one sensory gene might have direct consequences for genes controlling other sensory modalities. To characterize the population genetic structure and patterns of evolution, we re-sequenced the genomes of 20 great leaf-nosed bats from four different geographical locations in China. The results showed similar sequence diversity values and little differentiation among populations. Moreover, evidence of genetic adaptations to high altitudes in the great leaf-nosed bats was observed. Taken together, our work provided a useful resource for future research on the evolution of bats.


Subject(s)
Chiroptera/genetics, Echolocation/physiology, Genome, Adaptation, Physiological/genetics, Amino Acid Sequence, Animals, Base Sequence, Biological Evolution, China, Comparative Genomic Hybridization/methods, Evolution, Molecular, Female, Phylogeny, Selection, Genetic
11.
BMC Plant Biol ; 18(1): 5, 2018 01 05.
Article in English | MEDLINE | ID: mdl-29301488

ABSTRACT

BACKGROUND: Alternative splicing (AS) is an important post-transcriptional process. It has been suggested that most AS events are subject to tissue-specific regulation. However, the global dynamics of AS in different tissues are poorly explored. RESULTS: To analyse global changes in AS in multiple tissues, we identified the AS events and constructed a comprehensive catalogue of AS events within each tissue based on the genome-wide RNA-seq reads from ten tissues in cucumber. First, we found that 58% of the multi-exon genes underwent AS. We further obtained 565 genes with significantly more AS events compared with random genes. These genes were significantly enriched in biological processes related to the regulation of actin filament length. Second, significantly different AS event profiles among the ten tissues were found. Tissues with the same developmental origin are more likely to have a relatively similar AS profile. Moreover, 7370 genes showed tissue-specific AS events and were highly enriched in biological processes related to the positive regulation of cellular component organization. Root-specific AS genes were related to the cellular response to DNA damage stimulus. Third, the genes with different intron retention (IR) patterns among the ten tissues showed significant differences in the GC percentage of the retained intron, and in the number of exons and FPKM of the major transcripts. CONCLUSIONS: Our study provided a comprehensive view of AS in multiple tissues. We revealed novel insights into the patterns of AS in multiple tissues and the tissue-specific AS in cucumber.


Subject(s)
Alternative Splicing, Cucumis sativus/genetics, Transcriptome, Cucumis sativus/metabolism, Gene Expression Profiling
12.
PLoS Genet ; 10(1): e1004078, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24415955

ABSTRACT

Nuclei of arbuscular endomycorrhizal fungi have been described as highly diverse due to their asexual nature and absence of a single cell stage with only one nucleus. This has raised fundamental questions concerning speciation, selection and transmission of the genetic make-up to next generations. Although this concept has become textbook knowledge, it is only based on studying a few loci, including 45S rDNA. To provide a more comprehensive insight into the genetic makeup of arbuscular endomycorrhizal fungi, we applied de novo genome sequencing of individual nuclei of Rhizophagus irregularis. This revealed a surprisingly low level of polymorphism between nuclei. In contrast, within a nucleus, the 45S rDNA repeat unit turned out to be highly diverged. This finding demystifies a long-lasting hypothesis on the complex genetic makeup of arbuscular endomycorrhizal fungi. Subsequent genome assembly resulted in the first draft reference genome sequence of an arbuscular endomycorrhizal fungus. Its length is 141 Mbps, representing over 27,000 protein-coding gene models. We used the genomic sequence to reinvestigate the phylogenetic relationships of Rhizophagus irregularis with other fungal phyla. This unambiguously demonstrated that Glomeromycota are more closely related to Mucoromycotina than to its postulated sister Dikarya.


Subject(s)
Cell Nucleus/genetics, DNA, Ribosomal/genetics, Genome, Fungal, Phylogeny, Base Sequence, High-Throughput Nucleotide Sequencing, Molecular Sequence Data, Mycorrhizae/genetics, Open Reading Frames/genetics, Spores, Fungal/genetics
13.
BMC Evol Biol ; 16(1): 260, 2016 12 01.
Article in English | MEDLINE | ID: mdl-27903259

ABSTRACT

BACKGROUND: The identification, description and understanding of protein-protein networks are important in cell biology and medicine, especially for the study of system biology where the focus concerns the interaction of biomolecules. Hubs and bottlenecks refer to the important proteins of a protein interaction network. Until now, very little attention has been paid to differentiate these two protein groups. RESULTS: By integrating human protein-protein interaction networks and human genome-wide variations across populations, we described the differences between hubs and bottlenecks in this study. Our findings showed that similar to interspecies, hubs and bottlenecks changed significantly more slowly than non-hubs and non-bottlenecks. To distinguish hubs from bottlenecks, we extracted their special members: hub-non-bottlenecks and non-hub-bottlenecks. The differences between these two groups represent what is between hubs and bottlenecks. We found that the variation rate of hubs was significantly lower than that of bottlenecks. In addition, we verified that stronger constraint is exerted on hubs than on bottlenecks. We further observed fewer non-synonymous sites on the domains of hubs than on those of bottlenecks and different molecular functions between them. CONCLUSIONS: Based on these results, we conclude that in recent human history, different variation patterns exist in hubs and bottlenecks in protein interaction networks. By revealing the difference between hubs and bottlenecks, our results might provide further insights in the relationship between evolution and biological structure.
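
The four groups contrasted above can be sketched on a toy network. The abstract does not state how hubs and bottlenecks were defined, so this sketch assumes the common convention that hubs are high-degree nodes and bottlenecks are high-betweenness nodes; the thresholds, the toy edges and the function names are all hypothetical. Betweenness is computed with Brandes' algorithm for unweighted, undirected graphs.

```python
from collections import deque


def build_graph(edges):
    """Undirected adjacency sets from a list of (u, v) pairs."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def betweenness(graph):
    """Unnormalized betweenness centrality (Brandes' algorithm, BFS version)."""
    bc = dict.fromkeys(graph, 0.0)
    for s in graph:
        stack, preds = [], {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)   # number of shortest paths from s
        dist = dict.fromkeys(graph, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(graph, 0.0)
        while stack:                       # back-propagate dependencies
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}  # each pair was counted twice


def classify(graph, min_degree, min_betweenness):
    """Label each node, e.g. 'hub-non-bottleneck' or 'non-hub-bottleneck'."""
    bc = betweenness(graph)
    labels = {}
    for v, nbrs in graph.items():
        hub = "hub" if len(nbrs) >= min_degree else "non-hub"
        bott = "bottleneck" if bc[v] >= min_betweenness else "non-bottleneck"
        labels[v] = hub + "-" + bott
    return labels


# Two 4-node cliques joined through a single bridge protein "x" (toy data).
clique_a = [("a1", "a2"), ("a1", "a3"), ("a1", "a4"), ("a2", "a3"), ("a2", "a4"), ("a3", "a4")]
clique_b = [("b1", "b2"), ("b1", "b3"), ("b1", "b4"), ("b2", "b3"), ("b2", "b4"), ("b3", "b4")]
ppi = build_graph(clique_a + clique_b + [("a1", "x"), ("x", "b1")])
print(classify(ppi, min_degree=3, min_betweenness=15))
```

Here the bridge node has few partners but lies on every path between the two cliques, so it is a non-hub-bottleneck, while the inner clique members are hub-non-bottlenecks; comparing variation rates between such groups would be the next step and needs real variant data.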


Subject(s)
Protein Interaction Maps, Proteins/chemistry, Evolution, Molecular, Genetic Variation, Genome, Human, Humans, Protein Domains, Protein Interaction Mapping, Proteins/genetics
14.
Mol Genet Genomics ; 291(3): 1127-36, 2016 Jun.
Article in English | MEDLINE | ID: mdl-26833483

ABSTRACT

Protein evolution plays an important role in the evolution of each genome. Because of their functional nature, in general, most of their parts or sites are differently constrained selectively, particularly by purifying selection. Most previous studies on protein evolution considered individual proteins in their entirety or compared protein-coding sequences with non-coding sequences. Less attention has been paid to the evolution of different parts within each protein of a given genome. To this end, based on PfamA annotation of all human proteins, each protein sequence can be split into two parts: domains or unassigned regions. Using this rationale, single nucleotide polymorphisms (SNPs) in protein-coding sequences from the 1000 Genomes Project were mapped according to two classifications: SNPs occurring within protein domains and those within unassigned regions. With these classifications, we found: the density of synonymous SNPs within domains is significantly greater than that of synonymous SNPs within unassigned regions; however, the density of non-synonymous SNPs shows the opposite pattern. We also found there are signatures of purifying selection on both the domain and unassigned regions. Furthermore, the selective strength on domains is significantly greater than that on unassigned regions. In addition, among all of the human protein sequences, there are 117 PfamA domains in which no SNPs are found. Our results highlight an important aspect of protein domains and may contribute to our understanding of protein evolution.
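
The domain versus unassigned-region comparison can be sketched as a density calculation. This is only an illustration under stated assumptions: positions are 1-based residue indices, domain intervals are inclusive, density is taken as SNP count divided by region length, and the function name `snp_densities` and the toy numbers are hypothetical rather than taken from the study.

```python
def snp_densities(protein_length, domains, snps):
    """Density of each SNP type inside annotated domains and in the
    remaining (unassigned) part of one protein.

    domains: list of inclusive (start, end) intervals, 1-based
    snps:    list of (position, snp_type) with snp_type in
             {"synonymous", "non-synonymous"}
    """
    in_domain = [False] * (protein_length + 1)  # index 0 is unused
    for start, end in domains:
        for pos in range(start, end + 1):
            in_domain[pos] = True

    domain_len = sum(in_domain)
    region_len = {"domain": domain_len, "unassigned": protein_length - domain_len}

    counts = {(region, snp_type): 0
              for region in region_len
              for snp_type in ("synonymous", "non-synonymous")}
    for pos, snp_type in snps:
        region = "domain" if in_domain[pos] else "unassigned"
        counts[(region, snp_type)] += 1

    # Skip regions of zero length to avoid dividing by zero.
    return {key: n / region_len[key[0]]
            for key, n in counts.items() if region_len[key[0]] > 0}


# Toy protein of 100 residues with one domain covering residues 11-50.
toy_snps = [(5, "synonymous"), (12, "synonymous"), (20, "synonymous"),
            (30, "non-synonymous"), (60, "non-synonymous"),
            (70, "non-synonymous"), (80, "non-synonymous")]
for key, value in sorted(snp_densities(100, [(11, 50)], toy_snps).items()):
    print(key, round(value, 4))
```

The toy values are chosen to mirror the direction reported above (more synonymous SNPs per site in the domain, more non-synonymous SNPs per site outside it); a real analysis would pool many proteins and test the difference statistically.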


Subject(s)
Polymorphism, Single Nucleotide, Proteins/chemistry, Proteins/genetics, Amino Acid Sequence, Chromosome Mapping, Evolution, Molecular, Gene Frequency, Humans, Open Reading Frames, Protein Domains, Selection, Genetic
15.
Int J Mol Sci ; 16(3): 5900-21, 2015 Mar 13.
Article in English | MEDLINE | ID: mdl-25782156

ABSTRACT

Anthocyanin is the main pigment forming floral diversity. Several transcription factors that regulate the expression of anthocyanin biosynthetic genes belong to the R2R3-MYB family. Here we examined the transcriptomes of inflorescence buds of Scutellaria species (skullcaps), identified the expressed R2R3-MYBs, and detected the genetic signatures of positive selection for adaptive divergence across the rapidly evolving skullcaps. In the inflorescence buds, seven R2R3-MYBs were identified. MYB11 and MYB16 were detected to be positively selected. The signature of positive selection on MYB genes indicated that species diversification could be affected by transcriptional regulation, rather than at the translational level. When comparing among the background lineages of Arabidopsis, tomato, rice, and Amborella, heterogeneous evolutionary rates were detected among MYB paralogs, especially between MYB13 and MYB19. Significantly different evolutionary rates were also evidenced by type-I functional divergence between MYB13 and MYB19, and the accelerated evolutionary rate of MYB19 implied the acquisition of novel functions. Another paralogous pair, MYB2/7 and MYB11, revealed significant radical amino acid changes, indicating divergence in the regulation of different anthocyanin-biosynthetic enzymes. Our findings not only showed that Scutellaria R2R3-MYBs are functionally divergent and positively selected, but also indicated the adaptive relevance of regulatory genes in floral diversification.


Subject(s)
Gene Expression Regulation, Plant, Genes, Plant, Plant Proteins/genetics, Scutellaria/genetics, Transcription Factors/genetics, Amino Acid Sequence, Evolution, Molecular, Inflorescence/genetics, Inflorescence/metabolism, Molecular Sequence Data, Phylogeny, Plant Proteins/metabolism, Scutellaria/classification, Scutellaria/metabolism, Selection, Genetic, Sequence Alignment, Transcription Factors/metabolism
16.
Sci Rep ; 14(1): 6102, 2024 03 13.
Article in English | MEDLINE | ID: mdl-38480729

ABSTRACT

The mechanisms underlying the organization and evolution of the telencephalic pallium are not yet clear. To address this issue, we first performed comparative analysis of genes critical for the development of the pallium (Emx1/2 and Pax6) and subpallium (Dlx2 and Nkx1/2) among 500 vertebrate species. We found that these genes have no obvious variations in chromosomal duplication/loss, gene locus synteny or Darwinian selection. However, there is an additional fragment of approximately 20 amino acids in mammalian Emx1 and a poly-(Ala)6-7 in Emx2. Lentiviruses expressing mouse or chick Emx2 (m-Emx2 or c-Emx2 Lv) were injected into the ventricle of the chick telencephalon at embryonic Day 3 (E3), and the embryos were allowed to develop to E12-14 or to posthatchling. After transfection with m-Emx2 Lv, the cells expressing Reelin, Vimentin or GABA increased, and neurogenesis of calbindin cells changed towards the mammalian inside-out pattern in the dorsal pallium and mesopallium. In addition, a behavior test for posthatched chicks indicated that the passive avoidance ratio increased significantly. The study suggests that the acquisition of an additional fragment in mammalian Emx2 is associated with the organization and evolution of the mammalian pallium.


Subject(s)
Cerebral Cortex, Telencephalon, Mice, Animals, Telencephalon/metabolism, Cerebral Cortex/metabolism, Brain/metabolism, Mammals/metabolism, Homeodomain Proteins/genetics, Homeodomain Proteins/metabolism, Gene Expression Regulation, Developmental
17.
Mol Ecol Resour ; 23(2): 499-510, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36239149

ABSTRACT

Polyploidy is ubiquitous and its consequences are complex and variable. A change of ploidy level generally influences genetic diversity and results in morphological, physiological and ecological differences between cells or organisms with different ploidy levels. To avoid cumbersome experiments and take advantage of the less biased information provided by the vast amounts of genome sequencing data, computational tools for ploidy estimation are urgently needed. Until now, although a few such tools have been developed, many aspects of this estimation, such as the requirement of a reference genome, the lack of informative results and objective inferences, and the influence of false positives from errors and repeats, need further improvement. We have developed ploidyfrost, a de Bruijn graph-based method, to estimate ploidy levels from whole genome sequencing data sets without a reference genome. ploidyfrost provides a visual representation of allele frequency distribution generated using the ggplot2 package as well as quantitative results using the Gaussian mixture model. In addition, it takes advantage of colouring information encoded in coloured de Bruijn graphs to analyse multiple samples simultaneously and to flexibly filter putative false positives. We evaluated the performance of ploidyfrost by analysing highly heterozygous or repetitive samples of Cyclocarya paliurus and a complex allooctoploid sample of Fragaria × ananassa. Moreover, we demonstrated that the accuracy of analysis results can be improved by constraining a threshold such as Cramér's V coefficient on variant features, which may significantly reduce the side effects of sequencing errors and annoying repeats on the graphical structure constructed.
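
The link between allele frequencies and ploidy can be shown with a deliberately simplified sketch. It is not ploidyfrost's procedure: instead of fitting a Gaussian mixture model, it assumes that a ploidy of p produces allele-frequency peaks at i/p for i = 1..p-1, gives every peak equal weight and a fixed standard deviation, and picks the candidate ploidy with the highest log-likelihood. The names `expected_peaks`, `log_likelihood` and `best_ploidy`, the value of `sigma` and the toy frequencies are all assumptions.

```python
import math


def expected_peaks(ploidy):
    """Allele-frequency peaks expected at heterozygous sites for a ploidy."""
    return [i / ploidy for i in range(1, ploidy)]


def log_likelihood(freqs, ploidy, sigma=0.03):
    """Log-likelihood of observed frequencies under an equal-weight mixture
    of Gaussians centred on the expected peaks (fixed sigma, an assumption)."""
    peaks = expected_peaks(ploidy)
    norm = 1.0 / (sigma * math.sqrt(2 * math.pi))
    total = 0.0
    for f in freqs:
        density = sum(norm * math.exp(-0.5 * ((f - p) / sigma) ** 2)
                      for p in peaks) / len(peaks)
        total += math.log(max(density, 1e-300))  # guard against log(0)
    return total


def best_ploidy(freqs, candidates=(2, 3, 4, 6, 8), sigma=0.03):
    """Candidate ploidy with the highest log-likelihood."""
    return max(candidates, key=lambda p: log_likelihood(freqs, p, sigma))


# Toy frequencies clustered near 0.25, 0.5 and 0.75 (invented values).
observed = [0.24, 0.26, 0.25, 0.50, 0.49, 0.51, 0.75, 0.74, 0.76]
print(best_ploidy(observed))
```

Because each peak carries equal weight, higher ploidies are penalized for peaks the data never occupy, which is what keeps an octoploid model from winning on tetraploid-looking data in this toy setting. Real data would also need the error and repeat filtering described above.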


Subject(s)
Algorithms, Ploidies, Sequence Analysis, DNA/methods, Whole Genome Sequencing, Alleles, High-Throughput Nucleotide Sequencing/methods, Software
18.
Hortic Res ; 10(4): uhad038, 2023 Apr.
Article in English | MEDLINE | ID: mdl-37799630

ABSTRACT

Cis-regulatory elements regulate gene expression and play an essential role in the development and physiology of organisms. Many conserved non-coding sequences (CNSs) function as cis-regulatory elements. They control the development of various lineages. However, predicting clade-wide cis-regulatory elements across several closely related species remains challenging. Based on the relationship between CNSs and cis-regulatory elements, we present a computational approach that predicts the clade-wide putative cis-regulatory elements in 12 Cucurbitaceae genomes. Using 12-way whole-genome alignment, we first obtained 632 112 CNSs in Cucurbitaceae. Next, we identified 16 552 Cucurbitaceae-wide cis-regulatory elements based on collinearity among all 12 Cucurbitaceae plants. Furthermore, we predicted 3 271 potential regulatory pairs in the cucumber genome, of which 98 were verified using integrative RNA sequencing and ChIP sequencing datasets from samples collected during various fruit development stages. The CNSs, Cucurbitaceae-wide cis-regulatory elements, and their target genes are accessible at http://cmb.bnu.edu.cn/cisRCNEs_cucurbit/. These elements are valuable resources for functionally annotating CNSs and their regulatory roles in Cucurbitaceae genomes.

19.
G3 (Bethesda) ; 12(7)2022 07 06.
Article in English | MEDLINE | ID: mdl-35554526

ABSTRACT

Population-specific, positive selection promotes the diversity of populations and drives local adaptations in the population. However, little is known about population-specific, recent positive selection in the populations of cultivated cucumber (Cucumis sativus L.). Based on a genomic variation map of individuals worldwide, we implemented a Fisher's combination method by combining 4 haplotype-based approaches: integrated haplotype score (iHS), number of segregating sites by length (nSL), cross-population extended haplotype homozygosity (XP-EHH), and Rsb. Overall, we detected 331, 2,147, and 3,772 population-specific, recent positive selective sites in the East Asian, Eurasian, and Xishuangbanna populations, respectively. Moreover, we found that these sites were related to processes for reproduction, response to abiotic and biotic stress, and regulation of developmental processes, indicating adaptations to their microenvironments. Meanwhile, the selective genes associated with traits of fruits were also observed, such as the gene related to the shorter fruit length in the Eurasian population and the gene controlling flesh thickness in the Xishuangbanna population. In addition, we noticed that soft sweeps were common in the East Asian and Xishuangbanna populations. Genes involved in hard or soft sweeps were related to developmental regulation and abiotic and biotic stress resistance. Our study offers a comprehensive candidate dataset of population-specific, selective signatures in cultivated cucumber populations. Our methods provide guidance for the analysis of population-specific, positive selection. These findings will help explore the biological mechanisms of adaptation and domestication of cucumber.
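
Fisher's combination method itself is compact enough to sketch. For k p-values the statistic is X = -2 * sum(ln p_i), which follows a chi-square distribution with 2k degrees of freedom under the null hypothesis if the tests are independent. The sketch below assumes that each of the four statistics (iHS, nSL, XP-EHH, Rsb) has already been converted to a p-value per site, which the abstract does not describe, and the example p-values are invented. The function name `fisher_combined_pvalue` is hypothetical.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Return (statistic, combined p-value) for Fisher's method.

    For an even number of degrees of freedom 2k, the chi-square survival
    function has the closed form exp(-x/2) * sum_{j<k} (x/2)^j / j!,
    so no external statistics library is needed.
    """
    k = len(pvalues)
    stat = -2.0 * sum(math.log(p) for p in pvalues)
    half = stat / 2.0
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half / j       # (x/2)^j / j!
        total += term
    return stat, math.exp(-half) * total


# Invented per-site p-values for iHS, nSL, XP-EHH and Rsb.
stat, p_combined = fisher_combined_pvalue([0.04, 0.10, 0.02, 0.07])
print(round(stat, 3), p_combined)
```

The four haplotype statistics are likely correlated, so the independence assumption behind the chi-square reference distribution is at best approximate; how the study handled this is not stated in the abstract.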


Subject(s)
Cucumis sativus, Chromosome Mapping, Cucumis sativus/genetics, Domestication, Fruit/genetics, Humans, Phenotype
20.
iScience ; 25(11): 105345, 2022 Nov 18.
Article in English | MEDLINE | ID: mdl-36325068

ABSTRACT

Alternative splicing is crucial for a wide range of biological processes. However, limited by the availability of reference genomes, genome-wide patterns of alternative splicing remain unknown in most nonmodel organisms. We present an attention-based convolutional neural network model, DeepASmRNA, for predicting alternative splicing events using only transcriptomic data. DeepASmRNA consists of two parts: identification of alternatively spliced transcripts and classification of alternative splicing events, which outperformed the state-of-the-art method, AStrap, and other deep learning models. Then, we utilize transfer learning to increase the performance in species with limited training data and use an interpretation method to decipher splicing codes. Finally, when applied to Amborella, DeepASmRNA can identify more AS events than AStrap while maintaining the same level of precision, suggesting that DeepASmRNA has superior sensitivity to identify alternative splicing events. In summary, DeepASmRNA is scalable and interpretable for detecting genome-wide patterns of alternative splicing in species without a reference genome.
