Results 1 - 20 of 21
1.
Mol Biol Evol ; 38(9): 3737-3741, 2021 08 23.
Article in English | MEDLINE | ID: mdl-33956142

ABSTRACT

Genome size in cellular organisms varies by six orders of magnitude, yet the cause of this large variation remains unexplained. The influential Drift-Barrier Hypothesis proposes that large genomes tend to evolve in small populations due to inefficient selection. However, to our knowledge no explicit tests of the Drift-Barrier Hypothesis have been reported. We performed the first explicit test, by comparing estimated census population size and genome size in mammals while incorporating potential covariates and the effect of shared evolutionary history. We found a lack of correlation between census population size and genome size among 199 species of mammals. These results suggest that population size is not the predominant factor influencing genome size and that the Drift-Barrier Hypothesis should be considered provisional.


Subjects
Molecular Evolution, Mammals, Animals, Biological Evolution, Genome Size, Mammals/genetics, Population Density
2.
Mol Biol Evol ; 33(5): 1257-69, 2016 05.
Article in English | MEDLINE | ID: mdl-26769030

ABSTRACT

Why are certain bacterial genomes so small and compact? The adaptive genome streamlining hypothesis posits that selection acts to reduce genome size because of the metabolic burden of replicating DNA. To reveal the impact of genome streamlining on cellular traits, we reduced the Escherichia coli genome by up to 20% by deleting regions that have repeatedly been subject to horizontal transfer in nature. Unexpectedly, horizontally transferred genes not only confer utilization of specific nutrients and elevate tolerance to stresses, but also allow efficient usage of resources to build new cells, and hence influence fitness in routine and stressful environments alike. Genome reduction affected fitness not only by gene loss, but also by induction of a general stress response. Finally, we failed to find evidence that the advantage of smaller genomes would be due to a reduced metabolic burden of replicating DNA or a link with smaller cell size. We conclude that as the potential energetic benefit gained by deletion of short genomic segments is vanishingly small compared with the deleterious side effects of these deletions, selection for reduced DNA synthesis costs is unlikely to shape the evolution of small genomes.


Subjects
Horizontal Gene Transfer, Genome Size, Bacterial Genome, Biological Evolution, Escherichia coli/genetics, Molecular Evolution, Bacterial Genes, Phylogeny
3.
Exp Mol Pathol ; 100(2): 248-56, 2016 Apr.
Article in English | MEDLINE | ID: mdl-26779669

ABSTRACT

Laboratory strains of mice, both conventional and genetically engineered, have been introduced as critical components of a broad range of studies investigating normal and disease biology. Currently, the genetic identity of laboratory mice is primarily confirmed by surveying polymorphisms in selected sets of "conventional" genes and/or microsatellites in the absence of a single completely sequenced mouse genome. First, we examined variations in the genomic landscapes of transposable repetitive elements, named the TREome, in conventional and genetically engineered mouse strains using murine leukemia virus-type endogenous retroviruses (MLV-ERVs) as a probe. A survey of the genomes from 56 conventional strains revealed strain-specific TREome landscapes, and certain families (e.g., C57BL) of strains were discernible with defined patterns. Interestingly, the TREome landscapes of C3H/HeJ (toll-like receptor-4 [TLR4] mutant) inbred mice were different from its control C3H/HeOuJ (TLR4 wild-type) strain. In addition, a CD14 knock-out strain had a distinct TREome landscape compared to its control/backcross C57BL/6J strain. Second, an examination of superantigen (SAg, a "TREome gene") coding sequences of mouse mammary tumor virus-type ERVs in the genomes of the 46 conventional strains revealed a high diversity, suggesting a potential role of SAgs in strain-specific immune phenotypes. The findings from this study indicate that unexplored and intricate genomic variations exist in laboratory mouse strains, both conventional and genetically engineered. The TREome-based high-resolution genetics surveillance system for laboratory mice would contribute to efficient study design with quality control and accurate data interpretation. This genetics system can be easily adapted to other species ranging from plants to humans.


Subjects
Endogenous Retroviruses/genetics, Genetic Engineering/methods, Genome/genetics, Genomics, Animals, Base Sequence, Female, Murine Leukemia Virus/genetics, Male, Mice, 129 Strain Mice, Inbred AKR Mice, Inbred BALB C Mice, Inbred C3H Mice, Inbred C57BL Mice, Inbred NOD Mice, Inbred Strain Mice, Knockout Mice, Molecular Sequence Data, Genetic Polymorphism, Nucleic Acid Sequence Homology, Species Specificity
4.
aBIOTECH ; 5(1): 52-70, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38576428

ABSTRACT

Bread wheat (Triticum aestivum) is an important crop and serves as a significant source of protein and calories for humans, worldwide. Nevertheless, its large and allopolyploid genome poses constraints on genetic improvement. The complex reticulate evolutionary history and the intricacy of genomic resources make the deciphering of the functional genome considerably more challenging. Recently, we have developed a comprehensive list of versatile computational tools with the integration of statistical models for dissecting the polyploid wheat genome. Here, we summarize the methodological innovations and applications of these tools and databases. A series of step-by-step examples illustrates how these tools can be utilized for dissecting wheat germplasm resources and unveiling functional genes associated with important agronomic traits. Furthermore, we outline future perspectives on new advanced tools and databases, taking into consideration the unique features of bread wheat, to accelerate genomic-assisted wheat breeding.

5.
Comput Biol Chem ; 112: 108107, 2024 Oct.
Article in English | MEDLINE | ID: mdl-38875896

ABSTRACT

Spontaneous mutations are evolutionary engines as they generate variants for the evolutionary downstream processes that give rise to speciation and adaptation. Single nucleotide mutations (SNM) are the most abundant type of mutations among them. Here, we perform a meta-analysis to quantify the influence of selected global genomic parameters (genome size, genomic GC content, genomic repeat fraction, number of coding genes, gene count, and strand bias in prokaryotes) and local genomic features (local GC content, repeat content, CpG content and the number of SNM at CpG islands) on spontaneous SNM rates across the tree of life (prokaryotes, unicellular eukaryotes, multicellular eukaryotes) using wild-type sequence data in two different taxon classification systems. We find that the spontaneous SNM rates in our data are correlated with many genomic features in prokaryotes and unicellular eukaryotes irrespective of their sample sizes. On the other hand, only the number of coding genes was correlated with the spontaneous SNM rates in multicellular eukaryotes, a signal contributed primarily by vertebrate data. Considering local features, we notice that local GC content and CpG content were significantly correlated with the spontaneous SNM rates in the unicellular eukaryotes, while local repeat fraction is an important feature in prokaryotes and certain specific uni- and multi-cellular eukaryotes. Such predictive features of the spontaneous SNM rates often support non-linear models as the best fit compared to the linear model. We also observe that the strand asymmetry in prokaryotes plays an important role in determining the spontaneous SNM rates but the SNM spectrum does not.


Subjects
Base Composition, Mutation Rate, Genomics, Genome/genetics, Nucleotides/genetics, Prokaryotic Cells/metabolism, CpG Islands/genetics, Animals
6.
Plant Biotechnol J ; 11(7): 809-17, 2013 Sep.
Article in English | MEDLINE | ID: mdl-23639032

ABSTRACT

Marker development for marker-assisted selection in plant breeding is increasingly based on next-generation sequencing (NGS). However, marker development in crops with highly repetitive, complex genomes is still challenging. Here we applied sequence-based genotyping (SBG), which couples AFLP®-based complexity reduction to NGS, for de novo single nucleotide polymorphism (SNP) marker discovery in and genotyping of a biparental durum wheat population. We identified 9983 putative SNPs in 6372 contigs between the two parents and used these SNPs for genotyping 91 recombinant inbred lines (RILs). Excluding redundant information from multiple SNPs per contig, 2606 (41%) markers were used for integration in a pre-existing framework map, resulting in the integration of 2365 markers over 2607 cM. Of the 2606 markers available for mapping, 91% were integrated in the pre-existing map, containing 708 SSRs, DArT markers, and SNPs from CRoPS technology, with a map-size increase of 492 cM (23%). These results demonstrate the high quality of the discovered SNP markers. With this methodology, it was possible to saturate the map at a final marker density of 0.8 cM/marker. Looking at the binned marker distribution (Figure 2), 63 of the 268 10-cM bins contained only SBG markers, showing that these markers are filling in gaps in the framework map. As to the markers that could not be used for mapping, the main reason was the low sequencing coverage used for genotyping. We conclude that SBG is a valuable tool for efficient, high-throughput and high-quality marker discovery and genotyping for complex genomes such as that of durum wheat.
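
The figures reported in this abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes that the final marker count is the 708 framework markers plus the 2365 newly integrated ones, which the abstract does not state explicitly, and all variable names are hypothetical.

```python
# Values quoted in the abstract.
contigs = 6372
markers_available = 2606      # non-redundant markers offered for mapping
markers_integrated = 2365     # markers placed on the framework map
framework_markers = 708       # markers already on the pre-existing map
final_map_cm = 2607           # map length after integration
map_increase_cm = 492         # reported increase in map length

# Share of contigs that yielded a usable marker (abstract: 41%).
usable_fraction = markers_available / contigs

# Share of available markers that were integrated (abstract: 91%).
integrated_fraction = markers_integrated / markers_available

# Relative map-size increase versus the original map (abstract: 23%).
original_map_cm = final_map_cm - map_increase_cm
relative_increase = map_increase_cm / original_map_cm

# Marker density, ASSUMING total markers = framework + integrated
# (abstract: 0.8 cM/marker).
density_cm_per_marker = final_map_cm / (framework_markers + markers_integrated)

if __name__ == "__main__":
    print(f"usable contigs:      {usable_fraction:.1%}")
    print(f"markers integrated:  {integrated_fraction:.1%}")
    print(f"map-size increase:   {relative_increase:.1%}")
    print(f"density (cM/marker): {density_cm_per_marker:.2f}")
```

Under that assumption, each computed value rounds to the percentage or density quoted in the abstract.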


Subjects
Genotyping Techniques, Single Nucleotide Polymorphism, Triticum/genetics, Agricultural Crops/genetics, Genetic Markers, Plant Genome
7.
Biology (Basel) ; 12(9)2023 Sep 13.
Article in English | MEDLINE | ID: mdl-37759633

ABSTRACT

Dinoflagellates are important primary producers known to form Harmful Algal Blooms (HABs). In water, nutrient availability, pH, salinity and anthropogenic contamination constitute chemical stressors for them. The emergence of OMICs approaches propelled our understanding of dinoflagellates' responses to stressors. However, in dinoflagellates, these approaches are still biased, as transcriptomic studies far outnumber proteomic and metabolomic ones. Furthermore, integrated OMICs approaches are just emerging. Here, we report recent contributions of the different OMICs approaches to the investigation of dinoflagellates' responses to chemical stressors and discuss the current challenges we need to face to push studies further despite the lack of genomic resources available for dinoflagellates.

8.
Annu Rev Virol ; 9(1): 79-98, 2022 Sep 29.
Article in English | MEDLINE | ID: mdl-35655338

ABSTRACT

For decades, viruses have been isolated primarily from humans and other organisms. Interestingly, one of the most complex sides of the virosphere was discovered using free-living amoebae as hosts. The discovery of giant viruses in the early twenty-first century opened a new chapter in the field of virology. Giant viruses are included in the phylum Nucleocytoviricota and harbor large and complex DNA genomes (up to 2.7 Mb) encoding genes never before seen in the virosphere and presenting gigantic particles (up to 1.5 µm). Different amoebae have been used to isolate and characterize a plethora of new viruses with exciting details about novel viral biology. Through distinct isolation techniques and metagenomics, the diversity and complexity of giant viruses have astonished the scientific community. Here, we discuss the latest findings on amoeba viruses and how using these single-celled organisms as hosts has revealed entities that have remained hidden in plain sight for ages.


Subjects
Amoeba, Giant Viruses, Viruses, DNA Viruses/genetics, Viral Genome, Giant Viruses/genetics, Humans, Metagenomics, Phylogeny, Viruses/genetics
9.
Genome Biol Evol ; 13(7)2021 07 06.
Article in English | MEDLINE | ID: mdl-34061182

ABSTRACT

Organellar genomes serve as useful models for genome evolution and contain some of the most widely used phylogenetic markers, but they are poorly characterized in many lineages. Here, we report 20 novel mitochondrial genomes and 16 novel plastid genomes from the brown algae. We focused our efforts on the orders Chordales and Laminariales but also provide the first plastid genomes (plastomes) from Desmarestiales and Sphacelariales, the first mitochondrial genome (mitome) from Ralfsiales and a nearly complete mitome from Sphacelariales. We then compared gene content, sequence evolution rates, shifts in genome structural arrangements, and intron distributions across lineages. We confirm that gene content is largely conserved in both organellar genomes across the brown algal tree of life, with few cases of gene gain or loss. We further show that substitution rates are generally lower in plastid than mitochondrial genes, but plastomes are more variable in gene arrangement, as mitomes tend to be colinear even among distantly related lineages (with exceptions). Patterns of intron distribution across organellar genomes are complex. In particular, the mitomes of several laminarialean species possess group II introns that have T7-like ORFs, found previously only in mitochondrial genomes of Pylaiella spp. (Ectocarpales). The distribution of these mitochondrial introns is inconsistent with vertical transmission and likely reflects invasion by horizontal gene transfer between lineages. In the most extreme case, the mitome of Hedophyllum nigripes is ∼40% larger than the mitomes of close relatives because of these introns. Our results provide substantial insight into organellar evolution across the brown algae.


Subjects
Mitochondrial Genome, Plastid Genomes, Phaeophyceae, Molecular Evolution, Genomics, Introns, Phaeophyceae/genetics, Phylogeny, Plastids/genetics
11.
Methods Mol Biol ; 1890: 251-258, 2019.
Article in English | MEDLINE | ID: mdl-30414160

ABSTRACT

Next-generation DNA sequencing has ushered in a new era of genotype-phenotype comparisons that have the potential to elucidate the genetic nature of complex traits. Since such methods rely on short sequence reads, and since the human genome is composed largely of repetitive DNA elements larger than these read lengths, many results cannot be mapped and are discarded, thus eliminating a large portion of the genome from analysis. Discerning associations in complex traits, such as longevity, will require either longer read lengths or methods to address these sequence complexities. Whole genome analysis, such as Genome Wide Association Studies (GWAS), also suffers from the repetitive nature of the human genome, as there exist many gaps in the availability of usable genetic markers, often in interesting regulatory regions. Methods are described here whereby some of these problems have been addressed by targeted DNA sequencing, full exploitation of available public databases, and a careful evaluation of genomic features, where we use the FOXO3 gene as an example to identify functional variations and how they may relate to longevity.


Subjects
Forkhead Box Protein O3/genetics, Genetic Association Studies, Longevity/genetics, Single Nucleotide Polymorphism, Genetic Association Studies/methods, High-Throughput Nucleotide Sequencing, Humans, Polymerase Chain Reaction, DNA Sequence Analysis
12.
Evodevo ; 9: 22, 2018.
Article in English | MEDLINE | ID: mdl-30455862

ABSTRACT

BACKGROUND: How genome complexity affects organismal phenotypic complexity is a fundamental question in evolutionary developmental biology. Previous studies proposed various contributing factors of genome complexity and tried to find the connection between genomic complexity and organism complexity. However, a general model to answer this question is lacking. Here, we introduce a 'two-level' model for the realization of genome complexity at phenotypic level. RESULTS: Five representative species across Protostomia and Deuterostomia were involved in this study. The intrinsic gene properties contributing to genome complexity were classified into two generalized groups: the complexity and age degree of both protein-coding and noncoding genes. We found that young genes tend to be simpler; however, the mid-age genes, rather than the oldest genes, show the highest proportion of high complexity. Complex genes tend to be utilized preferentially in each stage of embryonic development, with maximum representation during the late stage of organogenesis. This trend is mainly attributed to mid-age complex genes. In contrast, young genes tend to be expressed in specific spatiotemporal states. An obvious correlation between the time point of the change in over- and under-representation and the order of gene age was observed, which supports the funnel-like model of the conservation pattern of development. In addition, we found some probable causes for the seemingly contradictory 'funnel-like' or 'hourglass' model. CONCLUSIONS: These results indicate that complex and young genes contribute to organismal complexity at two different levels: Complex genes contribute to the complexity of individual proteomes in certain states, whereas young genes contribute to the diversity of proteomes in different spatiotemporal states. This conclusion is valid across the five species investigated, indicating it is a conserved model across Protostomia and Deuterostomia. The results in this study also support the 'funnel-like' model from a new viewpoint and explain why there are different evo-devo relation models.

13.
PeerJ ; 5: e2861, 2017.
Article in English | MEDLINE | ID: mdl-28090405

ABSTRACT

BACKGROUND: The mechanisms by which DNA sequences are expressed are the central preoccupation of molecular genetics. Recently, we and others reported that in the diplomonad protist Giardia lamblia, the coding regions of several mRNAs are produced by ligation of independent RNA species expressed from distinct genomic loci. Such trans-splicing of introns was found to affect nearly as many genes in this organism as does classical cis-splicing of introns. These findings raised questions about the incidence of intron trans-splicing both across the G. lamblia transcriptome and across diplomonad diversity in general; however, a dearth of transcriptomic data at the time prohibited systematic study of these questions. METHODS: I leverage newly available transcriptomic data from G. lamblia and the related diplomonad Spironucleus salmonicida to search for trans-spliced introns. My computational pipeline recovers all four previously reported trans-spliced introns in G. lamblia, suggesting good sensitivity. RESULTS: Scrutiny of thousands of potential cases revealed only a single additional trans-spliced intron in G. lamblia, in the p68 helicase gene, and no cases in S. salmonicida. The p68 intron differs from the previously reported trans-spliced introns in its high degree of streamlining: the core features of G. lamblia trans-spliced introns are closely packed together, revealing striking economy in the implementation of a seemingly inherently uneconomical molecular mechanism. DISCUSSION: These results serve to circumscribe the role of trans-splicing in diplomonads both in terms of the number of genes affected and taxonomically. Future work should focus on the molecular mechanisms, evolutionary origins and phenotypic implications of this intriguing phenomenon.

14.
Virus Res ; 240: 161-165, 2017 08 15.
Article in English | MEDLINE | ID: mdl-28822699

ABSTRACT

Gene duplication is the main source of genomic novelties and complexities for both eukaryotes and prokaryotes. In contrast, gene duplication appears to be infrequent in the RNA viruses. However, the extent and evolution of gene duplication in DNA viruses remains obscure. Here we perform a genome-wide analysis of gene duplication in the genomes of 250 DNA viruses that represent all known DNA viral genera. While no gene duplication event is identified in single stranded DNA (ssDNA) or reverse transcribing DNA viruses, gene duplication is frequent among double stranded DNA (dsDNA) viruses. For dsDNA viruses, the number of duplicate genes is significantly correlated with the genome complexity. We find that most duplicate genes experienced purifying selection on average. Our results indicate that gene duplication plays an important role in shaping the evolution of dsDNA viruses.


Subjects
DNA Viruses/genetics, Molecular Evolution, Gene Duplication, Viral Genome, Amino Acid Sequence, DNA Virus Infections/virology, DNA Viruses/chemistry, DNA Viruses/classification, Humans, Molecular Sequence Data, Sequence Alignment, Viral Proteins/genetics
15.
Article in English | MEDLINE | ID: mdl-27431519

ABSTRACT

In their search to understand the evolution of biological complexity, John Maynard Smith and Eörs Szathmáry put forward the notion of major evolutionary transitions as those in which elementary units get together to generate something new, larger and more complex. The origins of chromosomes, eukaryotic cells, multicellular organisms, colonies and, more recently, language and technological societies are examples that clearly illustrate this notion. However, a transition may be considered as anecdotal or as major depending on the specific level of biological organization under study. In this contribution, I will argue that transitions may also be occurring at a much smaller scale of biological organization: the viral world. Not only that, but also that we can observe in real time how these major transitions take place during experimental evolution. I will review the outcome of recent evolution experiments with viruses that illustrate four major evolutionary transitions: (i) the origin of a new virus that infects an otherwise inaccessible host and completely changes the way it interacts with the host regulatory and metabolic networks, (ii) the incorporation and loss of genes, (iii) the origin of segmented genomes from a non-segmented one, and (iv) the evolution of cooperative behaviour and cheating between different viruses or strains during co-infection of the same host. This article is part of the themed issue 'The major synthetic evolutionary transitions'.


Subjects
Biological Evolution, Viral Genome, Microbial Interactions, RNA Viruses/genetics, Coinfection/virology, Molecular Evolution, RNA Viruses/physiology
16.
Front Plant Sci ; 7: 1973, 2016.
Article in English | MEDLINE | ID: mdl-28105032

ABSTRACT

Double strand-break (DSB) induction allowed efficient gene targeting in barley (Hordeum vulgare), but little is known about efficiencies in its absence. To obtain such data, an assay system based on the acetolactate synthase (ALS) gene was established, a target gene which had been used previously in rice and Arabidopsis thaliana. Expression of recombinases RAD51 and RAD54 had been shown to improve gene targeting in A. thaliana, and positive-negative (P-N) selection allows the routine production of targeted mutants without DSB induction in rice. We implemented these approaches in barley and analysed gene targeting with the ALS gene in wild type and RAD51 and RAD54 transgenic lines. In addition, P-N selection was tested. In contrast to the high gene targeting efficiencies obtained in the absence of DSB induction in A. thaliana or rice, not one single gene targeting event was obtained in barley. These data suggest that gene targeting efficiencies are very low in barley and can substantially differ between different plants, even at the same target locus. They also suggest that the amount of labour and time would become unreasonably high to use these methods as a tool in routine applications. This is particularly true since DSB induction offers efficient alternatives. Barley, unlike rice and A. thaliana, has a large, complex genome, suggesting that genome size or complexity could be the reason for the low efficiencies. We discuss to what extent transformation methods, genome size or genome complexity could contribute to the striking differences in the gene targeting efficiencies between barley, rice and A. thaliana.

17.
Mol Breed ; 36(11): 154, 2016.
Article in English | MEDLINE | ID: mdl-27942246

ABSTRACT

Advances in next generation sequencing have facilitated large-scale single nucleotide polymorphism (SNP) discovery in many crop species. The genotyping-by-sequencing (GBS) approach couples next generation sequencing with genome complexity reduction techniques to simultaneously identify and genotype SNPs. Choice of enzymes used in GBS library preparation depends on several factors including the number of markers required, the desired level of multiplexing, and whether the enrichment of genic SNPs is preferred. We evaluated various combinations of methylation-sensitive (AatII, PstI, MspI) and methylation-insensitive (SphI, MseI) enzymes for their effectiveness in genome complexity reduction and enrichment of genic SNPs. We discovered that the use of two methylation-sensitive enzymes effectively reduced genome complexity and did not require a size selection step. Conversely, the genome coverage of libraries constructed with methylation-insensitive enzymes was quite high, and an additional size selection step may be required to increase the overall read depth. We also demonstrated the effectiveness of methylation-sensitive enzymes in enriching for SNPs located in genic regions. When two methylation-insensitive enzymes were used, only 16% of SNPs identified were located in genes and 18% in the vicinity (± 5 kb) of the genic regions, while most SNPs resided in the intergenic regions. In contrast, a remarkable degree of enrichment was observed when two methylation-sensitive enzymes were employed. Almost two thirds of the SNPs were located either inside (32-36%) or in the vicinity (28-31%) of the genic regions. These results provide useful information to help researchers choose appropriate GBS enzymes in oil palm and other crop species.

18.
Comput Biol Chem ; 53 Pt A: 71-8, 2014 Dec.
Article in English | MEDLINE | ID: mdl-25182383

ABSTRACT

Early global measures of genome complexity (power spectra, the analysis of fluctuations in DNA walks or compositional segmentation) uncovered a high degree of complexity in eukaryotic genome sequences. The main evolutionary mechanisms leading to increases in genome complexity (i.e. gene duplication and transposon proliferation) can all potentially produce increases in DNA clustering. To quantify such clustering and provide a genome-wide description of the formed clusters, we developed GenomeCluster, an algorithm able to detect clusters of any genome element identified by chromosome coordinates. We obtained a detailed description of clusters for ten categories of human genome elements, including functional (genes, exons, introns), regulatory (CpG islands, TFBSs, enhancers), variant (SNPs) and repeat (Alus, LINE1) elements, as well as DNase hypersensitivity sites. For each category, we located their clusters in the human genome, then quantified cluster length and composition, and estimated the clustering level as the proportion of clustered genome elements. On average, we found 27% of elements in clusters, although a considerable variation occurs among different categories. Genes form the lowest number of clusters, but these are the longest ones, both in bp and the average number of components, while the shortest clusters are formed by SNPs. Functional and regulatory elements (genes, CpG islands, TFBSs, enhancers) show the highest clustering level, as compared to DNase sites, repeats (Alus, LINE1) or SNPs. Many of the genome elements we analyzed are known to be composed of clusters of low-level entities. In addition, we found here that the clusters generated by GenomeCluster can be in turn clustered into high-level super-clusters. The observation of 'clusters-within-clusters' parallels the 'domains within domains' phenomenon previously detected through global statistical methods in eukaryotic sequences, and reveals a complex human genome landscape dominated by hierarchical clustering.
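
As a rough illustration of coordinate-based cluster detection, the sketch below groups elements on the same chromosome whenever the gap to the preceding elements is at most a chosen threshold, then reports the proportion of clustered elements (the "clustering level" named in the abstract). This is not the GenomeCluster algorithm, whose clustering criterion the abstract does not describe; the `max_gap` threshold, the `Element` type and the function names are assumptions made for this example.

```python
from typing import NamedTuple


class Element(NamedTuple):
    """A genome element located by chromosome coordinates (hypothetical type)."""
    chrom: str
    start: int
    end: int


def find_clusters(elements, max_gap, min_elements=2):
    """Group elements whose gap to the running cluster end is <= max_gap.

    Returns a list of clusters; each cluster is a list of Elements.
    Groups with fewer than `min_elements` members are not reported.
    """
    clusters, current, current_end = [], [], 0
    # Tuples sort by chromosome name, then start, then end.
    for el in sorted(elements):
        same_chrom = bool(current) and el.chrom == current[0].chrom
        if same_chrom and el.start - current_end <= max_gap:
            current.append(el)
            current_end = max(current_end, el.end)
        else:
            if len(current) >= min_elements:
                clusters.append(current)
            current, current_end = [el], el.end
    if len(current) >= min_elements:
        clusters.append(current)
    return clusters


def clustering_level(elements, clusters):
    """Proportion of all elements that fall inside some cluster."""
    clustered = sum(len(c) for c in clusters)
    return clustered / len(elements) if elements else 0.0


if __name__ == "__main__":
    demo = [
        Element("chr1", 100, 200), Element("chr1", 250, 300),
        Element("chr1", 5000, 5100),
        Element("chr2", 120, 180), Element("chr2", 190, 260),
        Element("chr2", 300, 310),
    ]
    found = find_clusters(demo, max_gap=100)
    for cluster in found:
        print(cluster[0].chrom, cluster[0].start, max(e.end for e in cluster), len(cluster))
    print("clustering level:", clustering_level(demo, found))
```

Applying the same function to the resulting clusters, treated as elements with a larger `max_gap`, would give a simple analogue of the super-clusters the abstract mentions.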


Subjects
Algorithms, Chromosome Mapping/statistics & numerical data, Human Genome, Multigene Family, Transcription Factors/genetics, Alu Elements, Binding Sites, Chromosome Mapping/methods, CpG Islands, Genetic Enhancer Elements, Exons, Regulator Genes, Humans, Introns, Long Interspersed Nucleotide Elements, Single Nucleotide Polymorphism
19.
Comput Biol Chem ; 53 Pt A: 144-52, 2014 Dec.
Article in English | MEDLINE | ID: mdl-25218217

ABSTRACT

Analysis of cellular responses to diverse stimuli enables the exploration of the complexity of functional genomics. Typically, high-throughput microarray data allow us to identify genes that are differentially expressed under a phenomenon of interest. To extract meaning from the long list of those differentially expressed genes, we present a new method, "pathway-based LDA", to determine pathways/gene sets that are perturbed after exposure to different chemicals. In this study, a pathway is defined as a group of functionally related genes. Specifically, we have implemented a probabilistic Latent Dirichlet Allocation (LDA) model to learn drug-pathway-gene relations by taking known gene-pathway memberships as prior knowledge. We applied the pathway-based LDA model and 236 known pathways in order to determine pathway responsiveness to gene expression data of 1169 drugs. Our method yielded a better predictive performance on pathway responsiveness to drug treatments than the existing methods. Moreover, the pathway-based LDA also revealed genes contributing the most in each pre-defined pathway through a probabilistic distribution of genes. In achieving that, our method could provide a useful estimator of the pathway complexity of a genome.


Subjects
Gene Expression Regulation/drug effects, Gene Regulatory Networks/drug effects, Human Genome/drug effects, Metabolic Networks and Pathways/drug effects, Statistical Models, Algorithms, Bayes Theorem, Chromans/pharmacology, Dexamethasone/pharmacology, Gene Expression Profiling, Genistein/pharmacology, Humans, Metabolic Networks and Pathways/genetics, Pharmacogenetics, Propylthiouracil/pharmacology, Thiazolidinediones/pharmacology, Troglitazone
20.
Mol Ecol Resour ; 14(6): 1314-21, 2014 Nov.
Article in English | MEDLINE | ID: mdl-24806844

ABSTRACT

Application of high-throughput sequencing platforms in the field of ecology and evolutionary biology is developing quickly with the introduction of efficient methods to reduce genome complexity. Numerous approaches for genome complexity reduction have been developed using different combinations of restriction enzymes, library construction strategies and fragment size selection. As a result, the choice of which techniques to use may become cumbersome, because it is difficult to anticipate the number of loci resulting from each method. We developed SimRAD, an R package that performs in silico restriction enzyme digests and fragment size selection as implemented in most restriction site associated DNA polymorphism and genotyping by sequencing methods. In silico digestion is performed on a reference genome or on a randomly generated DNA sequence when no reference genome sequence is available. SimRAD accurately predicts the number of loci under alternative protocols when a reference genome sequence is available for the targeted species (or a close relative) but may be unreliable when no reference genome is available. SimRAD is also useful for fine-tuning a given protocol to adjust the number of targeted loci. Here, we outline the functionality of SimRAD and provide an illustrative example of the use of the package (available on the CRAN at http://cran.r-project.org/web/packages/SimRAD).
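
The idea of an in silico digest followed by fragment size selection can be sketched in a few lines. The Python below is not SimRAD (an R package) and does not reproduce its API; the function names, the single-enzyme digest and the simple random sequence model are assumptions for illustration. The example site GAATTC, cut after the first base, corresponds to EcoRI.

```python
import random
import re


def digest(sequence: str, site: str, cut_offset: int) -> list[str]:
    """Cut `sequence` at every occurrence of `site`.

    The cut falls `cut_offset` bases after the start of each site.
    A lookahead is used so that overlapping sites are all found.
    """
    pattern = f"(?={re.escape(site)})"
    cuts = [m.start() + cut_offset for m in re.finditer(pattern, sequence)]
    bounds = [0] + cuts + [len(sequence)]
    # Skip empty fragments (e.g. a site at position 0 with offset 0).
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def size_select(fragments: list[str], min_len: int, max_len: int) -> list[str]:
    """Keep fragments whose length lies in [min_len, max_len]."""
    return [f for f in fragments if min_len <= len(f) <= max_len]


def random_genome(length: int, gc: float = 0.5, seed: int = 0) -> str:
    """Random DNA with a given GC fraction, for use when no reference exists."""
    rng = random.Random(seed)
    weights = [(1 - gc) / 2, gc / 2, gc / 2, (1 - gc) / 2]  # A, C, G, T
    return "".join(rng.choices("ACGT", weights=weights, k=length))


if __name__ == "__main__":
    genome = random_genome(1_000_000, gc=0.45, seed=42)
    fragments = digest(genome, "GAATTC", cut_offset=1)
    selected = size_select(fragments, 300, 600)
    print("fragments:", len(fragments), "after size selection:", len(selected))
```

Changing the size window or the recognition site and rerunning gives a feel for how a protocol could be tuned toward a target number of loci; as the abstract cautions, estimates from a random sequence may be unreliable.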


Subjects
Computational Biology/methods, Computer Simulation, Restriction Mapping, Genotyping Techniques/methods, High-Throughput Nucleotide Sequencing/methods, Single Nucleotide Polymorphism