Results 1 - 20 of 33
1.
Bioinformatics ; 37(15): 2081-2087, 2021 Aug 09.
Article in English | MEDLINE | ID: mdl-33515232

ABSTRACT

MOTIVATION: Unique marker sequences are highly sought after in molecular diagnostics. Nevertheless, there are only a few programs available to search for marker sequences, compared to the many programs for similarity search. We therefore wrote the program Fur for Finding Unique genomic Regions. RESULTS: Fur takes as input a sample of target sequences and a sample of closely related neighbors. It returns the regions present in all targets and absent from all neighbors. The recently published program genmap can also be used for this purpose and we compared it to fur. When analyzing a sample of 33 genomes representing the major phylogroups of E. coli, fur was 40 times faster than genmap but used three times more memory. On the other hand, genmap yielded three times more markers, but they were less accurate when tested in silico on a sample of 237 E. coli genomes. We also designed phylogroup-specific PCR primers based on the markers proposed by genmap and fur, and tested them by analyzing their virtual amplicons in GenBank. Finally, we used fur to design primers specific to a Lactobacillus species, and found excellent sensitivity and specificity in vitro. AVAILABILITY AND IMPLEMENTATION: Fur sources and documentation are available from https://github.com/evolbioinf/fur. The compiled software is posted as a docker container at https://hub.docker.com/r/haubold/fox. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
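The selection rule described above (keep what occurs in every target and in no neighbor) can be pictured with plain k-mer sets. This is a simplified stand-in, not fur's actual algorithm: the helper names and the use of fixed-length k-mers are assumptions for illustration, reverse complements are ignored, and overlapping k-mers would still have to be merged into regions.

```python
def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def marker_kmers(targets: list[str], neighbors: list[str], k: int) -> set[str]:
    """Hypothetical helper: k-mers found in every target and in no neighbor."""
    # Keep only k-mers shared by all targets.
    shared = set.intersection(*(kmers(t, k) for t in targets))
    # Discard any k-mer that also occurs in a neighbor.
    for neighbor in neighbors:
        shared -= kmers(neighbor, k)
    return shared


if __name__ == "__main__":
    targets = ["ACGTACGGTTCA", "TTACGTACGGTA"]
    neighbors = ["ACGTACGATTCA"]
    print(sorted(marker_kmers(targets, neighbors, 5)))  # ['ACGGT', 'TACGG']
```

On this toy input only the k-mers spanning the G/A difference shared by both targets survive; everything the neighbor also contains is discarded.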

2.
Bioinformatics ; 36(7): 2040-2046, 2020 04 01.
Article in English | MEDLINE | ID: mdl-31790149

ABSTRACT

MOTIVATION: Tracking disease outbreaks by whole-genome sequencing leads to the collection of large samples of closely related sequences. Five years ago, we published a method to accurately compute all pairwise distances for such samples by indexing each sequence. Since indexing is slow, we now ask whether it is possible to achieve similar accuracy when indexing only a single sequence. RESULTS: We have implemented this idea in the program phylonium and show that it is as accurate as its predecessor and roughly 100 times faster when applied to all 2678 Escherichia coli genomes contained in ENSEMBL. One of the best published programs for rapidly computing pairwise distances, mash, analyzes the same dataset four times faster but, with default settings, it is less accurate than phylonium. AVAILABILITY AND IMPLEMENTATION: Phylonium runs under the UNIX command line; its C++ sources and documentation are available from github.com/evolbioinf/phylonium. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Genomics, Software, Algorithms, Genome, DNA Sequence Analysis

3.
Bioinformatics ; 35(11): 1813-1819, 2019 06 01.
Article in English | MEDLINE | ID: mdl-30395202

ABSTRACT

MOTIVATION: Unique sequence regions are associated with genetic function in vertebrate genomes. However, measuring uniqueness, or absence of long repeats, along a genome is conceptually and computationally difficult. Here we use a variant of the Lempel-Ziv complexity, the match complexity, Cm, and augment it by deriving its null distribution for random sequences. We then apply Cm to the human and mouse genomes to investigate the relationship between sequence complexity and function. RESULTS: We implemented Cm in the program macle and show through simulation that the newly derived null distribution of Cm is accurate. This allows us to delineate high-complexity regions in the human and mouse genomes. Using our program macle2go, we find that these regions are twofold enriched for genes. Moreover, the genes contained in these regions are more than 10-fold enriched for developmental functions. AVAILABILITY AND IMPLEMENTATION: Source code for macle and macle2go is available from www.github.com/evolbioinf/macle and www.github.com/evolbioinf/macle2go, respectively; Cm browser tracks from guanine.evolbio.mgp.de/complexity. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
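One way to picture match complexity is a naive match factorization: starting at each factor boundary, take the longest prefix that also occurs elsewhere in the sequence and extend it by one character. This brute-force sketch is an assumed simplification for illustration only; it ignores the reverse strand, does not reproduce macle's normalization or the null distribution derived in the paper, and is far slower than an index-based implementation.

```python
def longest_match_elsewhere(s: str, i: int) -> int:
    """Length of the longest prefix of s[i:] that also starts at a position j != i."""
    best = 0
    for j in range(len(s)):
        if j == i:
            continue
        length = 0
        while (i + length < len(s) and j + length < len(s)
               and s[i + length] == s[j + length]):
            length += 1
        best = max(best, length)
    return best


def match_factors(s: str) -> list[str]:
    """Split s into factors: longest match found elsewhere, plus one character."""
    factors, i = [], 0
    while i < len(s):
        # The extra character ends the factor; cap it at the end of s.
        length = min(longest_match_elsewhere(s, i) + 1, len(s) - i)
        factors.append(s[i:i + length])
        i += length
    return factors


if __name__ == "__main__":
    for seq in ["ACGT", "AAAA", "ACGACG"]:
        print(seq, match_factors(seq))
```

Repetitive input collapses into few long factors (AAAA is a single factor), while a sequence without repeats splits into many short ones, which is why a factor count can serve as a complexity measure.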


Subjects
Genome, Genomics, Animals, Developmental Genes, Humans, Mammals, Mice, Software

4.
Bioinformatics ; 32(16): 2554-5, 2016 08 15.
Article in English | MEDLINE | ID: mdl-27153632

ABSTRACT

MOTIVATION: In many organisms, including humans, recombination clusters within recombination hotspots. The standard method for de novo detection of recombinants at hotspots is sperm typing. This relies on allele-specific PCR at single nucleotide polymorphisms. Designing allele-specific primers by hand is time-consuming. We have therefore written a package to support hotspot detection and analysis. RESULTS: hotspot consists of four programs: asp looks up SNPs and designs allele-specific primers; aso constructs allele-specific oligos for mapping recombinants; xov implements a maximum-likelihood method for estimating the crossover rate; six, finally, simulates typing data. AVAILABILITY AND IMPLEMENTATION: hotspot is written in C. Sources are freely available under the GNU General Public License from http://github.com/evolbioinf/hotspot/ CONTACT: haubold@evolbio.mpg.de SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Genetic Recombination, Software, Spermatozoa, Alleles, Humans, Likelihood Functions, Male

6.
Brief Bioinform ; 15(3): 407-18, 2014 May.
Article in English | MEDLINE | ID: mdl-24291823

ABSTRACT

Phylogenetics and population genetics are central disciplines in evolutionary biology. Both are based on comparative data, today usually DNA sequences. These have become so plentiful that alignment-free sequence comparison is of growing importance in the race between scientists and sequencing machines. In phylogenetics, efficient distance computation is the major contribution of alignment-free methods. A distance measure should reflect the number of substitutions per site, which underlies classical alignment-based phylogeny reconstruction. Alignment-free distance measures are either based on word counts or on match lengths, and I apply examples of both approaches to simulated and real data to assess their accuracy and efficiency. While phylogeny reconstruction is based on the number of substitutions, in population genetics, the distribution of mutations along a sequence is also considered. This distribution can be explored by match lengths, thus opening the prospect of alignment-free population genomics.
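As an illustration of a word-count-based measure, the sketch below converts the Jaccard index J of two k-mer sets into an estimate of substitutions per site using the Poisson-model formula popularized by mash, d = -(1/k) ln(2J/(1 + J)). It works on exact k-mer sets rather than MinHash sketches, and it is an example of the word-count approach in general, not a measure taken from this review.

```python
import math


def kmer_set(seq: str, k: int) -> set[str]:
    """All distinct k-mers (words of length k) in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: str, b: str, k: int) -> float:
    """Jaccard index |A & B| / |A | B| of the two k-mer sets."""
    sa, sb = kmer_set(a, k), kmer_set(b, k)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def mash_style_distance(a: str, b: str, k: int) -> float:
    """Substitutions per site, ln((1 + J) / (2J)) / k,
    which equals -(1/k) * ln(2J / (1 + J))."""
    j = jaccard(a, b, k)
    if j == 0.0:
        return math.inf  # no shared words: distance not estimable
    return math.log((1 + j) / (2 * j)) / k


if __name__ == "__main__":
    print(jaccard("ACGTACGT", "ACGTACGA", 3))  # 0.8
    print(mash_style_distance("ACGTACGT", "ACGTACGA", 3))
```

Identical sequences give a distance of 0, and the estimate grows as fewer words are shared; choosing k trades sensitivity to distant relatives against spurious chance matches.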


Subjects
Population Genetics/methods, Phylogeny, DNA Sequence Analysis/methods, Animals, Computational Biology/methods, Molecular Evolution, Population Genetics/statistics & numerical data, Mitochondrial Genome, Humans, Genetic Models, Mutation, Genetic Recombination, Genetic Selection, Sequence Alignment, DNA Sequence Analysis/statistics & numerical data

7.
Bioinformatics ; 31(8): 1169-75, 2015 Apr 15.
Article in English | MEDLINE | ID: mdl-25504847

ABSTRACT

MOTIVATION: A standard approach to classifying sets of genomes is to calculate their pairwise distances. This is difficult for large samples. We have therefore developed an algorithm for rapidly computing the evolutionary distances between closely related genomes. RESULTS: Our distance measure is based on ungapped local alignments that we anchor through pairs of maximal unique matches of a minimum length. These exact matches can be looked up efficiently using enhanced suffix arrays, and our implementation requires only approximately 1 s and 45 MB RAM per Mbase analysed. The pairing of matches distinguishes non-homologous from homologous regions, leading to accurate distance estimation. We show this by analysing simulated data and genome samples ranging from 29 Escherichia coli/Shigella genomes to 3085 genomes of Streptococcus pneumoniae. AVAILABILITY AND IMPLEMENTATION: We have implemented the computation of anchor distances in the multithreaded UNIX command-line program andi for ANchor DIstances. C sources and documentation are posted at http://github.com/evolbioinf/andi/ CONTACT: haubold@evolbio.mpg.de SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
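The last step, turning anchored, ungapped segments into a distance, can be sketched as follows. The anchor coordinates are assumed to be given (finding maximal unique matches needs an index such as an enhanced suffix array), the same-diagonal rule for pairing two anchors is a simplifying assumption, and the Jukes-Cantor correction is a standard choice used here for illustration, not a statement about andi's defaults.

```python
import math

# Hypothetical anchor type: (start_in_a, start_in_b, length) of an exact match.
Anchor = tuple[int, int, int]


def anchored_segment(a: str, b: str, left: Anchor, right: Anchor):
    """Ungapped segment from the start of `left` to the end of `right`.

    Assumption for this sketch: two anchors are paired only if they lie on
    the same diagonal, i.e. they are the same distance apart in a and b.
    """
    (la, lb, _), (ra, rb, rlen) = left, right
    if ra - la != rb - lb or ra < la:
        return None
    return a[la:ra + rlen], b[lb:rb + rlen]


def jukes_cantor(p: float) -> float:
    """Correct a mismatch fraction p for multiple substitutions."""
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1 - 4 * p / 3)


def anchor_distance(segments: list[tuple[str, str]]) -> float:
    """Pool mismatches over all anchored segments, then apply Jukes-Cantor."""
    mismatches = sum(sum(x != y for x, y in zip(sa, sb)) for sa, sb in segments)
    total = sum(len(sa) for sa, _ in segments)
    return jukes_cantor(mismatches / total)


if __name__ == "__main__":
    a, b = "ACGTTGCAAC", "ACGTAGCAAC"
    seg = anchored_segment(a, b, (0, 0, 4), (5, 5, 5))
    print(seg, anchor_distance([seg]))
```

In the toy example the anchors ACGT and GCAAC sit on the same diagonal, so the single mismatch between them is counted (p = 0.1, corrected distance about 0.107).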


Subjects
Algorithms, Biological Evolution, Genome, Genomics/methods, Sequence Alignment/methods, DNA Sequence Analysis/methods, Software, Animals, Genetic Databases, Humans, Phylogeny

8.
PLoS Pathog ; 9(7): e1003503, 2013.
Article in English | MEDLINE | ID: mdl-23935484

ABSTRACT

The origins of crop diseases are linked to domestication of plants. Most crops were domesticated centuries, even millennia, ago, thus limiting opportunity to understand the concomitant emergence of disease. Kiwifruit (Actinidia spp.) is an exception: domestication began in the 1930s, with outbreaks of canker disease caused by P. syringae pv. actinidiae (Psa) first recorded in the 1980s. Based on SNP analyses of two circularized and 34 draft genomes, we show that Psa comprises distinct clades exhibiting negligible within-clade diversity, consistent with disease arising by independent samplings from a source population. Three clades correspond to their geographical source of isolation; a fourth, encompassing the Psa-V lineage responsible for the 2008 outbreak, is now globally distributed. Psa has an overall clonal population structure; however, genomes carry a marked signature of within-pathovar recombination. SNP analysis of Psa-V reveals hundreds of polymorphisms; however, most reside within PPHGI-1-like conjugative elements whose evolution is unlinked to the core genome. Removal of SNPs due to recombination yields an uninformative (star-like) phylogeny consistent with diversification of Psa-V from a single clone within the last ten years. Growth assays provide evidence of cultivar specificity, with rapid systemic movement of Psa-V in Actinidia chinensis. Genomic comparisons show a dynamic genome with evidence of positive selection on type III effectors and other candidate virulence genes. Each clade has highly varied complements of accessory genes encoding effectors and toxins with evidence of gain and loss via multiple genetic routes. Genes with orthologs in vascular pathogens were found exclusively within Psa-V. Our analyses capture a pathogen in the early stages of emergence from a predicted source population associated with wild Actinidia species. In addition to candidate genes as targets for resistance breeding programs, our findings highlight the importance of the source population as a reservoir of new disease.


Subjects
Actinidia/microbiology, Bacterial Proteins/genetics, Bacterial Genome, Plant Diseases/microbiology, Pseudomonas syringae/genetics, Actinidia/growth & development, Bacterial Proteins/chemistry, Bacterial Proteins/metabolism, Agricultural Crops/growth & development, Agricultural Crops/microbiology, Fruit/growth & development, Fruit/microbiology, Genomic Islands, Italy, Japan, New Zealand, Phylogeny, Plant Diseases/etiology, Plant Shoots/growth & development, Plant Shoots/microbiology, Single Nucleotide Polymorphism, Pseudomonas syringae/growth & development, Pseudomonas syringae/isolation & purification, Pseudomonas syringae/pathogenicity, Genetic Recombination, Republic of Korea, Species Specificity, Virulence

9.
Bioinformatics ; 29(24): 3121-7, 2013 Dec 15.
Article in English | MEDLINE | ID: mdl-24064419

ABSTRACT

MOTIVATION: "Why recombination?" is one of the central questions in biology. This has led to a host of methods for quantifying recombination from sequence data. These methods are usually based on aligned DNA sequences. Here, we propose an efficient alignment-free alternative. RESULTS: Our method is based on the distribution of match lengths, which we look up using enhanced suffix arrays. By eliminating the alignment step, the test becomes fast enough for application to whole bacterial genomes. Using simulations we show that our test has similar power to established tests when applied to long pairs of sequences. When applied to 58 genomes of Escherichia coli, we pick up the strongest recombination signal from a 125 kb horizontal gene transfer engineered 20 years ago. AVAILABILITY AND IMPLEMENTATION: We have implemented our method in the command-line program rush. Its C sources and documentation are available under the GNU General Public License from http://guanine.evolbio.mpg.de/rush/.


Subjects
Algorithms, Computational Biology, Bacterial Genome, Genetic Recombination, Sequence Alignment/methods, Computer Simulation, Escherichia coli/genetics, Phylogeny

10.
Bioinformatics ; 27(11): 1466-72, 2011 Jun 01.
Article in English | MEDLINE | ID: mdl-21471011

ABSTRACT

MOTIVATION: Bacterial and viral genomes are often affected by horizontal gene transfer observable as abrupt switching in local homology. In addition to the resulting mosaic genome structure, they frequently contain regions not found in close relatives, which may play a role in virulence mechanisms. Due to this connection to medical microbiology, there are numerous methods available to detect horizontal gene transfer. However, these are usually aimed at individual genes and viral genomes rather than the much larger bacterial genomes. Here, we propose an efficient alignment-free approach to describe the mosaic structure of viral and bacterial genomes, including their unique regions. RESULTS: Our method is based on the lengths of exact matches between pairs of sequences. Long matches indicate close homology, short matches more distant homology or none at all. These exact match lengths can be looked up efficiently using an enhanced suffix array. Our program implementing this approach, alfy (ALignment-Free local homologY), efficiently and accurately detects the recombination break points in simulated DNA sequences and among recombinant HIV-1 strains. We also apply alfy to Escherichia coli genomes where we detect new evidence for the hypothesis that strains pathogenic in poultry can infect humans. AVAILABILITY: alfy is written in standard C and its source code is available under the GNU General Public License from http://guanine.evolbio.mpg.de/alfy/. The software package also includes documentation and example data.
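The exact-match lengths that alfy builds on can be illustrated with a brute-force lookup: for each query position, find the longest prefix that occurs anywhere in the subject, then average these lengths over sliding windows. The function names and window width are hypothetical, and substring search here is slow, whereas the abstract describes an enhanced suffix array for this lookup.

```python
def match_lengths(query: str, subject: str) -> list[int]:
    """For each query position i, length of the longest prefix of query[i:]
    that occurs somewhere in subject (brute force)."""
    lengths = []
    for i in range(len(query)):
        length = 0
        # Grow the prefix while it is still a substring of subject.
        while i + length < len(query) and query[i:i + length + 1] in subject:
            length += 1
        lengths.append(length)
    return lengths


def window_means(values: list[int], width: int) -> list[float]:
    """Mean of values in each sliding window of the given width (step 1)."""
    return [sum(values[i:i + width]) / width
            for i in range(len(values) - width + 1)]


if __name__ == "__main__":
    ml = match_lengths("ACGT", "ACGA")
    print(ml)                   # [3, 2, 1, 0]
    print(window_means(ml, 2))  # [2.5, 1.5, 0.5]
```

High window means point to segments closely related to the subject; a sharp drop suggests a switch in local homology or a region the subject lacks.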


Subjects
Bacterial Genome, Viral Genome, DNA Sequence Analysis, Nucleic Acid Sequence Homology, Escherichia coli/genetics, Horizontal Gene Transfer, Genomics/methods, HIV-1/genetics, Humans, Software

11.
Bioinformatics ; 27(4): 449-55, 2011 Feb 15.
Article in English | MEDLINE | ID: mdl-21156730

ABSTRACT

MOTIVATION: Sequencing capacity is currently growing more rapidly than CPU speed, leading to an analysis bottleneck in many genome projects. Alignment-free sequence analysis methods tend to be more efficient than their alignment-based counterparts. They may, therefore, be important in the long run for keeping sequence analysis abreast with sequencing. RESULTS: We derive and implement an alignment-free estimator of the number of pairwise mismatches, π̂. Our implementation of π̂, pim, is based on an enhanced suffix array and inherits the superior time and memory efficiency of this data structure. Simulations demonstrate that π̂ is accurate if mutations are distributed randomly along the chromosome. While real data often deviate from this ideal, π̂ remains useful for identifying regions of low genetic diversity using a sliding window approach. We demonstrate this by applying it to the complete genomes of 37 strains of Drosophila melanogaster, and to the genomes of two closely related Drosophila species, D. simulans and D. sechellia. In both cases, we detect the diversity minimum and discuss its biological implications.
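For reference, the alignment-based quantity that an alignment-free estimator of pairwise mismatches targets can be computed directly when an alignment is available: the mean number of mismatches over all pairs of sequences, optionally divided by sequence length to give diversity per site. This sketch assumes the input is already aligned, ungapped and of equal length; it is not pim's suffix-array method.

```python
from itertools import combinations


def mismatches(a: str, b: str) -> int:
    """Number of differing sites between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b))


def mean_pairwise_mismatches(seqs: list[str], per_site: bool = False) -> float:
    """Average mismatches over all pairs; divide by length if per_site."""
    if len(seqs) < 2 or len({len(s) for s in seqs}) != 1:
        raise ValueError("need at least two aligned sequences of equal length")
    pairs = list(combinations(seqs, 2))
    mean = sum(mismatches(a, b) for a, b in pairs) / len(pairs)
    return mean / len(seqs[0]) if per_site else mean


if __name__ == "__main__":
    sample = ["AAAA", "AAAT", "AATT"]
    print(mean_pairwise_mismatches(sample))  # 1.3333333333333333
    print(mean_pairwise_mismatches(sample, per_site=True))
```

Applying the same function to consecutive windows of an alignment gives the kind of sliding-window diversity profile in which a diversity minimum would show up.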


Subjects
Computational Biology/methods, Genetic Variation, DNA Sequence Analysis/methods, Algorithms, Animals, Computer Simulation, Drosophila/genetics, Insect Genome, Genetic Recombination

12.
Methods Mol Biol ; 2242: 77-89, 2021.
Article in English | MEDLINE | ID: mdl-33961219

ABSTRACT

By tracking pathogen outbreaks using whole genome sequencing, medical microbiology is currently being transformed into genomic epidemiology. This change in technology is leading to the rapid accumulation of large samples of closely related genome sequences. Summarizing such samples into phylogenies can be computationally challenging. Our program andi quickly computes accurate pairwise distances between up to thousands of bacterial genomes. Working under the UNIX command line, we show how andi can be used to transform genomes to phylogenies with support values ready to be printed or integrated into documents.


Subjects
Bacterial DNA/genetics, Escherichia coli/genetics, Bacterial Genome, Genomics, Phylogeny, Shigella/genetics, Genetic Databases, Research Design, Software Design, Workflow

13.
Oncotarget ; 12(10): 1011-1023, 2021 May 11.
Article in English | MEDLINE | ID: mdl-34012513

ABSTRACT

Non-invasive clinical diagnosis of bladder cancer is feasible via a set of chemically distinct molecules, including macromolecular tumor markers such as polypeptides and nucleic acids. In terms of tumor-related aberrant gene expression, RNA transcripts are the primary indicator of tumor-specific gene expression, whereas polypeptides and their metabolic products appear only subsequently. Thus, in the case of bladder cancer, urine RNA is a potentially useful early diagnostic marker. Here we describe a systematic deep transcriptome analysis of representative pools of urine RNA collected from healthy donors and from bladder cancer patients according to established SOPs. This analysis revealed RNA marker candidates reflecting coding sequences, non-coding sequences, and circular RNAs. Next, we designed and validated PCR amplicons for a set of novel marker candidates and tested them in human bladder cancer cell lines. We identified linear and circular transcripts of the S100 Calcium Binding Protein 6 (S100A6) and translocation associated membrane protein 1 (TRAM1) as highly promising potential tumor markers. This work strongly supports exploiting urine RNAs as diagnostic markers of bladder cancer and proposes specific novel markers. Further, this study provides an entry point into the tumor biology of bladder cancer and the development of gene-targeted therapeutic drugs.

14.
Bioinformatics ; 25(24): 3221-7, 2009 Dec 15.
Article in English | MEDLINE | ID: mdl-19825795

ABSTRACT

MOTIVATION: Genome comparison is central to contemporary genomics and typically relies on sequence alignment. However, genome-wide alignments are difficult to compute. We have, therefore, recently developed an accurate alignment-free estimator of the number of substitutions per site based on the lengths of exact matches between pairs of sequences. The previous implementation of this measure requires n(n-1) suffix tree constructions and traversals, where n is the number of sequences analyzed. This does not scale well for large n. RESULTS: We present an algorithm to extract pairwise distances in a single traversal of a single suffix tree containing n sequences. As a result, the run time of the suffix tree construction phase of our algorithm is reduced from O(n^2 L) to O(nL), where L is the length of each sequence. We implement this algorithm in the program kr version 2 and apply it to 825 HIV genomes, 13 genomes of enterobacteria and the complete genomes of 12 Drosophila species. We show that, depending on the input dataset, the new program is at least 10 times faster than its predecessor. AVAILABILITY: Version 2 of kr can be tested via a web interface at http://guanine.evolbio.mpg.de/kr2/. It is written in standard C and its source code is available under the GNU General Public License from the same web site. CONTACT: haubold@evolbio.mpg.de SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms, Genome, Genomics/methods, Animals, Base Sequence, Genetic Databases, Humans, Sequence Alignment, DNA Sequence Analysis

15.
Genetics ; 182(1): 205-16, 2009 May.
Article in English | MEDLINE | ID: mdl-19237689

ABSTRACT

Using coalescent simulations, we study the impact of three different sampling schemes on patterns of neutral diversity in structured populations. Specifically, we are interested in two summary statistics based on the site frequency spectrum as a function of migration rate, demographic history of the entire substructured population (including timing and magnitude of specieswide expansions), and the sampling scheme. Using simulations implementing both finite-island and two-dimensional stepping-stone spatial structure, we demonstrate strong effects of the sampling scheme on Tajima's D (D(T)) and Fu and Li's D (D(FL)) statistics, particularly under specieswide (range) expansions. Pooled samples yield average D(T) and D(FL) values that are generally intermediate between those of local and scattered samples. Local samples (and to a lesser extent, pooled samples) are influenced by local, rapid coalescence events in the underlying coalescent process. These processes result in lower proportions of external branch lengths and hence lower proportions of singletons, explaining our finding that the sampling scheme affects D(FL) more than it does D(T). Under specieswide expansion scenarios, these effects of spatial sampling may persist up to very high levels of gene flow (Nm > 25), implying that local samples cannot be regarded as being drawn from a panmictic population. Importantly, many data sets on humans, Drosophila, and plants contain signatures of specieswide expansions and effects of sampling scheme that are predicted by our simulation results. This suggests that validating the assumption of panmixia is crucial if robust demographic inferences are to be made from local or pooled samples. However, future studies should consider adopting a framework that explicitly accounts for the genealogical effects of population subdivision and empirical sampling schemes.
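Tajima's D, one of the two statistics studied above, compares the mean number of pairwise differences π with Watterson's estimate S/a_1, scaled by the standard variance constants. The sketch below takes n, S and π as given and follows the textbook formula; Fu and Li's D is not shown, and the function name is an assumption for illustration.

```python
import math


def tajimas_d(n: int, s: int, pi: float) -> float:
    """Tajima's D for n sequences, s segregating sites and mean pairwise
    differences pi (per locus, not per site)."""
    if n < 4 or s == 0:
        raise ValueError("need n >= 4 sequences and at least one segregating site")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    # Difference between the two estimators of theta, divided by its standard deviation.
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


if __name__ == "__main__":
    print(tajimas_d(10, 20, 3.0))  # negative: excess of rare variants
```

When π equals S/a_1, D is zero; an excess of rare variants, such as that left by a range expansion, pushes π below S/a_1 and D below zero.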


Subjects
Drosophila melanogaster, Genetic Variation, Population Genetics, Linkage Disequilibrium, Genetic Models, Solanum lycopersicum, Animals, Computer Simulation, Demography, Drosophila melanogaster/genetics, Solanum lycopersicum/genetics, Humans

16.
Mol Ecol ; 19 Suppl 1: 277-84, 2010 Mar.
Article in English | MEDLINE | ID: mdl-20331786

ABSTRACT

Improvements in sequencing technology over the past 5 years are leading to routine application of shotgun sequencing in the fields of ecology and evolution. However, the theory to estimate evolutionary parameters from these data is still being worked out. Here we present an extension and implementation of part of this theory, mlRho. This program can efficiently compute the following three maximum likelihood estimators based on shotgun sequence data obtained from single diploid individuals: the population mutation rate (4Nₑμ), the sequencing error rate, and the population recombination rate (4Nₑc). We demonstrate the accuracy of mlRho by applying it to simulated data sets. In addition, we analyse the genomes of the sea squirt Ciona intestinalis and the water flea Daphnia pulex. Ciona intestinalis is an obligate outcrosser, while D. pulex is a cyclic parthenogen, and we discuss how these contrasting life histories are reflected in our parameter estimates. The program mlRho is freely available from http://guanine.evolbio.mpg.de/mlRho.


Subjects
DNA Mutational Analysis/methods, Population Genetics, Genomics/methods, Genetic Recombination, Software, Animals, Ciona intestinalis/genetics, Computational Biology/methods, Computer Simulation, Daphnia/genetics, Diploidy, Genome, Likelihood Functions, Genetic Models

17.
Mol Ecol ; 19 Suppl 1: 162-75, 2010 Mar.
Article in English | MEDLINE | ID: mdl-20331778

ABSTRACT

Recent advances in sequencing technology promise to provide new strategies for studying population differentiation and speciation phenomena in their earliest phases. We focus here on the black carrion crow (Corvus [corone] corone), which forms a zone of hybridization and overlap with the grey-coated hooded crow (Corvus [corone] cornix). However, although these semispecies are taxonomically distinct, previous analyses based on several types of genetic markers did not reveal significant molecular differentiation between them. We here corroborate this result with sequence data obtained from a set of 25 nuclear intronic loci. Thus, the system represents a case of a very early phase of species divergence that requires new molecular approaches for its description. We have therefore generated RNAseq expression profiles using barcoded massively parallel pyrosequencing of brain mRNA from six individuals of the carrion crow and five individuals from a hybrid zone with the hooded crow. We obtained 856 675 reads from two runs, with average read length of 270 nt and coverage of 8.44. Reads were assembled de novo into 19 552 contigs, 70% of which could be assigned to annotated genes in chicken and zebra finch. This resulted in a total of 7637 orthologous genes and a core set of 1301 genes that could be compared across all individuals. We find a clear clustering of expression profiles for the pure carrion crow animals and disperse profiles for the animals from the hybrid zone. These results suggest that gene expression differences may indeed be a sensitive indicator of initial species divergence.


Subjects
Crows/genetics, Gene Expression Profiling, Genetic Hybridization, Animals, Cluster Analysis, Comparative Genomic Hybridization, Expressed Sequence Tags, Gene Expression, Pilot Projects, DNA Sequence Analysis/methods

18.
G3 (Bethesda) ; 10(1): 211-223, 2020 01 07.
Article in English | MEDLINE | ID: mdl-31699776

ABSTRACT

With up to millions of nearly neutral polymorphisms now being routinely sampled in population-genomic surveys, it is possible to estimate the site-frequency spectrum of such sites with high precision. Each frequency class reflects a mixture of potentially unique demographic histories, which can be revealed using theory for the probability distributions of the starting and ending points of branch segments over all possible coalescence trees. Such distributions are completely independent of past population history, which only influences the segment lengths, providing the basis for estimating average population sizes separating tree-wide coalescence events. The history of population-size change experienced by a sample of polymorphisms can then be dissected in a model-flexible fashion, and extension of this theory allows estimation of the mean and full distribution of long-term effective population sizes and ages of alleles of specific frequencies. Here, we outline the basic theory underlying the conceptual approach, develop and test an efficient statistical procedure for parameter estimation, and apply this to multiple population-genomic datasets for the microcrustacean Daphnia pulex.
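The site-frequency spectrum itself is a simple tally. The sketch below counts, for n sampled sequences, how many segregating sites carry the derived allele exactly i times, and sets this against the standard neutral expectation θ/i for a constant-size population. It assumes ancestral states are known (an unfolded spectrum) and does not implement the estimation procedure developed in the paper; the function names are hypothetical.

```python
from collections import Counter


def unfolded_sfs(derived_counts: list[int], n: int) -> list[int]:
    """xi_i for i = 1..n-1: number of sites whose derived allele is carried
    by exactly i of the n sampled sequences. Monomorphic sites are skipped."""
    tally = Counter(c for c in derived_counts if 0 < c < n)
    return [tally[i] for i in range(1, n)]


def expected_neutral_sfs(theta: float, n: int) -> list[float]:
    """Standard neutral, constant-size expectation E[xi_i] = theta / i."""
    return [theta / i for i in range(1, n)]


if __name__ == "__main__":
    print(unfolded_sfs([1, 1, 2, 3, 0, 4], n=4))  # [2, 1, 1]
    print(expected_neutral_sfs(6.0, n=4))          # [6.0, 3.0, 2.0]
```

Systematic departures of the observed classes from the θ/i baseline are the kind of signal from which past changes in population size can be inferred.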


Subjects
Biomass, Genetic Models, Single Nucleotide Polymorphism, Animals, Daphnia/genetics, Daphnia/growth & development

19.
Mol Cell Biol ; 23(3): 864-72, 2003 Feb.
Article in English | MEDLINE | ID: mdl-12529392

ABSTRACT

Nuclear receptors are ligand-modulated transcription factors. On the basis of the completed human genome sequence, this family was thought to contain 48 functional members. However, by mining human and mouse genomic sequences, we identified FXRβ as a novel family member. It is a functional receptor in mice, rats, rabbits, and dogs but constitutes a pseudogene in humans and primates. Murine FXRβ is widely coexpressed with FXR in embryonic and adult tissues. It heterodimerizes with RXRα and stimulates transcription through specific DNA response elements upon addition of 9-cis-retinoic acid. Finally, we identified lanosterol as a candidate endogenous ligand that induces coactivator recruitment and transcriptional activation by mFXRβ. Lanosterol is an intermediate of cholesterol biosynthesis, which suggests a direct role in the control of cholesterol biosynthesis in nonprimates. The identification of FXRβ as a novel functional receptor in nonprimate animals sheds new light on the species differences in cholesterol metabolism and has strong implications for the interpretation of genetic and pharmacological studies of FXR-directed physiologies and drug discovery programs.


Subjects
DNA-Binding Proteins/genetics, DNA-Binding Proteins/metabolism, Lanosterol/metabolism, Cytoplasmic and Nuclear Receptors/genetics, Cytoplasmic and Nuclear Receptors/metabolism, Transcription Factors/genetics, Transcription Factors/metabolism, Amino Acid Sequence, Animals, Base Sequence, Cholesterol/metabolism, Molecular Cloning, Complementary DNA/genetics, DNA-Binding Proteins/chemistry, Dimerization, Dogs, Humans, Ligands, Male, Mice, Molecular Sequence Data, Primates, Quaternary Protein Structure, Pseudogenes, Rabbits, Rats, Transcription Factors/chemistry

20.
BMC Bioinformatics ; 7: 541, 2006 Dec 22.
Article in English | MEDLINE | ID: mdl-17187668

ABSTRACT

BACKGROUND: Genome sequences vary strongly in their repetitiveness and the causes for this are still debated. Here we propose a novel measure of genome repetitiveness, the index of repetitiveness, Ir, which can be computed in time proportional to the length of the sequences analyzed. We apply it to 336 genomes from all three domains of life. RESULTS: The expected value of Ir is zero for random sequences of any G/C content and greater than zero for sequences with excess repeats. We find that the Ir of archaea is significantly smaller than that of eubacteria, which in turn is smaller than that of eukaryotes. Mouse chromosomes have a significantly higher Ir than human chromosomes and within each genome the Y chromosome is most repetitive. A sliding window analysis reveals that the human HOXA cluster and two surrounding genes are characterized by local minima in Ir. A program for calculating the Ir is freely available at http://adenine.biz.fh-weihenstephan.de/ir/. CONCLUSION: The general measure of DNA repetitiveness proposed in this paper can be efficiently computed on a genomic scale. This reveals a broad spectrum of repetitiveness among diverse genomes which agrees qualitatively with previous studies of repeat content. A sliding window analysis helps to analyze the intragenomic distribution of repeats.


Subjects
Algorithms, Chromosome Mapping/methods, Genome/genetics, Genetic Models, Repetitive Nucleic Acid Sequences/genetics, Sequence Alignment/methods, DNA Sequence Analysis/methods, Animals, Computer Simulation