1.
Nat Biotechnol ; 2024 Jan 02.
Article in English | MEDLINE | ID: mdl-38168995

ABSTRACT

Tandem repeat (TR) variation is associated with gene expression changes and numerous rare monogenic diseases. Although long-read sequencing provides accurate full-length sequences and methylation of TRs, there is still a need for computational methods to profile TRs across the genome. Here we introduce the Tandem Repeat Genotyping Tool (TRGT) and an accompanying TR database. TRGT determines the consensus sequences and methylation levels of specified TRs from PacBio HiFi sequencing data. It also reports reads that support each repeat allele. These reads can be subsequently visualized with a companion TR visualization tool. Assessing 937,122 TRs, TRGT showed a Mendelian concordance of 98.38%, allowing a single repeat unit difference. In six samples with known repeat expansions, TRGT detected all expansions while also identifying methylation signals and mosaicism and providing finer repeat length resolution than existing methods. Additionally, we released a database with allele sequences and methylation levels for 937,122 TRs across 100 genomes.
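The concordance criterion quoted above (Mendelian consistency "allowing a single repeat unit difference") can be illustrated with a minimal sketch. This is not TRGT's implementation: the function names, the representation of each genotype as a pair of repeat-unit counts, and the `tolerance` parameter are assumptions made for illustration only.

```python
def is_mendelian_consistent(child, father, mother, tolerance=1):
    """Return True if one child allele can come from each parent.

    Each genotype is a pair of repeat-unit counts (hypothetical encoding).
    An allele "matches" a parent if it is within `tolerance` repeat units
    of either of that parent's alleles.
    """
    def matches(allele, parent):
        return any(abs(allele - p) <= tolerance for p in parent)

    c1, c2 = child
    # Try both assignments of the child's alleles to the two parents.
    return (matches(c1, father) and matches(c2, mother)) or (
        matches(c1, mother) and matches(c2, father)
    )


def mendelian_concordance(trios, tolerance=1):
    """Fraction of (child, father, mother) genotype trios that are consistent."""
    consistent = sum(
        is_mendelian_consistent(child, father, mother, tolerance)
        for child, father, mother in trios
    )
    return consistent / len(trios)
```

With `tolerance=0` the check reduces to strict allele sharing, so the two settings can be compared on the same set of trios.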

2.
Nat Commun ; 14(1): 3090, 2023 05 29.
Article in English | MEDLINE | ID: mdl-37248219

ABSTRACT

Long-read HiFi genome sequencing allows for accurate detection and direct phasing of single nucleotide variants, indels, and structural variants. Recent algorithmic development enables simultaneous detection of CpG methylation for analysis of regulatory element activity directly in HiFi reads. We present a comprehensive haplotype resolved 5-base HiFi genome sequencing dataset from a rare disease cohort of 276 samples in 152 families to identify rare (~0.5%) hypermethylation events. We find that 80% of these events are allele-specific and predicted to cause loss of regulatory element activity. We demonstrate heritability of extreme hypermethylation including rare cis variants associated with short (~200 bp) and large hypermethylation events (>1 kb), respectively. We identify repeat expansions in proximal promoters predicting allelic gene silencing via hypermethylation and demonstrate allelic transcriptional events downstream. On average 30-40 rare hypermethylation tiles overlap rare disease genes per patient, providing indications for variation prioritization including a previously undiagnosed pathogenic allele in DIP2B causing global developmental delay. We propose that use of HiFi genome sequencing in unsolved rare disease cases will allow detection of unconventional diseases alleles due to loss of regulatory element activity.


Subject(s)
DNA Methylation , Rare Diseases , Humans , Haplotypes , Rare Diseases/genetics , DNA Methylation/genetics , Sequence Analysis, DNA , Base Sequence , High-Throughput Nucleotide Sequencing , Nerve Tissue Proteins/genetics
3.
Clin Transl Sci ; 15(4): 912-922, 2022 04.
Article in English | MEDLINE | ID: mdl-35297172

ABSTRACT

An accurate understanding of the changes in height and weight of children with age is critical to the development of models predicting drug concentrations in children (i.e., physiologically-based pharmacokinetic models). However, curves describing the growth of a typical population of children may not accurately characterize growth of children with various conditions, such as obesity. Therefore, to develop height and weight versus age growth curves for youth who were diagnosed with type 2 diabetes, we extracted data from electronic medical records. Robust nonlinear models were parameterized to the equations describing height and weight versus age as defined by the Centers for Disease Control and Prevention (CDC). CDC z-scores were calculated using an internal program. The growth curves and z-scores were compared to CDC norms. Youth with type 2 diabetes were increasingly heavier than CDC norms from early childhood. Except for a period around puberty, youth with type 2 diabetes were, on average, shorter than CDC norms, resulting in shorter average adult height. Deviations in growth were apparent in youth who develop type 2 diabetes; such deviations may be expected for other conditions as well, and disease-specific growth curves should be considered during development of model-informed drug development for pediatric conditions.
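The abstract does not describe the internal z-score program. As a hedged illustration, the sketch below uses the LMS formulation on which the CDC growth charts are based, where L is the Box-Cox power, M the median, and S the generalized coefficient of variation for a given age and sex. That the authors' program used this exact formula is an assumption, and the parameter values in the usage note are made up rather than taken from CDC tables.

```python
import math


def lms_z_score(measurement, L, M, S):
    """Z-score of a measurement under the LMS model.

    z = ((X / M) ** L - 1) / (L * S)   when L != 0
    z = ln(X / M) / S                  when L == 0
    """
    if measurement <= 0 or M <= 0 or S <= 0:
        raise ValueError("measurement, M and S must be positive")
    ratio = measurement / M
    if abs(L) < 1e-12:
        # Limiting case of the Box-Cox transform as L approaches zero.
        return math.log(ratio) / S
    return (ratio ** L - 1.0) / (L * S)
```

For example, with the hypothetical values L = 1, M = 100 and S = 0.1, a measurement of 110 lies one standard deviation above the median.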


Subject(s)
Diabetes Mellitus, Type 2 , Adolescent , Adult , Body Height , Body Mass Index , Body Weight , Child , Child Development , Child, Preschool , Humans , Obesity
4.
Clin Transl Sci ; 13(3): 509-519, 2020 05.
Article in English | MEDLINE | ID: mdl-31917523

ABSTRACT

The hepatic influx transporter OATP1B1 (SLCO1B1) plays an important role in the disposition of endogenous substrates and drugs prescribed to children. Alternative splicing increases the diversity of protein products from > 90% of human genes and may be triggered by developmental signals. As concentrations of several endogenous OATP1B1 substrates change during growth and development, with this exploratory study we investigated age-dependent alternative splicing of SLCO1B1 mRNA in 97 postmortem livers (fetus-adolescents). Twenty-seven splice variants were detected; 10 were confirmed by additional bioinformatic analyses and verified by quantitative polymerase chain reaction, and selected for detailed analysis based on relative abundance, association with age, and overlap with an adjacent gene. Two splice variants code for reference OATP1B1 protein, and eight code for truncated proteins. The expression of eight isoforms was associated with age. We conclude that alternative splicing of SLCO1B1 occurs frequently in children; although the functional consequences remain unknown, the data raise the possibility of a regulatory role for alternative splicing in mediating developmental changes in drug disposition.


Subject(s)
Alternative Splicing , Gene Expression Regulation, Developmental , Liver-Specific Organic Anion Transporter 1/genetics , Liver/metabolism , Aborted Fetus , Adolescent , Age Factors , Child , Child, Preschool , Humans , Infant , Infant, Newborn , Liver-Specific Organic Anion Transporter 1/metabolism , Netherlands , Organic Anion Transporters/genetics , Organic Anion Transporters/metabolism , Protein Isoforms/genetics , Protein Isoforms/metabolism , RNA-Seq , Solute Carrier Proteins/genetics , Solute Carrier Proteins/metabolism , Stillbirth
5.
Eur J Pharm Sci ; 124: 217-227, 2018 Nov 01.
Article in English | MEDLINE | ID: mdl-30171984

ABSTRACT

BACKGROUND: Hepatic membrane transporters are involved in the transport of many endogenous and exogenous compounds, including drugs. We aimed to study the relation of age with absolute transporter protein expression in a cohort of 62 mainly fetus and newborn samples. METHODS: Protein expressions of BCRP, BSEP, GLUT1, MCT1, MDR1, MRP1, MRP2, MRP3, NTCP, OCT1, OATP1B1, OATP1B3, OATP2B1 and ATP1A1 were quantified with LC-MS/MS in isolated crude membrane fractions of snap-frozen post-mortem fetal and pediatric, and surgical adult liver samples. mRNA expression was quantified using RNA sequencing, and genetic variants with TaqMan assays. We explored relationships between protein expression and age (gestational age [GA], postnatal age [PNA], and postmenstrual age); between protein and mRNA expression; and between protein expression and genotype. RESULTS: We analyzed 36 fetal (median GA 23.4 weeks [range 15.3-41.3]), 12 premature newborn (GA 30.2 weeks [24.9-36.7], PNA 1.0 weeks [0.14-11.4]), 10 term newborn (GA 40.0 weeks [39.7-41.3], PNA 3.9 weeks [0.3-18.1]), 4 pediatric (PNA 4.1 years [1.1-7.4]) and 8 adult liver samples. A relationship with age was found for BCRP, BSEP, GLUT1, MDR1, MRP1, MRP2, MRP3, NTCP, OATP1B1 and OCT1, with the strongest relationship for postmenstrual age. For most transporters mRNA and protein expression were not correlated. No genotype-protein expression relationship was detected. DISCUSSION AND CONCLUSION: Various developmental patterns of protein expression of hepatic transporters emerged in fetuses and newborns up to four months of age. Postmenstrual age was the most robust factor predicting transporter expression in this cohort. Our data fill an important gap in current pediatric transporter ontogeny knowledge.


Subject(s)
Fetus/metabolism , Liver/metabolism , Membrane Transport Proteins/metabolism , Adult , Animals , Child , Child, Preschool , Dogs , HEK293 Cells , Humans , Infant , Infant, Newborn , Liver/embryology , Madin Darby Canine Kidney Cells , Membrane Transport Proteins/genetics , Proteomics , RNA, Messenger/metabolism
6.
Pharmacogenet Genomics ; 28(3): 86-94, 2018 03.
Article in English | MEDLINE | ID: mdl-29360682

ABSTRACT

OBJECTIVES: The majority of drug dosing studies are based on adult populations, with modification of the dosing for children based on size and weight. This rudimentary approach for drug dosing children is limited, as biologically a child can differ from an adult in far more aspects than just size and weight. Specifically, understanding the ontogeny of childhood liver development is critical in dosing drugs that are metabolized through the liver, as the rate of metabolism determines the duration and intensity of a drug's pharmacologic action. Therefore, we set out to determine pharmacogenes that change over childhood development, followed by a secondary agnostic analysis, assessing changes transcriptome wide. MATERIALS AND METHODS: A total of 47 human liver tissue samples, with between 10 and 13 samples in four age groups spanning childhood development, underwent paired-end sequencing. Kruskal-Wallis and Spearman's rank correlation tests were used to determine the association of gene expression levels with age. Gene set analysis based on the pathways in KEGG utilized the gamma method. Correction for multiple testing was completed using q-values. RESULTS: We found evidence for increased expression of 'very important pharmacogenes', for example, coagulation factor V (F5) (P=6.7×10^-7), angiotensin I converting enzyme (ACE) (P=6.4×10^-3), and solute carrier family 22 member 1 (SLC22A1) (P=7.0×10^-5) over childhood development. In contrast, we observed a significant decrease in expression of two alternative CYP3A7 transcripts (P=1.5×10^-5 and 3.0×10^-5) over development. The analysis of genome-wide changes detected transcripts in the following genes with significant changes in mRNA expression (P<1×10^-9 with false discovery rate<5×10^-5): ADCY1, PTPRD, CNDP1, DCAF12L1 and HIP1. Gene set analysis determined ontogeny-related transcriptomic changes in the renin-angiotensin pathway (P<0.002), with lower expression of the pathway, in general, observed in liver samples from younger participants. CONCLUSION: Considering that the renin-angiotensin pathway plays a central role in blood pressure and plasma sodium concentration, and our observation that ACE and PTPRD expression increased over the spectrum of childhood development, this finding could potentially impact the dosing of an entire class of drugs known as ACE-inhibitors in pediatric patients.
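The association of expression with age described in the methods rests on Spearman's rank correlation. A minimal pure-Python sketch of that statistic follows. It is a generic illustration, not the authors' pipeline; the function names are hypothetical, and it returns only the coefficient, without the P-values or q-values the abstract reports.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))
```

Calling `spearman_rho(ages, expression)` on two made-up lists in which expression rises monotonically with age gives a coefficient of 1, regardless of whether the rise is linear.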


Subject(s)
High-Throughput Nucleotide Sequencing , Organic Cation Transporter 1/genetics , Renin-Angiotensin System/genetics , Transcriptome/genetics , Adolescent , Child , Child, Preschool , Cytochrome P-450 CYP3A/genetics , Factor V/genetics , Female , Gene Expression Regulation/drug effects , Humans , Infant , Infant, Newborn , Liver/drug effects , Liver/metabolism , Male , Peptidyl-Dipeptidase A/genetics
7.
Drug Metab Dispos ; 44(7): 1020-6, 2016 07.
Article in English | MEDLINE | ID: mdl-26772622

ABSTRACT

Members of the human CYP3A family of metabolizing enzymes exhibit developmental changes in expression whereby CYP3A7 is expressed in fetal tissues, followed by a transition to expression of CYP3A4 in the first months of life. Despite knowledge about the general pattern of CYP3A activity in human development, the mechanisms that regulate developmental expression remain poorly understood. Epigenetic changes, including cytosine methylation, have been suggested to play a role in the regulation of CYP3A expression. The objective of this study was to investigate changes in cytosine methylation of the CYP3A4 and CYP3A7 genes in human pediatric and prenatal livers. The methylation status of cytosine-phospho-guanine dinucleotides was determined in 16 pediatric liver samples using methyl-seq and confirmed by bisulfite sequencing of 48 pediatric and 34 prenatal liver samples. Samples were separated by age into five groups (prenatal, < 1 year of age, 1.8-6 years, 7-11 years, and 12-17 years). Methyl-seq analysis revealed that cytosines in the proximal promoter of CYP3A7 are hypomethylated in neonates compared with adolescents (P < 0.001). In contrast, a cytosine 383 base pair upstream of CYP3A4 is hypermethylated in liver samples from neonates compared with adolescents (P = 0.00001). Developmental changes in methylation of cytosines in the proximal promoters of CYP3A4 and CYP3A7 in pediatric livers were confirmed by bisulfite sequencing. In addition, the methylation status of cytosine in the CYP3A4 and CYP3A7 proximal promoters correlated with changes in developmental expression of mRNA for the two enzymes.


Subject(s)
Aging/genetics , Cytochrome P-450 CYP3A/genetics , Cytosine , DNA Methylation , Epigenesis, Genetic , Liver/enzymology , Promoter Regions, Genetic , Adolescent , Age Factors , Aging/metabolism , Child , Child, Preschool , Cytochrome P-450 CYP3A/metabolism , Female , Gene Expression Regulation, Developmental , Gene Expression Regulation, Enzymologic , Gestational Age , Humans , Infant , Infant, Newborn , Male , RNA, Messenger/genetics , RNA, Messenger/metabolism
8.
J Clin Neurosci ; 20(1): 75-9, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23098391

ABSTRACT

Of the 74 immunocompetent patients diagnosed between July 2004 and June 2011 at the North Shore University Hospital and Long Island Jewish Medical Center with primary central nervous system lymphoma, 71 (95.9%) had diffuse large B-cell lymphomas (DLBCL). The median patient age was 68 years (range: 19-87 years) with a slight male preponderance (1.1:1). The overall median survival time was 21 months. For patients older than 70 years, the median survival time was 8 months while for those 70 years or younger, the median survival time was 27 months (p<0.01). Female patients had a worse prognosis than male patients (p<0.05, median survival time, 17 months compared to 23 months). We had enough data from 52 of these 71 patients to define the lymphomas as either germinal center B-cell-like (GCB) or activated B-cell-like (ABC) DLBCL. Of these 52 patients, 42 (80.8%) had ABC DLBCL while only 10 (19.2%) had GCB DLBCL. The patients in the GCB subgroup seemed to survive longer than the patients in the ABC subgroup, although the difference did not reach statistical significance. No statistically significant difference in overall survival was seen between patients with BCL-6 positive or negative DLBCL; or between patients with BCL-2 positive or negative DLBCL.


Subject(s)
Central Nervous System Neoplasms/ethnology , Central Nervous System Neoplasms/epidemiology , Immunocompetence , Lymphoma, Large B-Cell, Diffuse/ethnology , Lymphoma, Large B-Cell, Diffuse/epidemiology , Adult , Aged , Aged, 80 and over , Central Nervous System Neoplasms/mortality , DNA-Binding Proteins/metabolism , Female , Flow Cytometry , Follow-Up Studies , Humans , Jews , Kaplan-Meier Estimate , Lymphoma, Large B-Cell, Diffuse/mortality , Male , Middle Aged , Neprilysin/metabolism , New York City/epidemiology , New York City/ethnology , Proto-Oncogene Proteins c-bcl-6 , Retrospective Studies , Young Adult
9.
Artif Intell Med ; 56(1): 1-17, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22613029

ABSTRACT

OBJECTIVES: The objectives of this study are to design and implement a new memetic algorithm for de novo motif discovery, which is then applied to detect important signals hidden in various biomedical molecular sequences. METHODS AND MATERIALS: In this paper, memetic algorithms are developed and tested in de novo motif-finding problems. Several strategies are employed in the algorithm design, not only to explore the multiple sequence local alignment space efficiently but also to uncover the molecular signals effectively. As a result, there are a number of key features in the implementation of the memetic motif-finding algorithm (MaMotif), including a chromosome replacement operator, a chromosome alteration-aware local search operator, a truncated local search strategy, and a stochastic operation of local search imposed on individual learning. To test the new algorithm, we compare MaMotif with a few other similar algorithms using simulated and experimental data including genomic DNA, primary microRNA sequences (let-7 family), and transmembrane protein sequences. RESULTS: The new memetic motif-finding algorithm is successfully implemented in C++, and exhaustively tested with various simulated and real biological sequences. The simulation shows that MaMotif is the most time-efficient of the algorithms compared: it runs 2 times faster than the expectation maximization (EM) method and 16 times faster than the genetic algorithm-based EM hybrid. In both simulated and experimental testing, results show that the new algorithm compares favorably with, or is superior to, other algorithms. Notably, MaMotif is able to successfully discover the transcription factors' binding sites in the chromatin immunoprecipitation followed by massively parallel sequencing (ChIP-Seq) data, correctly uncover the RNA splicing signals in gene expression, and precisely find the highly conserved helix motif in the transmembrane protein sequences, as well as rightly detect the palindromic segments in the primary microRNA sequences. CONCLUSIONS: The memetic motif-finding algorithm is effectively designed and implemented, and its applications demonstrate that it is not only time-efficient, but also exhibits excellent performance when compared with other popular algorithms.


Subject(s)
Algorithms , Proteins/chemistry , Base Sequence , Binding Sites , Chromatin Immunoprecipitation , MicroRNAs/chemistry , MicroRNAs/metabolism , Molecular Sequence Data , Proteins/metabolism , Sequence Analysis, DNA/methods
10.
Genet Test Mol Biomarkers ; 14(2): 241-7, 2010 Apr.
Article in English | MEDLINE | ID: mdl-20384458

ABSTRACT

Although few examples are formally documented, all polymerase chain reaction-based testing is theoretically vulnerable to allele drop-out (ADO), the failure to amplify one of the two alleles present in a cell. In a clinical setting, this can lead to false positive or negative diagnosis. We investigated the mechanisms leading to ADO in the MECP2 gene in two unrelated female patients undergoing testing for Rett syndrome. Both the patients had two benign DNA variations, c.819G > T and c.1161C > T, that appeared homozygous due to ADO. Bioinformatics analyses indicate that this region of the MECP2 gene is rich in complex tertiary structures called G-quadruplex and i-motifs, the disruption of which by the c.819G > T and c.1161C > T variants leads to preferential amplification of the variant allele. Other examples of ADO likely occur, and consideration of disrupting G-quadruplex and i-motif structures should be given when this phenomenon is unexpected. We identify factors in both the polymerase chain reaction amplification and the sequencing steps that help overcome ADO.


Subject(s)
Methyl-CpG-Binding Protein 2/genetics , Rett Syndrome/diagnosis , Rett Syndrome/genetics , Alleles , Base Sequence , Child , DNA/chemistry , DNA/genetics , DNA Primers/genetics , Female , G-Quadruplexes , Genetic Testing , Homozygote , Humans , Molecular Sequence Data , Nucleic Acid Conformation , Polymerase Chain Reaction/methods , Polymorphism, Single Nucleotide
11.
Article in English | MEDLINE | ID: mdl-19644166

ABSTRACT

Motif discovery methods play pivotal roles in deciphering the genetic regulatory codes (i.e., motifs) in genomes as well as in locating conserved domains in protein sequences. The Expectation Maximization (EM) algorithm is one of the most popular methods used in de novo motif discovery. Based on the position weight matrix (PWM) updating technique, this paper presents a Monte Carlo version of the EM motif-finding algorithm that carries out stochastic sampling in local alignment space to overcome the conventional EM's main drawback of being trapped in a local optimum. The newly implemented algorithm is named as Monte Carlo EM Motif Discovery Algorithm (MCEMDA). MCEMDA starts from an initial model, and then it iteratively performs Monte Carlo simulation and parameter update until convergence. A log-likelihood profiling technique together with the top-k strategy is introduced to cope with the phase shifts and multiple modal issues in motif discovery problem. A novel grouping motif alignment (GMA) algorithm is designed to select motifs by clustering a population of candidate local alignments and successfully applied to subtle motif discovery. MCEMDA compares favorably to other popular PWM-based and word enumerative motif algorithms tested using simulated (l, d)-motif cases, documented prokaryotic, and eukaryotic DNA motif sequences. Finally, MCEMDA is applied to detect large blocks of conserved domains using protein benchmarks and exhibits its excellent capacity while compared with other multiple sequence alignment methods.


Subject(s)
Algorithms , DNA/chemistry , Monte Carlo Method , Proteins/chemistry , Sequence Analysis/methods , Amino Acid Motifs , Amino Acid Sequence , Animals , Base Sequence , Computer Simulation , Databases, Genetic , Markov Chains , Models, Molecular , Molecular Sequence Data , Nucleic Acid Conformation , Protein Structure, Secondary , Transcription Factors
12.
BMC Bioinformatics ; 10 Suppl 1: S13, 2009 Jan 30.
Article in English | MEDLINE | ID: mdl-19208112

ABSTRACT

BACKGROUND: Deciphering cis-regulatory elements or de novo motif-finding in genomes still remains elusive although much algorithmic effort has been expended. The Markov chain Monte Carlo (MCMC) method such as Gibbs motif samplers has been widely employed to solve the de novo motif-finding problem through sequence local alignment. Nonetheless, the MCMC-based motif samplers still suffer from local maxima like EM. Therefore, as a prerequisite for finding good local alignments, these motif algorithms are often independently run a multitude of times, but without information exchange between different chains. Hence it would be worth a new algorithm design enabling such information exchange. RESULTS: This paper presents a novel motif-finding algorithm by evolving a population of Markov chains with information exchange (PMC), each of which is initialized as a random alignment and run by the Metropolis-Hastings sampler (MHS). It is progressively updated through a series of local alignments stochastically sampled. Explicitly, the PMC motif algorithm performs stochastic sampling as specified by a population-based proposal distribution rather than individual ones, and adaptively evolves the population as a whole towards a global maximum. The alignment information exchange is accomplished by taking advantage of the pooled motif site distributions. A distinct method for running multiple independent Markov chains (IMC) without information exchange, or dubbed as the IMC motif algorithm, is also devised to compare with its PMC counterpart. CONCLUSION: Experimental studies demonstrate that the performance could be improved if pooled information were used to run a population of motif samplers. The new PMC algorithm was able to improve the convergence and outperformed other popular algorithms tested using simulated and biological motif sequences.
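The Metropolis-Hastings sampler that drives each chain can be sketched generically. The code below shows a single chain with a symmetric proposal and a log-scale score. It does not reproduce the PMC algorithm's population-based proposal distribution or its information exchange, and all names (`metropolis_accept`, `run_chain`, `log_score`, `propose`) are hypothetical.

```python
import math
import random


def metropolis_accept(current_log_score, proposed_log_score, rng):
    """Accept with probability min(1, exp(proposed - current)).

    Assumes a symmetric proposal, so the proposal ratio cancels.
    """
    if proposed_log_score >= current_log_score:
        return True
    return rng.random() < math.exp(proposed_log_score - current_log_score)


def run_chain(log_score, propose, start, steps, rng):
    """Run one chain and return the best state seen with its log score."""
    state, state_score = start, log_score(start)
    best, best_score = state, state_score
    for _ in range(steps):
        candidate = propose(state, rng)
        candidate_score = log_score(candidate)
        if metropolis_accept(state_score, candidate_score, rng):
            state, state_score = candidate, candidate_score
            if state_score > best_score:
                best, best_score = state, state_score
    return best, best_score
```

Running several such chains independently from different starting points corresponds in spirit to the IMC baseline described in the abstract; the PMC variant additionally pools information across chains, which this sketch leaves out.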


Subject(s)
DNA/chemistry , Markov Chains , Sequence Alignment/methods , Algorithms , DNA/genetics , Sequence Analysis, DNA/methods
13.
J Proteome Res ; 7(1): 192-201, 2008 Jan.
Article in English | MEDLINE | ID: mdl-18081244

ABSTRACT

Protein conserved domains are distinct units of molecular structure, usually associated with particular aspects of molecular function such as catalysis or binding. These conserved subsequences are often unobserved and thus in need of detection. Motif discovery methods can be used to find these unobserved domains given a set of sequences. This paper presents the data augmentation (DA) framework that unifies a suite of motif-finding algorithms through maximizing the same likelihood function by imputing the unobserved data. The data augmentation refers to those methods that formulate iterative optimization by exploiting the unobserved data. Two categories of maximum likelihood based motif-finding algorithms are illustrated under the DA framework. The first is the deterministic algorithms that are to maximize the likelihood function by performing an iteratively optimal local search in the alignment space. The second is the stochastic algorithms that are to iteratively draw motif location samples via Monte Carlo simulation and simultaneously keep track of the superior solution with the best likelihood. As a result, four DA motif discovery algorithms are described, evaluated, and compared by aligning real and simulated protein sequences.


Subject(s)
Algorithms , Amino Acid Motifs , Conserved Sequence , Structural Homology, Protein , Information Storage and Retrieval , Likelihood Functions , Monte Carlo Method , Protein Structure, Tertiary , Sequence Alignment , Stochastic Processes
14.
Mol Pharm ; 5(1): 3-16, 2008.
Article in English | MEDLINE | ID: mdl-18076137

ABSTRACT

Since the completion of human genome sequencing, cataloging of all genomic functional elements has been one of the challenging problems in bioinformatics. Deciphering cis-regulatory elements in the human genome still remains elusive although much effort has been expended. This paper reviews a suite of methods for two-block motif discovery including mathematical modeling, de novo motif-finding based on multiple local alignment, and genomic sequence scanning method for putative sites. We formulate a general method to address this challenge and compare two major existing algorithms (i.e., greedy local search and Gibbs sampling) implemented to solve the popular two-block structured motif discovery issue. We demonstrate how to use this suite of methods and apply them to human nuclear receptor response elements (i.e., protein binding sites of several relevant nuclear receptors, HNF4alpha, CAR/RXR, and PXR/RXR).


Subject(s)
Algorithms , Computational Biology , Receptors, Cytoplasmic and Nuclear/chemistry , Amino Acid Motifs , Base Sequence , Humans , Molecular Sequence Data , Sequence Homology, Nucleic Acid , Software
15.
J Bioinform Comput Biol ; 5(1): 47-77, 2007 Feb.
Article in English | MEDLINE | ID: mdl-17477491

ABSTRACT

Position weight matrix-based statistical modeling for the identification and characterization of motif sites in a set of unaligned biopolymer sequences is presented. This paper describes and implements a new algorithm, the Stochastic EM-type Algorithm for Motif-finding (SEAM), and redesigns and implements the EM-based motif-finding algorithm called deterministic EM (DEM) for comparison with SEAM, its stochastic counterpart. The gold standard example, cyclic adenosine monophosphate receptor protein (CRP) binding sequences, together with other biological sequences, is used to illustrate the performance of the new algorithm and compare it with other popular motif-finding programs. The convergence of the new algorithm is shown by simulation. The in silico experiments using simulated and biological examples illustrate the power and robustness of the new algorithm SEAM in de novo motif discovery.
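To make the contrast between a deterministic and a stochastic EM-type step concrete, the sketch below computes the posterior over motif start positions in one sequence from a position weight matrix and a background distribution, then draws a single start position from it. It assumes exactly one motif occurrence per sequence and a uniform prior over start positions; it is a generic illustration rather than the SEAM or DEM implementation, and every name in it is hypothetical.

```python
import math
import random


def start_position_posterior(sequence, pwm, background):
    """Posterior probability of each motif start position in `sequence`.

    `pwm` is a list of dicts (one per motif column) mapping residue to
    probability; `background` maps residue to background probability.
    """
    width = len(pwm)
    weights = []
    for start in range(len(sequence) - width + 1):
        # Log-likelihood ratio of motif model versus background at this start.
        log_ratio = sum(
            math.log(pwm[offset][sequence[start + offset]]
                     / background[sequence[start + offset]])
            for offset in range(width)
        )
        weights.append(math.exp(log_ratio))
    total = sum(weights)
    return [w / total for w in weights]


def sample_start(posterior, rng):
    """Stochastic step: draw one start position from the posterior."""
    return rng.choices(range(len(posterior)), weights=posterior, k=1)[0]
```

A deterministic EM update would use the whole posterior as fractional weights when re-estimating the matrix, whereas a stochastic variant re-estimates it from sampled positions such as the one returned by `sample_start`.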


Subject(s)
Algorithms , Artificial Intelligence , Biopolymers/chemistry , Proteins/chemistry , Sequence Alignment/methods , Sequence Analysis, Protein/methods , Amino Acid Motifs , Amino Acid Sequence , Binding Sites , Data Interpretation, Statistical , Likelihood Functions , Markov Chains , Molecular Sequence Data , Protein Binding , Software , Stochastic Processes
16.
J Mol Biol ; 358(2): 597-613, 2006 Apr 28.
Article in English | MEDLINE | ID: mdl-16516920

ABSTRACT

Scaffold or matrix-attachment regions (S/MARs) are thought to be involved in the organization of eukaryotic chromosomes and in the regulation of several DNA functions. Their characteristics are conserved between plants and humans, and a variety of biological activities have been associated with them. The identification of S/MARs within genomic sequences has proved to be unexpectedly difficult, as they do not appear to have consensus sequences or sequence motifs associated with them. We have shown that S/MARs do share a characteristic structural property: they have a markedly high predicted propensity to undergo strand separation when placed under negative superhelical tension. This result agrees with experimental observations that S/MARs contain base-unpairing regions (BURs). Here, we perform a quantitative evaluation of the association between the ease of stress-induced DNA duplex destabilization (SIDD) and S/MAR binding activity. We first use synthetic oligomers to investigate how the arrangement of localized unpairing elements within a base-unpairing region affects S/MAR binding. The organizational properties found in this way are applied to the investigation of correlations between specific measures of stress-induced duplex destabilization and the binding properties of naturally occurring S/MARs. For this purpose, we analyze S/MAR and non-S/MAR elements that have been derived from the human genome or from the tobacco genome. We find that S/MARs exhibit long regions of extensive destabilization. Moreover, quantitative measures of the SIDD attributes of these fragments calculated under uniform conditions are found to correlate very highly (r^2>0.8) with their experimentally measured S/MAR-binding strengths. These results suggest that duplex destabilization may be involved in the mechanisms by which S/MARs function. They suggest also that SIDD properties may be incorporated into an improved computational strategy to search genomic DNA sequences for sites having the necessary attributes to function as S/MARs, and even to estimate their relative binding strengths.


Subject(s)
DNA/metabolism , Matrix Attachment Regions , Nucleic Acid Heteroduplexes/metabolism , Antineoplastic Agents/chemistry , Antineoplastic Agents/metabolism , Chromatin/genetics , DNA/chemistry , Dimerization , Genome, Human , Genome, Plant , Humans , Interferon-beta/chemistry , Interferon-beta/metabolism , Nucleic Acid Conformation , Protein Binding
17.
BMC Bioinformatics ; 7: 76, 2006 Feb 17.
Article in English | MEDLINE | ID: mdl-16503993

ABSTRACT

BACKGROUND: Many dimeric protein complexes bind cooperatively to families of bipartite nucleic acid sequence elements, which consist of pairs of conserved half-site sequences separated by intervening distances that vary among individual sites. RESULTS: We introduce the Bipad Server, a web interface to predict sequence elements embedded within unaligned sequences. Either a bipartite model, consisting of a pair of one-block position weight matrices (PWM's) with a gap distribution, or a single PWM matrix for contiguous single block motifs may be produced. The Bipad program performs multiple local alignment by entropy minimization and cyclic refinement using a stochastic greedy search strategy. The best models are refined by maximizing incremental information contents among a set of potential models with varying half site and gap lengths. CONCLUSION: The web service generates information positional weight matrices, identifies binding site motifs, graphically represents the set of discovered elements as a sequence logo, and depicts the gap distribution as a histogram. Server performance was evaluated by generating a collection of bipartite models for distinct DNA binding proteins.
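The information content on which such models and their sequence logos are built can be computed per column from a set of aligned sites. The sketch below is a simplified illustration, not the Bipad code: it assumes a uniform background over the alphabet, applies no small-sample correction, and uses a hypothetical function name.

```python
import math
from collections import Counter


def column_information(sites, alphabet="ACGT"):
    """Information content (bits) of each column of equal-length aligned sites.

    For each column, information = log2(|alphabet|) - Shannon entropy,
    so a fully conserved DNA column carries 2 bits and a uniform one 0 bits.
    """
    width = len(sites[0])
    max_bits = math.log2(len(alphabet))
    info = []
    for col in range(width):
        counts = Counter(site[col] for site in sites)
        entropy = 0.0
        for base in alphabet:
            p = counts[base] / len(sites)
            if p > 0:
                entropy -= p * math.log2(p)
        info.append(max_bits - entropy)
    return info
```

Summing the per-column values gives the total information of an alignment, so minimizing entropy over candidate alignments is equivalent to maximizing this total under the stated assumptions.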


Subject(s)
Chromosome Mapping/methods , DNA-Binding Proteins/genetics , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Software , Transcription Factors/genetics , Algorithms , Base Sequence , Binding Sites , Computer Simulation , Internet , Models, Genetic , Molecular Sequence Data , Online Systems , Protein Binding , Sequence Homology, Nucleic Acid
18.
J Comput Biol ; 11(4): 519-43, 2004.
Article in English | MEDLINE | ID: mdl-15579230

ABSTRACT

We present a method for calculating predicted locations and extents of stress-induced DNA duplex destabilization (SIDD) as functions of base sequence and stress level in long DNA molecules. The base pair denaturation energies are assigned individually, so the influences of near neighbors, methylated bases, adducts, or lesions can be included. Sample calculations indicate that copolymeric energetics give results that are close to those derived when full near-neighbor energetics are used; small but potentially informative differences occur only in the calculated SIDD properties of moderately destabilized regions. The method presented here for analyzing long sequences calculates the destabilization properties within windows of fixed length N, with successive windows displaced by an offset distance d(o). The final values of the relevant destabilization parameters for each base pair are calculated as weighted averages of the values computed for each window in which that base pair appears. This approach implicitly assumes that the strength of the direct coupling between remote base pairs that is induced by the imposed stress attenuates with their separation distance. This strategy enables calculations of the destabilization properties of DNA sequences of any length, up to and including complete chromosomes. We illustrate its utility by calculating the destabilization properties of the entire E. coli genomic DNA sequence. A preliminary analysis of the results shows that promoters are associated with SIDD regions in a highly statistically significant manner, suggesting that SIDD attributes may prove useful in the computational prediction of promoter locations in prokaryotes.
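The windowing scheme described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the published implementation: the function names (toy_window_values, center_weight, windowed_profile) are hypothetical, the per-window calculation is replaced by a toy A/T indicator, and the triangular weighting is an assumed choice, since the abstract does not say how the weights are defined.

```python
from typing import Callable, List


def toy_window_values(window: str) -> List[float]:
    """Hypothetical stand-in for the per-window SIDD calculation.

    Returns one value per base pair: 1.0 for A/T and 0.0 for G/C.
    The real method computes destabilization properties instead.
    """
    return [1.0 if base in "AT" else 0.0 for base in window.upper()]


def center_weight(position: int, length: int) -> float:
    """Assumed triangular weight: largest at the window center, 1.0 at the ends."""
    center = (length - 1) / 2.0
    return 1.0 + center - abs(position - center)


def windowed_profile(
    sequence: str,
    window_length: int,
    offset: int,
    window_values: Callable[[str], List[float]] = toy_window_values,
) -> List[float]:
    """Weighted average, per base pair, over every window that contains it."""
    if window_length <= 0 or offset <= 0:
        raise ValueError("window_length and offset must be positive")
    if offset > window_length:
        raise ValueError("offset must not exceed window_length (gaps would occur)")
    n = len(sequence)
    if n == 0:
        return []
    window_length = min(window_length, n)

    # Window start positions, displaced by the offset; a final window is
    # added if needed so the end of the sequence is always covered.
    starts = list(range(0, n - window_length + 1, offset))
    if starts[-1] != n - window_length:
        starts.append(n - window_length)

    weighted_sum = [0.0] * n
    weight_total = [0.0] * n
    for start in starts:
        values = window_values(sequence[start:start + window_length])
        for i, value in enumerate(values):
            w = center_weight(i, window_length)
            weighted_sum[start + i] += w * value
            weight_total[start + i] += w
    return [s / t for s, t in zip(weighted_sum, weight_total)]
```

Calling windowed_profile(sequence, N, d_o) with a real per-window function in place of toy_window_values would give one averaged value per base pair. Weighting the window center more heavily is one way to reflect the stated assumption that coupling between base pairs attenuates with distance, but other weightings are equally plausible.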


Subject(s)
DNA/chemistry , DNA/genetics , Nucleic Acid Conformation , Sequence Analysis, DNA/methods , Base Pairing , Biomechanical Phenomena , Computational Biology , DNA, Bacterial/chemistry , DNA, Bacterial/genetics , DNA, Superhelical/chemistry , DNA, Superhelical/genetics , Drug Stability , Escherichia coli/genetics , Genome, Bacterial , Genomics/methods , Genomics/statistics & numerical data , Models, Biological , Nucleic Acid Denaturation , Sequence Analysis, DNA/statistics & numerical data , Thermodynamics
19.
Nucleic Acids Res ; 32(17): 4979-91, 2004.
Article in English | MEDLINE | ID: mdl-15388800

ABSTRACT

Many multimeric transcription factors recognize DNA sequence patterns by cooperatively binding to bipartite elements composed of half sites separated by a flexible spacer. We developed a novel bipartite algorithm, bipartite pattern discovery (Bipad), which produces a mathematical model based on the information maximization or Shannon entropy minimization principle, for discovery of bipartite sequence patterns. Bipad is a C++ program that applies greedy methods to search the bipartite alignment space and examines the upstream or downstream regions of co-regulated genes, looking for cis-regulatory bipartite patterns. An input sequence file with zero or one site per locus is required, and the left and right motif widths and a range of possible gap lengths must be specified. Bipad can run in either single-block or bipartite pattern search modes, and it is capable of comprehensively searching all four orientations of half-site patterns. Simulation studies showed that the accuracy of this motif discovery algorithm depends on sample size and motif conservation level, but results were independent of background composition. Bipad performed as well as or better than other pattern search algorithms in correctly identifying Escherichia coli cyclic AMP receptor protein and Bacillus subtilis sigma factor binding site sequences based on experimentally defined benchmarks. Finally, a new bipartite information weight matrix for vitamin D3 receptor/retinoid X receptor alpha (VDR/RXRalpha) binding sites was derived that comprehensively models the natural variability inherent in these sequence elements.
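The kind of objective such a search maximizes can be illustrated with a short Python sketch. It is not Bipad's code (Bipad is a C++ program) and makes several assumptions: a uniform background of 0.25 per base, no small-sample correction, and a gap term defined as the log2 of the number of allowed gap lengths minus the entropy of the observed gaps. All function names here are hypothetical.

```python
import math
from collections import Counter
from typing import Sequence, Tuple


def column_information(column: Sequence[str]) -> float:
    """Information content in bits of one alignment column (uniform background)."""
    total = len(column)
    entropy = -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(column).values()
    )
    return 2.0 - entropy


def block_information(sites: Sequence[str]) -> float:
    """Sum of column information over a block of equal-length aligned half sites."""
    return sum(column_information(column) for column in zip(*sites))


def gap_information(gaps: Sequence[int], allowed_gaps: Sequence[int]) -> float:
    """Assumed gap term: log2(number of allowed gaps) minus observed gap entropy."""
    if any(gap not in allowed_gaps for gap in gaps):
        raise ValueError("observed gap outside the allowed range")
    total = len(gaps)
    entropy = -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(gaps).values()
    )
    return math.log2(len(allowed_gaps)) - entropy


def bipartite_information(
    sites: Sequence[Tuple[str, int, str]], allowed_gaps: Sequence[int]
) -> float:
    """Total information of a bipartite alignment: left block + right block + gap."""
    left = [site[0] for site in sites]
    gaps = [site[1] for site in sites]
    right = [site[2] for site in sites]
    return (
        block_information(left)
        + block_information(right)
        + gap_information(gaps, allowed_gaps)
    )
```

Each site is given as a (left half site, gap length, right half site) tuple. A greedy search of the sort the abstract describes would repeatedly propose a change to one site's position or gap and keep it when bipartite_information increases; that search loop is omitted here.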


Subject(s)
Algorithms , DNA/chemistry , DNA/metabolism , Regulatory Sequences, Nucleic Acid , Sequence Analysis, DNA/methods , Transcription Factors/metabolism , Binding Sites , Cyclic AMP Receptor Protein/metabolism , Entropy , Models, Genetic , Receptors, Calcitriol/metabolism , Receptors, Retinoic Acid/metabolism , Retinoid X Receptors , Sequence Alignment , Sigma Factor/metabolism
20.
Bioinformatics ; 20(9): 1477-9, 2004 Jun 12.
Article in English | MEDLINE | ID: mdl-15130924

ABSTRACT

SUMMARY: WebSIDD is a Web-based service designed to predict the locations and extents of stress-induced duplex destabilization (SIDD) that occur in a double-stranded DNA molecule of specified base sequence, on which a specified level of superhelical stress is imposed. The algorithm calculates the approximate equilibrium statistical mechanical distribution of a population of identical molecules among its accessible states. The user inputs the DNA sequence, and the program outputs the calculated transition probability and destabilization energy of each base pair in the sequence. As options, the user can specify the temperature and the level of superhelicity. The values of all structural and energy parameters used in the calculation have been experimentally measured. WebSIDD should prove useful for finding SIDD-susceptible sites in genomic sequences, and correlating their occurrence with locations involved in regulatory and pathological processes. This strategy has already illuminated the roles of SIDD in diverse biological regulatory processes, including transcriptional initiation and termination, and the eukaryotic nuclear scaffold attachments that partition chromosomes into domains. AVAILABILITY: http://orange.genomecenter.ucdavis.edu/benham/sidd/index.html


Subject(s)
Algorithms , DNA/chemistry , DNA/genetics , Internet , Models, Chemical , Nucleic Acid Conformation , Sequence Analysis, DNA/methods , Computing Methodologies , DNA/analysis , DNA Damage , Models, Molecular , Nucleic Acid Denaturation , Online Systems , Oxidative Stress/genetics , Software , Structure-Activity Relationship