Results 1 - 20 of 29
1.
Mol Biol Evol ; 36(6): 1281-1293, 2019 Jun 01.
Article in English | MEDLINE | ID: mdl-30912801

ABSTRACT

In species with chromosomal sex determination, X chromosomes are predicted to evolve faster than autosomes because of positive selection on recessive alleles or weak purifying selection. We investigated X chromosome evolution in Stegodyphus spiders that differ in mating system, sex ratio, and population dynamics. We assigned scaffolds to X chromosomes and autosomes using a novel method based on flow cytometry of sperm cells and reduced representation sequencing. We estimated coding substitution patterns (dN/dS) in a subsocial outcrossing species (S. africanus) and its social inbreeding and female-biased sister species (S. mimosarum), and found evidence for faster-X evolution in both species. X chromosome-to-autosome diversity (piX/piA) ratios were estimated in multiple populations. The average piX/piA estimate for S. africanus (0.57 [95% CI: 0.55-0.60]) was lower than the neutral expectation of 0.75, consistent with more hitchhiking events on X-linked loci and/or a lower X chromosome mutation rate, and we provide evidence in support of both. The social species S. mimosarum has a significantly higher piX/piA ratio (0.72 [95% CI: 0.65-0.79]), in agreement with its female-biased sex ratio. Stegodyphus mimosarum also has different piX/piA estimates among populations, which we interpret as evidence for recurrent founder events. Simulations show that recurrent founder events are expected to decrease the piX/piA estimates in S. mimosarum, thus underestimating the true effect of female-biased sex ratios. Finally, we found lower synonymous divergence on X chromosomes in both species, and the male-to-female substitution ratio to be higher than 1, indicating a higher mutation rate in males.


Subject(s)
Biological Evolution , Spiders/genetics , X Chromosome/genetics , Animals , Genetic Variation , Male , Population Dynamics , Sex Ratio
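The neutral expectation of 0.75 quoted in the abstract above, and the direction in which a female-biased sex ratio shifts it, follow from the textbook effective-population-size formulas for X-linked and autosomal loci. The following is a minimal Python sketch, assuming random mating, Poisson offspring numbers, and equal mutation rates on the X and the autosomes. These are simplifications and not the authors' simulation model; the function name and the 1:9 example ratio are illustrative assumptions.

```python
def expected_x_to_a_ratio(n_males: float, n_females: float) -> float:
    """Expected neutral piX/piA from the ratio of effective population sizes.

    Uses the standard results N_eA = 4*Nm*Nf / (Nm + Nf) and
    N_eX = 9*Nm*Nf / (4*Nm + 2*Nf), assuming equal mutation rates.
    """
    if n_males <= 0 or n_females <= 0:
        raise ValueError("both sexes need a positive number of breeders")
    ne_autosome = 4 * n_males * n_females / (n_males + n_females)
    ne_x = 9 * n_males * n_females / (4 * n_males + 2 * n_females)
    return ne_x / ne_autosome


if __name__ == "__main__":
    print(expected_x_to_a_ratio(50, 50))  # equal sex ratio
    print(expected_x_to_a_ratio(10, 90))  # hypothetical 1:9 female bias
```

With equal numbers of breeding males and females the sketch returns 0.75; a female bias pushes the expectation upward, which is the direction reported for S. mimosarum.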
2.
BMC Genomics ; 15: 439, 2014 Jun 06.
Article in English | MEDLINE | ID: mdl-24906298

ABSTRACT

BACKGROUND: Sampling genomes with Fosmid vectors and sequencing of pooled Fosmid libraries on the Illumina platform for massively parallel sequencing is a novel and promising approach to optimizing the trade-off between sequencing costs and assembly quality. RESULTS: In order to sequence the genome of Norway spruce, which is of great size and complexity, we developed and applied a new technology based on the massive production, sequencing, and assembly of Fosmid pools (FP). The spruce chromosomes were sampled with ~40,000 bp Fosmid inserts to obtain around two-fold genome coverage, in parallel with traditional whole genome shotgun sequencing (WGS) of haploid and diploid genomes. Compared to the WGS results, the contiguity and quality of the FP assemblies were high, and they allowed us to fill WGS gaps resulting from repeats, low coverage, and allelic differences. The FP contig sets were further merged with WGS data using a novel software package, GAM-NGS. CONCLUSIONS: By exploiting FP technology, the first published assembly of a conifer genome was sequenced entirely with massively parallel sequencing. Here we provide a comprehensive report on the different features of the approach and the optimization of the process. We have made public the input data (FASTQ format) for the set of pools used in this study: ftp://congenie.org/congenie/Nystedt_2013/Assembly/ProcessedData/FosmidPools/ (alternatively accessible via http://congenie.org/downloads). The software used for running the assembly process is available at http://research.scilifelab.se/andrej_alexeyenko/downloads/fpools/.


Subject(s)
Genetic Vectors , High-Throughput Nucleotide Sequencing/methods , Picea/genetics , Molecular Cloning , Plant Genome , High-Throughput Nucleotide Sequencing/economics , Software
3.
BMC Bioinformatics ; 14 Suppl 2: S22, 2013.
Article in English | MEDLINE | ID: mdl-23368905

ABSTRACT

Comparative methods for RNA secondary structure prediction use evolutionary information from RNA alignments to increase prediction accuracy. The model is often described in terms of stochastic context-free grammars (SCFGs), which generate a probability distribution over secondary structures. It is, however, unclear how this probability distribution changes as a function of the input alignment. As prediction programs typically only return a single secondary structure, better characterisation of the underlying probability space of RNA secondary structures is of great interest. In this work, we show how to efficiently compute the information entropy of the probability distribution over RNA secondary structures produced for RNA alignments by a phylo-SCFG, and implement it for the PPfold model. We also discuss interpretations and applications of this quantity, including how it can clarify reasons for low prediction reliability scores. PPfold and its source code are available from http://birc.au.dk/software/ppfold/.


Subject(s)
Algorithms , Theoretical Models , Nucleic Acid Conformation , RNA/chemistry , Base Sequence , Computational Biology/methods , Entropy , Probability , Software
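The quantity discussed in the abstract above is the information (Shannon) entropy of a probability distribution over secondary structures. The paper computes it efficiently from the grammar without listing structures; the sketch below only illustrates the quantity itself by brute force over a tiny, hypothetical ensemble. The dot-bracket strings and their probabilities are made up for illustration.

```python
import math


def shannon_entropy_bits(probabilities):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    probabilities = list(probabilities)
    if not math.isclose(sum(probabilities), 1.0, rel_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    # Terms with p == 0 contribute nothing and are skipped to avoid log2(0).
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# Hypothetical toy ensemble: dot-bracket structure -> probability.
toy_ensemble = {
    "((..))": 0.70,
    "(....)": 0.20,
    "......": 0.10,
}

if __name__ == "__main__":
    print(shannon_entropy_bits(toy_ensemble.values()))
```

A low entropy means the probability mass is concentrated on a few structures; a high entropy means many competing structures, which is one way a single predicted structure can end up with low reliability.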
4.
BMC Genomics ; 14: 75, 2013 Feb 02.
Article in English | MEDLINE | ID: mdl-23375136

ABSTRACT

BACKGROUND: Hevea brasiliensis, a member of the Euphorbiaceae family, is the major commercial source of natural rubber (NR). NR is a latex polymer with high elasticity, flexibility, and resilience that has played a critical role in the world economy since 1876. RESULTS: Here, we report the draft genome sequence of H. brasiliensis. The assembly spans ~1.1 Gb of the estimated 2.15 Gb haploid genome. Overall, ~78% of the genome was identified as repetitive DNA. Gene prediction shows 68,955 gene models, of which 12.7% are unique to Hevea. Most of the key genes associated with rubber biosynthesis, rubberwood formation, disease resistance, and allergenicity have been identified. CONCLUSIONS: The knowledge gained from this genome sequence will aid in the future development of high-yielding clones to keep up with the ever increasing need for natural rubber.


Subject(s)
Genomics , Hevea/genetics , Sequence Analysis , Allergens/genetics , Disease Resistance/genetics , Molecular Evolution , F-Box Proteins/genetics , Plant Genome/genetics , Haploidy , Hevea/immunology , Hevea/metabolism , Latex/metabolism , Molecular Sequence Annotation , Phylogeny , Plant Growth Regulators/genetics , Rubber/metabolism , Signal Transduction/genetics , Transcription Factors/genetics , Wood/metabolism
5.
Bioinformatics ; 28(20): 2691-2, 2012 Oct 15.
Article in English | MEDLINE | ID: mdl-22877864

ABSTRACT

PPfold is a multi-threaded implementation of the Pfold algorithm for RNA secondary structure prediction. Here we present a new version of PPfold, which extends the evolutionary analysis with a flexible probabilistic model for incorporating auxiliary data, such as data from structure probing experiments. Our tests show that the accuracy of single-sequence secondary structure prediction using experimental data in PPfold 3.0 is comparable to RNAstructure. Furthermore, alignment structure prediction quality is improved even further by the addition of experimental data. PPfold 3.0 therefore has the potential to produce more accurate predictions than was previously possible. AVAILABILITY AND IMPLEMENTATION: PPfold 3.0 is available as a platform-independent Java application and can be downloaded from http://birc.au.dk/software/ppfold.


Subject(s)
RNA/chemistry , Software , Algorithms , Statistical Models , Nucleic Acid Conformation , Phylogeny
6.
BMC Bioinformatics ; 12: 103, 2011 Apr 18.
Article in English | MEDLINE | ID: mdl-21501497

ABSTRACT

BACKGROUND: The prediction of the structure of large RNAs remains a particular challenge in bioinformatics, due to the computational complexity and low levels of accuracy of state-of-the-art algorithms. The pfold model couples a stochastic context-free grammar to phylogenetic analysis for a high accuracy in predictions, but the time complexity of the algorithm and underflow errors have prevented its use for long alignments. Here we present PPfold, a multithreaded version of pfold, which is capable of predicting the structure of large RNA alignments accurately on practical timescales. RESULTS: We have distributed both the phylogenetic calculations and the inside-outside algorithm in PPfold, resulting in a significant reduction of runtime on multicore machines. We have addressed the floating-point underflow problems of pfold by implementing an extended-exponent datatype, enabling PPfold to be used for large-scale RNA structure predictions. We have also improved the user interface and portability: alongside a standalone executable and the Java source code of the program, PPfold is also available as a free plugin to the CLC Workbenches. We have evaluated the accuracy of PPfold using BRaliBase I tests, and demonstrated its practical use by predicting the secondary structure of an alignment of 24 complete HIV-1 genomes in 65 minutes on an 8-core machine and identifying several known structural elements in the prediction. CONCLUSIONS: PPfold is the first parallelized comparative RNA structure prediction algorithm to date. Based on the pfold model, PPfold is capable of fast, high-quality predictions of large RNA secondary structures, such as the genomes of RNA viruses or long genomic transcripts. The techniques used in the parallelization of this algorithm may be of general applicability to other bioinformatics algorithms.


Subject(s)
Algorithms , Nucleic Acid Conformation , RNA/chemistry , Sequence Alignment/methods , Computational Biology/methods , Viral Genome , HIV-1/genetics , Phylogeny , RNA Sequence Analysis/methods , Stochastic Processes
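The abstract above mentions an extended-exponent datatype as the remedy for floating-point underflow when many small probabilities are multiplied. The sketch below shows the general idea in Python and is not PPfold's Java implementation: a value is stored as a normalised mantissa plus an unbounded integer exponent. The class name `ExtFloat` and the helper functions are hypothetical, and only multiplication of positive values is covered.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtFloat:
    """Value = mantissa * 2**exponent, with an unbounded integer exponent."""
    mantissa: float
    exponent: int

    @staticmethod
    def from_float(x: float) -> "ExtFloat":
        m, e = math.frexp(x)  # x == m * 2**e with 0.5 <= |m| < 1 (or m == 0)
        return ExtFloat(m, e)

    def __mul__(self, other: "ExtFloat") -> "ExtFloat":
        # Multiply mantissas, add exponents, then renormalise the mantissa.
        m, shift = math.frexp(self.mantissa * other.mantissa)
        return ExtFloat(m, self.exponent + other.exponent + shift)

    def log10(self) -> float:
        """Base-10 logarithm of a positive value."""
        return math.log10(self.mantissa) + self.exponent * math.log10(2)


def product_plain(values):
    """Product using ordinary doubles; silently underflows to 0.0."""
    result = 1.0
    for v in values:
        result *= v
    return result


def product_extended(values):
    """Product using the extended-exponent representation."""
    result = ExtFloat.from_float(1.0)
    for v in values:
        result = result * ExtFloat.from_float(v)
    return result


if __name__ == "__main__":
    tiny = [1e-200] * 10
    print(product_plain(tiny))
    print(product_extended(tiny).log10())
```

Because the mantissa is renormalised after every multiplication, it never leaves the range where doubles are accurate, while the exponent is a Python integer and cannot underflow. A full implementation would also need addition, which requires aligning exponents first.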
7.
Nucleic Acids Res ; 36(4): 1113-9, 2008 Mar.
Article in English | MEDLINE | ID: mdl-18096620

ABSTRACT

The inherent properties of DNA as a stable polymer with unique affinity for partner molecules determined by the specific Watson-Crick base pairing make it an ideal component in self-assembling structures. This has been exploited for decades in the design of a variety of artificial substrates for investigations of DNA-interacting enzymes. More recently, strategies for synthesis of more complex two-dimensional (2D) and 3D DNA structures have emerged. However, the building of such structures is still in progress, and more experience from different research groups and different fields of expertise is necessary before complex DNA structures can be routinely designed for use in basic science and/or biotechnology. Here we present the design, construction and structural analysis of a covalently closed and stable 3D DNA structure with the connectivity of an octahedron, as defined by the double-stranded DNA helices, that assembles from eight oligonucleotides with a yield of approximately 30%. As demonstrated by Small Angle X-ray Scattering and cryo-Transmission Electron Microscopy analyses, the eight-stranded DNA structure has a central cavity larger than the apertures in the surrounding DNA lattice and can be described as a nano-scale DNA cage. Hence, in theory it could hold proteins or other bio-molecules to enable their investigation in certain harmful environments or even allow their organization into higher order structures.


Subject(s)
DNA/chemistry , Nanostructures/chemistry , Polyacrylamide Gel Electrophoresis , Transmission Electron Microscopy , Molecular Models , Nucleic Acid Conformation , Oligonucleotides/chemistry , Small Angle Scattering , X-Ray Diffraction
8.
BMC Bioinformatics ; 10: 247, 2009 Aug 11.
Article in English | MEDLINE | ID: mdl-19671163

ABSTRACT

BACKGROUND: The population mutation rate (theta) remains one of the most fundamental parameters in genetics, ecology, and evolutionary biology. However, its accurate estimation can be seriously compromised when working with error-prone data such as expressed sequence tags, low coverage draft sequences, and other such unfinished products. This study is premised on the simple idea that a random sequence error due to a chance accident during data collection or recording will be distributed within a population dataset as a singleton (i.e., as a polymorphic site where one sampled sequence exhibits a unique base relative to the common nucleotide of the others). Thus, one can avoid these random errors by ignoring the singletons within a dataset. RESULTS: This strategy is implemented under an infinite sites model that focuses on only the internal branches of the sample genealogy where a shared polymorphism can arise (i.e., a variable site where each alternative base is represented by at least two sequences). This approach is first used to derive independently the same new Watterson and Tajima estimators of theta, as recently reported by Achaz [1] for error-prone sequences. It is then used to modify the recent, full, maximum-likelihood model of Knudsen and Miyamoto [2], which incorporates various factors for experimental error and design with those for coalescence and mutation. These new methods are all accurate and fast according to evolutionary simulations and analyses of a real complex population dataset for the California sea hare. CONCLUSION: In light of these results, we recommend the use of these three new methods for the determination of theta from error-prone sequences. In particular, we advocate the new maximum likelihood model as a starting point for the further development of more complex coalescent/mutation models that also account for experimental error and design.


Subject(s)
Computational Biology/methods , Population Genetics , Mutation , Algorithms , Population Density , Sequence Alignment
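The idea in the abstract above, estimating theta from shared polymorphisms only, can be sketched with a Watterson-type estimator. Under the standard neutral infinite-sites model the expected number of sites at which the derived base appears in i of n sequences is theta/i, so the expected number of sites where both bases appear at least twice is theta times the sum of 1/i for i = 2 to n-2. The Python sketch below uses that expectation directly; it is an illustration under these assumptions and may not match the exact form of the estimators in the paper. The function name and the toy alignment are hypothetical.

```python
from collections import Counter


def theta_w_without_singletons(sequences):
    """Watterson-type estimate of theta (per locus) that ignores singletons.

    Counts biallelic sites whose rarer base occurs in at least two
    sequences, then divides by sum_{i=2}^{n-2} 1/i, the neutral
    expectation of that count per unit of theta.
    """
    n = len(sequences)
    if n < 4:
        raise ValueError("need at least 4 sequences for a shared polymorphism")
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("sequences must be aligned to equal length")

    shared_sites = 0
    for column in zip(*sequences):
        counts = Counter(column)
        if len(counts) == 2 and min(counts.values()) >= 2:
            shared_sites += 1

    denominator = sum(1.0 / i for i in range(2, n - 1))  # i = 2 .. n-2
    return shared_sites / denominator


if __name__ == "__main__":
    # Hypothetical alignment: columns 2 and 4 are shared polymorphisms,
    # column 3 is a singleton (possibly a sequencing error), column 1 is fixed.
    alignment = ["ACGT", "ACGT", "ATGT", "ATGA", "ACCA"]
    print(theta_w_without_singletons(alignment))
```

The singleton column is simply not counted, so a random sequencing error in one read cannot inflate the estimate, at the cost of discarding genuine singleton mutations as well.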
9.
Genetics ; 176(4): 2335-42, 2007 Aug.
Article in English | MEDLINE | ID: mdl-17565962

ABSTRACT

Coalescent theory provides a powerful framework for estimating the evolutionary, demographic, and genetic parameters of a population from a small sample of individuals. Current coalescent models have largely focused on population genetic factors (e.g., mutation, population growth, and migration) rather than on the effects of experimental design and error. This study develops a new coalescent/mutation model that accounts for unobserved polymorphisms due to missing data, sequence errors, and multiple reads for diploid individuals. The importance of accommodating these effects of experimental design and error is illustrated with evolutionary simulations and a real data set from a population of the California sea hare. In particular, a failure to account for sequence errors can lead to overestimated mutation rates, inflated coalescent times, and inappropriate conclusions about the population. This current model can now serve as a starting point for the development of newer models with additional experimental and population genetic factors. It is currently implemented as a maximum-likelihood method, but this model may also serve as the basis for the development of Bayesian approaches that incorporate experimental design and error.


Subject(s)
Population Genetics/statistics & numerical data , Genetic Models , Mutation , Algorithms , Animals , Aplysia/genetics , Bayes Theorem , Genetic Databases , Molecular Evolution , Likelihood Functions , Genetic Polymorphism
10.
Nucleic Acids Res ; 31(13): 3423-8, 2003 Jul 01.
Article in English | MEDLINE | ID: mdl-12824339

ABSTRACT

RNA secondary structures are important in many biological processes and efficient structure prediction can give vital directions for experimental investigations. Many available programs for RNA secondary structure prediction only use a single sequence at a time. This may be sufficient in some applications, but often it is possible to obtain related RNA sequences with conserved secondary structure. These should be included in structural analyses to give improved results. This work presents a practical way of predicting RNA secondary structure that is especially useful when related sequences can be obtained. The method improves a previous algorithm based on an explicit evolutionary model and a probabilistic model of structures. Predictions can be done on a web server at http://www.daimi.au.dk/~compbio/pfold.


Subject(s)
Algorithms , RNA/chemistry , RNA Sequence Analysis/methods , Base Sequence , Molecular Evolution , Internet , Molecular Models , Nucleic Acid Conformation , Reproducibility of Results , Sequence Alignment , Software , Stochastic Processes , Time Factors
11.
Nucleic Acids Res ; 31(1): 363-4, 2003 Jan 01.
Article in English | MEDLINE | ID: mdl-12520023

ABSTRACT

The Signal Recognition Particle Database (SRPDB) at http://psyche.uthct.edu/dbs/SRPDB/SRPDB.html and http://bio.lundberg.gu.se/dbs/SRPDB/SRPDB.html assists in the better understanding of the structure and function of the signal recognition particle (SRP), a ribonucleoprotein complex that recognizes signal sequences as they emerge from the ribosome. SRPDB provides alphabetically and phylogenetically ordered lists of SRP RNA and SRP protein sequences. The SRP RNA alignment emphasizes base pairs supported by comparative sequence analysis to derive accurate SRP RNA secondary structures for each species. This release includes a total of 181 SRP RNA sequences, 7 protein SRP9, 11 SRP14, 31 SRP19, 113 SRP54 (Ffh), 9 SRP68 and 12 SRP72 sequences. There are 44 new sequences of the SRP receptor alpha subunit and its FtsY homolog (a total of 99 entries). Additional data are provided for polypeptides with established or potential roles in SRP-mediated protein targeting, such as the beta subunit of SRP receptor, Flhf, Hbsu and cpSRP43. Also available are motifs for the identification of new SRP RNA sequences, 2D representations, three-dimensional models in PDB format, and links to the high-resolution structures of several SRP components. New to this version of SRPDB is the introduction of a relational database system and a SRP RNA prediction server (SRP-Scan) which allows the identification of SRP RNAs within genome sequences and also generates secondary structure diagrams.


Subject(s)
Genetic Databases , Signal Recognition Particle/chemistry , Amino Acid Sequence , Animals , Base Sequence , Nucleic Acid Conformation , Phylogeny , Small Cytoplasmic RNA/chemistry , Small Cytoplasmic RNA/genetics , Ribonucleoproteins/chemistry , Ribonucleoproteins/genetics , Signal Recognition Particle/genetics , Signal Recognition Particle/physiology
12.
Nucleic Acids Res ; 31(1): 446-7, 2003 Jan 01.
Article in English | MEDLINE | ID: mdl-12520048

ABSTRACT

Maintained at the University of Texas Health Science Center at Tyler, Texas, the tmRNA database (tmRDB) is accessible at the URL http://psyche.uthct.edu/dbs/tmRDB/tmRDB.html with mirror sites located at Auburn University, Auburn, Alabama (http://www.ag.auburn.edu/mirror/tmRDB/) and the Bioinformatics Research Center, Aarhus, Denmark (http://www.bioinf.au.dk/tmRDB/). The tmRDB collects and distributes information relevant to the study of tmRNA. In trans-translation, this molecule combines properties of tRNA and mRNA and binds several proteins to form the tmRNP. Related RNPs are likely to be functional in all bacteria. In this release of tmRDB, 186 new entries from 10 bacterial groups for a total of 274 tmRNA sequences have been added. Lists of the tmRNAs and the corresponding tmRNA-encoded tag-peptides are presented in alphabetical and phylogenetic order. The tmRNA sequences are aligned manually, assisted by computational tools, to determine base pairs supported by comparative sequence analysis. The tmRNA alignment, available in a variety of formats, provides the basis for the secondary and tertiary structure of each tmRNA molecule. Three-dimensional models of the tmRNAs and their associated proteins in PDB format give evidence for the recent progress that has been made in the understanding of tmRNP structure and function.


Subject(s)
Nucleic Acid Databases , Bacterial RNA/chemistry , Bacteria/classification , Bacteria/genetics , Nucleic Acid Conformation , Phylogeny , Bacterial RNA/physiology , Messenger RNA/chemistry , Transfer RNA/chemistry , Sequence Alignment
13.
BMC Evol Biol ; 5: 21, 2005 Mar 02.
Article in English | MEDLINE | ID: mdl-15743518

ABSTRACT

BACKGROUND: The f factor is a new parameter for accommodating the influence of both the starting and ending states in the rate matrices of "generalized weighted frequencies" (+gwF) models for sequence evolution. In this study, we derive an expected value for f, starting from a nearly neutral model of weak selection, and then assess the biological interpretation of this factor with evolutionary simulations. RESULTS: An expected value of f = 0.5 (i.e., equal dependency on the starting and ending states) is derived for sequences that are evolving under the nearly neutral model of this study. However, this expectation is sensitive to violations of its underlying assumptions as illustrated with the evolutionary simulations. CONCLUSION: This study illustrates how selection, drift, and mutation at the population level can be linked to the rate matrices of models for sequence evolution to derive an expected value of f. However, as f is affected by a number of factors that limit its biological interpretation, this factor should normally be estimated as a free parameter rather than fixed a priori in a +gwF analysis.


Subject(s)
Molecular Evolution , Biological Evolution , Statistical Data Interpretation , Gene Frequency , Genetic Drift , Biological Models , Genetic Models , Statistical Models , Theoretical Models , Mutation , Genetic Selection
14.
J Mol Biol ; 333(2): 453-60, 2003 Oct 17.
Article in English | MEDLINE | ID: mdl-14529629

ABSTRACT

This work presents a novel pairwise statistical alignment method based on an explicit evolutionary model of insertions and deletions (indels). Indel events of any length are possible according to a geometric distribution. The geometric distribution parameter, the indel rate, and the evolutionary time are all maximum likelihood estimated from the sequences being aligned. Probability calculations are done using a pair hidden Markov model (HMM) with transition probabilities calculated from the indel parameters. Equations for the transition probabilities make the pair HMM closely approximate the specified indel model. The method provides an optimal alignment, its likelihood, the likelihood of all possible alignments, and the reliability of individual alignment regions. Human alpha and beta-hemoglobin sequences are aligned, as an illustration of the potential utility of this pair HMM approach.


Subject(s)
Biological Evolution , DNA/genetics , Markov Chains , Genetic Recombination , Sequence Alignment/statistics & numerical data , Algorithms , Amino Acid Sequence , Computational Biology , Genetic Variation , Hemoglobins/genetics , Humans , Likelihood Functions , Genetic Models , Molecular Sequence Data , Phylogeny , Amino Acid Sequence Homology
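The abstract above states that indel lengths follow a geometric distribution and that probabilities are computed with a pair HMM. The two are naturally linked: a gap state that returns to itself with a fixed probability emits runs whose lengths are geometrically distributed. The Python sketch below shows only that link. The parameter name `extension_prob` is an assumption for illustration, and the paper's actual transition probabilities are derived from the indel rate and evolutionary time, which this sketch does not attempt to reproduce.

```python
def indel_length_pmf(extension_prob: float, max_length: int):
    """P(length = k) for k = 1..max_length under a geometric model.

    A gap state that loops back to itself with probability p and exits
    otherwise produces runs with P(k) = (1 - p) * p**(k - 1).
    """
    if not 0.0 <= extension_prob < 1.0:
        raise ValueError("extension_prob must be in [0, 1)")
    p = extension_prob
    return [(1.0 - p) * p ** (k - 1) for k in range(1, max_length + 1)]


def mean_indel_length(extension_prob: float) -> float:
    """Mean of the geometric length distribution, 1 / (1 - p)."""
    if not 0.0 <= extension_prob < 1.0:
        raise ValueError("extension_prob must be in [0, 1)")
    return 1.0 / (1.0 - extension_prob)


if __name__ == "__main__":
    pmf = indel_length_pmf(0.75, 5)
    print(pmf)
    print(mean_indel_length(0.75))
```

A single self-loop probability therefore controls the whole length distribution, which is what lets indels of any length be modelled with one extra parameter.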
15.
J Mol Biol ; 336(2): 369-79, 2004 Feb 13.
Article in English | MEDLINE | ID: mdl-14757051

ABSTRACT

The untranslated leader of the dimeric HIV-1 RNA genome is folded into a complex structure that plays multiple and essential roles in the viral replication cycle. Here, we have investigated secondary and tertiary structural elements within the 5' 744 nucleotides of the HIV-1 genome using a combination of bioinformatics, enzymatic probing, native gel electrophoresis, and UV-crosslinking experiments. We used a recently developed RNA folding algorithm (Pfold) to predict the common secondary structure of an alignment of 20 divergent HIV-1 sequences. Combining this analysis with biochemical data, we present a secondary structure model for the entire 744 nucleotide fragment, which incorporates previously recognized and novel structural elements. In particular, our data provided strong evidence for a long-distance interaction between the region encompassing the AUG Gag initiation codon and an upstream region and we demonstrate that this feature is highly conserved in distantly related human and animal retroviruses. To obtain information about tertiary interactions we applied an intramolecular UV-crosslinking strategy and identified a novel tertiary interaction within the PBS hairpin structure.


Subject(s)
Viral Genome , HIV-1/genetics , Nucleic Acid Conformation , Viral RNA/chemistry , Viral RNA/metabolism , Algorithms , Base Sequence , Initiator Codon/genetics , Computational Biology , Molecular Sequence Data , Nuclease Protection Assays , Nucleotides/metabolism , Phylogeny , RNA Stability , Viral RNA/genetics , Ultraviolet Rays
16.
Genetics ; 164(4): 1261-9, 2003 Aug.
Article in English | MEDLINE | ID: mdl-12930737

ABSTRACT

Functional constraints on proteins limit their evolutionary rates at specific sites. These constraints allow for the interpretation of conserved residues and sites with a rate change as those most likely underlying the functional similarities and differences among protein subfamilies, respectively. This study describes new likelihood-ratio tests (LRTs) that complement existing ones for the identification of both conserved and rate change sites. These identifications are validated by the recovery of residues that are known from existing biochemical and structural information to be critical for the functional similarities and differences among carbonic anhydrases (CAs). In combination with this other information, these LRTs also support a unique antioxidant defense role for the puzzling CA III. As illustrated by the CAs, these LRTs, in combination with other biological evidence, offer a powerful and cost-effective approach for testing hypotheses, making predictions, and designing experiments in protein functional studies.


Subject(s)
Carbonic Anhydrases/genetics , Conserved Sequence/genetics , Molecular Evolution , Proteins/genetics , Amino Acid Sequence/genetics , Animals , Antioxidants/metabolism , Carbonic Anhydrases/chemistry , Computer Simulation , Humans , Kinetics , Likelihood Functions , Molecular Models , Molecular Sequence Data , Phylogeny , Proteins/metabolism , Reproducibility of Results , Research Design , Amino Acid Sequence Homology
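The likelihood-ratio tests mentioned in the abstract above compare a constrained model with a more general one at each site. The generic mechanics can be sketched as follows, assuming nested models that differ by one free parameter and a chi-square null distribution with one degree of freedom. Both assumptions are for illustration only, since the abstract does not state the null distributions used, and the log-likelihood values are invented.

```python
import math


def lrt_statistic(loglik_null: float, loglik_alt: float) -> float:
    """Likelihood-ratio statistic 2 * (lnL_alt - lnL_null) for nested models."""
    return 2.0 * (loglik_alt - loglik_null)


def chi2_sf_df1(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom.

    For 1 df, X = Z**2 with Z standard normal, so P(X > x) = erfc(sqrt(x / 2)).
    """
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0))


if __name__ == "__main__":
    # Hypothetical log-likelihoods for one site under the two models.
    statistic = lrt_statistic(loglik_null=-1234.5, loglik_alt=-1230.0)
    print(statistic, chi2_sf_df1(statistic))
```

When the constrained value lies on the boundary of the parameter space, the plain chi-square reference distribution is not appropriate, so this sketch should be read as the general pattern rather than the test described in the paper.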
17.
PLoS One ; 9(5): e98187, 2014.
Article in English | MEDLINE | ID: mdl-24878701

ABSTRACT

Formalin-fixed, paraffin-embedded (FFPE) tissues are an invaluable resource for clinical research. However, nucleic acids extracted from FFPE tissues are fragmented and chemically modified, making them challenging to use in molecular studies. We analysed 23 fresh-frozen (FF), 35 FFPE and 38 paired FF/FFPE specimens, representing six different human tissue types (bladder, prostate and colon carcinoma; liver and colon normal tissue; reactive tonsil) in order to examine the potential use of FFPE samples in next-generation sequencing (NGS) based retrospective and prospective clinical studies. Two methods for DNA and three methods for RNA extraction from FFPE tissues were compared and were found to affect nucleic acid quantity and quality. DNA and RNA from selected FFPE and paired FF/FFPE specimens were used for exome and transcriptome analysis. Preparation of DNA Exome-Seq libraries was more challenging (29.5% success) than that of RNA-Seq libraries, presumably because of modifications to FFPE tissue-derived DNA. Libraries could still be prepared from RNA isolated from two-decade-old FFPE tissues. Data were analysed using the CLC Bio Genomics Workbench and revealed systematic differences between FF and FFPE tissue-derived nucleic acid libraries. In spite of this, pairwise analysis of DNA Exome-Seq data showed concordance for 70-80% of variants in FF and FFPE samples stored for fewer than three years. RNA-Seq data showed high correlation of expression profiles in FF/FFPE pairs (Pearson correlations of 0.90 +/- 0.05), irrespective of storage time (up to 244 months) and tissue type. A common set of 1,494 genes was identified with expression profiles that were significantly different between paired FF and FFPE samples irrespective of tissue type. Our results are promising and suggest that NGS can be used to study FFPE specimens in both prospective and retrospective archive-based studies in which FF specimens are not available.


Subject(s)
High-Throughput Nucleotide Sequencing/methods , Neoplasms/genetics , Neoplasms/pathology , Paraffin Embedding , DNA Sequence Analysis/methods , RNA Sequence Analysis/methods , Tissue Fixation , Cryopreservation , DNA/genetics , DNA/isolation & purification , Exome/genetics , Formaldehyde/pharmacology , Gene Expression Profiling , Humans , Proto-Oncogene Proteins/genetics , Proto-Oncogene Proteins B-raf/genetics , Proto-Oncogene Proteins p21(ras) , RNA/genetics , RNA/isolation & purification , ras Proteins/genetics
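The abstract above summarises agreement between paired FF and FFPE expression profiles with Pearson correlations. A minimal Python sketch of that statistic for one sample pair follows. The expression values are invented, and whether the study correlated raw or transformed expression values is not stated in the abstract, so the log-scale values here are an assumption.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return covariance / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical log-scale expression values for five genes in one pair.
    ff_sample = [8.1, 5.4, 10.2, 3.3, 7.7]
    ffpe_sample = [7.6, 5.9, 9.5, 4.1, 7.0]
    print(pearson_r(ff_sample, ffpe_sample))
```

A high correlation across a pair says the overall profiles agree; it does not rule out the systematic per-gene differences that the abstract also reports.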
18.
Gene ; 511(2): 195-201, 2012 Dec 15.
Article in English | MEDLINE | ID: mdl-23026207

ABSTRACT

The Japanese eel is a much appreciated research object and very important for Asian aquaculture; however, its genomic resources are still limited. We have used a streamlined bioinformatics pipeline for the de novo assembly of the genome sequence of the Japanese eel from raw Illumina sequence reads. The total assembled genome has a size of 1.15 Gbp, which is divided over 323,776 scaffolds with an N50 of 52,849 bp, a minimum scaffold size of 200 bp and a maximum scaffold size of 1.14 Mbp. Direct comparison of a representative set of scaffolds revealed that all the Hox genes and their intergenic distances are almost perfectly conserved between the European and the Japanese eel. The first draft genome sequence of an organism strongly catalyzes research progress in multiple fields. Therefore, the Japanese eel genome sequence will provide a rich resource of data for all scientists working on this important fish species.


Subject(s)
Anguilla/genetics , Genome , Animals , Computational Biology
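The abstract above reports a scaffold N50 of 52,849 bp. N50 is the length of the scaffold at which, going from longest to shortest, the running total first reaches half of the assembly size. A minimal Python sketch of that definition, using made-up scaffold lengths rather than the eel assembly:

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Scaffolds are visited from longest to shortest; the N50 is the length
    of the scaffold at which the running total reaches half the sum.
    """
    lengths = list(lengths)
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    # Hypothetical scaffold lengths (bp); total 54, so half is 27.
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 10 + 9 + 8 = 27, so N50 is 8
```

Because it weights scaffolds by length, N50 is insensitive to the very large number of short scaffolds that a draft assembly such as this one typically contains.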
19.
Genes (Basel) ; 1(2): 263-82, 2010 Sep 13.
Article in English | MEDLINE | ID: mdl-24710045

ABSTRACT

This study presents a new computer program for assessing the effects of different factors and sequencing strategies on de novo sequence assembly. The program uses reads from actual sequencing studies or from simulations with a reference genome that may also be real or simulated. The simulated reads can be created with our read simulator. They can be of differing length and coverage, consist of paired reads with varying distance, and include sequencing errors such as color space miscalls to imitate SOLiD data. The simulated or real reads are mapped to their reference genome and our assembly simulator is then used to obtain optimal assemblies that are limited only by the distribution of repeats. By way of this mapping, the assembly simulator determines which contigs are theoretically possible, or conversely (and perhaps more importantly), which are not. We illustrate the application and utility of our new simulation tools with several experiments that test the effects of genome complexity (repeats), read length and coverage, word size in De Bruijn graph assembly, and alternative sequencing strategies (e.g., BAC pooling) on sequence assemblies. These experiments highlight just some of the uses of our simulators in the experimental design of sequencing projects and in the further development of assembly algorithms.
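Two of the factors varied in the experiments described above are read length and coverage. As a rough companion to that discussion, the Python sketch below computes mean coverage and the expected fraction of bases left uncovered under the idealised assumption that reads land uniformly at random (a Poisson approximation). It ignores repeats and sequencing errors, which are exactly what the paper's simulators are designed to capture, and all numbers are hypothetical.

```python
import math


def mean_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average number of reads covering each base: N * L / G."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return num_reads * read_length / genome_size


def expected_uncovered_fraction(coverage: float) -> float:
    """Poisson approximation to the probability that a base has zero reads."""
    if coverage < 0:
        raise ValueError("coverage must be non-negative")
    return math.exp(-coverage)


if __name__ == "__main__":
    # Hypothetical run: one million 100 bp reads on a 10 Mb genome.
    c = mean_coverage(1_000_000, 100, 10_000_000)
    print(c, expected_uncovered_fraction(c))
```

This gives only the best case for a given sequencing effort; an assembly simulator of the kind described in the abstract shows how far repeats push real assemblies below it.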

20.
ACS Nano ; 4(3): 1367-76, 2010 Mar 23.
Article in English | MEDLINE | ID: mdl-20146442

ABSTRACT

The assembly, structure, and stability of DNA nanocages with the shape of truncated octahedra have been studied. The cages are composed of 12 double-stranded B-DNA helices interrupted by single-stranded linkers of thymidines of varying length that constitute the truncated corners of the structure. The structures assemble with a high efficiency in a one-step procedure, compared to previously published structures of similar complexity. The structures of the cages were determined by small-angle X-ray scattering. With increasing linker length, there is a systematic increase of the cage size and decrease of the twist angle of the double helices with respect to the symmetry planes of the cage structure. In the present study, we demonstrate the length of the single-stranded linker regions, which impose a certain degree of flexibility to the structure, to be the important determinant for efficient assembly. The linker length can be decreased to three thymidines without affecting assembly yield or the overall structural characteristics of the DNA cages. A linker length of two thymidines represents a sharp cutoff abolishing cage assembly. This is supported by energy minimization calculations suggesting substantial hydrogen bond deformation in a cage with linkers of two thymidines.


Subject(s)
Single-Stranded DNA/chemistry , Nanostructures/chemistry , Base Sequence , Single-Stranded DNA/genetics , Polyacrylamide Gel Electrophoresis , Hydrogen Bonding , Molecular Models , Nucleic Acid Conformation , Small Angle Scattering , Thermodynamics , Thymidine/chemistry , X-Ray Diffraction