Results 1 - 20 of 33
1.
Front Plant Sci ; 14: 1039211, 2023.
Article in English | MEDLINE | ID: mdl-36993855

ABSTRACT

Pomegranate has a unique evolutionary history given that different cultivars have eight or nine bivalent chromosomes with possible crossability between the two classes. Therefore, it is important to study chromosome evolution in pomegranate to understand the dynamics of its populations. Here, we de novo assembled the Azerbaijani cultivar "Azerbaijan guloyshasi" (AG2017; 2n = 16) and re-sequenced six cultivars to track the evolution of pomegranate and to compare it with previously published de novo assembled and re-sequenced cultivars. High synteny was observed between AG2017, Bhagawa (2n = 16), Tunisia (2n = 16), and Dabenzi (2n = 18), but these four cultivars diverged from the cultivar Taishanhong (2n = 18) with several rearrangements, indicating the presence of two major chromosome evolution events. Major presence/absence variations were not observed, as >99% of the five genomes aligned across the cultivars, while >99% of the pan-genic content was represented by Tunisia and Taishanhong alone. We also revisited the divergence between soft- and hard-seeded cultivars with less structured population genomic data than in previous studies, to refine the selected genomic regions and detect global migration routes for pomegranate. We report a unique admixture between soft- and hard-seeded cultivars that can be exploited to improve the diversity, quality, and adaptability of local pomegranate varieties around the world. Our study adds to the body of knowledge on the evolution of the pomegranate genome and its implications for the population structure of global pomegranate diversity, as well as for planning breeding programs aimed at developing improved cultivars.

2.
Opt Express ; 27(22): 32578-32586, 2019 Oct 28.
Article in English | MEDLINE | ID: mdl-31684467

ABSTRACT

Exceptionally strong enhancement of the Raman signal, exceeding eight orders of magnitude for near-infrared (1064 nm) excitation, is demonstrated for an array of dielectric submicron pillars covered by a relatively thick metal layer. The microstructure is designed to support 'spoof' plasmon-polariton excitations with resonant frequencies significantly below the fundamental surface plasmon resonance. Experiments reveal a relatively narrow range of spatial parameters for the optimal resonant scattering enhancement. They include a period close to the excitation wavelength, a specific ratio of the pillar planar size to the period, and optimal heights of both the pillars and the covering silver layer. The realized microstructures can be produced by fab-compatible photolithography techniques, and their outstanding sensing capabilities open the way to biomedical applications.
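The abstract quantifies the Raman enhancement in orders of magnitude but does not state the formula. The minimal sketch below assumes the common per-molecule definition of the SERS enhancement factor, EF = (I_SERS / N_SERS) / (I_ref / N_ref). The function name and all input values are hypothetical and are not measurements from this paper.

```python
import math


def sers_enhancement_factor(i_sers: float, n_sers: float,
                            i_ref: float, n_ref: float) -> float:
    """Per-molecule SERS enhancement factor (one common definition, assumed here).

    i_sers, i_ref: Raman intensities on the SERS substrate and on a
    non-enhancing reference; n_sers, n_ref: numbers of molecules probed.
    """
    if min(i_sers, n_sers, i_ref, n_ref) <= 0:
        raise ValueError("intensities and molecule counts must be positive")
    return (i_sers / n_sers) / (i_ref / n_ref)


# Hypothetical inputs, not data from the paper.
ef = sers_enhancement_factor(i_sers=5.0e4, n_sers=1.0e6, i_ref=2.0e3, n_ref=1.0e13)
print(f"EF = {ef:.2e}, about {math.log10(ef):.1f} orders of magnitude")
```

Taking log10 of the ratio converts it to the "orders of magnitude" scale used in the abstract.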

3.
BMC Genomics ; 20(1): 399, 2019 May 22.
Article in English | MEDLINE | ID: mdl-31117933

ABSTRACT

BACKGROUND: The three epidemiologically important Opisthorchiidae liver flukes, Opisthorchis felineus, O. viverrini, and Clonorchis sinensis, are believed to have a similar potential to provoke hepatobiliary diseases in their definitive hosts, although their populations differ substantially in ecogeographical aspects, including habitat, preferred hosts, and population structure. The lack of O. felineus genomic data is an obstacle to developing the comparative molecular biological approaches needed to obtain new knowledge about the biology of Opisthorchiidae trematodes, to identify essential pathways linked to parasite-host interaction, to predict genes that contribute to liver fluke pathogenesis, and to prevent and control the disease effectively. RESULTS: Here we present the first draft genome assembly of O. felineus and its gene repertoire, accompanied by a comparative analysis with those of O. viverrini and Clonorchis sinensis. We observed both noticeably high heterozygosity of the sequenced individual and substantial genetic diversity in a pooled sample. This indicates that the O. felineus population has a higher potential than O. viverrini and C. sinensis for a rapid adaptive response to opisthorchiasis control and prevention measures. We also found that all three species are characterized by more intensive involvement of trans-splicing in RNA processing than other trematodes. CONCLUSION: All revealed peculiarities of the structural organization of these genomes are of extreme importance for a proper description of genes and their products in these parasitic species. This should be taken into account in both academic and applied research on epidemiologically important liver flukes.
Further comparative genomics studies of liver flukes and non-carcinogenic flatworms will allow the generation of well-grounded hypotheses on the mechanisms underlying the development of cholangiocarcinoma associated with opisthorchiasis and clonorchiasis, as well as on species-specific mechanisms of these diseases.


Subject(s)
Cricetinae/parasitology , Cyprinidae/parasitology , Genome, Helminth , Genomics/methods , Helminth Proteins/genetics , Opisthorchiasis/epidemiology , Opisthorchis/genetics , Amino Acid Sequence , Animals , Clonorchiasis/epidemiology , Clonorchiasis/genetics , Clonorchiasis/parasitology , Clonorchis sinensis/genetics , Opisthorchiasis/genetics , Opisthorchiasis/parasitology , Sequence Homology
4.
FASEB J ; 33(7): 8161-8173, 2019 07.
Article in English | MEDLINE | ID: mdl-30970224

ABSTRACT

Human prefrontal cortex (PFC) is associated with broad individual variabilities in functions linked to personality, social behaviors, and cognitive functions. The phenotype variabilities associated with brain functions can be caused by genetic or epigenetic factors. The interactions between these factors in human subjects are, as yet, poorly understood. The heterogeneity of cerebral tissue, consisting of neuronal and nonneuronal cells, complicates the comparative analysis of gene activities in brain specimens. To approach the underlying neurogenomic determinants, we performed a deep analysis of open chromatin-associated histone methylation in PFC neurons sorted from multiple human individuals in conjunction with whole-genome and transcriptome sequencing. Integrative analyses produced novel unannotated neuronal genes and revealed individual-specific chromatin "blueprints" of neurons that, in part, relate to genetic background. Surprisingly, we observed gender-dependent epigenetic signals, implying that gender may contribute to the chromatin variabilities in neurons. Finally, we found epigenetic, allele-specific activation of the testis-specific gene nucleoporin 210 like (NUP210L) in brain in some individuals, which we link to a genetic variant occurring in <3% of the human population. Recently, the NUP210L locus has been associated with intelligence and mathematics ability. Our findings highlight the significance of epigenetic-genetic footprinting for exploring neurologic function in a subject-specific manner.-Gusev, F. E., Reshetov, D. A., Mitchell, A. C., Andreeva, T. V., Dincer, A., Grigorenko, A. P., Fedonin, G., Halene, T., Aliseychik, M., Goltsov, A. Y., Solovyev, V., Brizgalov, L., Filippova, E., Weng, Z., Akbarian, S., Rogaev, E. I. Epigenetic-genetic chromatin footprinting identifies novel and subject-specific genes active in prefrontal cortex neurons.


Subject(s)
Chromatin/metabolism , Cognition/physiology , Epigenesis, Genetic/physiology , Neurons/metabolism , Prefrontal Cortex/metabolism , Adolescent , Adult , Aged , Aged, 80 and over , Child , Child, Preschool , Female , Genetic Loci/physiology , Histones/metabolism , Humans , Infant , Infant, Newborn , Male , Methylation , Middle Aged , Neurons/cytology , Nuclear Pore Complex Proteins/biosynthesis , Prefrontal Cortex/cytology , Pregnancy
5.
Bioinformatics ; 35(16): 2730-2737, 2019 08 15.
Article in English | MEDLINE | ID: mdl-30601980

ABSTRACT

MOTIVATION: Computational identification of promoters is notoriously difficult, as human genes often have unique promoter sequences that provide regulation of transcription and interaction with the transcription initiation complex. Although there have been many attempts to develop computational promoter identification methods, we have no reliable tool to analyze long genomic sequences. RESULTS: In this work, we further develop our deep learning approach, which was relatively successful in discriminating short promoter and non-promoter sequences. Instead of focusing on classification accuracy, in this work we predict the exact positions of the transcription start site inside genomic sequences, testing every possible location. We studied human promoters to find effective regions for discrimination and built corresponding deep learning models. These models use an adaptively constructed negative set, which iteratively improves the model's discriminative ability. Our method significantly outperforms previously developed promoter prediction programs by considerably reducing the number of false-positive predictions. We have achieved an error-per-1000-bp rate of 0.02 and 0.31 errors per correct prediction, which is significantly better than the results of other human promoter predictors. AVAILABILITY AND IMPLEMENTATION: The developed method is available as a web server at http://www.cbrc.kaust.edu.sa/PromID/.


Subject(s)
Deep Learning , Promoter Regions, Genetic , Genome, Human , Genomics , Humans , Transcription Initiation Site
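The two error figures are ratios: one over the length of scanned sequence and one over the number of correct predictions. The minimal sketch below assumes that an "error" is a false-positive TSS prediction, because the abstract does not define it. The function name and counts are hypothetical and were chosen only so that the ratios come out at the reported 0.02 and 0.31.

```python
def promoter_error_rates(false_positives: int, true_positives: int,
                         scanned_bp: int) -> tuple[float, float]:
    """Return (errors per 1000 bp, errors per correct prediction).

    Assumes an 'error' is a false-positive TSS prediction (our reading of
    the abstract, not a definition taken from the paper).
    """
    if scanned_bp <= 0 or true_positives <= 0:
        raise ValueError("scanned_bp and true_positives must be positive")
    per_kb = false_positives / scanned_bp * 1000
    per_correct = false_positives / true_positives
    return per_kb, per_correct


# Hypothetical counts: 31 false positives, 100 correct TSS calls, 1.55 Mb scanned.
per_kb, per_correct = promoter_error_rates(31, 100, 1_550_000)
print(round(per_kb, 2), round(per_correct, 2))  # 0.02 0.31
```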
6.
Opt Express ; 26(17): 22519-22527, 2018 Aug 20.
Article in English | MEDLINE | ID: mdl-30130943

ABSTRACT

Apart from the main plasmon-polariton resonance of the surface-enhanced Raman scattering (SERS) occurring at 480 - 530 nm, an additional resonance was observed for substrates with two silver layers separated by a dielectric layer which support extra plasmon modes with decreased group velocities. The novel SERS resonance is shifted towards lower energies and has comparable amplitude, its exact energy position being determined by the thickness of the dielectric interlayer. The experimental findings provide a ground for the engineering of SERS-substrates with the spectral position of the additional resonance matched with the photon energy of the pump laser over a fairly wide range of laser wavelengths.

7.
PLoS One ; 12(11): e0187243, 2017.
Article in English | MEDLINE | ID: mdl-29141011

ABSTRACT

Computational analysis of promoters is hindered by the complexity of their architecture. In less studied genomes with complex organization, false-positive promoter predictions are common. Accurate identification of transcription start sites and core promoter regions remains an unsolved problem. In this paper, we present a comprehensive analysis of genomic features associated with promoters and show that models driven by probabilistic integrative algorithms allow accurate classification of DNA sequences into "promoters" and "non-promoters" even in the absence of full-length cDNA sequences. These models may be built upon maps of the distributions of sequence polymorphisms, RNA sequencing reads on genomic DNA, methylated nucleotides, and transcription factor binding sites, as well as the relative frequencies of nucleotides and their combinations. Positional clustering of binding sites shows that the cells of Oryza sativa utilize three distinct classes of transcription factors: those that bind preferentially to the [-500,0] region (188 "promoter-specific" transcription factors), those that bind preferentially to the [0,500] region (282 "5' UTR-specific" TFs), and 207 "promiscuous" transcription factors with little or no location preference with respect to the TSS. For the most informative motifs, positional preferences are conserved between dicots and monocots.


Subject(s)
Eukaryota/genetics , Nucleotides/metabolism , Promoter Regions, Genetic , Algorithms , Binding Sites , DNA Methylation , Evolution, Molecular , Oryza/genetics , Transcription Factors/metabolism
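The three transcription-factor classes are defined by where their binding sites fall relative to the TSS. The minimal sketch below classifies a factor from its site positions using a simple majority rule with a hypothetical 0.6 cutoff. The abstract does not describe the statistical criterion the authors actually used, and the function name and positions are illustrative only.

```python
from collections import Counter


def classify_tf(site_positions: list[int], cutoff: float = 0.6) -> str:
    """Classify a TF by where its binding sites fall relative to the TSS (0).

    Windows follow the abstract: [-500, 0] (promoter) and [0, 500] (5' UTR);
    position 0 itself is counted in the 5' UTR window. The majority cutoff
    is a hypothetical choice, not the paper's criterion.
    """
    windows = Counter()
    for pos in site_positions:
        if -500 <= pos < 0:
            windows["promoter"] += 1
        elif 0 <= pos <= 500:
            windows["utr5"] += 1
    total = windows["promoter"] + windows["utr5"]
    if total == 0:
        return "no sites in [-500, 500]"
    if windows["promoter"] / total >= cutoff:
        return "promoter-specific"
    if windows["utr5"] / total >= cutoff:
        return "5' UTR-specific"
    return "promiscuous"


print(classify_tf([-420, -310, -95, -40, 120]))  # promoter-specific
print(classify_tf([15, 60, 240, 410, -200]))     # 5' UTR-specific
print(classify_tf([-300, -50, 80, 300]))         # promiscuous
```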
8.
Biol Direct ; 12(1): 21, 2017 09 08.
Article in English | MEDLINE | ID: mdl-28886750

ABSTRACT

BACKGROUND: Oil palm is an important source of edible oil. The importance of the crop, as well as its long breeding cycle (10-12 years), led to the sequencing of its genome in 2013 to pave the way for genomics-guided breeding. Nevertheless, the first set of gene predictions, although useful, had many fragmented genes. Classification and characterization of genes associated with traits of interest, such as those for fatty acid biosynthesis and disease resistance, were also limited. Lipid-related genes, especially fatty acid (FA)-related genes, are of particular interest for the oil palm as they specify oil yield and quality. This paper presents the characterization of the oil palm genome using different gene prediction methods and comparative genomics analysis, the identification of FA biosynthesis and disease resistance genes, and the development of an annotation database and bioinformatics tools. RESULTS: Using two independent gene-prediction pipelines, Fgenesh++ and Seqping, 26,059 oil palm genes with transcriptome and RefSeq support were identified from the oil palm genome. These coding regions of the genome have a characteristically broad distribution of GC3 (fraction of cytosine and guanine in the third position of a codon), with over half of the GC3-rich genes (GC3 ≥ 0.75286) being intronless. In comparison, only one-seventh of all the oil palm genes identified are intronless. Using comparative genomics analysis, characterization of conserved domains and active sites, and expression analysis, 42 key genes involved in FA biosynthesis in oil palm were identified. For three of them, namely EgFABF, EgFABH and EgFAD3, segmental duplication events were detected. Our analysis also identified 210 candidate resistance genes in six classes, grouped by their protein domain structures.
CONCLUSIONS: We present an accurate and comprehensive annotation of the oil palm genome, focusing on analysis of important categories of genes (GC3-rich and intronless), as well as those associated with important functions, such as FA biosynthesis and disease resistance. The study demonstrated the advantages of having an integrated approach to gene prediction and developed a computational framework for combining multiple genome annotations. These results, available in the oil palm annotation database ( http://palmxplore.mpob.gov.my ), will provide important resources for studies on the genomes of oil palm and related crops. REVIEWERS: This article was reviewed by Alexander Kel, Igor Rogozin, and Vladimir A. Kuznetsov.


Subject(s)
Arecaceae/genetics , Genome, Plant , Models, Genetic , Molecular Sequence Annotation , Computational Biology/methods , Genes, Plant , Software
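The abstract defines GC3 as the fraction of cytosine and guanine at the third codon position. The minimal sketch below computes GC3 for a coding sequence and applies the quoted GC3-rich cutoff. It assumes the sequence starts in frame, and the toy sequence and function name are hypothetical.

```python
GC3_RICH_THRESHOLD = 0.75286  # cutoff quoted in the abstract


def gc3(cds: str) -> float:
    """Fraction of G or C at the third position of each complete codon.

    Assumes `cds` is in frame from its first base; a trailing partial
    codon is ignored.
    """
    seq = cds.upper()
    third_positions = [seq[i + 2] for i in range(0, len(seq) - 2, 3)]
    if not third_positions:
        raise ValueError("sequence shorter than one codon")
    gc = sum(base in "GC" for base in third_positions)
    return gc / len(third_positions)


cds = "ATGGCCGAGTTCAAGCTGTAA"  # toy sequence, not an oil palm gene
value = gc3(cds)
print(round(value, 3), value >= GC3_RICH_THRESHOLD)  # 0.857 True
```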
9.
Methods Mol Biol ; 1613: 311-331, 2017.
Article in English | MEDLINE | ID: mdl-28849566

ABSTRACT

It is becoming more evident that computational methods are needed for the identification and mapping of pathways in new genomes. We introduce an automatic annotation system, ARBA4Path (Association Rule-Based Annotator for Pathways), that utilizes rule mining techniques to predict metabolic pathways across a wide range of prokaryotes. We demonstrated that specific combinations of protein domains (recorded in our rules) strongly determine the pathways in which proteins are involved, and thus provide information that allows us to assign pathway membership very accurately (with a precision of 0.999 and a recall of 0.966) to proteins of a given prokaryotic taxon. Our system can be used to enhance the quality of automatically generated annotations as well as to annotate proteins of unknown function. The prediction models are represented in the form of human-readable rules, and they can be used effectively to add missing pathway information to many proteins in the UniProtKB/TrEMBL database.


Subject(s)
Bacteria/metabolism , Bacterial Proteins/metabolism , Data Mining/methods , Metabolic Networks and Pathways , Bacterial Proteins/chemistry , Databases, Protein , Machine Learning , Molecular Sequence Annotation , Protein Domains , Proteomics/methods
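ARBA4Path's rules tie combinations of protein domains to pathway membership. The sketch below illustrates the two generic association-rule measures, support and confidence, for a rule of the form "domain set → pathway" on toy data. The domain and pathway labels are hypothetical, and this is not the ARBA4Path implementation or its rule-selection procedure.

```python
def rule_stats(proteins: list[tuple[set[str], set[str]]],
               antecedent: frozenset[str], pathway: str) -> tuple[float, float]:
    """Support and confidence of the rule `antecedent -> pathway`.

    proteins: (domain_set, pathway_set) pairs.
    support    = share of all proteins that have every antecedent domain
                 and are annotated with the pathway;
    confidence = that count divided by the number of proteins that have
                 every antecedent domain.
    """
    has_domains = [p for p in proteins if antecedent <= p[0]]
    both = [p for p in has_domains if pathway in p[1]]
    support = len(both) / len(proteins)
    confidence = len(both) / len(has_domains) if has_domains else 0.0
    return support, confidence


# Toy annotations with hypothetical domain (D*) and pathway (P*) labels.
proteins = [
    ({"D1", "D2"}, {"P1"}),
    ({"D1", "D2", "D3"}, {"P1"}),
    ({"D1", "D2"}, {"P1"}),
    ({"D1", "D2"}, set()),
    ({"D3"}, {"P2"}),
]
print(rule_stats(proteins, frozenset({"D1", "D2"}), "P1"))  # (0.6, 0.75)
```

A rule with high confidence and enough support is the kind of pattern that can transfer a pathway annotation to an unannotated protein carrying the same domains.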
10.
PLoS One ; 12(2): e0171410, 2017.
Article in English | MEDLINE | ID: mdl-28158264

ABSTRACT

Accurate computational identification of promoters remains a challenge, as these key DNA regulatory regions have variable structures composed of functional motifs that provide gene-specific initiation of transcription. In this paper we utilize Convolutional Neural Networks (CNN) to analyze sequence characteristics of prokaryotic and eukaryotic promoters and build their predictive models. We trained a similar CNN architecture on promoters of five distant organisms: human, mouse, plant (Arabidopsis), and two bacteria (Escherichia coli and Bacillus subtilis). We found that a CNN trained on the sigma70 subclass of Escherichia coli promoters gives an excellent classification of promoter and non-promoter sequences (Sn = 0.90, Sp = 0.96, CC = 0.84). The Bacillus subtilis promoter identification CNN model achieves Sn = 0.91, Sp = 0.95, and CC = 0.86. For human, mouse and Arabidopsis promoters, we employed CNNs to identify two well-known promoter classes (TATA and non-TATA promoters). The CNN models recognize these complex functional regions well. For human promoters, the Sn/Sp/CC accuracy of prediction reached 0.95/0.98/0.90 on TATA and 0.90/0.98/0.89 on non-TATA promoter sequences, respectively. For Arabidopsis, we observed Sn/Sp/CC of 0.95/0.97/0.91 for TATA and 0.94/0.94/0.86 for non-TATA promoters. Thus, the developed CNN models, implemented in the CNNProm program, demonstrated the ability of the deep learning approach to grasp complex promoter sequence characteristics and achieve significantly higher accuracy than previously developed promoter prediction programs. We also propose a random substitution procedure to discover positionally conserved promoter functional elements. As the suggested approach does not require knowledge of any specific promoter features, it can be easily extended to identify promoters and other complex functional regions in sequences of many other, especially newly sequenced, genomes.
The CNNProm program is available to run on the web server at http://www.softberry.com.


Subject(s)
Eukaryotic Cells/metabolism , Neural Networks, Computer , Prokaryotic Cells/metabolism , Promoter Regions, Genetic/genetics , Animals , Computational Biology/methods , Humans , Sequence Analysis, DNA
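Sn, Sp and CC are all computed from the counts of true and false positives and negatives. The minimal sketch below assumes that CC denotes the Matthews correlation coefficient, because the abstract does not define it. The counts are hypothetical.

```python
import math


def sn_sp_cc(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float, float]:
    """Sensitivity, specificity and (assumed) Matthews correlation coefficient."""
    sn = tp / (tp + fn)  # share of true promoters recognized
    sp = tn / (tn + fp)  # share of non-promoters correctly rejected
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    cc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sn, sp, cc


# Hypothetical balanced test set: 100 promoters and 100 non-promoters.
sn, sp, cc = sn_sp_cc(tp=90, fn=10, tn=96, fp=4)
print(f"Sn={sn:.2f} Sp={sp:.2f} CC={cc:.2f}")  # Sn=0.90 Sp=0.96 CC=0.86
```

Unlike Sn and Sp, CC depends on the ratio of positive to negative examples in the test set. The same Sn and Sp can therefore correspond to different CC values on differently balanced sets.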
11.
Nucleic Acids Res ; 45(8): e65, 2017 05 05.
Article in English | MEDLINE | ID: mdl-28082394

ABSTRACT

Our current knowledge of eukaryotic promoters indicates their complex architecture, which is often composed of numerous functional motifs. Most known promoters include multiple and in some cases mutually exclusive transcription start sites (TSSs). Moreover, TSS selection depends on cell/tissue, developmental stage and environmental conditions. Such complex promoter structures make their computational identification notoriously difficult. Here, we present TSSPlant, a novel tool that predicts both TATA and TATA-less promoters in sequences of a wide spectrum of plant genomes. The tool was developed using large promoter collections from ppdb and PlantProm DB. It utilizes eighteen significant compositional and signal features of plant promoter sequences selected in this study, which feed an artificial neural network-based model trained by the backpropagation algorithm. TSSPlant achieves significantly higher accuracy than the next best promoter prediction program for both TATA promoters (MCC≃0.84 and F1-score≃0.91 versus MCC≃0.51 and F1-score≃0.71) and TATA-less promoters (MCC≃0.80, F1-score≃0.89 versus MCC≃0.29 and F1-score≃0.50). TSSPlant is available to download as a standalone program at http://www.cbrc.kaust.edu.sa/download/.


Subject(s)
Genome, Plant , Neural Networks, Computer , Plant Proteins/genetics , Promoter Regions, Genetic , RNA Polymerase II/genetics , Transcription Initiation Site , Arabidopsis/genetics , Arabidopsis/metabolism , Gene Expression , Oryza/genetics , Oryza/metabolism , Plant Proteins/metabolism , RNA Polymerase II/metabolism , Sequence Analysis, DNA , Software
12.
Nucleic Acids Res ; 45(D1): D1075-D1081, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27899667

ABSTRACT

We describe updates to the Rice SNP-Seek Database since its first release. We ran a new SNP-calling pipeline followed by filtering that resulted in complete, base, filtered and core SNP datasets. Besides the Nipponbare reference genome, the pipeline was run on genome assemblies of IR 64, 93-11, DJ 123 and Kasalath. New genotype query and display features have been added for reference assemblies, SNP datasets and indels. JBrowse now displays BAM, VCF and other annotation tracks, the additional genome assemblies and an embedded VISTA genome comparison viewer. The middleware has been redesigned for improved performance by using a hybrid of HDF5 and an RDBMS for genotype storage. Query modules for genotypes, varieties and genes have been improved to handle various constraints. An integrated list manager allows the user to pass query parameters for further analysis. The SNP Annotator adds traits, ontology terms, effects and interactions to markers in a list. Web-service calls were implemented to access most data. These features enable seamless querying of SNP-Seek across various biological entities, a step toward semi-automated gene-trait association discovery. URL: http://snp-seek.irri.org.


Subject(s)
Databases, Nucleic Acid , Genome, Plant , INDEL Mutation , Oryza/genetics , Polymorphism, Single Nucleotide , Search Engine , Software , Alleles , Computational Biology/methods , Gene Frequency , Genetic Loci , Genomics/methods , Genotype , User-Computer Interface , Web Browser
13.
PLoS One ; 11(7): e0158896, 2016.
Article in English | MEDLINE | ID: mdl-27390860

ABSTRACT

The widening gap between known proteins and their functions has encouraged the development of methods to automatically infer annotations. Automatic functional annotation of proteins is expected to meet the conflicting requirements of maximizing annotation coverage while minimizing erroneous functional assignments. This trade-off poses a great challenge in designing intelligent systems to tackle the problem of automatic protein annotation. In this work, we present a system that utilizes rule mining techniques to predict metabolic pathways in prokaryotes. The resulting knowledge represents predictive models that assign pathway involvement to UniProtKB entries. We evaluated the performance of our system using cross-validation. We found that it achieved very promising results in pathway identification, with an F1-measure of 0.982 and an AUC of 0.987. Our prediction models were then successfully applied to 6.2 million UniProtKB/TrEMBL reference proteome entries of prokaryotes. As a result, 663,724 entries were covered, of which 436,510 lacked any previous pathway annotations.


Subject(s)
Data Mining/methods , Databases, Protein , Molecular Sequence Annotation/methods , Prokaryotic Cells/metabolism , Proteome/genetics , Proteome/metabolism
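An AUC like the one reported can be computed without drawing a ROC curve. It equals the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, with ties counted as one half. The minimal sketch below applies this to hypothetical scores, not the paper's data.

```python
def roc_auc(pos_scores: list[float], neg_scores: list[float]) -> float:
    """ROC AUC as the Mann-Whitney U statistic divided by n_pos * n_neg."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical scores for entries with (pos) and without (neg) a given pathway.
print(roc_auc([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2]))  # 11 of 12 pairs ranked correctly
```

The pairwise loop is O(n·m). For millions of entries, as in the UniProtKB/TrEMBL application described above, a rank-based computation of the same statistic is preferable.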
14.
Genome Announc ; 3(5)2015 Oct 15.
Article in English | MEDLINE | ID: mdl-26472828

ABSTRACT

The emergence and spread of multidrug-resistant (MDR) bacteria have been regarded as major challenges among health care-associated infections worldwide. Here, we report the draft genome sequence of an MDR Stenotrophomonas maltophilia strain isolated in 2014 from King Abdulla Medical City, Makkah, Saudi Arabia.

15.
Bioinformatics ; 31(21): 3544-5, 2015 Nov 01.
Article in English | MEDLINE | ID: mdl-26142184

ABSTRACT

UNLABELLED: Gene transcription is mostly conducted through interactions of various transcription factors and their binding sites on DNA (regulatory elements, REs). Today, we are still far from understanding the real regulatory content of promoter regions. Computer methods for identification of REs remain a widely used tool for studying and understanding transcriptional regulation mechanisms. The Nsite, NsiteH and NsiteM programs perform searches for statistically significant (non-random) motifs of known human, animal and plant one-box and composite REs in a single genomic sequence, in a pair of aligned homologous sequences and in a set of functionally related sequences, respectively. AVAILABILITY AND IMPLEMENTATION: Pre-compiled executables built under commonly used operating systems are available for download by visiting http://www.molquest.kaust.edu.sa and http://www.softberry.com. CONTACT: solovictor@gmail.com SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Promoter Regions, Genetic , Software , Animals , Binding Sites , Genomics , Humans , Nucleotide Motifs , Plants/genetics , Regulatory Sequences, Nucleic Acid , Sequence Analysis, DNA , Transcription Factors/metabolism
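Nsite looks for motifs that occur more often than expected by chance, but the abstract does not describe its statistical model. As an illustrative assumption only, the sketch below counts exact matches of a site on both DNA strands and compares the count with a Poisson expectation under independent, equiprobable nucleotides. The function names and the toy sequence are hypothetical, and this is not Nsite's algorithm.

```python
import math

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(site: str) -> str:
    return site.upper().translate(COMPLEMENT)[::-1]


def count_site(seq: str, site: str) -> int:
    """Count exact, possibly overlapping matches of `site` on both strands."""
    seq, site = seq.upper(), site.upper()
    targets = {site, reverse_complement(site)}  # one entry if the site is palindromic
    k = len(site)
    return sum(seq[i:i + k] in targets for i in range(len(seq) - k + 1))


def poisson_upper_tail(observed: int, expected: float) -> float:
    """P(X >= observed) for X ~ Poisson(expected)."""
    cdf_below = sum(math.exp(-expected) * expected ** j / math.factorial(j)
                    for j in range(observed))
    return max(0.0, 1.0 - cdf_below)


def site_enrichment(seq: str, site: str) -> tuple[int, float, float]:
    """Observed count, expected count and Poisson p-value for one site.

    Null model (an assumption): independent, equiprobable nucleotides.
    """
    k = len(site)
    positions = len(seq) - k + 1
    strands = 1 if reverse_complement(site) == site.upper() else 2
    expected = positions * strands * 0.25 ** k
    observed = count_site(seq, site)
    return observed, expected, poisson_upper_tail(observed, expected)


seq = "TATAAAGCGCTTTATAGCGCTATAAAGC"  # toy sequence
obs, exp_count, p_value = site_enrichment(seq, "TATAAA")
print(obs, round(exp_count, 4), f"{p_value:.1e}")
```

Subtracting the CDF from 1 loses precision for extremely small p-values. A real tool would sum the upper tail directly or work in log space.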
16.
Bioinformatics ; 31(21): 3421-8, 2015 Nov 01.
Article in English | MEDLINE | ID: mdl-26177965

ABSTRACT

MOTIVATION: Next-generation sequencing generates large amounts of data affected by errors in the form of substitutions, insertions or deletions of bases. Error correction based on high-coverage information typically improves de novo assembly. Most existing tools can correct substitution errors only; some support insertions and deletions, but their accuracy is often low. RESULTS: We present Karect, a novel error correction technique based on multiple alignment. Our approach supports substitution, insertion and deletion errors. It can handle non-uniform coverage as well as moderately covered areas of the sequenced genome. Experiments with data from Illumina, 454 FLX and Ion Torrent sequencing machines demonstrate that Karect is more accurate than previous methods, both in terms of correcting individual-base errors (up to 10% increase in accuracy gain) and post de novo assembly quality (up to 10% increase in NGA50). We also introduce an improved framework for evaluating the quality of error correction. AVAILABILITY AND IMPLEMENTATION: Karect is available at: http://aminallam.github.io/karect. CONTACT: amin.allam@kaust.edu.sa SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , High-Throughput Nucleotide Sequencing/methods , INDEL Mutation/genetics , Mutagenesis, Insertional/genetics , Sequence Analysis, DNA/methods , Sequence Deletion , Chromosome Mapping , Computational Biology/methods , Genome, Human , Humans
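NGA50, the assembly-quality measure cited above, is NG50 computed on the contig pieces that align to the reference after contigs are split at misassemblies. The minimal sketch below covers only the NG50 step and assumes the lengths are already available. The alignment and misassembly detection are not shown, and the function name and lengths are hypothetical.

```python
from typing import Optional


def ng50(lengths: list[int], genome_size: int) -> Optional[int]:
    """Length of the piece at which the cumulative length, summed from the
    longest piece down, first reaches half of `genome_size`.

    With contig lengths this is NG50 (N50 if genome_size is the assembly
    size); with aligned-block lengths it is NGA50. Returns None when the
    pieces cover less than half of genome_size.
    """
    covered = 0
    for length in sorted(lengths, reverse=True):
        covered += length
        if 2 * covered >= genome_size:
            return length
    return None


contigs = [50, 40, 30, 20, 10]                  # toy lengths
print(ng50(contigs, genome_size=200))           # 30 (NG50 for a 200-bp genome)
print(ng50(contigs, genome_size=sum(contigs)))  # 40 (N50)
```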
17.
Genome Res ; 24(12): 2077-89, 2014 Dec.
Article in English | MEDLINE | ID: mdl-25273068

ABSTRACT

Multiple sequence alignments (MSAs) are a prerequisite for a wide variety of evolutionary analyses. Published assessments and benchmark data sets for protein and, to a lesser extent, global nucleotide MSAs are available, but less effort has been made to establish benchmarks in the more general problem of whole-genome alignment (WGA). Using the same model as the successful Assemblathon competitions, we organized a competitive evaluation in which teams submitted their alignments and then assessments were performed collectively after all the submissions were received. Three data sets were used: Two were simulated and based on primate and mammalian phylogenies, and one was comprised of 20 real fly genomes. In total, 35 submissions were assessed, submitted by 10 teams using 12 different alignment pipelines. We found agreement between independent simulation-based and statistical assessments, indicating that there are substantial accuracy differences between contemporary alignment tools. We saw considerable differences in the alignment quality of differently annotated regions and found that few tools aligned the duplications analyzed. We found that many tools worked well at shorter evolutionary distances, but fewer performed competitively at longer distances. We provide all data sets, submissions, and assessment programs for further study and provide, as a resource for future benchmarking, a convenient repository of code and data for reproducing the simulation assessments.


Subject(s)
Genome , Genomics/methods , Sequence Alignment/methods , Software , Animals , Computational Biology/methods , Computer Simulation , Datasets as Topic , Genome-Wide Association Study , Humans , Mammals/genetics , Phylogeny , Reproducibility of Results
18.
Nature ; 510(7503): 109-14, 2014 Jun 05.
Article in English | MEDLINE | ID: mdl-24847885

ABSTRACT

The origins of neural systems remain unresolved. In contrast to other basal metazoans, ctenophores (comb jellies) have both complex nervous and mesoderm-derived muscular systems. These holoplanktonic predators also have sophisticated ciliated locomotion, behaviour and distinct development. Here we present the draft genome of Pleurobrachia bachei, Pacific sea gooseberry, together with ten other ctenophore transcriptomes, and show that they are remarkably distinct from other animal genomes in their content of neurogenic, immune and developmental genes. Our integrative analyses place Ctenophora as the earliest lineage within Metazoa. This hypothesis is supported by comparative analysis of multiple gene families, including the apparent absence of HOX genes, canonical microRNA machinery, and reduced immune complement in ctenophores. Although two distinct nervous systems are well recognized in ctenophores, many bilaterian neuron-specific genes and genes of 'classical' neurotransmitter pathways either are absent or, if present, are not expressed in neurons. Our metabolomic and physiological data are consistent with the hypothesis that ctenophore neural systems, and possibly muscle specification, evolved independently from those in other animals.


Subject(s)
Ctenophora/genetics , Evolution, Molecular , Genome/genetics , Nervous System , Animals , Ctenophora/classification , Ctenophora/immunology , Ctenophora/physiology , Genes, Developmental , Genes, Homeobox , Mesoderm/metabolism , Metabolomics , MicroRNAs , Molecular Sequence Data , Muscles/physiology , Nervous System/metabolism , Neurons/metabolism , Neurotransmitter Agents , Phylogeny , Transcriptome/genetics
19.
BMC Genomics ; 15: 86, 2014 Jan 30.
Article in English | MEDLINE | ID: mdl-24479613

ABSTRACT

BACKGROUND: The first generation of genome sequence assemblies and annotations have had a significant impact upon our understanding of the biology of the sequenced species, the phylogenetic relationships among species, the study of populations within and across species, and have informed the biology of humans. As only a few Metazoan genomes are approaching finished quality (human, mouse, fly and worm), there is room for improvement of most genome assemblies. The honey bee (Apis mellifera) genome, published in 2006, was noted for its bimodal GC content distribution that affected the quality of the assembly in some regions and for fewer genes in the initial gene set (OGSv1.0) compared to what would be expected based on other sequenced insect genomes. RESULTS: Here, we report an improved honey bee genome assembly (Amel_4.5) with a new gene annotation set (OGSv3.2), and show that the honey bee genome contains a number of genes similar to that of other insect genomes, contrary to what was suggested in OGSv1.0. The new genome assembly is more contiguous and complete and the new gene set includes ~5000 more protein-coding genes, 50% more than previously reported. About 1/6 of the additional genes were due to improvements to the assembly, and the remaining were inferred based on new RNAseq and protein data. CONCLUSIONS: Lessons learned from this genome upgrade have important implications for future genome sequencing projects. Furthermore, the improvements significantly enhance genomic resources for the honey bee, a key model for social behavior and essential to global ecology through pollination.


Subject(s)
Bees/genetics , Genes, Insect , Animals , Base Composition , Databases, Genetic , Interspersed Repetitive Sequences/genetics , Molecular Sequence Annotation , Open Reading Frames/genetics , Peptides/analysis , Sequence Analysis, RNA , Sequence Homology, Amino Acid
20.
Genome Res ; 21(12): 2224-41, 2011 Dec.
Article in English | MEDLINE | ID: mdl-21926179

ABSTRACT

Low-cost short read sequencing technology has revolutionized genomics, though it is only just becoming practical for the high-quality de novo assembly of a novel large genome. We describe the Assemblathon 1 competition, which aimed to comprehensively assess the state of the art in de novo assembly methods when applied to current sequencing technologies. In a collaborative effort, teams were asked to assemble a simulated Illumina HiSeq data set of an unknown, simulated diploid genome. A total of 41 assemblies from 17 different groups were received. Novel haplotype aware assessments of coverage, contiguity, structure, base calling, and copy number were made. We establish that within this benchmark: (1) It is possible to assemble the genome to a high level of coverage and accuracy, and that (2) large differences exist between the assemblies, suggesting room for further improvements in current methods. The simulated benchmark, including the correct answer, the assemblies, and the code that was used to evaluate the assemblies is now public and freely available from http://www.assemblathon.org/.


Subject(s)
Genome/physiology , Genomics/methods , Sequence Analysis, DNA/methods