Results 1 - 20 of 29
1.
Bioinformatics ; 36(24): 5582-5589, 2021 Apr 05.
Article in English | MEDLINE | ID: mdl-33399819

ABSTRACT

MOTIVATION: Population-scale sequenced cohorts are foundational resources for genetic analyses, but processing raw reads into analysis-ready cohort-level variants remains challenging. RESULTS: We introduce an open-source cohort-calling method that uses the highly accurate caller DeepVariant and scalable merging tool GLnexus. Using callset quality metrics based on variant recall and precision in benchmark samples and Mendelian consistency in father-mother-child trios, we optimize the method across a range of cohort sizes, sequencing methods and sequencing depths. The resulting callsets show consistent quality improvements over those generated using existing best practices with reduced cost. We further evaluate our pipeline in the deeply sequenced 1000 Genomes Project (1KGP) samples and show superior callset quality metrics and imputation reference panel performance compared to an independently generated GATK Best Practices pipeline. AVAILABILITY AND IMPLEMENTATION: We publicly release the 1KGP individual-level variant calls and cohort callset (https://console.cloud.google.com/storage/browser/brain-genomics-public/research/cohort/1KGP) to foster additional development and evaluation of cohort merging methods as well as broad studies of genetic variation. Both DeepVariant (https://github.com/google/deepvariant) and GLnexus (https://github.com/dnanexus-rnd/GLnexus) are open-source, and the optimized GLnexus setup discovered in this study is also integrated into GLnexus public releases v1.2.2 and later. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
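
The Mendelian consistency metric mentioned in the abstract above can be illustrated with a minimal Python sketch. It assumes diploid autosomal genotypes given as tuples of allele indices, and it ignores de novo mutations, sex chromosomes, and missing calls. The function names and the example genotypes are hypothetical, and this is not necessarily how the authors computed their metric.

```python
from itertools import product


def is_mendelian_consistent(father, mother, child):
    """Return True if the child's diploid genotype can be formed by taking
    one allele from the father and one allele from the mother.

    Genotypes are tuples of allele indices, e.g. (0, 1) for a heterozygote.
    """
    child_sorted = tuple(sorted(child))
    return any(
        tuple(sorted((paternal, maternal))) == child_sorted
        for paternal, maternal in product(father, mother)
    )


def mendelian_violation_rate(trio_calls):
    """Fraction of (father, mother, child) genotype triples that violate
    Mendelian inheritance. Lower values suggest a cleaner callset."""
    trio_calls = list(trio_calls)
    if not trio_calls:
        raise ValueError("no trio genotypes supplied")
    violations = sum(
        not is_mendelian_consistent(f, m, c) for f, m, c in trio_calls
    )
    return violations / len(trio_calls)


# Hypothetical genotypes at four sites (allele 0 = reference, 1 = alternate).
calls = [
    ((0, 0), (0, 1), (0, 1)),  # consistent
    ((0, 0), (0, 0), (0, 1)),  # violation: alternate allele absent in both parents
    ((1, 1), (0, 1), (1, 1)),  # consistent
    ((0, 1), (0, 1), (0, 0)),  # consistent
]
print(mendelian_violation_rate(calls))
```

Because a trio check needs no external truth set, it can be applied to every family in a cohort, which is presumably why it complements recall and precision measured on benchmark samples.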

2.
Bioinformatics ; 36(22-23): 5537-5538, 2021 04 01.
Article in English | MEDLINE | ID: mdl-33300997

ABSTRACT

SUMMARY: Variant Call Format (VCF), the prevailing representation for germline genotypes in population sequencing, suffers rapid size growth as larger cohorts are sequenced and more rare variants are discovered. We present Sparse Project VCF (spVCF), an evolution of VCF with judicious entropy reduction and run-length encoding, delivering >10× size reduction for modern studies with practically minimal information loss. spVCF interoperates with VCF efficiently, including tabix-based random access. We demonstrate its effectiveness with the DiscovEHR and UK Biobank whole-exome sequencing cohorts. AVAILABILITY AND IMPLEMENTATION: Apache-licensed reference implementation: github.com/mlin/spVCF. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genomics , Software , Base Sequence , Genotype , Germ Cells
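
To make the run-length encoding idea in the spVCF abstract concrete, here is a generic Python sketch that collapses runs of identical genotype cells. It is a toy illustration only: the function names and the example row are hypothetical, and the actual spVCF encoding is defined by its reference implementation, not by this code.

```python
from itertools import groupby


def rle_encode(cells):
    """Collapse consecutive identical cells into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(cells)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original list of cells."""
    return [value for value, count in runs for _ in range(count)]


# Hypothetical genotype cells: mostly homozygous reference with one rare variant.
row = ["0/0"] * 6 + ["0/1"] + ["0/0"] * 3
encoded = rle_encode(row)
print(encoded)  # [('0/0', 6), ('0/1', 1), ('0/0', 3)]

# The encoding is lossless for this toy case.
assert rle_decode(encoded) == row
```

Rare variants leave most cells identical to their neighbors, so long runs are common in large cohorts, and that is where run-length encoding pays off.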
3.
Gigascience ; 9(10), 2020 10 15.
Article in English | MEDLINE | ID: mdl-33057676

ABSTRACT

BACKGROUND: Metagenomic next-generation sequencing (mNGS) has enabled the rapid, unbiased detection and identification of microbes without pathogen-specific reagents, culturing, or a priori knowledge of the microbial landscape. mNGS data analysis requires a series of computationally intensive processing steps to accurately determine the microbial composition of a sample. Existing mNGS data analysis tools typically require bioinformatics expertise and access to local server-class hardware resources. For many research laboratories, this presents an obstacle, especially in resource-limited environments. FINDINGS: We present IDseq, an open source cloud-based metagenomics pipeline and service for global pathogen detection and monitoring (https://idseq.net). The IDseq Portal accepts raw mNGS data, performs host and quality filtration steps, then executes an assembly-based alignment pipeline, which results in the assignment of reads and contigs to taxonomic categories. The taxonomic relative abundances are reported and visualized in an easy-to-use web application to facilitate data interpretation and hypothesis generation. Furthermore, IDseq supports environmental background model generation and automatic internal spike-in control recognition, providing statistics that are critical for data interpretation. IDseq was designed with the specific intent of detecting novel pathogens. Here, we benchmark novel virus detection capability using both synthetically evolved viral sequences and real-world samples, including IDseq analysis of a nasopharyngeal swab sample acquired and processed locally in Cambodia from a tourist from Wuhan, China, infected with the recently emergent SARS-CoV-2. CONCLUSION: The IDseq Portal reduces the barrier to entry for mNGS data analysis and enables bench scientists, clinicians, and bioinformaticians to gain insight from mNGS datasets for both known and novel pathogens.


Subject(s)
Betacoronavirus/genetics , Cloud Computing , Coronavirus Infections/virology , Metagenome , Metagenomics/methods , Pneumonia, Viral/virology , Betacoronavirus/pathogenicity , COVID-19 , Coronavirus Infections/diagnosis , Databases, Genetic , High-Throughput Nucleotide Sequencing/methods , Humans , Pandemics , Pneumonia, Viral/diagnosis , SARS-CoV-2 , Software
4.
J Ultrasound Med ; 39(7): 1335-1342, 2020 Jul.
Article in English | MEDLINE | ID: mdl-31995242

ABSTRACT

OBJECTIVES: To determine patient and procedural risk factors for major complications in ultrasound (US)-guided random renal core biopsy. METHODS: Random renal biopsies performed by radiologists in the US department at a single institution between 2014 and 2018 were retrospectively reviewed. The patient's age, sex, race, and estimated glomerular filtration rate (eGFR) were recorded. The biopsy approach, needle gauge, length of cores, number of throws, and presence of a color flow tract were recorded. Outcome data included minor and major complications. Associations between variables were tested with χ2 analyses and univariable/multivariable logistic regression models. RESULTS: A total of 231 biopsies (167 native and 64 allografts) were reviewed. There was no significant difference in the sex, age, race, or eGFR between native and allograft groups. The overall rate for any complication was 18.2%, with a 4.3% rate of major complications, which was significantly greater in native compared to allograft biopsies (6% versus 0%; P = .045). A risk analysis in native biopsies only showed that major complications were significantly associated with a low eGFR such that patients with stage 4 or 5 kidney disease had higher odds of complications (odds ratio [95% confidence interval]: stage 4, 9.405 [1.995-44.338]; P = .0393; stage 5, 10.749 [2.218-52.080]; P = .0203) than patients with normal function (eGFR >60 mL/min). The presence of a color flow tract portended a 10.7 times greater risk of having any complication (95% confidence interval, 4.595-24.994; P < .001). Other procedural factors were not significantly associated with complications. CONCLUSIONS: There is an increased risk of major complications in US-guided random native kidney biopsy in patients with a low eGFR (<30 mL/min) and a patent color flow tract in the immediate postbiopsy setting.


Subject(s)
Image-Guided Biopsy , Ultrasonography, Interventional , Biopsy , Biopsy, Large-Core Needle , Humans , Kidney/diagnostic imaging , Retrospective Studies
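
One way to read the odds ratios and confidence intervals in the renal biopsy abstract above: if the intervals are Wald-type intervals from logistic regression (an assumption, since the abstract does not say), each one is computed as exp(beta ± z × SE), so the odds ratio should equal the geometric mean of its bounds. The Python sketch below checks that property against the reported figures; the helper name is hypothetical.

```python
import math


def log_scale_midpoint(lower, upper):
    """Geometric mean of the bounds, i.e. the midpoint of the interval on the
    log-odds scale. For a Wald interval this should reproduce the odds ratio."""
    return math.sqrt(lower * upper)


# (reported odds ratio, lower bound, upper bound), copied from the abstract.
reported = {
    "stage 4 kidney disease": (9.405, 1.995, 44.338),
    "stage 5 kidney disease": (10.749, 2.218, 52.080),
    "color flow tract": (10.7, 4.595, 24.994),
}

for label, (odds_ratio, lower, upper) in reported.items():
    midpoint = log_scale_midpoint(lower, upper)
    print(f"{label}: reported OR {odds_ratio}, log-scale midpoint {midpoint:.3f}")
```

All three reported odds ratios match the midpoints to within rounding, which is consistent with the Wald assumption. The very wide intervals are symmetric only on the log scale.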
5.
J Ultrasound Med ; 38(3): 581-586, 2019 Mar.
Article in English | MEDLINE | ID: mdl-30043431

ABSTRACT

OBJECTIVES: Image-guided tissue sampling in the workup of suspected lymphoma can be performed by core needle biopsy (CNB) or CNB with fine-needle aspiration (FNA). We compared the yield of clinically actionable diagnoses between these methods of tissue sampling. METHODS: All ultrasound-guided percutaneous peripheral lymph node biopsies from 2010 to 2017 at a single institution were retrospectively reviewed for biopsy type (CNB versus CNB + FNA), prior diagnosis of lymphoma, size of the target lymph node, number of cores, length of core specimens, and pathologic diagnosis. Lymphoma and lymphoid tissue were included; metastatic disease and nonlymphoid tissue were excluded. An oncologist specializing in lymphoma independently determined whether an actionable diagnosis could be made with the pathologic results in the context of the patient's medical record. χ2 analyses and univariable/multivariable logistic regression models were used for statistical analyses. RESULTS: Of 578 lymph node biopsies, 306 (53%) had a prior diagnosis of lymphoma; 273 (47%) were CNB, and 305 (53%) were CNB + FNA. There was no significant difference between biopsy types (CNB versus CNB + FNA) in the number of cores (median [25th, 75th percentiles], 3 [3, 4] versus 4 [3, 4]; P = .47) or total length of tissue (4.1 [2.5, 6.1] versus 3.7 [2.3, 6] cm; P = .09). There was no difference in obtaining an actionable diagnosis between biopsy types after controlling for a known history of lymphoma (P = .271) or after controlling for the number of core specimens (P = .826). CONCLUSIONS: In cases of suspected lymphoma, CNB without FNA was sufficient to obtain an actionable diagnosis.


Subject(s)
Lymph Nodes/diagnostic imaging , Lymph Nodes/pathology , Lymphoma/diagnostic imaging , Lymphoma/pathology , Ultrasonography, Interventional/methods , Adult , Aged , Aged, 80 and over , Biopsy, Fine-Needle , Biopsy, Large-Core Needle , Female , Humans , Image-Guided Biopsy/methods , Male , Middle Aged , Retrospective Studies , Young Adult
6.
F1000Res ; 8: 1751, 2019.
Article in English | MEDLINE | ID: mdl-34386196

ABSTRACT

In March 2019, 45 scientists and software engineers from around the world converged at the University of California, Santa Cruz for the first pangenomics codeathon. The purpose of the meeting was to propose technical specifications and standards for a usable human pangenome as well as to build relevant tools for genome graph infrastructures. During the meeting, the group held several intense and productive discussions covering a diverse set of topics, including advantages of graph genomes over a linear reference representation, design of new methods that can leverage graph-based data structures, and novel visualization and annotation approaches for pangenomes. Additionally, the participants self-organized into teams that worked intensely over a three-day period to build a set of pipelines and tools for specific pangenomic applications. A summary of the questions raised and the tools developed is reported in this manuscript.

7.
Nat Biotechnol ; 36(9): 875-879, 2018 10.
Article in English | MEDLINE | ID: mdl-30125266

ABSTRACT

Reference genomes guide our interpretation of DNA sequence data. However, conventional linear references represent only one version of each locus, ignoring variation in the population. Poor representation of an individual's genome sequence impacts read mapping and introduces bias. Variation graphs are bidirected DNA sequence graphs that compactly represent genetic variation across a population, including large-scale structural variation such as inversions and duplications. Previous graph genome software implementations have been limited by scalability or topological constraints. Here we present vg, a toolkit of computational methods for creating, manipulating, and using these structures as references at the scale of the human genome. vg provides an efficient approach to mapping reads onto arbitrary variation graphs using generalized compressed suffix arrays, with improved accuracy over alignment to a linear reference, and effectively removing reference bias. These capabilities make using variation graphs as references for DNA sequencing practical at a gigabase scale, or at the topological complexity of de novo assemblies.


Subject(s)
Genetic Variation , Computer Simulation , DNA/genetics , Humans
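
The contrast between a linear reference and a graph, as described in the vg abstract above, can be shown with a deliberately simplified Python sketch. It models a small directed acyclic sequence graph with one single-nucleotide variant. Real variation graphs are bidirected, so a node can be traversed in either orientation, and that is omitted here. The class, its methods, and the example sequences are hypothetical and are not vg's data model or API.

```python
from dataclasses import dataclass, field


@dataclass
class SequenceGraph:
    """Toy directed acyclic sequence graph (not bidirected)."""

    nodes: dict = field(default_factory=dict)  # node id -> DNA sequence
    edges: dict = field(default_factory=dict)  # node id -> list of successor ids

    def add_node(self, node_id, sequence):
        self.nodes[node_id] = sequence
        self.edges.setdefault(node_id, [])

    def add_edge(self, source, target):
        self.edges[source].append(target)

    def spell_paths(self, start, end):
        """Yield the sequence spelled by every path from start to end.
        Assumes the graph is acyclic, otherwise this would not terminate."""
        stack = [(start, self.nodes[start])]
        while stack:
            node, sequence = stack.pop()
            if node == end:
                yield sequence
                continue
            for successor in self.edges[node]:
                stack.append((successor, sequence + self.nodes[successor]))


# A SNP bubble: the reference allele T and the alternate allele C share flanks.
graph = SequenceGraph()
for node_id, sequence in [(1, "GAT"), (2, "T"), (3, "C"), (4, "ACA")]:
    graph.add_node(node_id, sequence)
for source, target in [(1, 2), (1, 3), (2, 4), (3, 4)]:
    graph.add_edge(source, target)

print(sorted(graph.spell_paths(1, 4)))  # ['GATCACA', 'GATTACA']
```

A linear reference would store only one of these two haplotypes. The graph stores both compactly, so a read carrying either allele has an exact path to align against.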
8.
Mol Biol Evol ; 33(12): 3108-3132, 2016 12.
Article in English | MEDLINE | ID: mdl-27604222

ABSTRACT

Translational stop codon readthrough emerged as a major regulatory mechanism affecting hundreds of genes in animal genomes, based on recent comparative genomics and ribosomal profiling evidence, but its evolutionary properties remain unknown. Here, we leverage comparative genomic evidence across 21 Anopheles mosquitoes to systematically annotate readthrough genes in the malaria vector Anopheles gambiae, and to provide the first study of abundant readthrough evolution, by comparison with 20 Drosophila species. Using improved comparative genomics methods for detecting readthrough, we identify evolutionary signatures of conserved, functional readthrough of 353 stop codons in the malaria vector, Anopheles gambiae, and of 51 additional Drosophila melanogaster stop codons, including several cases of double and triple readthrough and of readthrough of two adjacent stop codons. We find that most differences between the readthrough repertoires of the two species arose from readthrough gain or loss in existing genes, rather than birth of new genes or gene death; that readthrough-associated RNA structures are sometimes gained or lost while readthrough persists; that readthrough is more likely to be lost at TAA and TAG stop codons; and that readthrough is under continued purifying evolutionary selection in mosquito, based on population genetic evidence. We also determine readthrough-associated gene properties that predate readthrough, and identify differences in the characteristic properties of readthrough genes between clades. We estimate more than 600 functional readthrough stop codons in mosquito and 900 in fruit fly, provide evidence of readthrough control of peroxisomal targeting, and refine the phylogenetic extent of abundant readthrough as following divergence from centipede.


Subject(s)
Anopheles/genetics , Anopheles/metabolism , Codon, Terminator , Peptide Chain Termination, Translational , Animals , Biological Evolution , Codon , Drosophila melanogaster , Evolution, Molecular , Genomics , Open Reading Frames , Phylogeny , Protein Biosynthesis , Ribosomes/genetics , Ribosomes/metabolism
9.
Genome Biol ; 16: 38, 2015 Feb 17.
Article in English | MEDLINE | ID: mdl-25853568

ABSTRACT

BACKGROUND: The increasing availability of sequence data for many viruses provides power to detect regions under unusual evolutionary constraint at a high resolution. One approach leverages the synonymous substitution rate as a signature to pinpoint genic regions encoding overlapping or embedded functional elements. Protein-coding regions in viral genomes often contain overlapping RNA structural elements, reading frames, regulatory elements, microRNAs, and packaging signals. Synonymous substitutions in these regions would be selectively disfavored and thus these regions are characterized by excess synonymous constraint. Codon choice can also modulate transcriptional efficiency, translational accuracy, and protein folding. RESULTS: We developed a phylogenetic codon model-based framework, FRESCo, designed to find regions of excess synonymous constraint in short, deep alignments, such as individual viral genes across many sequenced isolates. We demonstrated the high specificity of our approach on simulated data and applied our framework to the protein-coding regions of approximately 30 distinct species of viruses with diverse genome architectures. CONCLUSIONS: FRESCo recovers known multifunctional regions in well-characterized viruses such as hepatitis B virus, poliovirus, and West Nile virus, often at a single-codon resolution, and predicts many novel functional elements overlapping viral genes, including in Lassa and Ebola viruses. In a number of viruses, the synonymously constrained regions that we identified also display conserved, stable predicted RNA structures, including putative novel elements in multiple viral species.


Subject(s)
Evolution, Molecular , Genome, Viral , Open Reading Frames/genetics , Viruses/genetics , Codon/genetics , Conserved Sequence , Ebolavirus/genetics , Hepatitis B virus/genetics , Humans , Lassa virus/genetics , MicroRNAs/genetics , Phylogeny , Poliovirus/genetics , Sequence Alignment , Silent Mutation/genetics , West Nile virus/genetics
10.
Abdom Imaging ; 40(6): 1666-74, 2015 Aug.
Article in English | MEDLINE | ID: mdl-25488345

ABSTRACT

OBJECTIVE: To determine the effectiveness of the CT histogram method to characterize indeterminate adrenal nodules above 10 Hounsfield units (HU) on noncontrast CT. MATERIALS AND METHODS: Retrospective review of clinical CT data from January 2005 through 2008 identified 194 indeterminate adrenal nodules (>10 HU on noncontrast CT) in 175 patients. 20 nodules in 18 patients were excluded due to large standard deviation (SD > 30) of HU values. Of the remaining 174 nodules, 131 were classified as benign lipid-poor nodules based on size stability for ≥1 year (104), in- and opposed-phase MRI (17), adrenal washout CT (3), or biopsy (7). 43 were classified as malignant by size increase over a short time (30), avid FDG uptake on PET/CT (15), or biopsy (5). Histogram analysis was performed by drawing a circular region of interest on all adrenal nodules. Mean attenuation, total number of pixels, number of negative pixels, and percentage of negative pixels were recorded for each nodule. RESULTS: At the threshold value of >10% negative pixels, 59/131 benign nodules were correctly characterized, but 1/43 malignant nodules was falsely characterized as benign (sensitivity 45%, specificity 98%, positive predictive value 98%). With a slightly higher threshold value of >15% negative pixels, there were no false benign judgments. 36 nodules had more than 15% negative pixels, all of which were benign (sensitivity 27%, specificity 100%, positive predictive value 100%). In the subgroup of benign nodules measuring 11-20 HU, 80% and 54% were identified with threshold values of >10% and >15% negative pixels, respectively. CONCLUSION: The CT histogram method with a threshold value of >10% negative pixels can identify many benign adrenal nodules with attenuation values >10 HU on unenhanced CT with extremely high specificity. A threshold of >15% negative pixels can achieve 100% specificity. This method is highly robust provided very "noisy" CT examinations (SD > 30) are eliminated.


Subject(s)
Adrenal Gland Neoplasms/diagnostic imaging , Adrenal Glands/diagnostic imaging , Image Interpretation, Computer-Assisted/methods , Tomography, X-Ray Computed/methods , Diagnosis, Differential , Female , Humans , Male , Middle Aged , Reproducibility of Results , Retrospective Studies , Sensitivity and Specificity
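
The arithmetic behind the CT histogram abstract above can be reproduced in a few lines of Python. The sketch treats a "benign" call as the positive result, takes the nodule counts directly from the abstract, and uses hypothetical function names and a made-up region of interest. It assumes "negative pixels" means pixels with attenuation below 0 HU.

```python
def percent_negative_pixels(hu_values):
    """Percentage of pixels in a region of interest with attenuation below 0 HU."""
    hu_values = list(hu_values)
    if not hu_values:
        raise ValueError("region of interest contains no pixels")
    negative = sum(1 for hu in hu_values if hu < 0)
    return 100.0 * negative / len(hu_values)


def classify_benign(hu_values, threshold_percent=10.0):
    """Call a nodule benign when the share of negative pixels exceeds the threshold."""
    return percent_negative_pixels(hu_values) > threshold_percent


def diagnostic_metrics(true_pos, false_neg, true_neg, false_pos):
    """Sensitivity, specificity, and positive predictive value as fractions."""
    return {
        "sensitivity": true_pos / (true_pos + false_neg),
        "specificity": true_neg / (true_neg + false_pos),
        "ppv": true_pos / (true_pos + false_pos),
    }


# Hypothetical region of interest: 2 of 10 pixels are negative (20%).
roi = [-12, -3, 4, 9, 15, 18, 22, 25, 30, 41]
print(percent_negative_pixels(roi), classify_benign(roi))

# Counts from the abstract: 131 benign and 43 malignant nodules.
at_10 = diagnostic_metrics(true_pos=59, false_neg=131 - 59, true_neg=43 - 1, false_pos=1)
at_15 = diagnostic_metrics(true_pos=36, false_neg=131 - 36, true_neg=43, false_pos=0)
for name, metrics in (("> 10%", at_10), ("> 15%", at_15)):
    print(name, {key: round(100 * value) for key, value in metrics.items()})
```

Rounded to whole percentages, these counts give 45/98/98 at the 10% threshold and 27/100/100 at the 15% threshold, matching the figures reported in the abstract.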
12.
Radiology ; 265(1): 151-7, 2012 Oct.
Article in English | MEDLINE | ID: mdl-22798224

ABSTRACT

PURPOSE: To determine which measurement of donor renal size on computed tomographic (CT) angiograms has the greatest correlation with renal function preoperatively in the donor and postoperatively in the transplant recipient. MATERIALS AND METHODS: Informed consent was waived for this retrospective HIPAA-compliant study approved by the institutional review board. Renal length, total volume, and cortical volume were measured on renal donor CT angiograms in 111 patients. Preoperative serum creatinine values for donors and postoperative creatinine values for recipients at hospital discharge and 6, 12, 24, and 36 months after transplant were collected, and estimated glomerular filtration rate (eGFR) was calculated. Correlation coefficients with 95% confidence intervals (CIs) were obtained for renal measures and donor eGFR and for renal measures adjusted to recipient body habitus and posttransplant creatinine level in the recipient. Thresholds were set for adjusted length and volumes, and the odds ratio (OR) for creatinine level less than 1.5 mg/dL at 36 months was calculated. RESULTS: Renal volumes and length were correlated with donor eGFR (r=0.58 [95% CI: 0.44, 0.69] for cortical volume, 0.56 [95% CI: 0.42, 0.68] for total volume, and 0.43 [95% CI: 0.27, 0.57] for renal length). All three measures, adjusted to recipient body habitus, were correlated with recipient renal function from discharge (r=-0.41 to -0.43) up to 36 months after transplantation (r=-0.33 to -0.41). By using a threshold of 1.5 for cortical volume to recipient weight, 2.25 for total volume to recipient weight, and 0.175 for renal length to recipient weight, the odds of creatinine level greater than 1.5 mg/dL were four times as great for smaller kidney-to-recipient weight ratios, a statistically significant pattern for cortical volume (OR, 4.07; 95% CI: 1.10, 15.09) but not total volume (OR, 4.24; 95% CI: 0.90, 20.01) or renal length (OR, 4.08; 95% CI: 0.48, 34.29). CONCLUSION: Renal length and volumes correlated with recipient renal function up to 36 months after transplant. A low ratio of cortical volume to recipient weight was associated with diminished renal function at 36 months after transplant.


Subject(s)
Angiography/methods , Kidney/diagnostic imaging , Liver Transplantation , Tomography, X-Ray Computed/methods , Adolescent , Adult , Aged , Biomarkers/blood , Confidence Intervals , Creatinine/blood , Female , Glomerular Filtration Rate , Humans , Kidney Function Tests , Male , Middle Aged , Nephrectomy , Organ Size , Radiographic Image Interpretation, Computer-Assisted , Reproducibility of Results , Retrospective Studies
13.
Genome Res ; 22(3): 577-91, 2012 Mar.
Article in English | MEDLINE | ID: mdl-22110045

ABSTRACT

Long noncoding RNAs (lncRNAs) comprise a diverse class of transcripts that structurally resemble mRNAs but do not encode proteins. Recent genome-wide studies in humans and the mouse have annotated lncRNAs expressed in cell lines and adult tissues, but a systematic analysis of lncRNAs expressed during vertebrate embryogenesis has been elusive. To identify lncRNAs with potential functions in vertebrate embryogenesis, we performed a time-series of RNA-seq experiments at eight stages during early zebrafish development. We reconstructed 56,535 high-confidence transcripts in 28,912 loci, recovering the vast majority of expressed RefSeq transcripts while identifying thousands of novel isoforms and expressed loci. We defined a stringent set of 1133 noncoding multi-exonic transcripts expressed during embryogenesis. These include long intergenic ncRNAs (lincRNAs), intronic overlapping lncRNAs, exonic antisense overlapping lncRNAs, and precursors for small RNAs (sRNAs). Zebrafish lncRNAs share many of the characteristics of their mammalian counterparts: relatively short length, low exon number, low expression, and conservation levels comparable to that of introns. Subsets of lncRNAs carry chromatin signatures characteristic of genes with developmental functions. The temporal expression profile of lncRNAs revealed two novel properties: lncRNAs are expressed in narrower time windows than are protein-coding genes and are specifically enriched in early-stage embryos. In addition, several lncRNAs show tissue-specific expression and distinct subcellular localization patterns. Integrative computational analyses associated individual lncRNAs with specific pathways and functions, ranging from cell cycle regulation to morphogenesis. Our study provides the first systematic identification of lncRNAs in a vertebrate embryo and forms the foundation for future genetic, genomic, and evolutionary studies.


Subject(s)
Embryonic Development/genetics , RNA, Untranslated/genetics , Zebrafish/embryology , Zebrafish/genetics , Animals , Chromatin , Cluster Analysis , Computational Biology/methods , Gene Expression , Gene Expression Profiling , Gene Expression Regulation, Developmental , Genomics , Mice , Open Reading Frames , Organ Specificity/genetics , Transcription, Genetic
14.
Nature ; 478(7370): 476-82, 2011 Oct 12.
Article in English | MEDLINE | ID: mdl-21993624

ABSTRACT

The comparison of related genomes has emerged as a powerful lens for genome interpretation. Here we report the sequencing and comparative analysis of 29 eutherian genomes. We confirm that at least 5.5% of the human genome has undergone purifying selection, and locate constrained elements covering ∼4.2% of the genome. We use evolutionary signatures and comparisons with experimental data sets to suggest candidate functions for ∼60% of constrained bases. These elements reveal a small number of new coding exons, candidate stop codon readthrough events and over 10,000 regions of overlapping synonymous constraint within protein-coding exons. We find 220 candidate RNA structural families, and nearly a million elements overlapping potential promoter, enhancer and insulator regions. We report specific amino acid residues that have undergone positive selection, 280,000 non-coding elements exapted from mobile elements and more than 1,000 primate- and human-accelerated elements. Overlap with disease-associated variants indicates that our findings will be relevant for studies of human biology, health and disease.


Subject(s)
Evolution, Molecular , Genome, Human/genetics , Genome/genetics , Mammals/genetics , Animals , Disease , Exons/genetics , Genomics , Health , Humans , Molecular Sequence Annotation , Phylogeny , RNA/classification , RNA/genetics , Selection, Genetic/genetics , Sequence Alignment , Sequence Analysis, DNA
15.
Genome Res ; 21(11): 1916-28, 2011 Nov.
Article in English | MEDLINE | ID: mdl-21994248

ABSTRACT

The degeneracy of the genetic code allows protein-coding DNA and RNA sequences to simultaneously encode additional, overlapping functional elements. A sequence in which both protein-coding and additional overlapping functions have evolved under purifying selection should show increased evolutionary conservation compared to typical protein-coding genes--especially at synonymous sites. In this study, we use genome alignments of 29 placental mammals to systematically locate short regions within human ORFs that show conspicuously low estimated rates of synonymous substitution across these species. The 29-species alignment provides statistical power to locate more than 10,000 such regions with resolution down to nine-codon windows, which are found within more than a quarter of all human protein-coding genes and contain ∼2% of their synonymous sites. We collect numerous lines of evidence that the observed synonymous constraint in these regions reflects selection on overlapping functional elements including splicing regulatory elements, dual-coding genes, RNA secondary structures, microRNA target sites, and developmental enhancers. Our results show that overlapping functional elements are common in mammalian genes, despite the vast genomic landscape.


Subject(s)
Genome , Mammals/genetics , Open Reading Frames/genetics , Selection, Genetic , Animals , Base Composition , Base Sequence , Codon , Codon, Initiator , Computational Biology , Conserved Sequence , Enhancer Elements, Genetic , Exons , Gene Order , Genes, BRCA1 , Homeodomain Proteins/genetics , Humans , MicroRNAs/metabolism , Molecular Sequence Data , Mutation Rate , Nucleic Acid Conformation , Nucleosomes/metabolism , Peptide Chain Initiation, Translational , RNA Splicing , Sequence Alignment , Transcription, Genetic
16.
Genome Res ; 21(12): 2096-113, 2011 Dec.
Article in English | MEDLINE | ID: mdl-21994247

ABSTRACT

While translational stop codon readthrough is often used by viral genomes, it has been observed for only a handful of eukaryotic genes. We previously used comparative genomics evidence to recognize protein-coding regions in 12 species of Drosophila and showed that for 149 genes, the open reading frame following the stop codon has a protein-coding conservation signature, hinting that stop codon readthrough might be common in Drosophila. We return to this observation armed with deep RNA sequence data from the modENCODE project, an improved higher-resolution comparative genomics metric for detecting protein-coding regions, comparative sequence information from additional species, and directed experimental evidence. We report an expanded set of 283 readthrough candidates, including 16 double-readthrough candidates; these were manually curated to rule out alternatives such as A-to-I editing, alternative splicing, dicistronic translation, and selenocysteine incorporation. We report experimental evidence of translation using GFP tagging and mass spectrometry for several readthrough regions. We find that the set of readthrough candidates differs from other genes in length, composition, conservation, stop codon context, and in some cases, conserved stem-loops, providing clues about readthrough regulation and potential mechanisms. Lastly, we expand our studies beyond Drosophila and find evidence of abundant readthrough in several other insect species and one crustacean, and several readthrough candidates in nematode and human, suggesting that functionally important translational stop codon readthrough is significantly more prevalent in Metazoa than previously recognized.


Subject(s)
Codon, Terminator/physiology , Genes, Insect/physiology , Open Reading Frames/physiology , Protein Biosynthesis/physiology , Animals , Drosophila Proteins/biosynthesis , Drosophila Proteins/genetics , Drosophila melanogaster , Humans
17.
Bioinformatics ; 27(13): i275-82, 2011 Jul 01.
Article in English | MEDLINE | ID: mdl-21685081

ABSTRACT

MOTIVATION: As high-throughput transcriptome sequencing provides evidence for novel transcripts in many species, there is a renewed need for accurate methods to classify small genomic regions as protein coding or non-coding. We present PhyloCSF, a novel comparative genomics method that analyzes a multispecies nucleotide sequence alignment to determine whether it is likely to represent a conserved protein-coding region, based on a formal statistical comparison of phylogenetic codon models. RESULTS: We show that PhyloCSF's classification performance in 12-species Drosophila genome alignments exceeds all other methods we compared in a previous study. We anticipate that this method will be widely applicable as the transcriptomes of many additional species, tissues and subcellular compartments are sequenced, particularly in the context of ENCODE and modENCODE, and as interest grows in long non-coding RNAs, often initially recognized by their lack of protein coding potential rather than conserved RNA secondary structures. AVAILABILITY AND IMPLEMENTATION: The Objective Caml source code and executables for GNU/Linux and Mac OS X are freely available at http://compbio.mit.edu/PhyloCSF CONTACT: mlin@mit.edu; manoli@mit.edu.


Subject(s)
Drosophila melanogaster/genetics , Genomics/methods , Open Reading Frames , Sequence Alignment/methods , Animals , Base Sequence , Drosophila/classification , Drosophila/genetics , Gene Expression Profiling , Mammals/genetics , Schizosaccharomyces/genetics
18.
Nat Genet ; 43(7): 621-9, 2011 Jun 05.
Article in English | MEDLINE | ID: mdl-21642992

ABSTRACT

Transcription of long noncoding RNAs (lncRNAs) within gene regulatory elements can modulate gene activity in response to external stimuli, but the scope and functions of such activity are not known. Here we use an ultrahigh-density array that tiles the promoters of 56 cell-cycle genes to interrogate 108 samples representing diverse perturbations. We identify 216 transcribed regions that encode putative lncRNAs, many with RT-PCR-validated periodic expression during the cell cycle, show altered expression in human cancers and are regulated in expression by specific oncogenic stimuli, stem cell differentiation or DNA damage. DNA damage induces five lncRNAs from the CDKN1A promoter, and one such lncRNA, named PANDA, is induced in a p53-dependent manner. PANDA interacts with the transcription factor NF-YA to limit expression of pro-apoptotic genes; PANDA depletion markedly sensitized human fibroblasts to apoptosis by doxorubicin. These findings suggest potentially widespread roles for promoter lncRNAs in cell-growth control.


Subject(s)
cdc Genes/physiology , Neoplasms/genetics , Genetic Promoter Regions/genetics , Untranslated RNA/genetics , Genetic Transcription/genetics , Apoptosis , Biomarkers/metabolism , Cell Cycle/physiology , Cell Differentiation , Chromatin Immunoprecipitation , DNA Damage , Gene Expression Profiling , Humans , Immunoprecipitation , Molecular Sequence Data , Neoplasms/pathology , Oligonucleotide Array Sequence Analysis , Messenger RNA/genetics , Reverse Transcriptase Polymerase Chain Reaction , Transcriptional Activation
19.
Science ; 332(6032): 930-6, 2011 May 20.
Article in English | MEDLINE | ID: mdl-21511999

ABSTRACT

The fission yeast clade (comprising Schizosaccharomyces pombe, S. octosporus, S. cryophilus, and S. japonicus) occupies the basal branch of Ascomycete fungi and is an important model of eukaryote biology. A comparative annotation of these genomes identified a near extinction of transposons and the associated innovation of transposon-free centromeres. Expression analysis established that meiotic genes are subject to antisense transcription during vegetative growth, which suggests a mechanism for their tight regulation. In addition, trans-acting regulators control new genes within the context of expanded functional modules for meiosis and stress response. Differences in gene content and regulation also explain why, unlike the budding yeast of Saccharomycotina, fission yeasts cannot use ethanol as a primary carbon source. These analyses elucidate the genome structure and gene regulation of fission yeast and provide tools for investigation across the Schizosaccharomyces clade.


Subject(s)
Fungal Genome , Schizosaccharomyces/genetics , Centromere/genetics , Centromere/physiology , Centromere/ultrastructure , DNA Transposable Elements , Molecular Evolution , Gene Expression Profiling , Fungal Gene Expression Regulation , Fungal Mating Type Genes , Genomics , Glucose/metabolism , Meiosis , Molecular Sequence Annotation , Molecular Sequence Data , Phylogeny , Antisense RNA/genetics , Fungal RNA/genetics , Small Interfering RNA/genetics , Untranslated RNA/genetics , Transcriptional Regulatory Elements , Schizosaccharomyces/growth & development , Schizosaccharomyces/metabolism , Schizosaccharomyces pombe Proteins/genetics , Schizosaccharomyces pombe Proteins/metabolism , DNA Sequence Analysis , Species Specificity , Transcription Factors/genetics , Transcription Factors/metabolism , Genetic Transcription
20.
PLoS One ; 6(2): e17034, 2011 Feb 14.
Article in English | MEDLINE | ID: mdl-21340033

ABSTRACT

The recent release of twenty-two new genome sequences has dramatically increased the data available for mammalian comparative genomics, but twenty of these new sequences are currently limited to ∼2× coverage. Here we examine the extent of sequencing error in these 2× assemblies, and its potential impact in downstream analyses. By comparing 2× assemblies with high-quality sequences from the ENCODE regions, we estimate the rate of sequencing error to be 1-4 errors per kilobase. While this error rate is fairly modest, sequencing error can still have surprising effects. For example, an apparent lineage-specific insertion in a coding region is more likely to reflect sequencing error than a true biological event, and the length distribution of coding indels is strongly distorted by error. We find that most errors are contributed by a small fraction of bases with low quality scores, in particular, by the ends of reads in regions of single-read coverage in the assembly. We explore several approaches for automatic sequencing error mitigation (SEM), making use of the localized nature of sequencing error, the fact that it is well predicted by quality scores, and information about errors that comes from comparisons across species. Our automatic methods for error mitigation cannot replace the need for additional sequencing, but they do allow substantial fractions of errors to be masked or eliminated at the cost of modest amounts of over-correction, and they can reduce the impact of error in downstream phylogenomic analyses. Our error-mitigated alignments are available for download.
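
The idea that error is "well predicted by quality scores" and can be masked can be sketched as follows. This is not the authors' SEM procedure: it assumes Phred-scaled quality scores (error probability 10^(-Q/10)), and the function names and the cutoff of 20 are hypothetical values chosen only for illustration.

```python
def phred_to_error_prob(q):
    """Convert a Phred-scaled quality score to an error probability."""
    return 10 ** (-q / 10)


def expected_errors_per_kb(quals):
    """Estimate errors per kilobase implied by a list of quality scores."""
    if not quals:
        return 0.0
    return 1000 * sum(phred_to_error_prob(q) for q in quals) / len(quals)


def mask_low_quality(seq, quals, min_q=20):
    """Replace bases whose quality is below min_q with 'N'.

    min_q=20 is an illustrative assumption, not a value from the abstract.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lists must have equal length")
    return "".join(base if q >= min_q else "N" for base, q in zip(seq, quals))


# Toy usage: two of the four bases fall below the cutoff and are masked.
masked = mask_low_quality("ACGT", [40, 10, 30, 5])
rate = expected_errors_per_kb([30, 30])
```

Masking trades a small amount of over-correction (good bases hidden as N) for fewer spurious differences downstream, which matches the trade-off the abstract describes.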


Subject(s)
Nucleic Acid Databases/standards , Molecular Sequence Annotation/standards , Research Design , DNA Sequence Analysis/standards , Animals , Chromosome Mapping/methods , Genome/genetics , Genomics/methods , Humans , Mammals/genetics , Molecular Sequence Annotation/methods , DNA Sequence Analysis/methods