Results 1 - 11 of 11
1.
Methods Mol Biol ; 2104: 245-263, 2020.
Article in English | MEDLINE | ID: mdl-31953822

ABSTRACT

With the increasing importance of big data in biomedicine, skills in data science are a foundation for the individual career development and for the progress of science. This chapter is a practical guide to working with high-throughput biomedical data. It covers how to understand and set up the computing environment, to start a research project with proper and effective data management, and to perform common bioinformatics tasks such as data wrangling, quality control, statistical analysis, and visualization, with examples on metabolomics data. Concepts and tools related to coding and scripting are discussed. Version control, knitr and Jupyter notebooks are important to project management, collaboration, and research reproducibility. Overall, this chapter describes a core set of skills to work in bioinformatics, and can serve as a reference text at the level of a graduate course and interfacing with data science.


Subject(s)
Computational Biology/methods , Data Science , Metabolomics , Software , Cloud Computing , Computational Biology/standards , Data Management , Data Science/methods , Data Science/standards , Database Management Systems , Databases, Factual , Humans , Metabolomics/standards , Metabolomics/statistics & numerical data
2.
Methods Mol Biol ; 2104: 265-311, 2020.
Article in English | MEDLINE | ID: mdl-31953823

ABSTRACT

The daily work in data science involves a set of essential tools: the programming languages Python and R, the version control tool Git and the virtualization tool Docker. Proficiency in at least one programming language is required for data science. R is tied to a computing environment that focuses on statistics, in which many new algorithms in genomics and biomedicine are first published. Python has a root in system administration, and is a superb language for general programming. Version control is critical to managing complex projects, even if software development is not involved. Docker container is becoming a key tool for deployment, portability, and reproducibility. This chapter provides a self-contained practical guide of these topics so that readers can use it as a reference and to plan their training.


Subject(s)
Computational Biology/methods , Data Science , Software , Data Science/methods , Database Management Systems , Programming Languages , User-Computer Interface , Web Browser
3.
Genome Res ; 27(11): 1916-1929, 2017 Nov.
Article in English | MEDLINE | ID: mdl-28855259

ABSTRACT

Mobile element insertions (MEIs) represent ∼25% of all structural variants in human genomes. Moreover, when they disrupt genes, MEIs can influence human traits and diseases. Therefore, MEIs should be fully discovered along with other forms of genetic variation in whole genome sequencing (WGS) projects involving population genetics, human diseases, and clinical genomics. Here, we describe the Mobile Element Locator Tool (MELT), which was developed as part of the 1000 Genomes Project to perform MEI discovery on a population scale. Using both Illumina WGS data and simulations, we demonstrate that MELT outperforms existing MEI discovery tools in terms of speed, scalability, specificity, and sensitivity, while also detecting a broader spectrum of MEI-associated features. Several run modes were developed to perform MEI discovery on local and cloud systems. In addition to using MELT to discover MEIs in modern humans as part of the 1000 Genomes Project, we also used it to discover MEIs in chimpanzees and ancient (Neanderthal and Denisovan) hominids. We detected diverse patterns of MEI stratification across these populations that likely were caused by (1) diverse rates of MEI production from source elements, (2) diverse patterns of MEI inheritance, and (3) the introgression of ancient MEIs into modern human genomes. Overall, our study provides the most comprehensive map of MEIs to date spanning chimpanzees, ancient hominids, and modern humans and reveals new aspects of MEI biology in these lineages. We also demonstrate that MELT is a robust platform for MEI discovery and analysis in a variety of experimental settings.


Subject(s)
Computational Biology/methods , DNA Transposable Elements , Neanderthals/genetics , Pan troglodytes/genetics , Animals , Databases, Genetic , Evolution, Molecular , High-Throughput Nucleotide Sequencing/methods , Humans , Polymorphism, Single Nucleotide , Software , Whole Genome Sequencing/methods
4.
BMC Bioinformatics ; 14: 15, 2013 Jan 16.
Article in English | MEDLINE | ID: mdl-23323971

ABSTRACT

BACKGROUND: Detection of low abundance metabolites is important for de novo mapping of metabolic pathways related to diet, microbiome or environmental exposures. Multiple algorithms are available to extract m/z features from liquid chromatography-mass spectral data in a conservative manner, which tends to preclude detection of low abundance chemicals and chemicals found in small subsets of samples. The present study provides software to enhance such algorithms for feature detection, quality assessment, and annotation. RESULTS: xMSanalyzer is a set of utilities for automated processing of metabolomics data. The utilities can be classified into four main modules to: 1) improve feature detection for replicate analyses by systematic re-extraction with multiple parameter settings and data merger to optimize the balance between sensitivity and reliability, 2) evaluate sample quality and feature consistency, 3) detect feature overlap between datasets, and 4) characterize high-resolution m/z matches to small molecule metabolites and biological pathways using multiple chemical databases. The package was tested with plasma samples and shown to more than double the number of features extracted while improving quantitative reliability of detection. MS/MS analysis of a random subset of peaks that were exclusively detected using xMSanalyzer confirmed that the optimization scheme improves detection of real metabolites. CONCLUSIONS: xMSanalyzer is a package of utilities for data extraction, quality control assessment, detection of overlapping and unique metabolites in multiple datasets, and batch annotation of metabolites. The program was designed to integrate with existing packages such as apLCMS and XCMS, but the framework can also be used to enhance data extraction for other LC/MS data software.


Subject(s)
Chromatography, Liquid/methods , Mass Spectrometry/methods , Metabolomics/methods , Software , Algorithms , Tandem Mass Spectrometry
5.
Genome Res ; 21(6): 830-9, 2011 Jun.
Article in English | MEDLINE | ID: mdl-21460062

ABSTRACT

Human genetic variation is expected to play a central role in personalized medicine. Yet only a fraction of the natural genetic variation that is harbored by humans has been discovered to date. Here we report almost 2 million small insertions and deletions (INDELs) that range from 1 bp to 10,000 bp in length in the genomes of 79 diverse humans. These variants include 819,363 small INDELs that map to human genes. Small INDELs frequently were found in the coding exons of these genes, and several lines of evidence indicate that such variation is a major determinant of human biological diversity. Microarray-based genotyping experiments revealed several interesting observations regarding the population genetics of small INDEL variation. For example, we found that many of our INDELs had high levels of linkage disequilibrium (LD) with both HapMap SNPs and with high-scoring SNPs from genome-wide association studies. Overall, our study indicates that small INDEL variation is likely to be a key factor underlying inherited traits and diseases in humans.


Subject(s)
Genetic Variation , Genome, Human/genetics , INDEL Mutation/genetics , Genomics/methods , Genotype , Humans , Microarray Analysis , Precision Medicine/methods
6.
Hum Mol Genet ; 19(R2): R131-6, 2010 Oct 15.
Article in English | MEDLINE | ID: mdl-20858594

ABSTRACT

In this review, we focus on progress that has been made with detecting small insertions and deletions (INDELs) in human genomes. Over the past decade, several million small INDELs have been discovered in human populations and personal genomes. The amount of genetic variation that is caused by these small INDELs is substantial. The number of INDELs in human genomes is second only to the number of single nucleotide polymorphisms (SNPs), and, in terms of base pairs of variation, INDELs cause similar levels of variation as SNPs. Many of these INDELs map to functionally important sites within human genes, and thus, are likely to influence human traits and diseases. Therefore, small INDEL variation will play a prominent role in personalized medicine.


Subject(s)
Genetic Variation/genetics , Genome, Human/genetics , Mutagenesis, Insertional/genetics , Sequence Deletion/genetics , Humans , Polymorphism, Single Nucleotide/genetics
7.
Cell ; 141(7): 1253-61, 2010 Jun 25.
Article in English | MEDLINE | ID: mdl-20603005

ABSTRACT

Two abundant classes of mobile elements, namely Alu and L1 elements, continue to generate new retrotransposon insertions in human genomes. Estimates suggest that these elements have generated millions of new germline insertions in individual human genomes worldwide. Unfortunately, current technologies are not capable of detecting most of these young insertions, and the true extent of germline mutagenesis by endogenous human retrotransposons has been difficult to examine. Here, we describe technologies for detecting these young retrotransposon insertions and demonstrate that such insertions indeed are abundant in human populations. We also found that new somatic L1 insertions occur at high frequencies in human lung cancer genomes. Genome-wide analysis suggests that altered DNA methylation may be responsible for the high levels of L1 mobilization observed in these tumors. Our data indicate that transposon-mediated mutagenesis is extensive in human genomes and is likely to have a major impact on human biology and diseases.


Subject(s)
Alu Elements , Genome, Human , Long Interspersed Nucleotide Elements , Mutagenesis , Sequence Analysis, DNA/methods , Brain Neoplasms/genetics , Humans , Lung Neoplasms/genetics , Methylation
8.
Genome Res ; 16(9): 1182-90, 2006 Sep.
Article in English | MEDLINE | ID: mdl-16902084

ABSTRACT

Although many studies have been conducted to identify single nucleotide polymorphisms (SNPs) in humans, few studies have been conducted to identify alternative forms of natural genetic variation, such as insertion and deletion (INDEL) polymorphisms. In this report, we describe an initial map of human INDEL variation that contains 415,436 unique INDEL polymorphisms. These INDELs were identified with a computational approach using DNA re-sequencing traces that originally were generated for SNP discovery projects. They range from 1 bp to 9989 bp in length and are split almost equally between insertions and deletions, relative to the chimpanzee genome sequence. Five major classes of INDELs were identified, including (1) insertions and deletions of single-base pairs, (2) monomeric base pair expansions, (3) multi-base pair expansions of 2-15 bp repeat units, (4) transposon insertions, and (5) INDELs containing random DNA sequences. Our INDELs are distributed throughout the human genome with an average density of one INDEL per 7.2 kb of DNA. Variation hotspots were identified with up to 48-fold regional increases in INDEL and/or SNP variation compared with the chromosomal averages for the same chromosomes. Over 148,000 INDELs (35.7%) were identified within known genes, and 5542 of these INDELs were located in the promoters and exons of genes, where gene function would be expected to be influenced the greatest. All INDELs in this study have been deposited into dbSNP and have been integrated into maps of human genetic variation that are available to the research community.


Subject(s)
Computational Biology/methods , Genome, Human , Polymorphism, Genetic , Sequence Deletion , Animals , Humans , Pan troglodytes/genetics , Polymorphism, Single Nucleotide
9.
Am J Hum Genet ; 78(4): 671-9, 2006 Apr.
Article in English | MEDLINE | ID: mdl-16532396

ABSTRACT

Transposable genetic elements are abundant in the genomes of most organisms, including humans. These endogenous mutagens can alter genes, promote genomic rearrangements, and may help to drive the speciation of organisms. In this study, we identified almost 11,000 transposon copies that are differentially present in the human and chimpanzee genomes. Most of these transposon copies were mobilized after the existence of a common ancestor of humans and chimpanzees, approximately 6 million years ago. Alu, L1, and SVA insertions accounted for >95% of the insertions in both species. Our data indicate that humans have supported higher levels of transposition than have chimpanzees during the past several million years and have amplified different transposon subfamilies. In both species, approximately 34% of the insertions were located within known genes. These insertions represent a form of species-specific genetic variation that may have contributed to the differential evolution of humans and chimpanzees. In addition to providing an initial overview of recently mobilized elements, our collections will be useful for assessing the impact of these insertions on their hosts and for studying the transposition mechanisms of these elements.


Subject(s)
DNA Transposable Elements , Genome, Human , Genome , Pan troglodytes/genetics , Animals , Base Sequence , DNA Primers , Humans , Polymerase Chain Reaction , Repetitive Sequences, Nucleic Acid
10.
Genetics ; 168(2): 933-51, 2004 Oct.
Article in English | MEDLINE | ID: mdl-15514065

ABSTRACT

Transposons and transposon-like repetitive elements collectively occupy 44% of the human genome sequence. In an effort to measure the levels of genetic variation that are caused by human transposons, we have developed a new method to broadly detect transposon insertion polymorphisms of all kinds in humans. We began by identifying 606,093 insertion and deletion (indel) polymorphisms in the genomes of diverse humans. We then screened these polymorphisms to detect indels that were caused by de novo transposon insertions. Our method was highly efficient and led to the identification of 605 nonredundant transposon insertion polymorphisms in 36 diverse humans. We estimate that this represents 25-35% of approximately 2075 common transposon polymorphisms in human populations. Because we identified all transposon insertion polymorphisms with a single method, we could evaluate the relative levels of variation that were caused by each transposon class. The average human in our study was estimated to harbor 1283 Alu insertion polymorphisms, 180 L1 polymorphisms, 56 SVA polymorphisms, and 17 polymorphisms related to other forms of mobilized DNA. Overall, our study provides significant steps toward (i) measuring the genetic variation that is caused by transposon insertions in humans and (ii) identifying the transposon copies that produce this variation.


Subject(s)
DNA Transposable Elements , Genetic Variation , Polymorphism, Genetic/genetics , Genome, Human , Humans , Sequence Analysis, DNA
11.
Nucleic Acids Res ; 31(16): 4910-6, 2003 Aug 15.
Article in English | MEDLINE | ID: mdl-12907734

ABSTRACT

An international effort is underway to generate a comprehensive haplotype map (HapMap) of the human genome represented by an estimated 300,000 to 1 million 'tag' single nucleotide polymorphisms (SNPs). Our analysis indicates that the current human SNP map is not sufficiently dense to support the HapMap project. For example, 24.6% of the genome currently lacks SNPs at the minimal density and spacing that would be required to construct even a conservative tag SNP map containing 300,000 SNPs. In an effort to improve the human SNP map, we identified 140,696 additional SNP candidates using a new bioinformatics pipeline. Over 51,000 of these SNPs mapped to the largest gaps in the human SNP map, leading to significant improvements in these regions. Our SNPs will be immediately useful for the HapMap project, and will allow for the inclusion of many additional genomic intervals in the final HapMap. Nevertheless, our results also indicate that additional SNP discovery projects will be required both to define the haplotype architecture of the human genome and to construct comprehensive tag SNP maps that will be useful for genetic linkage studies in humans.


Subject(s)
Chromosome Mapping/methods , Genome, Human , Polymorphism, Single Nucleotide/genetics , Base Sequence , DNA/chemistry , DNA/genetics , Databases, Nucleic Acid , Haplotypes , Humans , Polymerase Chain Reaction , Sequence Analysis, DNA