Search | VHL Regional Portal

Fine-scale subpopulation detection via an SNP-based unsupervised method: A case study on the 1000 Genomes Project resources.

Chaichoompu, Kridsadakorn; Wilantho, Alisa; Wangkumhang, Pongsakorn; Tongsima, Sissades; Cavadas, Bruno; Pereira, Luísa; Van Steen, Kristel.

Pac Symp Biocomput ; 28: 245-256, 2023.

Article in English | MEDLINE | ID: mdl-36540981

ABSTRACT

SNP-based information is used in several existing clustering methods to detect shared genetic ancestry or to identify population substructure. Here, we present a methodology called IPCAPS for unsupervised population analysis using iterative pruning. Our method, which can capture fine-level structure in populations, supports ordinal data and thus can readily be applied to SNP data. Although haplotypes may be more informative than SNPs, especially in fine-level substructure detection contexts, the haplotype inference process often remains too computationally intensive. In this work, we investigate the scale of the structure we can detect in populations without knowledge about haplotypes; our simulated data do not assume the availability of haplotype information while comparing our method to existing tools for detecting fine-level population substructures. We demonstrate experimentally that IPCAPS can achieve high accuracy and can outperform existing tools in several simulated scenarios. The fine-level structure detected by IPCAPS in an application to the 1000 Genomes Project data underlines the heterogeneity of its subjects.

Subject(s)

Computational Biology; Polymorphism, Single Nucleotide; Humans; Haplotypes; Cluster Analysis

EpiScanpy: integrated single-cell epigenomic analysis.

Danese, Anna; Richter, Maria L; Chaichoompu, Kridsadakorn; Fischer, David S; Theis, Fabian J; Colomé-Tatché, Maria.

Nat Commun ; 12(1): 5228, 2021 Sep 01.

Article in English | MEDLINE | ID: mdl-34471111

ABSTRACT

EpiScanpy is a toolkit for the analysis of single-cell epigenomic data, namely single-cell DNA methylation and single-cell ATAC-seq data. To address the modality-specific challenges of epigenomic data, epiScanpy quantifies the epigenome using multiple feature space constructions and builds a nearest neighbour graph using epigenomic distance between cells. EpiScanpy makes the many existing scRNA-seq workflows from scanpy available to large-scale single-cell data from other -omics modalities, including common methods for clustering, dimension reduction, cell type identification and trajectory learning, as well as an atlas integration tool for scATAC-seq datasets. The toolkit also features numerous useful downstream functions, such as differential methylation and differential openness calling, mapping epigenomic features of interest to their nearest gene, or constructing gene activity matrices using chromatin openness. We successfully benchmark epiScanpy against other scATAC-seq analysis tools and show that it outperforms them at discriminating cell types.
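To make the nearest neighbour graph step concrete, the following is a minimal sketch in plain NumPy. It does not use the epiScanpy API. The abstract does not say which epigenomic distance is used, so the Jaccard distance on a binary cell-by-feature openness matrix is an assumption made here for illustration, and the function names and toy matrix are hypothetical.

```python
import numpy as np

def jaccard_distances(binary_matrix):
    """Pairwise Jaccard distance between rows (cells) of a 0/1 matrix."""
    b = np.asarray(binary_matrix, dtype=bool).astype(int)
    intersection = b @ b.T                      # shared open features
    counts = b.sum(axis=1)
    union = counts[:, None] + counts[None, :] - intersection
    with np.errstate(divide="ignore", invalid="ignore"):
        # Two cells with no open features at all are treated as identical.
        similarity = np.where(union > 0, intersection / union, 1.0)
    return 1.0 - similarity

def knn_graph(distances, k):
    """Indices of the k nearest neighbours of each cell (self excluded)."""
    d = np.array(distances, dtype=float)
    np.fill_diagonal(d, np.inf)                 # a cell is not its own neighbour
    return np.argsort(d, axis=1)[:, :k]

# Hypothetical toy data: 6 cells x 6 features, two loosely defined cell types.
cells = np.array([
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 1, 1, 1, 1],
])
neighbours = knn_graph(jaccard_distances(cells), k=2)
print(neighbours)
```

In this toy matrix the two nearest neighbours of each cell belong to the same block of three cells, which is the property that graph-based clustering would then exploit.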

Subject(s)

Epigenomics/methods; Single-Cell Analysis/methods; Chromatin; Chromatin Immunoprecipitation Sequencing; Cluster Analysis; DNA Methylation; Humans; Sequence Analysis, RNA

A different view on fine-scale population structure in Western African populations.

Chaichoompu, Kridsadakorn; Abegaz, Fentaw; Cavadas, Bruno; Fernandes, Verónica; Müller-Myhsok, Bertram; Pereira, Luísa; Van Steen, Kristel.

Hum Genet ; 139(1): 45-59, 2020 Jan.

Article in English | MEDLINE | ID: mdl-31630246

ABSTRACT

Due to their long evolutionary history, Africans exhibit more genetic variation than any other population in the world. Their genetic diversity further lends itself to subdivisions of Africans into groups of individuals with a genetic similarity of varying degrees of granularity. It remains challenging to detect fine-scale structure in a computationally efficient and meaningful way. In this paper, we present a proof-of-concept of a novel fine-scale population structure detection tool with Western African samples. These samples consist of 1396 individuals from 25 ethnic groups (two groups are African American descendants). The strategy is based on a recently developed tool called IPCAPS. IPCAPS, or Iterative Pruning to CApture Population Structure, is a genetic divisive clustering strategy that enhances iterative pruning PCA, is robust to outliers and does not require a priori computation of haplotypes. Our strategy identified 12 groups in total, and 6 groups were revealed as fine-scale structure detected in the samples from Cameroon, Gambia, Mali, Southwest USA, and Barbados. Our findings help to explain evolutionary processes in the analyzed West African samples and raise awareness of fine-scale structure resolution when conducting genome-wide association and interaction studies.

Subject(s)

Black People/genetics; Ethnicity/genetics; Genetic Variation; Genetics, Population; Genome-Wide Association Study; Haplotypes; Software; Africa, Western/ethnology; Humans

IPCAPS: an R package for iterative pruning to capture population structure.

Chaichoompu, Kridsadakorn; Abegaz, Fentaw; Tongsima, Sissades; Shaw, Philip James; Sakuntabhai, Anavaj; Pereira, Luísa; Van Steen, Kristel.

Source Code Biol Med ; 14: 2, 2019.

Article in English | MEDLINE | ID: mdl-30936940

ABSTRACT

BACKGROUND: Resolving population genetic structure is challenging, especially when dealing with closely related or geographically confined populations. Although Principal Component Analysis (PCA)-based methods and genomic variation with single nucleotide polymorphisms (SNPs) are widely used to describe shared genetic ancestry, improvements can be made, especially when fine-scale population structure is the target. RESULTS: This work presents an R package called IPCAPS, which uses SNP information for resolving possibly fine-scale population structure. The IPCAPS routines are built on the iterative pruning Principal Component Analysis (ipPCA) framework that systematically assigns individuals to genetically similar subgroups. In each iteration, our tool is able to detect and eliminate outliers, thereby avoiding severe misclassification errors. CONCLUSIONS: IPCAPS supports different measurement scales for variables used to identify substructure. Hence, panels of gene expression and methylation data can be accommodated as well. The tool can also be applied in patient sub-phenotyping contexts. IPCAPS is developed in R and is freely available from http://bio3.giga.ulg.ac.be/ipcaps.

Principals about principal components in statistical genetics.

Abegaz, Fentaw; Chaichoompu, Kridsadakorn; Génin, Emmanuelle; Fardo, David W; König, Inke R; Mahachie John, Jestinah M; Van Steen, Kristel.

Brief Bioinform ; 20(6): 2200-2216, 2019 Nov 27.

Article in English | MEDLINE | ID: mdl-30219892

ABSTRACT

Principal components (PCs) are widely used in statistics and refer to a relatively small number of uncorrelated variables derived from an initial pool of variables, while explaining as much of the total variance as possible. Principal component analysis (PCA) is also a popular technique in statistical genetics. To achieve optimal results, a thorough understanding of the different implementations of PCA and of their impact on study results, compared to alternative approaches, is required. In this review, we focus on the possibilities, limitations and role of PCs in ancestry prediction, genome-wide association studies, rare variants analyses, imputation strategies, meta-analysis and epistasis detection. We also describe several variations of classic PCA that deserve increased attention in statistical genetics applications.
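The definition of PCs as uncorrelated variables ordered by explained variance can be illustrated with a short sketch. It computes classic PCA through a singular value decomposition of a column-centered genotype matrix (individuals by SNPs, coded 0/1/2). The simulated allele frequencies, sample sizes and the function name are assumptions chosen for illustration and are not taken from the review.

```python
import numpy as np

def genotype_pcs(genotypes, n_components=2):
    """PC scores and explained-variance ratios for an individuals x SNPs matrix."""
    g = np.asarray(genotypes, dtype=float)
    centered = g - g.mean(axis=0)               # center each SNP column
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s ** 2 / np.sum(s ** 2)         # share of total variance per PC
    return scores, explained[:n_components]

# Hypothetical toy data: two groups of 30 individuals, 200 SNPs each.
rng = np.random.default_rng(0)
freq_a = rng.uniform(0.1, 0.9, size=200)
freq_b = rng.uniform(0.1, 0.9, size=200)
geno = np.vstack([
    rng.binomial(2, freq_a, size=(30, 200)),
    rng.binomial(2, freq_b, size=(30, 200)),
])
scores, explained = genotype_pcs(geno, n_components=2)
print(explained)
```

Because the score columns come from orthogonal singular vectors of centered data, they have zero mean and zero sample covariance with each other, and the explained-variance ratios are non-increasing.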

Subject(s)

Models, Statistical; Principal Component Analysis; Animals; Humans

Insight into the peopling of Mainland Southeast Asia from Thai population genetic structure.

Wangkumhang, Pongsakorn; Shaw, Philip James; Chaichoompu, Kridsadakorn; Ngamphiw, Chumpol; Assawamakin, Anunchai; Nuinoon, Manit; Sripichai, Orapan; Svasti, Saovaros; Fucharoen, Suthat; Praphanphoj, Verayuth; Tongsima, Sissades.

PLoS One ; 8(11): e79522, 2013.

Article in English | MEDLINE | ID: mdl-24223962

ABSTRACT

There is considerable ethno-linguistic and genetic variation among human populations in Asia, although tracing the origins of this diversity is complicated by migration events. Thailand is at the center of Mainland Southeast Asia (MSEA), a region within Asia that has not been extensively studied. Genetic substructure may exist in the Thai population, since waves of migration from southern China throughout its recent history may have contributed to substantial gene flow. Autosomal SNP data were collated for 438,503 markers from 992 Thai individuals. Using the available self-reported regional origin, four Thai subpopulations genetically distinct from each other and from other Asian populations were resolved by Neighbor-Joining analysis using a 41,569 marker subset. Using an independent Principal Components-based unsupervised clustering approach, four major MSEA subpopulations were resolved in which regional bias was apparent. A major ancestry component was common to these MSEA subpopulations and distinguishes them from other Asian subpopulations. On the other hand, these MSEA subpopulations were admixed with other ancestries, in particular one shared with Chinese. Subpopulation clustering using only Thai individuals and the complete marker set resolved four subpopulations, which are distributed differently across Thailand. A Sino-Thai subpopulation was concentrated in the Central region of Thailand, although this constituted a minority in an otherwise diverse region. Among the most highly differentiated markers which distinguish the Thai subpopulations, several map to regions known to affect phenotypic traits such as skin pigmentation and susceptibility to common diseases. The subpopulation patterns elucidated have important implications for evolutionary and medical genetics. The subpopulation structure within Thailand may reflect the contributions of different migrants throughout the history of MSEA. 
The information will also be important for genetic association studies to account for population-structure confounding effects.

Subject(s)

Asian People/genetics; Asian People/ethnology; Genetics, Population; Genotype; Humans; Phenotype; Polymorphism, Single Nucleotide; Thailand/ethnology

Iterative pruning PCA improves resolution of highly structured populations.

Intarapanich, Apichart; Shaw, Philip J; Assawamakin, Anunchai; Wangkumhang, Pongsakorn; Ngamphiw, Chumpol; Chaichoompu, Kridsadakorn; Piriyapongsa, Jittima; Tongsima, Sissades.

BMC Bioinformatics ; 10: 382, 2009 Nov 23.

Article in English | MEDLINE | ID: mdl-19930644

ABSTRACT

BACKGROUND: Non-random patterns of genetic variation exist among individuals in a population owing to a variety of evolutionary factors. Therefore, populations are structured into genetically distinct subpopulations. As genotypic datasets become ever larger, it is increasingly difficult to correctly estimate the number of subpopulations and assign individuals to them. The computationally efficient non-parametric, chiefly Principal Components Analysis (PCA)-based methods are thus becoming increasingly relied upon for population structure analysis. Current PCA-based methods can accurately detect structure; however, the accuracy in resolving subpopulations and assigning individuals to them is wanting. When subpopulations are closely related to one another, they overlap in PCA space and appear as a conglomerate. This problem is exacerbated when some subpopulations in the dataset are genetically far removed from others. We propose a novel PCA-based framework which addresses this shortcoming. RESULTS: A novel population structure analysis algorithm called iterative pruning PCA (ipPCA) was developed which assigns individuals to subpopulations and infers the total number of subpopulations present. Genotypic data from simulated and real population datasets with different degrees of structure were analyzed. For datasets with simple structures, the subpopulation assignments of individuals made by ipPCA were largely consistent with the STRUCTURE, BAPS and AWclust algorithms. On the other hand, highly structured populations containing many closely related subpopulations could be accurately resolved only by ipPCA, and not by other methods. CONCLUSION: The algorithm is computationally efficient and not constrained by the dataset complexity. 
This systematic subpopulation assignment approach removes the need for prior population labels, which could be advantageous when cryptic stratification is encountered in datasets containing individuals otherwise assumed to belong to a homogeneous population.
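The idea of iterative pruning can be sketched as a recursive bisection in which PCA is recomputed inside every subgroup. The sketch below is not the published ipPCA algorithm: the split rule (one-dimensional 2-means on the first PC) and the stopping rule (a minimum group size plus a hypothetical separation threshold) are simplifying assumptions that stand in for the clustering and termination steps the abstract does not describe.

```python
import numpy as np

def split_on_pc1(genotypes):
    """Bisect individuals by 1-D 2-means on their first principal component."""
    centered = genotypes - genotypes.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    pc1 = u[:, 0] * s[0]
    labels = pc1 > (pc1.min() + pc1.max()) / 2
    for _ in range(50):                          # Lloyd iterations in one dimension
        if labels.all() or not labels.any():
            break
        midpoint = (pc1[labels].mean() + pc1[~labels].mean()) / 2
        updated = pc1 > midpoint
        if np.array_equal(updated, labels):
            break
        labels = updated
    return pc1, labels

def separation(pc1, labels):
    """Distance between cluster means in units of pooled within-cluster spread."""
    a, b = pc1[labels], pc1[~labels]
    if len(a) < 2 or len(b) < 2:
        return 0.0
    pooled = np.sqrt((a.var() + b.var()) / 2)
    return abs(a.mean() - b.mean()) / (pooled + 1e-12)

def iterative_pruning(genotypes, indices=None, min_size=5, threshold=5.0):
    """Return a list of index arrays, one per detected subgroup."""
    g = np.asarray(genotypes, dtype=float)
    if indices is None:
        indices = np.arange(g.shape[0])
    if len(indices) < 2 * min_size:
        return [indices]
    pc1, labels = split_on_pc1(g[indices])       # PCA is redone on this subgroup only
    too_small = min(labels.sum(), (~labels).sum()) < min_size
    if too_small or separation(pc1, labels) < threshold:
        return [indices]                         # assumed stopping rule: no clear split
    return (iterative_pruning(g, indices[labels], min_size, threshold)
            + iterative_pruning(g, indices[~labels], min_size, threshold))

# Hypothetical toy data: two populations with independent allele frequencies.
rng = np.random.default_rng(1)
freq_a = rng.uniform(0.1, 0.9, size=500)
freq_b = rng.uniform(0.1, 0.9, size=500)
geno = np.vstack([
    rng.binomial(2, freq_a, size=(30, 500)),
    rng.binomial(2, freq_b, size=(30, 500)),
])
truth = np.repeat([0, 1], 30)
groups = iterative_pruning(geno)
print([len(group) for group in groups])
```

Recomputing the decomposition within each subgroup is what lets closely related subpopulations, which overlap in the PCA space of the full dataset, separate once the more distant groups have been pruned away.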

Subject(s)

Computational Biology/methods; Population/genetics; Principal Component Analysis/methods; Algorithms; Animals; Genetic Variation; Genetics, Population; Humans; Models, Genetic

WASP: a Web-based Allele-Specific PCR assay designing tool for detecting SNPs and mutations.

Wangkumhang, Pongsakorn; Chaichoompu, Kridsadakorn; Ngamphiw, Chumpol; Ruangrit, Uttapong; Chanprasert, Juntima; Assawamakin, Anunchai; Tongsima, Sissades.

BMC Genomics ; 8: 275, 2007 Aug 14.

Article in English | MEDLINE | ID: mdl-17697334

ABSTRACT

BACKGROUND: Allele-specific (AS) Polymerase Chain Reaction is a convenient and inexpensive method for genotyping Single Nucleotide Polymorphisms (SNPs) and mutations. It is applied in many recent studies including population genetics, molecular genetics and pharmacogenomics. Using known AS primer design tools to create primers is a cumbersome process for inexperienced users, since information about the SNP/mutation must be acquired from public databases prior to the design. Furthermore, most of these tools do not offer mismatch enhancement for the designed primers. The available web applications do not provide a user-friendly graphical input interface or intuitive visualization of their primer results. RESULTS: This work presents a web-based AS primer design application called WASP. This tool can efficiently design AS primers for human SNPs as well as mutations. To assist scientists with collecting necessary information about target polymorphisms, this tool provides a local SNP database containing over 10 million SNPs of various populations from public domain databases, namely NCBI dbSNP, HapMap and JSNP. This database is tightly integrated with the tool so that users can perform the design for existing SNPs without leaving the site. To guarantee specificity of AS primers, the proposed system incorporates a primer specificity enhancement technique widely used in experimental protocols. In particular, WASP makes use of different destabilizing effects by introducing one deliberate 'mismatch' at the penultimate (second to last of the 3'-end) base of AS primers to improve the resulting AS primers. Furthermore, WASP offers a graphical user interface through scalable vector graphics (SVG) drawing that allows users to select SNPs and graphically visualize designed primers and their conditions. CONCLUSION: WASP offers a tool for designing AS primers for both SNPs and mutations.
By integrating the database for known SNPs (using gene ID or rs number), this tool eases the awkward process of getting flanking sequences and other related information from public SNP databases. It takes into account the underlying destabilizing effect to ensure the effectiveness of designed primers. With a user-friendly SVG interface, WASP intuitively presents the resulting designed primers, which helps users export the design or make further adjustments to it. This software can be freely accessed at http://bioinfo.biotec.or.th/WASP.
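The penultimate-base mismatch described in the abstract can be illustrated with a small sketch. It builds a forward AS primer whose 3'-terminal base is the target allele and then replaces the second-to-last base with a different one. The substitution table, the default primer length and the example sequence are hypothetical placeholders: the abstract does not give the rule WASP uses to choose the mismatch, and this is not WASP's code.

```python
# Placeholder substitution rule, NOT the rule used by WASP: each base is simply
# swapped for another base so that the penultimate position no longer matches.
HYPOTHETICAL_MISMATCH = {"A": "C", "C": "A", "G": "T", "T": "G"}

def allele_specific_primer(upstream, allele, length=20, add_mismatch=True):
    """Forward AS primer ending on `allele`, read 5' to 3'.

    `upstream` is the template sequence immediately 5' of the SNP position.
    """
    upstream, allele = upstream.upper(), allele.upper()
    if allele not in HYPOTHETICAL_MISMATCH:
        raise ValueError("allele must be one of A, C, G, T")
    if len(upstream) < length - 1:
        raise ValueError("upstream flank is shorter than the requested primer")
    primer = list(upstream[-(length - 1):] + allele)
    if add_mismatch:
        # Deliberate mismatch at the penultimate base (second to last from the 3' end).
        primer[-2] = HYPOTHETICAL_MISMATCH[primer[-2]]
    return "".join(primer)

# Hypothetical flank and a G/A SNP: one primer per allele.
upstream = "ATGGCGTACCTGAAGCTTGCA"
primer_g = allele_specific_primer(upstream, "G")
primer_a = allele_specific_primer(upstream, "A")
print(primer_g)
print(primer_a)
```

The two primers differ only at their 3'-terminal base, and each differs from the template at exactly one additional position, the penultimate base.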

Subject(s)

Alleles; Internet; Mutation; Polymerase Chain Reaction/methods; Polymorphism, Single Nucleotide; Computer Graphics; User-Computer Interface

Speedup bioinformatics applications on multicore-based processor using vectorizing and multithreading strategies.

Chaichoompu, Kridsadakorn; Kittitornkun, Surin; Tongsima, Sissades.

Bioinformation ; 2(5): 182-4, 2007 Dec 30.

Article in English | MEDLINE | ID: mdl-18305826

ABSTRACT

Many computationally intensive bioinformatics programs written in C/C++, such as those for multiple sequence alignment and population structure analysis, are not multicore-aware. A multicore processor is an emerging CPU technology that combines two or more independent processors into a single package. The Single Instruction Multiple Data-stream (SIMD) paradigm is heavily utilized in this class of processors. Nevertheless, most popular compilers, including Microsoft Visual C/C++ 6.0 and the x86 GNU C compiler (gcc), do not automatically create SIMD code that can fully utilize the advancement of these processors. To harness the power of the new multicore architecture, certain compiler techniques must be considered. This paper presents a generic compiling strategy to assist the compiler in improving the performance of bioinformatics applications written in C/C++. The proposed framework contains 2 main steps: multithreading and vectorizing strategies. After following the strategies, the application can achieve higher speedup by taking advantage of multicore architecture technology. Due to the extremely fast interconnection networking among multiple cores, it is suggested that the proposed optimization could be more appropriate than making use of parallelization on a small cluster computer, which has larger network latency and lower bandwidth.
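The two steps named in the abstract, vectorizing and multithreading, can be illustrated by analogy. The paper concerns compiler-level strategies for C/C++; the sketch below instead uses NumPy array operations as a stand-in for vectorization and `concurrent.futures.ThreadPoolExecutor` as a stand-in for multithreading. The task (counting mismatches between an encoded query sequence and many target sequences), the function names and the data are assumptions chosen for illustration, and no speedup figures are claimed.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def mismatches_scalar(query, targets):
    """Baseline: one element-by-element comparison loop per target sequence."""
    return [sum(1 for a, b in zip(query, target) if a != b) for target in targets]

def mismatches_vectorized(query, targets):
    """Vectorized: compare the query against all targets in one array operation."""
    return (targets != query).sum(axis=1)

def mismatches_threaded(query, targets, n_threads=4):
    """Multithreaded: split the targets into chunks and vectorize within each thread."""
    chunks = np.array_split(targets, n_threads)
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        parts = list(pool.map(lambda chunk: mismatches_vectorized(query, chunk), chunks))
    return np.concatenate(parts)

# Hypothetical data: 1000 sequences of length 64, bases encoded as 0..3.
rng = np.random.default_rng(0)
targets = rng.integers(0, 4, size=(1000, 64), dtype=np.uint8)
query = targets[0].copy()

vectorized = mismatches_vectorized(query, targets)
threaded = mismatches_threaded(query, targets)
print(vectorized[:5], threaded[:5])
```

All three functions compute the same counts; they differ only in how the work is expressed, which mirrors the paper's point that the same application code can be restructured to expose data-level and thread-level parallelism.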
