Results 1 - 8 of 8
1.
BMC Bioinformatics ; 24(1): 135, 2023 Apr 05.
Article in English | MEDLINE | ID: mdl-37020193

ABSTRACT

BACKGROUND: Population structure and cryptic relatedness between individuals (samples) are two major factors affecting false positives in genome-wide association studies (GWAS). In addition, population stratification and genetic relatedness can affect prediction accuracy in genomic selection in animal and plant breeding. The methods commonly used to address these problems are principal component analysis (to adjust for population stratification) and marker-based kinship estimates (to correct for the confounding effects of genetic relatedness). Many software tools are currently available that analyze genetic variation among individuals to determine population structure and genetic relationships. However, none of these tools or pipelines performs such analyses in a single workflow and visualizes all the results in a single interactive web application. RESULTS: We developed PSReliP, a standalone, freely available pipeline for the analysis and visualization of population structure and relatedness between individuals in a user-specified genetic variant dataset. The analysis stage of PSReliP executes all steps of data filtering and analysis and consists of an ordered sequence of commands from PLINK, a whole-genome association analysis toolset, along with in-house shell scripts and Perl programs that support data pipelining. The visualization stage is provided by Shiny apps, R-based interactive web applications. In this study, we describe the characteristics and features of PSReliP and demonstrate how it can be applied to real genome-wide genetic variant data. CONCLUSIONS: The PSReliP pipeline allows users to quickly analyze genetic variants such as single nucleotide polymorphisms and small insertions or deletions at the genome level to estimate population structure and cryptic relatedness using PLINK software and to visualize the analysis results in interactive tables, plots, and charts using Shiny technology. The analysis and assessment of population stratification and genetic relatedness can aid in choosing an appropriate approach for the statistical analysis of GWAS data and predictions in genomic selection. The various outputs from PLINK can be used for further downstream analysis. The code and manual for PSReliP are available at https://github.com/solelena/PSReliP .
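The marker-based kinship estimate mentioned in the abstract can be illustrated with a minimal sketch. It assumes a VanRaden-style genomic relationship matrix computed from allele dosages (0, 1, 2) with no missing values; the function name `kinship_matrix` and the toy data are hypothetical, and this is not necessarily the estimator that PLINK or PSReliP applies.

```python
def kinship_matrix(genotypes):
    """VanRaden-style genomic relationship matrix (illustrative assumption).

    genotypes: one list per individual holding allele dosages (0, 1 or 2)
    for every marker; missing values are not handled in this sketch.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Frequency of the counted allele at each marker.
    freqs = [sum(ind[j] for ind in genotypes) / (2 * n) for j in range(m)]
    # Center every dosage by its expected value 2p.
    centered = [[ind[j] - 2 * freqs[j] for j in range(m)] for ind in genotypes]
    # Scale by the summed expected variance 2p(1 - p) over markers.
    denom = 2 * sum(p * (1 - p) for p in freqs)
    if denom == 0:
        raise ValueError("all markers are monomorphic")
    return [
        [sum(a * b for a, b in zip(centered[i], centered[k])) / denom
         for k in range(n)]
        for i in range(n)
    ]


# Toy data: individuals 0 and 1 are identical, individual 2 differs.
genotypes = [
    [0, 1, 2, 0],
    [0, 1, 2, 0],
    [2, 1, 0, 2],
]
K = kinship_matrix(genotypes)
```

In this toy example the two identical individuals share the same relationship value with each other as with themselves, while the third individual has a negative value with both, which is the kind of pattern a kinship heatmap would expose.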


Subjects
Genome-Wide Association Study, Software, Animals, Genome-Wide Association Study/methods, Genomics/methods, Genome, Workflow
2.
Anim Genet ; 54(2): 199-206, 2023 Apr.
Article in English | MEDLINE | ID: mdl-36683294

ABSTRACT

As an important source of genomic variation, copy number variation (CNV) contributes to environmental adaptation in buffaloes worldwide. Despite this importance, CNV divergence between swamp buffaloes and river buffaloes has not been studied previously. Here, we report 21 152 CNV regions (CNVRs) in 141 buffaloes of 20 breeds detected through multiple CNV calling strategies. Only 248 CNVRs were shared between river buffalo and swamp buffalo, reflecting substantial CNVR variation between the two subspecies. Population structure analysis based on CNVs successfully separated the two buffalo subspecies. We further assessed CNV divergence by calculating FST for genome-wide CNVs. In total, we identified 110 significantly divergent CNV segments and 44 putatively selected genes between river buffaloes and swamp buffaloes. In particular, LALBA, a key gene controlling milk production in cattle, presented a highly differentiated CNV in the promoter region, which makes it a strong functional candidate gene for differences between swamp buffaloes and river buffaloes in traits related to milk production. Our study provides useful information on CNVs in buffaloes, which may help explain the genetic differences between the two subspecies.
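The FST calculation mentioned in the abstract can be sketched as follows. The abstract does not state which estimator or CNV encoding was used, so this example assumes Wright's FST for a single variant treated as biallelic, with two equally weighted populations; `fst_two_pops` and the frequencies are hypothetical.

```python
def fst_two_pops(p1, p2):
    """Wright's F_ST for one biallelic variant in two equally weighted
    populations, given the variant frequency in each (an assumed encoding)."""
    p_bar = (p1 + p2) / 2
    # Expected heterozygosity in the pooled population.
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        # The variant is fixed or absent everywhere: no differentiation.
        return 0.0
    # Mean expected heterozygosity within the two populations.
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_t - h_s) / h_t


# Hypothetical frequencies of one CNV state in river and swamp buffaloes.
fst_same = fst_two_pops(0.5, 0.5)
fst_fixed = fst_two_pops(0.0, 1.0)
fst_partial = fst_two_pops(0.2, 0.8)
```

Values near 0 indicate similar frequencies in both groups, and values near 1 indicate that the variant is close to fixation in one group and absent in the other.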


Subjects
Bison, Buffaloes, DNA Copy Number Variations, Animals, Cattle, Bison/genetics, Buffaloes/genetics, Genome, Phenotype
3.
Hum Genomics ; 12(1): 25, 2018 05 09.
Article in English | MEDLINE | ID: mdl-29743099

ABSTRACT

The analysis of population structure has many applications in medical and population genetic research. Such analysis provides insight into the underlying genetic population substructure and is a crucial prerequisite for any analysis of genetic data. The analysis involves grouping individuals into subpopulations based on shared genetic variations. The most widely used markers to study the variation of DNA sequences between populations are single nucleotide polymorphisms. Data preprocessing is a necessary step to assess the quality of the data and to determine which markers or individuals can reasonably be included in the analysis. After preprocessing, several methods can be used to uncover population substructure; these fall into two broad approaches: parametric and nonparametric. Parametric approaches use statistical models to infer population structure and assign individuals to subpopulations. However, these approaches suffer from many drawbacks that make them impractical for large datasets. In contrast, nonparametric approaches do not suffer from these drawbacks, making them more viable than parametric approaches for analyzing large datasets. Consequently, nonparametric approaches are increasingly used to reveal population substructure. This paper therefore reviews and discusses the nonparametric approaches that are available for population structure analysis, along with some implications for resolving remaining challenges.
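The preprocessing step described in the abstract, deciding which markers can reasonably be included, can be sketched as a simple filter. The thresholds, the `None` encoding of missing calls, and the name `filter_markers` are assumptions for illustration; the review does not prescribe specific cut-offs.

```python
def filter_markers(genotypes, max_missing=0.1, min_maf=0.05):
    """Return the indices of markers that pass two common quality filters.

    genotypes: one list per individual with dosages 0, 1, 2 or None (missing).
    max_missing and min_maf are illustrative thresholds, not recommendations.
    """
    n = len(genotypes)
    kept = []
    for j in range(len(genotypes[0])):
        calls = [row[j] for row in genotypes if row[j] is not None]
        if not calls:
            continue  # marker was never genotyped
        missing_rate = 1 - len(calls) / n
        if missing_rate > max_missing:
            continue  # too many missing calls
        freq = sum(calls) / (2 * len(calls))
        maf = min(freq, 1 - freq)  # minor allele frequency
        if maf >= min_maf:
            kept.append(j)
    return kept


# Marker 0 is informative, marker 1 is monomorphic, marker 2 is mostly missing.
toy_genotypes = [
    [0, 0, 1],
    [1, 0, None],
    [2, 0, None],
    [1, 0, 2],
]
passing = filter_markers(toy_genotypes)
```

Only the first marker survives here: the second has a minor allele frequency of zero and the third is missing in half of the individuals.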


Subjects
Population Genetics, Single Nucleotide Polymorphism/genetics, Algorithms, Genotype, Humans, Principal Component Analysis, DNA Sequence Analysis
4.
Infect Dis Poverty ; 9(1): 59, 2020 Jun 01.
Article in English | MEDLINE | ID: mdl-32487156

ABSTRACT

BACKGROUND: Beijing sub-pedigree 2 (BSP2) and T sub-lineage 6 (TSL6) are two clades belonging to the Beijing and T families of Mycobacterium tuberculosis (MTB), respectively, defined by Bayesian population structure analysis based on 24-loci mycobacterial interspersed repetitive unit-variable number of tandem repeats (MIRU-VNTR). Globally, over 99% of BSP2 and 89% of TSL6 isolates were distributed in Chongqing, suggesting their possible local adaptive evolution. The objective of this paper is to explore whether BSP2 and TSL6 originated by local adaptive evolution from specific isolates of the Beijing and T families in Chongqing. METHODS: The genotyping data of 16 090 MTB isolates were collected from a laboratory collection, the published literature and the SITVIT database, and were then subjected to Bayesian population structure analysis based on 24-loci MIRU-VNTR. Spacer Oligonucleotide Forest (Spoligoforest) and a 24-loci MIRU-VNTR-based minimum spanning tree (MST) were used to explore their phylogenetic pathways, with Bayesian demographic analysis used to explore the recent demographic change of TSL6. RESULTS: Phylogenetic analysis suggested that BSP2 and TSL6 in Chongqing may have evolved from BSP4 and TSL5, respectively, which were locally predominant in Tibet and Jiangsu, respectively. Spoligoforest showed that the Beijing and T families were genetically distant, while the convergence of the MIRU-VNTR pattern of BSP2 and TSL6 was revealed by WebLogo. The demographic analysis concluded that the recent demographic change of TSL6 might have taken 111.25 years. CONCLUSIONS: The BSP2 and TSL6 clades might have originated from BSP4 and TSL5, respectively, by local adaptive evolution in Chongqing. Our study suggests that MIRU-VNTR should be combined with other robust markers for a more comprehensive genotyping approach, especially for families of clades with the same MIRU-VNTR pattern.
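The MIRU-VNTR-based minimum spanning tree mentioned in the methods can be illustrated with a small sketch. It assumes that the distance between two isolates is the number of loci with different repeat counts and builds the tree with Prim's algorithm; the study's actual distance measure and software are not given in the abstract, and the four-locus profiles below are hypothetical stand-ins for 24-locus profiles.

```python
def locus_differences(a, b):
    """Number of loci at which two repeat-count profiles differ."""
    return sum(x != y for x, y in zip(a, b))


def minimum_spanning_tree(profiles):
    """Prim's algorithm over pairwise locus differences.

    Returns a list of (i, j, distance) edges linking profile indices.
    """
    n = len(profiles)
    if n == 0:
        return []
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        # Cheapest edge from the current tree to any profile outside it.
        d, i, j = min(
            (locus_differences(profiles[i], profiles[j]), i, j)
            for i in in_tree
            for j in range(n)
            if j not in in_tree
        )
        edges.append((i, j, d))
        in_tree.add(j)
    return edges


# Hypothetical profiles shortened to four loci for readability.
profiles = [
    [2, 3, 4, 5],
    [2, 3, 4, 6],
    [2, 3, 1, 6],
    [7, 8, 1, 6],
]
mst_edges = minimum_spanning_tree(profiles)
```

Each edge connects the closest pair across the current tree boundary, so chains of single-locus changes appear as paths in the resulting tree.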


Subjects
Genetic Variation, Minisatellite Repeats, Mycobacterium tuberculosis/genetics, Bayes Theorem, Biological Evolution, China
5.
Front Plant Sci ; 11: 310, 2020.
Article in English | MEDLINE | ID: mdl-32265963

ABSTRACT

In the main distribution area, the genetic pattern of silver birch is dominated by two haplotypes: haplotype A, located in western and north-western Europe, and haplotype C, in eastern and south-eastern Europe, characterized by high levels of neutral genetic variability within populations and low differentiation among populations. Information about the amount and structure of genetic variation in the southern marginal areas, which represent rear populations left during the expansion of this species from southern glacial refugia, is lacking. The general aim of the study was to investigate the existence of the climatic characteristics typical of the environmental niche of the species, together with genetic organization, variation and gene flow, in marginal populations in the Italian Apennines and the Greek Southern Rhodope, and to compare them with populations of the southern part of the main distribution range in the Alps and Balkans. Genetic analysis was performed using nuclear microsatellite loci on 311 trees sampled from 14 populations. Environmental analysis was based on multivariate analysis of derived climatic variables. The allelic pattern was analyzed to assess genetic diversity, population diversity and differentiation, population structure and gene flow. Geographic and environmental peripherality did not always match, with some Apennine sites at higher elevation falling within the environmental niche. In the peripheral populations in the Apennines, we observed lower genetic diversity and higher differentiation, with evident genetic barriers detected around these sites. These characteristics were not shown in the marginal Greek populations. Unexpectedly, the southern Italian marginal populations showed genetic links with the Greek populations and the central area of the distribution range. The Greek populations also showed evident gene flow with the Alpine and Balkan areas. The disparity of results in these two marginal areas shows that it is not geographic peripherality or even ecological marginality that shapes the genetic diversity and structure of marginal populations, but primarily their position as part of the continuous range or as disjunct populations. This outcome suggests different considerations on how to manage their gene pools and the role that these rear populations can play in maintaining the biodiversity of this species.

6.
Front Microbiol ; 8: 371, 2017.
Article in English | MEDLINE | ID: mdl-28337187

ABSTRACT

At present, the most widely used methods for Klebsiella pneumoniae subtyping are multilocus sequence typing (MLST) and pulsed-field gel electrophoresis (PFGE). However, the discriminatory power of MLST does not meet the need to distinguish outbreak from non-outbreak isolates, and PFGE is time-consuming and labor-intensive. A core genome multilocus sequence typing (cgMLST) scheme for whole-genome sequence-based typing of K. pneumoniae was developed to overcome the disadvantages of these traditional molecular subtyping methods. First, we used the complete genome of K. pneumoniae strain HKUOPLC as the reference genome and 907 genomes of K. pneumoniae downloaded from the NCBI database as the original genome dataset to determine cgMLST target genes. A total of 1,143 genes were retained as cgMLST target genes. Second, we used 26 K. pneumoniae strains from a nosocomial infection outbreak to evaluate the cgMLST scheme. cgMLST enabled clustering of outbreak strains with fewer than 10 allele differences and unambiguous separation from unrelated outgroup strains. Moreover, cgMLST revealed that there may be several sub-clones of the epidemic ST11 clone. In conclusion, the novel cgMLST scheme not only showed higher discriminatory power than PFGE and MLST in outbreak investigations but also revealed more population structure characteristics than MLST.
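The allele-difference criterion reported in the abstract can be sketched as a threshold clustering over cgMLST profiles. The abstract does not name the clustering algorithm, so this example assumes single linkage (isolates join a cluster when they differ from any member at fewer than 10 loci); the function names and the 12-locus toy profiles are hypothetical, whereas the real scheme uses 1,143 target genes.

```python
def allele_differences(a, b):
    """Number of loci with different allele numbers in two profiles."""
    return sum(x != y for x, y in zip(a, b))


def single_linkage_clusters(profiles, threshold=10):
    """Group profiles linked by fewer than `threshold` allele differences."""
    parent = list(range(len(profiles)))

    def find(i):
        # Follow parent links to the cluster representative (path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(profiles)):
        for j in range(i + 1, len(profiles)):
            if allele_differences(profiles[i], profiles[j]) < threshold:
                parent[find(j)] = find(i)

    clusters = {}
    for i in range(len(profiles)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Two closely related isolates (3 differences) and one unrelated isolate.
cg_profiles = [
    [1] * 12,
    [1] * 9 + [2] * 3,
    [5] * 12,
]
outbreak_clusters = single_linkage_clusters(cg_profiles, threshold=10)
```

With single linkage, a chain of closely related isolates ends up in one cluster even when its two ends differ by more than the threshold, which is one reason the choice of linkage is flagged as an assumption here.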

7.
Infect Genet Evol ; 56: 117-124, 2017 12.
Article in English | MEDLINE | ID: mdl-29155241

ABSTRACT

This work revealed the drug resistance and population structure of Moraxella catarrhalis strains isolated from children less than three years old with pneumonia. Forty-four independent M. catarrhalis strains were analyzed using broth dilution antimicrobial susceptibility testing and multilocus sequence typing (MLST). The highest non-susceptibility rate was observed for amoxicillin (AMX), which reached 95.5%, followed by clindamycin (CLI) (n=33; 75.0%), azithromycin (AZM) (61.4%), cefaclor (CEC) (25.0%), trimethoprim-sulfamethoxazole (SXT) (15.9%), cefuroxime (CXM) (4.5%), tetracycline (TE) (2.3%), and doxycycline (DOX) (2.3%). No strain showed non-susceptibility to the other six antimicrobials. Using MLST, the 44 M. catarrhalis strains were divided into 33 sequence types (STs). Based on their allelic profiles, the 33 STs were divided into one clonal complex (CC363) and 28 singletons. CC363 contained five STs, and ST363 was the founder ST. CC363 contained 63.6%, 33.3%, and 40.7% of the CEC non-susceptible, CLI non-susceptible and AZM non-susceptible strains, respectively. The proportions of CEC non-susceptible, CLI non-susceptible and AZM non-susceptible strains in CC363 were higher than those among the singletons; these differences were significant for CEC (p=0.002) and AZM (p=0.011). Furthermore, CC363 contained more AMX-CLI-AZM co-non-susceptible and AMX-CEC-CLI-AZM co-non-susceptible strains than the singletons (p=0.007 and p<0.001, respectively). CC363 is a drug-resistant clone of clinical M. catarrhalis strains in China. Expansion of this clone under the selective pressure of antibiotics should be noted, and long-term monitoring should be established.


Subjects
Anti-Bacterial Agents/pharmacology, Bacterial Drug Resistance, Moraxella catarrhalis/classification, Moraxella catarrhalis/genetics, Multilocus Sequence Typing, Bacterial Pneumonia/microbiology, Child, Humans, Microbial Sensitivity Tests, Moraxella catarrhalis/drug effects, Phylogeny, Bacterial Pneumonia/drug therapy, Bacterial Pneumonia/epidemiology
8.
BioData Min ; 10: 37, 2017.
Article in English | MEDLINE | ID: mdl-29270227

ABSTRACT

BACKGROUND: Clustering plays a crucial role in several application domains, such as bioinformatics. In bioinformatics, clustering has been extensively used as an approach for detecting interesting patterns in genetic data. One application is population structure analysis, which aims to group individuals into subpopulations based on shared genetic variations, such as single nucleotide polymorphisms. Advances in DNA sequencing technology have facilitated the acquisition of exceptionally large genetic datasets. Genetic data usually contain hundreds of thousands of genetic markers genotyped for thousands of individuals, making an efficient means of handling such data desirable. RESULTS: Random Forests (RFs) has emerged as an efficient algorithm capable of handling high-dimensional data. RFs provides a proximity measure that can capture different levels of co-occurring relationships between variables. RFs has been widely considered a supervised learning method, although it can be converted into an unsupervised learning method. Therefore, the RF-derived proximity measure combined with a clustering technique may be well suited for determining the underlying structure of unlabeled data. This paper proposes RFcluE, a cluster ensemble approach for determining the underlying structure of genetic data based on RFs. The approach comprises a cluster ensemble framework to combine multiple runs of RF clustering. Experiments were conducted on a high-dimensional, real genetic dataset to evaluate the proposed approach. The experiments included an examination of the impact of parameter changes, a comparison of RFcluE performance against other clustering methods, and an assessment of the relationship between the diversity and quality of the ensemble and its effect on RFcluE performance. CONCLUSIONS: This paper proposes RFcluE, a cluster ensemble approach based on RF clustering, to address the problem of population structure analysis and demonstrates the effectiveness of the approach. The paper also illustrates that applying a cluster ensemble approach, combining multiple RF clusterings, produces more robust and higher-quality results as a consequence of feeding the ensemble with diverse views of high-dimensional genetic data obtained through bagging and random subspace, the two key features of the RF algorithm.
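The idea of combining multiple clustering runs into one ensemble result can be sketched with a co-association matrix. This is a generic evidence-accumulation consensus, assumed here for illustration only; it is not the RFcluE algorithm, the input label vectors stand in for the outputs of separate RF clustering runs, and all names and the 0.5 threshold are hypothetical.

```python
def co_association(labelings):
    """Fraction of runs in which each pair of samples shares a cluster."""
    n = len(labelings[0])
    runs = len(labelings)
    return [
        [sum(lab[i] == lab[j] for lab in labelings) / runs for j in range(n)]
        for i in range(n)
    ]


def consensus_clusters(labelings, threshold=0.5):
    """Link samples that co-cluster in more than `threshold` of the runs
    and return the connected groups as the consensus partition."""
    co = co_association(labelings)
    unassigned = set(range(len(co)))
    clusters = []
    for start in range(len(co)):
        if start not in unassigned:
            continue
        unassigned.discard(start)
        stack, group = [start], []
        while stack:
            i = stack.pop()
            group.append(i)
            for j in sorted(unassigned):
                if co[i][j] > threshold:
                    unassigned.discard(j)
                    stack.append(j)
        clusters.append(sorted(group))
    return clusters


# Three hypothetical clustering runs over five samples; cluster labels are
# arbitrary per run, so only co-membership matters.
runs = [
    [0, 0, 1, 1, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 1, 1],
]
consensus = consensus_clusters(runs)
```

Because the consensus depends only on how often two samples land together, disagreements between individual runs are averaged out, which is the robustness argument the abstract makes for ensembles.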
