ABSTRACT
Goal-directed behavior requires the interaction of multiple brain regions. How these regions and their interactions with brain-wide activity drive action selection is less well understood. We have investigated this question by combining whole-brain volumetric calcium imaging using light-field microscopy and an operant-conditioning task in larval zebrafish. We find global, recurring dynamics of brain states to exhibit pre-motor bifurcations toward mutually exclusive decision outcomes. These dynamics arise from a distributed network displaying trial-by-trial functional connectivity changes, especially between cerebellum and habenula, which correlate with decision outcome. Within this network the cerebellum shows particularly strong and predictive pre-motor activity (>10 s before movement initiation), mainly within the granule cells. Turn directions are determined by the difference in neuroactivity between the ipsilateral and contralateral hemispheres, while the rate of bi-hemispheric population ramping quantitatively predicts decision time on the trial-by-trial level. Our results highlight a cognitive role of the cerebellum and its importance in motor planning.
Subjects
Cerebellum/physiology , Decision Making/physiology , Reaction Time/physiology , Zebrafish/physiology , Animals , Behavior, Animal/physiology , Brain Mapping/methods , Cerebrum/physiology , Cognition/physiology , Conditioning, Operant/physiology , Goals , Habenula/physiology , Hot Temperature , Larva/physiology , Motor Activity/physiology , Movement , Neurons/physiology , Psychomotor Performance/physiology , Rhombencephalon/physiology
ABSTRACT
The array of whiskers on the snout provides rodents with tactile sensory information relating to the size, shape and texture of objects in their immediate environment. Rodents can use their whiskers to detect stimuli, distinguish textures, locate objects and navigate. Important aspects of whisker sensation are thought to result from neuronal computations in the whisker somatosensory cortex (wS1). Each whisker is individually represented in the somatotopic map of wS1 by an anatomical unit named a 'barrel' (hence also called barrel cortex). This allows precise investigation of sensory processing in the context of a well-defined map. Here, we first review the signaling pathways from the whiskers to wS1, and then discuss current understanding of the various types of excitatory and inhibitory neurons present within wS1. Different classes of cells can be defined according to anatomical, electrophysiological and molecular features. The synaptic connectivity of neurons within local wS1 microcircuits, as well as their long-range interactions and the impact of neuromodulators, are beginning to be understood. Recent technological progress has allowed cell-type-specific connectivity to be related to cell-type-specific activity during whisker-related behaviors. An important goal for future research is to obtain a causal and mechanistic understanding of how selected aspects of tactile sensory information are processed by specific types of neurons in the synaptically connected neuronal networks of wS1 and signaled to downstream brain areas, thus contributing to sensory-guided decision-making.
Subjects
Neural Pathways/physiology , Sensation/physiology , Somatosensory Cortex/physiology , Vibrissae/physiology , Animals , Brain Diseases/physiopathology , Brain-Computer Interfaces , Humans , Mice , Signal Transduction/physiology , Vibrissae/innervation
ABSTRACT
Tuberculosis claims more human lives than any other bacterial infectious disease and represents a clear and present danger to global health as new tools for vaccination, treatment, and interruption of transmission have been slow to emerge. Additionally, tuberculosis presents with notable clinical heterogeneity, which complicates diagnosis, treatment, and the establishment of nonrelapsing cure. How this heterogeneity is driven by the diversity of clinical isolates of the causative agent, Mycobacterium tuberculosis, has recently garnered attention. Herein, we review advances in the understanding of how naturally occurring variation in clinical isolates affects transmissibility, pathogenesis, immune modulation, and drug resistance. We also summarize how specific changes in transcriptional responses can modulate infection or disease outcome, together with strain-specific effects on gene essentiality. Further understanding of how this diversity of M. tuberculosis isolates affects disease and treatment outcomes will enable the development of more effective therapeutic options and vaccines for this dreaded disease.
Subjects
Genetic Variation/genetics , Mycobacterium tuberculosis/genetics , Animals , Genotype , Humans , Transcription, Genetic/genetics , Tuberculosis/microbiology
ABSTRACT
We develop information-geometric techniques to analyze the trajectories of the predictions of deep networks during training. By examining the underlying high-dimensional probabilistic models, we reveal that the training process explores an effectively low-dimensional manifold. Networks with a wide range of architectures, sizes, trained using different optimization methods, regularization techniques, data augmentation techniques, and weight initializations lie on the same manifold in the prediction space. We study the details of this manifold to find that networks with different architectures follow distinguishable trajectories, but other factors have a minimal influence; larger networks train along a similar manifold as that of smaller networks, just faster; and networks initialized at very different parts of the prediction space converge to the solution along a similar manifold.
ABSTRACT
Is there a formula for a competitive NIH grant application? The Serenity Prayer may provide one: "Grant me the serenity to accept the things I cannot change, the ability to change the things I can, and the wisdom to know the difference." But how to tell the difference? In this Perspective, we provide an inclusive roadmap: elements of NIH funding. Collectively, we have over 30 y of peer review experience as NIH Scientific Review Officers in addition to over 30 y of program experience as NIH Program Officers. This article distills our NIH experience. We use Euclid's 13-book landmark, The Elements, as our template to humbly share what we learned. We have three specific aims: inform, guide, and motivate prospective applicants. We also address ways that support diversity and inclusion among applicants and young investigators in biomedical research. The elements we describe come from a wide range of sources. Some themes will be general. Some will be specific. All will be candid. The ultimate goal is a competitive application, serenity, and hopefully both.
Subjects
Biomedical Research , Humans , United States , Research Personnel , Peer Review , Motivation , National Institutes of Health (U.S.)
ABSTRACT
Multi-principal element alloys (MPEAs) exhibit outstanding strength attributed to the complex dislocation dynamics as compared to conventional alloys. Here, we develop an atomic-lattice-distortion-dependent discrete dislocation dynamics framework consisting of random field theory and a phenomenological dislocation model to investigate the fundamental deformation mechanism underlying massive dislocation motions in a body-centered cubic MPEA. Amazingly, the turbulence of dislocation speed is identified in light of the strong heterogeneous lattice strain field caused by short-range ordering. Importantly, the vortex from dislocation flow turbulence not only acts as an effective source to initiate dislocation multiplication but also induces the strong local pinning trap to block dislocation movement, thus breaking the strength-ductility trade-off.
ABSTRACT
In cystic fibrosis (CF), impaired mucociliary clearance leads to chronic infection and inflammation. However, cilia beating features in a CF-altered environment, consisting of a dehydrated airway surface liquid (ASL) layer and abnormal mucus, have not been fully characterized. Furthermore, acute inflammation is normally followed by an active resolution phase requiring specialized proresolving lipid mediators (SPMs) and allowing return to homeostasis. However, altered SPM biosynthesis has been reported in CF. Here, we explored cilia beating dynamics in CF airway primary cultures and their response to the SPMs resolvin E1 (RvE1) and lipoxin B4 (LXB4). Human nasal epithelial cells (hNECs) from CF and non-CF donors were grown at air-liquid interface. The ciliary beat frequency, synchronization, orientation, and density were analyzed from high-speed video microscopy using a multiscale Differential Dynamic Microscopy algorithm and an in-house developed method. Mucins and ASL layer height were studied by qRT-PCR and confocal microscopy. Principal component analysis showed that CF and non-CF hNECs had distinct cilia beating phenotypes, which was mostly explained by differences in cilia beat organization rather than frequency. Exposure to RvE1 (10 nM) and to LXB4 (10 nM) restored a non-CF-like cilia beating phenotype. Furthermore, RvE1 increased the ASL layer height and reduced the mucin MUC5AC thickness. The calcium-activated chloride channel, TMEM16A, was involved in the RvE1 effect on cilia beating, hydration, and mucus. Altogether, our results provide evidence for defective cilia beating in CF airway epithelium and a role of RvE1 and LXB4 in restoring the main epithelial functions involved in mucociliary clearance.
Subjects
Cystic Fibrosis , Eicosapentaenoic Acid/analogs & derivatives , Humans , Cilia , Nasal Mucosa , Inflammation
ABSTRACT
Interstitial atoms usually diffuse much faster than vacancies, which is often the root cause for the ineffective recombination of point defects in metals under irradiation. Here, via ab initio modeling of single-defect diffusion behavior in the equiatomic NiCoCrFe(Pd) alloy, we demonstrate an alloy design strategy that can reduce the diffusivity difference between the two types of point defects. The two diffusivities become almost equal after substituting the NiCoCrFe base alloy with Pd. The underlying mechanism is that Pd, with a much larger atomic size (hence larger compressibility) than the rest of the constituents, not only heightens the activation energy barrier (Ea) for interstitial motion by narrowing the diffusion channels but simultaneously also reduces Ea for vacancies due to less energy penalty required for bond length change between the initial and the saddle states. Our findings have a broad implication that the dynamics of point defects can be manipulated by taking advantage of the atomic size disparity, to facilitate point-defect annihilation that suppresses void formation and swelling, thereby improving radiation tolerance.
ABSTRACT
The dynamics and variability of protein conformations are directly linked to their functions. Many comparative studies of X-ray protein structures have been conducted to elucidate the relevant conformational changes, dynamics and heterogeneity. The rapid increase in the number of experimentally determined structures has made comparison an effective tool for investigating protein structures. For example, it is now possible to compare structural ensembles formed by enzyme species, variants or the type of ligands bound to them. In this study, the author developed a multilevel model for estimating two covariance matrices that represent inter- and intra-ensemble variability in the Cartesian coordinate space. Principal component analysis using the two estimated covariance matrices identified the inter-/intra-enzyme variabilities, which seemed to be important for the enzyme functions, with the illustrative examples of cytochrome P450 family 2 enzymes and class A $\beta$-lactamases. In P450, in which each enzyme has its own active site of a distinct size, an active-site motion shared universally between the enzymes was captured as the first principal mode of the intra-enzyme covariance matrix. In this case, the method was useful for understanding the conformational variability after adjusting for the differences between enzyme sizes. The developed method is advantageous in small ensemble-size problems and hence promising for use in comparative studies on experimentally determined structures where ensemble sizes are smaller than those generated, for example, by molecular dynamics simulations.
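The split into inter- and intra-ensemble covariance described above can be illustrated with a simple moment-based decomposition. This is only a sketch and not the author's multilevel model: the function name `between_within_pca`, the pooled-scatter estimator, and the toy coordinates are all assumptions made for illustration, and NumPy is assumed to be available.

```python
import numpy as np

def between_within_pca(ensembles):
    """Moment-based sketch (not the paper's multilevel estimator).

    ensembles: list of arrays, each of shape (n_structures_k, n_coords),
    holding superposed Cartesian coordinates for one ensemble (e.g. one enzyme).
    Returns ((within_vals, within_vecs), (between_vals, between_vecs)),
    with eigenvalues sorted in descending order.
    """
    means = np.array([e.mean(axis=0) for e in ensembles])
    grand_mean = means.mean(axis=0)

    # Intra-ensemble covariance: pooled scatter around each ensemble's own mean.
    centered = np.vstack([e - m for e, m in zip(ensembles, means)])
    dof = centered.shape[0] - len(ensembles)
    within = centered.T @ centered / dof

    # Inter-ensemble covariance: scatter of the ensemble means.
    dm = means - grand_mean
    between = dm.T @ dm / (len(ensembles) - 1)

    # eigh returns eigenvalues in ascending order, so reverse both outputs.
    w_vals, w_vecs = np.linalg.eigh(within)
    b_vals, b_vecs = np.linalg.eigh(between)
    return (w_vals[::-1], w_vecs[:, ::-1]), (b_vals[::-1], b_vecs[:, ::-1])

if __name__ == "__main__":
    # Toy data: three ensembles whose means differ along coordinate 0,
    # while structures within each ensemble vary along coordinate 1.
    toy = [np.array([[5.0 * k, t, 0.0] for t in np.linspace(-1, 1, 5)])
           for k in range(3)]
    (w_vals, w_vecs), (b_vals, b_vecs) = between_within_pca(toy)
    print("leading intra-ensemble mode:", w_vecs[:, 0])
    print("leading inter-ensemble mode:", b_vecs[:, 0])
```

In the toy data the leading intra-ensemble mode lies along the shared within-ensemble motion and the leading inter-ensemble mode along the offset between ensemble means, which is the kind of separation the abstract describes. A plain pooled estimator like this one may be unreliable for very small ensembles, which is the setting the multilevel model is said to address.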
Subjects
Molecular Dynamics Simulation , Proteins , Proteins/chemistry , Protein Conformation , Catalytic Domain
ABSTRACT
How do statistical dependencies in measurement noise influence high-dimensional inference? To answer this, we study the paradigmatic spiked matrix model of principal components analysis (PCA), where a rank-one matrix is corrupted by additive noise. We go beyond the usual independence assumption on the noise entries, by drawing the noise from a low-order polynomial orthogonal matrix ensemble. The resulting noise correlations make the setting relevant for applications but analytically challenging. We provide a characterization of the Bayes-optimal limits of inference in this model. If the spike is rotation invariant, we show that standard spectral PCA is optimal. However, for more general priors, both PCA and the existing approximate message-passing algorithm (AMP) fall short of achieving the information-theoretic limits, which we compute using the replica method from statistical physics. We thus propose an AMP, inspired by the theory of adaptive Thouless-Anderson-Palmer equations, which is empirically observed to saturate the conjectured theoretical limit. This AMP comes with a rigorous state evolution analysis tracking its performance. Although we focus on specific noise distributions, our methodology can be generalized to a wide class of trace matrix ensembles at the cost of more involved expressions. Finally, despite the seemingly strong assumption of rotation-invariant noise, our theory empirically predicts algorithmic performance on real data, pointing at strong universality properties.
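For orientation, the baseline that this abstract goes beyond can be sketched as a spiked matrix with independent Gaussian noise, estimated by spectral PCA. The structured noise ensemble and the proposed AMP are not reproduced here. The function names, the Rademacher prior on the spike, and the scaling conventions are assumptions made for this sketch, and NumPy is assumed to be available.

```python
import numpy as np

def spiked_wigner(n, snr, rng):
    """Rank-one spike plus symmetric i.i.d. Gaussian noise (independent baseline)."""
    x = rng.choice([-1.0, 1.0], size=n)      # assumed Rademacher spike
    g = rng.normal(size=(n, n))
    noise = (g + g.T) / np.sqrt(2 * n)       # symmetric, off-diagonal variance 1/n
    y = (snr / n) * np.outer(x, x) + noise   # spike contributes eigenvalue ~ snr
    return x, y

def pca_estimate(y):
    """Spectral PCA: eigenvector of the largest eigenvalue of the observation."""
    vals, vecs = np.linalg.eigh(y)           # eigenvalues in ascending order
    return vecs[:, -1]

def overlap(x, v):
    """Absolute cosine similarity between the true spike and an estimate."""
    return abs(x @ v) / (np.linalg.norm(x) * np.linalg.norm(v))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for snr in (0.5, 3.0):
        x, y = spiked_wigner(300, snr, rng)
        print(f"snr={snr}: overlap={overlap(x, pca_estimate(y)):.2f}")
```

With this scaling the noise spectrum is confined to roughly [-2, 2], so the top eigenvector should only become informative once the signal-to-noise ratio is large enough for the spike's eigenvalue to leave the bulk. The abstract's point is that when the noise entries are correlated, this spectral estimator can be suboptimal for general priors.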
ABSTRACT
Copy number variants (CNVs) are prevalent in the human genome and are found to have a profound effect on genomic organization and human diseases. Discovering disease-associated CNVs is critical for understanding the pathogenesis of diseases and aiding their diagnosis and treatment. However, traditional methods for assessing the association between CNVs and disease risks adopt a two-stage strategy conducting quantitative CNV measurements first and then testing for association, which may lead to biased association estimation and low statistical power, serving as a major barrier in routine genome-wide assessment of such variation. In this article, we developed One-Stage CNV-disease Association Analysis (OSCAA), a flexible algorithm to discover disease-associated CNVs for both quantitative and qualitative traits. OSCAA employs a two-dimensional Gaussian mixture model that is built upon the PCs from copy number intensities, accounting for technical biases in CNV detection while simultaneously testing for their effect on outcome traits. In OSCAA, CNVs are identified and their associations with disease risk are evaluated simultaneously in a single step, taking into account the uncertainty of CNV identification in the statistical model. Our simulations demonstrated that OSCAA outperformed the existing one-stage method and traditional two-stage methods by yielding a more accurate estimate of the CNV-disease association, especially for short CNVs or CNVs with weak signals. In conclusion, OSCAA is a powerful and flexible approach for CNV association testing with high sensitivity and specificity, which can be easily applied to different traits and clinical risk predictions.
ABSTRACT
The study of treatment effects is often complicated by noncompliance and missing data. In the one-sided noncompliance setting where of interest are the complier and noncomplier average causal effects, we address outcome missingness of the latent missing at random type (LMAR, also known as latent ignorability). That is, conditional on covariates and treatment assigned, the missingness may depend on compliance type. Within the instrumental variable (IV) approach to noncompliance, methods have been proposed for handling LMAR outcome that additionally invoke an exclusion restriction-type assumption on missingness, but no solution has been proposed for when a non-IV approach is used. This article focuses on effect identification in the presence of LMAR outcomes, with a view to flexibly accommodate different principal identification approaches. We show that under treatment assignment ignorability and LMAR only, effect nonidentifiability boils down to a set of two connected mixture equations involving unidentified stratum-specific response probabilities and outcome means. This clarifies that (except for a special case) effect identification generally requires two additional assumptions: a specific missingness mechanism assumption and a principal identification assumption. This provides a template for identifying effects based on separate choices of these assumptions. We consider a range of specific missingness assumptions, including those that have appeared in the literature and some new ones. Incidentally, we find an issue in the existing assumptions, and propose a modification of the assumptions to avoid the issue. Results under different assumptions are illustrated using data from the Baltimore Experience Corps Trial.
Subjects
Models, Statistical , Humans , Data Interpretation, Statistical , Causality , Biostatistics/methods
ABSTRACT
Modern longitudinal studies collect multiple outcomes as the primary endpoints to understand the complex dynamics of the diseases. Oftentimes, especially in clinical trials, the joint variation among the multidimensional responses plays a significant role in assessing the differential characteristics between two or more groups, rather than drawing inferences based on a single outcome. We develop a projection-based two-sample significance test to identify the population-level difference between the multivariate profiles observed under a sparse longitudinal design. The methodology is built upon widely adopted multivariate functional principal component analysis to reduce the dimension of the infinite-dimensional multi-modal functions while preserving the dynamic correlation between the components. The test applies to a wide class of (non-stationary) covariance structures of the response, and it detects a significant group difference based on a single p-value, thereby overcoming the issue of adjusting for multiple p-values that arises when comparing the means in each of the components separately. Finite-sample numerical studies demonstrate that the test maintains the type-I error and is powerful in detecting significant group differences, compared to state-of-the-art testing procedures. The test is carried out on two significant longitudinal studies for Alzheimer's disease and Parkinson's disease (PD) patients, namely, the TOMMORROW study of individuals at high risk of mild cognitive impairment, to detect differences in the cognitive test scores between the pioglitazone and the placebo groups, and the Azillect study, to assess the efficacy of rasagiline as a potential treatment to slow down the progression of PD.
Subjects
Parkinson Disease , Humans , Longitudinal Studies , Parkinson Disease/drug therapy , Parkinson Disease/physiopathology , Alzheimer Disease/drug therapy , Data Interpretation, Statistical , Multivariate Analysis , Biostatistics/methods , Cognitive Dysfunction , Models, Statistical , Pioglitazone/therapeutic use , Pioglitazone/pharmacology , Principal Component Analysis
ABSTRACT
Linear and generalized linear scalar-on-function models have been commonly used to understand the relationship between a scalar response variable (e.g. continuous, binary outcomes) and functional predictors. Such techniques are sensitive to model misspecification when the relationship between the response variable and the functional predictors is complex. On the other hand, support vector machines (SVMs) are among the most robust prediction models but do not take account of the high correlations between repeated measurements and cannot be used for irregular data. In this work, we propose a novel method to integrate functional principal component analysis with SVM techniques for classification and regression to account for the continuous nature of functional data and the nonlinear relationship between the scalar response variable and the functional predictors. We demonstrate the performance of our method through extensive simulation experiments and two real data applications: the classification of alcoholics using electroencephalography signals and the prediction of glucobrassicin concentration using near-infrared reflectance spectroscopy. Our method is especially advantageous when the measurement errors in functional predictors are relatively large.
Subjects
Electroencephalography , Support Vector Machine , Humans , Electroencephalography/methods , Spectroscopy, Near-Infrared/methods , Principal Component Analysis , Models, Statistical , Alcoholism/physiopathology , Computer Simulation
ABSTRACT
Dysregulation of microRNAs (miRNAs) is closely associated with refractory human diseases, and the identification of potential associations between small molecule (SM) drugs and miRNAs can provide valuable insights for clinical treatment. Existing computational techniques for inferring potential associations suffer from limitations in terms of accuracy and efficiency. To address these challenges, we devise a novel predictive model called RPCA$\Gamma$NR, in which we propose a new Robust principal component analysis (PCA) framework based on $\gamma$-norm and $l_{2,1}$-norm regularization and design an Augmented Lagrange Multiplier method to optimize it, thereby deriving the association scores. The Gaussian Interaction Profile Kernel Similarity is calculated to capture the similarity information of SMs and miRNAs in known associations. Through extensive evaluation, including Cross Validation Experiments, Independent Validation Experiment, Efficiency Analysis, Ablation Experiment, Matrix Sparsity Analysis, and Case Studies, RPCA$\Gamma$NR outperforms state-of-the-art models concerning accuracy, efficiency and robustness. In conclusion, RPCA$\Gamma$NR can significantly streamline the process of determining SM-miRNA associations, thus contributing to advancements in drug development and disease treatment.
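The Gaussian interaction profile kernel mentioned above is commonly computed from the rows (or columns) of a binary association matrix. The sketch below follows that common definition and is not taken from this paper: the function name `gip_kernel`, the default `gamma_prime=1.0`, and the toy matrix are assumptions, and the robust PCA step itself is not shown.

```python
import math

def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel between rows of an association matrix.

    profiles: list of equal-length 0/1 lists, one per entity (e.g. one row per
    small molecule, one column per miRNA). The bandwidth is normalised by the
    mean squared norm of the profiles, as in the usual formulation.
    """
    n = len(profiles)
    sq_norms = [sum(v * v for v in row) for row in profiles]
    mean_sq = sum(sq_norms) / n
    if mean_sq == 0:
        raise ValueError("all interaction profiles are empty")
    gamma = gamma_prime / mean_sq

    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            kernel[i][j] = math.exp(-gamma * sq_dist)
    return kernel

if __name__ == "__main__":
    # Hypothetical 3 small molecules x 2 miRNAs association matrix.
    associations = [[1, 0], [1, 0], [0, 1]]
    for row in gip_kernel(associations):
        print([round(v, 3) for v in row])
```

Entities with identical known associations get similarity 1, and similarity decays with the number of associations on which two profiles disagree. Applying the same function to the transposed matrix would give the miRNA-side similarity.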
Subjects
Algorithms , MicroRNAs , Humans , Principal Component Analysis , Drug Development , MicroRNAs/genetics , Research Design
ABSTRACT
Genetic similarity matrices are commonly used to assess population substructure (PS) in genetic studies. Through simulation studies and by the application to whole-genome sequencing (WGS) data, we evaluate the performance of three genetic similarity matrices: the unweighted and weighted Jaccard similarity matrices and the genetic relationship matrix. We describe different scenarios that can create numerical pitfalls and lead to incorrect conclusions in some instances. We consider scenarios in which PS is assessed based on loci that are located across the genome ('globally') and based on loci from a specific genomic region ('locally'). We also compare scenarios in which PS is evaluated based on loci from different minor allele frequency bins: common (>5%), low-frequency (5-0.5%) and rare (<0.5%) single-nucleotide variations (SNVs). Overall, we observe that all approaches provide the best clustering performance when computed based on rare SNVs. The performance of the similarity matrices is very similar for common and low-frequency variants, but for rare variants, the unweighted Jaccard matrix provides preferable clustering features. Based on visual inspection and in terms of standard clustering metrics, its clusters are the densest and the best separated in the principal component analysis of variants with rare SNVs compared with the other methods and different allele frequency cutoffs. In an application, we assessed the role of rare variants on local and global PS, using WGS data from multiethnic Alzheimer's disease data sets and European or East Asian populations from the 1000 Genomes Project.
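As a rough illustration of the unweighted Jaccard similarity matrix, the sketch below treats each individual as the set of variants at which they carry a minor allele. The abstract does not give the exact definitions used in the study (in particular for the weighted variant), so the function name `jaccard_matrix`, the set-based encoding, and the conventions for empty carrier sets are assumptions.

```python
def jaccard_matrix(carriers):
    """Unweighted Jaccard similarity between individuals.

    carriers: list of sets, one per individual, each holding the IDs of the
    variants at which that individual carries at least one minor allele.
    Assumed conventions: the diagonal is 1.0, and two individuals who both
    carry no variants get similarity 0.0.
    """
    n = len(carriers)
    sim = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            union = carriers[i] | carriers[j]
            shared = carriers[i] & carriers[j]
            value = len(shared) / len(union) if union else 0.0
            sim[i][j] = sim[j][i] = value
    return sim

if __name__ == "__main__":
    # Hypothetical rare-variant carrier sets for three individuals.
    people = [{"v1", "v2"}, {"v2", "v3"}, {"v4"}]
    for row in jaccard_matrix(people):
        print([round(v, 3) for v in row])
```

Because the index only counts variants carried by at least one of the two individuals, shared absence of a rare allele does not inflate similarity. That property is one plausible reason such a matrix could behave differently from a genetic relationship matrix on rare SNVs, though the abstract itself does not state the mechanism.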
Subjects
Genome , Genomics , Principal Component Analysis , Gene Frequency , Computer Simulation , Genome-Wide Association Study , Polymorphism, Single Nucleotide
ABSTRACT
Polygenic risk score (PRS) has been recently developed for predicting complex traits and drug responses. It remains unknown whether multi-trait PRS (mtPRS) methods, by integrating information from multiple genetically correlated traits, can improve prediction accuracy and power for PRS analysis compared with single-trait PRS (stPRS) methods. In this paper, we first review commonly used mtPRS methods and find that they do not directly model the underlying genetic correlations among traits, which has been shown to be useful in guiding multi-trait association analysis in the literature. To overcome this limitation, we propose a mtPRS-PCA method to combine PRSs from multiple traits with weights obtained from performing principal component analysis (PCA) on the genetic correlation matrix. To accommodate various genetic architectures covering different effect directions, signal sparseness and across-trait correlation structures, we further propose an omnibus mtPRS method (mtPRS-O) by combining P values from mtPRS-PCA, mtPRS-ML (mtPRS based on machine learning) and stPRSs using Cauchy Combination Test. Our extensive simulation studies show that mtPRS-PCA outperforms other mtPRS methods in both disease and pharmacogenomics (PGx) genome-wide association studies (GWAS) contexts when traits are similarly correlated, with dense signal effects and in similar effect directions, and mtPRS-O is consistently superior to most other methods due to its robustness under various genetic architectures. We further apply mtPRS-PCA, mtPRS-O and other methods to PGx GWAS data from a randomized clinical trial in the cardiovascular domain and demonstrate performance improvement of mtPRS-PCA in both prediction accuracy and patient stratification as well as the robustness of mtPRS-O in PRS association test.
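The omnibus step described above combines P values with the Cauchy Combination Test. A minimal sketch of that test in its standard form follows: each P value is mapped to a Cauchy variate, the variates are averaged with weights, and the result is converted back to a P value. The function name `cauchy_combination` and the default equal weights are assumptions, and the inputs stand in for the mtPRS-PCA, mtPRS-ML and stPRS P values, which are not computed here.

```python
import math

def cauchy_combination(p_values, weights=None):
    """Combine P values with the Cauchy combination test.

    p_values: iterable of P values strictly between 0 and 1.
    weights: optional non-negative weights; equal weights are assumed by default.
    """
    p_values = list(p_values)
    if weights is None:
        weights = [1.0] * len(p_values)
    total = sum(weights)
    # Each term tan((0.5 - p) * pi) is standard Cauchy under the null.
    statistic = sum(w * math.tan((0.5 - p) * math.pi)
                    for w, p in zip(weights, p_values)) / total
    # Upper-tail probability of a standard Cauchy at the combined statistic.
    return 0.5 - math.atan(statistic) / math.pi

if __name__ == "__main__":
    # Hypothetical P values from three component tests.
    print(cauchy_combination([0.001, 0.20, 0.80]))
```

The combined P value tends to be driven by the smallest inputs, so a strong signal from any one component test is largely preserved. This is consistent with the robustness across genetic architectures that the abstract attributes to mtPRS-O.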
Subjects
Genome-Wide Association Study , Multifactorial Inheritance , Humans , Genome-Wide Association Study/methods , Pharmacogenetics , Polymorphism, Single Nucleotide , Phenotype , Genetic Predisposition to Disease
ABSTRACT
Successful surgical treatment of drug-resistant epilepsy traditionally relies on the identification of seizure onset zones (SOZs). Connectome-based analyses of electrographic data from stereo electroencephalography (SEEG) may enable improved detection of SOZs. Specifically, connectome-based analyses based on the interictal suppression hypothesis (ISH) posit that when the patient is not having a seizure, SOZs are inhibited by non-SOZs through high inward connectivity and low outward connectivity. However, it is not clear whether there are other motifs that can better identify potential SOZs. Thus, we sought to use unsupervised machine learning to identify network motifs that elucidate SOZs and investigate whether another motif outperforms the ISH. Resting-state SEEG data from 81 patients with drug-resistant epilepsy undergoing a pre-surgical evaluation at Vanderbilt University Medical Center were collected. Directed connectivity matrices were computed using the alpha band (8-13 Hz). Principal component analysis (PCA) was performed on each patient's connectivity matrix. Each patient's components were analysed qualitatively to identify common patterns across patients. A quantitative definition was then used to identify the component that most closely matched the observed pattern in each patient. A motif characteristic of the ISH (high-inward and low-outward connectivity) was present in all individuals and found to be the most robust motif for identification of SOZs in 64/81 (79%) patients. This principal component demonstrated significant differences in SOZs compared to non-SOZs. While other motifs for identifying SOZs were present in other patients, they differed for each patient, suggesting that seizure networks are patient specific, but the ISH is present in nearly all networks.
We discovered that a potentially suppressive motif based on the interictal suppression hypothesis was present in all patients, and it was the most robust motif for SOZs in 79% of patients. Each patient had additional motifs that further characterized SOZs, but these motifs were not common across all patients. This work has the potential to augment clinical identification of SOZs to improve epilepsy treatment.
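The high-inward, low-outward motif can be made concrete with a simple per-node summary of a directed connectivity matrix. This sketch is not the study's PCA pipeline. The function names, the convention that `conn[i][j]` is the connection from node i to node j, and the toy matrix are assumptions made for illustration.

```python
def inward_outward_strength(conn):
    """Per-node inward and outward strength of a directed connectivity matrix.

    conn[i][j] is assumed to be the connectivity from node i to node j;
    self-connections on the diagonal are ignored.
    """
    n = len(conn)
    outward = [sum(conn[i][j] for j in range(n) if j != i) for i in range(n)]
    inward = [sum(conn[j][i] for j in range(n) if j != i) for i in range(n)]
    return inward, outward

def suppression_score(conn):
    """Inward minus outward strength; large values match the suppression motif."""
    inward, outward = inward_outward_strength(conn)
    return [i - o for i, o in zip(inward, outward)]

if __name__ == "__main__":
    # Hypothetical 3-node network in which node 0 receives strong input
    # from the other nodes but sends little back.
    conn = [
        [0.0, 0.1, 0.1],
        [0.9, 0.0, 0.2],
        [0.8, 0.2, 0.0],
    ]
    scores = suppression_score(conn)
    print("scores:", [round(s, 2) for s in scores])
    print("most suppressed node:", scores.index(max(scores)))
```

Under the interictal suppression hypothesis, nodes with the highest scores would be candidate SOZs. In the study the motif is instead identified as a principal component of each patient's connectivity matrix, which this summary does not attempt to reproduce.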
Subjects
Connectome , Drug Resistant Epilepsy , Electroencephalography , Epilepsies, Partial , Seizures , Humans , Epilepsies, Partial/physiopathology , Epilepsies, Partial/surgery , Male , Female , Adult , Electroencephalography/methods , Drug Resistant Epilepsy/physiopathology , Drug Resistant Epilepsy/surgery , Seizures/physiopathology , Connectome/methods , Young Adult , Middle Aged , Adolescent , Brain/physiopathology , Unsupervised Machine Learning
ABSTRACT
Multi-principal element alloys (MPEAs) exhibit outstanding mechanical properties because the core effect of severe atomic lattice distortion is distinctly different from that of traditional alloys. However, at the mesoscopic scale the underlying physics for the abundant dislocation activities responsible for strength-ductility synergy has not been uncovered. While the Eshelby mean-field approaches become insufficient to tackle yielding and plasticity in severely distorted crystalline solids, here we develop a three-dimensional discrete dislocation dynamics simulation approach by taking into account the experimentally measured lattice strain field from a model FeCoCrNiMn MPEA to explore the heterogeneous strain-induced strengthening mechanisms. Our results reveal that the heterogeneous lattice strain causes unusual dislocation behaviors (i.e., multiple kinks/jogs and bidirectional cross slips), resulting in the strengthening mechanisms that underpin the strength-ductility synergy. The outcome of our research provides important insights into the design of strong yet ductile distorted crystalline solids, such as high-entropy alloys and high-entropy ceramics.
ABSTRACT
Principal component analysis (PCA) is an important and widely used unsupervised learning method that determines population structure based on genetic variation. Genome sequencing of thousands of individuals usually generates tens of millions of SNPs, making PCA and its interpretation challenging. Here we present VCF2PCACluster, a simple, fast and memory-efficient tool for Kinship estimation, PCA and clustering analysis, and visualization based on VCF formatted SNPs. We implemented five Kinship estimation methods and three clustering methods for its users to choose from. Moreover, unlike other PCA tools, VCF2PCACluster possesses a clustering function based on the PCA result, which enables users to identify population structure automatically and clearly. We demonstrated that this tool achieves the same accuracy as, but higher performance than, the popular PLINK2 software when performing PCA on tens of millions of SNPs, especially in peak memory usage, which in VCF2PCACluster is independent of the number of SNPs.