Results 1 - 20 of 821
1.
Cell ; 185(6): 1065-1081.e23, 2022 03 17.
Article in English | MEDLINE | ID: mdl-35245431

ABSTRACT

Motor behaviors are often planned long before execution but only released after specific sensory events. Planning and execution are each associated with distinct patterns of motor cortex activity. Key questions are how these dynamic activity patterns are generated and how they relate to behavior. Here, we investigate the multi-regional neural circuits that link an auditory "Go cue" and the transition from planning to execution of directional licking. Ascending glutamatergic neurons in the midbrain reticular and pedunculopontine nuclei show short-latency, phasic changes in spike rate that are selective for the Go cue. This signal is transmitted via the thalamus to the motor cortex, where it triggers a rapid reorganization of motor cortex state from planning-related activity to a motor command, which in turn drives appropriate movement. Our studies show how the midbrain can control cortical dynamics via the thalamus for rapid and precise motor behavior.


Subject(s)
Motor Cortex, Movement, Thalamus, Animals, Mesencephalon, Mice, Motor Cortex/physiology, Neurons/physiology, Thalamus/physiology
2.
Annu Rev Neurosci ; 45: 249-271, 2022 07 08.
Article in English | MEDLINE | ID: mdl-35316610

ABSTRACT

The brain plans and executes volitional movements. The underlying patterns of neural population activity have been explored in the context of movements of the eyes, limbs, tongue, and head in nonhuman primates and rodents. How do networks of neurons produce the slow neural dynamics that prepare specific movements and the fast dynamics that ultimately initiate these movements? Recent work exploits rapid and calibrated perturbations of neural activity to test specific dynamical systems models that are capable of producing the observed neural activity. These joint experimental and computational studies show that cortical dynamics during motor planning reflect fixed points of neural activity (attractors). Subcortical control signals reshape and move attractors over multiple timescales, causing commitment to specific actions and rapid transitions to movement execution. Experiments in rodents are beginning to reveal how these algorithms are implemented at the level of brain-wide neural circuits.


Subject(s)
Motor Cortex, Algorithms, Animals, Brain/physiology, Motor Cortex/physiology, Movement/physiology, Neurons/physiology
3.
Proc Natl Acad Sci U S A ; 121(12): e2317284121, 2024 Mar 19.
Article in English | MEDLINE | ID: mdl-38478692

ABSTRACT

Since its emergence in late 2019, SARS-CoV-2 has diversified into a large number of lineages and caused multiple waves of infection globally. Novel lineages have the potential to spread rapidly and internationally if they have higher intrinsic transmissibility and/or can evade host immune responses, as has been seen with the Alpha, Delta, and Omicron variants of concern. They can also cause increased mortality and morbidity if they have increased virulence, as was seen for Alpha and Delta. Phylogenetic methods provide the "gold standard" for representing the global diversity of SARS-CoV-2 and for identifying newly emerging lineages. However, these methods are computationally expensive, struggle when datasets get too large, and require manual curation to designate new lineages. These challenges motivate complementary methods that can incorporate all of the available genetic data, without down-sampling, to extract meaningful information rapidly and with minimal curation. In this paper, we demonstrate the utility of algorithmic approaches based on word statistics to represent whole sequences, bringing speed, scalability, and interpretability to the construction of genetic topologies. While not a substitute for current phylogenetic analyses, the proposed methods can be used as a complementary, and fully automatable, approach to identify and confirm newly emerging variants.
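
The abstract above does not give implementation details, but the general idea of a word-statistics (k-mer) representation of whole genomes can be sketched as follows; this is an illustrative example with made-up sequences, not the authors' code.

```python
# Hypothetical sketch of a k-mer (word-statistics) representation of genomes.
from itertools import product
import numpy as np

def kmer_vector(seq, k=3):
    """Return a normalized k-mer frequency vector for a nucleotide sequence."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = np.zeros(len(kmers))
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in index:          # skip words containing ambiguous bases such as 'N'
            counts[index[word]] += 1
    total = counts.sum()
    return counts / total if total > 0 else counts

# Pairwise distances between k-mer profiles can then feed a clustering or
# embedding step to group related lineages without building a full phylogeny.
genomes = {"seq_a": "ACGTACGTGGCA", "seq_b": "ACGTACGAGGCA"}
vectors = {name: kmer_vector(s, k=3) for name, s in genomes.items()}
dist = np.linalg.norm(vectors["seq_a"] - vectors["seq_b"])
print(f"Euclidean distance between k-mer profiles: {dist:.4f}")
```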


Subject(s)
COVID-19, SARS-CoV-2, Humans, SARS-CoV-2/genetics, COVID-19/epidemiology, Phylogeny, Machine Learning
4.
Trends Immunol ; 44(5): 329-332, 2023 05.
Article in English | MEDLINE | ID: mdl-36997459

ABSTRACT

Profiling immune responses across several dimensions, including time, patients, molecular features, and tissue sites, can deepen our understanding of immunity as an integrated system. These studies require new analytical approaches to realize their full potential. We highlight recent applications of tensor methods and discuss several future opportunities.


Subject(s)
Communicable Diseases, Immunity, Humans
5.
Proc Natl Acad Sci U S A ; 120(48): e2311420120, 2023 Nov 28.
Article in English | MEDLINE | ID: mdl-37988465

ABSTRACT

Principal component analysis (PCA) is a dimensionality reduction method that is known for being simple and easy to interpret. Principal components are often interpreted as low-dimensional patterns in high-dimensional space. However, this simple interpretation fails for time series, spatial maps, and other continuous data. In these cases, nonoscillatory data may have oscillatory principal components. Here, we show that two common properties of data cause oscillatory principal components: smoothness and shifts in time or space. These two properties apply to almost all neuroscience data. We show how the oscillations produced by PCA, which we call "phantom oscillations," impact data analysis. We also show that traditional cross-validation does not detect phantom oscillations, so we suggest procedures that do. Our findings are supported by a collection of mathematical proofs. Collectively, our work demonstrates that patterns which emerge from high-dimensional data analysis may not faithfully represent the underlying data.
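
A minimal sketch (toy data, not the authors' code) of how smooth, time-shifted signals can produce oscillatory principal components:

```python
# Smooth Gaussian bumps shifted in time yield sinusoid-like principal components,
# even though no individual trial oscillates.
import numpy as np
from sklearn.decomposition import PCA

t = np.linspace(0, 1, 200)
shifts = np.linspace(0.2, 0.8, 50)
data = np.array([np.exp(-((t - s) ** 2) / (2 * 0.05 ** 2)) for s in shifts])

pca = PCA(n_components=3).fit(data)
# Plotting pca.components_ makes the oscillations visible; here we just count
# zero crossings of each component across the time axis.
for i, comp in enumerate(pca.components_):
    zero_crossings = np.sum(np.diff(np.sign(comp)) != 0)
    print(f"PC{i + 1}: {zero_crossings} zero crossings")
```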

6.
Proc Natl Acad Sci U S A ; 120(40): e2303523120, 2023 10 03.
Article in English | MEDLINE | ID: mdl-37748075

ABSTRACT

Sensorimotor transformation is the process of first sensing an object in the environment and then producing a movement in response to that stimulus. For visually guided saccades, neurons in the superior colliculus (SC) emit a burst of spikes to register the appearance of a stimulus, and many of the same neurons discharge another burst to initiate the eye movement. We investigated whether the neural signatures of sensation and action in SC depend on context. Spiking activity along the dorsoventral axis was recorded with a laminar probe as Rhesus monkeys generated saccades to the same stimulus location in tasks that require either executive control to delay saccade onset until permission is granted or the production of an immediate response to a target whose onset is predictable. Using dimensionality reduction and discriminability methods, we show that the subspaces occupied during the visual and motor epochs were both distinct within each task and differentiable across tasks. Single-unit analyses, in contrast, show that the movement-related activity of SC neurons was not different between tasks. These results demonstrate that statistical features in neural activity of simultaneously recorded ensembles provide more insight than single neurons. They also indicate that cognitive processes associated with task requirements are multiplexed in SC population activity during both sensation and action and that downstream structures could use this activity to extract context. Additionally, the entire manifolds associated with sensory and motor responses, respectively, may be larger than the subspaces explored within a certain set of experiments.
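
One standard way to make the subspace comparison described above concrete is to compute principal angles between the top principal-component subspaces of two epochs. The sketch below uses simulated population activity and generic tools; it is not the paper's analysis pipeline.

```python
# Hedged sketch: compare neural subspaces across two epochs via principal angles.
import numpy as np
from scipy.linalg import subspace_angles
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Simulated population activity: trials x neurons for two epochs.
visual_epoch = rng.normal(size=(200, 40))
motor_epoch = rng.normal(size=(200, 40)) + visual_epoch @ rng.normal(size=(40, 40)) * 0.1

def top_subspace(X, n_components=5):
    """Orthonormal basis (neurons x components) for the top principal components of X."""
    pca = PCA(n_components=n_components).fit(X)
    return pca.components_.T

angles = subspace_angles(top_subspace(visual_epoch), top_subspace(motor_epoch))
print("Principal angles (degrees):", np.degrees(angles).round(1))
```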


Subject(s)
Body Fluids, Superior Colliculi, Animals, Eye Movements, Neurons, Macaca mulatta, Sensation
7.
Brief Bioinform ; 25(1)2023 11 22.
Article in English | MEDLINE | ID: mdl-38018908

ABSTRACT

Multi-omic analyses are necessary not only to understand the complex biological processes taking place at the tissue and cell level, but also to make reliable predictions about, for example, disease outcome. Several linear methods exist that create a joint embedding using paired information per sample, but recently there has been a rise in the popularity of neural architectures that embed paired -omics into the same non-linear manifold. This work describes a head-to-head comparison of linear and non-linear joint embedding methods using both bulk and single-cell multi-modal datasets. We found that non-linear methods have a clear advantage over linear ones for missing modality imputation. Performance comparisons in the downstream tasks of survival analysis for bulk tumor data and cell type classification for single-cell data lead to the following insights. First, concatenating the principal components of each modality is a competitive baseline and hard to beat if all modalities are available at test time. However, if only one modality is available at test time, training a predictive model on the joint space of that modality can improve performance compared with using the unimodal principal components alone. Second, -omic profiles imputed by neural joint embedding methods are realistic enough to be used by a classifier trained on real data with limited performance drops. Taken together, our comparisons give hints as to which joint embedding to use for which downstream task. Overall, product-of-experts performed well in most tasks and was reasonably fast, while early integration (concatenation) of modalities did quite poorly.
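
The "concatenated principal components" baseline mentioned above can be sketched in a few lines; the data shapes here are placeholders, not the study's datasets.

```python
# Hedged sketch of the concatenated-PCs baseline for paired multi-omic data.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
rna = rng.normal(size=(500, 2000))      # samples x genes
protein = rng.normal(size=(500, 100))   # samples x proteins (paired rows)

def pcs(X, n_components=20):
    return PCA(n_components=n_components).fit_transform(X)

# Joint embedding: per-modality PCs concatenated along the feature axis.
joint = np.concatenate([pcs(rna), pcs(protein)], axis=1)
print(joint.shape)  # (500, 40) -- ready for survival or cell-type models
```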


Subject(s)
Multiomics, Neoplasms, Humans
8.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36458451

ABSTRACT

In epistasis analysis, single-nucleotide polymorphism-single-nucleotide polymorphism interactions (SSIs) among genes may, alongside other environmental factors, influence the risk of multifactorial diseases. When identifying SSIs that distinguish cases from controls (i.e., binary traits), the model-quality score is affected by the choice of objective function (i.e., measurement) because of potential disease model preferences and disease complexity. Our previous study proposed a multiobjective-approach-based multifactor dimensionality reduction (MOMDR), with results indicating that two objective functions could enhance the identification of SSIs with weak marginal effects. However, SSI identification using MOMDR remains a challenge because the optimal combination of objective functions has yet to be investigated. This study extended MOMDR to a many-objective version (many-objective MDR, MaODR) by integrating various disease probability measures based on a two-way contingency table to improve the identification of SSIs between cases and controls. We introduce an objective function selection approach to determine the optimal combination of measures in MaODR among 10 well-known measures. In total, 6 disease models with and 40 disease models without marginal effects were used to evaluate the general algorithms, namely those based on multifactor dimensionality reduction, MOMDR, and MaODR. Our results revealed that the MaODR model based on three objective functions (correct classification rate, likelihood ratio, and normalized mutual information; MaODR-CLN) achieved detection success rates (accuracy) 6.47% higher than MOMDR and 17.23% higher than MDR when the objective function selection approach was applied. In Wellcome Trust Case Control Consortium data, MaODR-CLN successfully identified significant SSIs (P < 0.001) associated with coronary artery disease. We performed a systematic analysis to identify the optimal combination of measures in MaODR among 10 objective functions. Our combination detected SSIs for binary traits with weak marginal effects and thus reduced spurious variables in the score model. MOAI is freely available at https://sites.google.com/view/maodr/home.
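
To make the contingency-table objective functions concrete, the sketch below scores a toy two-SNP genotype partition against case/control status using three of the named measures. It illustrates the general MDR-style scheme with generic tools and simulated data; it is not the MaODR implementation.

```python
# Hedged sketch: contingency-table measures for a two-SNP, case/control partition.
import numpy as np
from scipy.stats import chi2_contingency
from sklearn.metrics import normalized_mutual_info_score

rng = np.random.default_rng(2)
snp1 = rng.integers(0, 3, size=1000)    # genotypes coded 0/1/2
snp2 = rng.integers(0, 3, size=1000)
status = rng.integers(0, 2, size=1000)  # 0 = control, 1 = case

# MDR-style partition: a genotype cell is "high risk" if its case fraction
# exceeds the overall case fraction.
combo = snp1 * 3 + snp2
overall_ratio = status.mean()
cell_ratio = {c: status[combo == c].mean() for c in np.unique(combo)}
predicted = np.array([cell_ratio[c] > overall_ratio for c in combo], dtype=int)

table = np.array([[np.sum((predicted == p) & (status == s)) for s in (0, 1)]
                  for p in (0, 1)])
ccr = (table[0, 0] + table[1, 1]) / table.sum()                        # correct classification rate
g_stat, _, _, _ = chi2_contingency(table, lambda_="log-likelihood")   # likelihood ratio statistic
nmi = normalized_mutual_info_score(status, predicted)                 # normalized mutual information
print(f"CCR={ccr:.3f}  likelihood-ratio G={g_stat:.2f}  NMI={nmi:.3f}")
```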


Subject(s)
Genetic Epistasis, Genetic Models, Algorithms, Phenotype, Multifactor Dimensionality Reduction/methods, Single Nucleotide Polymorphism
9.
Brief Bioinform ; 24(3)2023 05 19.
Article in English | MEDLINE | ID: mdl-37088976

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) is a revolutionary breakthrough that measures precise gene expression in individual cells and deciphers cell heterogeneity and subpopulations. However, scRNA-seq data are much noisier than traditional high-throughput RNA-seq data because of technical limitations, so many scRNA-seq studies of dimensionality reduction and visualization remain at a basic data-stacking stage. In this study, we propose an improved variational autoencoder model (termed DREAM) for dimensionality reduction and visual analysis of scRNA-seq data. DREAM combines a variational autoencoder with a Gaussian mixture model for cell type identification, while explicitly handling "dropout" events by introducing a zero-inflated layer to obtain a low-dimensional representation that describes the changes in the original scRNA-seq dataset. Benchmarking comparisons across nine scRNA-seq datasets show that DREAM outperforms four state-of-the-art methods on average. Moreover, we show that DREAM can accurately capture the expression dynamics of human preimplantation embryonic development. DREAM is implemented in Python and freely available via the GitHub website, https://github.com/Crystal-JJ/DREAM.
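
For readers unfamiliar with the building blocks, here is a compact, generic PyTorch sketch of a variational autoencoder with a zero-inflation (dropout-probability) output head. It only illustrates the idea; it is not DREAM and omits the Gaussian mixture prior, the loss, and training.

```python
# Generic zero-inflated VAE skeleton (illustration only, not DREAM).
import torch
import torch.nn as nn

class TinyZIVAE(nn.Module):
    def __init__(self, n_genes, n_latent=10):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_genes, 128), nn.ReLU())
        self.mu = nn.Linear(128, n_latent)
        self.logvar = nn.Linear(128, n_latent)
        self.decoder = nn.Sequential(nn.Linear(n_latent, 128), nn.ReLU())
        self.mean_out = nn.Linear(128, n_genes)       # reconstructed expression
        self.dropout_logit = nn.Linear(128, n_genes)  # zero-inflation probability

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        d = self.decoder(z)
        return self.mean_out(d), torch.sigmoid(self.dropout_logit(d)), mu, logvar

model = TinyZIVAE(n_genes=2000)
x = torch.randn(8, 2000)                # a toy mini-batch of 8 cells
recon, dropout_prob, mu, logvar = model(x)
print(recon.shape, dropout_prob.shape)
```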


Subject(s)
Single-Cell Analysis, Single-Cell Gene Expression Analysis, Humans, RNA Sequence Analysis/methods, Single-Cell Analysis/methods, RNA-Seq, Gene Expression Profiling/methods, Cluster Analysis
10.
Brief Bioinform ; 24(4)2023 07 20.
Article in English | MEDLINE | ID: mdl-37418278

ABSTRACT

Proteins are dynamic macromolecules that perform vital functions in cells. A protein structure determines its function, but this structure is not static, as proteins change their conformation to achieve various functions. Understanding the conformational landscapes of proteins is essential to understand their mechanism of action. Sets of carefully chosen conformations can summarize such complex landscapes and provide better insights into protein function than single conformations. We refer to these sets as representative conformational ensembles. Recent advances in computational methods have led to an increase in the number of available structural datasets spanning conformational landscapes. However, extracting representative conformational ensembles from such datasets is not an easy task and many methods have been developed to tackle it. Our new approach, EnGens (short for ensemble generation), collects these methods into a unified framework for generating and analyzing representative protein conformational ensembles. In this work, we: (1) provide an overview of existing methods and tools for representative protein structural ensemble generation and analysis; (2) unify existing approaches in an open-source Python package, and a portable Docker image, providing interactive visualizations within a Jupyter Notebook pipeline; (3) test our pipeline on a few canonical examples from the literature. Representative ensembles produced by EnGens can be used for many downstream tasks such as protein-ligand ensemble docking, Markov state modeling of protein dynamics and analysis of the effect of single-point mutations.
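
A hedged sketch of the generic workflow that tools like EnGens wrap (featurize conformations, reduce dimensionality, cluster, keep the frame nearest each cluster center), using toy coordinates and standard scikit-learn components rather than the package itself:

```python
# Hedged sketch: pick representative conformations from a structural ensemble.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)
# Toy trajectory: 300 frames x 50 atoms x 3 coordinates.
frames = rng.normal(size=(300, 50, 3))

def featurize(frame):
    """Flattened pairwise distances: a simple rotation/translation-invariant feature."""
    diff = frame[:, None, :] - frame[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1))
    iu = np.triu_indices(len(frame), k=1)
    return dist[iu]

features = np.array([featurize(f) for f in frames])
reduced = PCA(n_components=5).fit_transform(features)
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(reduced)

# Representative ensemble: the frame nearest each cluster centroid.
representatives = [int(np.argmin(np.linalg.norm(reduced - c, axis=1)))
                   for c in kmeans.cluster_centers_]
print("Representative frame indices:", representatives)
```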


Subject(s)
Molecular Dynamics Simulation, Proteins, Protein Conformation, Proteins/chemistry
11.
Brief Bioinform ; 24(3)2023 05 19.
Article in English | MEDLINE | ID: mdl-36946414

ABSTRACT

In an era of constantly increasing amounts of available protein data, relevant and interpretable visualization becomes crucial, especially for tasks requiring human expertise. The Poincaré disk projection has previously proven efficient for visualizing biological data such as single-cell RNA-seq data. Here, we develop a new method, PoincaréMSA, for the visual representation of complex relationships between protein sequences based on Poincaré map embeddings. We demonstrate its efficiency and potential for visualizing protein family topology as well as for the evolutionary and functional annotation of uncharacterized sequences. PoincaréMSA is implemented in open-source Python code with interactive Google Colab notebooks available as described at https://www.dsimb.inserm.fr/POINCARE_MSA.
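
The hyperbolic geometry behind Poincaré maps can be summarized by the standard Poincaré-disk distance; the snippet below implements that formula on toy points and is not code from PoincaréMSA.

```python
# Standard Poincaré-disk (hyperbolic) distance between points inside the unit ball.
import numpy as np

def poincare_distance(u, v):
    diff_sq = np.sum((u - v) ** 2)
    denom = (1.0 - np.sum(u ** 2)) * (1.0 - np.sum(v ** 2))
    return np.arccosh(1.0 + 2.0 * diff_sq / denom)

u = np.array([0.1, 0.2])
v = np.array([0.7, -0.5])
print(f"Poincare distance: {poincare_distance(u, v):.3f}")
# Points near the disk boundary are far from everything else, which is what lets
# hyperbolic embeddings represent tree-like (hierarchical) protein families.
```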


Subject(s)
Proteins, Software, Humans, Amino Acid Sequence, Biological Evolution
12.
Bioinformatics ; 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39172488

ABSTRACT

MOTIVATION: Single-cell RNA sequencing (scRNA-seq) enables comprehensive characterization of the cell state. However, its destructive nature prohibits measuring gene expression changes during dynamic processes such as embryogenesis. Although recent studies integrating scRNA-seq with lineage tracing have provided clonal insights between progenitor and mature cells, challenges remain. Because of their experimental nature, observations are sparse, and cells observed in the early state are not the exact progenitors of cells observed at later time points. To overcome these limitations, we developed LineageVAE, a novel computational methodology that utilizes deep learning based on the property that cells sharing barcodes have identical progenitors. RESULTS: LineageVAE is a deep generative model that transforms scRNA-seq observations with identical lineage barcodes into sequential trajectories toward a common progenitor in a latent cell state space. This method enables the reconstruction of unobservable cell state transitions, historical transcriptomes, and regulatory dynamics at a single-cell resolution. Applied to hematopoiesis and reprogrammed fibroblast datasets, LineageVAE demonstrated its ability to restore backward cell state transitions and infer progenitor heterogeneity and transcription factor activity along differentiation trajectories. AVAILABILITY AND IMPLEMENTATION: The LineageVAE model was implemented in Python using the PyTorch deep learning library. The code is available on GitHub at https://github.com/LzrRacer/LineageVAE/. SUPPLEMENTARY INFORMATION: Available at Bioinformatics online.

13.
Proc Natl Acad Sci U S A ; 119(26): e2113651119, 2022 06 28.
Article in English | MEDLINE | ID: mdl-35737842

ABSTRACT

The high-dimensional character of most biological systems presents genuine challenges for modeling and prediction. Here we propose a neural network-based approach for dimensionality reduction and analysis of biological gene expression data, using, as a case study, a well-known genetic network in the early Drosophila embryo, the gap gene patterning system. We build an autoencoder compressing the dynamics of spatial gap gene expression into a two-dimensional (2D) latent map. The resulting 2D dynamics suggests an almost linear model, with a small, bare set of essential interactions. Maternally defined spatial modes control gap gene positioning, without the classically assumed intricate set of repressive gap gene interactions. This, surprisingly, predicts minimal changes in neighboring gap domains when gap genes are knocked out, consistent with previous observations. Latent space geometries in maternal mutants are also consistent with the existence of such spatial modes. Finally, we show how positional information is well defined and interpretable as a polar angle in latent space. Our work illustrates how optimization of small neural networks on medium-sized biological datasets is sufficiently informative to capture essential underlying mechanisms of network function.


Subject(s)
Drosophila Proteins, Gene Regulatory Networks, Neural Networks (Computer), Animals, Drosophila/embryology, Drosophila/genetics, Drosophila Proteins/genetics, Drosophila Proteins/metabolism, Genetic Models
14.
BMC Bioinformatics ; 25(1): 171, 2024 Apr 30.
Article in English | MEDLINE | ID: mdl-38689234

ABSTRACT

BACKGROUND: Recent developments in single-cell RNA sequencing have opened up a multitude of possibilities to study tissues at the level of cellular populations. However, the heterogeneity in single-cell sequencing data necessitates appropriate procedures to adjust for technological limitations and various sources of noise when integrating datasets from different studies. While many analysis procedures employ various preprocessing steps, they often overlook the importance of selecting and optimizing the employed data transformation methods. RESULTS: This work investigates data transformation approaches used in single-cell clustering analysis tools and their effects on batch integration analysis. In particular, we compare 16 transformations and their impact on the low-dimensional representations, aiming to reduce the batch effect and integrate multiple single-cell sequencing datasets. Our results show that data transformations strongly influence the results of single-cell clustering in low-dimensional spaces, such as those generated by UMAP or PCA. Moreover, these changes in low-dimensional space significantly affect trajectory analysis using multiple datasets. However, the performance of the data transformations varied greatly across datasets, and the optimal method differed for each dataset. Additionally, we explored how data transformation impacts the analysis of deep feature encodings using deep neural network-based models, including autoencoder-based models and prototypical networks; data transformation also strongly affects the outcome of these models. CONCLUSIONS: Our findings suggest that the batch effect and noise in integrative analysis are highly influenced by data transformation. Low-dimensional features can integrate different batches well when a proper data transformation is applied. Furthermore, we found that the batch mixing score on the low-dimensional space can guide the selection of the optimal data transformation. In conclusion, data preprocessing is one of the most crucial analysis steps and needs to be considered carefully in the integrative analysis of multiple scRNA-seq datasets.
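
A hedged sketch of the kind of comparison described above: apply different transformations, embed with PCA, and score batch mixing with a simple k-nearest-neighbor statistic on simulated counts. The mixing score here is only illustrative, not the metric used in the paper.

```python
# Compare transformations by how well PCA embeddings mix two simulated batches.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(4)
counts = rng.poisson(2.0, size=(400, 1000)).astype(float)
batch = np.repeat([0, 1], 200)
counts[batch == 1] *= 1.5            # a crude simulated batch effect

def batch_mixing(embedding, batch, k=15):
    """Mean fraction of neighbors from the other batch (~0.5 = well mixed for two equal batches)."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(embedding)
    _, idx = nn.kneighbors(embedding)
    other = batch[idx[:, 1:]] != batch[:, None]
    return other.mean()

for name, transform in {"raw": lambda x: x,
                        "log1p": np.log1p,
                        "sqrt": np.sqrt}.items():
    emb = PCA(n_components=10).fit_transform(transform(counts))
    print(f"{name:>5}: batch mixing = {batch_mixing(emb, batch):.3f}")
```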


Subject(s)
RNA-Seq, Single-Cell Gene Expression Analysis, Humans, Algorithms, Cluster Analysis, Neural Networks (Computer), RNA-Seq/methods, Single-Cell Gene Expression Analysis/methods
15.
BMC Bioinformatics ; 25(1): 167, 2024 Apr 26.
Article in English | MEDLINE | ID: mdl-38671342

ABSTRACT

BACKGROUND: Numerous transcriptomic-based models have been developed to predict or understand the fundamental mechanisms driving biological phenotypes. However, few models have successfully transitioned into clinical practice due to challenges associated with generalizability and interpretability. To address these issues, researchers have turned to dimensionality reduction methods and have begun implementing transfer learning approaches. METHODS: In this study, we aimed to determine the optimal combination of dimensionality reduction and regularization methods for predictive modeling. We applied seven dimensionality reduction methods to various datasets, including two supervised methods (linear optimal low-rank projection and low-rank canonical correlation analysis), two unsupervised methods [principal component analysis and consensus independent component analysis (c-ICA)], and three methods [autoencoder (AE), adversarial variational autoencoder, and c-ICA] within a transfer learning framework, trained on > 140,000 transcriptomic profiles. To assess the performance of the different combinations, we used a cross-validation setup encapsulated within a permutation testing framework, analyzing 30 different transcriptomic datasets with binary phenotypes. Furthermore, we included datasets with small sample sizes and phenotypes of varying degrees of predictability, and we employed independent datasets for validation. RESULTS: Our findings revealed that regularized models without dimensionality reduction achieved the highest predictive performance, challenging the necessity of dimensionality reduction when the primary goal is to achieve optimal predictive performance. However, models using AE and c-ICA with transfer learning for dimensionality reduction showed comparable performance, with enhanced interpretability and robustness of predictors, compared to models using non-dimensionality-reduced data. CONCLUSION: These findings offer valuable insights into the optimal combination of strategies for enhancing the predictive performance, interpretability, and generalizability of transcriptomic-based models.
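
The central comparison above (a regularized model on all features versus the same model on dimensionality-reduced features) can be sketched with scikit-learn on simulated data; the dimensions and hyperparameters below are placeholders, not the study's setup.

```python
# Regularized model without dimensionality reduction vs. PCA + regression, via cross-validation.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

X, y = make_classification(n_samples=200, n_features=5000, n_informative=20, random_state=0)

full = LogisticRegression(penalty="l2", C=0.1, max_iter=5000)
reduced = make_pipeline(PCA(n_components=50), LogisticRegression(max_iter=5000))

print("regularized, no DR :", cross_val_score(full, X, y, cv=5).mean().round(3))
print("PCA + regression   :", cross_val_score(reduced, X, y, cv=5).mean().round(3))
```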


Subject(s)
Phenotype, Transcriptome, Transcriptome/genetics, Humans, Gene Expression Profiling/methods, Machine Learning, Computational Biology/methods, Algorithms, Principal Component Analysis
16.
J Proteome Res ; 23(8): 3088-3095, 2024 Aug 02.
Article in English | MEDLINE | ID: mdl-38690713

ABSTRACT

Spatial segmentation is an essential processing method for image analysis that aims to identify the characteristic suborgans or microregions in mass spectrometry imaging (MSI) data, which is critical for understanding the spatial heterogeneity of biological information and function and the underlying molecular signatures. Owing to the intrinsic characteristics of MSI data, including spectral nonlinearity, high dimensionality, and large data size, common segmentation methods lack the capacity to capture the accurate microregions associated with biological functions. Here we propose an ensemble learning-based spatial segmentation strategy, named eLIMS, that combines a randomized unified manifold approximation and projection (r-UMAP) dimensionality reduction module for extracting significant features with an ensemble pixel clustering module for aggregating the clustering maps from r-UMAP. Three MSI datasets are used to evaluate the performance of eLIMS: mouse fetus, human adenocarcinoma, and mouse brain. Experimental results demonstrate that the proposed method can partition heterogeneous tissues into subregions associated with anatomical structure; for example, the suborgans of the brain region in the mouse fetus data are identified as the dorsal pallium, midbrain, and brainstem. Furthermore, it effectively discovers critical microregions related to physiological and pathological variation, offering new insight into metabolic heterogeneity.
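
The sketch below illustrates an ensemble-clustering idea similar in spirit to the strategy above: several randomized dimensionality reductions and KMeans runs are aggregated into a co-association matrix that is clustered once more to give the final segmentation. It uses generic scikit-learn components (random projections instead of r-UMAP) and is not the eLIMS implementation.

```python
# Hedged sketch of consensus (ensemble) clustering for pixel segmentation.
import numpy as np
from sklearn.random_projection import GaussianRandomProjection
from sklearn.cluster import KMeans, AgglomerativeClustering

rng = np.random.default_rng(5)
pixels = rng.normal(size=(500, 300))      # toy MSI data: pixels x m/z features

n_runs, n_clusters = 10, 4
coassoc = np.zeros((len(pixels), len(pixels)))
for seed in range(n_runs):
    reduced = GaussianRandomProjection(n_components=20, random_state=seed).fit_transform(pixels)
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(reduced)
    coassoc += (labels[:, None] == labels[None, :])
coassoc /= n_runs

# Final segmentation: cluster pixels using their co-association profiles as features.
final = AgglomerativeClustering(n_clusters=n_clusters).fit_predict(coassoc)
print(np.bincount(final))
```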


Subject(s)
Brain, Computer-Assisted Image Processing, Mice, Animals, Humans, Brain/metabolism, Brain/diagnostic imaging, Computer-Assisted Image Processing/methods, Mass Spectrometry/methods, Fetus/metabolism, Algorithms, Cluster Analysis, Adenocarcinoma/metabolism, Adenocarcinoma/pathology, Machine Learning
17.
Neuroimage ; 293: 120625, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38704056

ABSTRACT

Principal component analysis (PCA) has been widely employed for dimensionality reduction prior to multivariate pattern classification (decoding) in EEG research. The goal of the present study was to provide an evaluation of the effectiveness of PCA on decoding accuracy (using support vector machines) across a broad range of experimental paradigms. We evaluated several different PCA variations, including group-based and subject-based component decomposition and the application of Varimax rotation or no rotation. We also varied the numbers of PCs that were retained for the decoding analysis. We evaluated the resulting decoding accuracy for seven common event-related potential components (N170, mismatch negativity, N2pc, P3b, N400, lateralized readiness potential, and error-related negativity). We also examined more challenging decoding tasks, including decoding of face identity, facial expression, stimulus location, and stimulus orientation. The datasets also varied in the number and density of electrode sites. Our findings indicated that none of the PCA approaches consistently improved decoding performance relative to no PCA, and the application of PCA frequently reduced decoding performance. Researchers should therefore be cautious about using PCA prior to decoding EEG data from similar experimental paradigms, populations, and recording setups.
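
The core comparison above (decoding with and without a PCA step) can be sketched as two scikit-learn pipelines evaluated by cross-validation; the simulated trials and dimensions below are placeholders, not the study's EEG data.

```python
# Linear SVM decoding with and without PCA, assessed by cross-validation.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(6)
X = rng.normal(size=(200, 64 * 50))          # 200 trials, 64 channels x 50 time bins
y = rng.integers(0, 2, size=200)
X[y == 1, :100] += 0.3                       # weak class-dependent signal

no_pca = make_pipeline(StandardScaler(), SVC(kernel="linear"))
with_pca = make_pipeline(StandardScaler(), PCA(n_components=20), SVC(kernel="linear"))

print("no PCA  :", cross_val_score(no_pca, X, y, cv=5).mean().round(3))
print("with PCA:", cross_val_score(with_pca, X, y, cv=5).mean().round(3))
```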


Subject(s)
Electroencephalography, Principal Component Analysis, Support Vector Machine, Humans, Electroencephalography/methods, Female, Male, Adult, Young Adult, Evoked Potentials/physiology, Brain/physiology, Computer-Assisted Signal Processing
18.
Mol Biol Evol ; 40(10)2023 10 04.
Article in English | MEDLINE | ID: mdl-37772983

ABSTRACT

Inferences of adaptive events are important for learning about traits, such as human digestion of lactose after infancy and the rapid spread of viral variants. Early efforts toward identifying footprints of natural selection from genomic data involved the development of summary statistic and likelihood methods. However, such techniques are grounded in simple patterns or theoretical models that limit the complexity of settings they can explore. Due to the renaissance in artificial intelligence, machine learning methods have taken center stage in recent efforts to detect natural selection, with strategies such as convolutional neural networks applied to images of haplotypes. Yet, limitations of such techniques include estimation of large numbers of model parameters under nonconvex settings and feature identification without regard to location within an image. An alternative approach is to use tensor decomposition to extract features from multidimensional data while preserving the latent structure of the data, and to feed these features to machine learning models. Here, we adopt this framework and present a novel approach termed T-REx, which extracts features from images of haplotypes across sampled individuals using tensor decomposition, and then makes predictions from these features using classical machine learning methods. As a proof of concept, we explore the performance of T-REx on simulated neutral and selective sweep scenarios and find that it has high power and accuracy to discriminate sweeps from neutrality, robustness to common technical hurdles, and easy visualization of feature importance. Therefore, T-REx is a powerful addition to the toolkit for detecting adaptive processes from genomic data.
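
A hedged sketch of the general "tensor decomposition, then classical classifier" scheme: CP (PARAFAC) factor loadings for the sample mode are used as features for logistic regression. This assumes the tensorly package and toy data; it is not the T-REx software.

```python
# Hedged sketch: CP decomposition features feeding a classical classifier.
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)
# Toy haplotype tensor: 100 samples x 50 SNP positions x 20 haplotypes, binary alleles.
X = rng.integers(0, 2, size=(100, 50, 20)).astype(float)
y = rng.integers(0, 2, size=100)              # e.g., sweep vs. neutral labels

weights, factors = parafac(tl.tensor(X), rank=5, n_iter_max=100)
sample_features = factors[0]                  # 100 x 5 loadings for the sample mode

clf = LogisticRegression(max_iter=1000).fit(sample_features, y)
print("training accuracy:", round(clf.score(sample_features, y), 3))
```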


Subject(s)
Artificial Intelligence, Genomics, Humans, Genomics/methods, Neural Networks (Computer), Machine Learning, Genetic Selection
19.
Am J Epidemiol ; 193(7): 1010-1018, 2024 07 08.
Article in English | MEDLINE | ID: mdl-38375692

ABSTRACT

The statistical analysis of omics data poses a great computational challenge given their ultra-high-dimensional nature and frequent between-feature correlation. In this work, we extended the iterative sure independence screening (ISIS) algorithm by pairing ISIS with elastic net (Enet) and two versions of adaptive elastic net, adaptive elastic net (AEnet) and multistep adaptive elastic net (MSAEnet), to efficiently improve feature selection and effect estimation in omics research. We subsequently used genome-wide human blood DNA methylation data from American Indian participants in the Strong Heart Study (n = 2235 participants; measured in 1989-1991) to compare the performance (predictive accuracy, coefficient estimation, and computational efficiency) of the ISIS-paired regularization methods with that of Bayesian shrinkage and traditional linear regression to identify an epigenomic multimarker of body mass index (BMI). ISIS-AEnet outperformed the other methods in prediction. In biological pathway enrichment analysis of genes annotated to BMI-related differentially methylated positions, ISIS-AEnet captured most of the pathways enriched in common for at least two of the evaluated methods. ISIS-AEnet can favor biological discovery because it identifies the most robust biological pathways while achieving an optimal balance between bias and efficient feature selection. In the extended SIS R package, we also implemented ISIS paired with Cox and logistic regression for time-to-event and binary endpoints, respectively, and a bootstrap approach for the estimation of regression coefficients.
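
The analysis itself was done with the SIS R package, but the two-stage idea (marginal screening followed by a regularized fit) can be sketched in Python on simulated data; everything below (dimensions, outcome, thresholds) is a hypothetical illustration of the scheme, not the study's code.

```python
# Hedged sketch: sure-independence-style screening followed by an elastic-net fit.
import numpy as np
from sklearn.linear_model import ElasticNetCV

rng = np.random.default_rng(8)
n, p = 300, 20000                       # samples x CpG sites (toy dimensions)
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:10] = 0.5                         # ten truly associated features
y = X @ beta + rng.normal(size=n)       # e.g., BMI as a continuous outcome

# Screening: retain the d features with the largest marginal correlation with y.
d = int(n / np.log(n))
Xc = X - X.mean(axis=0)
yc = y - y.mean()
corr = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
keep = np.argsort(corr)[-d:]

# Regularized estimation on the screened feature set.
enet = ElasticNetCV(l1_ratio=0.5, cv=5).fit(X[:, keep], y)
selected = keep[enet.coef_ != 0]
print(f"screened {d} features; elastic net selected {selected.size}")
```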


Subject(s)
Algorithms, Body Mass Index, DNA Methylation, Epigenomics, Humans, Epigenomics/methods, Female, Male, Bayes Theorem, Middle Aged, Genetic Epigenesis, Aged, Biomarkers/blood
20.
Funct Integr Genomics ; 24(5): 139, 2024 Aug 19.
Article in English | MEDLINE | ID: mdl-39158621

ABSTRACT

Recent advances in biomedical technologies and the proliferation of high-dimensional Next Generation Sequencing (NGS) datasets have led to significant growth in the volume and density of data. NGS data are characterized by a large number of genomics, transcriptomics, proteomics, and metagenomics features relative to the number of biological samples, and this high dimensionality poses significant challenges for analysis, including increased computational burden, potential overfitting, and difficulty in interpreting results. Feature selection and feature extraction are two pivotal techniques employed to address these challenges by reducing the dimensionality of the data, thereby enhancing model performance, interpretability, and computational efficiency. Both can be categorized into statistical and machine learning methods. The present study conducts a comprehensive and comparative review of statistical, machine learning, and deep learning-based feature selection and extraction techniques specifically tailored to the interpretation of human NGS and microarray data. A thorough literature search was performed to gather information on these techniques, focusing on array-based and NGS data analysis. Techniques including deep learning architectures, machine learning algorithms, and statistical methods are surveyed here for microarray, bulk RNA-Seq, and single-cell RNA-Seq (scRNA-Seq) datasets. The study provides an overview of these techniques, highlighting their applications, advantages, and limitations in the context of high-dimensional NGS data. This review offers readers better insight into applying feature selection and feature extraction techniques to enhance the performance of predictive models, uncover underlying biological patterns, and gain deeper understanding of massive and complex NGS and microarray data.


Subject(s)
High-Throughput Nucleotide Sequencing, Machine Learning, Humans, High-Throughput Nucleotide Sequencing/methods, Deep Learning