Results 1-20 of 7,746

1.
Cell ; 179(2): 355-372.e23, 2019 Oct 03.
Article in English | MEDLINE | ID: mdl-31564455

ABSTRACT

Animal survival requires a functioning nervous system to develop during embryogenesis. Newborn neurons must assemble into circuits producing activity patterns capable of instructing behaviors. Elucidating how this process is coordinated requires new methods that follow maturation and activity of all cells across a developing circuit. We present an imaging method for comprehensively tracking neuron lineages, movements, molecular identities, and activity in the entire developing zebrafish spinal cord, from neurogenesis until the emergence of patterned activity instructing the earliest spontaneous motor behavior. We found that motoneurons are active first and form local patterned ensembles with neighboring neurons. These ensembles merge, synchronize globally after reaching a threshold size, and finally recruit commissural interneurons to orchestrate the left-right alternating patterns important for locomotion in vertebrates. Individual neurons undergo functional maturation stereotypically based on their birth time and anatomical origin. Our study provides a general strategy for reconstructing how functioning circuits emerge during embryogenesis.

2.
Trends Biochem Sci ; 48(7): 590-596, 2023 07.
Article in English | MEDLINE | ID: mdl-37031054

ABSTRACT

Investigating large datasets of biological information with automated procedures may advance knowledge. Recently, tremendous improvements in structural biology have allowed the number of structures in the Protein Data Bank (PDB) archive to increase rapidly, in particular those of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)-associated proteins. However, their automatic analysis can be hampered by the nonuniform descriptors used by authors in some records of the PDB and PDBx/mmCIF files. In this opinion article we highlight the difficulties encountered in automating the analysis of hundreds of structures, suggesting that further standardization of the description of these molecular entities and of their attributes, generalized to the macromolecular structures contained in the PDB, might generate files more suitable for automated analyses of a large number of structures.


Subjects
COVID-19, Humans, SARS-CoV-2, Proteins/chemistry, Molecular Structure, Protein Databases, Protein Conformation
3.
Trends Genet ; 39(11): 803-807, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37714735

ABSTRACT

To accelerate the impact of African genomics on human health, data science skills and awareness of Africa's rich genetic diversity must be strengthened globally. We describe the first African genomics data science workshop, implemented by the African Society of Human Genetics (AfSHG) and international partners, providing a framework for future workshops.


Subjects
Data Science, Genomics, Humans, Human Genetics
4.
Am J Hum Genet ; 110(6): 903-912, 2023 06 01.
Article in English | MEDLINE | ID: mdl-37267899

ABSTRACT

Ten years ago, a detailed analysis showed that only 33% of genome-wide association study (GWAS) results included the X chromosome. Multiple recommendations were made to combat such exclusion. Here, we re-surveyed the research landscape to determine whether these earlier recommendations had been translated into practice. Unfortunately, among the genome-wide summary statistics reported in 2021 in the NHGRI-EBI GWAS Catalog, only 25% provided results for the X chromosome and 3% for the Y chromosome, suggesting that the exclusion phenomenon not only persists but has also expanded into an exclusionary problem. Normalizing by the physical length of the chromosome, the average number of studies published through November 2022 with genome-wide-significant findings on the X chromosome is ∼1 study/Mb. By contrast, it ranges from ∼6 to ∼16 studies/Mb for chromosomes 4 and 19, respectively. Compared with the autosomal growth rate of ∼0.086 studies/Mb/year over the last decade, studies of the X chromosome grew at less than one-seventh that rate, only ∼0.012 studies/Mb/year. Among the studies that reported significant associations on the X chromosome, we noted extreme heterogeneity in data analysis and reporting of results, suggesting the need for clear guidelines. Unsurprisingly, among the 430 scores sampled from the PolyGenic Score Catalog, 0% contained weights for sex chromosomal SNPs. To overcome the dearth of sex chromosome analyses, we provide five sets of recommendations and future directions. Finally, we propose that, until the sex chromosomes are included in a whole-genome study, such studies would more properly be referred to as "AWASs," meaning "autosome-wide scans," instead of GWASs.
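As a quick arithmetic check of the "less than one-seventh" comparison, using only the two growth rates quoted in the abstract (rounded to three decimals):

```latex
\frac{0.012\ \text{studies/Mb/year}}{0.086\ \text{studies/Mb/year}} \approx 0.140 \;<\; \frac{1}{7} \approx 0.143
```

Equivalently, one-seventh of the autosomal rate is 0.086 / 7 ≈ 0.0123 studies/Mb/year, which is slightly above the ∼0.012 reported for the X chromosome.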


Subjects
Genome-Wide Association Study, Sex Chromosomes, Humans, Genome-Wide Association Study/methods, Y Chromosome, Genome
5.
Am J Hum Genet ; 110(5): 762-773, 2023 05 04.
Article in English | MEDLINE | ID: mdl-37019109

ABSTRACT

The ongoing release of large-scale sequencing data in the UK Biobank allows for the identification of associations between rare variants and complex traits. SAIGE-GENE+ is a valid approach to conducting set-based association tests for quantitative and binary traits. However, for ordinal categorical phenotypes, applying SAIGE-GENE+ while treating the trait as quantitative or binarizing it can cause inflated type I error rates or power loss. In this study, we propose a scalable and accurate method for rare-variant association tests, POLMM-GENE, which uses a proportional odds logistic mixed model to characterize ordinal categorical phenotypes while adjusting for sample relatedness. POLMM-GENE fully utilizes the categorical nature of phenotypes and thus controls type I error rates well while remaining powerful. In the analyses of UK Biobank 450k whole-exome-sequencing data for five ordinal categorical traits, POLMM-GENE identified 54 gene-phenotype associations.
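A proportional odds (cumulative logit) model, the fixed-effect core of the mixed model named above, maps one linear predictor to ordered category probabilities through increasing cutpoints. The sketch below is illustrative only: it omits the random effect for sample relatedness and the set-based rare-variant test, and the sign convention logit P(Y ≤ j) = θ_j − η and the cutpoint values are assumptions, not taken from POLMM-GENE.

```python
import math
from typing import Sequence


def logistic(x: float) -> float:
    """Standard logistic function 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))


def category_probabilities(eta: float, cutpoints: Sequence[float]) -> list[float]:
    """Category probabilities under the assumed convention logit P(Y <= j) = cutpoints[j] - eta.

    J strictly increasing cutpoints define J + 1 ordered categories.
    """
    if any(b <= a for a, b in zip(cutpoints, cutpoints[1:])):
        raise ValueError("cutpoints must be strictly increasing")
    # Cumulative probabilities P(Y <= j), padded with 0 below and 1 above.
    cumulative = [0.0] + [logistic(c - eta) for c in cutpoints] + [1.0]
    # Each category's probability is the gap between adjacent cumulative probabilities.
    return [hi - lo for lo, hi in zip(cumulative, cumulative[1:])]


# Hypothetical three-category trait (e.g. "none" / "mild" / "severe").
cuts = [-1.0, 1.0]
for eta in (-1.0, 0.0, 1.0):
    print(eta, [round(p, 3) for p in category_probabilities(eta, cuts)])
```

Because the same η shifts every cumulative logit, raising η moves probability mass toward higher categories without binarizing the trait, which is the property the abstract contrasts with quantitative or binary treatment.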


Subjects
Exome, Genome-Wide Association Study, Genome-Wide Association Study/methods, Exome/genetics, Biological Specimen Banks, Phenotype, Data Analysis, United Kingdom
6.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38349062

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) has emerged as a powerful tool to gain biological insights at the cellular level. However, due to technical limitations of the existing sequencing technologies, low gene expression values are often omitted, leading to inaccurate gene counts. Existing methods, including advanced deep learning techniques, struggle to reliably impute gene expression due to a lack of mechanisms that explicitly consider the underlying biological knowledge of the system. In fact, it has long been recognized that gene-gene interactions may serve as reflective indicators of underlying biological processes, presenting discriminative signatures of the cells. A genomic data analysis framework that is capable of leveraging the underlying gene-gene interactions is thus highly desirable and could allow for more reliable identification of distinctive patterns of the genomic data through extraction and integration of intricate biological characteristics of the genomic data. Here we tackle the problem in two steps to exploit the gene-gene interactions of the system. We first reposition the genes into a 2D grid such that their spatial configuration reflects their interactive relationships. To alleviate the need for labeled ground truth gene expression datasets, a self-supervised 2D convolutional neural network is employed to extract the contextual features of the interactions from the spatially configured genes and impute the omitted values. Extensive experiments with both simulated and experimental scRNA-seq datasets are carried out to demonstrate the superior performance of the proposed strategy against the existing imputation methods.


Subjects
Deep Learning, Genetic Epistasis, Data Analysis, Genomics, Gene Expression, Gene Expression Profiling, RNA Sequence Analysis
7.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38632951

ABSTRACT

In cancer genomics, variant calling has advanced, but traditional mean accuracy evaluations are inadequate for biomarkers like tumor mutation burden, which vary significantly across samples, affecting immunotherapy patient selection and threshold settings. In this study, we introduce TMBstable, an innovative method that dynamically selects optimal variant calling strategies for specific genomic regions using a meta-learning framework, distinguishing it from traditional callers with uniform sample-wide strategies. The process begins with segmenting the sample into windows and extracting meta-features for clustering, followed by using a pre-trained meta-model to select suitable algorithms for each cluster, thereby addressing strategy-sample mismatches, reducing performance fluctuations and ensuring consistent performance across various samples. We evaluated TMBstable using both simulated and real non-small cell lung cancer and nasopharyngeal carcinoma samples, comparing it with advanced callers. The assessment, focusing on stability measures, such as the variance and coefficient of variation in false positive rate, false negative rate, precision and recall, involved 300 simulated and 106 real tumor samples. Benchmark results showed TMBstable's superior stability with the lowest variance and coefficient of variation across performance metrics, highlighting its effectiveness in analyzing the counting-based biomarker. The TMBstable algorithm can be accessed at https://github.com/hello-json/TMBstable for academic usage only.
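The stability measures named above (variance and coefficient of variation of per-sample performance metrics) are straightforward to compute once each sample has true-positive, false-positive and false-negative counts. This is a minimal sketch with hypothetical counts; it does not implement TMBstable's windowing, clustering or meta-model. It uses the population variance, which is an assumption since the abstract does not name the estimator, and it omits the false positive rate because that also requires a true-negative count.

```python
import statistics


def precision_recall_fnr(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Per-sample precision, recall and false negative rate from variant-call counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall, 1.0 - recall  # FNR = FN / (TP + FN) = 1 - recall


def variance_and_cv(values: list[float]) -> tuple[float, float]:
    """Population variance and coefficient of variation (std / mean) across samples.

    Using the population (not sample) variance is an assumption of this sketch.
    """
    mean = statistics.fmean(values)
    var = statistics.pvariance(values, mu=mean)
    return var, var ** 0.5 / mean


# Hypothetical (TP, FP, FN) counts for three tumor samples from one caller.
samples = [(90, 10, 10), (80, 20, 20), (70, 5, 30)]
recalls = [precision_recall_fnr(*counts)[1] for counts in samples]
print(variance_and_cv(recalls))
```

A caller that is accurate on average but swings widely between samples shows up here as a large coefficient of variation, which is the kind of fluctuation the abstract argues matters for a counting-based biomarker such as tumor mutation burden.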


Subjects
Non-Small-Cell Lung Carcinoma, Lung Neoplasms, Humans, High-Throughput Nucleotide Sequencing/methods, Genomics/methods, Genome, Algorithms
8.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38517698

ABSTRACT

High-throughput genomic and proteomic scanning approaches allow investigators to quantify genome-wide genes (or gene products) under given disease conditions, which plays an essential role in promoting the discovery of disease mechanisms. These high-throughput approaches often generate a large list of genes of interest (GOIs), such as differentially expressed genes/proteins. However, researchers have to perform manual triage and validation to explore the most promising, biologically plausible links between the known disease genes and GOIs (disease signals) for further study. Here, to address this challenge, we proposed a network-based strategy, DDK-Linker, to facilitate the exploration of disease signals hidden in omics data by linking GOIs to known disease genes. Specifically, it reconstructed gene distances in the protein-protein interaction (PPI) network through six network methods (random walk with restart, Deepwalk, Node2Vec, LINE, HOPE, Laplacian) to discover disease signals in omics data that have shorter distances to disease genes. Furthermore, benefiting from the knowledge base we established, abundant bioinformatics annotations were provided for each candidate disease signal. To assist in omics data interpretation and facilitate usage, we have developed this strategy into an application that users can access through a website or download as an R package. We believe DDK-Linker will accelerate the exploration of disease genes and drug targets in a variety of omics data, such as genomics, transcriptomics and proteomics data, and provide clues for complex disease mechanism and pharmacological research. DDK-Linker is freely accessible at http://ddklinker.ncpsb.org.cn/.
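Random walk with restart, the first of the six network methods listed, scores every node of a protein-protein interaction network by its proximity to a set of seed (known disease) genes. The sketch below is an illustrative power-iteration implementation on a hypothetical five-gene network with an assumed restart probability of 0.3; it is not DDK-Linker's code, and DDK-Linker's normalization and parameters may differ.

```python
import numpy as np


def random_walk_with_restart(adjacency: np.ndarray, seeds: list[int],
                             restart: float = 0.3, tol: float = 1e-10,
                             max_iter: int = 1000) -> np.ndarray:
    """Stationary visiting probabilities of a walker that jumps back to the seeds with prob. `restart`.

    The restart value of 0.3 is an illustrative assumption, not a DDK-Linker setting.
    """
    degree = adjacency.sum(axis=0)
    if np.any(degree == 0):
        raise ValueError("every node needs at least one edge")
    transition = adjacency / degree        # divide column j by degree[j]: column-stochastic
    p0 = np.zeros(adjacency.shape[0])
    p0[seeds] = 1.0 / len(seeds)           # restart distribution spread over the seed genes
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * transition @ p + restart * p0
        if np.abs(p_next - p).sum() < tol:  # L1 change below tolerance: converged
            return p_next
        p = p_next
    return p


# Hypothetical undirected PPI network: a chain 0-1-2-3-4 with gene 0 as the seed.
A = np.zeros((5, 5))
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4)]:
    A[i, j] = A[j, i] = 1.0
scores = random_walk_with_restart(A, seeds=[0])
print(np.argsort(-scores))  # genes ranked by proximity to the seed
```

Candidate GOIs with high stationary probability are "close" to the disease genes in the network sense the abstract describes; on this chain the scores fall off monotonically with distance from the seed.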


Subjects
Proteomics, Software, Proteomics/methods, Genomics/methods, Computational Biology/methods, Protein Interaction Maps
9.
Mol Cell Proteomics ; 23(2): 100712, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38182042

ABSTRACT

Data-independent acquisition (DIA) mass spectrometry (MS) has emerged as a powerful technology for high-throughput, accurate, and reproducible quantitative proteomics. This review provides a comprehensive overview of recent advances in both the experimental and computational methods for DIA proteomics, from data acquisition schemes to analysis strategies and software tools. DIA acquisition schemes are categorized based on the design of precursor isolation windows, highlighting wide-window, overlapping-window, narrow-window, scanning quadrupole-based, and parallel accumulation-serial fragmentation-enhanced DIA methods. For DIA data analysis, major strategies are classified into spectrum reconstruction, sequence-based search, library-based search, de novo sequencing, and sequencing-independent approaches. A wide array of software tools implementing these strategies are reviewed, with details on their overall workflows and scoring approaches at different steps. The generation and optimization of spectral libraries, which are critical resources for DIA analysis, are also discussed. Publicly available benchmark datasets covering global proteomics and phosphoproteomics are summarized to facilitate performance evaluation of various software tools and analysis workflows. Continued advances and synergistic developments of versatile components in DIA workflows are expected to further enhance the power of DIA-based proteomics.


Subjects
Proteomics, Software, Proteomics/methods, Mass Spectrometry/methods, Gene Library, Proteome/analysis
10.
Mol Cell Proteomics ; 23(2): 100708, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38154689

ABSTRACT

In the era of open-modification search engines, more posttranslational modifications than ever can be detected by LC-MS/MS-based proteomics. This development can switch proteomics research into a higher gear, as PTMs are key in many cellular pathways important in cell proliferation, migration, metastasis, and aging. However, despite these advances in modification identification, statistical methods for PTM-level quantification and differential analysis have yet to catch up. This absence can partly be explained by statistical challenges inherent to the data, such as the confounding of PTM intensities with their parent protein abundance. Therefore, we have developed msqrob2PTM, a new workflow in the msqrob2 universe capable of differential abundance analysis at the PTM and at the peptidoform level. The latter is important for validating PTMs found as significantly differential. Indeed, as our method can deal with multiple PTMs per peptidoform, there is a possibility that significant PTMs stem from one significant peptidoform carrying another PTM, hinting that it might be the other PTM driving the perceived differential abundance. Our workflows can flag both differential peptidoform abundance (DPA) and differential peptidoform usage (DPU). This enables a distinction between direct assessment of differential abundance of peptidoforms (DPA) and differences in the relative usage of peptidoforms corrected for corresponding protein abundances (DPU). For DPA, we directly model the log2-transformed peptidoform intensities, while for DPU, we correct for parent protein abundance by an intermediate normalization step which calculates the log2-ratio of the peptidoform intensities to their summarized parent protein intensities. We demonstrated the utility and performance of msqrob2PTM by applying it to datasets with known ground truth, as well as to biological PTM-rich datasets. Our results show that msqrob2PTM is on par with, or surpasses, the current state-of-the-art methods. Moreover, msqrob2PTM is currently unique in providing output at the peptidoform level.
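The DPA and DPU inputs described above can be illustrated on hypothetical intensities: DPA works on log2 peptidoform intensities directly, while DPU subtracts the log2 summarized parent-protein intensity, which is the log2-ratio the abstract describes. In this sketch the protein summary is the median of the protein's log2 peptide intensities; that choice is an assumption for illustration (msqrob2 provides its own summarization), and the sketch stops before any statistical modelling.

```python
import math
import statistics


def dpa_values(peptidoforms: dict[str, float]) -> dict[str, float]:
    """DPA input: log2-transformed peptidoform intensities."""
    return {name: math.log2(value) for name, value in peptidoforms.items()}


def dpu_values(peptidoforms: dict[str, float], protein_peptides: list[float]) -> dict[str, float]:
    """DPU input: log2 ratio of each peptidoform to its summarized parent protein.

    Summarizing the protein by the median log2 peptide intensity is an assumption.
    """
    protein_log2 = statistics.median(math.log2(v) for v in protein_peptides)
    return {name: math.log2(value) - protein_log2 for name, value in peptidoforms.items()}


# Hypothetical modified peptidoform in conditions A and B: the parent protein doubles
# in B, and the peptidoform doubles with it (its relative usage is unchanged).
protein_a = [1024.0, 2048.0, 4096.0]
protein_b = [2048.0, 4096.0, 8192.0]
form_a = {"PEPS[+80]TIDE": 512.0}
form_b = {"PEPS[+80]TIDE": 1024.0}

dpa_change = dpa_values(form_b)["PEPS[+80]TIDE"] - dpa_values(form_a)["PEPS[+80]TIDE"]
dpu_change = (dpu_values(form_b, protein_b)["PEPS[+80]TIDE"]
              - dpu_values(form_a, protein_a)["PEPS[+80]TIDE"])
print(dpa_change, dpu_change)  # 1.0 0.0
```

The log2 fold change of 1 under DPA is entirely explained by the parent protein, so the DPU contrast is 0; this is the confounding the normalization step is meant to remove.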


Subjects
Proteomics, Tandem Mass Spectrometry, Proteomics/methods, Liquid Chromatography, Post-Translational Protein Processing, Proteins
11.
Proc Natl Acad Sci U S A ; 120(48): e2311420120, 2023 Nov 28.
Article in English | MEDLINE | ID: mdl-37988465

ABSTRACT

Principal component analysis (PCA) is a dimensionality reduction method that is known for being simple and easy to interpret. Principal components are often interpreted as low-dimensional patterns in high-dimensional space. However, this simple interpretation fails for timeseries, spatial maps, and other continuous data. In these cases, nonoscillatory data may have oscillatory principal components. Here, we show that two common properties of data cause oscillatory principal components: smoothness and shifts in time or space. These two properties implicate almost all neuroscience data. We show how the oscillations produced by PCA, which we call "phantom oscillations," impact data analysis. We also show that traditional cross-validation does not detect phantom oscillations, so we suggest procedures that do. Our findings are supported by a collection of mathematical proofs. Collectively, our work demonstrates that patterns which emerge from high-dimensional data analysis may not faithfully represent the underlying data.
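A toy construction reproduces the effect described above: a dataset containing nothing but one smooth, non-oscillatory Gaussian bump, shifted in time, has sinusoidal principal components. This sketch is not the authors' code. To make the result exact, it assumes circular shifts covering every position, so the centered covariance is circulant and its eigenvectors are Fourier modes; the record length of 100 and bump width of 5 are arbitrary choices.

```python
import numpy as np


def circular_bump_dataset(n_points: int = 100, sigma: float = 5.0) -> np.ndarray:
    """Every circular shift of one smooth Gaussian bump, one shift per row (toy assumption)."""
    t = np.arange(n_points)
    dist = np.minimum(t, n_points - t)          # circular distance from index 0
    base = np.exp(-dist**2 / (2 * sigma**2))    # a single bump: no oscillation at all
    return np.stack([np.roll(base, shift) for shift in range(n_points)])


def principal_components(data: np.ndarray) -> np.ndarray:
    """Principal axes as rows, ordered by decreasing explained variance."""
    centered = data - data.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt


def dominant_frequency(component: np.ndarray) -> int:
    """Index of the largest Fourier magnitude, in cycles per record length."""
    return int(np.argmax(np.abs(np.fft.rfft(component))))


data = circular_bump_dataset()
pcs = principal_components(data)
print([dominant_frequency(pc) for pc in pcs[:6]])
```

The leading components come in sine/cosine pairs at one, two and three cycles per record, even though no row of the data oscillates; in this construction the oscillations arise purely from smoothness plus shifts, the two properties the abstract identifies.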

12.
Proc Natl Acad Sci U S A ; 120(24): e2219557120, 2023 06 13.
Article in English | MEDLINE | ID: mdl-37279273

ABSTRACT

It is widely accepted that there is an inextricable link between neural computations, biological mechanisms, and behavior, but it is challenging to simultaneously relate all three. Here, we show that topological data analysis (TDA) provides an important bridge between these approaches to studying how brains mediate behavior. We demonstrate that cognitive processes change the topological description of the shared activity of populations of visual neurons. These topological changes constrain and distinguish between competing mechanistic models, are connected to subjects' performance on a visual change detection task, and, via a link with network control theory, reveal a tradeoff between improving sensitivity to subtle visual stimulus changes and increasing the chance that the subject will stray off task. These connections provide a blueprint for using TDA to uncover the biological and computational mechanisms by which cognition affects behavior in health and disease.


Subjects
Brain, Cognition, Humans, Cognition/physiology, Brain/physiology, Neurons/physiology
13.
Proc Natl Acad Sci U S A ; 120(12): e2216805120, 2023 03 21.
Article in English | MEDLINE | ID: mdl-36920920

ABSTRACT

Homeostasis, the ability to maintain a relatively constant internal environment in the face of perturbations, is a hallmark of biological systems. It is believed that this constancy is achieved through multiple internal regulation and control processes. Given observations of a system, or even a detailed model of one, it is both valuable and extremely challenging to extract the control objectives of the homeostatic mechanisms. In this work, we develop a robust data-driven method to identify these objectives, namely to understand: "what does the system care about?". We propose an algorithm, Identifying Regulation with Adversarial Surrogates (IRAS), that receives an array of temporal measurements of the system and outputs a candidate for the control objective, expressed as a combination of observed variables. IRAS is an iterative algorithm consisting of two competing players. The first player, realized by an artificial deep neural network, aims to minimize a measure of invariance we refer to as the coefficient of regulation. The second player aims to render the task of the first player more difficult by forcing it to extract information about the temporal structure of the data, which is absent from similar "surrogate" data. We test the algorithm on four synthetic and one natural data set, demonstrating excellent empirical results. Interestingly, our approach can also be used to extract conserved quantities, e.g., energy and momentum, in purely physical systems, as we demonstrate empirically.


Subjects
Algorithms, Homeostasis
14.
Proc Natl Acad Sci U S A ; 120(32): e2218217120, 2023 08 08.
Article in English | MEDLINE | ID: mdl-37523524

ABSTRACT

The 70-kD heat shock protein (Hsp70) chaperone system is a central hub of the proteostasis network that helps maintain protein homeostasis in all organisms. The recruitment of Hsp70 to perform different and specific cellular functions is regulated by the J-domain protein (JDP) co-chaperone family carrying the small namesake J-domain, required to interact and drive the ATPase cycle of Hsp70s. Besides the J-domain, prokaryotic and eukaryotic JDPs display a staggering diversity in domain architecture, function, and cellular localization. Very little is known about the overall JDP family, despite its essential role in cellular proteostasis and development and its link to a broad range of human diseases. In this work, we leverage the exponentially increasing number of JDP gene sequences identified across all kingdoms owing to the advancements in sequencing technology and provide a broad overview of the JDP repertoire. Using an automated classification scheme based on artificial neural networks (ANNs), we demonstrate that the sequences of J-domains carry sufficient discriminatory information to reliably recover the phylogeny, localization, and domain composition of the corresponding full-length JDP. By harnessing the interpretability of the ANNs, we find that many of the discriminatory sequence positions match residues that form the interaction interface between the J-domain and Hsp70. This reveals that key residues within the J-domains have coevolved with their obligatory Hsp70 partners to build chaperone circuits for specific functions in cells.


Subjects
HSP70 Heat-Shock Proteins, Molecular Chaperones, Humans, Amino Acid Sequence, Genomics, HSP40 Heat-Shock Proteins/metabolism, HSP70 Heat-Shock Proteins/metabolism, Molecular Chaperones/metabolism, Phylogeny
15.
Plant J ; 118(5): 1689-1698, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38310596

ABSTRACT

Confocal microscopy has greatly aided our understanding of the major cellular processes and trafficking pathways responsible for plant growth and development. However, a drawback of these studies is that they often rely on the manual analysis of a vast number of images, which is time-consuming, error-prone, and subject to bias. To overcome these limitations, we developed Dot Scanner, a Python program for analyzing the densities, lifetimes, and displacements of fluorescently tagged particles in an unbiased, automated, and efficient manner. Dot Scanner was validated by performing side-by-side analysis in Fiji-ImageJ of particles involved in cellulose biosynthesis. We found that the particle densities and lifetimes were comparable in both Dot Scanner and Fiji-ImageJ, verifying the accuracy of Dot Scanner. Dot Scanner largely outperforms Fiji-ImageJ, since it suffers far less selection bias when calculating particle lifetimes and is much more efficient at distinguishing between weak signals and background signal caused by bleaching. Not only does Dot Scanner obtain much more robust results, but it is a highly efficient program, since it automates much of the analyses, shortening workflow durations from weeks to minutes. This free and accessible program will be a highly advantageous tool for analyzing live-cell imaging in plants.


Subjects
Computer-Assisted Image Processing, Confocal Microscopy, Software, Computer-Assisted Image Processing/methods, Confocal Microscopy/methods, Plant Cells
16.
Am J Hum Genet ; 109(2): 270-281, 2022 02 03.
Article in English | MEDLINE | ID: mdl-35063063

ABSTRACT

In recent years, exome sequencing (ES) has shown great utility in the diagnoses of Mendelian disorders. However, after rigorous filtering, a typical ES analysis still involves the interpretation of hundreds of variants, which greatly hinders the rapid identification of causative genes. Since the interpretations of ES data require comprehensive clinical analyses, taking clinical expertise into consideration can speed the molecular diagnoses of Mendelian disorders. To leverage clinical expertise to prioritize candidate genes, we developed PhenoApt, a phenotype-driven gene prioritization tool that allows users to assign a customized weight to each phenotype, via a machine-learning algorithm. Using the ability to rank causative genes in top-10 lists as an evaluation metric, baseline analysis demonstrated that PhenoApt outperformed previous phenotype-driven gene prioritization tools by a relative increase of 22.7%-140.0% in three independent, real-world, multi-center cohorts (cohort 1, n = 185; cohort 2, n = 784; and cohort 3, n = 208). Additional trials showed that, by adding weights to clinical indications, which should be explained by the causative gene, PhenoApt performance was improved by a relative increase of 37.3% in cohort 2 (n = 471) and 21.4% in cohort 3 (n = 208). Moreover, PhenoApt could assign an intrinsic weight to each phenotype based on the likelihood of its being a Mendelian trait using term frequency-inverse document frequency techniques. When clinical indications were assigned with intrinsic weights, PhenoApt performance was improved by a relative increase of 23.7% in cohort 2 and 15.5% in cohort 3. For the integration of PhenoApt into clinical practice, we developed a user-friendly website and a command-line tool.
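Term frequency-inverse document frequency, named above as the basis for PhenoApt's intrinsic phenotype weights, down-weights phenotype terms annotated to many genes and up-weights rarer, more specific ones. The abstract does not give PhenoApt's exact formulation, so the sketch below is a generic TF-IDF illustration on a hypothetical gene-to-phenotype annotation table (binary term frequency, natural-log IDF), followed by a simple weighted-overlap ranking; neither the data nor the scoring rule comes from PhenoApt.

```python
import math


def inverse_document_frequency(annotations: dict[str, set[str]]) -> dict[str, float]:
    """IDF of each phenotype term, treating each gene's annotation set as one document.

    With binary term frequency (assumed here), a term's TF-IDF weight equals its IDF.
    """
    n_genes = len(annotations)
    doc_freq: dict[str, int] = {}
    for terms in annotations.values():
        for term in terms:
            doc_freq[term] = doc_freq.get(term, 0) + 1
    return {term: math.log(n_genes / df) for term, df in doc_freq.items()}


def rank_genes(annotations: dict[str, set[str]], patient_terms: set[str],
               weights: dict[str, float]) -> list[str]:
    """Rank genes by the summed weight of patient phenotypes they are annotated with.

    This weighted-overlap score is a hypothetical stand-in, not PhenoApt's algorithm.
    """
    scores = {gene: sum(weights.get(t, 0.0) for t in terms & patient_terms)
              for gene, terms in annotations.items()}
    return sorted(scores, key=lambda gene: (-scores[gene], gene))  # ties broken by name


# Hypothetical annotations: "seizure" is common, "nystagmus" and "scoliosis" are rarer.
annotations = {
    "GENE_A": {"seizure", "nystagmus"},
    "GENE_B": {"seizure"},
    "GENE_C": {"seizure", "scoliosis"},
}
idf = inverse_document_frequency(annotations)
print(rank_genes(annotations, {"seizure", "nystagmus"}, idf))
```

Here "seizure" is annotated to every gene and receives weight zero, so the ranking is driven by the more specific "nystagmus" match; a user-assigned weight, as the abstract describes, could be multiplied in alongside such an intrinsic weight.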


Subjects
Inborn Genetic Diseases/genetics, Sensorineural Hearing Loss/genetics, Intellectual Disability/genetics, Machine Learning, Microcephaly/genetics, Congenital Nystagmus/genetics, Scoliosis/genetics, Cohort Studies, Computational Biology, Genetic Databases, Exome, Inborn Genetic Diseases/diagnosis, Inborn Genetic Diseases/pathology, Genetic Testing, Genotype, Sensorineural Hearing Loss/diagnosis, Sensorineural Hearing Loss/pathology, Humans, Intellectual Disability/diagnosis, Intellectual Disability/pathology, Microcephaly/diagnosis, Microcephaly/pathology, Congenital Nystagmus/diagnosis, Congenital Nystagmus/pathology, Phenotype, Scoliosis/diagnosis, Scoliosis/pathology, Software, Exome Sequencing
17.
Development ; 149(5)2022 03 01.
Article in English | MEDLINE | ID: mdl-35262177

ABSTRACT

Axonal projections from layer V neurons of distinct neocortical areas are topographically organized into discrete clusters within the pontine nuclei during the establishment of voluntary movements. However, the molecular determinants controlling corticopontine connectivity are insufficiently understood. Here, we show that an intrinsic cortical genetic program driven by Nr2f1 graded expression is directly implicated in the organization of corticopontine topographic mapping. Transgenic mice lacking cortical expression of Nr2f1 and exhibiting areal organization defects were used as model systems to investigate the arrangement of corticopontine projections. By combining three-dimensional digital brain atlas tools, Cre-dependent mouse lines and axonal tracing, we show that Nr2f1 expression in postmitotic neurons spatially and temporally controls somatosensory topographic projections, whereas expression in progenitor cells influences the ratio between corticopontine and corticospinal fibres passing the pontine nuclei. We conclude that cortical gradients of area-patterning genes are directly implicated in the establishment of a topographic somatotopic mapping from the cortex onto pontine nuclei.


Subjects
Brain Mapping, Pons, Animals, Axons, Cerebral Cortex, Mice, Neural Pathways/physiology, Neurons, Pons/physiology
18.
Biostatistics ; 2024 Jun 13.
Article in English | MEDLINE | ID: mdl-38869057

ABSTRACT

In biomedical studies, continuous and ordinal longitudinal variables are frequently encountered. In many of these studies it is of interest to estimate the effect of one of these longitudinal variables on the other. Time-dependent covariates, however, have several limitations; for example, they cannot be included when the data are not collected at fixed intervals. These issues can be circumvented by implementing joint models, in which two or more longitudinal variables are treated as responses and modeled with correlated random effects. Next, by conditioning on these response(s), we can study the effect of one or more longitudinal variables on another. We propose a normal-ordinal (probit) joint model. First, we derive closed-form formulas to estimate the model-based correlations between the responses on their original scale. In addition, we derive the marginal model, where the interpretation is no longer conditional on the random effects. As a consequence, we can make predictions for a subvector of one response conditional on the other response and potentially a subvector of the history of the response. Next, we extend the approach to a high-dimensional case with more than two ordinal and/or continuous longitudinal variables. The methodology is applied to a case study where, among others, a longitudinal ordinal response is predicted with a longitudinal continuous variable.

19.
Biostatistics ; 2024 Mar 13.
Article in English | MEDLINE | ID: mdl-38476094

ABSTRACT

Linear and generalized linear scalar-on-function modeling have been commonly used to understand the relationship between a scalar response variable (e.g. continuous, binary outcomes) and functional predictors. Such techniques are sensitive to model misspecification when the relationship between the response variable and the functional predictors is complex. On the other hand, support vector machines (SVMs) are among the most robust prediction models but do not account for the high correlations between repeated measurements and cannot be used for irregular data. In this work, we propose a novel method to integrate functional principal component analysis with SVM techniques for classification and regression to account for the continuous nature of functional data and the nonlinear relationship between the scalar response variable and the functional predictors. We demonstrate the performance of our method through extensive simulation experiments and two real data applications: the classification of alcoholics using electroencephalography signals and the prediction of glucobrassicin concentration using near-infrared reflectance spectroscopy. Our method is especially advantageous when the measurement errors in the functional predictors are relatively large.

20.
Brief Bioinform ; 24(2)2023 03 19.
Article in English | MEDLINE | ID: mdl-36806894

ABSTRACT

Bioinformatics analysis and visualization of high-throughput gene expression data require extensive computer programming skills, posing a bottleneck for many wet-lab scientists. In this work, we present an intuitive user-friendly platform for gene expression data analysis and visualization called FungiExpresZ. FungiExpresZ aims to help wet-lab scientists with little to no knowledge of computer programming to become self-reliant in bioinformatics analysis and generating publication-ready figures. The platform contains many commonly used data analysis tools and an extensive collection of pre-processed public ribonucleic acid sequencing (RNA-seq) datasets of many fungal species, including important human, plant and insect pathogens. Users may analyse their data alone or in combination with public RNA-seq data for an integrated analysis. The FungiExpresZ platform helps wet-lab scientists to overcome their limitations in genomics data analysis and can be applied to analyse data of any organism. FungiExpresZ is available as an online web-based tool (https://cparsania.shinyapps.io/FungiExpresZ/) and an offline R-Shiny package (https://github.com/cparsania/FungiExpresZ).


Subjects
Genomics, Software, Humans, Gene Expression Profiling, Data Analysis, RNA/genetics, Gene Expression