Results 1-20 of 17,031
1.
Cell ; 187(7): 1745-1761.e19, 2024 Mar 28.
Article in English | MEDLINE | ID: mdl-38518772

ABSTRACT

Proprioception tells the brain the state of the body based on distributed sensory neurons. Yet, the principles that govern proprioceptive processing are poorly understood. Here, we employ a task-driven modeling approach to investigate the neural code of proprioceptive neurons in cuneate nucleus (CN) and somatosensory cortex area 2 (S1). We simulated muscle spindle signals through musculoskeletal modeling and generated a large-scale movement repertoire to train neural networks based on 16 hypotheses, each representing different computational goals. We found that the emerging, task-optimized internal representations generalize from synthetic data to predict neural dynamics in CN and S1 of primates. Computational tasks that aim to predict the limb position and velocity were the best at predicting the neural activity in both areas. Since task optimization develops representations that better predict neural activity during active than passive movements, we postulate that neural activity in the CN and S1 is top-down modulated during goal-directed movements.


Subjects
Neurons, Proprioception, Animals, Proprioception/physiology, Neurons/physiology, Brain/physiology, Movement/physiology, Primates, Neural Networks (Computer)
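The task-driven pipeline described above (simulate muscle signals, then fit a readout for limb state) can be caricatured in a few lines. This is a hypothetical sketch, not the paper's models: a sinusoidal "limb angle" drives a toy muscle-length signal, and a closed-form linear regression stands in for the task-optimized networks.

```python
import math, random

random.seed(0)
T, dt = 200, 0.01

# Toy "musculoskeletal" stage: a sinusoidal limb angle sets muscle length;
# a spindle-like afferent reports noisy length plus contraction velocity.
angle = [math.sin(2 * math.pi * 0.5 * t * dt) for t in range(T)]
length = [1.0 + 0.3 * a + random.gauss(0, 0.01) for a in angle]
velocity = [0.0] + [(length[t] - length[t - 1]) / dt for t in range(1, T)]

def fit_two_features(x1, x2, y):
    # closed-form least squares for y ~ w1*x1 + w2*x2 (2x2 normal equations)
    s11 = sum(a * a for a in x1); s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    r1 = sum(a * c for a, c in zip(x1, y)); r2 = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    return (s22 * r1 - s12 * r2) / det, (s11 * r2 - s12 * r1) / det

center = lambda xs: [x - sum(xs) / len(xs) for x in xs]
L0, V0, A0 = center(length), center(velocity), center(angle)
w1, w2 = fit_two_features(L0, V0, A0)
pred = [w1 * l + w2 * v for l, v in zip(L0, V0)]
r2_score = 1 - sum((p - a) ** 2 for p, a in zip(pred, A0)) / sum(a * a for a in A0)
print(round(r2_score, 3))
```

Because limb position is (noisily) linearly encoded in the simulated afferents, even this trivial readout recovers it almost perfectly, which is the intuition behind position/velocity tasks predicting neural activity well.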
2.
Proc Natl Acad Sci U S A ; 121(31): e2401246121, 2024 Jul 30.
Article in English | MEDLINE | ID: mdl-39052832

ABSTRACT

Modern science depends on imaging at the nanoscale, often achieved through processes that detect secondary electrons created by a highly focused incident charged-particle beam. Multiple types of measurement noise limit the ultimate trade-off between image quality and incident particle dose, which can preclude useful imaging of dose-sensitive samples. Existing methods to improve image quality do not fundamentally mitigate the noise sources. Furthermore, barriers to assigning a physically meaningful scale make the images qualitative. Here, we introduce ion count-aided microscopy (ICAM), a quantitative imaging technique that uses statistically principled estimation of the secondary electron yield. With a readily implemented change in data collection, ICAM substantially reduces source shot noise. In helium ion microscopy, we demonstrate a 3x dose reduction and a good match between these empirical results and theoretical performance predictions. ICAM facilitates imaging of fragile samples and may make imaging with heavier particles more attractive.
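A minimal Monte Carlo illustration of why normalizing by a measured ion count reduces source shot noise. The Poisson statistics and all numbers below are illustrative assumptions, not ICAM's actual detector model:

```python
import math, random, statistics

random.seed(1)
YIELD, MEAN_IONS = 2.5, 20   # assumed SE yield and mean dose per pixel

def poisson(lam):
    # Knuth's inversion sampler; adequate for small rates
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

def pixel(count_ions):
    n = poisson(MEAN_IONS)                        # source shot noise
    se = sum(poisson(YIELD) for _ in range(n))    # secondary electrons
    # conventional: normalize by the *average* dose; ICAM-style: by the
    # *measured* dose, removing the ion-count fluctuation from the estimate
    return se / n if (count_ions and n > 0) else se / MEAN_IONS

conv_sd = statistics.stdev(pixel(False) for _ in range(2000))
icam_sd = statistics.stdev(pixel(True) for _ in range(2000))
print(conv_sd > icam_sd)
```

Dividing by the counted ions conditions out the dose fluctuation, so the per-pixel yield estimate has markedly lower variance at the same mean dose.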

3.
Hum Mol Genet ; 2024 Jul 09.
Article in English | MEDLINE | ID: mdl-38981621

ABSTRACT

Early or late pubertal onset can lead to disease in adulthood, including cancer, obesity, type 2 diabetes, metabolic disorders, bone fractures, and psychopathologies. Thus, knowing the age at which puberty is attained is crucial, as it can serve as a risk factor for future diseases. Pubertal development is divided into five stages of sexual maturation in boys and girls according to the standardized Tanner scale. We performed genome-wide association studies (GWAS) on the "Growth and Obesity Chilean Cohort Study" cohort, composed of admixed children with mainly European and Native American ancestry. Using joint models that integrate time-to-event data with longitudinal trajectories of body mass index (BMI), we identified genetic variants associated with phenotypic transitions between pairs of Tanner stages. We identified 42 novel significant associations, most of them in boys. The GWAS on the Tanner 3→4 transition in boys captured an association peak around the growth-related genes LARS2 and LIMD1, the former of which causes ovarian dysfunction when mutated. The associated variants are expression and splicing quantitative trait loci regulating gene expression and alternative splicing in multiple tissues. Further, higher individual Native American genetic ancestry proportions predicted significantly earlier puberty onset in boys but not in girls. Finally, the joint models identified a longitudinal BMI parameter significantly associated with several Tanner stage transitions, confirming the association of BMI with pubertal timing.

4.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38426325

ABSTRACT

Accurate metabolite annotation and false discovery rate (FDR) control remain challenging in large-scale metabolomics. Recent progress leveraging proteomics experiences and interdisciplinary inspirations has provided valuable insights. While target-decoy strategies have been introduced, generating reliable decoy libraries is difficult due to metabolite complexity. Moreover, continuous bioinformatics innovation is imperative to improve the utilization of expanding spectral resources while reducing false annotations. Here, we introduce the concept of ion entropy for metabolomics and propose two entropy-based decoy generation approaches. Assessment of public databases validates ion entropy as an effective metric to quantify ion information in massive metabolomics datasets. Our entropy-based decoy strategies outperform current representative methods in metabolomics and achieve superior FDR estimation accuracy. Analysis of 46 public datasets provides instructive recommendations for practical application.


Subjects
Algorithms, Tandem Mass Spectrometry, Entropy, Tandem Mass Spectrometry/methods, Metabolomics/methods, Computational Biology/methods, Protein Databases
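The target-decoy FDR logic that entropy-based decoys plug into can be sketched directly. The scores below are made up, and real pipelines use far more sophisticated score models:

```python
def fdr_estimate(target_scores, decoy_scores, threshold):
    """Classic target-decoy FDR: decoy hits above the score threshold
    calibrate the expected number of false target hits."""
    n_targets = sum(s >= threshold for s in target_scores)
    n_decoys = sum(s >= threshold for s in decoy_scores)
    return (n_decoys / n_targets) if n_targets else 0.0

targets = [0.9, 0.8, 0.75, 0.6, 0.5, 0.3]   # annotation scores vs. real library
decoys  = [0.55, 0.4, 0.2, 0.1, 0.05, 0.02] # scores vs. generated decoy library
print(fdr_estimate(targets, decoys, 0.5))    # 1 decoy / 5 targets -> 0.2
```

The entire difficulty the abstract describes lives in how the decoy library is generated so that decoy scores faithfully mimic false matches; the counting step itself is this simple.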
5.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-38960406

ABSTRACT

Spatial transcriptomics data play a crucial role in cancer research, providing a nuanced understanding of the spatial organization of gene expression within tumor tissues. Unraveling the spatial dynamics of gene expression can unveil key insights into tumor heterogeneity and aid in identifying potential therapeutic targets. However, in many large-scale cancer studies, spatial transcriptomics data are limited, with bulk RNA-seq and corresponding Whole Slide Image (WSI) data being more common (e.g. the TCGA project). To address this gap, there is a critical need for methodologies that can estimate gene expression at near-cell (spot) level resolution from existing WSI and bulk RNA-seq data. This approach is essential for reanalyzing expansive cohort studies and uncovering novel biomarkers that were overlooked in the initial assessments. In this study, we present STGAT (Spatial Transcriptomics Graph Attention Network), a novel approach leveraging Graph Attention Networks (GAT) to discern spatial dependencies among spots. Trained on spatial transcriptomics data, STGAT is designed to estimate gene expression profiles at spot-level resolution and to predict whether each spot represents tumor or non-tumor tissue, especially in patient samples where only WSI and bulk RNA-seq data are available. Comprehensive tests on two breast cancer spatial transcriptomics datasets demonstrated that STGAT outperformed existing methods in accurately predicting gene expression. Further analyses using the TCGA breast cancer dataset revealed that gene expression estimated from tumor-only spots (predicted by STGAT) provides more accurate molecular signatures for breast cancer subtype and tumor stage prediction, and also leads to improved patient survival and disease-free survival analyses. Availability: Code is available at https://github.com/compbiolabucf/STGAT.


Subjects
Gene Expression Profiling, RNA-Seq, Transcriptome, Humans, RNA-Seq/methods, Gene Expression Profiling/methods, Breast Neoplasms/genetics, Breast Neoplasms/metabolism, Gene Expression Regulation (Neoplastic), Computational Biology/methods, Female, Tumor Biomarkers/genetics, Tumor Biomarkers/metabolism
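A single graph-attention aggregation step of the kind GATs apply over neighboring spots can be sketched as follows. This is dot-product attention in plain Python; STGAT's actual architecture is multi-layer and learned:

```python
import math

def attention_aggregate(spot, neighbors, feats):
    """One attention head: weight each neighbor's features by softmaxed
    dot-product similarity to the query spot, then average."""
    scores = [sum(a * b for a, b in zip(feats[spot], feats[j])) for j in neighbors]
    m = max(scores)                              # subtract max for stability
    w = [math.exp(s - m) for s in scores]
    z = sum(w)
    w = [x / z for x in w]
    dim = len(feats[spot])
    return [sum(w[k] * feats[j][i] for k, j in enumerate(neighbors))
            for i in range(dim)]

# Spot 0 resembles neighbor 1 far more than neighbor 2, so neighbor 1
# dominates the aggregated representation:
feats = {0: [1.0, 0.0], 1: [0.9, 0.1], 2: [0.0, 1.0]}
agg = attention_aggregate(0, [1, 2], feats)
print([round(x, 3) for x in agg])  # → [0.64, 0.36]
```

Stacking such layers lets each spot's expression estimate borrow strength from spatially and transcriptionally similar neighbors.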
6.
Proc Natl Acad Sci U S A ; 120(30): e2217601120, 2023 07 25.
Article in English | MEDLINE | ID: mdl-37467271

ABSTRACT

Armed conflict, displacement, and food insecurity have affected the Adamawa, Borno, and Yobe states of northeast Nigeria (population ≈ 12 million) since 2009. Insecurity escalated in 2013 to 2015, but the humanitarian response was delayed, and the health impact of the crisis went unquantified due to incomplete death registration and limited ground access. We estimated mortality attributable to this crisis using a small-area estimation approach that circumvented these challenges. We fitted a mixed effects model to household mortality data collected as part of 70 ground surveys implemented by humanitarian actors. Model predictors, drawn from existing data, included livelihood typology, staple cereal price, vaccination geocoverage, and humanitarian actor presence. To project accurate death tolls, we reconstructed population denominators based on forced displacement. We used the model and population estimates to project mortality under observed conditions and under varying assumed counterfactual conditions, had there been no crisis, with the difference providing excess mortality. Death rates were highly elevated across most ground surveys, with net negative household migration. Between April 2016 and December 2019, we projected 490,000 excess deaths (230,000 children under 5 y) in the most likely counterfactual scenario, with a range from 90,000 (best-case) to 550,000 (worst-case). Death rates were two to three times higher than counterfactual levels, double the projected national rate, and highest in 2016 to 2017. Despite limited scope (we could not study the situation before 2016 or in neighboring affected countries), our findings suggest a staggering health impact of this crisis. Further studies to document mortality in this and other crises are needed to guide decision-making and memorialize their human toll.


Subjects
Seizures, Vaccination, Child, Humans, Nigeria/epidemiology, Forecasting, Armed Conflicts
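The excess-mortality arithmetic (observed minus counterfactual death rate, times person-time) is simple to state. The inputs below are hypothetical round numbers chosen for illustration, not the study's fitted rates:

```python
def excess_deaths(observed_rate, counterfactual_rate, person_days):
    """Excess mortality = (observed - counterfactual) death rate x exposure."""
    return (observed_rate - counterfactual_rate) * person_days

# Hypothetical inputs: crisis death rate of 0.4 per 10,000 person-days vs a
# 0.2 counterfactual, for ~12 million people over 3.75 years.
person_days = 12_000_000 * 3.75 * 365
excess = round(excess_deaths(0.4 / 10_000, 0.2 / 10_000, person_days))
print(excess)  # → 328500
```

The hard part of the study is everything feeding this subtraction: reconstructing the denominators under displacement and modeling both rates from sparse survey data.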
7.
Proc Natl Acad Sci U S A ; 120(21): e2212795120, 2023 05 23.
Article in English | MEDLINE | ID: mdl-37192165

ABSTRACT

Kinetic proofreading (KPR) has been used as a paradigmatic explanation for the high specificity of ligand discrimination by cellular receptors. KPR enhances the difference in the mean receptor occupancy between different ligands compared to a nonproofread receptor, thus potentially enabling better discrimination. On the other hand, proofreading also attenuates the signal and introduces additional stochastic receptor transitions relative to a nonproofreading receptor. This increases the relative magnitude of noise in the downstream signal, which can interfere with reliable ligand discrimination. To understand the effect of noise on ligand discrimination beyond the comparison of the mean signals, we formulate the task of ligand discrimination as a problem of statistical estimation of the receptor affinity of ligands based on the molecular signaling output. Our analysis reveals that proofreading typically worsens ligand resolution compared to a nonproofread receptor. Furthermore, the resolution decreases further with more proofreading steps under most commonly biologically considered conditions. This contrasts with the usual notion that KPR universally improves ligand discrimination with additional proofreading steps. Our results are consistent across a variety of different proofreading schemes and metrics of performance, suggesting that they are inherent to the KPR mechanism itself rather than any particular model of molecular noise. Based on our results, we suggest alternative roles for KPR schemes such as multiplexing and combinatorial encoding in multi-ligand/multi-output pathways.


Subjects
Cell Surface Receptors, Signal Transduction, Ligands, Cell Surface Receptors/metabolism, Kinetics
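A textbook caricature of kinetic proofreading shows the trade-off discussed above: mean discrimination grows with proofreading steps while the absolute signal shrinks. The rate constants are arbitrary and the occupancy formula is a deliberate simplification, not the paper's stochastic model:

```python
def kpr_occupancy(k_on, k_off, k_p, n_steps):
    """Fraction of receptors reaching the signaling state: bind, then survive
    n proofreading steps (advance at rate k_p, fall off at rate k_off)."""
    p_bind = k_on / (k_on + k_off)
    p_step = k_p / (k_p + k_off)
    return p_bind * p_step ** n_steps

# Correct vs. incorrect ligand differ only in off-rate (10x):
for n in (0, 1, 2):
    right = kpr_occupancy(1.0, 0.1, 1.0, n)
    wrong = kpr_occupancy(1.0, 1.0, 1.0, n)
    print(n, round(right / wrong, 2), round(right, 3))
```

The ratio column grows with each added step (better mean discrimination), while the occupancy column falls, which is the signal attenuation whose noise cost the paper quantifies.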
8.
Proc Natl Acad Sci U S A ; 120(9): e2218375120, 2023 02 28.
Article in English | MEDLINE | ID: mdl-36821583

ABSTRACT

The recent increase in openly available ancient human DNA samples allows for large-scale meta-analysis applications. Trans-generational past human mobility is one of the key aspects that ancient genomics can contribute to, since changes in genetic ancestry (unlike cultural changes seen in the archaeological record) necessarily reflect movements of people. Here, we present an algorithm for spatiotemporal mapping of genetic profiles, which allows for direct estimates of past human mobility from large ancient genomic datasets. The key idea of the method is to derive a spatial probability surface of genetic similarity for each individual in its respective past. This is achieved by first creating an interpolated ancestry field through space and time based on multivariate statistics and Gaussian process regression and then using this field to map the ancient individuals into space according to their genetic profile. We apply this algorithm to a dataset of 3138 aDNA samples with genome-wide data from Western Eurasia in the last 10,000 y. Finally, we condense this sample-wise record with a simple summary statistic into a diachronic measure of mobility for subregions in Western, Central, and Southern Europe. For regions and periods with sufficient data coverage, our similarity surfaces and mobility estimates show general concordance with previous results and provide a meta-perspective of genetic changes and human mobility.


Subjects
Ancient DNA, Genomics, Humans, Ancient History, Ancient DNA/analysis, Europe
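The Gaussian process regression step can be illustrated in one dimension with an RBF kernel and two training points, as a toy stand-in for the multivariate spatiotemporal ancestry field (all values are invented):

```python
import math

def rbf(a, b, length_scale=1.0):
    """Squared-exponential (RBF) kernel."""
    return math.exp(-((a - b) ** 2) / (2 * length_scale ** 2))

def gp_predict(xs, ys, x_new, noise=1e-6):
    """GP posterior mean for exactly two training points: k_* @ K^{-1} y,
    with the 2x2 inverse written out by hand."""
    k11 = rbf(xs[0], xs[0]) + noise
    k22 = rbf(xs[1], xs[1]) + noise
    k12 = rbf(xs[0], xs[1])
    det = k11 * k22 - k12 * k12
    a1 = (k22 * ys[0] - k12 * ys[1]) / det    # alpha = K^{-1} y
    a2 = (-k12 * ys[0] + k11 * ys[1]) / det
    return rbf(x_new, xs[0]) * a1 + rbf(x_new, xs[1]) * a2

# Interpolating a hypothetical "ancestry proportion" between two dated samples:
val = gp_predict([0.0, 2.0], [0.1, 0.9], 1.0)
print(round(val, 3))
```

The prediction at the midpoint falls between the two observed values, shrinking toward the prior mean of zero as the query moves away from the data, which is the behavior the interpolated ancestry field relies on.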
9.
Proc Natl Acad Sci U S A ; 120(7): e2216415120, 2023 Feb 14.
Article in English | MEDLINE | ID: mdl-36763529

ABSTRACT

Computational models have become a powerful tool in the quantitative sciences to understand the behavior of complex systems that evolve in time. However, they often contain a potentially large number of free parameters whose values cannot be obtained from theory but need to be inferred from data. This is especially the case for models in the social sciences, economics, or computational epidemiology. Yet, many current parameter estimation methods are mathematically involved and computationally slow to run. In this paper, we present a computationally simple and fast method to retrieve accurate probability densities for model parameters using neural differential equations. We present a pipeline comprising multiagent models acting as forward solvers for systems of ordinary or stochastic differential equations and a neural network to then extract parameters from the data generated by the model. The two combined create a powerful tool that can quickly estimate densities on model parameters, even for very large systems. We demonstrate the method on synthetic time series data of the SIR model of the spread of infection and perform an in-depth analysis of the Harris-Wilson model of economic activity on a network, representing a nonconvex problem. For the latter, we apply our method both to synthetic data and to data of economic activity across Greater London. We find that our method calibrates the model orders of magnitude more accurately than a previous study of the same dataset using classical techniques, while running between 195 and 390 times faster.
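The SIR forward solver that generates the synthetic time series in such a pipeline can be sketched with forward Euler; the parameter values are illustrative:

```python
def sir_simulate(beta, gamma, s0=0.99, i0=0.01, days=100, dt=0.1):
    """Forward-Euler solver for the SIR model: the data generator whose
    output a neural density estimator would invert to recover (beta, gamma)."""
    s, i, r = s0, i0, 0.0
    traj = []  # daily infected fraction
    steps_per_day = int(round(1 / dt))
    for step in range(int(round(days / dt))):
        ds = -beta * s * i
        di = beta * s * i - gamma * i
        s, i, r = s + dt * ds, i + dt * di, r + dt * (gamma * i)
        if step % steps_per_day == 0:
            traj.append(i)
    return traj

traj = sir_simulate(beta=0.3, gamma=0.1)   # basic reproduction number R0 = 3
peak_day = max(range(len(traj)), key=traj.__getitem__)
print(peak_day, round(max(traj), 3))
```

In the approach described above, many such forward runs with sampled parameters train a network to map observed trajectories back to densities over (beta, gamma); the solver itself is the cheap part.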

10.
J Neurosci ; 44(12)2024 Mar 20.
Article in English | MEDLINE | ID: mdl-38199865

ABSTRACT

Regression is a key feature of neurodevelopmental disorders such as autism spectrum disorder, Fragile X syndrome, and Rett syndrome (RTT). RTT is caused by mutations in the X-linked gene methyl-CpG-binding protein 2 (MECP2). It is characterized by an early period of typical development with subsequent regression of previously acquired motor and speech skills in girls. The syndromic phenotypes are individualistic and dynamic over time. Thus far, it has been difficult to capture these dynamics and syndromic heterogeneity in the preclinical Mecp2-heterozygous female mouse model (Het). The emergence of computational neuroethology tools allows for robust analysis of complex and dynamic behaviors to model endophenotypes in preclinical models. As a first step, we used DeepLabCut, marker-less pose estimation software, to quantify trajectory kinematics, and multidimensional analysis to characterize behavioral heterogeneity in Het in the previously benchmarked, ethologically relevant social cognition task of pup retrieval. We report the identification of two distinct phenotypes of adult Het: Het that display a delay in efficiency in early days and then improve over days like wild-type mice, and Het that regress and perform worse in later days. Furthermore, regression is dependent on age and behavioral context and can be detected in the initial days of retrieval. Together, the novel identification of two populations of Het suggests differential effects on neural circuitry, opens new avenues to investigate the underlying molecular and cellular mechanisms of heterogeneity, and informs the design of better studies for stratifying therapeutics.


Subjects
Autism Spectrum Disorder, Rett Syndrome, Humans, Female, Animals, Mice, Rett Syndrome/genetics, Rett Syndrome/metabolism, Methyl-CpG-Binding Protein 2/genetics, Methyl-CpG-Binding Protein 2/metabolism, Phenotype, Mutation/genetics, Social Behavior, Animal Disease Models
11.
Annu Rev Med ; 74: 385-400, 2023 01 27.
Article in English | MEDLINE | ID: mdl-36706748

ABSTRACT

In 2020, the nephrology community formally interrogated long-standing race-based clinical algorithms used in the field, including the kidney function estimation equations. A comprehensive understanding of the history of kidney function estimation and racial essentialism is necessary to understand the underpinnings of the incorporation of a Black race coefficient into prior equations. We provide a review of this history, as well as the considerations used to develop race-free equations that serve as a guidepost for a more equity-oriented, scientifically rigorous future for kidney function estimation and other clinical algorithms and processes in which race may be embedded as a variable.


Subjects
Kidney, Racial Groups, Humans, Kidney/physiology, Black People
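For concreteness, the 2021 race-free CKD-EPI creatinine equation can be written down directly. The coefficients below are transcribed from the published equation to the best of our knowledge; treat this as an illustrative sketch and verify against the original publication before any real use:

```python
def egfr_ckd_epi_2021(scr_mg_dl, age_years, female):
    """2021 CKD-EPI creatinine equation (race-free refit), in
    mL/min/1.73 m^2. Illustrative transcription only -- not for clinical use."""
    kappa = 0.7 if female else 0.9        # sex-specific creatinine threshold
    alpha = -0.241 if female else -0.302  # exponent below the threshold
    egfr = (142
            * min(scr_mg_dl / kappa, 1.0) ** alpha
            * max(scr_mg_dl / kappa, 1.0) ** -1.200
            * 0.9938 ** age_years)
    return egfr * 1.012 if female else egfr

# A 50-year-old male with serum creatinine 1.0 mg/dL:
e = egfr_ckd_epi_2021(1.0, 50, female=False)
print(round(e, 1))
```

Note what is absent relative to the pre-2021 equations: there is no race coefficient anywhere in the formula, which is exactly the change the review describes.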
12.
Biostatistics ; 2024 Jul 31.
Article in English | MEDLINE | ID: mdl-39083810

ABSTRACT

This paper tackles the challenge of estimating correlations between higher-level biological variables (e.g. proteins and gene pathways) when only lower-level measurements are directly observed (e.g. peptides and individual genes). Existing methods typically aggregate lower-level data into higher-level variables and then estimate correlations based on the aggregated data. However, different data aggregation methods can yield varying correlation estimates as they target different higher-level quantities. Our solution is a latent factor model that directly estimates these higher-level correlations from lower-level data without the need for data aggregation. We further introduce a shrinkage estimator to ensure the positive definiteness and improve the accuracy of the estimated correlation matrix. Furthermore, we establish the asymptotic normality of our estimator, enabling efficient computation of P-values for the identification of significant correlations. The effectiveness of our approach is demonstrated through comprehensive simulations and the analysis of proteomics and gene expression datasets. We develop the R package highcor for implementing our method.
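The linear shrinkage step used to guarantee a positive definite correlation estimate is easy to demonstrate; the 3x3 matrix and shrinkage weight below are illustrative, not the paper's estimator:

```python
def shrink_correlation(R, lam):
    """Linear shrinkage toward the identity: R* = (1 - lam) * R + lam * I."""
    d = len(R)
    return [[(1 - lam) * R[i][j] + (lam if i == j else 0.0) for j in range(d)]
            for i in range(d)]

def is_pos_def_3x3(M):
    # Sylvester's criterion: all leading principal minors must be positive
    m1 = M[0][0]
    m2 = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    m3 = (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
          - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
          + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))
    return m1 > 0 and m2 > 0 and m3 > 0

# Pairwise estimates can yield an invalid (non-positive-definite) "correlation"
# matrix; shrinking toward the identity repairs it:
R = [[1.0, 0.9, 0.9], [0.9, 1.0, -0.9], [0.9, -0.9, 1.0]]
print(is_pos_def_3x3(R), is_pos_def_3x3(shrink_correlation(R, 0.6)))  # → False True
```

Pulling the off-diagonal entries toward zero while keeping unit diagonals raises the smallest eigenvalue above zero, at the cost of some bias, the classic shrinkage trade-off.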

13.
Biostatistics ; 25(2): 429-448, 2024 Apr 15.
Article in English | MEDLINE | ID: mdl-37531620

ABSTRACT

Modeling longitudinal and survival data jointly offers many advantages, such as addressing measurement error and missing data in the longitudinal processes, understanding and quantifying the association between the longitudinal markers and the survival events, and predicting the risk of events based on the longitudinal markers. A joint model involves multiple submodels (one for each longitudinal/survival outcome), usually linked together through correlated or shared random effects. Their estimation is computationally expensive (particularly due to a multidimensional integration of the likelihood over the random effects distribution), so inference rapidly becomes intractable, restricting applications of joint models to a small number of longitudinal markers and/or random effects. We introduce a Bayesian approximation based on the integrated nested Laplace approximation algorithm, implemented in the R package R-INLA, to alleviate the computational burden and allow the estimation of multivariate joint models with fewer restrictions. Our simulation studies show that R-INLA substantially reduces the computation time and the variability of the parameter estimates compared with alternative estimation strategies. We further apply the methodology to analyze five longitudinal markers (3 continuous, 1 count, 1 binary, and 16 random effects) and competing risks of death and transplantation in a clinical trial on primary biliary cholangitis. R-INLA provides a fast and reliable inference technique for applying joint models to the complex multivariate data encountered in health research.


Subjects
Algorithms, Statistical Models, Humans, Bayes Theorem, Computer Simulation, Monte Carlo Method, Longitudinal Studies
14.
Biostatistics ; 25(2): 323-335, 2024 Apr 15.
Article in English | MEDLINE | ID: mdl-37475638

ABSTRACT

The rich longitudinal individual level data available from electronic health records (EHRs) can be used to examine treatment effect heterogeneity. However, estimating treatment effects using EHR data poses several challenges, including time-varying confounding, repeated and temporally non-aligned measurements of covariates, treatment assignments and outcomes, and loss-to-follow-up due to dropout. Here, we develop the subgroup discovery for longitudinal data algorithm, a tree-based algorithm for discovering subgroups with heterogeneous treatment effects using longitudinal data by combining the generalized interaction tree algorithm, a general data-driven method for subgroup discovery, with longitudinal targeted maximum likelihood estimation. We apply the algorithm to EHR data to discover subgroups of people living with human immunodeficiency virus who are at higher risk of weight gain when receiving dolutegravir (DTG)-containing antiretroviral therapies (ARTs) versus when receiving non-DTG-containing ARTs.


Subjects
Electronic Health Records, HIV Infections, Heterocyclic Compounds (3-Ring), Piperazines, Pyridones, Humans, Treatment Effect Heterogeneity, Oxazines, HIV Infections/drug therapy
15.
Brief Bioinform ; 24(5)2023 09 20.
Article in English | MEDLINE | ID: mdl-37529921

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) has emerged as a powerful tool for uncovering cellular heterogeneity. However, the high costs associated with this technique have rendered it impractical for studying large patient cohorts. We introduce ENIGMA (Deconvolution based on Regularized Matrix Completion), a method that addresses this limitation by accurately deconvolving bulk tissue RNA-seq data into a readout with cell-type resolution, leveraging information from scRNA-seq data. By employing a matrix completion strategy, ENIGMA minimizes the distance between the mixture transcriptome obtained with bulk sequencing and a weighted combination of cell-type-specific expression. This allows the quantification of cell-type proportions and reconstruction of cell-type-specific transcriptomes. To validate its performance, ENIGMA was tested on both simulated and real datasets, including disease-related tissues, demonstrating its ability to uncover novel biological insights.


Subjects
Gene Expression Profiling, Transcriptome, Humans, Gene Expression Profiling/methods, Software, RNA-Seq/methods, RNA Sequence Analysis/methods
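The core deconvolution idea (express bulk expression as a weighted combination of cell-type references) reduces, in the two-cell-type case, to a one-parameter least squares problem. This toy stand-in ignores ENIGMA's regularization and matrix completion machinery entirely:

```python
def deconvolve_two_types(bulk, sig_a, sig_b):
    """Closed-form least squares for bulk ≈ p*sig_a + (1-p)*sig_b:
    minimizing over the single proportion p, then clipping to [0, 1]."""
    num = sum((b - y) * (x - y) for b, x, y in zip(bulk, sig_a, sig_b))
    den = sum((x - y) ** 2 for x, y in zip(sig_a, sig_b))
    return min(1.0, max(0.0, num / den))

sig_a = [10.0, 0.0, 5.0]   # reference expression of cell type A (3 genes)
sig_b = [0.0, 8.0, 5.0]    # reference expression of cell type B
bulk = [0.3 * a + 0.7 * b for a, b in zip(sig_a, sig_b)]   # 30/70 mixture
print(round(deconvolve_two_types(bulk, sig_a, sig_b), 2))  # → 0.3
```

With noisy bulks, many cell types, and per-sample cell-type-specific expression to reconstruct, the problem becomes underdetermined, which is where the regularized matrix completion comes in.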
16.
Brief Bioinform ; 24(5)2023 09 20.
Article in English | MEDLINE | ID: mdl-37539831

ABSTRACT

Duplex sequencing technology has been widely used to detect low-frequency mutations in circulating tumor deoxyribonucleic acid (DNA), but how to choose the sequencing depth and other experimental parameters so that low-frequency mutations are detected stably remains an open problem. The mutation detection rules of duplex sequencing constrain not only the number of mutated templates but also the number of mutation-supportive reads corresponding to each forward and reverse strand of the mutated templates. To tackle this problem, we propose DELFMUT, a Depth Estimation model for stable detection of Low-Frequency MUTations in duplex sequencing, which models the identity correspondence and quantitative relationships between templates and reads using the zero-truncated negative binomial distribution, without modeling the base sequences themselves. The results of DELFMUT were verified on real duplex sequencing data. Given a mutation frequency and a mutation detection rule, DELFMUT can recommend combinations of DNA input and sequencing depth that guarantee stable detection of mutations, and it has great practical value for guiding the experimental parameter settings of duplex sequencing technology.


Subjects
High-Throughput Nucleotide Sequencing, Neoplasms, Humans, High-Throughput Nucleotide Sequencing/methods, Mutation, Neoplasms/genetics, Mutation Rate, DNA
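The detection rule (enough supporting reads on both strands of at least one mutated template) can be probed with a small Monte Carlo. A Poisson stands in for DELFMUT's zero-truncated negative binomial, and all parameter values are illustrative:

```python
import math, random

random.seed(7)

def poisson(lam):
    # Knuth's inversion sampler
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

def detect_prob(n_templates, mut_freq, reads_per_strand, min_reads=2,
                trials=4000):
    """P(variant called): at least one mutated template carries >= min_reads
    supporting reads on BOTH the forward and reverse strand."""
    hits = 0
    for _ in range(trials):
        n_mut = sum(random.random() < mut_freq for _ in range(n_templates))
        for _ in range(n_mut):
            fwd, rev = poisson(reads_per_strand), poisson(reads_per_strand)
            if fwd >= min_reads and rev >= min_reads:
                hits += 1
                break
    return hits / trials

shallow = detect_prob(500, 0.005, 1.0)   # ~1 read per strand on average
deep = detect_prob(500, 0.005, 4.0)      # deeper sequencing of the same input
print(shallow < deep)
```

Sweeping DNA input and depth in a model like this (analytically, in DELFMUT's case) yields the recommended parameter combinations that keep the detection probability near one.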
17.
Brief Bioinform ; 24(3)2023 05 19.
Article in English | MEDLINE | ID: mdl-37096633

ABSTRACT

In cryogenic electron microscopy (cryo-EM) single particle analysis (SPA), high-resolution three-dimensional structures of biological macromolecules are determined by iteratively aligning and averaging a large number of two-dimensional projections of molecules. Since the correlation measures are sensitive to the signal-to-noise ratio, various parameter estimation steps in SPA are disturbed by the high-intensity noise in cryo-EM. However, denoising algorithms tend to damage high frequencies and suppress the mid- and high-frequency contrast of micrographs, which is exactly what precise parameter estimation relies on, thereby limiting their application in SPA. In this study, we suggest combining a cryo-EM image processing pipeline with denoising while maximizing the signal's contribution in the various parameter estimation steps. To address the inherent flaws of denoising algorithms, we design an algorithm named MScale to correct the amplitude distortion caused by denoising and propose a new orientation determination strategy to compensate for the high-frequency loss. In experiments on several real datasets, the denoised particles are successfully applied in class assignment estimation and orientation determination tasks, ultimately enhancing the quality of biomacromolecule reconstruction. The case study on classification indicates that our strategy not only improves the resolution of difficult classes (up to 5 Å) but also resolves an additional class. In the case study on orientation determination, our strategy improves the resolution of the final reconstructed density map by 0.34 Å compared with the conventional strategy. The code is available at https://github.com/zhanghui186/Mscale.


Subjects
Computer-Assisted Image Processing, Single Molecule Imaging, Cryoelectron Microscopy/methods, Computer-Assisted Image Processing/methods, Algorithms, Signal-To-Noise Ratio
18.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36573494

ABSTRACT

Machine learning, including modern deep learning models, has been extensively used in drug design and screening. However, reliable prediction of molecular properties is still challenging when exploring out-of-domain regimes, even for deep neural networks. Therefore, it is important to understand the uncertainty of model predictions, especially when the predictions are used to guide further experiments. In this study, we explored the utility and effectiveness of evidential uncertainty in compound screening. The evidential Graphormer model was proposed for uncertainty-guided discovery of KDM1A/LSD1 inhibitors. The benchmarking results illustrated that (i) Graphormer exhibited predictive power comparable to state-of-the-art models, and (ii) evidential regression enabled well-ranked uncertainty estimates and calibrated predictions. Subsequently, we leveraged time-splitting on the curated KDM1A/LSD1 dataset to simulate out-of-distribution predictions. The retrospective virtual screening showed that the evidential uncertainties helped reduce false positives among the top-acquired compounds and thus enabled higher experimental validation rates. The trained model was then used to virtually screen an independent in-house compound set. The top 50 compounds ranked by two different ranking strategies were experimentally validated, respectively. In general, our study highlighted the importance of understanding the uncertainty in prediction, which can be recognized as an interpretable dimension of model predictions.


Subjects
Histones, Lysine, Retrospective Studies, Uncertainty, Histone Demethylases/metabolism
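One simple way uncertainty estimates reduce false positives during acquisition is a lower-confidence-bound ranking. This is an illustrative strategy in the spirit of uncertainty-guided screening, not the paper's exact acquisition function, and the compounds are invented:

```python
def rank_by_confident_score(compounds, penalty=1.0):
    """Rank by predicted activity minus an uncertainty penalty (a lower
    confidence bound), demoting high-scoring but poorly supported predictions."""
    return sorted(compounds,
                  key=lambda c: c["score"] - penalty * c["sigma"],
                  reverse=True)

compounds = [
    {"id": "A", "score": 9.0, "sigma": 3.0},  # top score, but very uncertain
    {"id": "B", "score": 8.0, "sigma": 0.5},  # slightly lower, well supported
    {"id": "C", "score": 5.0, "sigma": 0.2},
]
print([c["id"] for c in rank_by_confident_score(compounds)])  # → ['B', 'A', 'C']
```

A plain score ranking would acquire A first; penalizing by uncertainty promotes B, which is the mechanism behind the higher experimental validation rates reported above.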
19.
Mol Syst Biol ; 2024 Aug 02.
Article in English | MEDLINE | ID: mdl-39095427

ABSTRACT

Crosslinking mass spectrometry is a powerful tool to study protein-protein interactions under native or near-native conditions in complex mixtures. Through novel search controls, we show how biasing results towards likely correct proteins can subtly undermine error estimation of crosslinks, with significant consequences. Without adjustments to address this issue, we misidentified an average of 260 interspecies protein-protein interactions across 16 analyses in which we synthetically mixed data of different species, misleadingly suggesting profound biological connections that do not exist. We also demonstrate how data analysis procedures can be tested and refined to restore the integrity of the decoy-false positive relationship, a crucial element for reliably identifying protein-protein interactions.

20.
Syst Biol ; 2024 May 11.
Article in English | MEDLINE | ID: mdl-38733563

ABSTRACT

Accurately reconstructing the reticulate histories of polyploids remains a central challenge for understanding plant evolution. Although phylogenetic networks can provide insights into relationships among polyploid lineages, inferring networks may be hindered by the complexities of homology determination in polyploid taxa. We use simulations to show that phasing alleles from allopolyploid individuals can improve phylogenetic network inference under the multispecies coalescent by obtaining the true network with fewer loci compared to haplotype consensus sequences or sequences with heterozygous bases represented as ambiguity codes. Phased allelic data can also improve divergence time estimates for networks, which is helpful for evaluating allopolyploid speciation hypotheses and proposing mechanisms of speciation. To achieve these outcomes in empirical data, we present a novel pipeline that leverages a recently developed phasing algorithm to reliably phase alleles from polyploids. This pipeline is especially appropriate for target enrichment data, where depth of coverage is typically high enough to phase entire loci. We provide an empirical example in the North American Dryopteris fern complex that demonstrates insights from phased data as well as the challenges of network inference. We establish that our pipeline (PATÉ: Phased Alleles from Target Enrichment data) is capable of recovering a high proportion of phased loci from both diploids and polyploids. These data may improve network estimates compared to using haplotype consensus assemblies by accurately inferring the direction of gene flow, but statistical non-identifiability of phylogenetic networks poses a barrier to inferring the evolutionary history of reticulate complexes.
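Why ambiguity-coded consensus sequences lose the phase information that PATÉ recovers is easy to demonstrate: two different phasings of the same genotypes collapse to the identical consensus (toy sequences, standard IUPAC codes):

```python
# IUPAC ambiguity codes for two-base heterozygous sites
IUPAC = {frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("AC"): "M",
         frozenset("GT"): "K", frozenset("AT"): "W", frozenset("CG"): "S"}

def ambiguity_consensus(hap1, hap2):
    """Collapse two phased haplotypes into a single consensus sequence,
    replacing heterozygous sites with IUPAC codes (phase is discarded)."""
    return "".join(a if a == b else IUPAC[frozenset((a, b))]
                   for a, b in zip(hap1, hap2))

# Two DIFFERENT phasings of the same heterozygous genotypes...
c1 = ambiguity_consensus("ACGT", "ATGA")
c2 = ambiguity_consensus("ATGT", "ACGA")
print(c1, c2, c1 == c2)  # → AYGW AYGW True
```

Since both phasings yield the same consensus, downstream network inference cannot tell which alleles co-occurred on one homeolog, which is exactly the homology information phased alleles restore.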
