1.
Cell ; 179(2): 527-542.e19, 2019 Oct 03.
Article in English | MEDLINE | ID: mdl-31585086

ABSTRACT

Much of current molecular and cell biology research relies on the ability to purify cell types by fluorescence-activated cell sorting (FACS). FACS typically relies on the ability to label cell types of interest with antibodies or fluorescent transgenic constructs. However, antibody availability is often limited, and genetic manipulation is labor intensive or impossible in the case of primary human tissue. To date, no systematic method exists to enrich for cell types without a priori knowledge of cell-type markers. Here, we propose GateID, a computational method that combines single-cell transcriptomics with FACS index sorting to purify cell types of choice using only native cellular properties such as cell size, granularity, and mitochondrial content. We validate GateID by purifying various cell types from zebrafish kidney marrow and the human pancreas to high purity without resorting to specific antibodies or transgenes.


Subjects
Cell Separation/methods , Flow Cytometry/methods , Software , Transcriptome , Animals , Humans , Kidney/cytology , Pancreas/cytology , Single-Cell Analysis , Zebrafish/anatomy & histology
2.
Am J Hum Genet ; 111(5): 966-978, 2024 May 02.
Article in English | MEDLINE | ID: mdl-38701746

ABSTRACT

Replicability is the cornerstone of modern scientific research. Reliable identifications of genotype-phenotype associations that are significant in multiple genome-wide association studies (GWASs) provide stronger evidence for the findings. Current replicability analysis relies on the independence assumption among single-nucleotide polymorphisms (SNPs) and ignores the linkage disequilibrium (LD) structure. We show that such a strategy may produce either overly liberal or overly conservative results in practice. We develop an efficient method, ReAD, to detect replicable SNPs associated with the phenotype from two GWASs accounting for the LD structure. The local dependence structure of SNPs across two heterogeneous studies is captured by a four-state hidden Markov model (HMM) built on two sequences of p values. By incorporating information from adjacent locations via the HMM, our approach provides more accurate SNP significance rankings. ReAD is scalable, platform independent, and more powerful than existing replicability analysis methods with effective false discovery rate control. Through analysis of datasets from two asthma GWASs and two ulcerative colitis GWASs, we show that ReAD can identify replicable genetic loci that existing methods might otherwise miss.


Subjects
Asthma , Genome-Wide Association Study , Linkage Disequilibrium , Polymorphism, Single Nucleotide , Genome-Wide Association Study/methods , Humans , Asthma/genetics , Markov Chains , Colitis, Ulcerative/genetics , Reproducibility of Results , Phenotype , Genotype
3.
Proc Natl Acad Sci U S A ; 121(16): e2317602121, 2024 Apr 16.
Article in English | MEDLINE | ID: mdl-38598346

ABSTRACT

Algorithmic bias occurs when algorithms incorporate biases in the human decisions on which they are trained. We find that people see more of their biases (e.g., age, gender, race) in the decisions of algorithms than in their own decisions. Research participants saw more bias in the decisions of algorithms trained on their decisions than in their own decisions, even when those decisions were the same and participants were incentivized to reveal their true beliefs. By contrast, participants saw as much bias in the decisions of algorithms trained on their decisions as in the decisions of other participants and algorithms trained on the decisions of other participants. Cognitive psychological processes and motivated reasoning help explain why people see more of their biases in algorithms. Research participants most susceptible to bias blind spot were most likely to see more bias in algorithms than self. Participants were also more likely to perceive algorithms than themselves to have been influenced by irrelevant biasing attributes (e.g., race) but not by relevant attributes (e.g., user reviews). Because participants saw more of their biases in algorithms than themselves, they were more likely to make debiasing corrections to decisions attributed to an algorithm than to themselves. Our findings show that bias is more readily perceived in algorithms than in self and suggest how to use algorithms to reveal and correct biased human decisions.


Subjects
Motivation , Problem Solving , Humans , Bias , Algorithms
4.
Proc Natl Acad Sci U S A ; 121(12): e2304866121, 2024 Mar 19.
Article in English | MEDLINE | ID: mdl-38483992

ABSTRACT

Accelerating the measurement for discrimination of samples, such as classification of cell phenotype, is crucial when faced with significant time and cost constraints. Spontaneous Raman microscopy offers label-free, rich chemical information but suffers from long acquisition time due to extremely small scattering cross-sections. One possible approach to accelerate the measurement is by measuring necessary parts with a suitable number of illumination points. However, how to design these points during measurement remains a challenge. To address this, we developed an imaging technique based on a reinforcement learning in machine learning (ML). This ML approach adaptively feeds back "optimal" illumination pattern during the measurement to detect the existence of specific characteristics of interest, allowing faster measurements while guaranteeing discrimination accuracy. Using a set of Raman images of human follicular thyroid and follicular thyroid carcinoma cells, we showed that our technique requires 3,333 to 31,683 times smaller number of illuminations for discriminating the phenotypes than raster scanning. To quantitatively evaluate the number of illuminations depending on the requisite discrimination accuracy, we prepared a set of polymer bead mixture samples to model anomalous and normal tissues. We then applied a home-built programmable-illumination microscope equipped with our algorithm, and confirmed that the system can discriminate the sample conditions with 104 to 4,350 times smaller number of illuminations compared to standard point illumination Raman microscopy. The proposed algorithm can be applied to other types of microscopy that can control measurement condition on the fly, offering an approach for the acceleration of accurate measurements in various applications including medical diagnosis.


Subjects
Microscopy , Spectrum Analysis, Raman , Humans , Microscopy/methods , Spectrum Analysis, Raman/methods , Thyroid Gland , Nonlinear Optical Microscopy , Machine Learning
5.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38385874

ABSTRACT

The three-dimensional (3D) structure of bacterial chromosomes is crucial for understanding chromosome function. With the growing availability of high-throughput chromosome conformation capture (3C/Hi-C) data, the 3D structure reconstruction algorithms have become powerful tools to study bacterial chromosome structure and function. It is highly desired to have a recommendation on the chromosome structure reconstruction tools to facilitate the prokaryotic 3D genomics. In this work, we review existing chromosome 3D structure reconstruction algorithms and classify them based on their underlying computational models into two categories: constraint-based modeling and thermodynamics-based modeling. We briefly compare these algorithms utilizing 3C/Hi-C datasets and fluorescence microscopy data obtained from Escherichia coli and Caulobacter crescentus, as well as simulated datasets. We discuss current challenges in the 3D reconstruction algorithms for bacterial chromosomes, primarily focusing on software usability. Finally, we briefly prospect future research directions for bacterial chromosome structure reconstruction algorithms.


Subjects
Bacteria , Chromosome Structures , Prokaryotic Cells , Chromosomes, Bacterial/genetics , Algorithms , Escherichia coli/genetics
6.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-38855914

ABSTRACT

Cluster analysis, a pivotal step in single-cell sequencing data analysis, presents substantial opportunities to effectively unveil the molecular mechanisms underlying cellular heterogeneity and intercellular phenotypic variations. However, the inherent imperfections arise as different clustering algorithms yield diverse estimates of cluster numbers and cluster assignments. This study introduces Single Cell Consistent Clustering based on Spectral Matrix Decomposition (SCSMD), a comprehensive clustering approach that integrates the strengths of multiple methods to determine the optimal clustering scheme. Testing the performance of SCSMD across different distances and employing the bespoke evaluation metric, the methodological selection undergoes validation to ensure the optimal efficacy of the SCSMD. A consistent clustering test is conducted on 15 authentic scRNA-seq datasets. The application of SCSMD to human embryonic stem cell scRNA-seq data successfully identifies known cell types and delineates their developmental trajectories. Similarly, when applied to glioblastoma cells, SCSMD accurately detects pre-existing cell types and provides finer sub-division within one of the original clusters. The results affirm the robust performance of our SCSMD method in terms of both the number of clusters and cluster assignments. Moreover, we have broadened the application scope of SCSMD to encompass larger datasets, thereby furnishing additional evidence of its superiority. The findings suggest that SCSMD is poised for application to additional scRNA-seq datasets and for further downstream analyses.


Subjects
Algorithms , Single-Cell Analysis , Humans , Single-Cell Analysis/methods , Cluster Analysis , Computational Biology/methods , Glioblastoma/genetics , Glioblastoma/pathology , Glioblastoma/metabolism
7.
Brief Bioinform ; 25(5)2024 Jul 25.
Article in English | MEDLINE | ID: mdl-39101500

ABSTRACT

Genomic selection (GS) has emerged as an effective technology to accelerate crop hybrid breeding by enabling early selection prior to phenotype collection. Genomic best linear unbiased prediction (GBLUP) is a robust method that has been routinely used in GS breeding programs. However, GBLUP assumes that markers contribute equally to the total genetic variance, which may not be the case. In this study, we developed a novel GS method called GA-GBLUP that leverages the genetic algorithm (GA) to select markers related to the target trait. We defined four fitness functions for optimization, including AIC, BIC, R2, and HAT, to improve the predictability and bin adjacent markers based on the principle of linkage disequilibrium to reduce model dimension. The results demonstrate that the GA-GBLUP model, equipped with R2 and HAT fitness function, produces much higher predictability than GBLUP for most traits in rice and maize datasets, particularly for traits with low heritability. Moreover, we have developed a user-friendly R package, GAGBLUP, for GS, and the package is freely available on CRAN (https://CRAN.R-project.org/package=GAGBLUP).


Subjects
Algorithms , Genomics , Selection, Genetic , Zea mays , Genomics/methods , Zea mays/genetics , Oryza/genetics , Models, Genetic , Plant Breeding/methods , Linkage Disequilibrium , Phenotype , Quantitative Trait Loci , Genome, Plant , Polymorphism, Single Nucleotide , Software
8.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-38801700

ABSTRACT

irGSEA is an R package designed to assess the outcomes of various gene set scoring methods when applied to single-cell RNA sequencing data. This package incorporates six distinct scoring methods that rely on the expression ranks of genes, emphasizing relative expression levels over absolute values. The implemented methods include AUCell, UCell, singscore, ssGSEA, JASMINE and Viper. Previous studies have demonstrated the robustness of these methods to variations in dataset size and composition, generating enrichment scores based solely on the relative gene expression of individual cells. By employing the robust rank aggregation algorithm, irGSEA amalgamates results from all six methods to ascertain the statistical significance of target gene sets across diverse scoring methods. The package prioritizes user-friendliness, allowing direct input of expression matrices or seamless interaction with Seurat objects. Furthermore, it facilitates a comprehensive visualization of results. The irGSEA package and its accompanying documentation are accessible on GitHub (https://github.com/chuiqin/irGSEA).


Subjects
Algorithms , Single-Cell Analysis , Software , Single-Cell Analysis/methods , Humans , Computational Biology/methods , Gene Expression Profiling/methods , Sequence Analysis, RNA/methods
9.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38388680

ABSTRACT

CRISPR Cas-9 is a groundbreaking genome-editing tool that harnesses bacterial defense systems to alter DNA sequences accurately. This innovative technology holds vast promise in multiple domains like biotechnology, agriculture and medicine. However, such power does not come without its own peril, and one such issue is the potential for unintended modifications (Off-Target), which highlights the need for accurate prediction and mitigation strategies. Though previous studies have demonstrated improvement in Off-Target prediction capability with the application of deep learning, they often struggle with the precision-recall trade-off, limiting their effectiveness and do not provide proper interpretation of the complex decision-making process of their models. To address these limitations, we have thoroughly explored deep learning networks, particularly the recurrent neural network based models, leveraging their established success in handling sequence data. Furthermore, we have employed genetic algorithm for hyperparameter tuning to optimize these models' performance. The results from our experiments demonstrate significant performance improvement compared with the current state-of-the-art in Off-Target prediction, highlighting the efficacy of our approach. Furthermore, leveraging the power of the integrated gradient method, we make an effort to interpret our models resulting in a detailed analysis and understanding of the underlying factors that contribute to Off-Target predictions, in particular the presence of two sub-regions in the seed region of single guide RNA which extends the established biological hypothesis of Off-Target effects. To the best of our knowledge, our model can be considered as the first model combining high efficacy, interpretability and a desirable balance between precision and recall.


Subjects
CRISPR-Cas Systems , Deep Learning , Gene Editing/methods , RNA, Guide, CRISPR-Cas Systems , Neural Networks, Computer
10.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38557679

ABSTRACT

The dynamics and variability of protein conformations are directly linked to their functions. Many comparative studies of X-ray protein structures have been conducted to elucidate the relevant conformational changes, dynamics and heterogeneity. The rapid increase in the number of experimentally determined structures has made comparison an effective tool for investigating protein structures. For example, it is now possible to compare structural ensembles formed by enzyme species, variants or the type of ligands bound to them. In this study, the author developed a multilevel model for estimating two covariance matrices that represent inter- and intra-ensemble variability in the Cartesian coordinate space. Principal component analysis using the two estimated covariance matrices identified the inter-/intra-enzyme variabilities, which seemed to be important for the enzyme functions, with the illustrative examples of cytochrome P450 family 2 enzymes and class A $\beta$-lactamases. In P450, in which each enzyme has its own active site of a distinct size, an active-site motion shared universally between the enzymes was captured as the first principal mode of the intra-enzyme covariance matrix. In this case, the method was useful for understanding the conformational variability after adjusting for the differences between enzyme sizes. The developed method is advantageous in small ensemble-size problems and hence promising for use in comparative studies on experimentally determined structures where ensemble sizes are smaller than those generated, for example, by molecular dynamics simulations.


Subjects
Molecular Dynamics Simulation , Proteins , Proteins/chemistry , Protein Conformation , Catalytic Domain
11.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38557673

ABSTRACT

IMPRINTS-CETSA (Integrated Modulation of Protein Interaction States-Cellular Thermal Shift Assay) provides a highly resolved means to systematically study the interactions of proteins with other cellular components, including metabolites, nucleic acids and other proteins, at the proteome level, but no freely available and user-friendly data analysis software has been reported. Here, we report IMPRINTS.CETSA, an R package that provides the basic data processing framework for robust analysis of the IMPRINTS-CETSA data format, from preprocessing and normalization to visualization. We also report an accompanying R package, IMPRINTS.CETSA.app, which offers a user-friendly Shiny interface for analysis and interpretation of IMPRINTS-CETSA results, with seamless features such as functional enrichment and mapping to other databases at a single site. For the hit generation part, the diverse behaviors of protein modulations have been typically segregated with a two-measure scoring method, i.e. the abundance and thermal stability changes. We present a new algorithm to classify modulated proteins in IMPRINTS-CETSA experiments by a robust single-measure scoring. In this way, both the numerical changes and the statistical significances of the IMPRINTS information can be visualized on a single plot. The IMPRINTS.CETSA and IMPRINTS.CETSA.app R packages are freely available on GitHub at https://github.com/nkdailingyun/IMPRINTS.CETSA and https://github.com/mgerault/IMPRINTS.CETSA.app, respectively. IMPRINTS.CETSA.app is also available as an executable program at https://zenodo.org/records/10636134.


Subjects
Mobile Applications , Software , Proteome , Algorithms , Research Design
12.
Brief Bioinform ; 25(5)2024 Jul 25.
Article in English | MEDLINE | ID: mdl-39073832

ABSTRACT

Herbal medicines, particularly traditional Chinese medicines (TCMs), are a rich source of natural products with significant therapeutic potential. However, understanding their mechanisms of action is challenging due to the complexity of their multi-ingredient compositions. We introduced Herb-CMap, a multimodal fusion framework leveraging protein-protein interactions and herb-perturbed gene expression signatures. Utilizing a network-based heat diffusion algorithm, Herb-CMap creates a connectivity map linking herb perturbations to their therapeutic targets, thereby facilitating the prioritization of active ingredients. As a case study, we applied Herb-CMap to Suhuang antitussive capsule (Suhuang), a TCM formula used for treating cough variant asthma (CVA). Using in vivo rat models, our analysis established the transcriptomic signatures of Suhuang and identified its key compounds, such as quercetin and luteolin, and their target genes, including IL17A, PIK3CB, PIK3CD, AKT1, and TNF. These drug-target interactions inhibit the IL-17 signaling pathway and deactivate PI3K, AKT, and NF-κB, effectively reducing lung inflammation and alleviating CVA. The study demonstrates the efficacy of Herb-CMap in elucidating the molecular mechanisms of herbal medicines, offering valuable insights for advancing drug discovery in TCM.


Subjects
Antitussive Agents , Drugs, Chinese Herbal , Medicine, Chinese Traditional , Animals , Drugs, Chinese Herbal/pharmacology , Drugs, Chinese Herbal/therapeutic use , Medicine, Chinese Traditional/methods , Rats , Antitussive Agents/pharmacology , Antitussive Agents/therapeutic use , Protein Interaction Maps/drug effects , Asthma/drug therapy , Asthma/metabolism , Asthma/genetics , Signal Transduction/drug effects , Cough/drug therapy , Transcriptome , Humans
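
The abstract for entry 12 mentions a network-based heat diffusion algorithm without giving its formulation. As a rough illustration only, the Python sketch below shows generic heat diffusion on an undirected graph by explicit Euler integration of dh/dt = -L h, where L = D - A is the graph Laplacian. The function name, the step count, and the three-node example network are assumptions made for illustration; this is not the Herb-CMap implementation.

```python
def heat_diffusion(adjacency, initial_heat, time=1.0, steps=100):
    """Diffuse heat over a graph by Euler steps of dh/dt = -L h (L = D - A).

    adjacency: symmetric matrix (list of lists) of non-negative edge weights.
    initial_heat: starting heat per node, e.g. 1.0 on perturbed genes.
    """
    n = len(adjacency)
    heat = list(initial_heat)
    dt = time / steps  # must be small relative to node degrees for stability
    for _ in range(steps):
        # Net heat leaving node i is sum_j A[i][j] * (h[i] - h[j]) = (L h)[i]
        outflow = [
            sum(adjacency[i][j] * (heat[i] - heat[j]) for j in range(n))
            for i in range(n)
        ]
        heat = [h - dt * f for h, f in zip(heat, outflow)]
    return heat

# Hypothetical 3-node path network 0 - 1 - 2, with all heat placed on node 0
path = [[0, 1, 0],
        [1, 0, 1],
        [0, 1, 0]]
diffused = heat_diffusion(path, [1.0, 0.0, 0.0])
```

With a symmetric adjacency matrix the total heat is conserved, so nodes closer to the source end up with more heat than distant ones; that ordering is what a connectivity map of this kind would rank on.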
13.
Proc Natl Acad Sci U S A ; 120(35): e2309062120, 2023 Aug 29.
Article in English | MEDLINE | ID: mdl-37603744

ABSTRACT

Identifying efficient and accurate optimization algorithms is a long-desired goal for the scientific community. At present, a combination of evolutionary and deep-learning methods is widely used for optimization. In this paper, we demonstrate three cases involving different physics and conclude that no matter how accurate a deep-learning model is for a single, specific problem, a simple combination of evolutionary and deep-learning methods cannot achieve the desired optimization because of the intrinsic nature of the evolutionary method. We begin by using a physics-supervised deep-learning optimization algorithm (PSDLO) to supervise the results from the deep-learning model. We then intervene in the evolutionary process to eventually achieve simultaneous accuracy and efficiency. PSDLO is successfully demonstrated using both sufficient and insufficient datasets. PSDLO offers a perspective for solving optimization problems and can tackle complex science and engineering problems having many features. This approach to optimization algorithms holds tremendous potential for application in real-world engineering domains.

14.
Proc Natl Acad Sci U S A ; 120(33): e2218961120, 2023 Aug 15.
Article in English | MEDLINE | ID: mdl-37549301

ABSTRACT

Thinking about God promotes greater acceptance of Artificial intelligence (AI)-based recommendations. Eight preregistered experiments (n = 2,462) reveal that when God is salient, people are more willing to consider AI-based recommendations than when God is not salient. Studies 1 and 2a to 2d demonstrate across a wide variety of contexts, from choosing entertainment and food to mutual funds and dental procedures, that God salience reduces reliance on human recommenders and heightens willingness to consider AI recommendations. Studies 3 and 4 demonstrate that the reduced reliance on humans is driven by a heightened feeling of smallness when God is salient, followed by a recognition of human fallibility. Study 5 addresses the similarity in mysteriousness between God and AI as an alternative, but unsupported, explanation. Finally, study 6 (n = 53,563) corroborates the experimental results with data from 21 countries on the usage of robo-advisors in financial decision-making.


Subjects
Artificial Intelligence , Decision Making , Humans , Surveys and Questionnaires
15.
Proc Natl Acad Sci U S A ; 120(22): e2212323120, 2023 May 30.
Article in English | MEDLINE | ID: mdl-37216545

ABSTRACT

An independent set (IS) is a set of vertices in a graph such that no edge connects any two vertices. In adiabatic quantum computation [E. Farhi, et al., Science 292, 472-475 (2001); A. Das, B. K. Chakrabarti, Rev. Mod. Phys. 80, 1061-1081 (2008)], a given graph G(V, E) can be naturally mapped onto a many-body Hamiltonian [Formula: see text], with edges [Formula: see text] being the two-body interactions between adjacent vertices [Formula: see text]. Thus, solving the IS problem is equivalent to finding all the computational basis ground states of [Formula: see text]. Very recently, non-Abelian adiabatic mixing (NAAM) has been proposed to address this task, exploiting an emergent non-Abelian gauge symmetry of [Formula: see text] [B. Wu, H. Yu, F. Wilczek, Phys. Rev. A 101, 012318 (2020)]. Here, we solve a representative IS problem [Formula: see text] by simulating the NAAM digitally using a linear optical quantum network, consisting of three C-Phase gates, four deterministic two-qubit gate arrays (DGA), and ten single rotation gates. The maximum IS has been successfully identified with sufficient Trotterization steps and a carefully chosen evolution path. Remarkably, we find IS with a total probability of 0.875(16), among which the nontrivial ones have a considerable weight of about 31.4%. Our experiment demonstrates the potential advantage of NAAM for solving IS-equivalent problems.
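
The definition of an independent set given in the abstract for entry 15 can be made concrete with a small classical sketch. The Python code below checks the definition directly and finds all maximum independent sets of a tiny graph by brute force. It is purely illustrative: the function names and the 5-cycle example are assumptions, and the code has nothing to do with the optical quantum simulation the abstract describes.

```python
from itertools import combinations

def is_independent(vertices, edges):
    """True if no edge connects two vertices of the candidate set."""
    chosen = set(vertices)
    return not any(u in chosen and v in chosen for u, v in edges)

def maximum_independent_sets(num_vertices, edges):
    """Brute force: try subset sizes from largest to smallest.

    The empty set is always independent, so the loop returns at size 0
    at the latest. Exponential time; only suitable for very small graphs.
    """
    for size in range(num_vertices, -1, -1):
        found = [set(c) for c in combinations(range(num_vertices), size)
                 if is_independent(c, edges)]
        if found:
            return found

# Hypothetical example graph: the 5-cycle 0-1-2-3-4-0
cycle_edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
largest = maximum_independent_sets(5, cycle_edges)
```

For the 5-cycle, the largest independent sets have two vertices each, for example {0, 2}.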

16.
Proc Natl Acad Sci U S A ; 120(21): e2218775120, 2023 May 23.
Article in English | MEDLINE | ID: mdl-37186832

ABSTRACT

Quantum computing technology may soon deliver revolutionary improvements in algorithmic performance, but it is useful only if computed answers are correct. While hardware-level decoherence errors have garnered significant attention, a less recognized obstacle to correctness is that of human programming errors-"bugs." Techniques familiar to most programmers from the classical domain for avoiding, discovering, and diagnosing bugs do not easily transfer, at scale, to the quantum domain because of its unique characteristics. To address this problem, we have been working to adapt formal methods to quantum programming. With such methods, a programmer writes a mathematical specification alongside the program and semiautomatically proves the program correct with respect to it. The proof's validity is automatically confirmed-certified-by a "proof assistant." Formal methods have successfully yielded high-assurance classical software artifacts, and the underlying technology has produced certified proofs of major mathematical theorems. As a demonstration of the feasibility of applying formal methods to quantum programming, we present a formally certified end-to-end implementation of Shor's prime factorization algorithm, developed as part of a framework for applying the certified approach to general applications. By leveraging our framework, one can significantly reduce the effects of human errors and obtain a high-assurance implementation of large-scale quantum applications in a principled way.
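
For readers unfamiliar with Shor's algorithm, which the abstract for entry 16 refers to, the Python sketch below shows only its classical skeleton: find the multiplicative order r of a modulo N, then derive factors from gcd(a^(r/2) ± 1, N). Here the order is found by brute force, standing in for the quantum period-finding step. The function names are assumptions, and this is not the certified implementation the abstract describes.

```python
from math import gcd

def multiplicative_order(a, n):
    """Smallest r > 0 with a**r % n == 1, found by brute force.

    In Shor's algorithm this is the step performed by the quantum computer.
    """
    if gcd(a, n) != 1:
        raise ValueError("a and n must be coprime")
    r, value = 1, a % n
    while value != 1:
        value = (value * a) % n
        r += 1
    return r

def factor_from_order(a, n):
    """Classical post-processing: turn the order of a mod n into two factors.

    Returns None when this choice of a fails (odd order, or a**(r/2) == -1
    mod n); the full algorithm would then retry with a different a.
    """
    r = multiplicative_order(a, n)
    if r % 2 == 1:
        return None
    half = pow(a, r // 2, n)
    if half == n - 1:
        return None
    return tuple(sorted((gcd(half - 1, n), gcd(half + 1, n))))

# Textbook example: N = 15 with base a = 7
factors = factor_from_order(7, 15)
```

In this example the order of 7 modulo 15 is 4, and the two gcd computations give the factors 3 and 5.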

17.
Proc Natl Acad Sci U S A ; 120(31): e2216021120, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37490532

ABSTRACT

Wastewater monitoring has provided health officials with early warnings for new COVID-19 outbreaks, but to date, no approach has been validated to distinguish signal (sustained surges) from noise (background variability) in wastewater data to alert officials to the need for heightened public health response. We analyzed 62 wk of data from 19 sites participating in the North Carolina Wastewater Monitoring Network to characterize wastewater metrics around the Delta and Omicron surges. We found that wastewater data identified outbreaks 4 to 5 d before case data (reported on the earlier of the symptom start date or test collection date), on average. At most sites, correlations between wastewater and case data were similar regardless of how wastewater concentrations were normalized and whether calculated with county-level or sewershed-level cases, suggesting that officials may not need to geospatially align case data with sewershed boundaries to gain insights into disease transmission. Although wastewater trend lines captured clear differences in the Delta versus Omicron surge trajectories, no single wastewater metric (detectability, percent change, or flow-population normalized viral concentrations) reliably signaled when these surges started. After iteratively examining different combinations of these three metrics, we developed the Covid-SURGE (Signaling Unprecedented Rises in Groupwide Exposure) algorithm, which identifies unprecedented signals in the wastewater data. With a true positive rate of 82%, a false positive rate of 7%, and strong performance during both surges and in small and large sites, our algorithm provides public health officials with an automated way to flag community-level COVID-19 surges in real time.


Subjects
COVID-19 , Humans , COVID-19/epidemiology , Wastewater , Algorithms , Benchmarking , Disease Outbreaks , RNA, Viral
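
The abstract for entry 17 reports the algorithm's performance as a true positive rate and a false positive rate. As a minimal sketch of how those two figures are computed from paired alert and outcome labels, consider the Python code below. The function name and the six made-up labels are assumptions for illustration; they are not data or code from the study, and the Covid-SURGE decision rule itself is not reproduced here.

```python
def alert_rates(flagged, surge):
    """Return (true positive rate, false positive rate) for paired booleans.

    flagged[i] is True when an alert was raised in period i;
    surge[i] is True when a surge actually occurred in period i.
    """
    pairs = list(zip(flagged, surge))
    tp = sum(1 for f, s in pairs if f and s)          # alerts during surges
    fn = sum(1 for f, s in pairs if not f and s)      # missed surges
    fp = sum(1 for f, s in pairs if f and not s)      # alerts without a surge
    tn = sum(1 for f, s in pairs if not f and not s)  # quiet periods, no alert
    tpr = tp / (tp + fn) if tp + fn else float("nan")
    fpr = fp / (fp + tn) if fp + tn else float("nan")
    return tpr, fpr

# Made-up labels for six monitoring periods (not data from the study)
flagged = [True, True, False, False, True, False]
surge = [True, False, False, False, True, True]
tpr, fpr = alert_rates(flagged, surge)
```

Here two of the three real surges are flagged (TPR = 2/3) and one of the three quiet periods triggers an alert (FPR = 1/3).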
18.
Proc Natl Acad Sci U S A ; 120(18): e2216507120, 2023 May 02.
Article in English | MEDLINE | ID: mdl-37094135

ABSTRACT

The number of noisy images required for molecular reconstruction in single-particle cryoelectron microscopy (cryo-EM) is governed by the autocorrelations of the observed, randomly oriented, noisy projection images. In this work, we consider the effect of imposing sparsity priors on the molecule. We use techniques from signal processing, optimization, and applied algebraic geometry to obtain theoretical and computational contributions for this challenging nonlinear inverse problem with sparsity constraints. We prove that molecular structures modeled as sums of Gaussians are uniquely determined by the second-order autocorrelation of their projection images, implying that the sample complexity is proportional to the square of the variance of the noise. This theory improves upon the nonsparse case, where the third-order autocorrelation is required for uniformly oriented particle images and the sample complexity scales with the cube of the noise variance. Furthermore, we build a computational framework to reconstruct molecular structures which are sparse in the wavelet basis. This method combines the sparse representation for the molecule with projection-based techniques used for phase retrieval in X-ray crystallography.

19.
Genet Epidemiol ; 48(1): 3-26, 2024 Feb.
Article in English | MEDLINE | ID: mdl-37830494

ABSTRACT

Advances in DNA sequencing technologies have enabled genotyping of complex genetic regions exhibiting copy number variation and high allelic diversity, yet it is impossible to derive exact genotypes in all cases, often resulting in ambiguous genotype calls, that is, partially missing data. An example of such a gene region is the killer-cell immunoglobulin-like receptor (KIR) genes. These genes are of special interest in the context of allogeneic hematopoietic stem cell transplantation. For such complex gene regions, current haplotype reconstruction methods are not feasible as they cannot cope with the complexity of the data. We present an expectation-maximization (EM)-algorithm to estimate haplotype frequencies (HTFs) which deals with the missing data components, and takes into account linkage disequilibrium (LD) between genes. To cope with the exponential increase in the number of haplotypes as genes are added, we add three components to a standard EM-algorithm implementation. First, reconstruction is performed iteratively, adding one gene at a time. Second, after each step, haplotypes with frequencies below a threshold are collapsed in a rare haplotype group. Third, the HTF of the rare haplotype group is profiled in subsequent iterations to improve estimates. A simulation study evaluates the effect of combining information of multiple genes on the estimates of these frequencies. We show that estimated HTFs are approximately unbiased. Our simulation study shows that the EM-algorithm is able to combine information from multiple genes when LD is high, whereas increased ambiguity levels increase bias. Linear regression models based on this EM, show that a large number of haplotypes can be problematic for unbiased effect size estimation and that models need to be sparse. In a real data analysis of KIR genotypes, we compare HTFs to those obtained in an independent study. 
Our new EM-algorithm-based method is the first to account for the full genetic architecture of complex gene regions, such as the KIR gene region. This algorithm can handle the numerous observed ambiguities, and allows for the collapsing of haplotypes to perform implicit dimension reduction. Combining information from multiple genes improves haplotype reconstruction.


Subjects
DNA Copy Number Variations , Models, Genetic , Humans , Haplotypes , Gene Frequency , Genotype
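
To illustrate the general idea behind EM-based haplotype frequency estimation mentioned in the abstract for entry 19, the Python sketch below implements the textbook two-locus case, in which only double heterozygotes (AaBb) have unknown phase. It is a simplified stand-in under stated assumptions: the function name, inputs, and example counts are hypothetical, and the code does not include the gene-by-gene iteration, rare-haplotype collapsing, or profiling steps the abstract describes.

```python
def em_haplotype_frequencies(known_counts, n_double_het, n_iter=200, tol=1e-10):
    """Estimate frequencies of haplotypes AB, Ab, aB, ab by EM.

    known_counts: haplotype -> count observed in phase-unambiguous genotypes.
    n_double_het: number of AaBb individuals, each either AB/ab or Ab/aB.
    """
    haps = ("AB", "Ab", "aB", "ab")
    total = sum(known_counts[h] for h in haps) + 2 * n_double_het
    freqs = {h: 0.25 for h in haps}  # uniform starting point
    for _ in range(n_iter):
        # E-step: probability that a double heterozygote is AB/ab
        cis = freqs["AB"] * freqs["ab"]
        trans = freqs["Ab"] * freqs["aB"]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        # M-step: expected haplotype counts, renormalised to frequencies
        expected = {
            "AB": known_counts["AB"] + w * n_double_het,
            "ab": known_counts["ab"] + w * n_double_het,
            "Ab": known_counts["Ab"] + (1 - w) * n_double_het,
            "aB": known_counts["aB"] + (1 - w) * n_double_het,
        }
        new_freqs = {h: expected[h] / total for h in haps}
        change = max(abs(new_freqs[h] - freqs[h]) for h in haps)
        freqs = new_freqs
        if change < tol:
            break
    return freqs

# Hypothetical counts: strong linkage disequilibrium between the two loci
counts = {"AB": 40, "Ab": 5, "aB": 5, "ab": 40}
estimated = em_haplotype_frequencies(counts, n_double_het=10)
```

Because AB and ab dominate the unambiguous data, the E-step assigns most double heterozygotes to the AB/ab phase, which is the sense in which linkage disequilibrium helps resolve ambiguous genotypes.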
20.
Am J Hum Genet ; 109(3): 446-456, 2022 Mar 03.
Article in English | MEDLINE | ID: mdl-35216679

ABSTRACT

Attempts to identify and prioritize functional DNA elements in coding and non-coding regions, particularly through use of in silico functional annotation data, continue to increase in popularity. However, specific functional roles can vary widely from one variant to another, making it challenging to summarize different aspects of variant function with a one-dimensional rating. Here we propose multi-dimensional annotation-class integrative estimation (MACIE), an unsupervised multivariate mixed-model framework capable of integrating annotations of diverse origin to assess multi-dimensional functional roles for both coding and non-coding variants. Unlike existing one-dimensional scoring methods, MACIE views variant functionality as a composite attribute encompassing multiple characteristics and estimates the joint posterior functional probabilities of each genomic position. This estimate offers more comprehensive and interpretable information in the presence of multiple aspects of functionality. Applied to a variety of independent coding and non-coding datasets, MACIE demonstrates powerful and robust performance in discriminating between functional and non-functional variants. We also show an application of MACIE to fine-mapping and heritability enrichment analysis by using the lipids GWAS summary statistics data from the European Network for Genetic and Genomic Epidemiology Consortium.


Subjects
Genome, Human , Genome-Wide Association Study , Genome, Human/genetics , Genome-Wide Association Study/methods , Genomics , Humans , Molecular Sequence Annotation , Polymorphism, Single Nucleotide/genetics , Probability