ABSTRACT
Much of current molecular and cell biology research relies on the ability to purify cell types by fluorescence-activated cell sorting (FACS). FACS typically relies on the ability to label cell types of interest with antibodies or fluorescent transgenic constructs. However, antibody availability is often limited, and genetic manipulation is labor intensive or impossible in the case of primary human tissue. To date, no systematic method exists to enrich for cell types without a priori knowledge of cell-type markers. Here, we propose GateID, a computational method that combines single-cell transcriptomics with FACS index sorting to purify cell types of choice using only native cellular properties such as cell size, granularity, and mitochondrial content. We validate GateID by purifying various cell types from zebrafish kidney marrow and the human pancreas to high purity without resorting to specific antibodies or transgenes.
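The core gate-design step can be illustrated with a minimal sketch (not the authors' GateID implementation): given FACS index-sort parameters recorded for each sequenced cell plus its transcriptome-derived cluster label, search axis-aligned gates in a pair of native scatter parameters and keep the gate that maximizes purity of the target cluster at an acceptable yield. The column names, parameter pair, and thresholds below are hypothetical.

```python
import numpy as np
import pandas as pd

def best_rect_gate(df, target, x="FSC", y="SSC", n_steps=20, min_yield=0.3):
    """Brute-force search of axis-aligned gates in (x, y) index-sort space.

    df: one row per index-sorted, sequenced cell with columns x, y and
        'cluster' (transcriptome-derived label); target: cluster to enrich.
    Returns (purity, yield, gate) for the purest gate that still recovers
    at least `min_yield` of the target cells. O(n_steps^4 * n): fine for a sketch.
    """
    xs = np.quantile(df[x], np.linspace(0, 1, n_steps))
    ys = np.quantile(df[y], np.linspace(0, 1, n_steps))
    is_target = (df["cluster"] == target).to_numpy()
    best = None
    for xlo in xs:
        for xhi in xs[xs > xlo]:
            for ylo in ys:
                for yhi in ys[ys > ylo]:
                    inside = ((df[x] >= xlo) & (df[x] <= xhi) &
                              (df[y] >= ylo) & (df[y] <= yhi)).to_numpy()
                    if inside.sum() == 0:
                        continue
                    purity = is_target[inside].mean()
                    yield_ = is_target[inside].sum() / max(is_target.sum(), 1)
                    if yield_ >= min_yield and (best is None or purity > best[0]):
                        best = (purity, yield_, (xlo, xhi, ylo, yhi))
    return best
```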
Subject(s)
Cell Separation/methods , Flow Cytometry/methods , Software , Transcriptome , Animals , Humans , Kidney/cytology , Pancreas/cytology , Single-Cell Analysis , Zebrafish/anatomy & histology
ABSTRACT

Replicability is the cornerstone of modern scientific research. Reliable identification of genotype-phenotype associations that are significant in multiple genome-wide association studies (GWASs) provides stronger evidence for the findings. Current replicability analysis relies on the independence assumption among single-nucleotide polymorphisms (SNPs) and ignores the linkage disequilibrium (LD) structure. We show that such a strategy may produce either overly liberal or overly conservative results in practice. We develop an efficient method, ReAD, to detect replicable SNPs associated with the phenotype from two GWASs accounting for the LD structure. The local dependence structure of SNPs across two heterogeneous studies is captured by a four-state hidden Markov model (HMM) built on two sequences of p values. By incorporating information from adjacent locations via the HMM, our approach provides more accurate SNP significance rankings. ReAD is scalable, platform independent, and more powerful than existing replicability analysis methods with effective false discovery rate control. Through analysis of datasets from two asthma GWASs and two ulcerative colitis GWASs, we show that ReAD can identify replicable genetic loci that existing methods might otherwise miss.
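The HMM idea can be sketched in a toy form (not the ReAD implementation): hidden states record whether a SNP is associated in neither, one, or both studies, emissions are z-scores derived from the two p-value sequences, and the posterior probability of the both-associated state gives a local false discovery rate that is thresholded to control the FDR. The transition matrix, starting probabilities, and alternative density below are placeholders.

```python
import numpy as np
from scipy.stats import norm

# Hidden states: 00 (null/null), 01, 10, 11 (associated in both studies).
A = np.full((4, 4), 0.05) + np.eye(4) * 0.80        # placeholder transition matrix
A /= A.sum(axis=1, keepdims=True)
pi = np.array([0.85, 0.05, 0.05, 0.05])             # placeholder start probabilities
f0, f1 = norm(0, 1).pdf, norm(2, 1).pdf             # null / assumed alternative density

def replicable_lfdr(z1, z2):
    """Posterior P(state != 11) per SNP via scaled forward-backward.
    z1, z2: per-SNP z-scores, e.g. norm.isf(p) applied to each study's p values."""
    n = len(z1)
    # Emission likelihoods, column order 00, 01, 10, 11.
    E = np.column_stack([f0(z1) * f0(z2), f0(z1) * f1(z2),
                         f1(z1) * f0(z2), f1(z1) * f1(z2)])
    alpha = np.zeros((n, 4)); beta = np.zeros((n, 4)); c = np.zeros(n)
    alpha[0] = pi * E[0]; c[0] = alpha[0].sum(); alpha[0] /= c[0]
    for t in range(1, n):
        alpha[t] = (alpha[t - 1] @ A) * E[t]
        c[t] = alpha[t].sum(); alpha[t] /= c[t]
    beta[-1] = 1.0
    for t in range(n - 2, -1, -1):
        beta[t] = (A @ (E[t + 1] * beta[t + 1])) / c[t + 1]
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    return 1.0 - gamma[:, 3]                         # local fdr of "replicable"

def select_fdr(lfdr, level=0.05):
    """Keep the largest set whose running mean of sorted lfdr stays below `level`."""
    order = np.argsort(lfdr)
    passed = np.cumsum(lfdr[order]) / np.arange(1, len(lfdr) + 1) <= level
    return order[:passed.sum()]
```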
Subject(s)
Asthma , Genome-Wide Association Study , Linkage Disequilibrium , Polymorphism, Single Nucleotide , Genome-Wide Association Study/methods , Humans , Asthma/genetics , Markov Chains , Colitis, Ulcerative/genetics , Reproducibility of Results , Phenotype , Genotype
ABSTRACT
Algorithmic bias occurs when algorithms incorporate biases in the human decisions on which they are trained. We find that people see more of their biases (e.g., age, gender, race) in the decisions of algorithms than in their own decisions. Research participants saw more bias in the decisions of algorithms trained on their decisions than in their own decisions, even when those decisions were the same and participants were incentivized to reveal their true beliefs. By contrast, participants saw as much bias in the decisions of algorithms trained on their decisions as in the decisions of other participants and algorithms trained on the decisions of other participants. Cognitive psychological processes and motivated reasoning help explain why people see more of their biases in algorithms. Research participants most susceptible to the bias blind spot were the most likely to see more bias in algorithms than in themselves. Participants were also more likely to perceive algorithms than themselves as having been influenced by irrelevant biasing attributes (e.g., race), but not by relevant attributes (e.g., user reviews). Because participants saw more of their biases in algorithms than in themselves, they were more likely to make debiasing corrections to decisions attributed to an algorithm than to themselves. Our findings show that bias is more readily perceived in algorithms than in oneself and suggest how to use algorithms to reveal and correct biased human decisions.
Subject(s)
Motivation , Problem Solving , Humans , Bias , Algorithms
ABSTRACT
Finding optimal bipartite matchings (e.g., matching medical students to hospitals for residency, items to buyers in an auction, or papers to reviewers for peer review) is a fundamental combinatorial optimization problem. We found a distributed algorithm for computing matchings by studying the development of the neuromuscular circuit. The neuromuscular circuit can be viewed as a bipartite graph formed between motor neurons and muscle fibers. In newborn animals, neurons and fibers are densely connected, but after development, each fiber is typically matched (i.e., connected) to exactly one neuron. We cast this synaptic pruning process as a distributed matching (or assignment) algorithm, where motor neurons "compete" with each other to "win" muscle fibers. We show that this algorithm is simple to implement, theoretically sound, and effective in practice when evaluated on real-world bipartite matching problems. Thus, insights from the development of neural circuits can inform the design of algorithms for fundamental computational problems.
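A toy version of the competition dynamic (not the authors' exact algorithm) can be written in a few lines: start from a dense random weight matrix between neurons and fibers, repeatedly reinforce each fiber's currently strongest synapse while decaying the others, and prune weak edges until every fiber keeps exactly one neuron.

```python
import numpy as np

def prune_to_matching(n_neurons, n_fibers, steps=500, lr=0.1, seed=0):
    """Toy synaptic-pruning dynamics that converge to a one-neuron-per-fiber
    assignment (fibers end with degree one)."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(0.5, 1.0, size=(n_neurons, n_fibers))  # dense at "birth"
    for _ in range(steps):
        winners = W.argmax(axis=0)                  # each fiber's strongest neuron
        gain = np.zeros_like(W)
        gain[winners, np.arange(n_fibers)] = lr     # reinforce winning synapses
        W = np.clip(W * (1 - lr) + gain, 0.0, 1.0)  # decay the losing synapses
        W[W < 1e-3] = 0.0                           # prune very weak edges
    return W.argmax(axis=0)                         # fiber -> neuron assignment

print(prune_to_matching(n_neurons=5, n_fibers=12))
```

Note that this sketch does not balance load across neurons; the developmental algorithm additionally has neurons compete under limited resources.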
Subject(s)
Algorithms , Motor Neurons , Motor Neurons/physiology , Animals , Humans , Neural Networks, Computer , Models, Neurological
ABSTRACT
Accelerating the measurement for discrimination of samples, such as classification of cell phenotype, is crucial when faced with significant time and cost constraints. Spontaneous Raman microscopy offers label-free, rich chemical information but suffers from long acquisition times due to extremely small scattering cross-sections. One possible approach to accelerate the measurement is to measure only the necessary parts with a suitable number of illumination points. However, how to design these points during the measurement remains a challenge. To address this, we developed an imaging technique based on reinforcement learning, a machine learning (ML) approach. This ML approach adaptively feeds back an "optimal" illumination pattern during the measurement to detect the existence of specific characteristics of interest, allowing faster measurements while guaranteeing discrimination accuracy. Using a set of Raman images of human follicular thyroid and follicular thyroid carcinoma cells, we showed that our technique requires 3,333 to 31,683 times fewer illumination points than raster scanning to discriminate the phenotypes. To quantitatively evaluate the number of illuminations required as a function of the requisite discrimination accuracy, we prepared a set of polymer bead mixture samples to model anomalous and normal tissues. We then applied a home-built programmable-illumination microscope equipped with our algorithm and confirmed that the system can discriminate the sample conditions with 104 to 4,350 times fewer illuminations than standard point-illumination Raman microscopy. The proposed algorithm can be applied to other types of microscopy that can control measurement conditions on the fly, offering an approach for the acceleration of accurate measurements in various applications, including medical diagnosis.
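Conceptually, the adaptive loop stops illuminating as soon as the accumulated evidence suffices for a confident call, which is why easy samples need far fewer points than a full raster scan. The sketch below is a simple sequential-evidence stand-in, not the reinforcement-learning policy used in the paper; the per-pixel scoring function is a placeholder.

```python
import numpy as np

def adaptive_discrimination(shape, score_pixel, threshold=5.0, seed=0):
    """Illuminate pixels one at a time until accumulated evidence crosses
    +/- threshold. `score_pixel((i, j))` must return per-pixel evidence
    (positive favours 'anomalous', negative favours 'normal')."""
    rng = np.random.default_rng(seed)
    candidates = [(i, j) for i in range(shape[0]) for j in range(shape[1])]
    rng.shuffle(candidates)                       # placeholder acquisition order
    evidence = 0.0
    for n_used, ij in enumerate(candidates, start=1):
        evidence += score_pixel(ij)               # measure one illumination point
        if abs(evidence) >= threshold:
            break                                 # stop early once confident
    return ("anomalous" if evidence > 0 else "normal"), n_used
```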
Subject(s)
Microscopy , Spectrum Analysis, Raman , Humans , Microscopy/methods , Spectrum Analysis, Raman/methods , Thyroid Gland , Nonlinear Optical Microscopy , Machine Learning
ABSTRACT
Stratification of patients diagnosed with cancer has become a major goal in personalized oncology. One important aspect is the accurate prediction of the response to various drugs. It is expected that the molecular characteristics of the cancer cells contain enough information to retrieve specific signatures, allowing for accurate predictions based solely on these multi-omic data. Ideally, these predictions should be explainable to clinicians, in order to be integrated into patient care. We propose a machine-learning framework based on ensemble learning to integrate multi-omic data and predict sensitivity to an array of commonly used and experimental compounds, including chemotoxic compounds and targeted kinase inhibitors. We trained a set of classifiers on the different parts of our dataset to produce omic-specific signatures, then trained a random forest classifier on these signatures to predict drug responsiveness. We used the Cancer Cell Line Encyclopedia dataset, comprising multi-omic and drug sensitivity measurements for hundreds of cell lines, to build the predictive models, and validated the results using nested cross-validation. Our results show good performance for several compounds (area under the receiver operating characteristic curve >79%) across the most frequent cancer types. Furthermore, the simplicity of our approach makes it possible to examine which omic layers have greater importance in the models and to identify new putative markers of drug responsiveness. We propose several models based on small subsets of transcriptional markers with the potential to become useful tools in personalized oncology, paving the way for clinicians to use the molecular characteristics of tumors to predict sensitivity to therapeutic compounds.
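The two-stage ensemble can be sketched with scikit-learn on synthetic data (a simplified stand-in for the pipeline; omic block sizes and base models are placeholders): each omic layer gets its own classifier whose out-of-fold score acts as a signature, and a random forest is trained on the stacked signatures. A full nested cross-validation, as used in the study, would wrap both stages in an outer loop.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict, cross_val_score

rng = np.random.default_rng(0)
n = 300
blocks = {"expression": rng.normal(size=(n, 200)),              # placeholder omic layers
          "mutation":   rng.integers(0, 2, size=(n, 50)).astype(float),
          "cnv":        rng.normal(size=(n, 80))}
y = rng.integers(0, 2, size=n)                                   # sensitive vs. resistant

# Stage 1: one omic-specific classifier per layer, scored out of fold.
signatures = np.column_stack([
    cross_val_predict(LogisticRegression(max_iter=1000), X, y,
                      cv=5, method="predict_proba")[:, 1]
    for X in blocks.values()])

# Stage 2: random forest trained on the omic-specific signatures.
rf = RandomForestClassifier(n_estimators=300, random_state=0)
auc = cross_val_score(rf, signatures, y, cv=5, scoring="roc_auc")
print("cross-validated AUC estimate:", auc.mean())
```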
Subject(s)
Machine Learning , Neoplasms , Humans , Neoplasms/drug therapy , Neoplasms/genetics , Neoplasms/metabolism , Antineoplastic Agents/therapeutic use , Antineoplastic Agents/pharmacology , Cell Line, Tumor , Computational Biology/methods , Precision Medicine/methods , Multiomics
ABSTRACT
The three-dimensional (3D) structure of bacterial chromosomes is crucial for understanding chromosome function. With the growing availability of high-throughput chromosome conformation capture (3C/Hi-C) data, 3D structure reconstruction algorithms have become powerful tools for studying bacterial chromosome structure and function. A practical recommendation on chromosome structure reconstruction tools is therefore highly desirable to facilitate prokaryotic 3D genomics. In this work, we review existing chromosome 3D structure reconstruction algorithms and classify them, based on their underlying computational models, into two categories: constraint-based modeling and thermodynamics-based modeling. We briefly compare these algorithms using 3C/Hi-C datasets and fluorescence microscopy data obtained from Escherichia coli and Caulobacter crescentus, as well as simulated datasets. We discuss current challenges in 3D reconstruction algorithms for bacterial chromosomes, focusing primarily on software usability. Finally, we briefly outline future research directions for bacterial chromosome structure reconstruction algorithms.
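A minimal constraint-based reconstruction, one of the two algorithm families reviewed, can be sketched as follows (the power-law exponent and embedding method are illustrative choices): convert contact frequencies into target distances and embed them in three dimensions with metric multidimensional scaling.

```python
import numpy as np
from sklearn.manifold import MDS

def reconstruct_3d(contacts, alpha=1.0, eps=1e-9):
    """Symmetric bin-by-bin contact matrix -> 3D coordinates per bin,
    assuming distances follow d_ij ~ c_ij^(-1/alpha)."""
    c = np.asarray(contacts, dtype=float)
    d = np.power(c + eps, -1.0 / alpha)        # contact frequency -> target distance
    np.fill_diagonal(d, 0.0)
    d /= d.max()                               # normalize for numerical stability
    mds = MDS(n_components=3, dissimilarity="precomputed", random_state=0)
    return mds.fit_transform(d)                # (n_bins, 3) coordinates
```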
Subject(s)
Bacteria , Chromosome Structures , Prokaryotic Cells , Chromosomes, Bacterial/genetics , Algorithms , Escherichia coli/genetics
ABSTRACT
Cluster analysis, a pivotal step in single-cell sequencing data analysis, offers substantial opportunities to unveil the molecular mechanisms underlying cellular heterogeneity and intercellular phenotypic variation. However, results are inherently imperfect, as different clustering algorithms yield diverse estimates of cluster numbers and cluster assignments. This study introduces Single Cell Consistent Clustering based on Spectral Matrix Decomposition (SCSMD), a comprehensive clustering approach that integrates the strengths of multiple methods to determine the optimal clustering scheme. The performance of SCSMD is tested across different distance metrics, and a bespoke evaluation metric is used to validate the methodological choices and ensure the optimal efficacy of SCSMD. A consistent clustering test is conducted on 15 real scRNA-seq datasets. Applying SCSMD to human embryonic stem cell scRNA-seq data successfully identifies known cell types and delineates their developmental trajectories. Similarly, when applied to glioblastoma cells, SCSMD accurately detects pre-existing cell types and provides a finer subdivision within one of the original clusters. These results affirm the robust performance of SCSMD in terms of both the number of clusters and the cluster assignments. Moreover, we have broadened the application scope of SCSMD to larger datasets, furnishing additional evidence of its superiority. The findings suggest that SCSMD is poised for application to additional scRNA-seq datasets and for further downstream analyses.
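The consensus idea can be illustrated with a small sketch (not the SCSMD algorithm itself): several base clusterings vote into a cell-by-cell co-association matrix, and a spectral method applied to that matrix yields the final consistent clusters. Cluster numbers and run counts are placeholders.

```python
import numpy as np
from sklearn.cluster import KMeans, SpectralClustering

def consensus_clusters(X, k_range=(4, 5, 6), n_runs=10, k_final=5, seed=0):
    """Co-association (consensus) matrix from repeated KMeans runs, followed by
    spectral clustering that treats the matrix as a similarity graph."""
    n = X.shape[0]
    coassoc = np.zeros((n, n))
    run = 0
    for k in k_range:
        for _ in range(n_runs):
            labels = KMeans(n_clusters=k, n_init=10,
                            random_state=seed + run).fit_predict(X)
            coassoc += (labels[:, None] == labels[None, :])   # vote per cell pair
            run += 1
    coassoc /= run
    return SpectralClustering(n_clusters=k_final, affinity="precomputed",
                              random_state=seed).fit_predict(coassoc)
```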
Subject(s)
Algorithms , Single-Cell Analysis , Humans , Single-Cell Analysis/methods , Cluster Analysis , Computational Biology/methods , Glioblastoma/genetics , Glioblastoma/pathology , Glioblastoma/metabolism
ABSTRACT
irGSEA is an R package designed to assess the outcomes of various gene set scoring methods applied to single-cell RNA sequencing data. The package incorporates six distinct scoring methods that rely on the expression ranks of genes, emphasizing relative expression levels over absolute values: AUCell, UCell, singscore, ssGSEA, JASMINE, and Viper. Previous studies have demonstrated that these methods, which generate enrichment scores based solely on the relative gene expression of individual cells, are robust to variations in dataset size and composition. By employing the robust rank aggregation algorithm, irGSEA amalgamates the results from all six methods to ascertain the statistical significance of target gene sets across the diverse scoring methods. The package prioritizes user-friendliness, allowing direct input of expression matrices or seamless interaction with Seurat objects, and it facilitates comprehensive visualization of the results. The irGSEA package and its accompanying documentation are available on GitHub (https://github.com/chuiqin/irGSEA).
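The aggregation step can be sketched generically (this mirrors the robust rank aggregation idea rather than irGSEA's R code): each scoring method ranks the gene sets, ranks are normalized, and each gene set receives the smallest Beta order-statistic p-value across methods; the original algorithm additionally applies a Bonferroni correction over the number of methods.

```python
import numpy as np
from scipy.stats import beta

def robust_rank_aggregation(rank_lists):
    """rank_lists: array (n_methods, n_gene_sets) of ranks (1 = most enriched).
    Returns one RRA-style score per gene set (smaller = more consistently top-ranked)."""
    R = np.asarray(rank_lists, dtype=float)
    m, n = R.shape
    r = np.sort(R / n, axis=0)                   # normalized ranks, sorted per gene set
    k = np.arange(1, m + 1)[:, None]
    # P(k-th smallest of m uniform ranks <= observed value), minimized over k.
    p_order = beta.cdf(r, k, m - k + 1)
    return p_order.min(axis=0)
```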
Subject(s)
Algorithms , Single-Cell Analysis , Software , Single-Cell Analysis/methods , Humans , Computational Biology/methods , Gene Expression Profiling/methods , Sequence Analysis, RNA/methods
ABSTRACT
Genomic selection (GS) has emerged as an effective technology to accelerate crop hybrid breeding by enabling early selection prior to phenotype collection. Genomic best linear unbiased prediction (GBLUP) is a robust method that has been routinely used in GS breeding programs. However, GBLUP assumes that markers contribute equally to the total genetic variance, which may not be the case. In this study, we developed a novel GS method, GA-GBLUP, that leverages a genetic algorithm (GA) to select markers related to the target trait. We defined four fitness functions for the optimization (AIC, BIC, R², and HAT) to improve predictability, and binned adjacent markers on the basis of linkage disequilibrium to reduce the model dimension. The results demonstrate that the GA-GBLUP model, equipped with the R² and HAT fitness functions, produces much higher predictability than GBLUP for most traits in the rice and maize datasets, particularly for traits with low heritability. Moreover, we have developed a user-friendly R package, GAGBLUP, for GS; the package is freely available on CRAN (https://CRAN.R-project.org/package=GAGBLUP).
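The marker-selection loop can be sketched compactly (a simplified stand-in for GA-GBLUP, substituting ridge regression for GBLUP and cross-validated R² for the fitness functions named above; population size, generations, and mutation rate are placeholders):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

def ga_select_markers(X, y, pop=30, gens=40, p_on=0.1, mut=0.01, seed=0):
    """Binary GA over marker subsets; fitness = 5-fold CV R^2 of ridge regression."""
    rng = np.random.default_rng(seed)
    n_mark = X.shape[1]
    popu = rng.random((pop, n_mark)) < p_on           # initial random subsets

    def fitness(mask):
        if mask.sum() == 0:
            return -np.inf
        return cross_val_score(Ridge(alpha=1.0), X[:, mask], y,
                               cv=5, scoring="r2").mean()

    for _ in range(gens):
        fit = np.array([fitness(m) for m in popu])
        parents = popu[np.argsort(fit)[::-1][: pop // 2]]   # truncation selection
        children = []
        while len(children) < pop - len(parents):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, n_mark)
            child = np.concatenate([a[:cut], b[cut:]])      # one-point crossover
            child ^= rng.random(n_mark) < mut               # bit-flip mutation
            children.append(child)
        popu = np.vstack([parents, children])
    fit = np.array([fitness(m) for m in popu])
    return popu[fit.argmax()]                               # best marker subset
```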
Subject(s)
Algorithms , Genomics , Selection, Genetic , Zea mays , Genomics/methods , Zea mays/genetics , Oryza/genetics , Models, Genetic , Plant Breeding/methods , Linkage Disequilibrium , Phenotype , Quantitative Trait Loci , Genome, Plant , Polymorphism, Single Nucleotide , Software
ABSTRACT
We sought to develop and validate a machine learning (ML) model for predicting multidimensional frailty based on clinical and laboratory data, together with an explainable ML model utilizing SHapley Additive exPlanations (SHAP). This study enrolled 622 patients hospitalized for decompensation episodes of cirrhosis at a tertiary hospital. The cohort data were randomly divided into training and test sets. External validation was carried out using 131 patients from other tertiary hospitals. The frail phenotype was defined according to a self-reported questionnaire (Frailty Index). The area under the receiver operating characteristic curve was adopted to compare the performance of five ML models. The importance of the features and the interpretation of the ML models were determined using the SHAP method. The proportions of cirrhotic patients with nonfrail and frail phenotypes in the combined training and test sets were 87.8% and 12.2%, respectively, and 88.5% and 11.5% in the external validation dataset. Of the five ML algorithms evaluated, the random forest (RF) model exhibited superior predictive performance. In the external validation, the RF algorithm also outperformed the other ML models. Moreover, the SHAP method demonstrated that the neutrophil-to-lymphocyte ratio, age, lymphocyte-to-monocyte ratio, ascites, and albumin served as the most important predictors of frailty. At the patient level, SHAP force plots and decision plots provided a clinically meaningful explanation of the RF algorithm. In summary, we constructed an ML model (RF) that provides accurate prediction of the frail phenotype in decompensated cirrhosis. Its explainability and generalizability may help clinicians understand the contributors to this physiologically vulnerable situation and tailor interventions.
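The modeling and explanation steps can be sketched with scikit-learn and the shap package (synthetic data; the feature names are placeholders standing in for the clinical and laboratory variables):

```python
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
cols = ["NLR", "age", "LMR", "ascites", "albumin"]         # placeholder predictors
X = pd.DataFrame(rng.normal(size=(600, len(cols))), columns=cols)
y = (X["NLR"] + 0.5 * X["age"] - X["albumin"] + rng.normal(size=600) > 1).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)

explainer = shap.TreeExplainer(rf)
sv = explainer.shap_values(X_te)
# Older shap returns a list per class, newer a 3-D array; take the positive class.
sv_pos = sv[1] if isinstance(sv, list) else sv[..., 1]
shap.summary_plot(sv_pos, X_te)                            # global feature importance
```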
Subject(s)
Frailty , Hospitalization , Liver Cirrhosis , Machine Learning , Humans , Liver Cirrhosis/complications , Female , Male , Middle Aged , Aged , Algorithms , ROC Curve
ABSTRACT
The dynamics and variability of protein conformations are directly linked to their functions. Many comparative studies of X-ray protein structures have been conducted to elucidate the relevant conformational changes, dynamics and heterogeneity. The rapid increase in the number of experimentally determined structures has made comparison an effective tool for investigating protein structures. For example, it is now possible to compare structural ensembles formed by enzyme species, variants or the type of ligands bound to them. In this study, the author developed a multilevel model for estimating two covariance matrices that represent inter- and intra-ensemble variability in the Cartesian coordinate space. Principal component analysis using the two estimated covariance matrices identified the inter-/intra-enzyme variabilities, which seemed to be important for the enzyme functions, with the illustrative examples of cytochrome P450 family 2 enzymes and class A β-lactamases. In P450, in which each enzyme has its own active site of a distinct size, an active-site motion shared universally between the enzymes was captured as the first principal mode of the intra-enzyme covariance matrix. In this case, the method was useful for understanding the conformational variability after adjusting for the differences between enzyme sizes. The developed method is advantageous in small ensemble-size problems and hence promising for use in comparative studies on experimentally determined structures where ensemble sizes are smaller than those generated, for example, by molecular dynamics simulations.
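The decomposition behind the method can be sketched with plain sample covariances (a simplification of the multilevel estimator described above): superposed coordinates are grouped by enzyme, the covariance of the group means serves as the inter-ensemble matrix, the pooled within-group covariance as the intra-ensemble matrix, and each is eigen-decomposed to obtain principal modes.

```python
import numpy as np

def inter_intra_pca(coords, groups):
    """coords: (n_structures, 3 * n_atoms) flattened, superposed coordinates.
    groups: enzyme label per structure. Returns (eigenvalues, eigenvectors) for
    the between-group (inter) and pooled within-group (intra) covariance matrices."""
    coords = np.asarray(coords, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    means = np.array([coords[groups == g].mean(axis=0) for g in labels])
    inter = np.cov(means, rowvar=False)                       # variability between enzymes
    centered = np.vstack([coords[groups == g] - coords[groups == g].mean(axis=0)
                          for g in labels])
    intra = np.cov(centered, rowvar=False)                    # pooled within-enzyme variability
    return [np.linalg.eigh(c) for c in (inter, intra)]
```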
Subject(s)
Molecular Dynamics Simulation , Proteins , Proteins/chemistry , Protein Conformation , Catalytic Domain
ABSTRACT
CRISPR-Cas9 is a groundbreaking genome-editing tool that harnesses bacterial defense systems to alter DNA sequences accurately. This innovative technology holds vast promise in multiple domains, including biotechnology, agriculture, and medicine. However, such power does not come without its own peril; one such issue is the potential for unintended (off-target) modifications, which highlights the need for accurate prediction and mitigation strategies. Although previous studies have demonstrated improvements in off-target prediction with the application of deep learning, they often struggle with the precision-recall trade-off, limiting their effectiveness, and they do not properly interpret the complex decision-making processes of their models. To address these limitations, we thoroughly explored deep learning networks, particularly recurrent neural network-based models, leveraging their established success in handling sequence data. Furthermore, we employed a genetic algorithm for hyperparameter tuning to optimize these models' performance. The results of our experiments demonstrate significant performance improvements over the current state of the art in off-target prediction, highlighting the efficacy of our approach. Leveraging the integrated gradients method, we also interpret our models, yielding a detailed analysis and understanding of the underlying factors that contribute to off-target predictions, in particular the presence of two sub-regions within the seed region of the single guide RNA, which extends the established biological hypothesis of off-target effects. To the best of our knowledge, our model is the first to combine high efficacy, interpretability, and a desirable balance between precision and recall.
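A minimal recurrent-network sketch for guide-target pairs is shown below (an illustration of the model family, not the tuned architecture from the paper; the pair encoding and hyperparameters are hand-chosen placeholders rather than outputs of the genetic algorithm):

```python
import torch
import torch.nn as nn

BASES = {"A": 0, "C": 1, "G": 2, "T": 3}

def encode_pair(sgrna, target):
    """One-hot encode an (sgRNA, DNA target) pair position by position -> (L, 8)."""
    x = torch.zeros(len(sgrna), 8)
    for i, (g, t) in enumerate(zip(sgrna, target)):
        x[i, BASES[g]] = 1.0
        x[i, 4 + BASES[t]] = 1.0
    return x

class OffTargetGRU(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.gru = nn.GRU(input_size=8, hidden_size=hidden,
                          batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, 1)

    def forward(self, x):                      # x: (batch, L, 8)
        out, _ = self.gru(x)
        return self.head(out[:, -1, :])        # logit for off-target activity

model = OffTargetGRU()
pair = encode_pair("GACGCATAAAGATGAGACGCTGG", "GACGCATTAAGATGAGACGCAGG").unsqueeze(0)
print(torch.sigmoid(model(pair)))
```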
Subject(s)
CRISPR-Cas Systems , Deep Learning , Gene Editing/methods , RNA, Guide, CRISPR-Cas Systems , Neural Networks, Computer
ABSTRACT
IMPRINTS-CETSA (Integrated Modulation of Protein Interaction States-Cellular Thermal Shift Assay) provides a highly resolved means to systematically study the interactions of proteins with other cellular components, including metabolites, nucleic acids, and other proteins, at the proteome level, but no freely available and user-friendly data analysis software has been reported. Here, we report IMPRINTS.CETSA, an R package that provides the basic data processing framework for robust analysis of the IMPRINTS-CETSA data format, from preprocessing and normalization to visualization. We also report an accompanying R package, IMPRINTS.CETSA.app, which offers a user-friendly Shiny interface for analysis and interpretation of IMPRINTS-CETSA results, with seamless features such as functional enrichment and mapping to other databases at a single site. For hit generation, the diverse behaviors of protein modulation have typically been segregated with a two-measure scoring method, i.e., abundance and thermal stability changes. We present a new algorithm that classifies modulated proteins in IMPRINTS-CETSA experiments by a robust single-measure score. In this way, both the numerical changes and the statistical significance of the IMPRINTS information can be visualized on a single plot. The IMPRINTS.CETSA and IMPRINTS.CETSA.app R packages are freely available on GitHub at https://github.com/nkdailingyun/IMPRINTS.CETSA and https://github.com/mgerault/IMPRINTS.CETSA.app, respectively. IMPRINTS.CETSA.app is also available as an executable program at https://zenodo.org/records/10636134.
Subject(s)
Mobile Applications , Software , Proteome , Algorithms , Research Design
ABSTRACT
Herbal medicines, particularly traditional Chinese medicines (TCMs), are a rich source of natural products with significant therapeutic potential. However, understanding their mechanisms of action is challenging due to the complexity of their multi-ingredient compositions. We introduced Herb-CMap, a multimodal fusion framework leveraging protein-protein interactions and herb-perturbed gene expression signatures. Utilizing a network-based heat diffusion algorithm, Herb-CMap creates a connectivity map linking herb perturbations to their therapeutic targets, thereby facilitating the prioritization of active ingredients. As a case study, we applied Herb-CMap to Suhuang antitussive capsule (Suhuang), a TCM formula used for treating cough variant asthma (CVA). Using in vivo rat models, our analysis established the transcriptomic signatures of Suhuang and identified its key compounds, such as quercetin and luteolin, and their target genes, including IL17A, PIK3CB, PIK3CD, AKT1, and TNF. These drug-target interactions inhibit the IL-17 signaling pathway and deactivate PI3K, AKT, and NF-κB, effectively reducing lung inflammation and alleviating CVA. The study demonstrates the efficacy of Herb-CMap in elucidating the molecular mechanisms of herbal medicines, offering valuable insights for advancing drug discovery in TCM.
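The diffusion step can be sketched with networkx and scipy (a generic heat-diffusion kernel on a protein-protein interaction graph, not the exact Herb-CMap scoring): heat seeded on herb-perturbed genes spreads along the network, and proteins retaining the most heat are ranked as candidate targets. The toy edge list reuses genes named in the abstract.

```python
import networkx as nx
import numpy as np
from scipy.linalg import expm

def diffuse(ppi_edges, seed_genes, t=0.5):
    """Apply the heat-diffusion kernel exp(-t * L) to a 0/1 seed vector."""
    G = nx.Graph(ppi_edges)
    nodes = list(G.nodes)
    L = nx.laplacian_matrix(G, nodelist=nodes).toarray().astype(float)
    seeds = set(seed_genes)
    h0 = np.array([1.0 if n in seeds else 0.0 for n in nodes])
    h = expm(-t * L) @ h0
    return dict(sorted(zip(nodes, h), key=lambda kv: -kv[1]))   # ranked heat per protein

edges = [("IL17A", "AKT1"), ("AKT1", "PIK3CB"), ("PIK3CB", "PIK3CD"),
         ("AKT1", "TNF"), ("TNF", "IL17A")]                     # toy subnetwork
print(diffuse(edges, seed_genes=["IL17A", "TNF"]))
```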
Subject(s)
Antitussive Agents , Drugs, Chinese Herbal , Medicine, Chinese Traditional , Animals , Drugs, Chinese Herbal/pharmacology , Drugs, Chinese Herbal/therapeutic use , Medicine, Chinese Traditional/methods , Rats , Antitussive Agents/pharmacology , Antitussive Agents/therapeutic use , Protein Interaction Maps/drug effects , Asthma/drug therapy , Asthma/metabolism , Asthma/genetics , Signal Transduction/drug effects , Cough/drug therapy , Transcriptome , Humans
ABSTRACT
Thinking about God promotes greater acceptance of Artificial intelligence (AI)-based recommendations. Eight preregistered experiments (n = 2,462) reveal that when God is salient, people are more willing to consider AI-based recommendations than when God is not salient. Studies 1 and 2a to 2d demonstrate across a wide variety of contexts, from choosing entertainment and food to mutual funds and dental procedures, that God salience reduces reliance on human recommenders and heightens willingness to consider AI recommendations. Studies 3 and 4 demonstrate that the reduced reliance on humans is driven by a heightened feeling of smallness when God is salient, followed by a recognition of human fallibility. Study 5 addresses the similarity in mysteriousness between God and AI as an alternative, but unsupported, explanation. Finally, study 6 (n = 53,563) corroborates the experimental results with data from 21 countries on the usage of robo-advisors in financial decision-making.
Subject(s)
Artificial Intelligence , Decision Making , Humans , Surveys and Questionnaires
ABSTRACT
Identifying efficient and accurate optimization algorithms is a long-desired goal for the scientific community. At present, a combination of evolutionary and deep-learning methods is widely used for optimization. In this paper, we demonstrate three cases involving different physics and conclude that no matter how accurate a deep-learning model is for a single, specific problem, a simple combination of evolutionary and deep-learning methods cannot achieve the desired optimization because of the intrinsic nature of the evolutionary method. We begin by using a physics-supervised deep-learning optimization algorithm (PSDLO) to supervise the results from the deep-learning model. We then intervene in the evolutionary process to eventually achieve simultaneous accuracy and efficiency. PSDLO is successfully demonstrated using both sufficient and insufficient datasets. PSDLO offers a perspective for solving optimization problems and can tackle complex science and engineering problems having many features. This approach to optimization algorithms holds tremendous potential for application in real-world engineering domains.
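One way to read the described workflow is as a supervised surrogate-plus-evolution loop; the sketch below is an interpretation under that reading, not the authors' PSDLO code: an evolutionary step proposes candidates, a learned surrogate screens them, and only candidates re-verified by the true physics objective enter the next generation and retrain the surrogate.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def physics(x):                                   # placeholder "true" physics objective
    return -np.sum((x - 0.3) ** 2, axis=-1)

def supervised_evolution(dim=5, pop=40, gens=30, verify_top=5, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.random((pop, dim)); y = physics(X)
    surrogate = GradientBoostingRegressor().fit(X, y)
    for _ in range(gens):
        children = np.clip(X[rng.integers(pop, size=pop)] +
                           rng.normal(0, 0.05, (pop, dim)), 0, 1)   # mutation step
        top = children[np.argsort(surrogate.predict(children))[-verify_top:]]
        y_top = physics(top)                       # physics supervision of the best guesses
        X = np.vstack([X, top]); y = np.concatenate([y, y_top])
        keep = np.argsort(y)[-pop:]                # keep only physics-verified elites
        X, y = X[keep], y[keep]
        surrogate.fit(X, y)                        # retrain surrogate on verified points
    return X[y.argmax()], y.max()

print(supervised_evolution())
```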
ABSTRACT
An independent set (IS) is a set of vertices in a graph such that no edge connects any two vertices in the set. In adiabatic quantum computation [E. Farhi, et al., Science 292, 472-475 (2001); A. Das, B. K. Chakrabarti, Rev. Mod. Phys. 80, 1061-1081 (2008)], a given graph G(V, E) can be naturally mapped onto a many-body Hamiltonian H_G, with the edges (u, v) ∈ E defining two-body interactions between adjacent vertices u, v ∈ V. Thus, solving the IS problem is equivalent to finding all the computational basis ground states of H_G. Very recently, non-Abelian adiabatic mixing (NAAM) has been proposed to address this task, exploiting an emergent non-Abelian gauge symmetry of H_G [B. Wu, H. Yu, F. Wilczek, Phys. Rev. A 101, 012318 (2020)]. Here, we solve a representative IS problem by simulating the NAAM digitally using a linear optical quantum network consisting of three C-Phase gates, four deterministic two-qubit gate arrays (DGA), and ten single rotation gates. The maximum IS has been successfully identified with sufficient Trotterization steps and a carefully chosen evolution path. Remarkably, we find ISs with a total probability of 0.875(16), among which the nontrivial ones have a considerable weight of about 31.4%. Our experiment demonstrates the potential advantage of NAAM for solving IS-equivalent problems.
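The classical side of the mapping can be checked in a few lines (this only illustrates the correspondence between computational-basis ground states and independent sets, assuming the standard penalty form H = Σ_{(u,v)∈E} n_u n_v; it does not simulate the NAAM dynamics):

```python
import itertools

def independent_set_ground_states(n_vertices, edges):
    """All computational-basis ground states (energy 0) of
    H = sum_{(u,v) in E} n_u n_v are exactly the independent sets of the graph."""
    ground = []
    for bits in itertools.product([0, 1], repeat=n_vertices):
        if all(bits[u] * bits[v] == 0 for u, v in edges):   # zero penalty energy
            ground.append({i for i, b in enumerate(bits) if b})
    return ground

iss = independent_set_ground_states(4, edges=[(0, 1), (1, 2), (2, 3)])
print(len(iss), max(iss, key=len))          # number of ISs and one maximum IS
```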
ABSTRACT
Quantum computing technology may soon deliver revolutionary improvements in algorithmic performance, but it is useful only if computed answers are correct. While hardware-level decoherence errors have garnered significant attention, a less recognized obstacle to correctness is that of human programming errors ("bugs"). Techniques familiar to most programmers from the classical domain for avoiding, discovering, and diagnosing bugs do not easily transfer, at scale, to the quantum domain because of its unique characteristics. To address this problem, we have been working to adapt formal methods to quantum programming. With such methods, a programmer writes a mathematical specification alongside the program and semiautomatically proves the program correct with respect to it. The proof's validity is automatically confirmed (certified) by a "proof assistant." Formal methods have successfully yielded high-assurance classical software artifacts, and the underlying technology has produced certified proofs of major mathematical theorems. As a demonstration of the feasibility of applying formal methods to quantum programming, we present a formally certified end-to-end implementation of Shor's prime factorization algorithm, developed as part of a framework for applying the certified approach to general applications. By leveraging our framework, one can significantly reduce the effects of human errors and obtain a high-assurance implementation of large-scale quantum applications in a principled way.
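The classical post-processing at the end of Shor's algorithm, which a certified end-to-end implementation must also get right, is easy to state concretely; in the sketch below the quantum order-finding step is replaced by classical brute force purely for illustration.

```python
from math import gcd

def order(a, N):
    """Smallest r > 0 with a^r = 1 (mod N); brute-forced here, whereas Shor's
    algorithm obtains r via quantum phase estimation. Assumes gcd(a, N) == 1."""
    r, x = 1, a % N
    while x != 1:
        x = (x * a) % N
        r += 1
    return r

def shor_classical(N, a):
    if gcd(a, N) != 1:
        return gcd(a, N), N // gcd(a, N)       # lucky: a already shares a factor with N
    r = order(a, N)
    if r % 2 == 1 or pow(a, r // 2, N) == N - 1:
        return None                            # unusable order: retry with a different a
    p = gcd(pow(a, r // 2, N) - 1, N)
    return p, N // p

print(shor_classical(15, 7))                   # (3, 5)
```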
ABSTRACT
The number of noisy images required for molecular reconstruction in single-particle cryoelectron microscopy (cryo-EM) is governed by the autocorrelations of the observed, randomly oriented, noisy projection images. In this work, we consider the effect of imposing sparsity priors on the molecule. We use techniques from signal processing, optimization, and applied algebraic geometry to obtain theoretical and computational contributions for this challenging nonlinear inverse problem with sparsity constraints. We prove that molecular structures modeled as sums of Gaussians are uniquely determined by the second-order autocorrelation of their projection images, implying that the sample complexity is proportional to the square of the variance of the noise. This theory improves upon the nonsparse case, where the third-order autocorrelation is required for uniformly oriented particle images and the sample complexity scales with the cube of the noise variance. Furthermore, we build a computational framework to reconstruct molecular structures which are sparse in the wavelet basis. This method combines the sparse representation for the molecule with projection-based techniques used for phase retrieval in X-ray crystallography.
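The quantity driving the sample complexity can be estimated from simulated data (a 2-D toy, not the 3-D reconstruction pipeline; here in-plane rotations stand in for the unknown orientations): the average power spectrum of many noisy, randomly rotated copies of a sum-of-Gaussians "molecule" estimates its second-order autocorrelation by the Wiener-Khinchin theorem, up to a flat offset from the white noise.

```python
import numpy as np
from scipy.ndimage import rotate

def gaussian_molecule(size=64, centers=((20, 30), (40, 40), (32, 18)), width=3.0):
    """A toy 2-D 'molecule' modeled as a sum of isotropic Gaussians."""
    yy, xx = np.mgrid[:size, :size]
    return sum(np.exp(-((xx - cx) ** 2 + (yy - cy) ** 2) / (2 * width ** 2))
               for cy, cx in centers)

def mean_power_spectrum(n_images=500, sigma=1.0, seed=0):
    """Average |FFT|^2 over noisy, randomly rotated copies: an estimate of the
    second-order autocorrelation plus a flat noise offset."""
    rng = np.random.default_rng(seed)
    mol = gaussian_molecule()
    acc = np.zeros_like(mol)
    for _ in range(n_images):
        img = rotate(mol, angle=rng.uniform(0, 360), reshape=False, order=1)
        img += rng.normal(0, sigma, img.shape)
        acc += np.abs(np.fft.fft2(img)) ** 2
    return acc / n_images

ps = mean_power_spectrum()
```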