Results 1 - 20 of 54
1.
Proc Natl Acad Sci U S A ; 120(32): e2218217120, 2023 08 08.
Article in English | MEDLINE | ID: mdl-37523524

ABSTRACT

The 70-kDa heat shock protein (Hsp70) chaperone system is a central hub of the proteostasis network that helps maintain protein homeostasis in all organisms. The recruitment of Hsp70 to perform different and specific cellular functions is regulated by the J-domain protein (JDP) co-chaperone family, whose members carry the small namesake J-domain required to interact with Hsp70s and drive their ATPase cycle. Besides the J-domain, prokaryotic and eukaryotic JDPs display a staggering diversity in domain architecture, function, and cellular localization. Very little is known about the JDP family as a whole, despite its essential role in cellular proteostasis and development and its link to a broad range of human diseases. In this work, we leverage the exponentially increasing number of JDP gene sequences identified across all kingdoms, owing to advances in sequencing technology, to provide a broad overview of the JDP repertoire. Using an automated classification scheme based on artificial neural networks (ANNs), we demonstrate that the sequences of J-domains carry sufficient discriminatory information to reliably recover the phylogeny, localization, and domain composition of the corresponding full-length JDP. By harnessing the interpretability of the ANNs, we find that many of the discriminatory sequence positions match residues that form the interaction interface between the J-domain and Hsp70. This reveals that key residues within the J-domains have coevolved with their obligatory Hsp70 partners to build chaperone circuits for specific functions in cells.
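
To make the classification idea concrete, here is a minimal sketch (not the authors' pipeline): synthetic fixed-length "J-domain" sequences are one-hot encoded and fed to a small neural network that predicts a made-up localization label. The sequences, labels, and network size are all illustrative assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

AA = "ACDEFGHIKLMNPQRSTVWY"
AA_IDX = {a: i for i, a in enumerate(AA)}

def one_hot(seq):
    """Encode an aligned, fixed-length sequence as a flat L x 20 binary vector."""
    x = np.zeros((len(seq), len(AA)))
    for pos, aa in enumerate(seq):
        if aa in AA_IDX:              # gap characters ('-') stay all-zero
            x[pos, AA_IDX[aa]] = 1.0
    return x.ravel()

rng = np.random.default_rng(0)
L, n = 70, 200                         # J-domains are roughly 70 residues long
seqs = ["".join(rng.choice(list(AA), size=L)) for _ in range(n)]
labels = rng.integers(0, 2, size=n)    # e.g., 0 = cytosolic, 1 = ER (synthetic)

X = np.array([one_hot(s) for s in seqs])
clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
clf.fit(X, labels)
print("training accuracy:", clf.score(X, labels))
```

With real aligned J-domain sequences, the per-position weights of such a network are what make the discriminatory positions inspectable.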


Subjects
HSP70 Heat-Shock Proteins, Molecular Chaperones, Humans, Amino Acid Sequence, Genomics, HSP40 Heat-Shock Proteins/metabolism, HSP70 Heat-Shock Proteins/metabolism, Molecular Chaperones/metabolism, Phylogeny
2.
Psychol Sci ; : 9567976241265037, 2024 Oct 02.
Article in English | MEDLINE | ID: mdl-39356556

ABSTRACT

Obesity has adverse consequences for those affected. We tested whether the association between obesity and its adverse consequences is reduced in regions in which obesity is prevalent and whether lower weight bias in high-obese regions can account for this reduction. Studies 1 and 2 used data from the United States (N = 2,846,132 adults across 2,546 counties) and United Kingdom (N = 180,615 adults across 380 districts) that assessed obesity's adverse consequences in diverse domains: close relationships, economic outcomes, and health. Both studies revealed that the association between obesity and its adverse consequences is reduced (or absent) in high-obese regions. Study 3 used another large-scale data set (N = 409,837 across 2,928 U.S. counties) and revealed that lower weight bias in high-obese regions seems to account for (i.e., mediate) the reduction in obesity's adverse consequences. Overall, our findings suggest that obesity's adverse consequences are partly social and, thus, not inevitable.

3.
BMC Med ; 21(1): 268, 2023 07 24.
Article in English | MEDLINE | ID: mdl-37488535

ABSTRACT

BACKGROUND: Tumour-infiltrating lymphocytes (TILs), including T and B cells, have been demonstrated to be associated with tumour progression. However, the different subpopulations of TILs and their roles in breast cancer remain poorly understood. Large-scale analysis using multiomics data could uncover potential mechanisms and provide promising biomarkers for predicting immunotherapy response. METHODS: Single-cell transcriptome data for breast cancer samples were analysed to identify unique TIL subsets. Based on the expression profiles of marker genes in these subsets, a TIL-related prognostic model was developed by univariate and multivariate Cox analyses and LASSO regression for the TCGA training cohort containing 1089 breast cancer patients. Multiplex immunohistochemistry was used to confirm the presence of TIL subsets in breast cancer samples. The model was validated with a large-scale transcriptomic dataset for 3619 breast cancer patients, including the METABRIC cohort, six chemotherapy transcriptomic cohorts, and two immunotherapy transcriptomic cohorts. RESULTS: We identified two TIL subsets with high expression of CD103 and LAG3 (CD103+LAG3+), including a CD8+ T-cell subset and a B-cell subset. Based on the expression profiles of marker genes in these two subpopulations, we further developed a CD103+LAG3+ TIL-related prognostic model (CLTRP) based on CXCL13 and BIRC3 genes for predicting the prognosis of breast cancer patients. CLTRP-low patients had a better prognosis than CLTRP-high patients. The comprehensive results showed that a low CLTRP score was associated with a high TP53 mutation rate, high infiltration of CD8 T cells, helper T cells, and CD4 T cells, high sensitivity to chemotherapeutic drugs, and a good response to immunotherapy. In contrast, a high CLTRP score was correlated with a low TP53 mutation rate, high infiltration of M0 and M2 macrophages, low sensitivity to chemotherapeutic drugs, and a poor response to immunotherapy. CONCLUSIONS: Our present study showed that the CLTRP score is a promising biomarker for distinguishing prognosis, drug sensitivity, molecular and immune characteristics, and immunotherapy outcomes in breast cancer patients. The CLTRP could serve as a valuable tool for clinical decision making regarding immunotherapy.
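
As an illustration of how a two-gene prognostic score of this kind can be built, the following hedged sketch fits a Cox model on simulated CXCL13 and BIRC3 expression with the lifelines package and stratifies patients by the median risk score; the published CLTRP coefficients and cutoffs are not reproduced here.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(1)
n = 300
df = pd.DataFrame({
    "CXCL13": rng.normal(size=n),
    "BIRC3": rng.normal(size=n),
})
# Simulated survival: higher combined expression -> lower hazard (an assumption)
hazard = np.exp(-0.5 * df["CXCL13"] - 0.3 * df["BIRC3"]).to_numpy()
df["time"] = rng.exponential(1.0 / hazard)
df["event"] = rng.integers(0, 2, size=n)          # 1 = death observed, 0 = censored

cph = CoxPHFitter()
cph.fit(df, duration_col="time", event_col="event")
risk = cph.predict_partial_hazard(df)             # per-patient relative risk
df["risk_group"] = np.where(risk > risk.median(), "high", "low")
print(cph.summary[["coef", "p"]])
print(df["risk_group"].value_counts())
```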


Subjects
Breast Neoplasms, Tumor-Infiltrating Lymphocytes, Tumor-Infiltrating Lymphocytes/immunology, Breast Neoplasms/drug therapy, Breast Neoplasms/immunology, Humans, Prognosis, Antineoplastic Agents/therapeutic use
4.
Hum Brain Mapp ; 43(18): 5520-5542, 2022 12 15.
Article in English | MEDLINE | ID: mdl-35903877

ABSTRACT

Cognitive abilities are one of the major transdiagnostic domains in the National Institute of Mental Health's Research Domain Criteria (RDoC). Following RDoC's integrative approach, we aimed to develop brain-based predictive models for cognitive abilities that (a) are developmentally stable over years during adolescence and (b) account for the relationships between cognitive abilities and socio-demographic, psychological and genetic factors. For this, we leveraged the unique power of the large-scale, longitudinal data from the Adolescent Brain Cognitive Development (ABCD) study (n ~ 11 k) and combined MRI data across modalities (task-based fMRI from three tasks, resting-state fMRI, structural MRI and DTI) using machine learning. Our brain-based predictive models for cognitive abilities were stable across 2 years during young adolescence and generalisable to different sites, predicting around 20% of the variance in childhood cognition. Moreover, our use of 'opportunistic stacking' allowed the model to handle missing values, reducing data exclusion from around 80% to around 5%. We found that fronto-parietal networks during a working-memory task drove the prediction of childhood cognition. The brain-based predictive models significantly, albeit partially, accounted for variance in childhood cognition due to (1) key socio-demographic and psychological factors (proportion mediated = 18.65% [17.29%-20.12%]) and (2) genetic variation, as reflected by the polygenic score of cognition (proportion mediated = 15.6% [11%-20.7%]). Thus, our brain-based predictive models for cognitive abilities facilitate the development of a robust, transdiagnostic research tool for cognition at the neural level in keeping with RDoC's integrative framework.
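
A simplified sketch of the stacking idea follows (simulated data, not the ABCD pipeline): one first-level model per modality, a second-level model over their out-of-fold predictions, and a placeholder value wherever a participant lacks a modality. The actual 'opportunistic stacking' duplicates each stacked feature with two complementary fill values; the single placeholder here is a simplification.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(2)
n, p = 500, 20
modalities = {m: rng.normal(size=(n, p))
              for m in ["task_fmri", "rest_fmri", "smri", "dti"]}
# Cognition depends a little on each modality (synthetic ground truth)
y = sum(X[:, 0] for X in modalities.values()) + rng.normal(scale=0.5, size=n)

# Simulate missingness: ~30% of participants lack each modality at random
missing = {m: rng.random(n) < 0.3 for m in modalities}

stacked = []
for m, X in modalities.items():
    # Out-of-fold first-level predictions avoid leakage into the stacker
    pred = cross_val_predict(Ridge(), X, y, cv=5)
    pred[missing[m]] = 0.0            # placeholder instead of excluding the row
    stacked.append(pred)

Z = np.column_stack(stacked)          # one stacked feature per modality
stacker = Ridge().fit(Z, y)
print("stacked R^2:", stacker.score(Z, y))
```

The point of the design is that a participant missing one scan still contributes a full row to the second-level model, which is what shrinks the exclusion rate.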


Subjects
Brain, Cognition, Adolescent, Humans, Brain/diagnostic imaging, Brain Mapping, Magnetic Resonance Imaging, Demography
5.
Multivariate Behav Res ; 57(4): 642-657, 2022.
Article in English | MEDLINE | ID: mdl-33703972

ABSTRACT

With the advent of the big data era, machine learning methods have evolved and proliferated. This study focused on penalized regression, a machine learning procedure that builds interpretable prediction models. In particular, penalized regression coupled with large-scale data can explore hundreds or thousands of variables in one statistical model without convergence problems and identify as-yet-uninvestigated important predictors. As one of the first Monte Carlo simulation studies to investigate predictive modeling with missing categorical predictors in the context of social science research, this study endeavored to emulate real social science large-scale data. Likert-scaled variables were simulated, as well as multiple-category and count variables. Because categorical predictors were included in the modeling, penalized regression methods that consider the grouping effect, such as group Mnet, were employed. We also examined the applicability of the simulation conditions with the real large-scale dataset that the simulation study referenced. In particular, the study presents selection counts of variables after multiple iterations of modeling in order to account for the bias resulting from data splitting in model validation; selection counts turned out to be a necessary tool when variable selection is of research interest. Efforts to utilize large-scale data to the fullest appear to offer a valid approach to mitigating the effect of nonignorable missingness. Overall, penalized regression, which assumes linearity, is a viable method for analyzing social science large-scale survey data.
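
The selection-counts procedure can be sketched as follows, with a plain LASSO standing in for the grouping-aware penalties (such as group Mnet) used in the study, which scikit-learn does not provide.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
n, p = 400, 50
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = 1.0                         # only the first 5 predictors matter
y = X @ beta + rng.normal(size=n)

counts = np.zeros(p, dtype=int)
n_iter = 100
for seed in range(n_iter):
    # A fresh random split each iteration; selection varies with the split
    X_tr, _, y_tr, _ = train_test_split(X, y, test_size=0.3, random_state=seed)
    model = LassoCV(cv=5).fit(X_tr, y_tr)
    counts += (model.coef_ != 0).astype(int)

# Predictors selected in most splits are the stable ones
print("selection counts (first 10 predictors):", counts[:10])
```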


Subjects
Data Analysis, Statistical Models, Computer Simulation, Monte Carlo Method, Regression Analysis
6.
J Proteome Res ; 20(4): 2056-2061, 2021 04 02.
Article in English | MEDLINE | ID: mdl-33625229

ABSTRACT

BioContainers is an open-source project that aims to create, store, and distribute bioinformatics software containers and packages. The BioContainers community has developed a set of guidelines to standardize software containers, including their metadata, versions, licenses, and software dependencies. BioContainers supports multiple packaging and container technologies such as Conda, Docker, and Singularity, and provides over 9000 bioinformatics tools, including more than 200 proteomics and mass spectrometry tools. Here we introduce the BioContainers Registry and RESTful API, which make containerized bioinformatics tools more findable, accessible, interoperable, and reusable (FAIR). The BioContainers Registry provides a fast and convenient way to find and retrieve bioinformatics tool packages and containers. By doing so, it will increase the use of bioinformatics packages and containers while promoting replicability and reproducibility in research.
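
Programmatic access might look like the following sketch; the GA4GH TRS-style base URL, the query parameters, and the response fields are assumptions based on the registry's public documentation, not details given in the abstract.

```python
import requests

# Assumed endpoint: the BioContainers Registry exposes a GA4GH TRS-style API
BASE = "https://api.biocontainers.pro/ga4gh/trs/v2"

# Search for tools by name (here "comet", a proteomics search engine)
resp = requests.get(f"{BASE}/tools", params={"name": "comet", "limit": 5},
                    timeout=30)
resp.raise_for_status()
for tool in resp.json():
    # Field names are assumptions about the response schema
    print(tool.get("name"), "-", (tool.get("description") or "")[:60])
```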


Subjects
Computational Biology, Proteomics, Registries, Reproducibility of Results, Software
7.
J Child Psychol Psychiatry ; 62(10): 1202-1219, 2021 10.
Article in English | MEDLINE | ID: mdl-33748971

ABSTRACT

OBJECTIVE: Some studies have suggested alterations of structural brain asymmetry in attention-deficit/hyperactivity disorder (ADHD), but findings have been contradictory and based on small samples. Here, we performed the largest ever analysis of brain left-right asymmetry in ADHD, using 39 datasets of the ENIGMA consortium. METHODS: We analyzed asymmetry of subcortical and cerebral cortical structures in up to 1,933 people with ADHD and 1,829 unaffected controls. Asymmetry Indexes (AIs) were calculated per participant for each bilaterally paired measure, and linear mixed effects modeling was applied separately in children, adolescents, adults, and the total sample, to test exhaustively for potential associations of ADHD with structural brain asymmetries. RESULTS: There was no evidence for altered caudate nucleus asymmetry in ADHD, in contrast to prior literature. In children, there was less rightward asymmetry of the total hemispheric surface area compared to controls (t = 2.1, p = .04). Lower rightward asymmetry of medial orbitofrontal cortex surface area in ADHD (t = 2.7, p = .01) was similar to a recent finding for autism spectrum disorder. There were also some differences in cortical thickness asymmetry across age groups. In adults with ADHD, globus pallidus asymmetry was altered compared to those without ADHD. However, all effects were small (Cohen's d from -0.18 to 0.18) and would not survive study-wide correction for multiple testing. CONCLUSION: Prior studies of altered structural brain asymmetry in ADHD were likely underpowered to detect the small effects reported here. Altered structural asymmetry is unlikely to provide a useful biomarker for ADHD, but may provide neurobiological insights into the trait.
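
On simulated data, the core analysis could be sketched like this: compute an Asymmetry Index per bilaterally paired measure, here using the AI = (L - R)/(L + R) convention common in ENIGMA lateralization work (an assumption, as the abstract does not give the formula), and test for a diagnosis effect in a linear mixed model with site as a random effect.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n = 600
df = pd.DataFrame({
    "left": rng.normal(5000, 300, n),    # e.g., surface area, arbitrary units
    "right": rng.normal(5000, 300, n),
    "adhd": rng.integers(0, 2, n),       # 1 = case, 0 = control
    "age": rng.uniform(6, 18, n),
    "site": rng.integers(0, 10, n),      # 10 simulated acquisition sites
})
# Asymmetry Index: positive = leftward, negative = rightward
df["AI"] = (df["left"] - df["right"]) / (df["left"] + df["right"])

# Linear mixed model: fixed effects for diagnosis and age, random site intercept
model = smf.mixedlm("AI ~ adhd + age", df, groups=df["site"]).fit()
print(model.summary())
```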


Subjects
Attention Deficit Disorder with Hyperactivity, Autism Spectrum Disorder, Adolescent, Adult, Brain/diagnostic imaging, Caudate Nucleus, Child, Humans, Magnetic Resonance Imaging
8.
Sensors (Basel) ; 21(4)2021 Feb 17.
Article in English | MEDLINE | ID: mdl-33671419

ABSTRACT

Various imaging modalities are evaluated for use in documenting forensic incident (crime or accident) scenes. Particular attention is paid to the precision-versus-cost tradeoff, which is managed by judiciously combining various 3D scans with photogrammetric reconstructions from 2D photographs. Requirements are proposed for two complementary software systems: an event scene pilot, which assists on-site staff in securing evidence and facilitates their communication with stationary support staff, and an evidence keeper, which manages the voluminous and varied database of accumulated imagery, textual notes, and physical evidence inventory.


Subjects
Forensic Sciences, Photogrammetry, Documentation, Humans, Software
9.
Proteomics ; 20(9): e1900147, 2020 05.
Article in English | MEDLINE | ID: mdl-31657527

ABSTRACT

The recent improvements in mass spectrometry instruments and new analytical methods are increasing the intersection between proteomics and big data science. In addition, bioinformatics analysis is becoming increasingly complex and convoluted, involving multiple algorithms and tools. A wide variety of methods and software tools have been developed for computational proteomics and metabolomics in recent years, and this trend is likely to continue. However, most computational proteomics and metabolomics tools are designed as single-tiered software applications in which the analytics tasks cannot be distributed, limiting the scalability and reproducibility of the data analysis. In this paper the key steps of metabolomics and proteomics data processing are summarized, including the main tools and software used to perform the data analysis. The combination of software containers with workflow environments for large-scale metabolomics and proteomics analysis is discussed. Finally, a new approach for reproducible and large-scale data analysis based on BioContainers and two of the most popular workflow environments, Galaxy and Nextflow, is introduced to the proteomics and metabolomics communities.


Subjects
Computational Biology/methods, Proteomics/methods, Software, Cloud Computing, Data Analysis, Mass Spectrometry/methods, Metabolomics/methods, Workflow
10.
Mol Pharm ; 17(12): 4652-4666, 2020 12 07.
Article in English | MEDLINE | ID: mdl-33151084

ABSTRACT

Small molecules with multitarget activity are capable of triggering polypharmacological effects and are of high interest in drug discovery. Compared to single-target compounds, promiscuous compounds also differ in drug distribution and pharmacodynamics and have altered ADMET characteristics. However, the features distinguishing compounds with single-target activity from those with multitarget activity are currently poorly understood. On the basis of systematic data analysis, we assembled large sets of promiscuous compounds with activity against related or functionally distinct targets, together with the corresponding compounds with single-target activity. Machine learning predicted promiscuous compounds with surprisingly high accuracy. Molecular similarity analysis combined with control calculations under varying conditions revealed that accurate predictions were largely determined by structural nearest-neighbor relationships between compounds from different classes. We also found that large proportions of promiscuous compounds with activity against related or unrelated targets, and the corresponding single-target compounds, formed analog series with distinct chemical space coverage, which further rationalized the predictions. Moreover, compounds with activity against proteins from functionally distinct classes were often active against unique targets that were not covered by other promiscuous compounds. These results show that nearest-neighbor effects determined the prediction of promiscuous compounds and that preferential partitioning of compounds with single- and multitarget activity into structurally distinct analog series was responsible for such effects, hence providing a rationale for the presence of different structure-promiscuity relationships.
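
A sketch of the kind of nearest-neighbor similarity analysis such control calculations rely on, using RDKit Morgan fingerprints and Tanimoto similarity; the SMILES strings are arbitrary examples, not compounds from the study.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

smiles = {
    "aspirin": "CC(=O)Oc1ccccc1C(=O)O",
    "salicylic_acid": "O=C(O)c1ccccc1O",
    "caffeine": "Cn1cnc2c1c(=O)n(C)c(=O)n2C",
}
# Morgan (ECFP4-like) fingerprints: radius 2, 2048 bits
fps = {name: AllChem.GetMorganFingerprintAsBitVect(Chem.MolFromSmiles(s),
                                                   2, nBits=2048)
       for name, s in smiles.items()}

# Pairwise Tanimoto similarity; in a nearest-neighbor analysis, a compound's
# predicted class is compared against that of its most similar training neighbor
names = list(fps)
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        sim = DataStructs.TanimotoSimilarity(fps[names[i]], fps[names[j]])
        print(f"{names[i]} vs {names[j]}: {sim:.2f}")
```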


Subjects
Drug Discovery/methods, Machine Learning, Polypharmacology, Data Analysis, Molecular Structure, Structure-Activity Relationship
11.
Stat Med ; 38(21): 3997-4012, 2019 09 20.
Article in English | MEDLINE | ID: mdl-31267550

ABSTRACT

A stochastic approximation EM algorithm (SAEM) is described for exploratory factor analysis of dichotomous or ordinal variables. The factor structure is obtained from sufficient statistics that are updated during iterations with the Robbins-Monro procedure. Two large-scale simulations are reported that compare the accuracy and CPU time of the proposed SAEM algorithm to the Metropolis-Hastings Robbins-Monro procedure and to a generalized least squares analysis of the polychoric correlation matrix. A smaller-scale application to real data is also reported, including a method for obtaining standard errors of rotated factor loadings. A simulation study based on the real data analysis is conducted to study bias and error estimates. The SAEM factor algorithm requires minimal lines of code, no derivatives, and no large-matrix inversion. It is programmed entirely in R.
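
The Robbins-Monro update at the heart of SAEM can be shown in isolation: a running estimate of a sufficient statistic is nudged toward each new stochastic draw with a decreasing gain, s_k = s_{k-1} + gamma_k (S_k - s_{k-1}). In this toy sketch the "statistic" is simply the mean of noisy samples.

```python
import numpy as np

rng = np.random.default_rng(5)
true_mean = 2.5
s = 0.0                                   # initial estimate
for k in range(1, 5001):
    S_k = rng.normal(true_mean, 1.0)      # stochastic draw (stands in for the E-step)
    gamma = 1.0 / k                       # gain: sum diverges, sum of squares converges
    s = s + gamma * (S_k - s)             # Robbins-Monro update
print(f"estimate after 5000 iterations: {s:.3f} (true value {true_mean})")
```

In SAEM proper, S_k comes from simulating the latent factors given the current parameters, and the M-step re-estimates the loadings from the smoothed statistics s.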


Subjects
Algorithms, Factor Analysis, Bias, Computer Simulation, Humans, Least-Squares Analysis, Likelihood Functions, Stochastic Processes
12.
Sensors (Basel) ; 19(12)2019 Jun 17.
Article in English | MEDLINE | ID: mdl-31212959

ABSTRACT

Underwater sensor networks have wide application prospects, but large-scale sensing node deployment is severely hindered by problems such as energy constraints, long delays, local disconnections, and heavy energy consumption. These problems can be addressed effectively by optimizing sensing node deployment with a genetic algorithm (GA). However, the GA needs many iterations to solve for the best locations of underwater sensor deployment, which results in long running times and limited practical applicability when dealing with large-scale data. The classical parallel framework Hadoop can improve GA running efficiency to some extent, while the state-of-the-art parallel framework Spark can unlock much more of the GA's parallel potential by parallelizing crossover, mutation, and other operations on each computing node. Taking full account of the working environment of the underwater sensor network and the characteristics of its sensors, this paper proposes a Spark-based parallel GA to calculate the extrema of the Shubert multi-peak function, through which the optimal deployment of the underwater sensor network can be obtained. Experimental results show that, for a large-scale underwater sensor network, the Spark-based implementation not only significantly reduces the running time compared with a single node and with the Hadoop framework but also effectively avoids premature convergence because of its greater randomness.
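
A minimal PySpark sketch of the parallel part: fitness evaluation of a GA population over the two-dimensional Shubert function, distributed across Spark workers. The full GA loop (selection, crossover, mutation) is reduced here to a single best-individual selection, so this is an illustration of the parallelization pattern, not the paper's algorithm.

```python
import math
import random
from pyspark.sql import SparkSession

def shubert(ind):
    """Two-dimensional Shubert function: a product over coordinates of cosine sums."""
    total = 1.0
    for x in ind:
        total *= sum(j * math.cos((j + 1) * x + j) for j in range(1, 6))
    return total

spark = SparkSession.builder.appName("spark-ga-shubert").getOrCreate()
sc = spark.sparkContext

# Random initial population on the usual search domain [-10, 10]^2
population = [(random.uniform(-10, 10), random.uniform(-10, 10))
              for _ in range(100_000)]

# Distribute fitness evaluation across the cluster
scored = sc.parallelize(population).map(lambda ind: (shubert(ind), ind))
best = scored.min()        # Shubert's global optima are minima (about -186.73)
print("best fitness, individual:", best)
spark.stop()
```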

13.
Behav Res Methods ; 51(4): 1531-1543, 2019 08.
Article in English | MEDLINE | ID: mdl-30251006

ABSTRACT

Large-scale data sets from online training and game platforms offer the opportunity for more extensive and more precise investigations of human learning than is typically achievable in the laboratory. However, because people make their own choices about participation, any investigation into learning using these data sets must simultaneously model performance (that is, the learning function) and participation. Using a data set of 54 million gameplays from the online brain training site Lumosity, we show that the learning functions of participants are systematically biased by participation policies that vary with age. Older adults who are poorer performers are more likely to drop out than older adults who perform well; younger adults show no such effect. Using this knowledge, we can extrapolate group learning functions that correct for these age-related differences in dropout.


Subjects
Learning, Adult, Age Factors, Aged, Aged 80 and over, Datasets as Topic, Female, Humans, Male, Middle Aged, Young Adult
14.
BMC Bioinformatics ; 19(1): 221, 2018 06 11.
Article in English | MEDLINE | ID: mdl-29890950

ABSTRACT

BACKGROUND: Identifying similarities between datasets is a fundamental task in data mining and has become an integral part of modern scientific investigation. Whether the task is to identify co-expressed genes in large-scale expression surveys or to predict combinations of gene knockouts that would elicit a similar phenotype, the underlying computational task is often a multi-dimensional similarity test. As datasets continue to grow, improvements to the efficiency, sensitivity, or specificity of such computation will have broad impacts, as they allow scientists to more completely explore the wealth of scientific data. RESULTS: The Blazing Signature Filter (BSF) is a highly efficient pairwise similarity algorithm which enables extensive data mining within a reasonable amount of time. The algorithm transforms datasets into binary metrics, allowing it to use computationally efficient bitwise operators and provide a coarse measure of similarity. We demonstrate the utility of our algorithm using two common bioinformatics tasks: identifying data sets with similar gene expression profiles, and comparing annotated genomes. CONCLUSIONS: The BSF is a highly efficient pairwise similarity algorithm that can scale to billions of comparisons without the need for specialized hardware.
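
The core trick can be reproduced in a few lines of pure Python as a sketch (not the BSF implementation): binarize each signature, pack it into an arbitrary-precision integer, and score similarity with a bitwise AND plus a population count (int.bit_count, Python 3.10+). The 10% flag density is an arbitrary choice for illustration.

```python
import random

random.seed(6)
n_genes = 10_000

def random_signature():
    """Pack a random binary signature (e.g., 'gene up-regulated') into one int."""
    bits = 0
    for g in range(n_genes):
        if random.random() < 0.1:      # ~10% of genes flagged
            bits |= 1 << g
    return bits

a, b = random_signature(), random_signature()

shared = (a & b).bit_count()           # genes flagged in both signatures
union = (a | b).bit_count()            # genes flagged in either
print(f"shared bits: {shared}, Jaccard similarity: {shared / union:.3f}")
```

Because a single AND-plus-popcount compares thousands of genes at once, pairwise screening of whole compendia becomes feasible on commodity CPUs.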


Subjects
Algorithms, Computational Biology/methods, Genomics/methods, High-Throughput Nucleotide Sequencing/methods, DNA Sequence Analysis/methods, Software, Human Genome, Humans
15.
RNA ; 22(7): 957-67, 2016 07.
Article in English | MEDLINE | ID: mdl-27190231

ABSTRACT

Nucleic acid sequence complementarity underlies many fundamental biological processes. Although first noticed a long time ago, sequence complementarity between mRNAs and ribosomal RNAs still lacks a meaningful biological interpretation. Here we used statistical analysis of large-scale sequence data sets and high-throughput computing to explore complementarity between 18S and 28S rRNAs and mRNA 3' UTR sequences. By the analysis of 27,646 full-length 3' UTR sequences from 14 species covering both protozoans and metazoans, we show that the computed 18S rRNA complementarity creates an evolutionarily conserved localization pattern centered around the ribosomal mRNA entry channel, suggesting its biological relevance and functionality. Based on this specific pattern and earlier data showing that post-termination 80S ribosomes are not stably anchored at the stop codon and can migrate in both directions to codons that are cognate to the P-site deacylated tRNA, we propose that the 18S rRNA-mRNA complementarity selectively stabilizes post-termination ribosomal complexes to facilitate ribosome recycling. We thus demonstrate that the complementarity between 18S rRNA and 3' UTRs has a non-random nature and very likely carries information with a regulatory potential for translational control.


Subjects
3' Untranslated Regions, Protein Biosynthesis/physiology, Ribosomal RNA/physiology, Genetic Terminator Regions, Animals, Codon, Ribosomal RNA/chemistry
16.
J Med Syst ; 42(4): 69, 2018 Mar 02.
Article in English | MEDLINE | ID: mdl-29500683

ABSTRACT

This paper presents a new approach to prioritizing large-scale data on patients with chronic heart disease by using body sensors and communication technology during disasters and peak seasons. An evaluation matrix is used for emergency evaluation and large-scale data scoring of patients with chronic heart disease in a telemedicine environment. One major problem in the emergency evaluation of these patients is establishing a reasonable threshold for patients with the most and least critical conditions. This threshold can be used to identify the highest and lowest priority levels when all patients' scores are identical during disasters and peak seasons. A practical study was performed on 500 patients with chronic heart disease and different symptoms, and their emergency levels were evaluated based on four main measurements: electrocardiogram, oxygen saturation sensor, blood pressure monitoring, and a non-sensory measurement tool, namely a text frame. Data alignment was conducted for the raw data and the decision-making matrix by converting each extracted feature into an integer representing its triage level under medical guidelines, so that features from different sources could be combined in one platform. The patients were then scored based on a decision matrix using multi-criteria decision-making techniques, namely an integrated multi-layer analytic hierarchy process (MLAHP) and the technique for order preference by similarity to ideal solution (TOPSIS). For subjective validation, cardiologists were consulted to confirm the ranking results. For objective validation, the mean ± standard deviation was computed to check the accuracy of the systematic ranking. This study provides scenarios and checklist benchmarking to evaluate the proposed and existing prioritization methods. Experimental results revealed the following. (1) The integration of TOPSIS and MLAHP effectively and systematically solved the patient triage and prioritization problem. (2) In the subjective validation, the first five patients assigned to the doctors were the most urgent cases requiring the highest priority, whereas the last five patients were the least urgent cases and were given the lowest priority; in the objective validation, scores significantly differed between the groups, indicating that the ranking results were consistent. (3) For the first, second, and third scenarios, the proposed method exhibited an advantage over the benchmark method with percentages of 40%, 60%, and 100%, respectively. In conclusion, patients with the most and least urgent cases received the highest and lowest priority levels, respectively.
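
A compact TOPSIS sketch on a simulated decision matrix follows; in the paper the criterion weights come from MLAHP, whereas here they are fixed by hand, and all four criteria are treated as "benefit" (more-urgent-is-higher) criteria for simplicity.

```python
import numpy as np

rng = np.random.default_rng(7)
# 10 patients x 4 criteria (e.g., ECG score, SpO2 score, BP score, text frame)
X = rng.uniform(1, 5, size=(10, 4))
w = np.array([0.4, 0.3, 0.2, 0.1])       # stand-in weights for MLAHP output

# 1) Vector-normalize each criterion column, then apply the weights
V = w * X / np.linalg.norm(X, axis=0)
# 2) Ideal and anti-ideal solutions (all criteria treated as benefit criteria)
ideal, anti = V.max(axis=0), V.min(axis=0)
# 3) Euclidean distances to both, then relative closeness to the ideal
d_pos = np.linalg.norm(V - ideal, axis=1)
d_neg = np.linalg.norm(V - anti, axis=1)
closeness = d_neg / (d_pos + d_neg)

ranking = np.argsort(-closeness)          # highest closeness = highest priority
print("patient priority order:", ranking)
```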


Subjects
Statistical Data Interpretation, Decision Support Techniques, Emergencies, Heart Diseases/physiopathology, Ambulatory Monitoring/methods, Telemetry/methods, Ambulatory Blood Pressure Monitoring, Chronic Disease, Ambulatory Electrocardiography, Humans, Oxygen/blood, Remote Sensing Technology, Reproducibility of Results, Stochastic Processes, Time Factors
17.
J Proteome Res ; 15(3): 707-12, 2016 Mar 04.
Article in English | MEDLINE | ID: mdl-26510693

ABSTRACT

The use of proteomics bioinformatics substantially contributes to an improved understanding of proteomes, but this novel and in-depth knowledge comes at the cost of increased computational complexity. Parallelization across multiple computers, a strategy termed distributed computing, can be used to handle this increased complexity; however, setting up and maintaining a distributed computing infrastructure requires resources and skills that are not readily available to most research groups. Here we propose a free and open-source framework named Pladipus that greatly facilitates the establishment of distributed computing networks for proteomics bioinformatics tools. Pladipus is straightforward to install and operate thanks to its user-friendly graphical interface, allowing complex bioinformatics tasks to be run easily on a network instead of a single computer. As a result, any researcher can benefit from the increased computational efficiency provided by distributed computing, hence empowering them to tackle more complex bioinformatics challenges. Notably, it enables any research group to perform large-scale reprocessing of publicly available proteomics data, thus supporting the scientific community in mining these data for novel discoveries.


Subjects
Computational Biology/methods, Computer Communication Networks, Proteomics/methods, Data Mining, User-Computer Interface
18.
Algorithms Mol Biol ; 19(1): 21, 2024 Jun 11.
Article in English | MEDLINE | ID: mdl-38863064

ABSTRACT

Metric multidimensional scaling is one of the classical methods for embedding data into low-dimensional Euclidean space. It creates the low-dimensional embedding by approximately preserving the pairwise distances between the input points. However, current state-of-the-art approaches only scale to a few thousand data points. For larger data sets such as those occurring in single-cell RNA sequencing experiments, the running time becomes prohibitively large and thus alternative methods such as PCA are widely used instead. Here, we propose a simple neural network-based approach for solving the metric multidimensional scaling problem that is orders of magnitude faster than previous state-of-the-art approaches, and hence scales to data sets with up to a few million cells. At the same time, it provides a non-linear mapping between high- and low-dimensional space that can place previously unseen cells in the same embedding.
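
The approach can be sketched with a small PyTorch model trained to minimize the classical stress, the sum over pairs of (||f(x_i) - f(x_j)|| - d_ij)^2, on mini-batches of random pairs; the architecture and training schedule below are illustrative assumptions, not the paper's network.

```python
import torch

torch.manual_seed(8)
n, d = 2000, 50
X = torch.randn(n, d)                     # stand-in for high-dimensional cells

# A small MLP mapping R^d -> R^2 (the embedding)
net = torch.nn.Sequential(
    torch.nn.Linear(d, 64), torch.nn.ReLU(), torch.nn.Linear(64, 2)
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(500):
    # Sample a mini-batch of random pairs: cost stays far below O(n^2)
    i = torch.randint(0, n, (256,))
    j = torch.randint(0, n, (256,))
    d_in = (X[i] - X[j]).norm(dim=1)      # distances in the input space
    d_out = (net(X[i]) - net(X[j])).norm(dim=1)
    loss = ((d_out - d_in) ** 2).mean()   # stress on the sampled pairs
    opt.zero_grad()
    loss.backward()
    opt.step()

# Because the map is an explicit network, unseen points embed directly
embedding_new = net(torch.randn(5, d))
print(embedding_new.shape)                # torch.Size([5, 2])
```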

19.
J Biomed Opt ; 29(6): 066006, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38846677

ABSTRACT

Significance: Photoacoustic computed tomography (PACT) is a promising non-invasive imaging technique for both life science and clinical applications. To achieve fast imaging speeds, modern PACT systems are equipped with arrays of hundreds to thousands of ultrasound transducer (UST) elements, and the element count continues to increase. However, a large number of UST elements with parallel data acquisition generates massive data volumes, making fast image reconstruction very challenging. Although several research groups have developed GPU-accelerated methods for PACT, an explicit and feasible step-by-step description of GPU-based algorithms for various hardware platforms has been lacking. Aim: In this study, we propose a comprehensive framework for developing GPU-accelerated PACT image reconstruction, to help the research community grasp this advanced image reconstruction method. Approach: We leverage widely accessible open-source parallel computing tools, including Python multiprocessing-based parallelism, Taichi Lang for Python, CUDA, and other possible backends. We demonstrate that our framework delivers significant performance gains for PACT reconstruction, enabling faster analysis and real-time applications. We also describe how to realize parallel computing on various hardware configurations, including multicore CPU, single-GPU, and multi-GPU platforms. Results: Notably, our framework achieves an effective speedup of ∼871× when reconstructing extremely large-scale three-dimensional PACT images on a dual-GPU platform compared to a 24-core workstation CPU. Example code is shared via GitHub. Conclusions: Our approach allows for easy adoption and adaptation by the research community, fostering implementations of PACT for both life science and medicine.
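
As a hedged sketch of the inner loop such frameworks parallelize, the following pure-NumPy code performs 2D delay-and-sum backprojection on simulated ring-array data; real pipelines add signal filtering and calibrated delays, and run the per-element loop on the GPU.

```python
import numpy as np

rng = np.random.default_rng(9)
c = 1500.0                                 # speed of sound in water, m/s
fs = 40e6                                  # sampling rate, Hz
n_elem, n_samples = 128, 2048
signals = rng.normal(size=(n_elem, n_samples))   # stand-in RF data

# Ring array of detectors (radius 5 cm) and a small image grid (2 x 2 cm)
angles = np.linspace(0, 2 * np.pi, n_elem, endpoint=False)
elems = 0.05 * np.stack([np.cos(angles), np.sin(angles)], axis=1)
grid = np.linspace(-0.01, 0.01, 64)
xx, yy = np.meshgrid(grid, grid)

image = np.zeros_like(xx)
for e in range(n_elem):                    # this loop is what gets parallelized
    # Time of flight from every pixel to element e, converted to a sample index
    dist = np.hypot(xx - elems[e, 0], yy - elems[e, 1])
    idx = np.clip((dist / c * fs).astype(int), 0, n_samples - 1)
    image += signals[e, idx]               # delay-and-sum accumulation per pixel
print(image.shape)                         # (64, 64)
```

Each element's contribution is independent of the others, which is why mapping the loop onto GPU threads (per element, or per pixel) yields near-linear speedups.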


Subjects
Algorithms, Computer-Assisted Image Processing, Imaging Phantoms, Photoacoustic Techniques, Photoacoustic Techniques/methods, Photoacoustic Techniques/instrumentation, Computer-Assisted Image Processing/methods, Animals, Computer Graphics, X-Ray Computed Tomography/methods, X-Ray Computed Tomography/instrumentation, Humans
20.
Stat Biosci ; 16(1): 250-264, 2024.
Article in English | MEDLINE | ID: mdl-38495080

ABSTRACT

Teaching statistics through engaging applications to contemporary large-scale datasets is essential to attracting students to the field. To this end, we developed a hands-on, week-long workshop for senior high-school or junior undergraduate students, without prior knowledge of statistical genetics but with some basic knowledge of data science, to conduct their own genome-wide association study (GWAS). The GWAS was performed on open-source gene expression data, using publicly available human genetics data. Assisted by a detailed instruction manual, students were able to obtain ∼1.4 million p-values from a real scientific study within several days. This early motivation kept students engaged in learning the theory that supports their results, including regression, data visualization, results interpretation, and large-scale multiple hypothesis testing. To strengthen their motivation by emphasizing the personal connection to this type of data analysis, students were encouraged to give short presentations on how GWAS has provided insights into the genetic basis of diseases present in their friends or families. The appended open-source, step-by-step instruction manual includes descriptions of the datasets used, the software needed, and results from the workshop. Additionally, scripts used in the workshop are archived on GitHub and Zenodo to further enhance reproducible research and training.
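
The computational core of such a workshop GWAS can be sketched in a few lines: regress a phenotype on each SNP's genotype dosage and collect one p-value per SNP. The data below are simulated; the workshop used real public genotypes and expression.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(10)
n_people, n_snps = 500, 10_000
genotypes = rng.integers(0, 3, size=(n_people, n_snps))   # 0/1/2 allele counts
# Synthetic phenotype: SNP 0 is causal, everything else is noise
phenotype = 0.4 * genotypes[:, 0] + rng.normal(size=n_people)

pvalues = np.empty(n_snps)
for s in range(n_snps):
    # Simple linear regression, phenotype ~ genotype, one SNP at a time
    res = stats.linregress(genotypes[:, s], phenotype)
    pvalues[s] = res.pvalue

print("min p:", pvalues.min(), "at SNP", pvalues.argmin())
# Bonferroni-style threshold illustrates large-scale multiple testing
print("significant after Bonferroni:", (pvalues < 0.05 / n_snps).sum())
```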
