1.
PLoS One ; 19(4): e0302045, 2024.
Article in English | MEDLINE | ID: mdl-38630692

ABSTRACT

In this work, a Python framework for characteristic feature extraction is developed and applied to gene expression data of human fibroblasts. Unlabeled feature selection objectively determines groups and minimal gene sets separating groups. ML explainability methods transform the features correlating with phenotypic differences into causal reasoning, supported by further pipeline and visualization tools, allowing user knowledge to boost causal reasoning. The purpose of the framework is to identify characteristic features that are causally related to phenotypic differences of single cells. The pipeline consists of several data science methods enriched with purposeful visualization of the intermediate results in order to check them systematically and infuse the domain knowledge about the investigated process. A specific focus is to extract a small but meaningful set of genes to facilitate causal reasoning for the phenotypic differences. One application could be drug target identification. For this purpose, the framework follows different steps: feature reduction (PFA), low dimensional embedding (UMAP), clustering ((H)DBSCAN), feature correlation (chi-square, mutual information), ML validation and explainability (SHAP, tree explainer). The pipeline is validated by identifying and correctly separating signature genes associated with aging in fibroblasts from single-cell gene expression measurements: PLK3, polo-like protein kinase 3; CCDC88A, Coiled-Coil Domain Containing 88A; STAT3, signal transducer and activator of transcription-3; ZNF7, Zinc Finger Protein 7; SLC24A2, solute carrier family 24 member 2 and lncRNA RP11-372K14.2. The code for the preprocessing step can be found in the GitHub repository https://github.com/AC-PHD/NoLabelPFA, along with the characteristic feature extraction https://github.com/LauritzR/characteristic-feature-extraction.
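The feature-correlation step named in the abstract (chi-square) can be illustrated with a minimal sketch. It is not the code from the linked repositories: the function name, the binarized toy expression values, and the cluster labels are all hypothetical, and the statistic is the plain Pearson chi-square computed with the standard library only.

```python
from collections import Counter


def chi_square_statistic(feature, labels):
    """Pearson chi-square statistic for two equally long discrete sequences."""
    n = len(feature)
    if n == 0 or n != len(labels):
        raise ValueError("feature and labels must be non-empty and equally long")
    observed = Counter(zip(feature, labels))
    feature_totals = Counter(feature)
    label_totals = Counter(labels)
    stat = 0.0
    for f_value, f_count in feature_totals.items():
        for l_value, l_count in label_totals.items():
            # Expected cell count if the feature and the labels were independent.
            expected = f_count * l_count / n
            stat += (observed[(f_value, l_value)] - expected) ** 2 / expected
    return stat


# Hypothetical toy data: 1 = gene detected in the cell, 0 = not detected.
clusters = [0, 0, 0, 0, 1, 1, 1, 1]
gene_a = [1, 1, 1, 1, 0, 0, 0, 0]  # follows the cluster labels exactly
gene_b = [1, 0, 1, 0, 1, 0, 1, 0]  # independent of the cluster labels

scores = {
    "gene_a": chi_square_statistic(gene_a, clusters),
    "gene_b": chi_square_statistic(gene_b, clusters),
}
```

A larger statistic indicates a stronger dependence between a gene and the cluster assignment, so in this toy case `gene_a` would be kept as a candidate and `gene_b` would not.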


Subjects
Aging , Machine Learning , Humans , Microfilament Proteins , Vesicular Transport Proteins
2.
PNAS Nexus ; 3(3): pgae089, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38505691

ABSTRACT

Social group-based identities intersect. The meaning of "woman" is modulated by adding social class as in "rich woman" or "poor woman." How does such intersectionality operate at-scale in everyday language? Which intersections dominate (are most frequent)? What qualities (positivity, competence, warmth) are ascribed to each intersection? In this study, we make it possible to address such questions by developing a stepwise procedure, Flexible Intersectional Stereotype Extraction (FISE), applied to word embeddings (GloVe; BERT) trained on billions of words of English Internet text, revealing insights into intersectional stereotypes. First, applying FISE to occupation stereotypes across intersections of gender, race, and class showed alignment with ground-truth data on occupation demographics, providing initial validation. Second, applying FISE to trait adjectives showed strong androcentrism (Men) and ethnocentrism (White) in dominating everyday English language (e.g. White + Men are associated with 59% of traits; Black + Women with 5%). Associated traits also revealed intersectional differences: advantaged intersectional groups, especially intersections involving Rich, had more common, positive, warm, competent, and dominant trait associates. Together, the empirical insights from FISE illustrate its utility for transparently and efficiently quantifying intersectional stereotypes in existing large text corpora, with potential to expand intersectionality research across unprecedented time and place. This project further sets up the infrastructure necessary to pursue new research on the emergent properties of intersectional identities.
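One way to picture how traits could be assigned to intersections is sketched below. This is an assumption-laden illustration, not the published FISE procedure: it assumes each trait already has a signed association score on a gender axis (positive = Men) and a race axis (positive = White), and the function names, the tie-breaking at zero, and the toy scores are all hypothetical.

```python
from collections import Counter


def assign_intersection(gender_score, race_score):
    """Map two signed association scores to one intersectional group label."""
    # Assumption: positive gender score means closer to Men, positive race
    # score means closer to White. A score of exactly 0 falls into the
    # second category, which is an arbitrary tie-break.
    gender = "Men" if gender_score > 0 else "Women"
    race = "White" if race_score > 0 else "Black"
    return f"{race} + {gender}"


def intersection_shares(trait_scores):
    """Return the share of traits assigned to each intersection."""
    counts = Counter(
        assign_intersection(gender, race) for gender, race in trait_scores.values()
    )
    total = sum(counts.values())
    return {group: count / total for group, count in counts.items()}


# Hypothetical (gender_score, race_score) pairs for four placeholder traits.
toy_scores = {
    "trait_1": (0.3, 0.2),
    "trait_2": (0.1, 0.4),
    "trait_3": (-0.2, 0.1),
    "trait_4": (-0.1, -0.3),
}

shares = intersection_shares(toy_scores)
```

Shares of this kind correspond in spirit to the percentages quoted in the abstract, although the numbers here come from invented scores.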

4.
Comput Struct Biotechnol J ; 21: 4895-4913, 2023.
Article in English | MEDLINE | ID: mdl-37860229

ABSTRACT

In the fast-evolving landscape of biomedical research, the emergence of big data has presented researchers with extraordinary opportunities to explore biological complexities. In biomedical research, big data also imply a big responsibility. This is not only due to genomics data being sensitive information but also due to genomics data being shared and re-analysed among the scientific community. This saves valuable resources and can even help to find new insights in silico. To fully use these opportunities, detailed and correct metadata are imperative. This includes not only the availability of metadata but also their correctness. Metadata integrity serves as a fundamental determinant of research credibility, supporting the reliability and reproducibility of data-driven findings. Ensuring metadata availability, curation, and accuracy is therefore essential for bioinformatic research. Not only must metadata be readily available, but they must also be meticulously curated and ideally error-free. Motivated by an accidental discovery of a critical metadata error in patient data published in two high-impact journals, we aim to raise awareness of the need for correct, complete, and curated metadata. We describe how the metadata error was found and addressed, and present examples of metadata-related challenges in omics research, along with supporting measures, including tools for checking metadata and software to facilitate various steps from data analysis to published research.

5.
Comput Struct Biotechnol J ; 21: 3293-3314, 2023.
Article in English | MEDLINE | ID: mdl-37333862

ABSTRACT

Machine learning techniques are well suited to analyzing expression data from single cells. These techniques impact all fields ranging from cell annotation and clustering to signature identification. The presented framework evaluates how well selected gene sets separate defined phenotypes or cell groups. This innovation overcomes the present limitation in objectively and correctly identifying a small gene set of high information content for separating phenotypes; corresponding code scripts are provided. The small but meaningful subset of the original genes (or feature space) facilitates human interpretability of the differences between the phenotypes, including those found by machine learning, and may even turn correlations between genes and phenotypes into a causal explanation. For the feature selection task, principal feature analysis (PFA) is utilized, which reduces redundant information while selecting genes that carry the information for separating the phenotypes. In this context, the presented framework shows explainability of unsupervised learning as it reveals cell-type specific signatures. Apart from a Seurat preprocessing tool and the PFA script, the pipeline uses mutual information to balance accuracy and size of the gene set if desired. A validation part that evaluates the information content of the selected genes for separating the phenotypes is provided as well; binary and multiclass classification of 3 or 4 groups are studied. Results from different single-cell data are presented. In each, only about ten out of more than 30000 genes are identified as carrying the relevant information. The code is provided in a GitHub repository at https://github.com/AC-PHD/Seurat_PFA_pipeline.
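The role of mutual information in trading gene-set size against information content can be sketched as follows. This is a simplified stand-in, not the code in the repository above: it assumes discretized expression values, uses a plain ranking with a fixed cut-off `keep`, and the gene names and toy values are hypothetical.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Mutual information in bits between two equally long discrete sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    px = Counter(x)
    py = Counter(y)
    mi = 0.0
    for (a, b), count in joint.items():
        # p(a, b) * log2( p(a, b) / (p(a) * p(b)) ), written with raw counts.
        mi += (count / n) * math.log2(count * n / (px[a] * py[b]))
    return mi


def rank_genes(expression, phenotype, keep):
    """Return the `keep` genes sharing the most information with the phenotype."""
    ranked = sorted(
        expression,
        key=lambda gene: mutual_information(expression[gene], phenotype),
        reverse=True,
    )
    return ranked[:keep]


# Hypothetical discretized expression (1 = high, 0 = low) for eight cells.
phenotype = [0, 0, 0, 0, 1, 1, 1, 1]
expression = {
    "GENE_A": [0, 0, 0, 0, 1, 1, 1, 1],  # fully informative
    "GENE_B": [1, 0, 1, 0, 1, 0, 1, 0],  # uninformative
    "GENE_C": [1, 1, 1, 0, 0, 0, 0, 0],  # partly informative
}

selected = rank_genes(expression, phenotype, keep=2)
```

Lowering `keep` shrinks the gene set at the cost of discarding partly informative genes such as `GENE_C`; a real pipeline would pick the cut-off by checking classification accuracy on the reduced set.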

6.
Biomedicines ; 10(10)2022 Sep 29.
Article in English | MEDLINE | ID: mdl-36289702

ABSTRACT

Since ancient times aging has also been regarded as a disease, and humankind has always strived to extend the natural lifespan. Analyzing the genes involved in aging and disease allows for finding important indicators and biological markers for pathologies and possible therapeutic targets. An example of the use of omics technologies is the research regarding aging and the rare and fatal premature aging syndrome progeria (Hutchinson-Gilford progeria syndrome, HGPS). In our study, we focused on the in silico analysis of differentially expressed genes (DEGs) in progeria and aging, using a publicly available RNA-Seq dataset (GEO dataset GSE113957) and a variety of bioinformatics tools. Despite the GSE113957 RNA-Seq dataset being well-known and frequently analyzed, the RNA-Seq data shared by Fleischer et al. is far from exhausted and reusing and repurposing the data still reveals new insights. By analyzing the literature citing the use of the dataset and subsequently conducting a comparative analysis comparing the RNA-Seq data analyses of different subsets of the dataset (healthy children, nonagenarians and progeria patients), we identified several genes involved in both natural aging and progeria (KRT8, KRT18, ACKR4, CCL2, UCP2, ADAMTS15, ACTN4P1, WNT16, IGFBP2). Further analyzing these genes and the pathways involved indicated their possible roles in aging, suggesting the need for further in vitro and in vivo research. In this paper, we (1) compare "normal aging" (nonagenarians vs. healthy children) and progeria (HGPS patients vs. healthy children), (2) list genes possibly involved in both the natural aging process and progeria, including the first mention of IGFBP2 in progeria, (3) predict miRNAs and interactomes for WNT16 (hsa-mir-181a-5p), UCP2 (hsa-mir-26a-5p and hsa-mir-124-3p), and IGFBP2 (hsa-mir-124-3p, hsa-mir-126-3p, and hsa-mir-27b-3p), (4) demonstrate the compatibility of well-established R packages for RNA-Seq analysis for researchers interested but not yet familiar with this kind of analysis, and (5) present comparative proteomics analyses to show an association between our RNA-Seq data analyses and corresponding changes in protein expression.

7.
Proc Natl Acad Sci U S A ; 119(28): e2121798119, 2022 07 12.
Article in English | MEDLINE | ID: mdl-35787033

ABSTRACT

Using word embeddings from 850 billion words in English-language Google Books, we provide an extensive analysis of historical change and stability in social group representations (stereotypes) across a long timeframe (from 1800 to 1999), for a large number of social group targets (Black, White, Asian, Irish, Hispanic, Native American, Man, Woman, Old, Young, Fat, Thin, Rich, Poor), and their emergent, bottom-up associations with 14,000 words and a subset of 600 traits. The results provide a nuanced picture of change and persistence in stereotypes across 200 y. Change was observed in the top-associated words and traits: Whether analyzing the top 10 or 50 associates, at least 50% of top associates changed across successive decades. Despite this changing content of top-associated words, the average valence (positivity/negativity) of these top stereotypes was generally persistent. Ultimately, through advances in the availability of historical word embeddings, this study offers a comprehensive characterization of both change and persistence in social group representations as revealed through books of the English-speaking world from 1800 to 1999.
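The statement that at least 50% of top associates changed across successive decades can be made concrete with a small turnover calculation. The sketch below is only an illustration of that measure: the function names, the placeholder words, and the similarity values are hypothetical, and it assumes each decade is summarized as a mapping from word to association strength.

```python
def top_associates(similarities, k):
    """Return the k words with the highest association score."""
    ranked = sorted(similarities.items(), key=lambda item: item[1], reverse=True)
    return [word for word, _ in ranked[:k]]


def turnover(earlier, later, k):
    """Fraction of the earlier decade's top-k associates missing from the later top-k."""
    earlier_top = set(top_associates(earlier, k))
    later_top = set(top_associates(later, k))
    # Divide by the actual size of the earlier set in case fewer than k words exist.
    return len(earlier_top - later_top) / len(earlier_top)


# Hypothetical association scores of placeholder words with one social group.
decade_1900 = {"word_a": 0.9, "word_b": 0.8, "word_c": 0.3, "word_d": 0.1}
decade_1910 = {"word_a": 0.2, "word_b": 0.7, "word_c": 0.9, "word_d": 0.1}

change = turnover(decade_1900, decade_1910, k=2)
```

Here one of the two top associates is replaced between the decades, so `change` is 0.5, the threshold mentioned in the abstract.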


Subjects
Books , Search Engine , Female , History, 19th Century , History, 20th Century , Humans , Language , Male , Population Groups/history , Stereotyping
8.
J Cancer Res Clin Oncol ; 145(9): 2227-2240, 2019 Sep.
Article in English | MEDLINE | ID: mdl-31317325

ABSTRACT

PURPOSE: Enhancer of zeste homolog 2 (EZH2) is associated with epigenetic gene silencing and aggressiveness in many tumor types. However, the prognostic impact of high EZH2 expression is controversially discussed for colorectal cancer. For this reason, we immunohistochemically analyzed EZH2 expression in 105 specimens from colon cancer patients separately for tumor center and invasion front. METHODS: All sections from tissue microarrays were evaluated manually and digitally using Definiens Tissue Studio software (TSS). To mirror-image the EZH2 status at the tumor invasion front, we treated HCT116 colon cancer cells with the EZH2 inhibitor 3-Deazaneplanocin A (DZNep) and studied the growth of in ovo xenografts in the chorioallantoic membrane (CAM) assay. RESULTS: We showed a significant decrease in EZH2 expression and the repressive H3K27me3 code at the tumor invasion front as supported by the TSS-constructed heatmaps. Loss of EZH2 at tumor invasion front, but not in tumor center was correlated with unfavorable prognosis and more advanced tumor stages. The observed cell cycle arrest in vitro and in vivo was associated with higher tumor aggressiveness. Xenografts formed by DZNep-treated HCT116 cells showed loosely packed tumor masses, infiltrative growth into the CAM, and high vessel density. CONCLUSION: The differences in EZH2 expression between tumor center and invasion front as well as different scoring and cutoff values can most likely explain controversial literature data concerning the prognostic value of EZH2. Epigenetic therapies using EZH2 inhibitors have to be carefully evaluated for each specific tumor type, since alterations in cell differentiation might lead to unfavorable results.


Subjects
Adenocarcinoma/metabolism , Adenocarcinoma/pathology , Colorectal Neoplasms/metabolism , Colorectal Neoplasms/pathology , Enhancer of Zeste Homolog 2 Protein/metabolism , Margins of Excision , Adenocarcinoma/diagnosis , Adenocarcinoma/surgery , Adult , Aged , Aged, 80 and over , Animals , Biomarkers, Tumor/metabolism , Chick Embryo , Colorectal Neoplasms/diagnosis , Colorectal Neoplasms/surgery , Down-Regulation , Female , HCT116 Cells , Humans , Immunohistochemistry , Male , Middle Aged , Neoplasm Invasiveness , Prognosis , Retrospective Studies , Tissue Array Analysis
9.
Cancers (Basel) ; 10(10)2018 Oct 09.
Article in English | MEDLINE | ID: mdl-30304835

ABSTRACT

The oncogenic cytoplasmic p21 contributes to cancer aggressiveness and chemotherapeutic failure. However, the molecular mechanisms remain obscure. Here, we show for the first time that cytoplasmic p21 mediates 5-Fluorouracil (5FU) resistance by shuttling p-Chk2 out of the nucleus to protect the tumor cells from its pro-apoptotic functions. We observed that cytoplasmic p21 levels were up-regulated in 5FU-resistant colorectal cancer cells in vitro and in the in vivo chorioallantoic membrane (CAM) model. Kinase array analysis revealed that p-Chk2 is a key target of cytoplasmic p21. Importantly, the cytoplasmic form of p21 mediated by p21T145D transfection diminished p-Chk2-mediated activation of E2F1 and apoptosis induction. Co-immunoprecipitation, immunofluorescence, and proximity ligation assay showed that p21 forms a complex with p-Chk2 under 5FU exposure. Using in silico computer modeling, we suggest that the p21/p-Chk2 interaction hindered the nuclear localization signal of p-Chk2, and therefore, the complex is exported out of the nucleus. These findings unravel a novel mechanism regarding an oncogenic role of p21 in regulation of resistance to 5FU-based chemotherapy. We suggest a possible value of cytoplasmic p21 as a prognosis marker and a therapeutic target in colorectal cancer patients.

10.
Science ; 356(6334): 183-186, 2017 04 14.
Article in English | MEDLINE | ID: mdl-28408601

ABSTRACT

Machine learning is a means to derive artificial intelligence by discovering patterns in existing data. Here, we show that applying machine learning to ordinary human language results in human-like semantic biases. We replicated a spectrum of known biases, as measured by the Implicit Association Test, using a widely used, purely statistical machine-learning model trained on a standard corpus of text from the World Wide Web. Our results indicate that text corpora contain recoverable and accurate imprints of our historic biases, whether morally neutral as toward insects or flowers, problematic as toward race or gender, or even simply veridical, reflecting the status quo distribution of gender with respect to careers or first names. Our methods hold promise for identifying and addressing sources of bias in culture, including technology.


Subjects
Language , Machine Learning , Semantics , Association , Female , Humans , Internet , Male , Names , Sex Factors