Results 1 - 20 of 822
1.
Sensors (Basel) ; 22(8), 2022 Apr 07.
Article in English | MEDLINE | ID: mdl-35458823

ABSTRACT

The performance of deep neural networks and the low costs of computational hardware have made computer vision a popular choice in many robotic systems. An attractive feature of deep-learned methods is their ability to cope with appearance changes caused by day-night cycles and seasonal variations. However, deep learning of neural networks typically relies on large numbers of hand-annotated images, which requires significant effort for data collection and annotation. We present a method that allows autonomous, self-supervised training of a neural network in visual teach-and-repeat (VT&R) tasks, where a mobile robot has to traverse a previously taught path repeatedly. Our method is based on a fusion of two image registration schemes: one based on a Siamese neural network and another on point-feature matching. As the robot traverses the taught paths, it uses the results of feature-based matching to train the neural network, which, in turn, provides coarse registration estimates to the feature matcher. We show that as the neural network gets trained, the accuracy and robustness of the navigation increase, making the robot capable of dealing with significant changes in the environment. This method can significantly reduce the data annotation efforts when designing new robotic systems or introducing robots into new environments. Moreover, the method provides annotated datasets that can be deployed in other navigation systems. To promote the reproducibility of the research presented herein, we provide our datasets, code and trained models online.


Subjects
Hand , Neural Networks, Computer , Data Curation , Reproducibility of Results , Research Design
2.
Sensors (Basel) ; 22(7), 2022 Apr 04.
Article in English | MEDLINE | ID: mdl-35408389

ABSTRACT

Image annotation is a time-consuming and costly task. Previously, we published MorphoCluster as a novel image annotation tool to address problems of conventional, classifier-based image annotation approaches: their limited efficiency, training set bias and lack of novelty detection. MorphoCluster uses clustering and similarity search to enable efficient, computer-assisted image annotation. In this work, we provide a deeper analysis of this approach. We simulate the actions of a MorphoCluster user to avoid extensive manual annotation runs. This simulation is used to test supervised, unsupervised and transfer representation learning approaches. Furthermore, shrunken k-means and partially labeled k-means, two new clustering algorithms that are tailored specifically for the MorphoCluster approach, are compared to the previously used HDBSCAN*. We find that labeled training data improve the image representations, that unsupervised learning beats transfer learning and that all three clustering algorithms are viable options, depending on whether completeness, efficiency or runtime is the priority. The simulation results support our earlier finding that MorphoCluster is very efficient and precise. Within the simulation, more than five objects per simulated click are being annotated with 95% precision.


Subjects
Benchmarking , Data Curation , Algorithms , Cluster Analysis , Computers , Image Processing, Computer-Assisted/methods
3.
Nat Commun ; 13(1): 1161, 2022 03 04.
Article in English | MEDLINE | ID: mdl-35246539

ABSTRACT

Imperfections in data annotation, known as label noise, are detrimental to the training of machine learning models and have a confounding effect on the assessment of model performance. Nevertheless, employing experts to remove label noise by fully re-annotating large datasets is infeasible in resource-constrained settings, such as healthcare. This work advocates for a data-driven approach to prioritising samples for re-annotation, which we term "active label cleaning". We propose to rank instances according to estimated label correctness and labelling difficulty of each sample, and introduce a simulation framework to evaluate relabelling efficacy. Our experiments on natural images and on a specifically devised medical imaging benchmark show that cleaning noisy labels mitigates their negative impact on model training, evaluation, and selection. Crucially, the proposed approach enables correcting labels up to 4× more effectively than typical random selection in realistic conditions, making better use of experts' valuable time for improving dataset quality.
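
The ranking idea in the abstract above (order samples by estimated label correctness and labelling difficulty) can be illustrated with a minimal sketch. The scoring rule used here, cross-entropy of the current label under the model's prediction minus the predictive entropy, is an assumption chosen for illustration and is not confirmed by the abstract. The function name `rank_for_relabelling` and the toy probabilities are hypothetical.

```python
import numpy as np

def rank_for_relabelling(probs, noisy_labels, eps=1e-12):
    """Return sample indices ordered from most to least worth re-annotating.

    probs: (n_samples, n_classes) predicted class probabilities (assumed given).
    noisy_labels: (n_samples,) current, possibly wrong, integer labels.
    """
    probs = np.asarray(probs, dtype=float)
    noisy_labels = np.asarray(noisy_labels, dtype=int)
    n = probs.shape[0]
    # Cross-entropy of the current label: large when the model disagrees with it.
    label_ce = -np.log(probs[np.arange(n), noisy_labels] + eps)
    # Predictive entropy: large when the sample is ambiguous (hard to label).
    entropy = -np.sum(probs * np.log(probs + eps), axis=1)
    # Illustrative score (an assumption): likely-wrong but unambiguous samples first.
    score = label_ce - entropy
    return np.argsort(-score, kind="stable")

# Toy example: three samples, all currently labelled as class 0.
probs = np.array([[0.90, 0.10],   # model agrees with the label
                  [0.50, 0.50],   # model is unsure
                  [0.05, 0.95]])  # model confidently disagrees
noisy_labels = np.array([0, 0, 0])
order = rank_for_relabelling(probs, noisy_labels)
print(order)
```

In this toy case the confidently contradicted sample is ranked first and the sample whose label the model agrees with is ranked last, which is the behaviour a relabelling budget would want to exploit.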


Subjects
Diagnostic Imaging , Machine Learning , Benchmarking , Data Curation , Delivery of Health Care
4.
Sensors (Basel) ; 22(4), 2022 Feb 18.
Article in English | MEDLINE | ID: mdl-35214497

ABSTRACT

Recent advances in computer vision are primarily driven by the usage of deep learning, which is known to require large amounts of data, and creating datasets for this purpose is not a trivial task. Larger benchmark datasets often have detailed processes with multiple stages and users with different roles during annotation. However, this can be difficult to implement in smaller projects where resources can be limited. Therefore, in this work we present our processes for creating an image dataset for kernel fragmentation and stover overlengths in Whole Plant Corn Silage. This includes the guidelines for annotating object instances in respective classes and statistics of gathered annotations. Given the challenging image conditions, where objects are present in large amounts of occlusion and clutter, the datasets appear appropriate for training models. However, we experience annotator inconsistency, which can hamper evaluation. Based on this we argue the importance of having an evaluation form independent of the manual annotation where we evaluate our models with physically based sieving metrics. Additionally, instead of the traditional time-consuming manual annotation approach, we evaluate Semi-Supervised Learning as an alternative, showing competitive results while requiring fewer annotations. Specifically, given a relatively large supervised set of around 1400 images we can improve the Average Precision by a number of percentage points. Additionally, we show a significantly large improvement when using an extremely small set of just over 100 images, with over 3× in Average Precision and up to 20 percentage points when estimating the quality.


Subjects
Deep Learning , Data Curation , Silage , Supervised Machine Learning , Zea mays
5.
J Biomed Inform ; 127: 104007, 2022 03.
Article in English | MEDLINE | ID: mdl-35124236

ABSTRACT

Biomedical research data reuse and sharing is essential for fostering research progress. To this aim, data producers need to master data management and reporting through standard and rich metadata, as encouraged by open data initiatives such as the FAIR (Findable, Accessible, Interoperable, Reusable) guidelines. This helps data re-users to understand and reuse the shared data with confidence. Therefore, dedicated frameworks are required. The provenance reporting throughout a biomedical study lifecycle has been proposed as a way to increase confidence in data while reusing it. The Biomedical Study - Lifecycle Management (BMS-LM) data model has implemented provenance and lifecycle traceability for several multimodal-imaging techniques, but this is not enough for data understanding while reusing it. In fact, in the large scope of biomedical research, a multitude of metadata sources, also called Knowledge Organization Systems (KOSs), are available for data annotation. In addition, data producers use local terminologies or KOSs, containing vernacular terms for data reporting. The result is a set of heterogeneous KOSs (local and published) with different formats and levels of granularity. To manage the inherent heterogeneity, semantic interoperability is encouraged by the Research Data Management (RDM) community. Ontologies, and more specifically top ontologies such as BFO and DOLCE, make explicit the metadata semantics and enhance semantic interoperability. Based on the BMS-LM data model and the BFO top ontology, the BioMedical Study - Lifecycle Management (BMS-LM) core ontology is proposed together with an associated framework for semantic interoperability between heterogeneous KOSs. It is made of four ontological levels: top/core/domain/local and aims to build bridges between local and published KOSs. In this paper, the conversion of the BMS-LM data model to a core ontology is detailed. The implementation of its semantic interoperability in a specific domain context is explained and illustrated with examples from small animal preclinical research.


Subjects
Biological Ontologies , Biomedical Research , Animals , Data Curation , Metadata , Research Design , Semantics
6.
PLoS One ; 17(2): e0263616, 2022.
Article in English | MEDLINE | ID: mdl-35143560

ABSTRACT

Peste des petits ruminants (PPR) is a highly contagious and devastating viral disease infecting predominantly sheep and goats. Tracking outbreaks of disease and analysing the movement of the virus often involves sequencing part or all of the genome and comparing the sequence obtained with sequences from other outbreaks, obtained from the public databases. However, there are a very large number (>1800) of PPRV sequences in the databases, a large majority of them relatively short, and not always well-documented. There is also a strong bias in the composition of the dataset, with countries with good sequencing capabilities (e.g. China, India, Turkey) being overrepresented, and most sequences coming from isolates in the last 20 years. In order to facilitate future analyses, we have prepared sets of PPRV sequences, sets which have been filtered for sequencing errors and unnecessary duplicates, and for which date and location information has been obtained, either from the database entry or from other published sources. These sequence datasets are freely available for download, and include smaller datasets which maximise phylogenetic information from the minimum number of sequences, and which will be useful for simple lineage identification. Their utility is illustrated by uploading the data to the MicroReact platform to allow simultaneous viewing of lineage date and geographic information on all the viruses for which we have information. While preparing these datasets, we identified a significant number of public database entries which contain clear errors, and propose guidelines on checking new sequences and completing metadata before submission.


Subjects
Epidemiologic Methods , Genome, Viral , Peste-des-petits-ruminants virus/genetics , RNA, Viral , Sequence Analysis, RNA , Data Curation , Humans , Recombination, Genetic , Whole Genome Sequencing
7.
Database (Oxford) ; 2022, 2022 02 02.
Article in English | MEDLINE | ID: mdl-35106535

ABSTRACT

Critical to answering large-scale questions in biology is the integration of knowledge from different disciplines into a coherent, computable whole. Controlled vocabularies such as ontologies represent a clear path toward this goal. Using survey questionnaires, we examined the attitudes of biologists toward adopting controlled vocabularies in phenotype publications. Our questions cover current experience and overall attitude with controlled vocabularies, the awareness of the issues around ambiguity and inconsistency in phenotype descriptions and post-publication professional data curation, the preferred solutions and the effort and desired rewards for adopting a new authoring workflow. Results suggest that although the existence of controlled vocabularies is widespread, their use is not common. A majority of respondents (74%) are frustrated with ambiguity in phenotypic descriptions, and there is a strong agreement (mean agreement score 4.21 out of 5) that author curation would better reflect the original meaning of phenotype data. Moreover, the vast majority (85%) of researchers would try a new authoring workflow if resultant data were more consistent and less ambiguous. Even more respondents (93%) suggested that they would try and possibly adopt a new authoring workflow if it required 5% additional effort as compared to normal, but higher rates resulted in a steep decline in likely adoption rates. Among the four different types of rewards, two types of citations were the most desired incentives for authors to produce computable data. Overall, our results suggest the adoption of a new authoring workflow would be accelerated by a user-friendly and efficient software-authoring tool, an increased awareness of the challenges text ambiguity creates for external curators and an elevated appreciation of the benefits of controlled vocabularies.


Subjects
Data Curation , Software , Attitude , Phenotype , Workflow
8.
Elife ; 11, 2022 01 06.
Article in English | MEDLINE | ID: mdl-34989675

ABSTRACT

Deep learning is emerging as a powerful approach for bioimage analysis. Its use in cell tracking is limited by the scarcity of annotated data for the training of deep-learning models. Moreover, annotation, training, prediction, and proofreading currently lack a unified user interface. We present ELEPHANT, an interactive platform for 3D cell tracking that addresses these challenges by taking an incremental approach to deep learning. ELEPHANT provides an interface that seamlessly integrates cell track annotation, deep learning, prediction, and proofreading. This enables users to implement cycles of incremental learning starting from a few annotated nuclei. Successive prediction-validation cycles enrich the training data, leading to rapid improvements in tracking performance. We test the software's performance against state-of-the-art methods and track lineages spanning the entire course of leg regeneration in a crustacean over 1 week (504 timepoints). ELEPHANT yields accurate, fully-validated cell lineages with a modest investment in time and effort.


Subjects
Cell Lineage , Cell Tracking/methods , Deep Learning , Image Processing, Computer-Assisted/methods , Data Curation , Humans
10.
Anesth Analg ; 134(2): 380-388, 2022 02 01.
Article in English | MEDLINE | ID: mdl-34673658

ABSTRACT

BACKGROUND: The retrospective analysis of electroencephalogram (EEG) signals acquired from patients under general anesthesia is crucial in understanding the patient's unconscious brain's state. However, the creation of such a database is often tedious and cumbersome and involves human labor. Hence, we developed a Raspberry Pi-based system for archiving EEG signals recorded from patients under anesthesia in operating rooms (ORs) with minimal human involvement. METHODS: Using this system, we archived patient EEG signals from over 500 unique surgeries at the Emory University Orthopaedics and Spine Hospital, Atlanta, for about 18 months. For this, we developed a software package that runs on a Raspberry Pi and archives patient EEG signals from a SedLine Root EEG Monitor (Masimo) to a secure Health Insurance Portability and Accountability Act (HIPAA) compliant cloud storage. The OR number corresponding to each surgery was archived along with the EEG signal to facilitate retrospective EEG analysis. We retrospectively processed the archived EEG signals and performed signal quality checks. We also proposed a formula to compute the proportion of true EEG signal and calculated the corresponding statistics. Further, we curated and interleaved patient medical record information with the corresponding EEG signals. RESULTS: We retrospectively processed the EEG signals to demonstrate a statistically significant negative correlation between the relative alpha power (8-12 Hz) of the EEG signal captured under anesthesia and the patient's age. CONCLUSIONS: Our system is a standalone EEG archiver developed using low-cost and readily available hardware. We demonstrated that one could create a large-scale EEG database with minimal human involvement. Moreover, we showed that the captured EEG signal is of good quality for retrospective analysis and combined the EEG signal with the patient medical records. This project's software has been released under an open-source license to enable others to use and contribute.
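
The relative alpha power (8-12 Hz) mentioned in the RESULTS above can be sketched as the share of spectral power that falls in the alpha band. This is a minimal illustration, not the authors' released software: the function name `relative_alpha_power`, the 0.5-30 Hz reference band, the 250 Hz sampling rate and the synthetic test signals are all assumptions made for the example.

```python
import numpy as np

def relative_alpha_power(signal, fs, band=(8.0, 12.0), total=(0.5, 30.0)):
    """Fraction of spectral power in `band` relative to `total` (both in Hz).

    The reference band `total` is an assumed choice, not taken from the abstract.
    """
    signal = np.asarray(signal, dtype=float)
    signal = signal - signal.mean()                  # remove the DC offset
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
    power = np.abs(np.fft.rfft(signal)) ** 2         # unnormalised periodogram
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    in_total = (freqs >= total[0]) & (freqs <= total[1])
    return power[in_band].sum() / power[in_total].sum()

# Synthetic check: a 10 Hz sine is almost entirely alpha, a 3 Hz sine is not.
fs = 250.0                                           # assumed sampling rate in Hz
t = np.arange(0, 4.0, 1.0 / fs)
alpha_wave = np.sin(2 * np.pi * 10.0 * t)
slow_wave = np.sin(2 * np.pi * 3.0 * t)
print(relative_alpha_power(alpha_wave, fs), relative_alpha_power(slow_wave, fs))
```

A per-patient value computed this way could then be correlated with age, which is the kind of retrospective analysis the abstract describes. A real pipeline would likely use windowed spectral estimates and artifact rejection instead of a single FFT.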


Subjects
Data Curation/methods , Electroencephalography/instrumentation , Electroencephalography/methods , Monitoring, Intraoperative/instrumentation , Monitoring, Intraoperative/methods , Adult , Aged , Aged, 80 and over , Data Management/instrumentation , Data Management/methods , Female , Humans , Male , Middle Aged , Retrospective Studies , Young Adult
11.
Nucleic Acids Res ; 50(D1): D578-D586, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34718729

ABSTRACT

The Complex Portal (www.ebi.ac.uk/complexportal) is a manually curated, encyclopaedic database of macromolecular complexes with known function from a range of model organisms. It summarizes complex composition, topology and function along with links to a large range of domain-specific resources (i.e. wwPDB, EMDB and Reactome). Since the last update in 2019, we have produced a first draft complexome for Escherichia coli, maintained and updated that of Saccharomyces cerevisiae, added over 40 coronavirus complexes and increased the human complexome to over 1100 complexes that include approximately 200 complexes that act as targets for viral proteins or are part of the immune system. The display of protein features in ComplexViewer has been improved and the participant table is now colour-coordinated with the nodes in ComplexViewer. Community collaboration has expanded, for example by contributing to an analysis of putative transcription cofactors and providing data accessible to semantic web tools through Wikidata which is now populated with manually curated Complex Portal content through a new bot. Our data license is now CC0 to encourage data reuse. Users are encouraged to get in touch, provide us with feedback and send curation requests through the 'Support' link.


Subjects
Data Curation/methods , Databases, Protein , Multiprotein Complexes/chemistry , Coronavirus/chemistry , Data Visualization , Databases, Chemical , Enzymes/chemistry , Enzymes/metabolism , Escherichia coli/chemistry , Humans , International Cooperation , Molecular Sequence Annotation , Multiprotein Complexes/metabolism , User-Computer Interface
12.
Nucleic Acids Res ; 50(D1): D1282-D1294, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34718737

ABSTRACT

The IUPHAR/BPS Guide to PHARMACOLOGY (GtoPdb; www.guidetopharmacology.org) is an open-access, expert-curated database of molecular interactions between ligands and their targets. We describe expansion in content over nine database releases made during the last two years, which has focussed on three main areas of infection. The COVID-19 pandemic continues to have a major impact on health worldwide. GtoPdb has sought to support the wider research community to understand the pharmacology of emerging drug targets for SARS-CoV-2 as well as potential targets in the host to block viral entry and reduce the adverse effects of infection in patients with COVID-19. We describe how the database rapidly evolved to include a new family of Coronavirus proteins. Malaria remains a global threat to half the population of the world. Our database content continues to be enhanced through our collaboration with Medicines for Malaria Venture (MMV) on the IUPHAR/MMV Guide to MALARIA PHARMACOLOGY (www.guidetomalariapharmacology.org). Antibiotic resistance is also a growing threat to global health. In response, we have extended our coverage of antibacterials in partnership with AntibioticDB.


Subjects
Anti-Bacterial Agents/pharmacology , Antimalarials/pharmacology , Antiviral Agents/pharmacology , COVID-19/drug therapy , Anti-Bacterial Agents/chemistry , COVID-19/etiology , Data Curation , Databases, Pharmaceutical , Humans , Ligands , Malaria/drug therapy , Malaria/metabolism , User-Computer Interface , Viral Proteins/chemistry , Viral Proteins/metabolism
13.
Nucleic Acids Res ; 50(D1): D1508-D1514, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34643700

ABSTRACT

Stimulated by the growing interest in the role of dNTP pools in physiological and malignant processes, we established dNTPpoolDB, the database that offers access to quantitative data on dNTP pools from a wide range of species, experimental and developmental conditions (https://dntppool.org/). The database includes measured absolute or relative cellular levels of the four canonical building blocks of DNA and of exotic dNTPs, as well. In addition to the measured quantity, dNTPpoolDB contains ample information on sample source, dNTP quantitation methods and experimental conditions including any treatments and genetic manipulations. Functions such as the advanced search offering multiple choices from custom-built controlled vocabularies in 15 categories in parallel, the pairwise comparison of any chosen pools, and control-treatment correlations provide users with the possibility to quickly recognize and graphically analyse changes in the dNTP pools in function of a chosen parameter. Unbalanced dNTP pools, as well as the balanced accumulation or depletion of all four dNTPs result in genomic instability. Accordingly, key roles of dNTP pool homeostasis have been demonstrated in cancer progression, development, ageing and viral infections among others. dNTPpoolDB is designated to promote research in these fields and fills a longstanding gap in genome metabolism research.


Subjects
Databases, Genetic , Deoxyribonucleotides/classification , Genomic Instability/genetics , Neoplasms/genetics , DNA Replication/genetics , Data Curation , Deoxyribonucleotides/genetics , Humans , Neoplasms/classification , Neoplasms/pathology
14.
Int J Biol Macromol ; 194: 84-99, 2022 Jan 01.
Article in English | MEDLINE | ID: mdl-34852258

ABSTRACT

Rapid Alkalinization Factors (RALFs) are plant-secreted, cysteine-rich polypeptides which are known to play essential roles in plant developmental processes and in several defense mechanisms. So far, RALF polypeptides have not been investigated in the Gossypium genus. In this study, 42, 38, 104 and 120 RALFs were identified from diploid G. arboreum and G. raimondii and tetraploid G. hirsutum and G. barbadense, respectively. These were further divided into four groups. Protein characteristics, sequence alignment, gene structure, conserved motifs, chromosomal location and cis-element identification were comprehensively analyzed. Whole genome duplication (WGD)/segmental duplication may be the reason why the number of RALF genes doubled in tetraploid Gossypium species. Expression pattern analysis showed that GhRALFs had different transcript accumulation patterns in the tested tissues and were differentially expressed in response to various abiotic stresses. Furthermore, GhRALF41-3 over-expressing (OE) plants showed a reduction in root length and developed later, with shorter stems and smaller rosettes than the wild type. The GhRALF14-8 and GhRALF27-8 OE plants, especially the latter, showed an increase in seed abortion. Both transgenic Arabidopsis and VIGS cotton demonstrate that three GhRALFs are negative regulators in response to salt stress. Our systematic analyses provided insights into the characterization of RALF genes in Gossypium, which forms a genetic basis for further exploration of their potential applications in cotton production.


Subjects
Genetic Association Studies , Gossypium/physiology , Plant Proteins/genetics , Plant Proteins/metabolism , Quantitative Trait, Heritable , Computational Biology/methods , Data Curation , Gene Expression Regulation, Plant , Humans , Multigene Family , Phylogeny , Plant Physiological Phenomena , Species Specificity
15.
Nucleic Acids Res ; 50(D1): D710-D718, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34850130

ABSTRACT

Mapping gene interactions within tissues/cell types plays a crucial role in understanding the genetic basis of human physiology and disease. Tissue functional gene networks (FGNs) are essential models for mapping complex gene interactions. We present TissueNexus, a database of 49 human tissue/cell line FGNs constructed by integrating heterogeneous genomic data. We adopted an advanced machine learning approach for data integration because Bayesian classifiers, which are the main approach used for constructing existing tissue gene networks, cannot capture the interaction and nonlinearity of genomic features well. A total of 1,341 RNA-seq datasets containing 52,087 samples were integrated for all of these networks. Because the tissue label for RNA-seq data may be annotated with different names or be missing, we performed intensive hand-curation to improve quality. We further developed a user-friendly database for network search, visualization, and functional analysis. We illustrate the application of TissueNexus in prioritizing disease genes. The database is publicly available at https://www.diseaselinks.com/TissueNexus/.


Subjects
Databases, Genetic , Gene Regulatory Networks/genetics , Organ Specificity/genetics , RNA-Seq , Data Curation , Data Management , Genome, Human/genetics , Humans , Software
16.
Int J Biometeorol ; 66(1): 35-43, 2022 Jan.
Article in English | MEDLINE | ID: mdl-34491440

ABSTRACT

Collaborative networks that involve the compilation of observations from diverse sources can provide important data, but are difficult to maintain over long periods. The International Phenological Garden (IPG) network, begun in 1959 and still functioning 60 years later, has been no exception. Here we document its history, its monitored 23 species (initially all propagated by cloning), and the locations and years of data contribution of its 131 gardens, of which 63 from 19 countries contributed data in 2021. The decision to use clones, rather than multiple, locally adapted individuals, was based on the idea that this would "control" for genetic effects, and it affects the applicability of the data and duration of the network. We also describe the overlap among the IPG network, the Pan-European Phenology network (PEP725), and the phenological data offered by the German Weather Service. Sustainable data storage and accessibility, as well as the continued monitoring of all 23 species/clones, are under discussion at the moment, as is the fate of other phenological networks, despite a politically mandatory plant-based climate-change monitoring.


Subjects
Data Curation , Gardens , Climate Change , Humans , Seasons , Temperature , Weather
17.
Plant Physiol ; 188(2): 955-970, 2022 02 04.
Article in English | MEDLINE | ID: mdl-34792587

ABSTRACT

Short interspersed nuclear elements (SINEs) are a widespread type of small transposable element (TE). With increasing evidence for their impact on gene function and genome evolution in plants, accurate genome-scale SINE annotation becomes a fundamental step for studying the regulatory roles of SINEs and their relationship with other components in the genomes. Despite the overall promising progress made in TE annotation, SINE annotation remains a major challenge. Unlike some other TEs, SINEs are short and heterogeneous, and they usually lack well-conserved sequence or structural features. Thus, current SINE annotation tools have either low sensitivity or high false discovery rates. Given the demand and challenges, we aimed to provide a more accurate and efficient SINE annotation tool for plant genomes. The pipeline starts with maximizing the pool of SINE candidates via profile hidden Markov model-based homology search and de novo SINE search using structural features. Then, it excludes the false positives by integrating all known features of SINEs and the features of other types of TEs that can often be misannotated as SINEs. As a result, the pipeline substantially improves the tradeoff between sensitivity and accuracy, with both values close to or over 90%. We tested our tool in Arabidopsis thaliana and rice (Oryza sativa), and the results show that our tool competes favorably against existing SINE annotation tools. The simplicity and effectiveness of this tool would potentially be useful for generating more accurate SINE annotations for other plant species. The pipeline is freely available at https://github.com/yangli557/AnnoSINE.


Subjects
Arabidopsis/genetics , Data Curation/standards , Genome, Plant , Guidelines as Topic , Oryza/genetics , Short Interspersed Nucleotide Elements , Reproducibility of Results
18.
Nucleic Acids Res ; 50(D1): D687-D692, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34788843

ABSTRACT

The Reactome Knowledgebase (https://reactome.org), an Elixir core resource, provides manually curated molecular details across a broad range of physiological and pathological biological processes in humans, including both hereditary and acquired disease processes. The processes are annotated as an ordered network of molecular transformations in a single consistent data model. Reactome thus functions both as a digital archive of manually curated human biological processes and as a tool for discovering functional relationships in data such as gene expression profiles or somatic mutation catalogs from tumor cells. Recent curation work has expanded our annotations of normal and disease-associated signaling processes and of the drugs that target them, in particular infections caused by the SARS-CoV-1 and SARS-CoV-2 coronaviruses and the host response to infection. New tools support better simultaneous analysis of high-throughput data from multiple sources and the placement of understudied ('dark') proteins from analyzed datasets in the context of Reactome's manually curated pathways.


Subjects
Antiviral Agents/pharmacology , Knowledge Bases , Proteins/metabolism , COVID-19/metabolism , Data Curation , Genome, Human , Host-Pathogen Interactions , Humans , Proteins/genetics , Signal Transduction , Software
19.
Nat Biotechnol ; 40(4): 555-565, 2022 Apr.
Article in English | MEDLINE | ID: mdl-34795433

ABSTRACT

A principal challenge in the analysis of tissue imaging data is cell segmentation, the task of identifying the precise boundary of every cell in an image. To address this problem we constructed TissueNet, a dataset for training segmentation models that contains more than 1 million manually labeled cells, an order of magnitude more than all previously published segmentation training datasets. We used TissueNet to train Mesmer, a deep-learning-enabled segmentation algorithm. We demonstrated that Mesmer is more accurate than previous methods, generalizes to the full diversity of tissue types and imaging platforms in TissueNet, and achieves human-level performance. Mesmer enabled the automated extraction of key cellular features, such as subcellular localization of protein signal, which was challenging with previous approaches. We then adapted Mesmer to harness cell lineage information in highly multiplexed datasets and used this enhanced version to quantify cell morphology changes during human gestation. All code, data and models are released as a community resource.


Subjects
Deep Learning , Algorithms , Data Curation , Humans , Image Processing, Computer-Assisted/methods
20.
Drug Discov Today ; 27(1): 207-214, 2022 01.
Article in English | MEDLINE | ID: mdl-34332096

ABSTRACT

Standardizing data is crucial for preserving and exchanging scientific information. In particular, recording the context in which data were created ensures that information remains findable, accessible, interoperable, and reusable. Here, we introduce the concept of self-reporting data assets (SRDAs), which preserve data and contextual information. SRDAs are an abstract concept, which requires a suitable data format for implementation. Four promising data formats or languages are popularly used to represent data in pharma: JCAMP-DX, JSON, AnIML, and, more recently, the Allotrope Data Format (ADF). Here, we evaluate these four options in common use cases within the pharmaceutical industry using multiple criteria. The evaluation shows that ADF is the most suitable format for the implementation of SRDAs.


Subjects
Data Accuracy , Data Curation , Drug Industry , Information Dissemination/methods , Research Design/standards , Data Curation/methods , Data Curation/standards , Diffusion of Innovation , Drug Industry/methods , Drug Industry/organization & administration , Humans , Proof of Concept Study , Reference Standards , Technology, Pharmaceutical/methods