J Med Internet Res ; 22(10): e21980, 2020 10 09.
Article in English | MEDLINE


BACKGROUND: In the prevention and control of infectious diseases, previous research on the application of big data technology has mainly focused on the early warning and early monitoring of infectious diseases. Although the application of big data technology to COVID-19 warning and monitoring remains an important task, preventing the disease's rapid spread and reducing its impact on society are currently the most pressing challenges for the application of big data technology during the COVID-19 pandemic. After the outbreak of COVID-19 in Wuhan, the Chinese government and nongovernmental organizations actively used big data technology to prevent, contain, and control the spread of COVID-19. OBJECTIVE: The aim of this study is to discuss the application of big data technology to prevent, contain, and control COVID-19 in China; draw lessons; and make recommendations. METHODS: We discuss the data collection methods and key data information that existed in China before the outbreak of COVID-19 and how these data contributed to the prevention and control of COVID-19. Next, we discuss China's new data collection methods and the new information assembled after the outbreak of COVID-19. Based on the data and information collected in China, we analyzed the application of big data technology from the perspectives of data sources, data application logic, data application level, and application results. In addition, we analyzed the issues, challenges, and responses encountered by China in the application of big data technology from four perspectives: data access, data use, data sharing, and data protection. Suggestions for improvement are made for data collection, data circulation, data innovation, and data security to help understand China's response to the epidemic and to provide lessons for other countries' prevention and control of COVID-19.
RESULTS: In the process of the prevention and control of COVID-19 in China, big data technology has played an important role in personal tracking, surveillance and early warning, tracking of the virus's sources, drug screening, medical treatment, resource allocation, and production recovery. The data used included location and travel data, medical and health data, news media data, government data, online consumption data, data collected by intelligent equipment, and epidemic prevention data. We identified a number of big data problems including low efficiency of data collection, difficulty in guaranteeing data quality, low efficiency of data use, lack of timely data sharing, and data privacy protection issues. To address these problems, we suggest unified data collection standards, innovative use of data, accelerated exchange and circulation of data, and a detailed and rigorous data protection system. CONCLUSIONS: China has used big data technology to prevent and control COVID-19 in a timely manner. To prevent and control infectious diseases, countries must collect, clean, and integrate data from a wide range of sources; use big data technology to analyze a wide range of big data; create platforms for data analyses and sharing; and address privacy issues in the collection and use of big data.

Big Data , Coronavirus Infections/prevention & control , Pandemics/prevention & control , Pneumonia, Viral/prevention & control , Betacoronavirus , China/epidemiology , Computer Security , Coronavirus Infections/epidemiology , Data Collection , Humans , Information Dissemination , Information Storage and Retrieval , Pneumonia, Viral/epidemiology , Privacy
Annu Int Conf IEEE Eng Med Biol Soc ; 2020: 5592-5597, 2020 07.
Article in English | MEDLINE


There exists a need for sharing user health data, especially with institutes for research purposes, in a secure fashion. This is especially true in the case of a system that includes a third party storage service, such as cloud computing, which limits the control of the data owner. The use of encryption for secure data storage continues to evolve to meet the need for flexible and fine-grained access control. This evolution has led to the development of Attribute Based Encryption (ABE). The use of ABE to ensure the security and privacy of health data has been explored. This paper presents an ABE based framework which allows for the secure outsourcing of the more computationally intensive processes for data decryption to the cloud servers. This reduces the time needed for decryption to occur at the user end and reduces the amount of computational power needed by users to access data.

Cloud Computing , Privacy , Computer Security , Electronic Health Records , Information Storage and Retrieval
Int Heart J ; 61(5): 922-926, 2020 Sep 29.
Article in English | MEDLINE


The incidence of ventricular arrhythmia in patients with an implanted pacemaker is not yet known. The aim of this study was to analyze non-sustained ventricular tachycardia (NSVT) episodes based on stored electrograms (EGMs) and determine the occurrence rate and risk factors for NSVT in a pacemaker population. This study included 302 consecutive patients with a dual-chamber pacemaker. A total of 1024 EGMs stored in pacemakers as ventricular high-rate episodes were analyzed. NSVT was defined as ≥ 5 consecutive ventricular beats at ≥ 150 bpm lasting < 30 seconds. At baseline, most patients (94.8%) had a left ventricular ejection fraction of ≥ 60%. Of 1024 EGMs, 420 (41.0%) showed appropriate NSVT episodes, as well as premature atrial contractions, atrial tachyarrhythmia, or atrial fibrillation with a rapid ventricular response, whereas other EGMs did not show an actual ventricular arrhythmia. On EGM analysis, during a mean follow-up period of 46.1 months, NSVT occurred one or more times in 82 patients (33.1%). On multivariate analysis, ≥ 50% right ventricular pacing was an independent risk factor for NSVT (odds ratio, 4.519; P < 0.001), but NSVT was not associated with increased all-cause mortality. In summary, in the pacemaker population, ≥ 50% right ventricular pacing is an independent risk factor for NSVT; however, NSVT was not associated with increased all-cause mortality because of the preserved left ventricular function.

Cardiac Pacing, Artificial/methods , Electrophysiologic Techniques, Cardiac , Mortality , Pacemaker, Artificial , Tachycardia, Ventricular/epidemiology , Aged , Aged, 80 and over , Atrial Fibrillation , Atrial Premature Complexes , Female , Heart Ventricles , Humans , Information Storage and Retrieval , Male , Middle Aged , Risk Factors , Sex Factors , Tachycardia, Supraventricular
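
The NSVT definition used in the abstract above (≥ 5 consecutive ventricular beats at ≥ 150 bpm lasting < 30 seconds) can be sketched as a small check over beat-to-beat intervals. This is a minimal illustration, not the authors' analysis pipeline: the function name `find_nsvt_runs`, the input format (RR intervals in milliseconds between consecutive ventricular beats), and the choice to apply the rate threshold to every interval in a run are all assumptions.

```python
from typing import List, Tuple

MIN_BEATS = 5            # from the abstract: at least 5 consecutive beats
MIN_RATE_BPM = 150       # from the abstract: at least 150 bpm
MAX_DURATION_S = 30.0    # from the abstract: shorter than 30 seconds


def find_nsvt_runs(rr_ms: List[float]) -> List[Tuple[int, int]]:
    """Return (first, last) interval indices of runs that meet the definition.

    rr_ms holds RR intervals (ms) between consecutive ventricular beats
    (assumed input format). A run of k fast intervals spans k + 1 beats.
    """
    max_rr = 60000.0 / MIN_RATE_BPM  # 400 ms corresponds to 150 bpm
    runs = []
    start = None
    # A trailing infinite interval acts as a sentinel that closes the last run.
    for i, rr in enumerate(list(rr_ms) + [float("inf")]):
        fast = rr <= max_rr
        if fast and start is None:
            start = i
        elif not fast and start is not None:
            n_beats = (i - start) + 1
            duration_s = sum(rr_ms[start:i]) / 1000.0
            if n_beats >= MIN_BEATS and duration_s < MAX_DURATION_S:
                runs.append((start, i - 1))
            start = None
    return runs


if __name__ == "__main__":
    # Five fast beats (four 350 ms intervals) between two slow intervals.
    print(find_nsvt_runs([800, 350, 350, 350, 350, 800]))
```

Runs that last 30 seconds or longer are excluded because they would count as sustained rather than non-sustained tachycardia under this definition.
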
Zhongguo Zhong Yao Za Zhi ; 45(15): 3642-3650, 2020 Aug.
Article in Chinese | MEDLINE


Based on field surveys and literature surveys, this article compares and analyzes the distribution of Callicarpa nudiflora obtained with different zoning methods, data sources, and spatial scales. The results showed that different methods of dividing the distribution of C. nudiflora, such as qualitative description, ecological environment similarity, and niche modeling, gave somewhat different results, but all of them could reflect the distribution of C. nudiflora to different degrees. Among them, the qualitative description method has certain advantages for macro-level guidance at large scales. The distribution range obtained by the ecological environment similarity method is wider than that obtained by the qualitative description method or the niche model method. The zoning results obtained from different data sources also differed: the number and representativeness of the survey data affect the zoning results. Analysis at different spatial scales showed that the ecological factors affecting the distribution of C. nudiflora, and their contribution rates, differ between China and the world as a whole. The comprehensive multi-source data analysis showed that C. nudiflora is mainly distributed in southern coastal provinces of China such as Hainan, Guangdong, Guangxi, and Fujian, and also occurs in Jiangxi, Guizhou, Yunnan, Sichuan, Chongqing, Hunan, Gansu, Taiwan, and other provinces. Globally, areas suitable for C. nudiflora are in Southeast Asia, such as China, Vietnam, Laos, Myanmar, and India. There are also potential distribution areas in the southern United States and Mexico.

Callicarpa , China , Data Collection , Information Storage and Retrieval , Vietnam
BMC Bioinformatics ; 21(Suppl 13): 380, 2020 Sep 17.
Article in English | MEDLINE


BACKGROUND: Biomedical document triage is the foundation of biomedical information extraction, which is important to precision medicine. Recently, some neural networks-based methods have been proposed to classify biomedical documents automatically. In the biomedical domain, documents are often very long and often contain very complicated sentences. However, the current methods still find it difficult to capture important features across sentences. RESULTS: In this paper, we propose a hierarchical attention-based capsule model for biomedical document triage. The proposed model effectively employs hierarchical attention mechanism and capsule networks to capture valuable features across sentences and construct a final latent feature representation for a document. We evaluated our model on three public corpora. CONCLUSIONS: Experimental results showed that both hierarchical attention mechanism and capsule networks are helpful in biomedical document triage task. Our method proved itself highly competitive or superior compared with other state-of-the-art methods.

Information Storage and Retrieval/methods , Neural Networks, Computer , Triage/methods , Humans
PLoS One ; 15(9): e0238290, 2020.
Article in English | MEDLINE


A well-defined protocol for a clinical trial guarantees a successful outcome report. When designing the protocol, most researchers refer to electronic databases and extract protocol elements using a keyword search. However, state-of-the-art database systems only offer text-based searches for user-entered keywords. In this study, we present a database system with a context-dependent and protocol-element-selection function for successfully designing a clinical trial protocol. To do this, we first introduce a database for a protocol retrieval system constructed from individual protocol data extracted from 184,634 clinical trials and 13,210 frame structures of clinical trial protocols. The database contains a variety of semantic information that allows the filtering of protocols during the search operation. Based on the database, we developed a web application called the clinical trial protocol database system (CLIPS; available at This system enables an interactive search by utilizing protocol elements. To enable an interactive search for combinations of protocol elements, CLIPS provides optional next element selection according to the previous element in the form of a connected tree. The validation results show that our method achieves better performance than that of existing databases in predicting phenotypic features.

Clinical Trial Protocols as Topic , Clinical Trials as Topic/standards , Computational Biology/methods , Databases, Factual , Information Storage and Retrieval , Software , Humans , User-Computer Interface
Article in English | MEDLINE


OBJECTIVE: Coronavirus disease 2019 (COVID-19) has caused substantial panic worldwide since its outbreak in December 2019. This study uses social networks to track the evolution of public emotion during COVID-19 in China and analyzes the root causes of these public emotions from an event-driven perspective. METHODS: A dataset was constructed using microblogs (n = 125,672) labeled with COVID-19-related super topics (n = 680) from 40,891 users from 1 December 2019 to 17 February 2020. Based on the skeleton and key change points of COVID-19 extracted from microblogging contents, we tracked the public's emotional evolution modes (accumulated emotions, emotion covariances, and emotion transitions) by time phase and further extracted the details of dominant social events. RESULTS: Public emotions showed different evolution modes during different phases of COVID-19. Events about the development of COVID-19 remained hot, but generally declined, and public attention shifted to other aspects of the epidemic (e.g., encouragement, support, and treatment). CONCLUSIONS: These findings suggest that the public's feedback on COVID-19 predated official accounts on the microblog platform. There were clear differences in the trending events that large users (users with many fans and readings) and common users paid attention to during each phase of COVID-19.

Blogging/statistics & numerical data , Coronavirus Infections/psychology , Coronavirus , Emotions , Information Storage and Retrieval/methods , Pneumonia, Viral/psychology , Social Media/statistics & numerical data , Betacoronavirus , China , Coronavirus Infections/epidemiology , Humans , Pandemics , Pneumonia, Viral/epidemiology
RECIIS (Online) ; 14(3): 563-579, Jul.-Sep. 2020. illus
Article in Portuguese | LILACS


Um dos desafios das mudanças e evoluções das tecnologias de informação e comunicação (TIC) em corporações é a preservação das informações digitais. Entre as corporações com grande geração de informações digitais estão as universidades. Neste artigo, é apresentada uma estratégia para se elaborar uma política de preservação digital no bojo de uma política arquivística direcionada para a manutenção da autenticidade dos documentos de arquivo. O objetivo é expor um modelo para elaboração de políticas de preservação digital de documentos de arquivo por instituições de ensino superior (IES), com os elementos que devem compô-las, a partir da literatura estudada e da política elaborada e aprovada na Universidade Estadual Paulista (Unesp). São apresentados os conceitos relacionados à política arquivística para a preservação digital de documentos de arquivo, sua definição, seus aspectos e elementos. Concluiu-se que o modelo pode ser adaptado para outros objetos digitais, bem como para outras instituições.

One of the challenges posed by changes and developments in information and communication technology (ICT) in corporations is the preservation of digital information. Universities are among the corporations that generate large amounts of digital information. This article presents a strategy for drawing up a digital preservation policy within an archival policy aimed at maintaining the authenticity of archival documents. The objective is to present a model that higher education institutions can use to draw up policies for the digital preservation of their archival documents, showing the elements such policies must contain, based on the literature studied and on the policy drawn up and officially approved at Universidade Estadual Paulista (Unesp). Concepts related to archival policy for the digital preservation of archival documents, along with its definition, aspects, and elements, are presented. It was concluded that the model can be adapted for other digital objects, as well as for other institutions.

Uno de los desafíos de los cambios y la evolución de las TIC en las corporaciones es la preservación de informaciones digitales. Entre las corporaciones con una gran generación de información digital se encuentran las universidades. En este artículo, se presenta una estrategia para elaborar una política de preservación digital, en medio de una política de archivo dirigida a mantener la autenticidad de los documentos de archivo. El objetivo del artículo es presentar un modelo para la elaboración de políticas de preservación digital de documentos de archivo por instituciones de enseñanza superior, con los elementos que deben componerlas, basado en la literatura estudiada y en la política desarrollada y aprobada en la Unesp. Se presentan conceptos relacionados con la política de archivo para la preservación digital de documentos de archivo, su definición, aspectos y elementos. Se concluyó que este modelo puede adaptarse para otros objetos digitales, así como para otras instituciones.

Humans , Records , Higher Education Institutions , Information Technology , Policies , Archiving , Organizational Culture , Information Storage and Retrieval , Information Technology Management
RECIIS (Online) ; 14(3): 597-618, Jul.-Sep. 2020. graphs, illus
Article in Portuguese | LILACS


Este artigo busca responder a alguns dos desafios de sistematização, indexação e divulgação de variados documentos acadêmicos da área de pensamento social no Brasil pela Biblioteca Virtual do Pensamento Social (BVPS). Argumentamos que a importância da discussão sobre preservação digital para a BVPS cumpre dois objetivos: o de disponibilizar documentos digitalizados a um público mais amplo e o de mapear a produção contemporânea da área, com intuito de criar uma memória intelectual. Neste artigo, nos deteremos sobretudo no segundo objetivo, tendo em vista definir bem como disponibilizar ao público da biblioteca os critérios de seleção e organização do acervo. Dentro dos limites do recorte proposto, por meio de redes de acoplamento bibliográfico, cocitação e mapas semânticos, apresentaremos aqui uma análise preliminar da produção de artigos na área de pensamento social no Brasil. A atual pesquisa é fundamental para a definição das próximas etapas de ampliação do conteúdo da biblioteca, notadamente a definição de novos seletores de busca, a integração de novos autores e autoras à seção Intérpretes e a indexação de trabalhos com temáticas e abordagens caras à área de pensamento social no Brasil.

This article seeks to respond to some of the challenges of systematizing, indexing, and disseminating various academic documents in the field of social thought in Brazil at the Biblioteca Virtual do Pensamento Social (BVPS; Virtual Library of Social Thought). We argue that the discussion on digital preservation matters to the BVPS for two objectives: making digitized documents available to a wider audience and mapping contemporary production in the field in order to create an intellectual memory. In this article, we focus mainly on the second objective, with a view to defining the selection and organization criteria of the collection and making them available to the library's public. Within the limits of the proposed scope, we present a preliminary analysis of the production of articles in the field of social thought in Brazil through networks of bibliographic coupling, co-citation, and semantic maps. The current research is fundamental for defining the next steps in expanding the content of the library, notably the definition of new search selectors, the integration of new authors into the Intérpretes (Interpreters) section, and the indexing of works with themes and approaches important to the field of social thought in Brazil.

Este artículo busca responder a algunos de los desafíos de la sistematización, indexación y difusión de diferentes tipos de documentos académicos en el campo del pensamiento social en Brasil por la BVPS ­ Biblioteca Virtual do Pensamento Social (Biblioteca Virtual del Pensamiento Social). Argumentamos que la importancia de la discusión sobre la preservación digital para la BVPS cumple dos objetivos: el de hacer que los documentos digitalizados estén disponibles para una audiencia más amplia y el de mapear la producción contemporánea en el área para crear una memoria intelectual. En este artículo, nos centraremos principalmente en el segundo objetivo, para definir como también para poner a la disposición del público de la biblioteca los criterios de selección y organización de la colección. Dentro de los límites del recorte propuesto, presentaremos aquí un análisis preliminar de la producción de artículos en el campo del pensamiento social en Brasil a través de redes de acoplamiento bibliográfico, cocitación y mapas semánticos. La investigación actual es fundamental para la definición de los próximos pasos para expandir el contenido de la biblioteca, en particular la definición de nuevos selectores de búsqueda, la integración de nuevos autores y autoras en la sección Intérpretes y la indexación de trabajos conteniendo temas y enfoques relevantes para el área de pensamiento social en Brasil.

Humans , Brazil , Information Storage and Retrieval , Libraries, Digital , Big Data , Anthropology, Cultural , Sociology , Records , Information Management
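
The two citation-network measures named in the abstract above, bibliographic coupling and co-citation, can be illustrated on a toy reference table. This is a minimal sketch, not the BVPS analysis itself: the function names, the paper and reference labels, and the input format (a mapping from each citing paper to the set of references it cites) are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Set, Tuple


def bibliographic_coupling(refs: Dict[str, Set[str]]) -> Dict[Tuple[str, str], int]:
    """Coupling strength of two citing papers = number of references they share."""
    strengths = {}
    for a, b in combinations(sorted(refs), 2):
        shared = len(refs[a] & refs[b])
        if shared:
            strengths[(a, b)] = shared
    return strengths


def cocitation(refs: Dict[str, Set[str]]) -> Dict[Tuple[str, str], int]:
    """Co-citation count of two references = number of papers citing both."""
    counts = Counter()
    for cited in refs.values():
        for pair in combinations(sorted(cited), 2):
            counts[pair] += 1
    return dict(counts)


if __name__ == "__main__":
    # Hypothetical toy data: three citing papers and four cited references.
    toy = {"P1": {"A", "B", "C"}, "P2": {"B", "C"}, "P3": {"C", "D"}}
    print(bibliographic_coupling(toy))
    print(cocitation(toy))
```

Coupling links citing papers to each other, whereas co-citation links the cited works, so the two networks describe the same citation data from opposite directions.
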
RECIIS (Online) ; 14(3): 724-733, Jul.-Sep. 2020. illus
Article in Spanish | LILACS


In this interview with Reciis, Miquel Térmens discusses the importance of digital preservation for building a health system that is good not only for the future but also for the present. We are in a phase of collecting and storing a large amount of data on the new coronavirus to ensure its rapid use, and its long-term preservation is of interest both to governments and to the research groups working on solutions. The great challenge of our present is to investigate how to carry out digital preservation at a new scale, incorporating social media data, research data, and Big Data, but this will only be possible with standardization and planning. Miquel Térmens Graells holds a doctorate in Documentation from the University of Barcelona, where he is a tenured professor and dean of the Faculty of Information and Audiovisual Media.

Humans , Organization and Administration , Health Systems , Data Curation , Big Data , Data Analysis , Data Collection , Information Storage and Retrieval , Access to Information
RECIIS (Online) ; 14(3): 764-781, Jul.-Sep. 2020.
Article in Portuguese | LILACS


Este estudo realiza uma reflexão sobre a preservação de documentos arquivísticos digitais em uma perspectiva sistêmica, pautada em padrões reconhecidos pela literatura científica. De tal forma, utiliza-se da visão holística para ressaltar a pertinência da preservação ser pensada em todo o ciclo de vida dos documentos. A metodologia parte do levantamento bibliográfico composto por artigos, livros e publicações técnicas, para assim, obter uma revisão narrativa. Ressalta-se que a preservação digital tem evoluído para novos patamares e requer o uso de padrões para implementar sistemas informatizados confiáveis. Com isso, pode-se envolver todo o ciclo vital em uma cadeia de custódia ininterrupta capaz de assegurar a autenticidade dos documentos digitais. Por fim, defende-se uma abordagem sistêmico-holística, em que os documentos são planejados e produzidos tendo em vista a preservação e o acesso em longo prazo.

This study reflects on the preservation of digital archival records from a systemic perspective, based on standards recognized in the scientific literature. It takes a holistic view to emphasize that preservation should be considered throughout the life cycle of records. The methodology is based on a bibliographic survey of articles, books, and technical publications, from which a narrative review is obtained. Digital preservation has evolved to new levels and requires the use of standards to implement reliable computer systems. With these, the entire life cycle can be covered by an uninterrupted chain of custody capable of ensuring the authenticity of digital records. Finally, a systemic-holistic approach is advocated, in which records are planned and produced with a view to long-term preservation and access.

Este estudio reflexiona sobre la preservación de los documentos de archivo digital en una perspectiva sistémica, basada en normas y estándares reconocidos por la literatura científica. De esta manera, utiliza la visión holística para enfatizar la relevancia de la preservación a ser considerada a lo largo del ciclo de vida de los documentos. La metodología se basa en una encuesta bibliográfica compuesta de artículos, libros y publicaciones técnicas, para obtener una revisión narrativa. Es de destacar que la preservación digital ha evolucionado a nuevas alturas y requiere el uso de estándares para implementar sistemas informáticos confiables. Con esto, todo el ciclo de vida puede involucrarse en una cadena de custodia ininterrumpida capaz de garantizar la autenticidad de los documentos digitales. Finalmente, se aboga por un enfoque holístico-sistémico, en que los documentos se planifican y producen con miras la preservación y acceso a largo plazo.

Humans , Archives , Information Systems , Records , Information Management , Data Curation , Data Collection , Information Storage and Retrieval , Access to Information
J Toxicol Sci ; 45(8): 449-473, 2020.
Article in English | MEDLINE


Although peroxisome proliferator-activated receptor α (PPARα) agonists are obviously hepatocarcinogenic in rodents, they have been widely used for dyslipidemia and proven to be safe for clinical use without respect to the species difference. It is established that PPARα acts as a part of the transcription factor complex, but its precise mechanism is still unknown. Using the data of Toxicogenomics Database, reliable genes responsive to PPARα agonists, clofibrate, fenofibrate and WY-14,643, in rat liver, were extracted from both in vivo and in vitro data, and sorted by their fold increase. It was found that there were many genes responding to fibrates exclusively in vivo. Most of the in vivo specific genes appear to be unrelated to lipid metabolism and are not upregulated in the kidney. Fifty-seven genes directly related to cell proliferation were extracted from in vivo data, but they were not induced in vitro at all. Analysis of PPAR-responsive elements could not explain the observed difference in induction. To evaluate possible interaction between neighboring genes in gene expression, the correlation of the fold changes of neighboring genes for 22 drugs with various PPARα agonistic potencies were calculated for the genes showing more than 2.5 fold induction by 3 fibrates in vivo, and their genomic location was compared with that of the human orthologue. In the present study, many candidates of genes other than lipid metabolism were selected, and these could be good starting points to elucidate the mechanism of PPARα agonist-induced rodent-specific toxicity.

Databases, Genetic , Fenofibrate/toxicity , Genetic Loci/genetics , Information Storage and Retrieval/methods , Lipid Metabolism/genetics , Liver/metabolism , PPAR alpha/agonists , Pyrimidines/toxicity , Animals , Epistasis, Genetic , Gene Expression , Genetic Association Studies , Male , Rats, Sprague-Dawley , Species Specificity
PLoS One ; 15(8): e0236799, 2020.
Article in English | MEDLINE


INTRODUCTION: Numerous prior studies, even from countries with free access to care, have associated long travel time to care with poor survival in patients with colorectal cancer. METHODS: This is a data-linkage study of all 3718 patients with colorectal cancer diagnosed between 2007 and 2013 in Northern Sweden, one of the most sparsely populated areas in Europe. Travel time to the nearest hospital was calculated based on GPS coordinates, and multivariable Cox regression was used to analyse possible associations between travel time and cause-specific survival. RESULTS: No association between travel time and survival was observed, either in univariable analysis (colon HR 1.00 [95% CI 0.998-1.003]; rectal HR 0.998 [95% CI 0.995-1.002]) or in multivariable Cox regression analysis (colon HR 0.999 [95% CI 0.997-1.002]; rectal HR 0.997 [95% CI 0.992-1.002]). CONCLUSIONS: In contrast to most other studies, no association between travel time and colorectal cancer survival was found, even though longer travel time was associated with known risk factors for poorer outcome. In the Swedish health care setting, travel time does not appear to represent a barrier to care or to negatively influence outcomes.

Colorectal Neoplasms/mortality , Aged , Colorectal Neoplasms/diagnosis , Colorectal Neoplasms/pathology , Databases, Factual , Educational Status , Female , Humans , Information Storage and Retrieval , Kaplan-Meier Estimate , Male , Middle Aged , Neoplasm Staging , Proportional Hazards Models , Registries , Risk Factors , Time Factors
BMC Bioinformatics ; 21(1): 274, 2020 Jul 01.
Article in English | MEDLINE


BACKGROUND: Obtaining data from single-cell transcriptomic sequencing allows for the investigation of cell-specific gene expression patterns, which could not be addressed a few years ago. With the advancement of droplet-based protocols, the number of studied cells continues to increase rapidly. This establishes the need for software tools for efficient processing of the large-scale datasets produced. We address this need by presenting RainDrop for fast gene-cell count matrix computation from single-cell RNA-seq data produced by 10x Genomics Chromium technology. RESULTS: RainDrop can process single-cell transcriptomic datasets consisting of 784 million reads sequenced from around 8,000 cells in less than 40 minutes on a standard workstation. It significantly outperforms the established Cell Ranger pipeline and the recently introduced Alevin tool in terms of runtime, with a maximal (average) speedup of 30.4 (22.6) and 3.5 (2.4), respectively, while maintaining high agreement of the generated results. CONCLUSIONS: RainDrop is a software tool, written in C++, for highly efficient processing of large-scale droplet-based single-cell RNA-seq datasets on standard workstations. It is available at .

Sequence Analysis, RNA/methods , User-Computer Interface , Databases, Genetic , Humans , Information Storage and Retrieval , Single-Cell Analysis
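
The output that the abstract above calls a gene-cell count matrix can be illustrated with a toy example. The sketch below is not RainDrop's algorithm and omits everything that makes real pipelines hard (read alignment, barcode correction, UMI error handling); it only shows the final counting step under the assumption that each read has already been reduced to a hypothetical `(cell_barcode, umi, gene)` triple, with duplicate UMIs counted once.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def count_matrix(reads: Iterable[Tuple[str, str, str]]):
    """Build a gene-by-cell matrix of unique UMI counts.

    reads: (cell_barcode, umi, gene) triples (assumed, pre-processed input).
    Returns (genes, cells, matrix) with matrix[i][j] = number of distinct
    UMIs seen for genes[i] in cells[j].
    """
    umis = defaultdict(set)
    for barcode, umi, gene in reads:
        umis[(gene, barcode)].add(umi)  # repeated UMIs collapse in the set
    genes: List[str] = sorted({g for g, _ in umis})
    cells: List[str] = sorted({c for _, c in umis})
    matrix = [[len(umis.get((g, c), ())) for c in cells] for g in genes]
    return genes, cells, matrix


if __name__ == "__main__":
    # Hypothetical reads: the first two are PCR duplicates of one molecule.
    toy_reads = [
        ("AAAC", "U1", "GeneA"),
        ("AAAC", "U1", "GeneA"),
        ("AAAC", "U2", "GeneA"),
        ("TTTG", "U1", "GeneA"),
        ("TTTG", "U3", "GeneB"),
    ]
    print(count_matrix(toy_reads))
```
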
Nat Commun ; 11(1): 3264, 2020 Jun 29.
Article in English | MEDLINE


DNA has recently emerged as an attractive medium for archival data storage. Recent work has demonstrated proof-of-principle prototype systems; however, very uneven (biased) sequencing coverage has been reported, which indicates inefficiencies in the storage process. Deviations from the average coverage in the sequence copy distribution can either cause wasteful provisioning in sequencing or excessive number of missing sequences. Here, we use millions of unique sequences from a DNA-based digital data archival system to study the oligonucleotide copy unevenness problem and show that the two paramount sources of bias are the synthesis and amplification (PCR) processes. Based on these findings, we develop a statistical model for each molecular process as well as the overall process. We further use our model to explore the trade-offs between synthesis bias, storage physical density, logical redundancy, and sequencing redundancy, providing insights for engineering efficient, robust DNA data storage systems.

Information Storage and Retrieval , Sequence Analysis, DNA , Bias , Models, Theoretical , Sequence Analysis, DNA/methods , Sequence Analysis, DNA/statistics & numerical data
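
The abstract above notes that uneven coverage leads either to wasteful sequencing or to missing sequences. A generic way to see this, which is not the statistical model developed in the paper, is to compare the probability that a sequence receives zero reads under even (Poisson) coverage with the same probability under overdispersed (gamma-Poisson, that is, negative binomial) coverage at the same mean. The function names and the `dispersion` parameter are assumptions of this sketch.

```python
import math


def dropout_poisson(mean_coverage: float) -> float:
    """P(zero reads) when read counts are Poisson with the given mean."""
    return math.exp(-mean_coverage)


def dropout_negbin(mean_coverage: float, dispersion: float) -> float:
    """P(zero reads) for a gamma-Poisson mixture with the given mean.

    Smaller `dispersion` means more uneven copy numbers; as it grows large
    the result approaches the Poisson value.
    """
    return (dispersion / (dispersion + mean_coverage)) ** dispersion


def expected_missing(n_sequences: int, p_dropout: float) -> float:
    """Expected number of sequences that receive no reads at all."""
    return n_sequences * p_dropout


if __name__ == "__main__":
    n = 1_000_000  # hypothetical pool size
    print(expected_missing(n, dropout_poisson(10.0)))
    print(expected_missing(n, dropout_negbin(10.0, 2.0)))
```

At the same average coverage, the overdispersed case loses far more sequences, which is why bias has to be offset with extra sequencing depth or extra logical redundancy.
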
Stud Health Technol Inform ; 272: 350-353, 2020 Jun 26.
Article in English | MEDLINE


Data quality problems in coded clinical and administrative data have persisted ever since diagnoses and procedures were first coded and used for healthcare billing. These data are used in clinical decision-making, introducing a route for iatrogenesis. As we share data on regional Health Information Exchanges (HIEs) and include them in electronic health records, the potential for harm may increase. To study this problem, we applied rules-based data quality checks previously tested on Electronic Health Record (EHR) data to a limited set of aggregated claims data. Only Medicaid claims data were used. CMS has clear guidelines for claims submitted for Medicaid patients, and penalties are incurred for erroneous claims, which should ensure a high-quality data source; however, reports of low and varying sensitivity, specificity, and positive and negative predictive value of coded diagnoses are common. To identify data quality defects in claims data in a state All Payer Claims Dataset (APCD), we applied and evaluated a recently developed rules-based data quality assessment and monitoring system for EHR data to test its effectiveness on claims data. The rules that are feasible for All Payer Claims data and Medicaid data are identified and applied, and the resulting data quality issues are reported.

Data Accuracy , Electronic Health Records , Databases, Factual , Humans , Information Storage and Retrieval , Medicaid , United States
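
A rules-based data quality check of the kind described in the abstract above can be sketched as a list of named predicates applied to each claim. The rules, field names, and record layout below are hypothetical illustrations and are not taken from the system evaluated in the paper.

```python
from datetime import date
from typing import Callable, Dict, List, Tuple

Claim = Dict[str, object]

# Each rule pairs a name with a predicate that returns True when the claim
# violates the rule. All three rules are illustrative assumptions.
RULES: List[Tuple[str, Callable[[Claim], bool]]] = [
    ("missing_diagnosis_code", lambda c: not c.get("diagnosis_code")),
    ("service_before_birth", lambda c: c["service_date"] < c["birth_date"]),
    ("negative_billed_amount", lambda c: c["billed_amount"] < 0),
]


def check_claims(claims: List[Claim]) -> List[Tuple[str, str]]:
    """Return (claim_id, rule_name) for every rule violation found."""
    issues = []
    for claim in claims:
        for name, is_violated in RULES:
            if is_violated(claim):
                issues.append((claim["claim_id"], name))
    return issues


if __name__ == "__main__":
    sample = [
        {"claim_id": "c1", "diagnosis_code": "E11.9", "billed_amount": 120.0,
         "birth_date": date(1980, 5, 1), "service_date": date(2019, 3, 2)},
        {"claim_id": "c2", "diagnosis_code": "", "billed_amount": -5.0,
         "birth_date": date(1990, 1, 1), "service_date": date(1989, 12, 31)},
    ]
    print(check_claims(sample))
```

Keeping the rules in a plain list makes it easy to record which ones are feasible for a given data source, which is the question the abstract raises for claims versus EHR data.
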
Stud Health Technol Inform ; 272: 425-428, 2020 Jun 26.
Article in English | MEDLINE


This paper reports on the early-stage development of an analytics framework to support the semantic integration of dynamic surveillance data across multiple scales to inform decision making for malaria eradication. We propose using the Semantic Web of Things (SWoT), a combination of Internet of Things (IoT) and semantic web technologies, to support the evolution and integration of dynamic malaria data sources and improve interoperability between different datasets generated through relevant IoT assets (e.g. computers, sensors, persons, and other smart objects and devices).

Semantic Web , Humans , Information Storage and Retrieval , Malaria/prevention & control , Primary Prevention
Stud Health Technol Inform ; 272: 55-58, 2020 Jun 26.
Article in English | MEDLINE


The automated detection of adverse events in medical records might be a cost-effective solution for patient safety management or pharmacovigilance. Our group previously proposed an information extraction algorithm (IEA) for detecting adverse events in neurosurgery using documents written in a morphologically rich natural language. In this paper, we aim to optimize and evaluate its performance for the detection of any extremity muscle weakness in clinical texts. Our algorithm achieves an accuracy of 0.96 and a ROC AUC of 0.96 and might be easily implemented in other medical domains.

Muscle Weakness , Natural Language Processing , Electronic Health Records , Humans , Information Storage and Retrieval , Pharmacovigilance
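
The two metrics reported in the abstract above, accuracy and ROC AUC, can be computed from document-level labels and classifier scores as follows. This is a minimal sketch on made-up data, not the authors' evaluation code; the function names and the 0.5 decision threshold in the usage example are assumptions. ROC AUC is computed here through its rank interpretation: the probability that a randomly chosen positive document scores higher than a randomly chosen negative one, with ties counted as one half.

```python
from typing import List, Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of documents whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via pairwise comparison of positive and negative scores."""
    pos: List[float] = [s for t, s in zip(y_true, scores) if t == 1]
    neg: List[float] = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical labels (1 = weakness mentioned) and classifier scores.
    labels = [1, 1, 0, 0, 1, 0]
    scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
    predictions = [1 if s >= 0.5 else 0 for s in scores]
    print(accuracy(labels, predictions), roc_auc(labels, scores))
```

Accuracy depends on the chosen threshold, while ROC AUC summarizes ranking quality across all thresholds, which is why the two are usually reported together.
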