Results 1 - 20 of 26
1.
J Comput Biol ; 31(6): 486-497, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38837136

ABSTRACT

Automatic radiology report generation is a necessary development of artificial intelligence technology in health care. This technology helps doctors produce comprehensive diagnostic reports, alleviating the heavy workloads of medical professionals. However, generating radiology reports poses two challenges: (1) visual and textual data biases and (2) the long-distance dependency problem. To tackle these issues, we design a visual recalibration and gating enhancement network (VRGE), which is composed of a visual recalibration module and a gating enhancement module (GEM). Specifically, the visual recalibration module enhances the recognition of abnormal features in lesion areas of medical images. The GEM dynamically adjusts the contextual information in the report by introducing gating mechanisms, focusing on capturing professional medical terminology in medical text reports. We conducted extensive experiments on the public IU X-Ray dataset showing that VRGE outperforms existing models.
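The abstract does not give the GEM's equations. As a generic illustration only, a common form of gating computes a sigmoid gate from a state and its context and uses that gate to mix the two. The sketch below is an assumption, not VRGE's architecture: it uses element-wise weights `w_x`, `w_c` and bias `b` (hypothetical names) in place of learned weight matrices, so that it runs without dependencies.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function, mapping any real number into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # safe: z < 0, so exp(z) cannot overflow
    return ez / (1.0 + ez)


def gated_fusion(
    x: Sequence[float],
    c: Sequence[float],
    w_x: Sequence[float],
    w_c: Sequence[float],
    b: Sequence[float],
) -> List[float]:
    """Mix a state vector x with a context vector c, one element at a time.

    For element i the gate is g_i = sigmoid(w_x[i]*x[i] + w_c[i]*c[i] + b[i]).
    g_i near 1 keeps x[i]; g_i near 0 lets the context value c[i] through instead.
    """
    if not (len(x) == len(c) == len(w_x) == len(w_c) == len(b)):
        raise ValueError("all inputs must have the same length")
    out = []
    for xi, ci, wxi, wci, bi in zip(x, c, w_x, w_c, b):
        g = sigmoid(wxi * xi + wci * ci + bi)
        out.append(g * xi + (1.0 - g) * ci)
    return out


# With zero weights and bias the gate is exactly 0.5, so the output is the average.
print(gated_fusion([1.0, 2.0], [3.0, 4.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]))  # [2.0, 3.0]
```

In a trained network the gate parameters would be learned matrices; the point here is only how a value in (0, 1) decides, per element, how much context is passed on.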


Subject(s)
Artificial Intelligence , Humans , Radiology/methods , Algorithms
2.
Patterns (N Y) ; 5(4): 100946, 2024 Apr 12.
Article in English | MEDLINE | ID: mdl-38645766

ABSTRACT

Data bias is a major concern in biomedical research, especially when evaluating large-scale observational datasets. It leads to imprecise predictions and inconsistent estimates in standard regression models. We compare the performance of commonly used bias-mitigating approaches (resampling, algorithmic, and post hoc approaches) against a synthetic data-augmentation method that utilizes sequential boosted decision trees to synthesize under-represented groups. The approach is called synthetic minority augmentation (SMA). Through simulations and analysis of real health datasets on a logistic regression workload, the approaches are evaluated across various bias scenarios (types and severity levels). Performance was assessed based on area under the curve, calibration (Brier score), precision of parameter estimates, confidence interval overlap, and fairness. Overall, SMA produces the closest results to the ground truth in low to medium bias (50% or less missing proportion). In high bias (80% or more missing proportion), the advantage of SMA is not obvious, with no specific method consistently outperforming others.
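Two of the evaluation metrics named above have standard definitions that fit in a few lines. The sketch below is not the authors' evaluation code: it computes the Brier score (mean squared difference between predicted probabilities and 0/1 outcomes, where lower is better) and the area under the ROC curve through its rank interpretation (the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counting one half). The toy labels and scores are made up.

```python
from typing import Sequence


def brier_score(y_true: Sequence[int], p_pred: Sequence[float]) -> float:
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    if len(y_true) != len(p_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


def auc(y_true: Sequence[int], p_pred: Sequence[float]) -> float:
    """ROC AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [p for y, p in zip(y_true, p_pred) if y == 1]
    neg = [p for y, p in zip(y_true, p_pred) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for pp in pos:
        for pn in neg:
            if pp > pn:
                wins += 1.0
            elif pp == pn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


y = [1, 0, 1, 0]
p = [0.9, 0.2, 0.3, 0.4]
print(brier_score(y, p))  # (0.01 + 0.04 + 0.49 + 0.16) / 4, about 0.175
print(auc(y, p))          # 3 of 4 positive-negative pairs ranked correctly: 0.75
```

The pairwise AUC loop is quadratic in sample size, which is fine for illustration; a sort-based rank computation gives the same value faster on large datasets.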

3.
Ecol Evol ; 14(2): e10857, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38304273

ABSTRACT

Tracking the state of biodiversity over time is critical to successful conservation, but conventional monitoring schemes tend to be insufficient to adequately quantify how species' abundances and distributions are changing. One solution to this issue is to leverage data generated by citizen scientists, who collect vast quantities of data at temporal and spatial scales that cannot be matched by most traditional monitoring methods. However, the quality of citizen science data can vary greatly. In this paper, we develop three metrics (inventory completeness, range completeness, spatial bias) to assess the adequacy of spatial observation data. We explore the adequacy of citizen science data at the species level for Australia's terrestrial native birds and then model these metrics against a suite of seven species traits (threat status, taxonomic uniqueness, body mass, average count, range size, species density, and human population density) to identify predictors of data adequacy. We find that citizen science data adequacy for Australian birds is increasing across two of our metrics (inventory completeness and range completeness), but not spatial bias, which has worsened over time. Relationships between the three metrics and seven traits we modelled were variable, with only two traits having consistently significant relationships across the three metrics. Our results suggest that although citizen science data adequacy has generally increased over time, there are still gaps in the spatial adequacy of citizen science for monitoring many Australian birds. Despite these gaps, citizen science can play an important role in biodiversity monitoring by providing valuable baseline data that may be supplemented by information collected through other methods. 
We believe the metrics presented here constitute an easily applied approach to assessing the utility of citizen science datasets for biodiversity analyses, allowing researchers to identify and prioritise regions or species with lower data adequacy that will benefit most from targeted monitoring efforts.

4.
Heliyon ; 10(2): e24164, 2024 Jan 30.
Article in English | MEDLINE | ID: mdl-38288010

ABSTRACT

Advanced synthetic data generators can simulate data samples that closely resemble sensitive personal datasets while significantly reducing the risk of individual identification. The use of these advanced generators holds enormous potential in the medical field, as it allows for the simulation and sharing of sensitive patient data. This enables the development and rigorous validation of novel AI technologies for accurate diagnosis and efficient disease management. Despite the availability of massive ground truth datasets (such as UK-NHS databases that contain millions of patient records), the risk of biases being carried over to data generators still exists. These biases may arise from the under-representation of specific patient cohorts due to cultural sensitivities within certain communities or standardised data collection procedures. Machine learning models can exhibit bias in various forms, including the under-representation of certain groups in the data. This can lead to missing data and inaccurate correlations and distributions, which may also be reflected in synthetic data. Our paper aims to improve synthetic data generators by introducing probabilistic approaches to first detect difficult-to-predict data samples in ground truth data and then boost them when applying the generator. In addition, we explore strategies to generate synthetic data that can reduce bias and, at the same time, improve the performance of predictive models.

5.
Front Artif Intell ; 6: 1203546, 2023.
Article in English | MEDLINE | ID: mdl-37795496

ABSTRACT

The increasing human population and variable weather conditions, due to climate change, pose a threat to the world's food security. To improve global food security, we need to provide breeders with tools to develop crop cultivars that are more resilient to extreme weather conditions and provide growers with tools to more effectively manage biotic and abiotic stresses in their crops. Plant phenotyping, the measurement of a plant's structural and functional characteristics, has the potential to inform, improve and accelerate both breeders' selections and growers' management decisions. To improve the speed, reliability and scale of plant phenotyping procedures, many researchers have adopted deep learning methods to estimate phenotypic information from images of plants and crops. Despite the successful results of these image-based phenotyping studies, the representations learned by deep learning models remain difficult to interpret, understand, and explain. For this reason, deep learning models are still considered to be black boxes. Explainable AI (XAI) is a promising approach for opening the deep learning model's black box and providing plant scientists with image-based phenotypic information that is interpretable and trustworthy. Although various fields of study have adopted XAI to advance their understanding of deep learning models, it has yet to be well studied in the context of plant phenotyping research. In this review article, we review existing XAI studies in plant shoot phenotyping, as well as related domains, to help plant researchers understand the benefits of XAI and make it easier for them to integrate XAI into their future studies. An elucidation of the representations within a deep learning model can help researchers explain the model's decisions, relate the features detected by the model to the underlying plant physiology, and enhance the trustworthiness of image-based phenotypic information used in food production systems.

6.
Stud Health Technol Inform ; 302: 428-432, 2023 May 18.
Article in English | MEDLINE | ID: mdl-37203710

ABSTRACT

Over the last decade, the explosion of "Big Data" and its fusion with AI has led many to believe that the development and integration of AI systems in healthcare will usher in a transformative revolution that democratises access to high-quality healthcare and collectively improves patient outcomes. However, the nature of market forces in the evolving data economy has started to show evidence that the opposite is more likely to be true. This paper argues that there is a poorly understood "Inverse Data Law" that will exacerbate the widening health divide between affluent and marginalised communities because: (1) data used to train AI systems favour individuals who are already engaged with healthcare, who have the lowest burden of disease but the highest purchasing power; and (2) data used to drive market decisions around investment in AI health technology favour tools that increase the commodification of healthcare through over-testing, over-diagnosis, and the acute and episodic management of disease, over tools that support patients in preventing disease. This dangerous combination is more likely to cripple efforts towards preventative medicine, as data collection and utilisation tend to be inversely proportional to the needs of the patients served: the inverse data law. The paper concludes by introducing important methodological considerations in the design and evaluation of AI systems to promote systems improvement for marginalised users.


Subject(s)
Artificial Intelligence , Big Data , Humans , Delivery of Health Care , Quality of Health Care , Data Collection
7.
Trends Parasitol ; 39(4): 238-241, 2023 04.
Article in English | MEDLINE | ID: mdl-36803860

ABSTRACT

War is an understudied and yet significant contributor to disease outbreaks, necessitating approaches incorporating conflicts into disease studies. We discuss mechanisms by which war affects disease dynamics, and supply an illustrative example. Lastly, we provide relevant data sources and pathways for incorporating metrics of armed conflict into disease ecology.


Subject(s)
Armed Conflicts , Communicable Diseases , Disease Outbreaks , Africa South of the Sahara/epidemiology , Ecology , Zoonoses/epidemiology
8.
Neotrop Entomol ; 52(1): 46-56, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36508148

ABSTRACT

Citizen science is a valuable tool for the early detection of invasive alien species (IAS) and for tracking their distribution and spread. Nevertheless, citizen science initiatives have several potential biases and may be complemented with long-term structured monitoring schemes. We analyzed the spatial-temporal dynamics of the invasion of Harmonia axyridis (Pallas) (Coleoptera: Coccinellidae) in Chile, based on two citizen science databases (WEB and INAT) and one structured monitoring database (SAG). We collected 8638 H. axyridis occurrences between 2009 and 2020. WEB had more records than SAG and INAT, and in all databases the number of records increased over time. The three databases showed that the invasion started in central Chile and then spread toward the north and south. WEB and SAG recorded occurrences in the extreme north and south, whereas INAT occurrences were concentrated in a more limited area that was also covered by WEB and SAG. Both citizen science initiatives concentrated their records in areas with large human populations, whereas SAG records were more evenly distributed across regions. By 2020, WEB accounted for 55%, SAG for 54%, and INAT for 8% of the total area accumulated with H. axyridis, with only 16% of the area shared among databases. WEB and INAT obtained most of their records in urban and industrial land cover types, while SAG records were more evenly represented across land cover types. Our results confirm that combined methods, including citizen science initiatives, a national surveillance system, and localized sampling, complement each other in providing knowledge to understand the patterns, processes, and consequences of this invasion.
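The area percentages above sum to more than 100% (55% + 54% + 8% = 117%) because the databases overlap. A minimal sketch of that bookkeeping, assuming occurrences have been mapped to equal-area grid cells (the abstract does not say how area was computed, and the cell IDs below are hypothetical): each database's share is its occupied cells divided by the union of all occupied cells. The abstract also does not say whether "shared" means present in all three databases; this sketch counts cells present in all three.

```python
from typing import Dict, Set, Tuple


def area_shares(records: Dict[str, Set[str]]) -> Tuple[Dict[str, float], float]:
    """Return each database's share of the total occupied area and the share common to all.

    records maps a database name to the set of grid-cell IDs where it has records.
    Cells are assumed to be equal-area, so cell counts are proportional to area.
    """
    total = set().union(*records.values())
    if not total:
        raise ValueError("no occupied cells")
    shares = {name: len(cells) / len(total) for name, cells in records.items()}
    common = set.intersection(*records.values())
    return shares, len(common) / len(total)


toy = {
    "WEB": {"c1", "c2", "c3", "c4"},
    "SAG": {"c3", "c4", "c5", "c6"},
    "INAT": {"c4"},
}
shares, shared_by_all = area_shares(toy)
print(shares)         # WEB and SAG each cover 4 of 6 cells, INAT 1 of 6
print(shared_by_all)  # only c4 is common to all three: 1 of 6 cells
```

Because each share is taken against the same union, the shares can legitimately add up to more than 1 whenever databases record the same cells.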


Subject(s)
Citizen Science , Coleoptera , Humans , Animals , Introduced Species , Chile
9.
Stud Health Technol Inform ; 294: 327-331, 2022 May 25.
Article in English | MEDLINE | ID: mdl-35612086

ABSTRACT

Multimorbidity, the diagnosis of two or more chronic conditions, increases as people age. It is a predictor used in clinical decision-making, but underdiagnosis in underserved populations produces bias in the data that support algorithms used in healthcare processes. Artificial intelligence (AI) systems could produce inaccurate predictions if patients have multiple unknown conditions. Rural patients are more likely to be underserved and also more likely to have multiple chronic conditions. In this study, which used data collected during the course of care at a centrally located academic hospital, multimorbidity decreased with rurality. This decrease suggests a bias against rural patients for algorithms that rely on diagnosis information to calculate risk. To test whether preprocessing can address bias in healthcare data, we measured the amount of discrimination in favor of metropolitan patients in the classification of multimorbidity. We built a model using the biased data to establish optimal classification performance. A new unbiased training data set and model were then created and tested against unaltered validation data. The new model's classification performance on unaltered data did not diverge significantly from that of the initial optimal model trained on the biased data, suggesting that bias can be removed with preprocessing.
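The abstract names neither its discrimination measure nor its preprocessing method. One common choice, assumed here purely for illustration, is the discrimination score (the difference in positive-label rates between the favored and disfavored groups) together with reweighing, which gives each (group, label) combination the weight P(group) * P(label) / P(group, label) so that group and label become independent in the weighted data. The group names, labels, and counts below are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple

Row = Tuple[str, int]  # (group, label); here label 1 = multimorbid, 0 = not (hypothetical)


def positive_rate(rows: Sequence[Row], group: str, weights: Optional[Sequence[float]] = None) -> float:
    """(Weighted) share of rows in `group` that carry label 1."""
    if weights is None:
        weights = [1.0] * len(rows)
    num = sum(w for (g, y), w in zip(rows, weights) if g == group and y == 1)
    den = sum(w for (g, _), w in zip(rows, weights) if g == group)
    return num / den


def discrimination(rows: Sequence[Row], favored: str, disfavored: str,
                   weights: Optional[Sequence[float]] = None) -> float:
    """Positive-rate difference between the favored and the disfavored group."""
    return positive_rate(rows, favored, weights) - positive_rate(rows, disfavored, weights)


def reweigh(rows: Sequence[Row]) -> List[float]:
    """Weight each row by P(group) * P(label) / P(group, label), estimated from counts."""
    n = len(rows)
    group_counts = Counter(g for g, _ in rows)
    label_counts = Counter(y for _, y in rows)
    cell_counts = Counter(rows)
    return [group_counts[g] * label_counts[y] / (n * cell_counts[(g, y)]) for g, y in rows]


rows = [("metro", 1)] * 4 + [("metro", 0)] * 2 + [("rural", 1)] * 1 + [("rural", 0)] * 3
print(discrimination(rows, "metro", "rural"))                # 4/6 - 1/4, about 0.417
print(discrimination(rows, "metro", "rural", reweigh(rows)))  # close to 0 after reweighing
```

Reweighing changes only how much each training record counts, so validation data can be left unaltered when comparing models.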


Subject(s)
Algorithms , Artificial Intelligence , Bias , Delivery of Health Care , Health Facilities , Humans
10.
Angew Chem Int Ed Engl ; 61(29): e202204647, 2022 07 18.
Article in English | MEDLINE | ID: mdl-35512117

ABSTRACT

Assessing the outcomes of chemical reactions quantitatively has been a cornerstone of all synthetic disciplines. Although this has classically been approached through empirical optimization, data-driven modelling has enormous potential to streamline the process. However, such predictive models require significant quantities of high-quality data, whose availability is limited; the main reasons include experimental errors and, importantly, human biases in experiment selection and result reporting. In a series of case studies, we investigate the impact of these biases on drawing general conclusions from chemical reaction data, revealing the critical importance of "negative" examples. Finally, case studies of data expansion approaches showcase directions to circumvent these limitations and demonstrate perspectives towards long-term improvement of data quality in chemistry.


Subject(s)
Machine Learning , Humans
11.
Plant Divers ; 44(2): 135-140, 2022 Mar.
Article in English | MEDLINE | ID: mdl-35505988

ABSTRACT

Several studies have shown that species lists generated from distribution occurrence records in the Global Biodiversity Information Facility (GBIF) are generally very incomplete and therefore unsuitable for ecological and biogeographic studies that require high sampling completeness. Nevertheless, Suissa et al. (2021) generated fern species lists from GBIF data for 100 km × 100 km grid cells across the world and used them to identify fern diversity hotspots and species richness-climate relationships. We evaluate the completeness of fern species lists derived from GBIF at the grid-cell scale and at a larger spatial scale, and determine whether fern data derived from GBIF are appropriate for studies relating species composition and richness to climatic variables. We show that the species sampling completeness of GBIF is low (<40%) for most of the grid cells examined, and that such low completeness can substantially bias investigations of geographic and ecological patterns of species diversity and the identification of diversity hotspots. We conclude that fern species lists derived from GBIF are generally very incomplete across a wide range of spatial scales and are not appropriate for studies that require highly complete species lists. We present a map showing global patterns of fern species diversity based on complete or nearly complete regional fern species lists.
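Sampling completeness at the grid-cell scale can be read as the fraction of a cell's reference (complete or nearly complete) species list that also appears in the GBIF-derived list. The sketch below assumes that definition and uses hypothetical cell and species names; the paper's exact computation may differ. GBIF species absent from the reference list are ignored here.

```python
from typing import Dict, Set


def sampling_completeness(gbif: Set[str], reference: Set[str]) -> float:
    """Share of the reference species list that the GBIF-derived list also contains."""
    if not reference:
        raise ValueError("reference list is empty")
    return len(gbif & reference) / len(reference)


def share_below(cells: Dict[str, Dict[str, Set[str]]], threshold: float = 0.4) -> float:
    """Fraction of cells whose completeness falls below `threshold` (0.4 = 40%)."""
    low = sum(
        1
        for lists in cells.values()
        if sampling_completeness(lists["gbif"], lists["reference"]) < threshold
    )
    return low / len(cells)


cells = {
    "cell_A": {"gbif": {"sp1", "sp2"}, "reference": {"sp1", "sp2", "sp3", "sp4", "sp5", "sp6"}},
    "cell_B": {"gbif": {"sp1", "sp7", "sp9"}, "reference": {"sp1", "sp7", "sp8"}},
}
for name, lists in cells.items():
    print(name, sampling_completeness(lists["gbif"], lists["reference"]))
print(share_below(cells))  # cell_A (2/6) is below 40%, cell_B (2/3) is not: 0.5
```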

12.
Inf Syst Front ; : 1-25, 2022 Mar 22.
Article in English | MEDLINE | ID: mdl-35342331

ABSTRACT

Humanitarian crises, such as the 2014 West Africa Ebola epidemic, challenge information management and thereby threaten the digital resilience of the responding organizations. Crisis information management (CIM) is characterised by the urgency to respond despite the uncertainty of the situation. Coupled with high stakes, limited resources and a high cognitive load, crises are prone to induce biases in the data and the cognitive processes of analysts and decision-makers. When biases remain undetected and untreated in CIM, they may lead to decisions based on biased information, increasing the risk of an inefficient response. Literature suggests that crisis response needs to address the initial uncertainty and possible biases by adapting to new and better information as it becomes available. However, we know little about whether adaptive approaches mitigate the interplay of data and cognitive biases. We investigated this question in an exploratory, three-stage experiment on epidemic response. Our participants were experienced practitioners in the fields of crisis decision-making and information analysis. We found that analysts fail to successfully debias data, even when biases are detected, and that this failure can be attributed to undervaluing debiasing efforts in favor of rapid results. This failure leads to the development of biased information products that are conveyed to decision-makers, who consequently make decisions based on biased information. Confirmation bias reinforces the reliance on conclusions reached with biased data, leading to a vicious cycle, in which biased assumptions remain uncorrected. We suggest mindful debiasing as a possible counter-strategy against these bias effects in CIM.

13.
Mol Phylogenet Evol ; 167: 107342, 2022 02.
Article in English | MEDLINE | ID: mdl-34785384

ABSTRACT

Analysing multiple genomic regions while incorporating detection and qualification of discordance among regions has become standard for understanding phylogenetic relationships. In plants, which usually have comparatively large genomes, this is feasible by combining reduced-representation library (RRL) methods with high-throughput sequencing, enabling the cost-effective acquisition of genomic data for thousands of loci from hundreds of samples. One popular RRL method is RADseq. A major disadvantage of established RADseq approaches is the rather short fragment and sequencing range, which yields loci with little individual phylogenetic information. This issue hampers the application of coalescent-based species tree inference. The modified RADseq protocol presented here, which targets ca. 5,000 loci of 300-600 nt length sequenced with the latest short-read sequencing (SRS) technology, has the potential to overcome this drawback. To illustrate the advantages of this approach, we use the study group Aichryson Webb & Berthelott (Crassulaceae), a plant genus that diversified on the Canary Islands. The data analysis approach used here aims at careful quality control of the long-loci dataset. It involves an informed selection of thresholds for accurate clustering; a thorough exploration of locus properties, such as locus length, coverage and variability, to identify potentially biased data; and a comparative phylogenetic inference of filtered datasets, accompanied by an evaluation of the resulting bootstrap (BS) support and gene and site concordance factor values, to improve the overall resolution of the resulting phylogenetic trees. The final dataset contains variable loci with an average length of 373 nt and facilitates species tree estimation using a coalescent-based summary approach. Additional improvements brought by the approach are critically discussed.


Subject(s)
Crassulaceae , Genome , Genomics/methods , High-Throughput Nucleotide Sequencing , Phylogeny
15.
J Clin Epidemiol ; 139: 264-268, 2021 11.
Article in English | MEDLINE | ID: mdl-34119647

ABSTRACT

A previous note illustrated how the odds of an outcome have an undesirable property for risk summarization and communication: noncollapsibility, defined as a failure of a group measure to represent a simple average of the measure over individuals or subgroups. The present sequel discusses how odds ratios amplify odds noncollapsibility and provides a basic numeric illustration of how noncollapsibility differs from confounding of effects (with which it is often confused). It also connects noncollapsibility to sparse-data bias in logistic, log-linear, and proportional-hazards regression.
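A small numeric illustration of noncollapsibility without confounding, using made-up risks rather than the note's own example: two equally sized strata, treatment assigned independently of stratum, and the same odds ratio of 4 in each stratum. Averaging the risks over strata gives a marginal odds ratio of about 3.45, smaller than 4 although nothing is confounded, whereas the risk difference (0.3 in each stratum) stays 0.3 marginally.

```python
def odds(p: float) -> float:
    """Convert a risk (probability) into odds."""
    return p / (1.0 - p)


def odds_ratio(p_treated: float, p_untreated: float) -> float:
    """Ratio of the odds under treatment to the odds without treatment."""
    return odds(p_treated) / odds(p_untreated)


# Hypothetical strata of equal size: (risk if treated, risk if untreated).
strata = [(0.8, 0.5), (0.5, 0.2)]

stratum_ors = [odds_ratio(p1, p0) for p1, p0 in strata]
stratum_rds = [p1 - p0 for p1, p0 in strata]

# Treatment is independent of stratum, so marginal risks are plain averages (no confounding).
marg_p1 = sum(p1 for p1, _ in strata) / len(strata)
marg_p0 = sum(p0 for _, p0 in strata) / len(strata)

print("stratum ORs:", stratum_ors)                      # each approximately 4
print("marginal OR:", odds_ratio(marg_p1, marg_p0))     # approximately 3.45 (169/49)
print("stratum RDs:", stratum_rds, "marginal RD:", marg_p1 - marg_p0)  # all approximately 0.3
```

The gap between 4 and 3.45 is pure noncollapsibility: adjusting for stratum changes the odds ratio even though stratum is unrelated to treatment, which is exactly what distinguishes it from confounding.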


Subject(s)
Biomedical Research/standards , Data Accuracy , Odds Ratio , Publication Bias/statistics & numerical data , Research Design/standards , Research Personnel/psychology , Biomedical Research/statistics & numerical data , Confounding Factors, Epidemiologic , Humans , Logistic Models , Research Design/statistics & numerical data
18.
Brief Bioinform ; 21(3): 791-802, 2020 05 21.
Article in English | MEDLINE | ID: mdl-31220208

ABSTRACT

Computational methods for target prediction, based on molecular similarity and network-based approaches, machine learning, docking and others, have evolved as valuable and powerful tools to aid the challenging task of mode of action identification for bioactive small molecules such as drugs and drug-like compounds. Critical to discerning the scope and limitations of a target prediction method is understanding how its performance was evaluated and reported. Ideally, large-scale prospective experiments are conducted to validate the performance of a model; however, this expensive and time-consuming endeavor is often not feasible. Therefore, to estimate the predictive power of a method, statistical validation based on retrospective knowledge is commonly used. There are multiple statistical validation techniques that vary in rigor. In this review we discuss the validation strategies employed, highlighting the usefulness and constraints of the validation schemes and metrics that are employed to measure and describe performance. We address the limitations of measuring only generalized performance, given that the underlying bioactivity and structural data are biased towards certain small-molecule scaffolds and target families, and suggest additional aspects of performance to consider in order to produce more detailed and realistic estimates of predictive power. Finally, we describe the validation strategies that were employed by some of the most thoroughly validated and accessible target prediction methods.
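One way to probe the scaffold bias described above is to compare a random split with a split grouped by scaffold, so that no scaffold appears in both the training and the test set. A minimal sketch, assuming each compound has already been assigned a scaffold label; the compound IDs and labels below are hypothetical, and in practice the labels might come from a Bemis-Murcko scaffold decomposition.

```python
import random
from typing import Callable, Hashable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def group_split(
    items: Sequence[T],
    group_of: Callable[[T], Hashable],
    test_fraction: float = 0.2,
    seed: int = 0,
) -> Tuple[List[T], List[T]]:
    """Split items so that each group (e.g. a scaffold) lies entirely in train or in test.

    test_fraction applies to the number of groups, so the share of items in the
    test set can differ from it when groups have different sizes.
    """
    groups = sorted({group_of(x) for x in items}, key=repr)  # deterministic order before shuffling
    rng = random.Random(seed)
    rng.shuffle(groups)
    n_test = max(1, round(len(groups) * test_fraction))
    test_groups = set(groups[:n_test])
    train = [x for x in items if group_of(x) not in test_groups]
    test = [x for x in items if group_of(x) in test_groups]
    return train, test


compounds = [
    ("cpd1", "scaffold_A"), ("cpd2", "scaffold_A"), ("cpd3", "scaffold_B"),
    ("cpd4", "scaffold_C"), ("cpd5", "scaffold_C"), ("cpd6", "scaffold_D"),
    ("cpd7", "scaffold_E"),
]
train, test = group_split(compounds, group_of=lambda c: c[1], test_fraction=0.4)
print(train)
print(test)
```

Performance that drops sharply from a random split to a scaffold-grouped split suggests the model is leaning on scaffold similarity rather than generalizing to new chemotypes.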


Subject(s)
Computational Biology/methods , Drug Discovery/methods , Humans , Reproducibility of Results , Small Molecule Libraries/chemistry , Small Molecule Libraries/pharmacology
19.
Cytokine ; 120: 191, 2019 08.
Article in English | MEDLINE | ID: mdl-31100683

ABSTRACT

The aim of this study was to highlight some methodological issues in a study that investigated the effect of granulocyte colony-stimulating factor on the development of aortitis.


Subject(s)
Aortitis , Drug-Related Side Effects and Adverse Reactions , Granulocyte Colony-Stimulating Factor , Humans , Japan
20.
Scand J Trauma Resusc Emerg Med ; 27(1): 37, 2019 Apr 05.
Article in English | MEDLINE | ID: mdl-30953532

ABSTRACT

The aim of this Letter to the Editor was to report some methodological shortcomings in a recently published article. We showed that the reported results are subject to sparse-data bias and presented some remedial tools, such as penalization approaches. In addition, the model fitting and performance raised some concerns. In conclusion, the results of this study should be interpreted with caution, and further reanalysis is necessary.
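The letter recommends penalization approaches, but the abstract does not say which. The simplest illustration of sparse-data bias, and of a penalized fix, is a 2x2 table with an empty cell: the maximum-likelihood odds ratio is infinite. Adding 0.5 to every cell (the Haldane-Anscombe correction) gives a finite estimate and, for a single binary exposure with no other covariates, coincides with the odds ratio from Firth's penalized likelihood. The counts below are hypothetical; with covariates, a full penalized regression is needed rather than this shortcut.

```python
import math


def odds_ratio_2x2(a: float, b: float, c: float, d: float, correction: float = 0.0) -> float:
    """Odds ratio from a 2x2 table, optionally adding `correction` to every cell.

    a = exposed cases, b = exposed non-cases, c = unexposed cases, d = unexposed non-cases.
    """
    a, b, c, d = (cell + correction for cell in (a, b, c, d))
    if b * c == 0:
        # Zero denominator: infinite estimate, or undefined if the numerator is also zero.
        return math.inf if a * d > 0 else math.nan
    return (a * d) / (b * c)


# Hypothetical sparse table: no exposed non-cases, so the crude estimate is infinite.
table = (5, 0, 10, 20)
print(odds_ratio_2x2(*table))                  # inf
print(odds_ratio_2x2(*table, correction=0.5))  # finite: (5.5 * 20.5) / (0.5 * 10.5)
```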


Subject(s)
Delivery, Obstetric/statistics & numerical data , Home Childbirth/statistics & numerical data , Infant, Newborn, Diseases/mortality , Premature Birth/mortality , Risk Assessment , Female , Follow-Up Studies , Humans , Infant, Newborn , Iran/epidemiology , Perinatal Mortality/trends , Pregnancy , Prospective Studies , Risk Factors