1.
PLoS One ; 18(12): e0292356, 2023.
Article in English | MEDLINE | ID: mdl-38100453

ABSTRACT

Automatic biomedical relation extraction (bioRE) is an essential task in biomedical research for generating high-quality labelled data that can be used to develop innovative predictive methods. However, building fully labelled, high-quality bioRE data sets of adequate size for training state-of-the-art relation extraction models is hindered by an annotation bottleneck caused by limits on the time and expertise of researchers and curators. We show here how Active Learning (AL) plays an important role in resolving this issue and improves bioRE tasks, effectively overcoming the labelling limits inherent to a data set. Six different AL strategies are benchmarked on seven bioRE data sets, using PubMedBERT as the base model, and evaluated on their area under the learning curve (AULC) as well as on intermediate result measurements. The results demonstrate that uncertainty-based strategies, such as Least-Confident or Margin Sampling, perform statistically better in terms of F1-score, accuracy and precision than other types of AL strategies. However, in terms of recall, a diversity-based strategy called Core-set outperforms all other strategies. AL strategies are shown to reduce the annotation need (in order to reach a performance on par with training on all data) from 6% to 38%, depending on the data set, with the Margin Sampling and Least-Confident Sampling strategies moreover obtaining the best AULCs compared to the Random Sampling baseline. The experiments show the importance of using AL methods to reduce the amount of labelling needed to construct high-quality data sets that lead to optimal performance of deep learning models. The code and data sets to reproduce all the results presented in the article are available at https://github.com/oligogenic/Deep_active_learning_bioRE.
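
The two uncertainty-based strategies named in this abstract, and the AULC metric, can be illustrated with a minimal Python sketch. This is not the authors' implementation (that is in the repository linked above); the function names, the toy probabilities and the normalisation of the AULC by the width of the x-axis are all assumptions made for illustration.

```python
def least_confident_scores(probs):
    """Uncertainty per example: 1 minus the highest predicted class probability."""
    return [1.0 - max(p) for p in probs]


def margin_scores(probs):
    """Gap between the two highest class probabilities (smaller = more uncertain)."""
    scores = []
    for p in probs:
        top = sorted(p, reverse=True)
        scores.append(top[0] - top[1])
    return scores


def select_batch(probs, k, strategy="least_confident"):
    """Return the indices of the k unlabelled examples to annotate next."""
    if strategy == "least_confident":
        s = least_confident_scores(probs)
        order = sorted(range(len(probs)), key=lambda i: -s[i])  # most uncertain first
    elif strategy == "margin":
        s = margin_scores(probs)
        order = sorted(range(len(probs)), key=lambda i: s[i])   # smallest margin first
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return order[:k]


def aulc(fractions, scores):
    """Trapezoidal area under a learning curve, divided by the x-range (assumed normalisation)."""
    area = sum(
        (fractions[i + 1] - fractions[i]) * (scores[i] + scores[i + 1]) / 2.0
        for i in range(len(fractions) - 1)
    )
    return area / (fractions[-1] - fractions[0])


# Hypothetical model outputs for three unlabelled sentences (two classes each).
toy_probs = [[0.9, 0.1], [0.55, 0.45], [0.7, 0.3]]
print(select_batch(toy_probs, 1, "least_confident"))
print(select_batch(toy_probs, 2, "margin"))
```

In an active learning loop, the selected indices would be sent to annotators, the model retrained on the enlarged labelled set, and the score at each step appended to the learning curve passed to `aulc`.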


Subjects
Biomedical Research , Data Accuracy , Area Under Curve
2.
BMC Bioinformatics ; 24(1): 179, 2023 May 01.
Article in English | MEDLINE | ID: mdl-37127601

ABSTRACT

BACKGROUND: The prediction of potentially pathogenic variant combinations in patients remains a key task in medical genetics for the understanding and detection of oligogenic/multilocus diseases. Models tailored towards such cases can help narrow the gap of missing diagnoses and can aid researchers in dealing with the high complexity of the derived data. The predictor VarCoPP (Variant Combinations Pathogenicity Predictor), published in 2019, identified potentially pathogenic variant combinations in gene pairs (bilocus variant combinations) and was the first important step in this direction. Despite its usefulness and applicability, several issues remained that hindered better performance, such as its False Positive (FP) rate, the quality of its training set and its complex architecture.
RESULTS: We present VarCoPP2.0, the successor of VarCoPP: a simplified, faster and more accurate predictive model that identifies potentially pathogenic bilocus variant combinations. Results from cross-validation and on independent data sets reveal that VarCoPP2.0 has improved in terms of both sensitivity (95% in cross-validation and 98% during testing) and specificity (5% FP rate). At the same time, its running time shows a significant 150-fold decrease due to the selection of a simpler Balanced Random Forest model. Its positive training set now consists of variant combinations that are more confidently linked with evidence of pathogenicity, based on the confidence scores present in OLIDA, the Oligogenic Diseases Database ( https://olida.ibsquare.be ). The improvement in its performance is also attributed to a more careful selection of up-to-date features identified via an original wrapper method. We show that combining different variant and gene pair features is important for predictions, highlighting the usefulness of integrating biological information at different levels.
CONCLUSIONS: Through its improved performance and faster execution time, VarCoPP2.0 enables a more accurate analysis of larger data sets linked to oligogenic diseases. Users can access the ORVAL platform ( https://orval.ibsquare.be ) to apply VarCoPP2.0 to their data.
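
For readers less familiar with the metrics quoted in this abstract, the short Python sketch below shows how sensitivity, FP rate and specificity relate to confusion-matrix counts. The counts are hypothetical, chosen only so that the rates match the 95% sensitivity and 5% FP rate reported above; they are not taken from the article.

```python
def sensitivity(tp, fn):
    """Fraction of truly pathogenic combinations that are predicted pathogenic."""
    return tp / (tp + fn)


def false_positive_rate(fp, tn):
    """Fraction of neutral combinations that are wrongly predicted pathogenic."""
    return fp / (fp + tn)


def specificity(fp, tn):
    """Specificity is the complement of the false positive rate."""
    return 1.0 - false_positive_rate(fp, tn)


# Hypothetical counts for illustration only (not data from the study).
tp, fn = 95, 5   # pathogenic combinations: correctly / incorrectly classified
fp, tn = 5, 95   # neutral combinations: incorrectly / correctly classified

print(sensitivity(tp, fn))
print(false_positive_rate(fp, tn))
print(specificity(fp, tn))
```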

3.
HGG Adv ; 4(1): 100165, 2023 01 12.
Article in English | MEDLINE | ID: mdl-36578772

ABSTRACT

Although standards and guidelines for the interpretation of variants identified in genes that cause Mendelian disorders have been developed, this is not the case for more complex genetic models including variant combinations in multiple genes. During a large curation process conducted on 318 research articles presenting oligogenic variant combinations, we encountered several recurring issues concerning their proper reporting and pathogenicity assessment. These mainly concern the absence of strong evidence that refutes a monogenic model and the lack of a proper genetic and functional assessment of the joint effect of the involved variants. With the increasing accumulation of such cases, it has become essential to develop standards and guidelines on how these oligogenic/multilocus variant combinations should be interpreted, validated, and reported in order to provide high-quality data and supporting evidence to the scientific community.


Subjects
Software , Virulence
4.
Database (Oxford) ; 2022, 2022 04 12.
Article in English | MEDLINE | ID: mdl-35411390

ABSTRACT

Improving the understanding of the oligogenic nature of diseases requires access to high-quality, well-curated Findable, Accessible, Interoperable, Reusable (FAIR) data. Although first steps were taken with the development of the Digenic Diseases Database, leading to novel computational advancements to assist the field, these came with a number of limitations, for instance the ad hoc curation protocol and the inclusion of only digenic cases. The OLIgogenic diseases DAtabase (OLIDA) presents a novel, transparent and rigorous curation protocol, introducing a confidence scoring mechanism for the published oligogenic literature. Applying this protocol to the oligogenic literature generated a new repository containing 916 oligogenic variant combinations linked to 159 distinct diseases. Information extracted from the scientific literature is supplemented with current knowledge obtained from public databases. Each entry is an oligogenic combination linked to a disease, labelled with a confidence score based on the level of genetic and functional evidence that supports its involvement in this disease. These scores allow users to assess the relevance and proof of pathogenicity of each oligogenic combination in the database, constituting markers for improving the reporting of disease-causing oligogenic variant combinations. OLIDA follows the FAIR principles, providing detailed documentation, easy data access through its application programming interface and website, use of unique identifiers and links to existing ontologies. DATABASE URL: https://olida.ibsquare.be.


Subjects
Software , Controlled Vocabulary , Factual Databases
5.
Genes Cells ; 25(1): 22-32, 2020 Jan.
Article in English | MEDLINE | ID: mdl-31680384

ABSTRACT

DNA methylation controls gene expression, and once established, DNA methylation patterns are faithfully copied during DNA replication by the maintenance DNA methyltransferase Dnmt1. In vivo, Dnmt1 interacts with Uhrf1, which recognizes hemimethylated CpGs. Recently, we reported that Uhrf1-catalyzed K18- and K23-ubiquitinated histone H3 binds to the N-terminal region (the replication focus targeting sequence, RFTS) of Dnmt1 to stimulate its methyltransferase activity. However, it is not yet fully understood how ubiquitinated histone H3 stimulates Dnmt1 activity. Here, we show that monoubiquitinated histone H3 stimulates Dnmt1 activity toward DNA with multiple hemimethylated CpGs but not toward DNA with only a single hemimethylated CpG, suggesting an influence of ubiquitination on the processivity of Dnmt1. The Dnmt1 activity stimulated by monoubiquitinated histone H3 was additively enhanced by the Uhrf1 SRA domain, which also binds to RFTS. Thus, Dnmt1 activity is regulated by catalysis (ubiquitination)-dependent and -independent functions of Uhrf1.


Subjects
DNA (Cytosine-5-)-Methyltransferase 1/genetics , DNA (Cytosine-5-)-Methyltransferase 1/metabolism , Histones/metabolism , CCAAT-Enhancer-Binding Proteins/genetics , DNA/metabolism , DNA (Cytosine-5-)-Methyltransferases/genetics , DNA (Cytosine-5-)-Methyltransferases/metabolism , DNA Methylation , DNA Replication , Histones/physiology , Humans , Protein Binding , Ubiquitin/metabolism , Ubiquitin-Protein Ligases/metabolism , Ubiquitination
6.
Artif Intell Med ; 99: 101690, 2019 08.
Article in English | MEDLINE | ID: mdl-31606112

ABSTRACT

To gain insight into oligogenic disorders, understanding those involving bi-locus variant combinations appears to be key. In prior work, we showed that features at multiple biological scales can already be used to discriminate between two types, i.e. disorders involving true digenic and modifier combinations. The current study expands this machine learning work towards dual molecular diagnosis cases, providing a classifier able to effectively distinguish between these three types. To reach this goal and gain an in-depth understanding of the decision process, game theory and tree decomposition techniques are applied to random forest predictors to investigate the relevance of feature combinations in the prediction. A machine learning model with high discrimination capabilities was developed, effectively differentiating the three classes in a biologically meaningful manner. Combining prediction interpretation and statistical analysis, we propose a biologically meaningful characterization of each class relying on specific feature strengths. Determining how biological characteristics shift samples towards one of the three classes provides clinically relevant insight into the underlying biological processes as well as the disease itself.
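
The abstract does not say which game-theoretic technique was used. Shapley values are one common game-theoretic way to attribute a prediction to individual features, and the Python sketch below computes them exactly for a toy value function as an illustration of the general idea. It is an assumption that this resembles the study's approach; the feature names `a`, `b`, `c` and the value function are hypothetical.

```python
from itertools import permutations


def shapley_values(features, value):
    """Exact Shapley values: average marginal contribution over all feature orderings."""
    totals = {f: 0.0 for f in features}
    orderings = list(permutations(features))
    for order in orderings:
        coalition = frozenset()
        for f in order:
            with_f = coalition | {f}
            # Marginal contribution of f when added to the current coalition.
            totals[f] += value(with_f) - value(coalition)
            coalition = with_f
    return {f: total / len(orderings) for f, total in totals.items()}


def toy_value(coalition):
    """Hypothetical model output given the set of 'known' features."""
    score = 0.0
    if "a" in coalition:
        score += 1.0
    if "b" in coalition:
        score += 2.0
    if "a" in coalition and "b" in coalition:
        score += 1.0  # interaction between a and b, split evenly by Shapley
    return score


print(shapley_values(["a", "b", "c"], toy_value))
```

The exact computation enumerates every ordering and therefore only scales to a handful of features; tree-specific approximations are normally used for real random forests.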


Subjects
Game Theory , Genetic Predisposition to Disease/genetics , Machine Learning , Multifactorial Inheritance/genetics , Decision Trees , Humans
7.
Nucleic Acids Res ; 47(W1): W93-W98, 2019 07 02.
Article in English | MEDLINE | ID: mdl-31147699

ABSTRACT

A tremendous amount of DNA sequencing data is being produced around the world with the ambition to capture in more detail the mechanisms underlying human diseases. While numerous bioinformatics tools exist that allow the discovery of causal variants in Mendelian diseases, little to no support is provided to do the same for variant combinations, an essential task for the discovery of the causes of oligogenic diseases. ORVAL (the Oligogenic Resource for Variant AnaLysis), which is presented here, provides an answer to this problem by focusing on generating networks of candidate pathogenic variant combinations in gene pairs, as opposed to isolated variants in unique genes. This online platform integrates innovative machine learning methods for combinatorial variant pathogenicity prediction with visualization techniques, offering several interactive and exploratory tools, such as pathogenic gene and protein interaction networks, a ranking of pathogenic gene pairs, as well as visual mappings of the cellular location and pathway information. ORVAL is the first web-based exploration platform dedicated to identifying networks of candidate pathogenic variant combinations with the sole ambition to help in uncovering oligogenic causes for patients that cannot rely on the classical disease analysis tools. ORVAL is available at https://orval.ibsquare.be.


Subjects
Inborn Genetic Diseases/genetics , Genetic Predisposition to Disease , Multifactorial Inheritance/genetics , Software , Computational Biology , Inborn Genetic Diseases/diagnosis , Humans , Mutation/genetics , DNA Sequence Analysis
8.
Proc Natl Acad Sci U S A ; 116(24): 11878-11887, 2019 06 11.
Article in English | MEDLINE | ID: mdl-31127050

ABSTRACT

Notwithstanding important advances in the context of single-variant pathogenicity identification, novel breakthroughs in discerning the origins of many rare diseases require methods able to identify more complex genetic models. We present here the Variant Combinations Pathogenicity Predictor (VarCoPP), a machine-learning approach that identifies pathogenic variant combinations in gene pairs (called digenic or bilocus variant combinations). We show that the results produced by this method are highly accurate and precise, an efficacy that is confirmed when validating the method on recently published independent disease-causing data. Confidence labels of 95% and 99% are identified, representing the probability of a bilocus combination being a true pathogenic result, providing geneticists with rational markers to evaluate the most relevant pathogenic combinations and limit the search space and time. Finally, VarCoPP has been designed to act as an interpretable method that can explain why a bilocus combination is predicted as pathogenic and which biological information is important for that prediction. This work provides an important step toward the genetic understanding of rare diseases, paving the way to clinical knowledge and improved patient care.


Subjects
Genetic Predisposition to Disease/genetics , Genetic Variation/genetics , Rare Diseases/genetics , Genetic Markers/genetics , Humans
9.
Stem Cells Dev ; 24(22): 2674-86, 2015 Nov 15.
Article in English | MEDLINE | ID: mdl-26192274

ABSTRACT

Facioscapulohumeral muscular dystrophy (FSHD) is associated with an activation of the double homeobox 4 (DUX4) gene, which we previously identified within the D4Z4 repeated elements in the 4q35 subtelomeric region. The pathological DUX4 mRNA is derived from the most distal D4Z4 unit and extends unexpectedly within the flanking pLAM region, which provides an intron and polyadenylation signal. The conditions that are required to develop FSHD are a permissive allele providing the polyadenylation signal and hypomethylation of the D4Z4 repeat array compared with the healthy muscle. The DUX4 protein is a 52-kDa transcription factor that initiates a large gene deregulation cascade leading to muscle atrophy, inflammation, differentiation defects, and oxidative stress, which are the key features of FSHD. DUX4 is a retrogene that is normally expressed in germline cells and is submitted to repeat-induced silencing in adult tissues. Since DUX4 mRNAs have been detected in human embryonic and induced pluripotent stem cells, we investigated whether they could also be expressed in human mesenchymal stromal cells (hMSCs). We found that DUX4 mRNAs were induced during the differentiation of hMSCs into osteoblasts and that this process involved DUX4 and new longer protein forms (58 and 70 kDa). A DUX4 mRNA with a more distant 5' start site was characterized that presented a 60-codon reading frame extension and encoded the 58-kDa protein. Transfections of hMSCs with an antisense oligonucleotide targeting DUX4 mRNAs decreased both the 52- and 58-kDa protein levels and confirmed their identity. Gain- and loss-of-function experiments in hMSCs suggested these DUX4 proteins had opposite roles in osteogenic differentiation as evidenced by the alkaline phosphatase activity and calcium deposition. Differentiation was delayed by the 58-kDa DUX4 expression and it was increased by 52-kDa DUX4. These data indicate a role for DUX4 protein forms in the osteogenic differentiation of hMSCs.


Subjects
Cell Differentiation , Homeodomain Proteins/genetics , Mesenchymal Stem Cells/cytology , Osteogenesis , Alkaline Phosphatase/metabolism , Calcium/metabolism , Cell Line , Cultured Cells , Homeodomain Proteins/metabolism , Humans , Mesenchymal Stem Cells/metabolism , Osteoblasts/cytology , Osteoblasts/metabolism , Messenger RNA/genetics , Messenger RNA/metabolism