Results 1 - 5 of 5
1.
Clin Pharmacol Ther ; 115(4): 786-794, 2024 04.
Article in English | MEDLINE | ID: mdl-38140747

ABSTRACT

Natural language processing (NLP) is a branch of artificial intelligence that combines computational linguistics, machine learning, and deep learning models to process human language. Although NLP usage has surged across various industries in recent years, NLP has not been widely evaluated or used to support drug development. To demonstrate how advanced NLP can expedite the extraction and analysis of information to help address clinical pharmacology questions, inform clinical trial designs, and support drug development, three use cases are described in this article: (1) dose optimization strategy in oncology, (2) common covariates on pharmacokinetic (PK) parameters in oncology, and (3) physiologically based PK (PBPK) analyses for regulatory review and product label. The NLP workflow includes (1) preparation of source files, (2) NLP model building, and (3) automation of data extraction. The Clinical Pharmacology and Biopharmaceutics Summary Basis of Approval (SBA) documents, US package inserts (USPI), and approval letters from the US Food and Drug Administration (FDA) were used as source data. As demonstrated in the three use cases, advanced NLP can expedite the extraction and analysis of large amounts of information from regulatory review documents to help address important clinical pharmacology questions. Although it has not been widely adopted, integrating advanced NLP into the clinical pharmacology workflow can increase efficiency in extracting impactful information to advance drug development.
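The three workflow steps named in this abstract (source preparation, model building, automated extraction) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the document snippets are invented, the names `DOCUMENTS`, `DOSE_PATTERN`, `extract_doses`, and `run_extraction` are hypothetical, and a fixed regular expression stands in for the NLP model, which the abstract does not describe.

```python
import re

# Step 1 (assumed): prepared source text. These snippets are invented
# placeholders, not excerpts from any real label or review document.
DOCUMENTS = {
    "drug_a": "The recommended dosage is 200 mg orally once daily. "
              "Body weight was a significant covariate on clearance.",
    "drug_b": "Administer 5 mg/kg intravenously every 2 weeks.",
}

# Step 2 (assumed): a fixed pattern stands in for a trained NLP model.
# It matches a number followed by a dose unit, trying "mg/kg" before "mg".
DOSE_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*(mg/kg|mg)\b")


def extract_doses(text):
    """Return (amount, unit) pairs for every dose mention in the text."""
    return [(float(amount), unit) for amount, unit in DOSE_PATTERN.findall(text)]


def run_extraction(documents):
    """Step 3: apply the extractor to every document in one pass."""
    return {name: extract_doses(text) for name, text in documents.items()}


if __name__ == "__main__":
    for name, doses in run_extraction(DOCUMENTS).items():
        print(name, doses)
```

A pattern-based extractor is far cruder than the models the abstract refers to; it is used here only to show where each workflow step would sit.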


Subject(s)
Natural Language Processing , Pharmacology, Clinical , Humans , Artificial Intelligence , Electronic Health Records , Machine Learning
2.
J Bioinform Comput Biol ; 7(2): 373-88, 2009 Apr.
Article in English | MEDLINE | ID: mdl-19340921

ABSTRACT

Non-coding RNAs (ncRNAs) are transcripts that do not code for proteins. Recent findings have shown that RNA-mediated regulatory mechanisms influence a substantial portion of typical microbial genomes. We present an efficient method for finding potential ncRNAs in bacteria by clustering genomic sequences based on homology inferred from both primary sequence and secondary structure. We evaluate our approach using a set of predominantly Firmicutes sequences. Our results showed that, although homology search based on primary sequence alone was inaccurate for diverged ncRNA sequences, our clustering method allowed us to infer motifs that recovered nearly all members of most known ncRNA families. Hence, our method shows promise for discovering new families of ncRNA.
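The general idea of grouping sequences by pairwise similarity can be sketched as follows. This is not the authors' method: the abstract infers homology from both primary sequence and secondary structure, whereas this sketch substitutes a simple k-mer Jaccard similarity and single-linkage clustering. The function names, the threshold, and the example sequences are all hypothetical.

```python
def kmers(seq, k=3):
    """Return the set of overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(sequences, threshold=0.5, k=3):
    """Single-linkage clustering: link any pair whose k-mer similarity
    meets the threshold, then report the connected groups."""
    names = list(sequences)
    parent = {name: name for name in names}

    def find(name):
        # Union-find lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    profiles = {name: kmers(sequences[name], k) for name in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if jaccard(profiles[a], profiles[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    # Invented toy sequences, not real ncRNA data.
    toy = {"s1": "ACGUACGUAC", "s2": "ACGUACGUAA", "s3": "GGGCCCGGGC"}
    print(cluster(toy))
```

Sequence-only similarity of this kind is exactly what the abstract reports as inaccurate for diverged ncRNAs, which is why the published method also uses secondary structure.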


Subject(s)
Chromosome Mapping/methods , Cluster Analysis , Genome/genetics , RNA, Untranslated/genetics , Sequence Analysis, RNA/methods
3.
Nucleic Acids Res ; 35(14): 4809-19, 2007.
Article in English | MEDLINE | ID: mdl-17621584

ABSTRACT

We applied a computational pipeline based on comparative genomics to bacteria, and identified 22 novel candidate RNA motifs. We predicted six to be riboswitches, which are mRNA elements that regulate gene expression on binding a specific metabolite. In separate studies, we confirmed that two of these are novel riboswitches. Three other riboswitch candidates are upstream of either a putative transporter gene in the order Lactobacillales, citric acid cycle genes in Burkholderiales or molybdenum cofactor biosynthesis genes in several phyla. The remaining riboswitch candidate, the widespread Genes for the Environment, for Membranes and for Motility (GEMM) motif, is associated with genes important for natural competence in Vibrio cholerae and the use of metal ions as electron acceptors in Geobacter sulfurreducens. Among the other motifs, one has a genetic distribution similar to a previously published candidate riboswitch, ykkC/yxkD, but has a different structure. We identified possible non-coding RNAs in five phyla, and several additional cis-regulatory RNAs, including one in epsilon-proteobacteria (upstream of purD, involved in purine biosynthesis), and one in Cyanobacteria (within an ATP synthase operon). These candidate RNAs add to the growing list of RNA motifs involved in multiple cellular processes, and suggest that many additional RNAs remain to be discovered.


Subject(s)
Genomics/methods , RNA, Bacterial/chemistry , Regulatory Sequences, Ribonucleic Acid , Sequence Analysis, RNA/methods , Base Sequence , Computational Biology , Consensus Sequence , Genome, Bacterial , Molecular Sequence Data , Nucleic Acid Conformation , RNA, Messenger/chemistry , RNA, Untranslated/chemistry
4.
Sci Transl Med ; 11(489), 2019 04 24.
Article in English | MEDLINE | ID: mdl-31019026

ABSTRACT

By informing timely targeted treatments, rapid whole-genome sequencing can improve the outcomes of seriously ill children with genetic diseases, particularly infants in neonatal and pediatric intensive care units (ICUs). The need for highly qualified professionals to decipher results, however, precludes widespread implementation. We describe a platform for population-scale, provisional diagnosis of genetic diseases with automated phenotyping and interpretation. Genome sequencing was expedited by bead-based genome library preparation directly from blood samples and sequencing of paired 100-nt reads in 15.5 hours. Clinical natural language processing (CNLP) automatically extracted children's deep phenomes from electronic health records with 80% precision and 93% recall. In 101 children with 105 genetic diseases, a mean of 4.3 CNLP-extracted phenotypic features matched the expected phenotypic features of those diseases, compared with a match of 0.9 phenotypic features used in manual interpretation. We automated provisional diagnosis by combining the ranking of the similarity of a patient's CNLP phenome with respect to the expected phenotypic features of all genetic diseases, together with the ranking of the pathogenicity of all of the patient's genomic variants. Automated, retrospective diagnoses concurred well with expert manual interpretation (97% recall and 99% precision in 95 children with 97 genetic diseases). Prospectively, our platform correctly diagnosed three of seven seriously ill ICU infants (100% precision and recall) with a mean time saving of 22:19 hours. In each case, the diagnosis affected treatment. Genome sequencing with automated phenotyping and interpretation in a median of 20:10 hours may increase adoption in ICUs and, thereby, timely implementation of precise treatments.
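The precision and recall figures quoted in this abstract follow the standard definitions: precision is the share of extracted items that are correct, and recall is the share of expected items that were extracted. A minimal sketch, assuming set-valued inputs; the function name and the example phenotype terms are hypothetical and not taken from the study.

```python
def precision_recall(predicted, expected):
    """Return (precision, recall) for two collections of labels.

    precision = true positives / number predicted
    recall    = true positives / number expected
    """
    predicted, expected = set(predicted), set(expected)
    true_positives = len(predicted & expected)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    return precision, recall


if __name__ == "__main__":
    # Invented example: five extracted features, four of them expected.
    extracted = {"seizures", "hypotonia", "fever", "jaundice", "cough"}
    expected = {"seizures", "hypotonia", "fever", "jaundice"}
    print(precision_recall(extracted, expected))  # (0.8, 1.0)
```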


Subject(s)
Diabetic Ketoacidosis/genetics , Genomics/methods , Electronic Health Records , Female , Humans , Intensive Care Units/statistics & numerical data , Natural Language Processing , Retrospective Studies
5.
Appl Environ Microbiol ; 72(8): 5239-45, 2006 Aug.
Article in English | MEDLINE | ID: mdl-16885271

ABSTRACT

Short nucleotide sequence repetitions in DNA can provide selective benefits and also can be a source of genetic instability arising from deletions guided by pairing between misaligned strands. These findings raise the question of how the frequency of deletion mutations is influenced by the length of sequence repetitions and by the distance between them. An experimental approach to this question was presented by the heat-sensitive phenotype conferred by pcaG1102, a 30-bp deletion in one of the structural genes for Acinetobacter baylyi protocatechuate 3,4-dioxygenase, which is required for growth with quinate. The original pcaG1102 deletion appears to have been guided by pairing between slipped DNA strands from nearby repeated sequences in wild-type pcaG. Placement of an in-phase termination codon between the repeated sequences in pcaG prevents growth with quinate and permits selection of sequence-guided deletions that excise the codon and permit quinate to be used as a growth substrate at room temperature. Natural transformation facilitated introduction of 68 different variants of the wild-type repeat structure within pcaG into the A. baylyi chromosome, and the frequency of deletion between the repetitions was determined with a novel method, precision plating. The deletion frequency increases with repeat length, decreases with the distance between repeats, and requires a minimum amount of similarity to occur at measurable rates. Deletions occurred in a recA-deficient background. Their frequency was unaffected by deficiencies in mutS and was increased by inactivation of recG.


Subject(s)
Acinetobacter/genetics , DNA, Bacterial/genetics , DNA, Single-Stranded/genetics , Mutation , Sequence Deletion , Acinetobacter/enzymology , Acinetobacter/growth & development , Base Sequence , Culture Media , DNA, Bacterial/metabolism , DNA, Single-Stranded/metabolism , Escherichia coli/genetics , Plasmids/genetics , Protocatechuate-3,4-Dioxygenase/genetics , Protocatechuate-3,4-Dioxygenase/metabolism , Repetitive Sequences, Nucleic Acid/genetics , Reproducibility of Results