1.
Emerg Infect Dis ; 30(3): 560-563, 2024 Oct.
Article in English | MEDLINE | ID: mdl-38407162

ABSTRACT

Analysis of genome sequencing data from >100,000 genomes of Mycobacterium tuberculosis complex using TB-Annotator software revealed a previously unknown lineage, proposed name L10, in central Africa. Phylogenetic reconstruction suggests L10 could represent a missing link in the evolutionary and geographic migration histories of M. africanum.


Subjects
Biological Evolution , Mycobacterium , Phylogeny , Mycobacterium/genetics , Software , Africa, Central/epidemiology
2.
J Imaging ; 10(1)2024 Jan 08.
Article in English | MEDLINE | ID: mdl-38249003

ABSTRACT

Handwritten Text Recognition (HTR) is essential for digitizing historical documents in different kinds of archives. In this study, we introduce a hybrid-form archive written in French: the Belfort civil registers of births. The digitization of these historical documents is challenging due to their unique characteristics, such as writing-style variations, overlapping characters and words, and marginal annotations. The objective of this survey paper is to summarize research on handwritten text documents and to provide research directions toward effectively transcribing this French dataset. To achieve this goal, we present a brief survey of several modern and historical offline HTR systems for different international languages, as well as the top state-of-the-art contributions reported for French specifically. The survey classifies HTR systems by the techniques employed, datasets used, publication year, and level of recognition. Furthermore, an analysis of the systems' accuracies is presented, highlighting the best-performing approach. We also showcase the performance of some commercial HTR systems. In addition, this paper summarizes the publicly available HTR datasets, especially those identified as benchmark datasets in the International Conference on Document Analysis and Recognition (ICDAR) and the International Conference on Frontiers in Handwriting Recognition (ICFHR) competitions. This paper therefore presents updated state-of-the-art research in HTR and highlights new directions in the field.

3.
Tuberculosis (Edinb) ; 143S: 102374, 2023 12.
Article in English | MEDLINE | ID: mdl-38012920

ABSTRACT

The ever-increasing sequencing of Mycobacterium tuberculosis has made it possible to establish an advanced phylogeny of this bacterium. It currently includes 9 lineages mainly affecting humans, complemented by animal lineages, which together form the Mycobacterium tuberculosis complex. Inherited from various historical approaches, this phylogeny is now based on Single Nucleotide Polymorphisms (SNPs), for which updates are frequently proposed. We present here evidence that the task needs refinement: some lineages currently have suboptimal defining SNPs, and many sublineages still need to be named and characterized. These findings are based on a new tool specifically designed to index all existing sequencing data. In this article, we focus on lineages 4.5, 4.7, 6 and 7. We also take the opportunity to present evidence that TB-Annotator is highly relevant, identifying well-supported sublineages and showing good overall agreement with previous findings.


Subjects
Mycobacterium tuberculosis , Tuberculosis , Humans , Animals , Mycobacterium tuberculosis/genetics , Tuberculosis/diagnosis , Tuberculosis/genetics , Tuberculosis/microbiology , Phylogeny , Polymorphism, Single Nucleotide , Genotype
4.
Tuberculosis (Edinb) ; 143S: 102376, 2023 12.
Article in English | MEDLINE | ID: mdl-38012933

ABSTRACT

Mycobacterium tuberculosis complex (MTBC) has a population structure consisting of 9 human and animal lineages. The genomic diversity within these lineages is a pathogenesis factor that affects virulence, transmissibility, host response, and antibiotic resistance. Hence it is important to develop improved information systems for tracking and understanding the spread and evolution of genomes. We present results obtained with TB-Annotator, a new informatics platform for the computational biology of MTBC that uses a convenience sample from public/private SRAs. Version 1 was a first interactive, graphics-based web tool built on 15,901 representative genomes. Version 2, still interactive, is a more sophisticated database developed using the Snakemake Workflow Management System (WMS), which allows an unsupervised, global, and scalable analysis of the content of the Short Read Archive database of the US National Center for Biotechnology Information. This platform analyzes nucleotide variants, the presence/absence of genes, and known regions of difference; detects new deletions and the insertion sites of mobile genetic elements; and allows phylogenetic trees to be built, imported into a graphical interface, and analyzed interactively against the data. The objective of TB-Annotator is threefold: to detect recent epidemiological links, to reconstruct distant phylogeographical histories, and to perform more complex phenotypic/genotypic Genome-Wide Association Studies (GWAS). In this paper, we compare the various SNP-based taxonomic labels and hierarchies previously described in recent reference papers for L1, and present a comparative analysis that allows aliases to be identified and thus provides the basis of a future unified naming scheme for L1 sublineages.
We present a global phylogenetic tree built with RAxML-NG, and one for L2; at the time of writing, we had characterized about 200 sublineages, many of them new. A detailed tree for Modern L2 and a hierarchical scheme to facilitate L2 lineage assignment are also presented.
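A hierarchical, SNP-based lineage assignment of the kind mentioned above can be sketched as a walk down a tree of lineage-defining SNPs. This is a minimal illustration under stated assumptions, not TB-Annotator's implementation: the node names, genome positions, and alleles below are invented placeholders, and `LineageNode` and `assign_lineage` are hypothetical helpers.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Set, Tuple

Snp = Tuple[int, str]  # (genome position, alternative allele); example values are invented


@dataclass
class LineageNode:
    """A node in a hypothetical hierarchy of lineage-defining SNPs."""
    name: str
    defining_snps: FrozenSet[Snp]
    children: List["LineageNode"] = field(default_factory=list)


def assign_lineage(root: LineageNode, sample_snps: Set[Snp], min_fraction: float = 1.0) -> str:
    """Descend the hierarchy while a child's defining SNPs are found in the sample.

    At each level, follow the child with the highest fraction of its defining SNPs
    present, provided that fraction reaches `min_fraction`; return the deepest node.
    """
    current = root
    while True:
        best_fraction, best_child = 0.0, None
        for child in current.children:
            if not child.defining_snps:
                continue  # a node without defining SNPs cannot be confirmed
            fraction = len(child.defining_snps & sample_snps) / len(child.defining_snps)
            if fraction >= min_fraction and fraction > best_fraction:
                best_fraction, best_child = fraction, child
        if best_child is None:
            return current.name
        current = best_child


# Illustrative hierarchy: names, positions, and alleles are hypothetical.
lineage_x = LineageNode("X", frozenset({(1000, "T"), (2000, "G")}))
lineage_x.children = [
    LineageNode("X.1", frozenset({(3000, "C")})),
    LineageNode("X.2", frozenset({(4000, "A")})),
]
root = LineageNode("root", frozenset(), [lineage_x])

print(assign_lineage(root, {(1000, "T"), (2000, "G"), (4000, "A")}))  # X.2
```

Lowering `min_fraction` tolerates missing calls (for example, low-coverage positions) at the cost of more ambiguous assignments.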


Subjects
Mycobacterium tuberculosis , Tuberculosis , Humans , Animals , Mycobacterium tuberculosis/genetics , Tuberculosis/diagnosis , Tuberculosis/genetics , Tuberculosis/epidemiology , Phylogeny , Genome-Wide Association Study , Computational Biology
5.
PLoS Negl Trop Dis ; 17(10): e0011619, 2023 10.
Article in English | MEDLINE | ID: mdl-37824575

ABSTRACT

In this article, we provide an in-depth analysis of the phenotypic drug-resistance characteristics of a cohort of 325 tuberculosis and characterize by Whole Genome Sequencing 24 isolates from Nigeria belonging to L4, L5 and L6. Our results suggest an alarming rate of drug resistance in the L4.6.2.2 Mycobacterium tuberculosis complex (MTBC) lineage and a high diversity of L5. We combined these new Sequence Read Archives (SRAs) with previously published ones from available BioProjects run in Nigeria. We performed RAxML phylogenetic reconstructions on larger samples that include public NCBI SRAs from some neighboring countries (Cameroon, Ghana). To compare the phylogenetic reconstructions with metadata, we used a new proprietary database named TB-Annotator. We show that L5 genomes in Northern Nigeria belong to new clades, distinct from those described so far, which allows an update of the L5 taxonomy. In addition, we describe the L4.6.2.2 lineage in Nigeria, Cameroon and Ghana. We provide computations of the likely divergence time of L4.6.2.2 and suggest a new hypothesis concerning its origin. Finally, we provide a short overview of M. bovis diversity in Nigeria. This study provides baseline knowledge on the global genomic diversity, phylogeography and phylodynamics of MTBC in Nigeria, as well as on the natural history of this largely ignored but densely populated African country. These results highlight the need for sequencing additional MTBC genomes in Nigeria and, more generally, in West Africa, for both public health and academic reasons. The likely replacement of L5-L6 by L4.6.2.2 isolates potentially leaves little time to gather knowledge on the ancient history of tuberculosis in West Africa.


Subjects
Mycobacterium tuberculosis , Tuberculosis , Humans , Cameroon , Genotype , Ghana/epidemiology , Nigeria , Phylogeny , Tuberculosis/epidemiology , Tuberculosis/microbiology
6.
Sensors (Basel) ; 23(16)2023 Aug 09.
Article in English | MEDLINE | ID: mdl-37631575

ABSTRACT

With the proliferation of IoT devices, ensuring the security and privacy of these devices and their associated data has become a critical challenge. In this paper, we propose a federated sampling and lightweight intrusion-detection system for IoT networks that uses K-means to sample network traffic and identify anomalies in a semi-supervised way. The system is designed to preserve data privacy by performing local clustering on each device and sharing only summary statistics with a central aggregator. The proposed system is particularly suitable for resource-constrained IoT devices, such as sensors with limited computational and storage capabilities. We evaluate the system's performance using the publicly available NSL-KDD dataset. Our experiments and simulations demonstrate the effectiveness and efficiency of the proposed intrusion-detection system, highlighting the trade-offs between precision and recall when statistics are shared between the workers and the coordinator. Notably, our experiments show that the proposed federated IDS can increase the true-positive rate by up to 10% when the workers and the coordinator collaborate.
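The privacy idea described above (local clustering, with only summary statistics leaving each device) can be illustrated with one common federated K-means scheme: workers return per-cluster coordinate sums and counts, and the coordinator turns them into new centroids. This is a sketch under that assumption, not the paper's protocol; the function names, the distance threshold for flagging anomalies, and the toy 2-D data are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def nearest(point: Point, centroids: Sequence[Point]) -> Tuple[int, float]:
    """Return the index of the closest centroid and the Euclidean distance to it."""
    distances = [math.dist(point, c) for c in centroids]
    index = min(range(len(distances)), key=distances.__getitem__)
    return index, distances[index]


def local_summary(data: Sequence[Point], centroids: Sequence[Point]):
    """Worker side: per-cluster coordinate sums and counts; raw points stay local."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for point in data:
        index, _ = nearest(point, centroids)
        counts[index] += 1
        for d in range(dim):
            sums[index][d] += point[d]
    return sums, counts


def aggregate(summaries, centroids: Sequence[Point]) -> List[Point]:
    """Coordinator side: merge worker summaries into updated global centroids."""
    dim = len(centroids[0])
    total_sums = [[0.0] * dim for _ in centroids]
    total_counts = [0] * len(centroids)
    for sums, counts in summaries:
        for j, n in enumerate(counts):
            total_counts[j] += n
            for d in range(dim):
                total_sums[j][d] += sums[j][d]
    updated = []
    for j, n in enumerate(total_counts):
        # A centroid that received no points on any worker keeps its previous position.
        updated.append(tuple(s / n for s in total_sums[j]) if n else tuple(centroids[j]))
    return updated


def federated_kmeans(workers_data, centroids, rounds: int = 10) -> List[Point]:
    """Alternate local summaries and central aggregation for a fixed number of rounds."""
    for _ in range(rounds):
        summaries = [local_summary(data, centroids) for data in workers_data]
        centroids = aggregate(summaries, centroids)
    return centroids


def is_anomaly(point: Point, centroids: Sequence[Point], threshold: float) -> bool:
    """Flag a traffic record whose distance to the nearest centroid exceeds `threshold`."""
    return nearest(point, centroids)[1] > threshold


# Two hypothetical workers holding toy 2-D feature vectors.
workers = [[(0.0, 0.0), (0.0, 1.0), (10.0, 10.0)], [(1.0, 0.0), (10.0, 11.0), (11.0, 10.0)]]
centroids = federated_kmeans(workers, [(1.0, 1.0), (9.0, 9.0)], rounds=3)
print(is_anomaly((50.0, 50.0), centroids, threshold=3.0))  # True
```

Only `k` sums and `k` counts per worker cross the network in each round, which is what keeps the scheme light enough for constrained devices.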

7.
Sci Rep ; 13(1): 11368, 2023 07 13.
Article in English | MEDLINE | ID: mdl-37443186

ABSTRACT

Bacterial strain-types in the Mycobacterium tuberculosis complex underlie tuberculosis disease and have been associated with drug resistance, transmissibility, virulence, and host-pathogen interactions. Spoligotyping was developed as a molecular genotyping technique to determine strain-types, though recent advances in whole genome sequencing (WGS) technology have led to their characterization using SNP-based sub-lineage nomenclature. Nevertheless, spoligotyping remains an important tool, and there is a need to study the congruence between spoligotype-based and SNP-based sub-lineage assignment. To achieve this, an in silico spoligotype prediction method ("Spolpred2") was developed and integrated into TB-Profiler. Lineage and spoligotype predictions were generated for >28,000 isolates, and the overlap between strain-types was characterized. The major spoligotype families detected were Beijing (25.6%), T (18.6%), LAM (13.1%), CAS (9.4%), and EAI (8.3%), and these broadly followed known geographic distributions. Most spoligotypes were perfectly correlated with the main MTBC lineages (L1-L7, plus animal). Conversely, at lower levels of the sub-lineage system, the relationship breaks down, with only 65% of spoligotypes being perfectly associated with a sub-lineage at the second or subsequent levels of the hierarchy. Our work supports the use of spoligotyping (membrane-based or WGS-based) for low-resolution surveillance, and of WGS or SNP-based systems for higher-resolution studies.
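The "perfectly associated" measure above can be read as: for each distinct spoligotype, check whether all isolates carrying it fall in a single sub-lineage once names are truncated to a given hierarchy level. The sketch below assumes that reading and dotted sub-lineage names; the record values are invented, and the helpers are hypothetical, not part of TB-Profiler.

```python
from collections import defaultdict
from typing import Iterable, Tuple


def truncate(sublineage: str, level: int) -> str:
    """Keep the first `level` components of a dotted name, e.g. '4.1.2' -> '4.1' at level 2."""
    return ".".join(sublineage.split(".")[:level])


def perfect_association_rate(records: Iterable[Tuple[str, str]], level: int) -> float:
    """Fraction of distinct spoligotypes whose isolates all fall in one truncated sub-lineage."""
    lineages = defaultdict(set)
    for spoligotype, sublineage in records:
        lineages[spoligotype].add(truncate(sublineage, level))
    if not lineages:
        return 0.0
    perfect = sum(1 for found in lineages.values() if len(found) == 1)
    return perfect / len(lineages)


# Hypothetical (spoligotype, sub-lineage) pairs, one per isolate.
records = [("S1", "2.2.1"), ("S1", "2.2.1"), ("S2", "4.1.2"), ("S2", "4.8"), ("S3", "4.3.3")]
print(perfect_association_rate(records, level=1))  # 1.0: every spoligotype maps to one main lineage
print(perfect_association_rate(records, level=2))  # S2 splits across 4.1 and 4.8
```

Computing the rate level by level reproduces the qualitative pattern reported in the abstract: agreement is high at the main-lineage level and drops as the hierarchy deepens.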


Subjects
Mycobacterium tuberculosis , Tuberculosis , Humans , Tuberculosis/microbiology , Bacterial Typing Techniques , Drug Resistance , Beijing , Genotype
8.
Entropy (Basel) ; 25(3)2023 Mar 03.
Article in English | MEDLINE | ID: mdl-36981334

ABSTRACT

We investigate the extent to which a two-level quantum system subjected to an external time-dependent drive can be characterized by supervised learning. We apply this approach to the case of bang-bang control and the estimation of the offset and the final distance to a given target state. For any control protocol, the goal is to find the mapping between the offset and the distance. This mapping is interpolated using a neural network. The estimate is global in the sense that no a priori knowledge is required on the relation to be determined. Different neural network algorithms are tested on a series of data sets. We show that the mapping can be reproduced with very high precision in the direct case when the offset is known, while obstacles appear in the indirect case starting from the distance to the target. We point out the limits of the estimation procedure with respect to the properties of the mapping to be interpolated. We discuss the physical relevance of the different results.

9.
Genes (Basel) ; 13(12)2022 12 10.
Article in English | MEDLINE | ID: mdl-36553596

ABSTRACT

The spoligotype is a graphical description of the CRISPR locus present in Mycobacterium tuberculosis, which has the particularity of having only 68 possible spacers. The spoligotype, which can easily be obtained either in vitro or in silico, provides summary information on lineage, or even on antibiotic resistance (when known to be associated with a particular cluster), at a lower cost. The objective of this article is to show that this representation is richer than it seems and has so far been under-exploited. We first recall an original way to represent spoligotypes as points in the plane, which highlights possible sub-lineages, particularities of the animal strains, etc. This graphical representation shows clusters and a skeleton in the form of a graph, which led us to view spoligotypes as vertices of a disconnected directed graph. In this paper, we therefore propose a detailed graph-based description of the variety of spoligotypes, and we show to what extent such a description can be informative.
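One way to turn observed spoligotypes into a directed graph is to draw an edge u -> v when v could arise from u by spacer loss alone (v's present spacers form a strict subset of u's), keeping only covering edges so that intermediate observed types are not skipped. This is an assumption made for illustration, not necessarily the construction used in the article; the 6-spacer strings are shortened placeholders (real spoligotypes use 43 or 68 spacers), and the helper names are hypothetical.

```python
from itertools import permutations
from typing import FrozenSet, Iterable, Set, Tuple


def present_spacers(spoligotype: str) -> FrozenSet[int]:
    """1-based indices of spacers marked present ('1') in a binary spoligotype string."""
    return frozenset(i for i, c in enumerate(spoligotype, start=1) if c == "1")


def spoligotype_graph(spoligotypes: Iterable[str]) -> Set[Tuple[str, str]]:
    """Directed edges u -> v where v can arise from u by spacer loss alone.

    Only covering edges are kept: u -> v is dropped when another observed
    spoligotype w satisfies v < w < u, so the path u -> w -> v is used instead.
    """
    spacers = {s: present_spacers(s) for s in spoligotypes}  # duplicates collapse here
    edges = set()
    for u, v in permutations(spacers, 2):
        if spacers[v] < spacers[u] and not any(
            spacers[v] < spacers[w] < spacers[u] for w in spacers
        ):
            edges.add((u, v))
    return edges


# Shortened, hypothetical 6-spacer patterns.
print(sorted(spoligotype_graph(["111111", "111011", "110011", "011111"])))
```

Because spacers are only ever lost in this model, patterns with no observed common "ancestor" end up in separate components, which is one way a disconnected graph can arise.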


Subjects
Mycobacterium tuberculosis , Tuberculosis , Animals , Tuberculosis/genetics , Tuberculosis/microbiology , Mycobacterium tuberculosis/genetics
10.
Neural Comput Appl ; 34(12): 10117-10132, 2022.
Article in English | MEDLINE | ID: mdl-35250179

ABSTRACT

In some countries, such as France, the number of operations carried out by firefighters has increased almost linearly over the years, whereas their resource capacity has not. For this reason, predicting the number of interventions has become a necessity. Initially, time series models were developed with several types of qualitative and quantitative features, including the alert level of weather bulletins, to predict the operational load. We found that interventions related to human activities are quite predictable. However, recognizing interventions due to rare events such as storms or floods requires more than quantitative meteorological data, since there are almost always zero such cases. Thus, this work proposes applying natural language processing techniques, namely long short-term memory, convolutional neural networks, FlauBERT, and CamemBERT, to extract features from the texts of weather bulletins in order to recognize periods of peak interventions, in which the intense workload of firefighters is caused by rare events. Four categories, Emergency Person Rescue, Total Person Rescue, Heating-related interventions, and Storm/Flood, were the targets of the multilabel classification models developed. The results showed accuracies of 80%, 86%, 92%, and 86% for Emergency Person Rescue, Total Person Rescue, Heating, and Storm/Flood, respectively.
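The abstract reports one accuracy per category but does not define the metric. A plausible reading, assumed here, is per-label binary accuracy: each period gets a 0/1 flag per category, and accuracy is computed separately for each label. The `per_label_accuracy` helper and the toy labels below are hypothetical.

```python
from typing import Dict, Sequence

# Category names from the abstract; the 0/1 encoding per period is an assumption.
LABELS = ("Emergency Person Rescue", "Total Person Rescue", "Heating", "Storm/Flood")


def per_label_accuracy(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]],
                       labels: Sequence[str] = LABELS) -> Dict[str, float]:
    """Binary accuracy computed independently for each label of a multilabel task."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    return {
        label: sum(t[j] == p[j] for t, p in zip(y_true, y_pred)) / len(y_true)
        for j, label in enumerate(labels)
    }


# Two hypothetical periods, each with one peak/no-peak flag per category.
print(per_label_accuracy([[1, 0, 0, 1], [0, 0, 1, 0]], [[1, 1, 0, 0], [0, 0, 1, 0]]))
```

With rare-event labels such as Storm/Flood, per-label accuracy can look high even for a model that always predicts "no peak", so precision and recall per label are worth reporting alongside it.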

12.
Cancer ; 128(3): 519-528, 2022 02 01.
Article in English | MEDLINE | ID: mdl-34605020

ABSTRACT

BACKGROUND: No study has focused on the economic burden in non-Hodgkin lymphoma (NHL) survivors, even though this knowledge is essential. This study reports on health care resource use and associated health care costs, as well as related factors, in a series of 1671 French long-term NHL survivors. METHODS: Health care costs were measured from the payer perspective. Only direct medical costs (medical consultations, outpatient treatments, hospitalizations, and medical transport) in the past 12 months were included (reference year 2015). Multiple linear regression was used to search for explanatory factors of health care costs. RESULTS: In total, 1100 survivors (66%) reported having used at least 1 health care resource, and 867 (52%) reported having used at least 1 outpatient treatment. After the authors accounted for missing data, the mean health care cost was estimated at €702 ± €2221. Hospitalizations and outpatient treatments were the main cost drivers. Sensitivity analyses confirmed the robustness of the results. For the 1100 survivors who reported using at least 1 health care resource, the mean health care cost was €1067 ± €2268. Several factors demonstrated statistically significant relationships with health care costs. For instance, cardiovascular disorders increased costs by 66% ± 16%. In contrast, rituximab or autologous stem cell transplantation as initial therapy had no effect on health care costs. CONCLUSIONS: The consideration of economic constraints in health care is now a reality. This retrospective study provides a better understanding of health care resource use and associated health care costs, as well as related factors. It may help health care professionals in their ongoing efforts to design person-centered health care pathways.
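Effects such as "cardiovascular disorders increased costs by 66%" are what a linear regression on log-transformed costs yields, since each coefficient then acts multiplicatively on cost. The abstract does not state that costs were log-transformed (nor how zero costs were handled), so the derivation below is an assumption about how such percentages are typically obtained, with $C_i$ the cost of survivor $i$ and $x_{ik}$ a binary factor:

```latex
\ln C_i = \beta_0 + \sum_{k=1}^{p} \beta_k x_{ik} + \varepsilon_i
\quad\Longrightarrow\quad
\frac{C_i \mid x_{ik}=1}{C_i \mid x_{ik}=0} = e^{\beta_k},
\qquad
\%\Delta_k = 100\,\bigl(e^{\beta_k} - 1\bigr)

\%\Delta_k = 66 \;\Longrightarrow\; \beta_k = \ln(1.66) \approx 0.507
```

Under this reading, the ratio compares two survivors who differ only in $x_{ik}$, with all other covariates held fixed.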


Subjects
Hematopoietic Stem Cell Transplantation , Lymphoma, Non-Hodgkin , Lymphoma , Cross-Sectional Studies , Financial Stress , Health Care Costs , Humans , Lymphoma, Non-Hodgkin/therapy , Retrospective Studies , Survivors , Transplantation, Autologous
13.
Appl Soft Comput ; 109: 107561, 2021 Sep.
Article in English | MEDLINE | ID: mdl-34899108

ABSTRACT

When ambulances' turnaround time (TT) in emergency departments is prolonged, it not only affects the victim severely but also makes emergency medical service (EMS) resources unavailable and, consequently, leaves a locality unprotected. This problem may worsen in abnormal situations, e.g., the current coronavirus disease 2019 (COVID-19) pandemic. Taking this into consideration, this paper presents a first study of the impact of COVID-19 on ambulances' TT by analyzing historical data from the Departmental Fire and Rescue Service of the Doubs (SDIS 25), in France, for three hospitals. Because the TTs of SDIS 25 ambulances increased, this paper also calculated and analyzed the number of service breakdowns, which increased because too few ambulances returned to service in time. It is therefore vital to have a decision-support tool that better reallocates resources by knowing how long EMS ambulances and personnel will be in use. Thus, this paper proposes a novel two-stage methodology based on machine learning (ML) models to forecast the TT of each ambulance at a given time and hospital. The first stage uses a multivariate model of regularly spaced time series to predict the average TT (AvTT) per hour, considering temporal variables and external ones (e.g., COVID-19 statistics, weather data). The second stage uses a multivariate irregularly spaced time series model, which takes as inputs the temporal variables of each ambulance departure, the type of intervention, external variables, and the previously predicted AvTT. Four state-of-the-art ML models were considered, namely Light Gradient Boosting Machine, Multilayer Perceptron, Long Short-Term Memory, and Prophet. As shown in the results, the proposed methodology performed well for practical purposes. The AvTT accuracies obtained for the three hospitals were 90.16%, 97.02%, and 93.09%, and the TT accuracies were 74.42%, 86.63%, and 76.67%, all with an error margin of ± 10 min.
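An "accuracy with an error margin of ± 10 min" is naturally read as the share of forecasts whose absolute error is at most 10 minutes. The sketch below assumes that definition, which the abstract implies but does not spell out; the function name and the turnaround times are hypothetical.

```python
from typing import Sequence


def accuracy_within_tolerance(actual: Sequence[float], predicted: Sequence[float],
                              tolerance: float = 10.0) -> float:
    """Share of predictions whose absolute error is at most `tolerance` (minutes)."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(abs(a - p) <= tolerance for a, p in zip(actual, predicted))
    return hits / len(actual)


# Hypothetical turnaround times in minutes: errors are 5, 13, 8 and 0.
print(accuracy_within_tolerance([30, 45, 60, 90], [35, 58, 52, 90]))  # 0.75
```

The same function applies to both stages: to hourly AvTT values in the first stage and to individual ambulance TTs in the second.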

14.
PLoS Comput Biol ; 17(3): e1008500, 2021 03.
Article in English | MEDLINE | ID: mdl-33667225

ABSTRACT

Mycobacterium tuberculosis complex (MTC) CRISPR locus diversity has long been studied solely by investigating the presence/absence of a known set of spacers. Unveiling the genetic mechanisms of its evolution requires a more exhaustive reconstruction across a large number of representative strains. In this article, we point out, and resolve with a new pipeline, the problem of reconstructing CRISPR loci directly from short-read sequences in M. tuberculosis. We first show that the process we set up, which we call "CRISPRbuilder-TB" (https://github.com/cguyeux/CRISPRbuilder-TB), allows efficient reconstruction of simulated or real CRISPRs, even when they include complex evolutionary steps such as insertions of mobile elements. Compared with more generalist tools, the whole process is much more precise and robust, and requires only minimal manual investigation. Second, we show that more than 1/3 of the complete genomes currently available for this complex in public databases contain largely erroneous CRISPR loci. Third, we highlight how both the classical experimental in vitro approach and the basic in silico spoligotyping provided by existing analytic tools miss a whole diversity of this locus in MTC, by not capturing duplications, spacer and direct repeat variants, and IS6110 insertion locations. This description is extended in a second article that describes MTC-CRISPR diversity and suggests general rules for its evolution. This work opens perspectives for an in-depth exploration of M. tuberculosis CRISPR locus diversity and of the mechanisms involved in its evolution and functionality, as well as for its adaptation to other bacterial species harboring a CRISPR locus.
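The "basic in silico spoligotyping" contrasted above can be sketched as counting the reads that contain each known spacer on either strand and calling the spacer present above a read-count threshold. This is a generic illustration of that baseline, not CRISPRbuilder-TB; the spacer sequences and reads are short invented placeholders, and `min_count` is a hypothetical parameter.

```python
from typing import Iterable, Sequence

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (uppercase A/C/G/T/N)."""
    return seq.translate(_COMPLEMENT)[::-1]


def in_silico_spoligotype(reads: Iterable[str], spacers: Sequence[str], min_count: int = 2) -> str:
    """Binary spoligotype: '1' if at least `min_count` reads contain the spacer (either strand)."""
    targets = [(s, reverse_complement(s)) for s in spacers]
    counts = [0] * len(spacers)
    for read in reads:
        for i, (forward, reverse) in enumerate(targets):
            if forward in read or reverse in read:
                counts[i] += 1
    return "".join("1" if c >= min_count else "0" for c in counts)


# Hypothetical 8-nt "spacers" and reads; the third read carries spacer 2 on the reverse strand.
spacers = ["ACGTTGCA", "GGGCCCAA"]
reads = ["TTACGTTGCATT", "AAACGTTGCAGG", "AATTGGGCCCTT"]
print(in_silico_spoligotype(reads, spacers, min_count=2))  # 10
```

Such a presence/absence string is exactly what the abstract argues is too coarse: it cannot record duplications, spacer or repeat variants, or where IS6110 has inserted.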


Subjects
Clustered Regularly Interspaced Short Palindromic Repeats , Mycobacterium tuberculosis/genetics , Tuberculosis/microbiology , Genes, Bacterial
15.
Environ Microbiol ; 23(3): 1594-1607, 2021 03.
Article in English | MEDLINE | ID: mdl-33393164

ABSTRACT

Secreted proteins are key players in fungal physiology and in cell protection against external stressing agents and antifungals. Oak stress-induced protein 1 (OSIP1) is a fungal-specific protein of unknown function. Using Podospora anserina and Phanerochaete chrysosporium as models, we combined in vivo functional approaches with biophysical characterization of recombinant OSIP1 protein. The P. anserina OSIP1Δ mutant showed increased sensitivity to the antifungal caspofungin compared with the wild type. This correlated with the production of a weakened extracellular exopolysaccharide/protein matrix (ECM). Since recombinant OSIP1 from P. chrysosporium self-assembled into fibers and was capable of gelation, OSIP1 is likely linked to the formation of the ECM, which acts as a physical barrier preventing drug toxicity. Moreover, compared with the wild type, the OSIP1Δ mutant was more sensitive to oak extractives, including chaotropic phenols and benzenes. When grown on oak sawdust, it exhibited a strongly modified secretome pattern and increased production of proteins associated with the cell-wall integrity signalling pathway. This demonstrates that OSIP1 also plays an important role in fungal resistance to extractive-induced stress.


Subjects
Phanerochaete , Podospora , Antifungal Agents/pharmacology , Fungal Proteins/genetics , Fungal Proteins/metabolism , Phanerochaete/metabolism , Signal Transduction
16.
BMC Genomics ; 21(1): 841, 2020 Nov 30.
Article in English | MEDLINE | ID: mdl-33256602

ABSTRACT

BACKGROUND: Diversity of the CRISPR locus of the Mycobacterium tuberculosis complex has been studied since 1997 for molecular epidemiology purposes. By targeting solely the 43 spacers present in the first two sequenced genomes (H37Rv and BCG), it gave a biased idea of CRISPR diversity and ignored diversity in the neighbouring cas genes. RESULTS: We set up tailored pipelines to explore the diversity of the CRISPR-cas locus in short reads. We analyzed data from a set of 198 clinical isolates shown to be representative by well-characterized SNPs. We found relatively low diversity in terms of spacers: we recovered only the 68 spacers that had been described in 2000. We found no partial or global inversions in the sequences; the Direct Variant Repeats (DVRs) always remained in the same order. In contrast, we found unexpected diversity in the form of SNPs in spacers and in Direct Repeats, duplications of various lengths, insertions of the IS6110 insertion sequence at various locations, and blocks of DVR deletions. The diversity was in part specific to lineages. When reconstructing the evolutionary steps of the locus, we found no evidence of SNP reversal. DVR deletions were linked to recombination between IS6110 insertions or between Direct Repeats. CONCLUSION: This work definitively shows that the CRISPR locus of M. tuberculosis has not evolved by classical CRISPR adaptation (incorporation of new spacers) since the most recent common ancestor of virulent lineages. The evolutionary mechanisms that we discovered could be involved in bacterial adaptation, but in a way that remains to be identified.
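Classical spoligotypes over the 43 spacers mentioned above are usually exchanged as a 15-digit octal code: spacers 1 to 42 are read in 14 groups of three (the lowest-numbered spacer as the most significant bit), and spacer 43 alone forms the last digit. The converter below is a minimal sketch of that widely used convention; the function name is our own.

```python
def spoligotype_to_octal(binary: str) -> str:
    """Convert a 43-character binary spoligotype ('1' = spacer present) to the 15-digit octal code."""
    if len(binary) != 43 or set(binary) - {"0", "1"}:
        raise ValueError("expected 43 characters of '0'/'1'")
    # Spacers 1-42: 14 groups of three bits, each read as one octal digit.
    digits = [str(int(binary[i:i + 3], 2)) for i in range(0, 42, 3)]
    digits.append(binary[42])  # spacer 43 alone forms the 15th digit (0 or 1)
    return "".join(digits)


# Pattern lacking spacers 33 to 36 (hypothetical input).
print(spoligotype_to_octal("1" * 32 + "0000" + "1" * 7))  # 777777777760771
```

Because this code only covers the 43 classical spacers, it cannot express the 25 additional spacers, the SNPs, duplications, or IS6110 insertions that the study reports.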


Subjects
Clustered Regularly Interspaced Short Palindromic Repeats , Mycobacterium tuberculosis , Base Sequence , Clustered Regularly Interspaced Short Palindromic Repeats/genetics , DNA Transposable Elements , Molecular Epidemiology , Mycobacterium tuberculosis/genetics
17.
Mol Phylogenet Evol ; 151: 106906, 2020 10.
Article in English | MEDLINE | ID: mdl-32653553

ABSTRACT

For decades, coffees were associated with the genus Coffea. In 2011, the closely related genus Psilanthus was subsumed into Coffea. However, results obtained in 2017, based on 28,800 nuclear SNPs, indicated that there is no substantial phylogenetic support for this incorporation. In addition, a recent study of 16 full plastid genome sequences highlighted an incongruous placement of Coffea canephora (Robusta coffee) between maternal and nuclear trees. In this study, similar global features of the plastid genomes of Psilanthus and Coffea are observed. In agreement with morphological and physiological traits, the nuclear phylogenetic tree clearly separates Psilanthus from Coffea (with the exception of C. rhamnifolia, which is closer to Psilanthus than to Coffea). In contrast, the maternal molecular tree was incongruent with both morphological and nuclear differentiation: four main clades were observed, two of which include both Psilanthus and Coffea species, and two of which contain either Psilanthus or Coffea species only. Interestingly, Coffea and Psilanthus taxa sampled in West and Central Africa are members of the same group. Several mechanisms, such as the retention of ancestral polymorphisms due to incomplete lineage sorting, and hybridization leading to homoploidy (without chromosome doubling) and alloploidy (for C. arabica), are involved in the evolutionary history of coffee species. Although C. canephora populations share similar morphological characteristics, the genetic relationships within the species have shown that some populations are well differentiated and genetically isolated. Given the position of its closely related species, we may also consider C. canephora to be undergoing a long process of speciation with an intermediate step of (sub-)speciation.


Subjects
Cell Nucleus/genetics , Coffea/genetics , Evolution, Molecular , Genome, Plastid , Polymorphism, Single Nucleotide/genetics , Cluster Analysis , Phylogeny , Species Specificity
18.
PLoS One ; 15(4): e0232295, 2020.
Article in English | MEDLINE | ID: mdl-32353023

ABSTRACT

In Rubiaceae phylogenetics, the number of markers has often proved a limitation, with authors failing to provide well-supported trees at tribal and generic levels. A robust phylogeny is a prerequisite for studying the evolutionary patterns of traits at different taxonomic levels. Advances in next-generation sequencing technologies have revolutionized biology by providing, at reduced cost, huge amounts of data for an increased number of species. Due to their highly conserved structure, general lack of recombination, and mostly uniparental inheritance, chloroplast DNA sequences have long been markers of choice for plant phylogeny reconstruction. The main objectives of this study are: 1) to gain insight into chloroplast genome evolution in the Rubiaceae (Ixoroideae) through an efficient methodology for de novo assembly of plastid genomes; and 2) to test the efficiency of mining SNPs in the nuclear genome of Ixoroideae, using a coffee reference genome, to produce well-supported nuclear trees. We assembled whole chloroplast genome sequences for 27 species of the Rubiaceae subfamily Ixoroideae using next-generation sequences. Analysis of the plastid genome structure reveals relatively good conservation of gene content and order. Generally, low variation was observed between taxa in the boundary regions, with the exception of the inverted repeat at both the large and small single-copy junctions for some taxa. On average, 79% of the SNPs identified in the genus Coffea are transferable to Ixoroideae, with values ranging from 35% to 96%. In general, the plastid and nuclear genome phylogenies are congruent with each other. They are well resolved, with well-supported branches. Generally, the tribes form well-identified clades, but the tribe Sherbournieae is shown to be polyphyletic. The results are discussed in relation to the methodology used and the chloroplast genome features of Rubiaceae, and compared with previous Rubiaceae phylogenies.


Subjects
Chloroplasts/genetics , DNA, Chloroplast/genetics , Genome, Chloroplast/genetics , Genome, Plant/genetics , Rubiaceae/genetics , Coffea/genetics , Evolution, Molecular , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Phylogeny , Polymorphism, Single Nucleotide/genetics , Sequence Analysis, DNA/methods
19.
J Integr Bioinform ; 16(4)2019 Dec 20.
Article in English | MEDLINE | ID: mdl-31860470

ABSTRACT

In this article, we propose a semi-automated method to reconstruct ancestral chloroplast genomes while taking gene duplication into account. Two methods were used to achieve this: a naked-eye investigation using homemade scripts, whose results serve as a knowledge base, and a dynamic-programming approach similar to Needleman-Wunsch. The latter relies on the Gestalt pattern matching of a sequence matcher to evaluate the probability of occurrence of each gene in the last common ancestor of two given genomes. The two approaches were applied to chloroplast genomes from the Apiales, Asterales, and Fabids orders, the latter belonging to the Pentapetalae group. We found that Apiales species do not undergo indels, whereas indels occur in the Asterales and Fabids orders. A series of experiments was then carried out to verify our findings extensively by comparing the obtained ancestral reconstructions with those of the recently released approach MLGO (Maximum Likelihood for Gene-Order analysis).
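Python's standard-library `difflib.SequenceMatcher` implements a variant of Ratcliff/Obershelp "gestalt pattern matching" and accepts any sequences of hashable items, including lists of gene names. The abstract's "sequence matcher" suggests this class, but that is an assumption. The sketch below only scores two gene orders and lists their shared genes in order; it does not reproduce the authors' Needleman-Wunsch-like dynamic programming or their per-gene probabilities, and the gene orders are invented.

```python
from difflib import SequenceMatcher
from typing import List, Sequence, Tuple


def compare_gene_orders(genome_a: Sequence[str], genome_b: Sequence[str]) -> Tuple[float, List[str]]:
    """Gestalt similarity of two gene orders and the genes of their matching blocks, in order."""
    # autojunk=False: never treat frequent genes (e.g. duplicated copies) as junk.
    matcher = SequenceMatcher(None, genome_a, genome_b, autojunk=False)
    shared: List[str] = []
    for block in matcher.get_matching_blocks():  # the last block is a size-0 sentinel
        shared.extend(genome_a[block.a:block.a + block.size])
    return matcher.ratio(), list(shared)


# Hypothetical gene orders for two chloroplast genomes.
a = ["psbA", "matK", "rbcL", "atpB", "ndhF"]
b = ["psbA", "rbcL", "atpB", "ycf1", "ndhF"]
print(compare_gene_orders(a, b))  # (0.8, ['psbA', 'rbcL', 'atpB', 'ndhF'])
```

The ratio is 2M/T, where M is the number of matched genes and T the total length of both gene lists (here 2 × 4 / 10).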


Subjects
Evolution, Molecular , Genome, Chloroplast , Phylogeny , Recombination, Genetic , Rosaceae/genetics
20.
Comput Biol Med ; 114: 103439, 2019 11.
Article in English | MEDLINE | ID: mdl-31550555

ABSTRACT

This paper presents SpCLUST, a new C++ package that takes a list of sequences as input, aligns them with MUSCLE, computes their similarity matrix in parallel, and then performs the clustering. SpCLUST extends previously released software by integrating additional scoring matrices, which enables it to cluster amino-acid sequences as well. The similarity matrix is now computed in parallel following a master/slave distributed architecture, using MPI. Performance analysis on two real datasets, one of 100 nucleotide sequences and one of 1049 amino-acid sequences, shows that the resulting library substantially outperforms the original Python package. The proposed package was also extensively evaluated on simulated and real genomic and protein datasets. The clustering results were compared with those of the best-known traditional tools, such as UCLUST, CD-HIT and DNACLUST. The comparison showed that SpCLUST outperforms the other tools when clustering divergent sequences and, unlike the others, does not require any user intervention or prior knowledge about the input sequences.
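The pipeline above (align, compute the similarity matrix in parallel, then cluster) can be sketched in a few lines. This is an illustration under explicit assumptions, not SpCLUST: sequences are taken as already aligned (as MUSCLE output would be), similarity is plain percent identity, worker threads stand in for SpCLUST's MPI worker processes, and a simple single-linkage threshold grouping replaces SpCLUST's own clustering algorithm. All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def identity(a: str, b: str) -> float:
    """Fraction of identical columns in two aligned sequences, ignoring gap-gap columns."""
    columns = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not columns:
        return 0.0
    return sum(x == y for x, y in columns) / len(columns)


def similarity_matrix(aligned: Sequence[str], workers: int = 4) -> List[List[float]]:
    """Full pairwise identity matrix; rows are dealt round-robin to worker threads."""
    if any(len(s) != len(aligned[0]) for s in aligned):
        raise ValueError("sequences must already be aligned to a common length")
    n = len(aligned)
    matrix = [[0.0] * n for _ in range(n)]

    def compute_rows(rows: range):
        return [(i, [identity(aligned[i], aligned[j]) for j in range(n)]) for i in rows]

    chunks = [range(w, n, workers) for w in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for chunk_result in pool.map(compute_rows, chunks):
            for i, row in chunk_result:
                matrix[i] = row
    return matrix


def threshold_clusters(matrix: Sequence[Sequence[float]], threshold: float) -> List[List[int]]:
    """Single-linkage grouping: link sequences whose similarity >= threshold (union-find)."""
    parent = list(range(len(matrix)))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(len(matrix)):
        for j in range(i + 1, len(matrix)):
            if matrix[i][j] >= threshold:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(matrix)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


aligned = ["ACGT", "ACGA", "TTTT", "TTTA"]  # hypothetical, pre-aligned
print(threshold_clusters(similarity_matrix(aligned, workers=2), threshold=0.7))  # [[0, 1], [2, 3]]
```

Dealing rows round-robin keeps the work roughly balanced; a real MPI version would compute only the upper triangle and mirror it, halving the number of comparisons.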


Subjects
Cluster Analysis , DNA , Genomics/methods , Sequence Analysis, DNA/methods , Software , Algorithms , DNA/classification , DNA/genetics , Humans