ABSTRACT
BACKGROUND: The reconstruction of the evolutionary history of organisms has been greatly influenced by the advent of molecular techniques, leading to a significant increase in studies that use genomic data from different species. However, the lack of standardization in gene nomenclature poses a challenge for database searches and evolutionary analyses and affects the accuracy of the results obtained. RESULTS: To address this issue, a Python class for standardizing gene nomenclatures, SynGenes, has been developed. It automatically recognizes different nomenclature variations and converts them into a standardized form, facilitating comprehensive and accurate searches. Additionally, SynGenes offers a web form for individual searches using different names associated with the same gene. The SynGenes database contains a total of 545 gene name variations for mitochondrial genes and 2485 for chloroplast genes, providing a valuable resource for researchers. CONCLUSIONS: The SynGenes platform standardizes the nomenclature of mitochondrial and chloroplast genes and provides a standardized search solution for specific markers in GenBank. An evaluation of SynGenes' effectiveness through searches conducted on GenBank and PubMed Central demonstrated that it returns more results than conventional searches, ensuring more comprehensive and accurate results. This tool is crucial for accurate database searches and, consequently, evolutionary analyses, addressing the challenges posed by non-standardized gene nomenclature.
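The idea of mapping name variants to one standard form and then searching with all variants can be sketched as follows. This is a minimal illustration only: the abstract does not describe the SynGenes API, so the synonym table, the function names `standardize` and `build_query`, and the use of the `[Gene Name]` search field are all assumptions made for this example.

```python
# Hypothetical, hand-written synonym table (NOT the SynGenes database).
SYNONYMS = {
    "COI": ["COI", "CO1", "COX1", "COXI", "cytochrome c oxidase subunit I"],
    "CYTB": ["CYTB", "COB", "cytochrome b"],
}


def _key(name: str) -> str:
    """Normalize a name: lowercase and keep only letters and digits."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


# Reverse index: normalized variant -> standardized gene name.
LOOKUP = {
    _key(variant): standard
    for standard, variants in SYNONYMS.items()
    for variant in variants
}


def standardize(name: str):
    """Return the standardized name, or None if the variant is unknown."""
    return LOOKUP.get(_key(name))


def build_query(name: str) -> str:
    """Build an OR-query covering every known variant of the gene."""
    standard = standardize(name)
    if standard is None:
        raise KeyError(f"unknown gene name: {name!r}")
    return " OR ".join(f'"{v}"[Gene Name]' for v in SYNONYMS[standard])


print(standardize("cox-1"))   # COI
print(build_query("cob"))
```

Normalizing case and punctuation before the lookup lets spellings such as "cox-1", "COX1" and "Cox 1" resolve to the same entry without listing each one explicitly.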
Subject(s)
Molecular Evolution , Terminology as Topic , Chloroplast Genes , Mitochondrial Genes , Genetic Databases , Chloroplasts/genetics , Internet , Software
ABSTRACT
Scientific names permit humans and search engines to access knowledge about the biodiversity that surrounds us, and names linked to DNA sequences are playing an ever-greater role in search-and-match identification procedures. Here, we analyze how users and curators of the National Center for Biotechnology Information (NCBI) are flagging and curating sequences derived from nomenclatural type material, which is the only way to improve the quality of DNA-based identification in the long run. For prokaryotes, 18,281 genome assemblies from type strains have been curated by NCBI staff and improve the quality of prokaryote naming. For Fungi, type-derived sequences representing over 21,000 species are now essential for fungus naming and identification. For the remaining eukaryotes, however, the numbers of sequences identifiable as type-derived are minuscule, representing only 1,000 species of arthropods, 8,441 vertebrates, and 430 embryophytes. An increase in the production and curation of such sequences will come from (i) sequencing of types or topotypic specimens in museum collections, (ii) the March 2023 rule changes at the International Nucleotide Sequence Database Collaboration requiring more metadata for specimens, and (iii) efforts by data submitters to facilitate curation, including informing NCBI curators about a specimen's type status. We illustrate different type-data submission journeys and provide best-practice examples from a range of organisms. Expanding the number of type-derived sequences in DNA databases, especially of eukaryotes, is crucial for capturing, documenting, and protecting biodiversity.
ABSTRACT
Coronavirus Disease 2019 (COVID-19) is a viral disease that emerged suddenly at the end of 2019 in the city of Wuhan, Hubei province, China. Its rapid spread poses a serious threat to health worldwide. Several other viral epidemics have been recorded over the last two decades, including severe acute respiratory syndrome coronavirus (SARS-CoV) in 2002/2003, influenza H1N1 in 2009 and, more recently, Middle East respiratory syndrome coronavirus (MERS-CoV), which appeared in Saudi Arabia in 2012. In this research, an automated system is created to differentiate between the COVID-19, SARS-CoV and MERS-CoV epidemics using their genomic sequences recorded in NCBI GenBank, in order to facilitate diagnosis and increase the accuracy of disease detection in less time. The selected database contains 76 genes for each epidemic. Features are then extracted, namely the discrete Fourier transform (DFT), the discrete cosine transform (DCT) and the seven moment invariants, and passed to two different classifiers: the k-nearest neighbor (KNN) algorithm and a trainable cascade-forward back propagation neural network, both of which give satisfactory results for comparison. Classifier performance is evaluated using accuracy (ACC), F1 score, error rate and Matthews correlation coefficient (MCC), which are 100%, 100%, 0 and 1, respectively, for the KNN algorithm and 98.89%, 98.34%, 0.0111 and 0.9754, respectively, for the cascade-forward network.
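The general pattern of extracting DFT features from a sequence and classifying with k-nearest neighbors can be sketched as below. This is a toy illustration under stated assumptions, not the authors' pipeline: the numeric nucleotide mapping (A=1, C=2, G=3, T=4), the short invented sequences and the class labels are all hypothetical, and the DCT and moment-invariant features are omitted.

```python
import cmath
import math
from collections import Counter

# Assumed numeric encoding; the abstract does not state the mapping used.
NUC = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}


def dft_features(seq: str, n_coeffs: int = 4):
    """Magnitudes of DFT coefficients 1..n_coeffs (the DC term is skipped)."""
    x = [NUC[base] for base in seq.upper()]
    n = len(x)
    feats = []
    for k in range(1, n_coeffs + 1):
        coeff = sum(v * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, v in enumerate(x))
        feats.append(abs(coeff))
    return feats


def knn_predict(train, query_seq: str, k: int = 3):
    """Majority vote among the k training sequences nearest in feature space."""
    q = dft_features(query_seq)
    ranked = sorted(train, key=lambda item: math.dist(dft_features(item[0]), q))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Invented toy sequences, NOT real viral genomes.
TRAIN = [
    ("ACACACAC", "class_A"),
    ("AGAGAGAG", "class_A"),
    ("AACCAACC", "class_B"),
    ("AAGGAAGG", "class_B"),
]

print(knn_predict(TRAIN, "CTCTCTCT"))  # class_A
print(knn_predict(TRAIN, "CCTTCCTT"))  # class_B
```

In the toy data, the two classes differ in their dominant period (2 versus 4 bases), so their energy falls in different DFT coefficients and the nearest neighbors separate cleanly.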
Subject(s)
COVID-19/diagnosis , Viral Genome , SARS-CoV-2/genetics , Algorithms , COVID-19/virology , Fourier Analysis , Humans
ABSTRACT
The public sequence databases are entrusted with the dual responsibility of providing an accessible archive to all submitters and supporting data reliability and re-use for all users. Genomes from type materials can act as an unambiguous reference for a taxonomic name and play an important role in comparative genomics, especially for taxon verification or reclassification. The National Center for Biotechnology Information (NCBI) collects and curates information on prokaryotic type strains and genomes from type strains. The average nucleotide identity (ANI)-based quality control processes introduced at NCBI to verify the genomes from type strains and improve related sequence records are detailed here. Using the curated genomes from type strains as reference, the taxonomy of over 1.1 million GenBank genomes was verified, and over 7000 new submissions (before acceptance to GenBank) and over 1800 existing genomes in GenBank were reclassified.
Subject(s)
Nucleic Acid Databases , Fatty Acids , DNA Sequence Analysis , Reproducibility of Results , 16S Ribosomal RNA/genetics , Phylogeny , Base Composition , Bacterial DNA/genetics , Bacterial Typing Techniques , Fatty Acids/chemistry
ABSTRACT
Erroneous taxonomic attributions in GenBank accessions can mislead phylogenetic inference and appear to be widespread within genera. We investigate the influence of taxonomic misattributions for reconstructing the phylogeny of three-striped dasyures, which include four recognized Myoictis species (Marsupialia: Dasyuridae) that are distributed across New Guinea and nearby islands. Molecular phylogenetic studies that have focused on dasyurids consistently resolve the interrelationships of these small carnivores, grouping M. leucura with M. wavicus, and placing M. wallacei and M. melas as successively deeper divergences from these. Two recent marsupial and mammalian supermatrix phylogenies instead favour an alternative Myoictis topology that is discordant with each of these relationships. We add new nuclear and mitochondrial sequences and employ randomized accession resampling that shows the supermatrix topologies are an artefact of several outdated taxonomic attributions in GenBank. Updating these accessions brings agreement across Myoictis phylogenies with randomly resampled accessions. We encourage authors to update GenBank taxonomic attributions and we argue that an option is needed for flagging accessions that are not demonstrably incorrect, but that provide anomalous results. This would serve both as a caution for future supermatrix construction and to highlight accessions of potentially significant biological interest for further study.
Subject(s)
Marsupialia , Animals , Cell Nucleus/genetics , Nucleic Acid Databases , New Guinea , Phylogeny
ABSTRACT
BACKGROUND AND OBJECTIVES: The LW gene encodes the LW glycoprotein that carries the antigens of the LW blood group system. LW antigens are distinct from the D antigen; however, they are phenotypically related, and anti-LW antibodies are often mistaken for anti-D. An antibody was detected in an Australian patient of Aboriginal descent who consistently typed as LW(a+b-). This study aimed to describe the antibody recognizing a high-prevalence antigen on the LW glycoprotein. STUDY DESIGN AND METHODS: Samples from the patient and her four siblings were investigated. DNA was genotyped by single nucleotide polymorphism (SNP)-microarray and massively parallel sequencing (MPS) platforms. Red blood cells (RBCs) were phenotyped using standard haemagglutination techniques. Antibody investigations were performed using a panel of phenotyped RBCs from adults and cord blood cells. RESULTS: SNP-microarray and MPS genotyped all family members as LW*A/A (c.299A), predicting LW(a+b-). In addition, a novel LW*A c.309C>A single nucleotide variant was detected in all family members. The patient and one of her siblings (M4) were homozygous for LW c.309C>A. Antibody from the patient reacted positively with all reagent panel RBCs and cord blood cells but negatively with RBCs from LW(a-b-), Rhnull and sibling M4. The antibody failed to react with RBCs treated with dithiothreitol. CONCLUSION: The antibody detected in the patient recognized a novel high-prevalence antigen, LWEM, in the LW blood group system. LWEM-negative patients who develop anti-LWEM can be safely transfused with D+ RBCs; however, D- is preferred. Accurate antibody identification can help to better manage the allocation of blood products, especially when D- RBCs are in short supply.
Subject(s)
Blood Group Antigens , Isoantibodies , Adult , Australia/epidemiology , Blood Group Antigens/genetics , Female , Hemagglutination , Humans , Prevalence , Rh-Hr Blood-Group System/genetics
ABSTRACT
For detection and screening, the E6 region has a markedly higher annealing probability than consensus probes that target other regions of the human papillomavirus (HPV) genome. Here, we designed the first multiple-virus single-stranded deoxyribonucleic acid (ssDNA) probe targeting the E6 region for multiplex detection in early-phase cervical cancer screening, a fundamental step in the development of electrochemical HPV detection biosensors. Gene profiling of the virus ssDNA sequences was carried out using bioinformatics tools, namely GenBank, the Basic Local Alignment Search Tool (BLAST), and Clustal Omega, in sequence. The output of these tools showed 100% similarity between our virus ssDNA probe and the complete HPV genomes in the databases. Cross-validation between the HPV genome and our designed virus ssDNA indicated high specificity and selectivity for screening compared with the Pap smear. The DNA probe for HPV 18 is 5' COOH-GAT CCA GAA GGT ACA GAC GGG GAG GGC ACG 3' and the DNA probe for HPV 58 is 5' COOH-GGG CGC TGT GCA GTG TGT TGG AGA CCC CGA 3'; both were designed with 66.77% guanine (G) and cytosine (C) content. Our virus ssDNA probe for the HPV biosensor promises high sensitivity, specificity, selectivity, repeatability and low fluid consumption, and will be useful in mini-size diagnostic devices for cervical cancer detection.
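GC content for a probe is the fraction of G and C bases in the sequence. The sketch below computes it for the two probe sequences exactly as printed in the abstract, with spaces removed and the 5' COOH modification ignored. The function name `gc_content` is chosen for this example. The values computed this way need not coincide with the figure reported in the abstract, for instance if that figure was derived or rounded differently.

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C bases in a nucleotide sequence."""
    bases = [b for b in seq.upper() if b in "ACGT"]  # drops spaces
    gc = sum(1 for b in bases if b in "GC")
    return 100.0 * gc / len(bases)


# Probe sequences as printed in the abstract.
HPV18_PROBE = "GAT CCA GAA GGT ACA GAC GGG GAG GGC ACG"
HPV58_PROBE = "GGG CGC TGT GCA GTG TGT TGG AGA CCC CGA"

print(round(gc_content(HPV18_PROBE), 2))  # 63.33 (19 of 30 bases)
print(round(gc_content(HPV58_PROBE), 2))  # 66.67 (20 of 30 bases)
```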
Subject(s)
Metal Nanoparticles , Viral Oncogene Proteins , Papillomavirus Infections , Uterine Cervical Neoplasms , Female , Humans , Human Papillomavirus 18/genetics , Uterine Cervical Neoplasms/diagnosis , Gold , Papillomavirus Infections/diagnosis , Papillomaviridae/genetics , DNA Probes , Viral Oncogene Proteins/genetics
ABSTRACT
Species within Fusarium are of global agricultural, medical, and food/feed safety concern and have been extensively characterized. However, accurate identification of species is challenging and usually requires DNA sequence data. FUSARIUM-ID (http://isolate.fusariumdb.org/blast.php) is a publicly available database designed to support the identification of Fusarium species using sequences of multiple phylogenetically informative loci, especially the highly informative ~680-bp 5' portion of the translation elongation factor 1-alpha (TEF1) gene that has been adopted as the primary barcoding locus in the genus. However, FUSARIUM-ID v.1.0 and 2.0 had several limitations, including inconsistent metadata annotation for the archived sequences and poor representation of some species complexes and marker loci. Here, we present FUSARIUM-ID v.3.0, which provides the following improvements: (i) additional and updated annotation of metadata for isolates associated with each sequence, (ii) expanded taxon representation in the TEF1 sequence database, (iii) availability of the sequence database as a downloadable file to enable local BLAST queries, and (iv) a tutorial file for users to perform local BLAST searches using either freely available software, such as SequenceServer, BLAST+ executable in the command line, and Galaxy, or the proprietary Geneious software. FUSARIUM-ID will be updated on a regular basis by archiving sequences of TEF1 and other loci from newly identified species and greater in-depth sampling of currently recognized species.
Subject(s)
Fusarium , Fungal DNA/genetics , Fusarium/genetics , Phylogeny
ABSTRACT
Publicly available and validated DNA reference sequences useful for phylogeny estimation and identification of fungal pathogens are an increasingly important resource in the efforts of plant protection organizations to facilitate safe international trade of agricultural commodities. Colletotrichum species are among the most frequently encountered and regulated plant pathogens at U.S. ports-of-entry. The RefSeq Targeted Loci (RTL) project at NCBI (BioProject no. PRJNA177353) contains a database of curated fungal internal transcribed spacer (ITS) sequences that interact extensively with NCBI Taxonomy, resulting in verified name-strain-sequence type associations for >12,000 species. We present a publicly available dataset of verified and curated name-type strain-sequence associations for all available Colletotrichum species. This includes an updated GenBank Taxonomy for 238 species associated with up to 11 protein coding loci and an updated RTL ITS dataset for 226 species. We demonstrate that several marker loci are well suited for phylogenetic inference and identification. We improve understanding of phylogenetic relationships among verified species, verify or improve phylogenetic circumscriptions of 14 species complexes, and reveal that determining relationships among these major clades will require additional data. We present detailed comparisons between phylogenetic and similarity-based approaches to species identification, revealing complex patterns among single marker loci that often lead to misidentification when based on single-locus similarity approaches. We also demonstrate that species-level identification is elusive for a subset of samples regardless of analytical approach, which may be explained by novel species diversity in our dataset and incomplete lineage sorting and lack of accumulated synapomorphies at these loci.
Subject(s)
Colletotrichum , Colletotrichum/genetics , Commerce , DNA , Internationality , Phylogeny
ABSTRACT
Threadfins (Teleostei: Polynemidae) are a group of fishes named for their elongated and threadlike pectoral-fin rays. These fishes are commonly found in the world's tropical and subtropical waters, and are an economically important group for people living in these regions, with more than 100,000 t harvested in recent years. However, we do not have a detailed understanding of polynemid evolutionary history such that these fishes can be monitored, managed and conserved as an important tropical food source. Recent studies hypothesize at least one genus of threadfins is polyphyletic, and no studies have focused on generating a hypothesis of relationship for the Polynemidae using DNA sequences. In this study, we analyse a genomic dataset of ultraconserved-element and mitochondrial loci to construct a phylogeny of the Polynemidae. We recover the threadfins as a clade sister to flatfishes, with the most taxonomically rich genus, Polydactylus, being resolved as polyphyletic. When comparing our dataset to data from previous studies, we find that a few recent broad-scale phylogenies of fishes have incorporated mislabelled, misidentified or chimeric terminals into their analyses, impacting the relationships of threadfins they recover. We highlight these problematic sequences, providing revised identifications based on the data sequenced in this study. We then discuss the intrarelationships of threadfins, highlighting morphological or ecological characters that support the clades we recover.
Subject(s)
Biological Evolution , Flatfishes , Animals , Fishes , Flatfishes/genetics , Genome , Genomics , Humans , Phylogeny
ABSTRACT
Public molecular databases are fundamental tools for modern taxonomic studies whose usefulness relies on the soundness of the data within them. Here, we study potential errors that can arise along the data pipeline, from sampling, specimen identification and molecular processing (digestion, amplification and sequencing) to the submission of sequences to these databases, using the DNA sequences of Hydrachnidia (Acari, Parasitengona) as a case study. Our results indicate that molecular information is available for only about 3% of the Hydrachnidia species known to date; yet, within this small percentage, errors are present in almost 5% of the species analyzed (0.5% of the sequences and almost 11% of the genera). This study underscores the scarcity of genetic data available for Hydrachnidia, but also shows that the proportion of errors in DNA sequences is relatively small. Even so, it highlights the danger associated with using DNA sequences from public databases, particularly for species identification, and reinforces the need for greater quality control measures and/or protocols to avoid an intensification of errors in the (post) genomics era. Finally, our study emphasizes that potential errors may also reveal cryptic diversity within a species.
Subject(s)
Mites , Animals , Taxonomic DNA Barcoding , Mites/genetics , Phylogeny
ABSTRACT
Millette et al. (Ecology Letters, 2020, 23:55-67) reported no consistent worldwide anthropogenic effects on animal genetic diversity using repurposed mitochondrial DNA sequences. We reexamine data from this study, describe genetic marker and scale limitations which might lead to misinterpretations with conservation implications, and provide advice to improve future macrogenetic studies.
Subject(s)
Mitochondrial DNA , Genetic Variation , Animals , Mitochondrial DNA/genetics , Ecology , Genetic Markers
ABSTRACT
Lamiales is one of the most intractable orders of flowering plants, with several changes in family composition and circumscription throughout history. The order is distributed worldwide, occurring in tropical forests and frozen habitats. In this study, a comprehensive phylogeny of Lamiales was reconstructed using DNA sequences. The tree was used to infer dispersal patterns, focusing on the tropics and extratropics. Molecular and species geographic data available from public repositories were combined to address both objectives. A total of 6,910 species and 842 genera of Lamiales were sampled using the Python tool PyPHLAWD. The tree was inferred using RAxML and recovered a monophyletic Lamiales. All 26 families were recovered as monophyletic with high support. The families Bignoniaceae and Plantaginaceae are remarkable examples. The former emerged as monophyletic and included tribe Jacarandeae, while the latter emerged as monophyletic sensu lato and included both the tribes Angelonieae and Gratioleae. Distribution points for all species were retrieved from GBIF. After filtering, 1,136,425 records were retained. Species were coded as present in extratropical or tropical environments. The in-and-out-of-the-tropics dispersal patterns were inferred using a maximum likelihood approach that identifies hidden rate changes. The model recovered higher rates of transition from extratropics to tropics, estimating two rates of state transitions. When ancestral states are considered, more discrete transitions from extratropics to tropics were observed. The extratropical state was also inferred for the crown node of Lamiales and old nested nodes, revealing a rare pattern of transitions to the tropics throughout the Upper Cretaceous and Tertiary. A significant phylogenetic signal was recovered for the in-and-out-of-the-tropics dispersal patterns, showing that state transitions are not frequent enough to erase the effect of tree structure on the data.
Subject(s)
Lamiales , Magnoliopsida , Bayes Theorem , Geography , Humans , Likelihood Functions , Phylogeny
ABSTRACT
Accurate taxonomic identifications and species delimitations are a fundamental problem in biology. The complex taxonomy of Nematoda is primarily based on morphology, which is often dubious. DNA barcoding emerged as a handy tool to identify specimens and assess diversity, but its applications in Nematoda are incipient. We evaluated cytochrome c oxidase subunit I (cox1) efficiency as a DNA barcode for nematodes scrutinising 5241 sequences retrieved from BOLD and GenBank. The samples included genera with medical, agricultural, or ecological relevance: Anguillicola, Caenorhabditis, Heterodera, Meloidogyne, Onchocerca, Strongyloides, and Trichinella. We assessed cox1 performance through barcode gap and Probability of Correct Identification (PCI) analyses, and estimated species richness through Automatic Barcode Gap Discovery (ABGD). Each genus presented distinct gap ranges, mirroring the evolutionary diversity within Nematoda. Thus, to survey the diversity of the phylum, a careful definition of thresholds for lower taxonomic levels should be considered. PCIs were around 70% for both databases, highlighting operational biases and challenges in nematode taxonomy. ABGD inferred higher richness than the taxonomic labels informed by databases. The prevalence of specimen misidentifications and dubious species delimitations emphasise the value of integrative approaches to nematode taxonomy and systematics. Overall, cox1 is a relevant tool for integrative taxonomy of nematodes.
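A barcode gap analysis compares, for each species, the largest distance between its own sequences with the smallest distance to any sequence of another species. A minimal sketch follows, assuming uncorrected p-distances on pre-aligned sequences. The abstract does not state which distance model was used, and the species names and sequences below are invented toy data.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def barcode_gaps(records):
    """records: list of (species, aligned_sequence) pairs.

    Returns {species: min interspecific distance - max intraspecific distance}.
    A positive value indicates a barcode gap for that species.
    """
    max_intra = {}
    min_inter = {}
    for (sp_a, seq_a), (sp_b, seq_b) in combinations(records, 2):
        d = p_distance(seq_a, seq_b)
        if sp_a == sp_b:
            max_intra[sp_a] = max(max_intra.get(sp_a, 0.0), d)
        else:
            for sp in (sp_a, sp_b):
                min_inter[sp] = min(min_inter.get(sp, float("inf")), d)
    return {sp: min_inter[sp] - max_intra.get(sp, 0.0) for sp in min_inter}


# Invented toy alignment, NOT real cox1 data.
RECORDS = [
    ("species_1", "AAAAAAAAAA"),
    ("species_1", "AAAAAAAAAT"),
    ("species_2", "AAAAACCCCC"),
    ("species_2", "AAAAACCCCG"),
]

for species, gap in barcode_gaps(RECORDS).items():
    print(species, round(gap, 2))  # both species: 0.4
```

Because the gap is computed per species, genera with different levels of divergence yield different gap ranges, which is why a single universal threshold may not suit every group.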
Subject(s)
Taxonomic DNA Barcoding , Electron Transport Complex IV , Nematoda , Animals , Helminth DNA , Electron Transport Complex IV/genetics , Nematoda/genetics , Phylogeny
ABSTRACT
The problem of low species-level identification rates in plants by DNA barcoding is exacerbated by the fact that reference databases are far from being comprehensive. We investigate the impact of increased sampling depth on identification success by analyzing the efficacy of established plant barcode marker sequences (rbcL, matK, trnL-trnF, psbA-trnH, ITS). Adding sequences of the same species to the reference database led to an increase in correct species assignment of +10.9% for rbcL and +19.0% for ITS. Simultaneously, erroneous identification dropped from ~40% to ~12.5%. Despite its evolutionary constraints, ITS showed the highest identification rate and identification gain by increased sampling effort, which makes it a very suitable marker in the planning phase of a barcode study. The limited sequence availability of trnL-trnF is problematic for an otherwise very promising plastid plant barcoding marker. Future developments in machine learning algorithms have the potential to give new impetus to plant barcoding, but are dependent on extensive reference databases. We expect that our results will be incorporated into future plans for the development of DNA barcoding reference databases and will lead to these being developed with greater depth and taxonomic coverage.
Subject(s)
Taxonomic DNA Barcoding , Plant DNA , Plants/classification , Ribosomal Spacer DNA , Nucleic Acid Databases , Genetic Markers , Plants/genetics
ABSTRACT
Parasites are important components of biodiversity and contributors to ecosystem functioning, but are often neglected in ecological studies. Most studies examine model parasite systems or single taxa; thus, our understanding of community composition is lacking. Here, the seasonal and annual dynamics of parasites were quantified using a 5-year metabarcoding time-series of freshwater plankton, collected weekly. We first identified parasites in the dataset using literature searches of the taxonomic match and using sequence metadata from the National Center for Biotechnology Information (NCBI) nucleotide database. In total, 441 amplicon sequence variants (belonging to 18 phyla/clades) were classified as parasites. The four phyla/clades with the highest relative read abundance and richness were Chytridiomycota, Dinoflagellata, Oomycota and Perkinsozoa. Relative read abundance of total parasite taxa, Dinoflagellata and Perkinsozoa varied significantly with season and was highest in summer. Parasite richness varied significantly with season and year, and was generally lowest in spring. Each season had distinct parasite communities, and the difference between summer and winter communities was most pronounced. Combining DNA metabarcoding with searches of the literature and NCBI metadata allowed us to characterize parasite diversity and community dynamics and revealed the extent to which parasites contribute to the diversity of freshwater plankton communities.
Subject(s)
Parasites , Plankton , Animals , Biodiversity , Taxonomic DNA Barcoding , Ecosystem , Fresh Water , Parasites/genetics , Plankton/genetics , 18S Ribosomal RNA/genetics
ABSTRACT
BACKGROUND: Haemosporidians (Apicomplexa, Protista) are obligate heteroxenous parasites of vertebrates and blood-sucking dipteran insects. Avian haemosporidians comprise more than 250 species traditionally classified into four genera, Plasmodium, Haemoproteus, Leucocytozoon, and Fallisia. However, analyses of the mitochondrial CytB gene revealed a vast variety of lineages not yet linked to morphospecies. This study aimed to analyse and discuss the data of haemosporidian lineages isolated from birds of the family Turdidae, to visualise host and geographic distribution using DNA haplotype networks and to suggest directions for taxonomy research on parasite species. METHODS: Haemosporidian CytB sequence data from 350 thrushes were analysed for the present study and complemented with CytB data of avian haemosporidians gathered from GenBank and the MalAvi database. Maximum Likelihood trees were calculated to identify clades featuring lineages isolated from Turdidae species. For each clade, DNA haplotype networks were calculated and provided with information on host and geographic distribution. RESULTS: In species of the Turdidae, this study identified 82 Plasmodium, 37 Haemoproteus, and 119 Leucocytozoon lineages, 68, 28, and 112 of which are mainly found in this host group. Most of these lineages cluster in the clades, which are shown as DNA haplotype networks. The lineages of the Leucocytozoon clades were almost exclusively isolated from thrushes and usually were restricted to one host genus, whereas the Plasmodium and Haemoproteus networks featured multiple lineages also recovered from other passeriform and non-passeriform birds. CONCLUSION: This study represents the first attempt to summarise information on the haemosporidian parasite lineages of a whole bird family. The analyses allowed the identification of numerous groups of related lineages, which have not been linked to morphologically defined species yet, and they revealed several cases in which CytB lineages were probably assigned to the wrong morphospecies. These taxonomic issues are addressed by comparing distributional patterns of the CytB lineages with data from the original species descriptions and further literature. The authors also discuss the availability of sequence data and emphasise that the MalAvi database should be considered an extremely valuable addition to GenBank, but not a replacement.
Subject(s)
Bird Diseases/epidemiology , Haemosporida/physiology , Host-Parasite Interactions , Animal Protozoan Infections/epidemiology , Songbirds , Animals , Bird Diseases/parasitology , Phylogeography , Prevalence , Animal Protozoan Infections/parasitology
ABSTRACT
The typical wet lab user often annotates smaller sequences in the GenBank format, but the resulting files are not accepted for database submission by NCBI. This makes submission of such annotations a cumbersome task. Here we present "GB2sequin", an easy-to-use web application that converts custom annotations in the GenBank format into the NCBI direct submission format Sequin. Additionally, the program generates a "five-column, tab-delimited feature table" and a FASTA file. These are required for submission through BankIt or for updating an existing GenBank entry. We specifically developed "GB2sequin" for the regular wet lab researcher, with a strong focus on user-friendliness and flexibility. The application is equipped with an intuitive graphical interface and comprehensive documentation. It can be employed to prepare any GenBank file for database submission and is freely available online at https://chlorobox.mpimp-golm.mpg.de/GenBank2Sequin.html.
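The layout of a five-column, tab-delimited feature table can be sketched as follows: a `>Feature` header naming the sequence, one line per feature with start, stop and feature key, and qualifier lines indented by three tabs. This is a minimal illustration of the file layout only and not the GB2sequin implementation. The input dictionary structure, the function name `feature_table`, and the example feature are assumptions made for this example.

```python
def feature_table(seq_id: str, features) -> str:
    """Render features as a five-column, tab-delimited feature table.

    features: list of dicts with keys 'start', 'end', 'key', and optionally
    'strand' (1 or -1) and 'qualifiers' (dict). This input layout is a
    hypothetical structure chosen for the sketch.
    """
    lines = [f">Feature {seq_id}"]
    for feat in features:
        start, stop = feat["start"], feat["end"]
        if feat.get("strand", 1) == -1:
            # Minus-strand features are written with start greater than stop.
            start, stop = stop, start
        lines.append(f"{start}\t{stop}\t{feat['key']}")
        for qualifier, value in feat.get("qualifiers", {}).items():
            # Columns 1-3 stay empty on qualifier lines.
            lines.append(f"\t\t\t{qualifier}\t{value}")
    return "\n".join(lines) + "\n"


# Hypothetical example feature.
example = [
    {"start": 1, "end": 90, "key": "gene", "qualifiers": {"gene": "rbcL"}},
    {"start": 100, "end": 400, "key": "CDS", "strand": -1,
     "qualifiers": {"product": "hypothetical protein"}},
]
print(feature_table("seq1", example), end="")
```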
Subject(s)
Nucleic Acid Databases/standards , Molecular Sequence Annotation/methods , Software , Molecular Sequence Annotation/standards , DNA Sequence Analysis/methods , DNA Sequence Analysis/standards
ABSTRACT
Great progress has been made in unravelling the evolutionary history of Asian colobines, largely through the use of dated molecular phylogenies based on multiple markers. The Presbytis langurs are a case in point, with more allopatric species being identified, recognition of Presbytis thomasi from Sumatra rather than P. potenziani from the Mentawai Islands as being the most basal species of the group, and the discovery that P. rubicunda from Borneo is nested among the Sumatran species and only made it to Borneo in the last 1.3 million years. Based on variation in mitochondrial d-loop, it has recently been argued that Malaysia's P. femoralis femoralis is actually P. neglectus neglectus. Unfortunately, despite being available, sequences from the type locality, Singapore, were excluded from the analysis, and none of the newly generated sequences was deposited in GenBank. I manually reconstructed these sequences, which allowed me to present a molecular phylogeny that includes 8 additional sequences from West Malaysia and Singapore. P. neglectus from Malaysia and P. femoralis from Singapore form one monophyletic clade, with minimal divergence. I conclude that recognition of P. neglectus is erroneous and the name is a junior synonym of P. femoralis. Colobine taxonomy and systematics have advanced, and continue to advance, mostly by considering evidence from a wide range of individuals, species and data sets (molecular, behavioural and morphological) rather than focusing on single molecular markers from 1 or 2 species from one small geographic area. For an orderly taxonomic debate where evidence can be evaluated and reinterpreted it is essential that newly generated sequences are deposited in public repositories.
Subject(s)
Base Sequence , Presbytini/classification , Presbytini/genetics , Animals , Southeastern Asia , Mitochondrial DNA , Phylogeny , DNA Sequence Analysis , Species Specificity
ABSTRACT
China National GeneBank DataBase (CNGBdb) is a data platform that aims to systematically archive and share multi-omics data in the life sciences. As the service portal of the Bio-informatics Data Center within the core structure of China National GeneBank (CNGB), namely "Three Banks and Two Platforms", CNGBdb offers rich sample resources, data resources and cooperation projects, together with powerful data computation and analysis capabilities. With the advent of high-throughput sequencing technologies, life science research has entered the big data era, which calls for closer international cooperation and data sharing. With the development of China's economy and increasing investment in life science research, a national public platform for data archiving and sharing in the life sciences is needed to promote systematic management, application and industrial utilization. Currently, CNGBdb provides genomic data archiving, information search engines, data management and data analysis services. Its data schema covers projects, samples, experiments, runs, assemblies, variations and sequences. As of May 22, 2020, CNGBdb had archived 2176 research projects and more than 2221 TB of sequencing data submitted by researchers globally. In the future, CNGBdb will continue to promote data sharing in life science research and to improve its service capability. The CNGBdb website is https://db.cngb.org/.