Results 1 - 20 of 44
1.
Sci Adv ; 10(21): eadj6823, 2024 May 24.
Article in English | MEDLINE | ID: mdl-38781323

ABSTRACT

We present a draft genome of the little bush moa (Anomalopteryx didiformis), one of approximately nine species of extinct flightless birds from Aotearoa, New Zealand, using ancient DNA recovered from a fossil bone from the South Island. We recover a complete mitochondrial genome at 249.9× depth of coverage and almost 900 megabases of a male moa nuclear genome at ~4 to 5× coverage, with sequence contiguity sufficient to identify more than 85% of avian universal single-copy orthologs. We describe a diverse landscape of transposable elements and satellite repeats, estimate a long-term effective population size of ~240,000, identify a diverse suite of olfactory receptor genes and an opsin repertoire with sensitivity in the ultraviolet range, show that the wingless moa phenotype is likely not attributable to gene loss or pseudogenization, and identify potential function-altering coding sequence variants in moa that could be synthesized for future functional assays. This genomic resource should support further studies of avian evolution and morphological divergence.


Subject(s)
Birds; Extinction, Biological; Genome; Animals; Birds/genetics; Cell Nucleus/genetics; Phylogeny; Fossils; Genome, Mitochondrial; Flight, Animal; New Zealand; Male; DNA Transposable Elements/genetics; Genomics/methods
2.
Nat Cell Biol ; 25(6): 865-876, 2023 06.
Article in English | MEDLINE | ID: mdl-37169880

ABSTRACT

The elucidation of the mechanisms of ageing and the identification of methods to control it have long been anticipated. Recently, two factors associated with ageing, the accumulation of senescent cells and the change in the composition of gut microbiota, have been shown to play key roles in ageing. However, little is known about how these phenomena occur and are related during ageing. Here we show that the persistent presence of commensal bacteria gradually induces cellular senescence in gut germinal centre B cells. Importantly, this reduces both the production and diversity of immunoglobulin A (IgA) antibodies that target gut bacteria, thereby changing the composition of gut microbiota in aged mice. These results have revealed the existence of IgA-mediated crosstalk between the gut microbiota and cellular senescence and thus extend our understanding of the mechanism of gut microbiota changes with age, opening up possibilities for their control.


Subject(s)
Gastrointestinal Microbiome; Animals; Mice; Bacteria; Immunoglobulin A; Cellular Senescence; B-Lymphocytes
3.
Biophys Rev ; 14(6): 1247-1253, 2022 Dec.
Article in English | MEDLINE | ID: mdl-36536641

ABSTRACT

Structural genomics began as a global effort in the 1990s to determine the tertiary structures of all protein families as a response to large-scale genome sequencing projects. The immediate outcome was an influx of tens of thousands of protein structures, many of which had unknown functions. At the time, the value of structural genomics was controversial. However, the structures themselves were only the most obvious output. In addition, these newly solved structures motivated the emergence of huge data science and infrastructure efforts, which, together with advances in Deep Learning, have brought about a revolution in computational molecular biology. Here, we review some of the computational research carried out at the Protein Data Bank Japan (PDBj) during the Protein 3000 project under the leadership of Haruki Nakamura, much of which continues to flourish today.

4.
Sci Transl Med ; 14(650): eabn7737, 2022 06 22.
Article in English | MEDLINE | ID: mdl-35471044

ABSTRACT

The Omicron (B.1.1.529) SARS-CoV-2 variant contains an unusually high number of mutations in the spike protein, raising concerns of escape from vaccines, convalescent serum, and therapeutic drugs. Here, we analyzed the degree to which Omicron pseudo-virus evades neutralization by serum or therapeutic antibodies. Serum samples obtained 3 months after two doses of BNT162b2 vaccination exhibited 18-fold lower neutralization titers against Omicron than parental virus. Convalescent serum samples from individuals infected with the Alpha and Delta variants allowed similar frequencies of Omicron breakthrough infections. Domain-wise analysis using chimeric spike proteins revealed that this efficient evasion was primarily achieved by mutations clustered in the receptor binding domain but that multiple mutations in the N-terminal domain contributed as well. Omicron escaped a therapeutic cocktail of imdevimab and casirivimab, whereas sotrovimab, which targets a conserved region to avoid viral mutation, remains effective. Angiotensin-converting enzyme 2 (ACE2) decoys are another virus-neutralizing drug modality that are free, at least in theory, from complete escape. Deep mutational analysis demonstrated that an engineered ACE2 molecule prevented escape for each single-residue mutation in the receptor binding domain, similar to immunized serum. Engineered ACE2 neutralized Omicron comparably to the Wuhan strain and also showed a therapeutic effect against Omicron infection in hamsters and human ACE2 transgenic mice. Similar to previous SARS-CoV-2 variants, some sarbecoviruses showed high sensitivity against engineered ACE2, confirming the therapeutic value against diverse variants, including those that are yet to emerge.


Subject(s)
Angiotensin-Converting Enzyme 2; COVID-19; Animals; Antibodies, Monoclonal, Humanized; Antibodies, Neutralizing/therapeutic use; Antibodies, Viral/therapeutic use; BNT162 Vaccine; COVID-19/therapy; Humans; Immunization, Passive; Mice; Peptidyl-Dipeptidase A/chemistry; Peptidyl-Dipeptidase A/genetics; Peptidyl-Dipeptidase A/metabolism; SARS-CoV-2; COVID-19 Serotherapy
5.
J Mol Evol ; 90(1): 73-94, 2022 02.
Article in English | MEDLINE | ID: mdl-35084522

ABSTRACT

Extant organisms commonly use 20 amino acids in protein synthesis. In the translation system, aminoacyl-tRNA synthetase (ARS) selectively binds an amino acid and transfers it to the cognate tRNA. It is postulated that the amino acid repertoire of ARS expanded during the development of the translation system. In this study we generated composite phylogenetic trees for seven ARSs (SerRS, ProRS, ThrRS, GlyRS-1, HisRS, AspRS, and LysRS) which are thought to have diverged by gene duplication followed by mutation, before the evolution of the last universal common ancestor. The composite phylogenetic tree shows that the AspRS/LysRS branch diverged from the other five ARSs at the deepest node, with the GlyRS/HisRS branch and the other three ARSs (ThrRS, ProRS and SerRS) diverging at the second deepest node. ThrRS diverged next, and finally ProRS and SerRS diverged from each other. Based on the phylogenetic tree, sequences of the ancestral ARSs prior to the evolution of the last universal common ancestor were predicted. The amino acid specificity of each ancestral ARS was then postulated by comparison with amino acid recognition sites of ARSs of extant organisms. Our predictions demonstrate that ancestral ARSs had substantial specificity and that the number of amino acid types amino-acylated by proteinaceous ARSs was limited before the appearance of a fuller range of proteinaceous ARS species. From an assumption that 10 amino acid species are required for folding and function, proteinaceous ARS possibly evolved in a translation system composed of preexisting ribozyme ARSs, before the evolution of the last universal common ancestor.


Subject(s)
Amino Acyl-tRNA Synthetases; Amino Acids/genetics; Amino Acyl-tRNA Synthetases/genetics; Amino Acyl-tRNA Synthetases/metabolism; Phylogeny; RNA, Transfer/metabolism
6.
BMC Biol ; 19(1): 217, 2021 09 29.
Article in English | MEDLINE | ID: mdl-34587965

ABSTRACT

BACKGROUND: DNA barcodes are a useful tool for discovering, understanding, and monitoring biodiversity, which are critical tasks at a time of rapid biodiversity loss. However, widespread adoption of barcodes requires cost-effective and simple barcoding methods. Here we present a workflow that satisfies these conditions. It was developed via "innovation through subtraction" and thus requires minimal lab equipment, can be learned within days, reduces the barcode sequencing cost to < 10 cents, and allows fast turnaround from specimen to sequence by using the portable MinION sequencer. RESULTS: We describe how tagged amplicons can be obtained and sequenced with the real-time MinION sequencer in many settings (field stations, biodiversity labs, citizen science labs, schools). We also provide amplicon coverage recommendations that are based on several runs of the latest generation of MinION flow cells ("R10.3"), which suggest that each run can generate barcodes for > 10,000 specimens. Next, we present a novel software, ONTbarcoder, which overcomes the bioinformatics challenges posed by MinION reads. The software is compatible with Windows 10, Macintosh, and Linux, has a graphical user interface (GUI), and can generate thousands of barcodes on a standard laptop within hours based on only two input files (FASTQ, demultiplexing file). We document that MinION barcodes are virtually identical to Sanger and Illumina barcodes for the same specimens (> 99.99%) and provide evidence that MinION flow cells and reads have improved rapidly since 2018. CONCLUSIONS: We propose that barcoding with MinION is the way forward for government agencies, universities, museums, and schools because it combines low consumable and capital cost with scalability. Small projects can use the flow cell dongle ("Flongle") while large projects can rely on MinION flow cells that can be stopped and re-used after collecting sufficient data for a given project.


Subject(s)
Biodiversity; Computational Biology; DNA Barcoding, Taxonomic; High-Throughput Nucleotide Sequencing; Sequence Analysis, DNA; Software
7.
Commun Biol ; 4(1): 1134, 2021 09 22.
Article in English | MEDLINE | ID: mdl-34552191

ABSTRACT

The ability to predict emerging variants of SARS-CoV-2 would be of enormous value, as it would enable proactive design of vaccines in advance of such emergence. We estimated diversity of each site on a multiple sequence alignment (MSA) of the Spike (S) proteins from close relatives of SARS-CoV-2 that infected bat and pangolin before the pandemic. Then we compared the locations of high diversity sites in this MSA and those of mutations found in multiple emerging lineages of human-infecting SARS-CoV-2. This comparison revealed a significant correspondence, which suggests that a limited number of sites in this protein are repeatedly substituted in different lineages of this group of viruses. It follows, therefore, that the sites of future emerging mutations in SARS-CoV-2 can be predicted by analyzing their relatives (outgroups) that have infected non-human hosts. We discuss a possible evolutionary basis for these substitutions and provide a list of frequently substituted sites that potentially include future emerging variants in SARS-CoV-2.


Subject(s)
Evolution, Molecular; SARS-CoV-2/genetics; Animals; Genome, Viral/genetics; Sequence Alignment
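
The per-site diversity estimate described in entry 7 can be illustrated with a minimal sketch. The abstract does not name the diversity measure, so Shannon entropy per alignment column is an assumption here, as are the function names, the threshold value, and the toy alignment.

```python
import math
from collections import Counter


def column_entropy(column, gap="-"):
    """Shannon entropy (bits) of one alignment column, ignoring gap characters."""
    residues = [c for c in column if c != gap]
    if not residues:
        return 0.0
    counts = Counter(residues)
    total = len(residues)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def site_diversity(msa):
    """Per-column entropy for a list of equal-length aligned sequences."""
    if len({len(s) for s in msa}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    return [column_entropy(col) for col in zip(*msa)]


def high_diversity_sites(msa, threshold=0.5):
    """0-based column indices whose entropy exceeds a (hypothetical) threshold."""
    return [i for i, h in enumerate(site_diversity(msa)) if h > threshold]


# Toy alignment, invented for illustration only (not real Spike sequences).
toy_msa = ["MKTA", "MRTA", "MKSA", "MRTA"]
diverse = high_diversity_sites(toy_msa)
```

In this toy alignment only the second and third columns vary, so they are the ones flagged. A real analysis would then compare such positions with the mutation sites observed in emerging lineages.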
9.
Heliyon ; 7(2): e06317, 2021 Feb.
Article in English | MEDLINE | ID: mdl-33665461

ABSTRACT

The oomycete genus Phytophthora includes devastating plant pathogens that are found in almost all ecosystems. We sequenced the genomes of two quarantined Phytophthora species, P. fragariae and P. rubi. Comparing these Phytophthora species and related genera allowed reconstruction of the phylogenetic relationships within the genus Phytophthora and revealed Phytophthora genomic features associated with infection and pathogenicity. We found that several hundred Phytophthora genes are putatively inherited from red algae, but Phytophthora does not have vestigial plastids originating from phototrophs. The horizontally transferred Phytophthora genes are abundant transposons that "transmit" exogenous genes to Phytophthora species, thus bringing about the possibility of gene recombination. Several expansion events of Phytophthora gene families associated with cell wall biogenesis can be used as mutational targets to elucidate gene function in pathogenic interactions with host plants. This work enhances the understanding of Phytophthora evolution and will also be helpful for the design of phytopathological control strategies.

10.
Methods Mol Biol ; 2231: 135-145, 2021.
Article in English | MEDLINE | ID: mdl-33289891

ABSTRACT

Long DNA and RNA reads from nanopore and PacBio technologies have many applications, but the raw reads have a substantial error rate. More accurate sequences can be obtained by merging multiple reads from overlapping parts of the same sequence. lamassemble aligns up to ∼1000 reads to each other, and makes a consensus sequence, which is often much more accurate than the raw reads. It is useful for studying a region of interest such as an expanded tandem repeat or other disease-causing mutation.


Subject(s)
Consensus Sequence; Genomics/methods; Sequence Alignment/methods; Sequence Analysis, DNA/methods; Software; Animals; Genetic Techniques; High-Throughput Nucleotide Sequencing; Humans; Nanopores
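
The idea in entry 10, that merging several error-prone reads yields a more accurate sequence, can be sketched with a column-wise majority vote. This is not lamassemble's algorithm: the sketch assumes the reads are already aligned to equal length, and the function name and toy reads are hypothetical.

```python
from collections import Counter


def majority_consensus(aligned_reads, gap="-"):
    """Column-wise majority vote over equal-length, pre-aligned reads.

    Columns where the gap character wins the vote are dropped from the result.
    """
    if len({len(r) for r in aligned_reads}) != 1:
        raise ValueError("aligned reads must have the same length")
    consensus = []
    for column in zip(*aligned_reads):
        base, _ = Counter(column).most_common(1)[0]
        if base != gap:
            consensus.append(base)
    return "".join(consensus)


# Toy reads with independent errors, invented for illustration.
reads = [
    "ACGTAC-GT",  # error-free with respect to the intended sequence
    "ACCTAC-GT",  # substitution at the third base
    "ACGTACAGT",  # spurious insertion
    "ACGTTC-GT",  # substitution at the fifth base
]
consensus = majority_consensus(reads)
```

Each toy read except the first carries one error, yet the vote recovers ACGTACGT because the errors fall in different columns. Real long-read errors are less well behaved, which is why dedicated tools use probabilistic alignment rather than a plain vote.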
11.
Methods Mol Biol ; 2231: 163-177, 2021.
Article in English | MEDLINE | ID: mdl-33289893

ABSTRACT

The Database of Aligned Structural Homologs (DASH) is a tool for efficiently navigating the Protein Data Bank (PDB) by means of pre-computed pairwise structural alignments. We recently showed that, by integrating DASH structural alignments with the multiple sequence alignment (MSA) software MAFFT, we were able to significantly improve MSA accuracy without dramatically increasing manual or computational complexity. In the latest DASH update, such queries are not limited to PDB entries but can also be launched from user-provided protein coordinates. Here, we describe a further extension of DASH that retrieves intermolecular interactions of all structurally similar domains in the PDB to a query domain of interest. We illustrate these new features using a model of the NYN domain of the ribonuclease N4BP1 as an example. We show that the protein-nucleotide interactions returned are distributed on the surface of the NYN domain in an asymmetric manner, roughly centered on the known nuclease active site.


Subject(s)
RNA-Binding Proteins/chemistry; Sequence Alignment/methods; Sequence Analysis, Protein/methods; Software; Algorithms; Amino Acid Sequence; Computational Biology; Databases, Protein; Nuclear Proteins/chemistry; Protein Binding; Protein Domains; Ribonucleases/chemistry
12.
Front Microbiol ; 11: 2112, 2020.
Article in English | MEDLINE | ID: mdl-33042039

ABSTRACT

The SARS-CoV-2 S protein is a major point of interaction between the virus and the human immune system. As a consequence, the S protein is not a static target but undergoes rapid molecular evolution. In order to more fully understand the selection pressure during evolution, we examined residue positions in the S protein that vary greatly across closely related viruses but are conserved in the subset of viruses that infect humans. These "evolutionarily important" residues were not distributed evenly across the S protein but were concentrated in two domains: the N-terminal domain and the receptor-binding domain, both of which play a role in host cell binding in a number of related viruses. In addition to being localized in these two domains, evolutionary importance correlated with structural flexibility and inversely correlated with distance from known or predicted host receptor-binding residues. Finally, we observed a bias in the composition of the amino acids that make up such residues toward more human-like, rather than virus-like, sequence motifs.
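
The selection criterion in this abstract (positions that vary across related viruses but are conserved among human-infecting ones) can be sketched as a simple column filter. The abstract does not give the scoring scheme, so the distinct-residue count, the `min_distinct` cutoff, the function names, and the toy alignment are all assumptions made for illustration.

```python
def distinct_residues(column, gap="-"):
    """Set of non-gap characters observed in an alignment column."""
    return {c for c in column if c != gap}


def variable_but_conserved(msa, human_rows, min_distinct=3):
    """0-based columns that are variable overall but invariant in `human_rows`.

    `msa` is a list of equal-length aligned sequences; `human_rows` lists the
    row indices of the human-infecting subset. `min_distinct` is a hypothetical
    cutoff for calling a column variable.
    """
    if len({len(s) for s in msa}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    hits = []
    for idx, column in enumerate(zip(*msa)):
        overall = distinct_residues(column)
        human = distinct_residues(column[r] for r in human_rows)
        if len(overall) >= min_distinct and len(human) == 1:
            hits.append(idx)
    return hits


# Toy alignment: rows 0-2 stand in for non-human viruses, rows 3-4 for
# human-infecting ones. Invented data, not real S protein sequences.
toy_msa = ["MKTA", "MRTA", "MQSA", "MKSA", "MKSG"]
important = variable_but_conserved(toy_msa, human_rows=[3, 4])
```

Only the second column qualifies in the toy data: it shows three residues overall but a single residue in the human-infecting rows.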

13.
Genome Med ; 12(1): 67, 2020 07 31.
Article in English | MEDLINE | ID: mdl-32731881

ABSTRACT

BACKGROUND: Many genetic/genomic disorders are caused by genomic rearrangements. Standard methods can often characterize these variations only partly, e.g., copy number changes or breakpoints. It is important to fully understand the order and orientation of rearranged fragments, with precise breakpoints, to know the pathogenicity of the rearrangements. METHODS: We performed whole-genome-coverage nanopore sequencing of long DNA reads from four patients with chromosomal translocations. We identified rearrangements relative to a reference human genome, subtracted rearrangements shared by any of 33 control individuals, and determined the order and orientation of rearranged fragments, with our newly developed analysis pipeline. RESULTS: We describe the full characterization of complex chromosomal rearrangements, by filtering out genomic rearrangements seen in controls without the same disease, reducing the number of loci per patient from a few thousand to a few dozen. Breakpoint detection was very accurate; we usually see ~ 0 ± 1 base difference from Sanger sequencing-confirmed breakpoints. For one patient with two reciprocal chromosomal translocations, we find that the translocation points have complex rearrangements of multiple DNA fragments involving 5 chromosomes, which we could order and orient by an automatic algorithm, thereby fully reconstructing the rearrangement. A rearrangement is more than the sum of its parts: some properties, such as sequence loss, can be inferred only after reconstructing the whole rearrangement. In this patient, the rearrangements were evidently caused by shattering of the chromosomes into multiple fragments, which rejoined in a different order and orientation with loss of some fragments. CONCLUSIONS: We developed an effective analytic pipeline to find chromosomal aberration in congenital diseases by filtering benign changes, only from long read sequencing. Our algorithm for reconstruction of complex rearrangements is useful to interpret rearrangements with many breakpoints, e.g., chromothripsis. Our approach promises to fully characterize many congenital germline rearrangements, provided they do not involve poorly understood loci such as centromeric repeats.


Subject(s)
Gene Rearrangement; Genome-Wide Association Study; Germ-Line Mutation; Chromosome Aberrations; Chromosome Breakpoints; Genetic Association Studies/methods; Genetic Predisposition to Disease; Genome, Human; Genomics/methods; High-Throughput Nucleotide Sequencing; Humans; Translocation, Genetic; Whole Genome Sequencing
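
The control-subtraction step described in entry 13 (discarding patient rearrangements that also appear in any control individual) can be sketched as follows. This is not the authors' pipeline: the `Junction` model, the matching tolerance of 100 bases, and the toy coordinates are all assumptions chosen to keep the example small.

```python
from typing import NamedTuple


class Junction(NamedTuple):
    """A rearrangement junction joining two genomic positions (hypothetical model)."""
    chrom_a: str
    pos_a: int
    chrom_b: str
    pos_b: int


def same_junction(x, y, tolerance=100):
    """True if two junctions join the same chromosomes within `tolerance` bases."""
    return (
        x.chrom_a == y.chrom_a
        and x.chrom_b == y.chrom_b
        and abs(x.pos_a - y.pos_a) <= tolerance
        and abs(x.pos_b - y.pos_b) <= tolerance
    )


def subtract_controls(patient, controls, tolerance=100):
    """Keep patient junctions that match no junction in any control individual."""
    control_junctions = [j for individual in controls for j in individual]
    return [
        p for p in patient
        if not any(same_junction(p, c, tolerance) for c in control_junctions)
    ]


# Toy data, invented for illustration.
patient = [
    Junction("chr8", 1000, "chr18", 5000),  # patient-specific translocation
    Junction("chr1", 200, "chr1", 90000),   # also present in a control
]
controls = [
    [Junction("chr1", 230, "chr1", 90010)],
    [],
]
remaining = subtract_controls(patient, controls)
```

Only the chr8/chr18 junction survives the filter, mirroring how the abstract reports the candidate loci per patient shrinking once shared rearrangements are removed.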
14.
J Hum Genet ; 65(8): 667-674, 2020 Aug.
Article in English | MEDLINE | ID: mdl-32296131

ABSTRACT

Chromothripsis is a type of chaotic complex genomic rearrangement caused by a single event of chromosomal shattering and repair processes. Chromothripsis is known to cause rare congenital diseases when it occurs in germline cells; however, current genome analysis technologies have difficulty in detecting and deciphering chromothripsis. It is possible that this type of complex rearrangement may be overlooked in rare-disease patients whose genetic diagnosis is unsolved. We applied long read nanopore sequencing and our recently developed analysis pipeline dnarrange to a patient who has a reciprocal chromosomal translocation t(8;18)(q22;q21) as a result of chromothripsis between the two chromosomes, and fully characterized the complex rearrangements at the translocation site. The patient genome was evidently shattered into 19 fragments, and rejoined into derivative chromosomes in a random order and orientation. The reconstructed patient genome indicates loss of five genomic regions, which all overlap with microarray-detected copy number losses. We found that two disease-related genes, RAD21 and EXT1, were lost by chromothripsis. These two genes could fully explain the disease phenotype with facial dysmorphisms and bone abnormality, which is likely a contiguous gene syndrome, Cornelia de Lange syndrome type IV (CdLs-4) and atypical Langer-Giedion syndrome (LGS), also known as trichorhinophalangeal syndrome type II (TRPSII). This provides evidence that our approach based on long read sequencing can fully characterize chromothripsis in a patient's genome, which is important for understanding the phenotype of disease caused by complex genomic rearrangement.


Subject(s)
Cell Cycle Proteins/genetics; Chromothripsis; DNA-Binding Proteins/genetics; De Lange Syndrome/genetics; Langer-Giedion Syndrome/genetics; N-Acetylglucosaminyltransferases/genetics; Child; Chromosome Deletion; De Lange Syndrome/diagnosis; De Lange Syndrome/physiopathology; Genome; Humans; Langer-Giedion Syndrome/diagnosis; Langer-Giedion Syndrome/physiopathology; Male; Nanopore Sequencing; Phenotype; Sequence Analysis, DNA; Translocation, Genetic
15.
J Hum Genet ; 65(5): 475-480, 2020 May.
Article in English | MEDLINE | ID: mdl-32066831

ABSTRACT

Recently, a recessively inherited intronic repeat expansion in replication factor C1 (RFC1) was identified in cerebellar ataxia with neuropathy and bilateral vestibular areflexia syndrome (CANVAS). Here, we describe a Japanese case of genetically confirmed CANVAS with autonomic failure and auditory hallucination. The case showed impaired uptake of iodine-123-metaiodobenzylguanidine and 123I-ioflupane in the cardiac sympathetic nerve and dopaminergic neurons, respectively, by single-photon emission computed tomography. Long-read sequencing identified biallelic pathogenic (AAGGG)n nucleotide repeat expansion in RFC1 and heterozygous benign (TAAAA)n and (TAGAA)n expansions in brain expressed, associated with NEDD4 (BEAN1). Enrichment of the repeat regions in RFC1 and BEAN1 using a Cas9-mediated system clearly distinguished between pathogenic and benign repeat expansions. The haplotype around RFC1 indicated that the (AAGGG)n expansion in our case was on the same ancestral allele as that of European cases. Thus, long-read sequencing facilitates precise genetic diagnosis of diseases with complex repeat structures and various expansions.


Subject(s)
Bilateral Vestibulopathy/genetics; Cerebellar Ataxia/genetics; DNA Repeat Expansion; Replication Protein C/genetics; Sequence Analysis, DNA; Aged, 80 and over; Asian People; Bilateral Vestibulopathy/diagnosis; Cerebellar Ataxia/diagnosis; Female; Humans; Japan; Nedd4 Ubiquitin Protein Ligases/genetics
16.
Methods Mol Biol ; 2048: 207-229, 2019.
Article in English | MEDLINE | ID: mdl-31396940

ABSTRACT

Structural modeling plays a key role in protein function prediction on a genome-wide scale. For B and T lymphocyte receptors, the critical functional question is: which antigens and epitopes are targeted? With emerging B cell receptor (BCR) and T cell receptor (TCR) sequencing methods improving in both breadth and depth, there is a growing need for methods that can help answer this question. Since lymphocyte-antigen recognition depends on complementarity, structural modeling is likely to play an important role in understanding antigen specificity and affinity. In the case of BCRs, such modeling methods have a long history in the study and design of antibodies. However, for TCRs there are relatively few publicly available modeling tools, and, to our knowledge, none that incorporate interaction between TCRs and peptide-MHC (pMHC) complexes. Here, we provide a web-based tool, ImmuneScape (https://sysimm.org/immune-scape/), to carry out TCR-pMHC modeling as a first step toward structure-based function prediction.


Subject(s)
HLA Antigens/metabolism; Molecular Docking Simulation; Molecular Dynamics Simulation; Receptors, Antigen, T-Cell/metabolism; T-Lymphocytes/metabolism; Alleles; Epitope Mapping/methods; Epitopes, T-Lymphocyte/genetics; Epitopes, T-Lymphocyte/immunology; Epitopes, T-Lymphocyte/metabolism; HLA Antigens/genetics; HLA Antigens/immunology; Humans; Receptors, Antigen, T-Cell/genetics; Receptors, Antigen, T-Cell/immunology; Sequence Alignment; Software; Structure-Activity Relationship; T-Lymphocytes/immunology
17.
Nat Genet ; 51(8): 1215-1221, 2019 08.
Article in English | MEDLINE | ID: mdl-31332381

ABSTRACT

Neuronal intranuclear inclusion disease (NIID) is a progressive neurodegenerative disease that is characterized by eosinophilic hyaline intranuclear inclusions in neuronal and somatic cells. The wide range of clinical manifestations in NIID makes ante-mortem diagnosis difficult [1-8], but skin biopsy enables its ante-mortem diagnosis [9-12]. The average onset age is 59.7 years among approximately 140 NIID cases consisting of mostly sporadic and several familial cases. By linkage mapping of a large NIID family with several affected members (Family 1), we identified a 58.1 Mb linked region at 1p22.1-q21.3 with a maximum logarithm of the odds score of 4.21. By long-read sequencing, we identified a GGC repeat expansion in the 5' region of NOTCH2NLC (Notch 2 N-terminal like C) in all affected family members. Furthermore, we found similar expansions in 8 unrelated families with NIID and 40 sporadic NIID cases. We observed abnormal anti-sense transcripts in fibroblasts specifically from patients but not unaffected individuals. This work shows that repeat expansion in human-specific NOTCH2NLC, a gene that evolved by segmental duplication, causes a human disease.


Subject(s)
Brain/pathology; High-Throughput Nucleotide Sequencing/methods; Linkage Disequilibrium; Neurodegenerative Diseases/genetics; Neurodegenerative Diseases/pathology; Receptors, Notch/genetics; Trinucleotide Repeat Expansion/genetics; Adolescent; Adult; Aged; Brain/metabolism; Case-Control Studies; Female; Genetic Markers/genetics; Humans; Intranuclear Inclusion Bodies/genetics; Intranuclear Inclusion Bodies/pathology; Male; Middle Aged; Pedigree; Receptors, Notch/metabolism; Young Adult
18.
Nucleic Acids Res ; 47(W1): W5-W10, 2019 07 02.
Article in English | MEDLINE | ID: mdl-31062021

ABSTRACT

Here, we describe a web server that integrates structural alignments with the MAFFT multiple sequence alignment (MSA) tool. For this purpose, we have prepared a web-based Database of Aligned Structural Homologs (DASH), which provides structural alignments at the domain and chain levels for all proteins in the Protein Data Bank (PDB), and can be queried interactively or by a simple REST-like API. MAFFT-DASH integration can be invoked with a single flag on either the web (https://mafft.cbrc.jp/alignment/server/) or command-line versions of MAFFT. In our benchmarks using 878 cases from the BAliBase, HomFam, OXFam, Mattbench and SISYPHUS datasets, MAFFT-DASH showed 10-20% improvement over standard MAFFT for MSA problems with weak similarity, in terms of Sum-of-Pairs (SP), a measure of how well a program succeeds at aligning input sequences in comparison to a reference alignment. When MAFFT alignments were supplemented with homologous sequences, further improvement was observed. Potential applications of DASH beyond MSA enrichment include functional annotation through detection of remote homology and assembly of template libraries for homology modeling.


Subject(s)
Amino Acid Sequence/genetics; Proteins/genetics; Sequence Alignment/methods; Software; Algorithms; Databases, Protein; Humans; Sequence Analysis, Protein/methods; Sequence Analysis, RNA; Sequence Homology
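
The Sum-of-Pairs (SP) measure mentioned in entry 18 can be illustrated with a short sketch. It assumes the common definition of SP as the fraction of residue pairs aligned in the reference alignment that are also aligned in the test alignment; the benchmark's exact scoring program is not described in the abstract, and the function names and toy alignments are hypothetical.

```python
from itertools import combinations


def aligned_pairs(msa, gap="-"):
    """Set of (seq_i, residue_i, seq_j, residue_j) pairs sharing a column."""
    if len({len(row) for row in msa}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    next_residue = [0] * len(msa)  # running residue index for each sequence
    pairs = set()
    for column in zip(*msa):
        present = []
        for seq_idx, char in enumerate(column):
            if char != gap:
                present.append((seq_idx, next_residue[seq_idx]))
                next_residue[seq_idx] += 1
        for (i, ri), (j, rj) in combinations(present, 2):
            pairs.add((i, ri, j, rj))
    return pairs


def sum_of_pairs_score(test_msa, reference_msa):
    """Fraction of reference residue pairs that the test alignment reproduces."""
    reference = aligned_pairs(reference_msa)
    if not reference:
        raise ValueError("reference alignment contains no aligned residue pairs")
    return len(reference & aligned_pairs(test_msa)) / len(reference)


# Toy alignments of the same two sequences (ACGT and ACTGT), invented for illustration.
reference = ["AC-GT", "ACTGT"]
test = ["ACG-T", "ACTGT"]
score = sum_of_pairs_score(test, reference)
```

Here the test alignment misplaces one of the four reference pairs (the G is aligned against T instead of G), so it scores 0.75. Both inputs must list the same sequences in the same order for the residue indices to be comparable.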
19.
Brief Bioinform ; 20(4): 1160-1166, 2019 07 19.
Article in English | MEDLINE | ID: mdl-28968734

ABSTRACT

This article describes several features in the MAFFT online service for multiple sequence alignment (MSA). As a result of recent advances in sequencing technologies, huge numbers of biological sequences are available and the need for MSAs with large numbers of sequences is increasing. To extract biologically relevant information from such data, sophistication of algorithms is necessary but not sufficient. Intuitive and interactive tools for experimental biologists to semiautomatically handle large data are becoming important. We are working on development of MAFFT toward these two directions. Here, we explain (i) the Web interface for recently developed options for large data and (ii) interactive usage to refine sequence data sets and MSAs.


Subject(s)
Sequence Alignment/methods; Software; Algorithms; Computational Biology/methods; Databases, Genetic; Internet; Sequence Alignment/statistics & numerical data; Sequence Analysis; User-Computer Interface
20.
Bioinformatics ; 34(14): 2490-2492, 2018 07 15.
Article in English | MEDLINE | ID: mdl-29506019

ABSTRACT

Summary: We report an update for the MAFFT multiple sequence alignment program to enable parallel calculation of large numbers of sequences. The G-INS-1 option of MAFFT was recently reported to have higher accuracy than other methods for large data, but this method has been impractical for most large-scale analyses, due to the requirement of large computational resources. We introduce a scalable variant, G-large-INS-1, which has equivalent accuracy to G-INS-1 and is applicable to 50 000 or more sequences. Availability and implementation: This feature is available in MAFFT versions 7.355 or later at https://mafft.cbrc.jp/alignment/software/mpi.html. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
Computational Biology/methods; Sequence Alignment/methods; Software; Algorithms; Protein Structure, Secondary; Sequence Analysis, Protein/methods; Sequence Analysis, RNA/methods