Nat Genet ; 51(6): 1052-1059, 2019 06.
Article in English | MEDLINE | ID: mdl-31152161


Maize is one of the most important crops globally, and it shows remarkable genetic diversity. Knowledge of this diversity could help in crop improvement; however, gold-standard genomes have been elucidated only for modern temperate varieties. Here, we present a high-quality reference genome (contig N50 of 15.78 megabases) of the maize small-kernel (SK) inbred line, which is derived from a tropical landrace. Using haplotype maps derived from B73, Mo17 and SK, we identified 80,614 polymorphic structural variants across 521 diverse lines. Approximately 22% of these variants could not be detected by traditional single-nucleotide-polymorphism-based approaches, and some of them could affect gene expression and trait performance. To illustrate the utility of the diverse SK line, we used it to perform map-based cloning of a major-effect quantitative trait locus controlling kernel weight, a key trait selected during maize improvement. The underlying candidate gene ZmBARELY ANY MERISTEM1d provides a target for increasing crop yields.
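The contig N50 quoted in the abstract is a standard assembly statistic: the length of the contig at which the cumulative length, counted from the longest contig downward, first reaches half of the total assembly length. The following is a minimal sketch of that calculation; the function name `n50` and the toy contig lengths are illustrative assumptions and are not taken from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig to the shortest, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy example (hypothetical lengths): total length is 32, half is 16,
# and the two longest contigs (8 + 8) already reach that threshold.
toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(toy_contigs))
```

A larger N50 means that the assembly is made up of fewer, longer contigs, which is why the value is reported as a measure of contiguity.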

Genetic Association Studies , Genome, Plant , Genomics , Phenotype , Zea mays/genetics , Computational Biology/methods , Genomics/methods , High-Throughput Nucleotide Sequencing , Inbreeding , Molecular Sequence Annotation , Plant Breeding , Plants, Genetically Modified , Polymorphism, Single Nucleotide , Quantitative Trait, Heritable
Artif Intell Med ; 78: 41-46, 2017 05.
Article in English | MEDLINE | ID: mdl-28764871


OBJECTIVES: In this paper, a high-quality sequence encoding scheme is proposed for predicting the subcellular location of apoptosis proteins. METHODS: In the proposed methodology, novel evolutionary conservation information is introduced to represent protein sequences. The position-specific scoring matrix (PSSM) is divided into several blocks according to the golden section proportion. The resulting features are then fed to a support vector machine (SVM), and the predictive capability of the proposed method is evaluated by the jackknife test. RESULTS: The results show that the golden section method performs better than the method without segmentation. The overall accuracies for ZD98 and CL317 are 98.98% and 91.11%, respectively, which indicates that our method can play a complementary role to existing methods in the relevant areas. CONCLUSIONS: The proposed feature representation is powerful and improves prediction accuracy considerably, which indicates that our method provides state-of-the-art performance for predicting the subcellular location of apoptosis proteins.
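The abstract does not spell out how the PSSM is cut into blocks, so the sketch below is only one plausible reading, not the authors' scheme. It assumes the rows of an L x 20 PSSM are cut at the golden-section point (about 0.618 of the length), that the longer part is cut again recursively, and that each block is summarized by its column means. The names `golden_blocks` and `block_mean_features`, the `depth` parameter and the toy matrix are all hypothetical.

```python
GOLDEN = (5 ** 0.5 - 1) / 2  # golden section proportion, about 0.618


def golden_blocks(length, depth):
    """Split the row range [0, length) into depth + 1 blocks.

    Assumption: each cut is placed at the golden-section point of the
    current range, and only the left (longer) part is cut again.
    """
    def split(start, end, remaining):
        span = end - start
        if remaining == 0 or span < 2:
            return [(start, end)]
        cut = start + round(GOLDEN * span)
        cut = max(start + 1, min(end - 1, cut))  # keep both parts non-empty
        return split(start, cut, remaining - 1) + [(cut, end)]

    return split(0, length, depth)


def block_mean_features(pssm, depth=2):
    """Concatenate the per-column means of every golden-section block."""
    if not pssm:
        raise ValueError("the PSSM must contain at least one row")
    n_cols = len(pssm[0])
    features = []
    for start, end in golden_blocks(len(pssm), depth):
        rows = pssm[start:end]
        for col in range(n_cols):
            features.append(sum(row[col] for row in rows) / len(rows))
    return features


# Toy 4 x 2 matrix standing in for an L x 20 PSSM (hypothetical values).
toy_pssm = [[1, 2], [3, 4], [5, 6], [7, 8]]
print(golden_blocks(100, 2))
print(block_mean_features(toy_pssm, depth=1))
```

Under these assumptions every protein yields a vector of fixed length, (depth + 1) times the number of PSSM columns, regardless of sequence length, which is the form an SVM classifier needs as input.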

Amino Acid Sequence , Apoptosis Regulatory Proteins , Support Vector Machine , Apoptosis , Computational Biology , Databases, Protein , Evolution, Molecular , Proteins
Biochimie ; 97: 60-5, 2014 Feb.
Article in English | MEDLINE | ID: mdl-24067326


Knowledge of protein secondary structural classes plays an important role in understanding protein folding patterns. In this paper, 25 features based on position-specific scoring matrices are selected to reflect evolutionary information. In combination with 11 other rational features, proposed by previous researchers and based on predicted protein secondary structure sequences, a 36-dimensional feature vector is presented to predict protein secondary structural classes for low-similarity sequences. The ASTRALtraining dataset is used to design and train our method; three other low-similarity datasets, ASTRALtest, 25PDB and 1189, are used to test it. Comparisons with other methods show that our method is effective at predicting protein secondary structural classes. A stand-alone version of the proposed method (PSSS-PSSM) is written in MATLAB and can be downloaded from

Algorithms , Position-Specific Scoring Matrices , Proteins , Software , Amino Acid Sequence , Artificial Intelligence , Databases, Protein , Molecular Sequence Data , Protein Folding , Protein Structure, Secondary , Proteins/chemistry , Proteins/classification , Sequence Alignment
J Theor Biol ; 336: 52-60, 2013 Nov 07.
Article in English | MEDLINE | ID: mdl-23876763


Lempel-Ziv complexity has been widely used for sequence comparison and has achieved promising results, but until now the distribution of components in the exhaustive history has not been studied. This paper investigates the whole distribution of LZ-words and presents a novel statistical method for sequence comparison. With the components' lengths in mind, we revised Lempel-Ziv complexity and obtained various sets of LZ-words. Instead of calculating the LZ-words' contents, we defined a series of set operations on the LZ-word sets to compare biological sequences. To assess the effectiveness of the proposed method, we performed two sets of experiments and compared it with alignment-based methods.
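To make the terms concrete, the sketch below parses a sequence into its Lempel-Ziv exhaustive history: each component (LZ-word) is the shortest substring, starting where the previous component ended, that cannot be copied from the text preceding its own last character. The abstract does not state which set operations the authors defined, so the grouping by length and the Jaccard distance shown here are stand-ins chosen for illustration. The function names are hypothetical.

```python
def lz_words(seq):
    """Return the components of the Lempel-Ziv exhaustive history of seq."""
    words = []
    i, n = 0, len(seq)
    while i < n:
        j = i + 1
        # Extend the candidate while it can still be reproduced from the
        # prefix that excludes its own last character (overlap allowed).
        while j <= n and seq[i:j] in seq[:j - 1]:
            j += 1
        words.append(seq[i:min(j, n)])
        i = j
    return words


def words_by_length(seq):
    """Group the distinct LZ-words of seq by their length."""
    groups = {}
    for word in lz_words(seq):
        groups.setdefault(len(word), set()).add(word)
    return groups


def lz_set_distance(a, b):
    """Jaccard distance between LZ-word sets (an illustrative stand-in)."""
    set_a, set_b = set(lz_words(a)), set(lz_words(b))
    union = set_a | set_b
    if not union:
        return 0.0
    return 1.0 - len(set_a & set_b) / len(union)


# Short DNA string commonly used to illustrate the LZ decomposition.
example = "AACGTACCATTG"
print(lz_words(example))         # components of the exhaustive history
print(len(lz_words(example)))    # the Lempel-Ziv complexity
print(words_by_length(example))
print(lz_set_distance("AACGTACCATTG", "AACGTACCATTG"))
```

The number of components is the classical Lempel-Ziv complexity. Comparing the word sets themselves, rather than only their counts, is the kind of alignment-free comparison the abstract describes.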

Algorithms , Sequence Homology , Base Sequence , Cluster Analysis , Coronavirus/classification , Coronavirus/genetics , Genome, Viral , Hepatitis E virus/genetics , Phylogeny