Results 1 - 20 of 48
1.
J Biosci ; 32(1): 51-70, 2007 Jan.
Article in English | MEDLINE | ID: mdl-17426380

ABSTRACT

The description of protein 3D structures can be performed through a library of 3D fragments, named a structural alphabet. Our structural alphabet is composed of 16 small protein fragments of 5 C alpha in length, called protein blocks (PBs). It allows an efficient approximation of the 3D protein structures and a correct prediction of the local structure. The 72 most frequent series of 5 consecutive PBs, called structural words (SWs), are able to cover more than 90% of the 3D structures. PBs are highly conditioned by the presence of a limited number of transitions between them. In this study, we propose a new method called "pinning strategy" that uses this specific feature to predict long protein fragments. Its goal is to define highly probable successions of PBs. It starts from the most probable SW and is then extended with overlapping SWs. Starting from an initial prediction rate of 34.4%, the use of the SWs instead of the PBs allows a gain of 4.5%. The pinning strategy simply applied to the SWs increases the prediction accuracy to 39.9%. In a second step, the sequence-structure relationship is optimized and the prediction accuracy reaches 43.6%.


Subject(s)
Computational Biology/methods , Protein Conformation , Proteins/chemistry , Protein Sequence Analysis , Amino Acid Sequence , Bayes Theorem , Protein Databases , Escherichia coli Proteins/chemistry , Molecular Sequence Data , Peptide Library
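The coverage figure quoted in the abstract above (a small set of frequent structural words covering most positions) can be illustrated with a minimal sketch. It assumes that structures are already encoded as strings with one letter per PB, and it counts a position as covered when it lies inside at least one occurrence of a retained word. The function name, the toy inputs and this coverage definition are assumptions for illustration, not the authors' procedure.

```python
from collections import Counter


def word_coverage(pb_sequences, word_len=5, top_n=72):
    """Fraction of positions covered by the top_n most frequent words.

    pb_sequences: iterable of strings, one letter per protein block (assumed encoding).
    A position is covered if it falls inside any occurrence of a retained word.
    """
    pb_sequences = list(pb_sequences)

    # Count every overlapping word of word_len consecutive blocks.
    counts = Counter(
        seq[i:i + word_len]
        for seq in pb_sequences
        for i in range(len(seq) - word_len + 1)
    )
    frequent = {word for word, _ in counts.most_common(top_n)}

    covered = 0
    total = 0
    for seq in pb_sequences:
        mask = [False] * len(seq)
        for i in range(len(seq) - word_len + 1):
            if seq[i:i + word_len] in frequent:
                for j in range(i, i + word_len):
                    mask[j] = True
        covered += sum(mask)
        total += len(seq)
    return covered / total if total else 0.0


if __name__ == "__main__":
    # Toy input: one short PB string, keeping only the single most frequent word.
    print(word_coverage(["mmmmmmmmab"], top_n=1))
```

With real data, the same function could be called on a list of PB-encoded chains with the default `top_n=72` to compare against the coverage reported in the abstract.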
2.
Nucleic Acids Res ; 34(Web Server issue): W75-8, 2006 Jul 01.
Article in English | MEDLINE | ID: mdl-16845113

ABSTRACT

Protein Peeling 2 (PP2) is a web server for the automatic identification of protein units (PUs) given the 3D coordinates of a protein. PUs are an intermediate level of protein structure description between protein domains and secondary structures. It is a new tool to better understand and analyze the organization of protein structures. PP2 uses only the matrices of protein contact probabilities and cuts the protein structures optimally using the Matthews correlation coefficient. An index assesses the compactness quality of each PU. Results are given both textually and graphically using the Jmol and PyMOL software. The server can be accessed from http://www.ebgm.jussieu.fr/~gelly/index.html.


Subject(s)
Protein Conformation , Software , Computer Graphics , Internet , Protein Folding , Proteins/chemistry , User Interface
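The abstract above names the Matthews correlation coefficient as the cutting criterion but does not say how PP2 derives its counts from the contact probability matrices. The sketch below therefore shows only the standard coefficient computed from a 2x2 table of counts; the function name and the convention of returning 0.0 for a degenerate table are assumptions.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Standard Matthews correlation coefficient from a 2x2 table of counts.

    Returns 0.0 when any marginal sum is zero (a common convention, assumed here).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Perfect agreement gives +1, perfect disagreement gives -1.
    print(matthews_corrcoef(5, 5, 0, 0))
    print(matthews_corrcoef(0, 0, 5, 5))
```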
3.
Bioinformatics ; 19(3): 345-53, 2003 Feb 12.
Article in English | MEDLINE | ID: mdl-12584119

ABSTRACT

MOTIVATION: Our aim is to develop a process that automatically defines a repertory of contiguous 3D protein structure fragments and can be used in homology modeling. We present here improvements to the method we introduced previously: the 'hybrid protein model' (de Brevern and Hazout, Theor. Chem. Acc., 106, 36-47, (2001)). The hybrid protein learns a non-redundant databank encoded in a structural alphabet composed of 16 Protein Blocks (PBs; de Brevern et al., Proteins, 41, 271-287, (2000)). Every local fold is learned by looking for the most similar pattern present in the hybrid protein and modifying it slightly. Finally, each position corresponds to a cluster of similar 3D local folds. RESULTS: In this paper, we describe improvements to our method for building an optimal hybrid protein: (i) 'baby training,' which is defined as the introduction of large structure fragments and the progressive reduction in the size of training fragments; and (ii) the deletion of the redundant parts of the hybrid protein. This repertory of contiguous 3D protein structure fragments should be a useful tool for molecular modeling.


Subject(s)
Algorithms , Protein Databases , Molecular Models , Proteins/chemistry , Amino Acid Sequence , Artificial Intelligence , Molecular Sequence Data , Peptide Fragments/chemistry , Protein Conformation , Protein Folding , Quality Control
4.
Proteins ; 46(3): 243-9, 2002 Feb 15.
Article in English | MEDLINE | ID: mdl-11835499

ABSTRACT

Knowledge of the disulfide bonding state of the cysteines of proteins is of major interest in designing numerous molecular biology experiments, or in predicting their three-dimensional structure. Previous methods using the information gained from aligned sets of sequences have reached success rates of up to 82% in predicting the oxidation state of cysteines. In the present study, we assess the relative efficiency of different descriptors in predicting the cysteine disulfide bonding states. Our results suggest that the residues flanking the cysteines are less informative about the disulfide bonding state than the amino acid content of the whole protein. Using a combination of logistic functions learned with subsets of proteins homogeneous in terms of their amino acid content, we propose a simple prediction approach, starting from a single sequence, that reaches success rates close to 84%. This score can be improved by avoiding predictions regarding cysteines for which the decision is not well marked. For example, we obtain a score close to 87% correct prediction when we abstain from predicting 10% of the cysteines.


Subject(s)
Cysteine/chemistry , Disulfides/chemistry , Proteins/chemistry , Amino Acids/chemistry , Computer Simulation , Logistic Models , Chemical Models , Tertiary Protein Structure
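The idea in the abstract above of withholding a prediction when the decision is not well marked can be sketched as a logistic score with an abstention band around 0.5. The linear scores, the `margin` parameter and both function names are hypothetical; the abstract does not give the fitted functions or the rejection rule actually used.

```python
import math


def logistic(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_with_rejection(scores, margin=0.1):
    """Map linear scores to 1 (bonded), 0 (free) or None (abstain).

    A prediction is withheld when the logistic output is within `margin`
    of 0.5 (hypothetical rejection rule).
    """
    predictions = []
    for z in scores:
        p = logistic(z)
        if abs(p - 0.5) < margin:
            predictions.append(None)
        else:
            predictions.append(1 if p > 0.5 else 0)
    return predictions


def accuracy_on_decided(predictions, labels):
    """Return (accuracy over non-abstained cases, fraction of cases decided)."""
    decided = [(p, y) for p, y in zip(predictions, labels) if p is not None]
    if not decided:
        return None, 0.0
    accuracy = sum(p == y for p, y in decided) / len(decided)
    return accuracy, len(decided) / len(labels)


if __name__ == "__main__":
    preds = predict_with_rejection([3.0, -3.0, 0.1], margin=0.1)
    print(preds, accuracy_on_decided(preds, [1, 0, 1]))
```

Widening `margin` trades coverage for accuracy on the remaining cases, which is the trade-off the abstract describes when it excludes 10% of the cysteines.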
5.
Bioinformatics ; 17(2): 196-7, 2001 Feb.
Article in English | MEDLINE | ID: mdl-11238079

ABSTRACT

MOSAIC is a set of tools for the segmentation of multiple aligned DNA sequences into homogeneous zones. The segmentation is based on the distribution of mutational events along the alignment. As an example, the analysis of one repeated sequence belonging to the subtelomeric regions of the yeast genome is presented. AVAILABILITY: Free access from ftp://ftp.biomath.jussieu.fr/pub/papers/MOSAIC


Subject(s)
Sequence Alignment , DNA Sequence Analysis , Software , Fungal Genome , Saccharomyces cerevisiae/genetics , DNA Sequence Analysis/methods
6.
Proteins ; 41(3): 271-87, 2000 Nov 15.
Article in English | MEDLINE | ID: mdl-11025540

ABSTRACT

By using an unsupervised cluster analyzer, we have identified a local structural alphabet composed of 16 folding patterns of five consecutive C(alpha) ("protein blocks"). The dependence that exists between successive blocks is explicitly taken into account. A Bayesian approach based on the relation protein block-amino acid propensity is used for prediction and leads to a success rate close to 35%. Sharing sequence windows associated with certain blocks into "sequence families" improves the prediction accuracy by 6%. This prediction accuracy exceeds 75% when keeping the first four predicted protein blocks at each site of the protein. In addition, two different strategies are proposed: the first one defines the number of protein blocks in each site needed for respecting a user-fixed prediction accuracy, and alternatively, the second one defines the different protein sites to be predicted with a user-fixed number of blocks and a chosen accuracy. This last strategy applied to the ubiquitin conjugating enzyme (alpha/beta protein) shows that 91% of the sites may be predicted with a prediction accuracy larger than 77% considering only three blocks per site. The prediction strategies proposed improve our knowledge about sequence-structure dependence and should be very useful in ab initio protein modelling.


Subject(s)
Bayes Theorem , Computer Simulation , Molecular Models , Peptide Fragments/chemistry , Protein Conformation , Artificial Intelligence , Cluster Analysis , Factual Databases , Forecasting , Ligases , Peptide Fragments/classification , Secondary Protein Structure , Ubiquitins/metabolism
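The Bayesian ranking of blocks and the "first four predicted blocks" criterion in the abstract above can be sketched as follows. The sketch assumes a naive Bayes score (window positions treated as independent given the block), dictionary-based probability tables and a small floor for unseen residues; all names and the toy tables are assumptions, not the published model.

```python
import math


def rank_blocks(window, priors, likelihoods, floor=1e-6):
    """Rank blocks by log P(block) + sum over positions of log P(residue | block, position).

    window: string of amino acids; priors: {block: P(block)};
    likelihoods: {block: [ {residue: probability}, ... one dict per window position ]}.
    """
    scores = {}
    for block, prior in priors.items():
        score = math.log(prior)
        for position, residue in enumerate(window):
            score += math.log(likelihoods[block][position].get(residue, floor))
        scores[block] = score
    return sorted(scores, key=scores.get, reverse=True)


def top_k_accuracy(ranked_lists, true_blocks, k):
    """Fraction of sites whose true block appears among the k best-ranked blocks."""
    hits = sum(true in ranked[:k] for ranked, true in zip(ranked_lists, true_blocks))
    return hits / len(true_blocks)


if __name__ == "__main__":
    # Toy tables with two blocks and a one-residue window (illustrative values only).
    priors = {"a": 0.5, "m": 0.5}
    likelihoods = {"a": [{"G": 0.8, "A": 0.2}], "m": [{"G": 0.1, "A": 0.9}]}
    ranked = [rank_blocks(w, priors, likelihoods) for w in ("G", "A")]
    print(ranked, top_k_accuracy(ranked, ["m", "m"], k=1))
```

Raising `k` can only increase the measured accuracy, which is why the abstract reports a much higher rate when four candidate blocks are kept per site.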
7.
Protein Eng ; 12(12): 1063-73, 1999 Dec.
Article in English | MEDLINE | ID: mdl-10611400

ABSTRACT

The hidden Markov model (HMM) was used to identify recurrent short 3D structural building blocks (SBBs) describing protein backbones, independently of any a priori knowledge. Polypeptide chains are decomposed into a series of short segments defined by their inter-alpha-carbon distances. Basically, the model takes into account the sequentiality of the observed segments and assumes that each one corresponds to one of several possible SBBs. Fitting the model to a database of non-redundant proteins allowed us to decode proteins in terms of 12 distinct SBBs with different roles in protein structure. Some SBBs correspond to classical regular secondary structures. Others correspond to a significant subdivision of their bounding regions previously considered to be a single pattern. The major contribution of the HMM is that this model implicitly takes into account the sequential connections between SBBs and thus describes the most probable pathways by which the blocks are connected to form the framework of the protein structures. Validation of the SBBs code was performed by extracting SBB series repeated in recoding proteins and examining their structural similarities. Preliminary results on the sequence specificity of SBBs suggest promising perspectives for the prediction of SBBs or series of SBBs from the protein sequences.


Subject(s)
Proteins/chemistry , Amino Acid Sequence , Amino Acids/chemistry , Carrier Proteins/chemistry , Databases as Topic , Escherichia coli Proteins , Markov Chains , Molecular Models , Molecular Sequence Data , Peptide Fragments/chemistry , Protein Conformation , Secondary Protein Structure
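Decoding a chain into its most probable series of hidden states, as the abstract above describes for SBBs, is commonly done with the Viterbi algorithm. The sketch below is a generic Viterbi decoder in log space. It assumes discrete observation symbols, whereas the abstract's segments are described by inter-alpha-carbon distances, so the state names, symbols and probabilities here are purely illustrative.

```python
import math


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Most probable hidden-state path for a discrete-observation HMM (log space)."""
    if not observations:
        return []

    # best[t][s]: best log-probability of any path ending in state s at time t.
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    back = [{}]

    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda p: best[t - 1][p] + log_trans[p][s])
            best[t][s] = best[t - 1][prev] + log_trans[prev][s] + log_emit[s][observations[t]]
            back[t][s] = prev

    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


if __name__ == "__main__":
    states = ["H", "E"]  # hypothetical state labels
    log = math.log
    log_start = {"H": log(0.5), "E": log(0.5)}
    log_trans = {"H": {"H": log(0.8), "E": log(0.2)},
                 "E": {"H": log(0.2), "E": log(0.8)}}
    log_emit = {"H": {"x": log(0.9), "y": log(0.1)},
                "E": {"x": log(0.1), "y": log(0.9)}}
    print(viterbi(["x", "x", "y", "y"], states, log_start, log_trans, log_emit))
```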
8.
Bioinformatics ; 15(2): 176-7, 1999 Feb.
Article in English | MEDLINE | ID: mdl-10089205

ABSTRACT

PredAcc is a tool for predicting the solvent accessibility of protein residues from the sequence at different relative accessibility levels (0-55%). The prediction rate varies between 70.7% (for 25% relative accessibility) and 85.7% (for 0% relative accessibility). Amino acids are predicted in four categories: almost certainly hidden and almost certainly exposed with a given a posteriori prediction error, probably hidden and probably exposed otherwise. AVAILABILITY: http://condor.urbb.jussieu.fr/PredAccCfg.html CONTACT: tuffery@urbb.jussieu.fr


Subject(s)
Proteins/chemistry , Software , Amino Acids/chemistry , Computer Simulation , Logistic Models , Chemical Models , Protein Conformation , Solvents
9.
Comput Appl Biosci ; 13(5): 497-508, 1997 Oct.
Article in English | MEDLINE | ID: mdl-9367123

ABSTRACT

MOTIVATION: The approaches usually used for building large genetic maps consist of dividing the marker set into linkage groups and providing local orders that can be tested by multi-point linkage analysis. To deal with the limitations of these approaches, a strategy taking the marker set into account globally is defined. RESULTS: The paper presents a new approach called 'Bi-Dimensional Scaling Map' (BDS-Map) for inferring marker orders and distances in genetic maps, based on the use of an additional dimension orthogonal to the map into which markers are projected. Dynamical forces based on a two-point analysis are applied so as to optimize the marker locations in space. The efficiency of the approach is exemplified on real data (16 and 70 markers on chromosomes 6 and 2, respectively) and simulated data (50 maps of 70 markers).


Subject(s)
Chromosome Mapping/methods , Software , Algorithms , Computer Graphics , Database Management Systems , Genetic Linkage , Genetic Markers , Programming Languages , User Interface
10.
Hum Biol ; 69(3): 419-25, 1997 Jun.
Article in English | MEDLINE | ID: mdl-9164051

ABSTRACT

When analyzed by origin, the frequency of the G542X cystic fibrosis (CF) mutation (the second most common CF mutation in Europe after ΔF508) varies between population groups in Europe. We show here that the frequency of G542X varies among different towns or regions of origin, being lower in northeastern Europeans than in southwestern Europeans. The G542X mutation mapping that we have defined by a multiple regression of G542X frequencies covers 28 countries (53 geographic points) and is based on data from 50 laboratories. The highest G542X frequencies correspond to ancient sites of occupation by occidental Phoenicians.


Subject(s)
CFTR Protein/genetics , Cystic Fibrosis/genetics , Emigration and Immigration , Allele Frequency/genetics , Mutation/genetics , Europe , Humans , Regression Analysis
11.
Protein Eng ; 10(4): 361-72, 1997 Apr.
Article in English | MEDLINE | ID: mdl-9194160

ABSTRACT

We have studied the effect of backbone inaccuracy on the efficiency of protein side chain conformation prediction using rotamer libraries. The backbones were generated by randomly perturbing the crystallographic conformation of 12 proteins and exhibit C alpha r.m.s.d.s of up to 2 Å. Our results show that, even for a perturbation of the backbone fully compatible with the temperature factors of the proteins, the predicted side chain conformations of approximately 10% of the buried side chains remain variable. This fraction increases further for larger backbone deviations. However, for backbone deviations of up to 2 Å r.m.s.d., the predicted side chain r.m.s.d. varies only in a ratio of < 1.4. Moreover, a possible strategy for obtaining side chain conformations close to the experimental ones consists of extracting the consensus conformations of the side chains from a series of backbone conformations. Such a procedure allows the computation of the side chain conformations with no loss of accuracy for backbones exhibiting r.m.s.d.s of up to 1 Å from the crystallographic coordinates. For larger backbone deviations (up to 2 Å r.m.s.d.) the r.m.s.d. of the buried side chains increases from 1.33 up to 1.60 Å. We also discuss the influence of the size of the rotamer library on the quality of the prediction.


Subject(s)
Chemical Models , Protein Conformation , X-Ray Crystallography , Molecular Models
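The r.m.s.d. values quoted in the abstract above measure the root-mean-square distance between matched atoms of two conformations. A minimal sketch, assuming that both coordinate sets list the same atoms in the same order and are already expressed in the same frame (no superposition step is performed here), and with a hypothetical function name:

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z) points.

    No superposition is applied; the two structures are assumed to be pre-aligned.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


if __name__ == "__main__":
    reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    perturbed = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
    print(rmsd(reference, perturbed))
```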
12.
Hum Biol ; 69(2): 253-62, 1997 Apr.
Article in English | MEDLINE | ID: mdl-9057348

ABSTRACT

The apolipoprotein E gene (APOE) is located on chromosome 19. The three most common APOE alleles account for most of the corresponding peptide chain variations in most human populations. APOE*3 is the most common allele, coding for the product E3; APOE*2 codes for an Arg-158-->Cys substitution (E2), and APOE*4 codes for a Cys-112-->Arg product (E4). We completed a meta-analysis of APOE allele frequencies from 30 geographically defined populations in Europe, including Iceland and Turkey. We performed a weighted multiple regression using normalized geographic coordinates and a fourth-degree polynomial. Next, we constructed maps showing isofrequencies of the *4 allele in Europe. We found a clear north to south decline in *4 allele frequency for continental Western Europe. No such clinal pattern was apparent for the *2 allele frequencies, but for *3 we found an inverse south to north decreasing gradient. Symmetry between the clines of the *4 and *3 alleles is due to a negative correlation coefficient (r = -0.89). We also plotted APOE allele frequencies against latitude; a decreasing cline was evident for *4 frequencies (y = -0.152 + 0.006x, r = 0.904) and an increasing cline was evident for *3 frequencies (y = 1.087 - 0.006x, r = 0.809). Clines for the APOE alleles could be the result of natural selection.


Subject(s)
Apolipoproteins E/genetics , Allele Frequency , Alleles , Europe/epidemiology , Population Genetics , Humans , Incidence , Regression Analysis
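The two regression lines reported in the abstract above can be evaluated directly. The sketch below assumes that x is latitude in degrees north and y is an allele frequency expressed as a fraction; the abstract does not state the units, and the function names are hypothetical.

```python
def apoe4_frequency(latitude_deg):
    """Fitted *4 frequency from the reported line y = -0.152 + 0.006x."""
    return -0.152 + 0.006 * latitude_deg


def apoe3_frequency(latitude_deg):
    """Fitted *3 frequency from the reported line y = 1.087 - 0.006x."""
    return 1.087 - 0.006 * latitude_deg


if __name__ == "__main__":
    for latitude in (40, 50, 60):
        print(latitude, apoe4_frequency(latitude), apoe3_frequency(latitude))
```

Because the two slopes are equal and opposite, the fitted *3 and *4 values sum to 0.935 at every latitude.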
13.
Genet Couns ; 8(2): 77-81, 1997.
Article in English | MEDLINE | ID: mdl-9219003

ABSTRACT

We have collected 76 parent-offspring (CAG)n values in 60 French Huntington's disease (HD) pedigrees. The analysis of intergenerational alterations in CAG repeat length shows that there is a correlation between repeat instability and parental repeat length. Paternally inherited cases are characterized by a preferential trend towards an increase in range of repeat sizes in offspring of HD patients.


Subject(s)
Huntington Disease/genetics , Meiosis/genetics , Nerve Tissue Proteins/genetics , Nuclear Proteins/genetics , Trinucleotide Repeats/genetics , Adult , DNA/genetics , Female , Genetic Markers/genetics , Genetic Testing , Genetic Variation , Humans , Huntingtin Protein , Huntington Disease/diagnosis , Male , Nucleic Acid Repetitive Sequences
14.
Ann Hum Genet ; 61(Pt 1): 37-47, 1997 Jan.
Article in English | MEDLINE | ID: mdl-9066926

ABSTRACT

The GM immunoglobulin (Ig) allotype distributions of 49 native Amerindian populations from North to South America were analysed by a new technique called 'Mobile Sites Method' (MSM). This allows the global interpretation of genetic diversity in space by means of a distorted geographic map called a 'genetic similarity map'. This approach has been improved by superimposing in the distorted geographic map both the haplotype set (represented by hypothetical populations having a 100% frequency of the haplotype considered) and the 'geography-genetics discontinuities' (i.e. the zones between homogeneous population clusters). This bidimensional representation completes the interpretation of the genetic distances between populations in terms of local genetic diversity and possible migrations. Our results concerning the spatial distribution of the Amerindian populations show: (i) a great interdependence of the geographic locations and the GM haplotype distributions (the importance of the geographic factor was checked with the usual technique of 'random sampling' and the percentage of explained distance variability decreases from 78% with the observed data to a level less than 67% with the random data); (ii) a parallelism between genetics and linguistics groups as indicated by the population clusters in the similarity map, and (iii) a complex distorted map revealing the presence of multiple population migrations and admixtures in the course of time. A particular distortion of South America suggests possible migrations by sea along the western and eastern coasts of Central America, or multiple migration waves without population admixture across Central America.


Subject(s)
Population Genetics , Immunoglobulin Gm Allotypes/genetics , Central American Indians/genetics , North American Indians/genetics , South American Indians/genetics , Inuit/genetics , Cluster Analysis , Genetic Variation , Haplotypes , Humans , Statistics as Topic
15.
J Mol Evol ; 42(4): 472-5, 1996 Apr.
Article in English | MEDLINE | ID: mdl-8642617

ABSTRACT

One Y-specific DNA polymorphism (p49/TaqI) was studied in a sample of 97 French Basques and compared with those found in 7 other French, Iberian, and Italian populations. A particularly high frequency (72.2%) of Y-haplotype XV was observed in Basques, compared to values (mean of 41%) obtained in other Western Europeans. Basques were also characterized by virtual absence, or presence at a low level, of the South or Near Eastern haplotypes XII, VII, and VIII. Considered together, these results confirm that Basques are a very ancient European population which has had little previous contact with the Neolithics.


Subject(s)
Haplotypes , Genetic Polymorphism , /genetics , Y Chromosome/genetics , Molecular Evolution , France/ethnology , Humans , Italy/ethnology , Portugal/ethnology , Spain/ethnology
16.
Hum Biol ; 67(5): 797-803, 1995 Oct.
Article in English | MEDLINE | ID: mdl-8543293

ABSTRACT

The frequencies of ΔF508, the main cystic fibrosis mutation, vary among different populations in Western Europe; they are higher in northwestern Europeans than in southeastern populations. Our new analysis is based on results from 66 different laboratories on 17,886 cystic fibrosis chromosomes (from 70 locations and 26 countries). The correlation between ΔF508 frequency values and cystic fibrosis incidence is calculated in the corresponding groups.


Subject(s)
CFTR Protein/genetics , Cystic Fibrosis/genetics , Allele Frequency , Mutation , Cystic Fibrosis/epidemiology , Europe/epidemiology , Gene Deletion , Population Genetics , Humans , Incidence , Regression Analysis
17.
Hum Biol ; 67(4): 562-76, 1995 Aug.
Article in English | MEDLINE | ID: mdl-7649531

ABSTRACT

Examination of the European geographic patterns of the 10 most frequent cystic fibrosis mutations other than ΔF508 shows that a founder effect is apparent for a number of them. The most evident examples are for the W1282X mutation in Jews, with a probable Asian origin, and the G551D and R117H mutations in Celts. Geographic distributions indicate that the main focus of the 621 + 1 G-->T and ΔI507 mutations is probably located in Wales. Also, the R1162X mutation probably originates from a circumscribed north Italian region. The N1303K mutation has a wide range in Europe with a clear preponderance in southern countries. Even the relatively common G542X and 1717-1 G-->A mutations have a local preponderance in Spain and Sicily and in northern Italy, respectively. Likelihood estimates for recurrent mutation and identity by descent strongly support the hypothesis of recurrence for the (mainly German) mutation R553X.


Subject(s)
Cystic Fibrosis/genetics , Founder Effect , Europe , Allele Frequency , Humans , Point Mutation/genetics , Genetic Polymorphism
18.
Ann Hum Biol ; 22(3): 183-98, 1995.
Article in English | MEDLINE | ID: mdl-7574444

ABSTRACT

The distribution of surnames in 90 distinct regions in France during two successive periods, 1889-1915 and 1916-1940, is analysed from the civil birth registers of the 36,500 administrative units in France. A new approach, called 'Mobile Site Method' (MSM), is developed to allow representation of a surname distance matrix by a distorted geographical map. A surname distance matrix between the various regions in France is first calculated, then a distorted geographical map called the 'surname similarity map' is built up from the surname distances between regions. To interpret this map we draw (a) successive map contours obtained during the step-by-step distortion process, revealing zones of high surname dissimilarity, and (b) maps in grey levels representing the displacement magnitude, and allowing the segmentation of the geographical and surname maps into 'homogeneous surname zones'. By integrating geography and surname information in the same analysis, and by comparing results obtained for the two successive periods, the MSM approach produces convenient maps showing: (a) 'regionalism' of some peripheral populations such as Pays Basque, Alsace, Corsica and Brittany; (b) the presence of preferential axes of communications (Rhodanian corridor, Garonne valley); (c) barriers such as the Central Massif and the Vosges; (d) weak modifications of the distorted maps between the two periods studied, suggesting a limited extension of the tendency towards surname uniformity in France. These results are interpreted, in the nineteenth- and twentieth-century context, as the consequences of a slow process of local migrations occurring over a long period of time.


Subject(s)
Names , France , Population Genetics , Humans , Maps as Topic
19.
Hum Biol ; 67(2): 231-49, 1995 Apr.
Article in English | MEDLINE | ID: mdl-7537245

ABSTRACT

GM haplotype frequencies were examined in 49 Amerindian tribes (from North, Central, and South America) to investigate the congruence of genetic variation with that observed in language and geography. We used two approaches: (1) the mobile site method, which allows a two-dimensional representation of genetic variation where the distances between reference points (i.e., the locations of the populations in the geographic map after displacements) are close to the genetic distances, and (2) a multivariate analysis (factorial correspondence analysis), which permits a visual interpretation of the geographic distribution of GM haplotypes on a map, completed by a cluster analysis. The results show a strong gradient from the Bering Strait to South America. The Eskimo and Na-Dene are genetically different from all other Amerindians, reflecting their more recent migrations. The orientation of most trajectories of the tribes from Central and South America can be interpreted as earlier migrations along the Pacific and Atlantic coasts. We conclude that geographic and linguistic factors played a part in the genetic diversity of Amerindian tribes.


Subject(s)
Amerindian Population , Emigration and Immigration , Ethnic Groups , Genetic Variation , Immunoglobulin Gm Allotypes/genetics , Linguistics , Americas , Cluster Analysis , Allele Frequency , Haplotypes , Humans , Multivariate Analysis