
Polys Glycan Builder: An Online Application for Intuitive Construction of 3D Structures of Complex Carbohydrates

Serge Perez

University of Grenoble Alpes, CNRS, Centre de Recherches sur les Macromolecules Vegetales, Grenoble, France spsergeperez@gmail.com; glycopedia.eu

Abstract

Polys-Glycan-Builder is a web-based, user-friendly interface for building 3D structures of complex glycans and polysaccharides. The construction follows an intuitive scheme that stays as close as possible to the way glycoscientists draw the sequences of their structures. The software translates a primary carbohydrate structure into the coordinate set of the corresponding tertiary structure in one or several low-energy conformations. Model building involves dragging and dropping monosaccharide units onto the canvas or workspace grid. The construction relies on a database of more than 150 monosaccharides and the low-energy conformations of 400 disaccharide segments. The application is suitable for building three-dimensional structures of algal, bacterial and plant polysaccharides, glycosaminoglycans, and N- and O-linked glycans. As a complement to the 3D structures, several options meet the requirements of glyco-informatics, such as the GlycoCT codes and the SNFG convention, and generate formats compatible with Virtual Reality immersion.

I. Introduction

Biological molecules express their functions through their three-dimensional structures. Their function derives from their structure, particularly their biomechanical and biophysical properties that influence interactions with partner molecules. For this reason, structural biology emphasizes the three-dimensional structure as a central element in characterizing the function of biological macromolecules. Understanding and visualising these structures is crucial for advancing our knowledge in the field.

The process of conducting structural biology involves several distinct yet closely related steps. The first task involves collecting structural information using experimental and theoretical techniques. The second task revolves around modelling structures derived from the collected data and molecular simulation, providing a dynamic dimension essential for studying phenomena. The third task focuses on analysing and interpreting the results, allowing for the formulation of hypotheses about the events studied and the refinement of the modelling steps.

These steps involve different tools and methods. Experimental and theoretical investigations of the three-dimensional structure of biomacromolecules rely on three main approaches. The experimental techniques, such as NMR, X-ray crystallography, and electron microscopy, seek to obtain the three-dimensional structure in vitro. The modelling/simulation approaches generate computer models of three-dimensional structures based on physicochemical and/or statistical information used as parameters for calculation software. Finally, molecular visualisation programs make it possible to observe and analyse the structures obtained from the two previous approaches, using accurate graphics to extract, illustrate or communicate new scientific knowledge.

Structural glycobiology follows the same paradigm, except that gaining detailed information from X-ray diffraction is much more limited than in the case of proteins and nucleic acids [1]. Nevertheless, the accumulation of fragmented, experimental evidence gathered from the crystallographic resolution of small carbohydrate molecular crystals and complex protein-carbohydrate structures provided robust knowledge that extended to characterizing solutions by NMR. In parallel with these advancements, the utilization of accumulated data has led to the expansion of computational methods, such as molecular mechanics and dynamics, to achieve a high level of predictability.

The accumulation of this information into structural databases paves the way to molecular glyco-bioinformatics. Unique opportunities exist to characterise and predict the three-dimensional features a given oligosaccharide can adopt in different environments, i.e., in vacuo, in the crystalline state, or interacting with various proteins with distinct biological functions. Many carbohydrate sequences have been determined through extensive work in chemical and biochemical fragmentation, followed by mass spectrometry and nuclear magnetic resonance analysis. The primary impetus behind the growth of glyco-informatics has been the construction of large-scale repositories to store, organise, and disseminate the data rapidly generated through experiments and theoretical calculations about glycan sequence and structure. Data visualisation remains a challenge in glycoscience. Some initiatives have pushed the development of visual tools to improve some aspects of glycan identification and quantification. These visualisation tools represent data at the sequence level, i.e., the nature of the constituent monosaccharides and their glycosidic linkages. Going from such a level of representation to constructing the three-dimensional structure is a task that only a small number of groups involved in the molecular simulation of complex carbohydrates perform. Computational tools have been developed to build preliminary 3D structures starting from a sequence. However, the few three-dimensional builders dedicated to carbohydrate structures are each tied to a specific application or force field. This is the case for the carbohydrate builders for GLYCAM [2], GROMACS [3] and CHARMM [4][5]. Three other carbohydrate builders offer the possibility to construct three-dimensional structures from the sequence: SWEET II [6], CarbBuilder [7] and POLYS [8][9].

POLYS can translate a primary carbohydrate structure into the coordinate set of the corresponding tertiary structure in one or several low-energy conformations. This is achieved by aiming for a structural energy minimum defined by an external parameter set or by utilizing some features embedded in the program to optimize secondary structures as complex as multiple helices. Despite these functionalities, the use of the POLYS software has been limited. Its syntax was generally considered too complex to set up and manipulate. Consequently, we developed a user-oriented graphical interface to construct complex carbohydrate-containing three-dimensional structures following a scheme that is as close as possible to the way glycoscientists draw their structures. Several options have been added to meet the current requirements of glyco-informatics, such as the GlycoCT codes [10] and the SNFG conventions [11], and to generate formats compatible with Virtual Reality immersion.

II. Founding Principles & Preamble

Figure 1. Schematic representation of the process that translates a primary carbohydrate structure into the coordinate set of the corresponding tertiary structure in one or several of its low-energy conformations.

The theory underlying the application of molecular modelling to the field of glycosciences postulates that the individual building blocks, the monosaccharides, are themselves relatively stable, rigid molecules to such a degree that the glycosidic linkage geometry largely determines the overall structure of a complex carbohydrate. Consequently, knowledge of the sequence of residues and a list of the corresponding glycosidic torsion angles can generate a 3D model of the polysaccharide or the glycan structure. These torsion angles can either be systematic in forming ordered structural elements or less ordered, as seen for many unstructured polysaccharide chains in dilute solution.

The builder, POLYS, is the central engine that draws information from several sources to generate a three-dimensional model. The primary data source is a vast database that includes optimised molecular structures for most monosaccharides (over 150 at the time). These structures contain Cartesian coordinates for each atom, atomic symbols, atom labelling and numbering based on accepted carbohydrate nomenclature, and the corresponding GlycoCT code [10]. The structural details in the SNFG representation of glycans enable the characterisation, construction, and manipulation of three-dimensional structures. The SNFG (Symbol Notation for Glycans), a widely accepted graphical representation, stems from an international agreement [11]. This expansion and application of glycans’ graphical representation mark a significant achievement, providing a unified method to depict the diversity of glycan and polysaccharide structures visually.

The description of a monosaccharide obeys the following rule: <anomeric prefix><prefix for absolute configuration><monosaccharide code><suffix for ring configuration>_[<O-ester and O-ether substitutions>].

The SNFG cartoons were generated so as to include O-esters and ethers, which are attached to the symbol with a number (e.g. 3S for a 3-O-sulfate group), as well as the absolute (D or L) and anomeric (α or β) configurations. All pyranoses in the D configuration are assumed to have the 4C1 chair conformation, whereas those in the L configuration are assumed to have the 1C4 chair conformation. For idopyranoses, which may adopt several conformations, the descriptor (2S0, 1C4 or 4C1) is appended to the monosaccharide symbol as _<specific conformation> (e.g. α LIdopA_2S_2S0). Figure 2 shows some of the symbols used.
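The naming rule can be illustrated with a short Python sketch that splits a label into its underscore-separated parts. It is only an illustration: the function name `split_symbol`, the returned field names, and the assumption that the first character is the anomeric prefix and the next letter the absolute configuration are ours, inferred from the single example given above, and may not match the labels used internally by the application.

```python
# Conformation descriptors named in the text for idopyranoses.
RING_CONFORMATIONS = {"2S0", "1C4", "4C1"}


def split_symbol(label):
    """Split a label such as 'α LIdopA_2S_2S0' into its parts (hypothetical helper)."""
    base, *suffixes = label.split("_")
    base = base.strip()
    anomer = base[0]                   # assumed: first character is 'α' or 'β'
    rest = base[1:].lstrip(" -")
    absolute = rest[0]                 # assumed: next letter is 'D' or 'L'
    conformation = None
    substitutions = []
    for token in suffixes:
        if token in RING_CONFORMATIONS:
            conformation = token
        elif token:
            substitutions.append(token)
    if conformation is None:
        # Default stated in the text for pyranoses: D -> 4C1, L -> 1C4.
        # The ring form itself is not checked here.
        conformation = "4C1" if absolute == "D" else "1C4"
    return {
        "anomer": anomer,
        "absolute": absolute,
        "residue": rest[1:],
        "substitutions": substitutions,
        "conformation": conformation,
    }
```

For the example label, the sketch separates the 2-O-sulfate substitution (2S) from the skew-boat descriptor (2S0); a label without a descriptor falls back to the default chair for its absolute configuration.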

Figure 2. Selected examples showing the relationships between the symbol representation used for monosaccharides by Polys-Glycan-Builder

Five families were selected to cover a significant part of these complex carbohydrate structures: Algae, Bacterial, Glycosaminoglycans, N- and O-linked Glycans and Plant polysaccharides. Apart from the glycans present in bacteria, the occurrence of the main monosaccharide components in the other four families is well established (Figure 3).

For each disaccharide, an exhaustive search was conducted using the MM3 molecular mechanics force field [12]. These procedures provide a comprehensive sampling of the conformational space, creating a relaxed adiabatic energy map based on PHI and PSI torsion angles. For 1-6 linkages, relaxed adiabatic maps have been established for the three low-energy orientations of the OME torsion angle. Exploring each energy map reveals the presence of two to four energy minima.

As with monosaccharides, we considered the occurrence of the disaccharide segments derived from a thorough examination of the sequences reported for the five families mentioned above. Without being exhaustive, such a set provides a significant database for building complex carbohydrates in 3D. For each disaccharide, the database stores information that includes a pictorial representation of the potential energy surface over the conformational space. Four colours characterize the regions according to the magnitude of the computed energy, ranging from Blue for the lowest to Red for the forbidden, through Green and Yellow, as shown in Figure 4.
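To make the notion of energy minima on a PHI/PSI map concrete, the following Python sketch locates the grid points that are lower in energy than all eight of their neighbours. It is a generic illustration, not the procedure used to build the database: the function name `local_minima` and the grid layout (a list of rows, each angle sampled once per period so that the edges wrap around) are assumptions.

```python
def local_minima(energy):
    """Return (row, col, value) for every grid point lower than its 8 neighbours.

    `energy` is a list of equal-length rows sampled on a regular PHI/PSI grid.
    Both angles are periodic, so neighbours wrap around the edges. The grid is
    assumed to sample each angle once per period (e.g. -180 to 170 degrees).
    """
    n_rows, n_cols = len(energy), len(energy[0])
    minima = []
    for i in range(n_rows):
        for j in range(n_cols):
            value = energy[i][j]
            neighbours = [
                energy[(i + di) % n_rows][(j + dj) % n_cols]
                for di in (-1, 0, 1)
                for dj in (-1, 0, 1)
                if (di, dj) != (0, 0)
            ]
            if all(value < other for other in neighbours):
                minima.append((i, j, value))
    # Lowest-energy minimum first.
    return sorted(minima, key=lambda m: m[2])
```

On a real relaxed map the returned list would be expected to contain the two to four minima mentioned above, ordered from the global minimum upwards.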

The second database contains the 3D structural information for more than 300 disaccharide entries. The magnitude of the torsion angles at the glycosidic linkage controls the relative orientation of two successive monosaccharides. In crystallography, Φ is defined as O5-C1-O1-Cx and Ψ as C1-O1-Cx-C(x+1), where x is the number of the carbon atom of the second monosaccharide with which the 1-x glycosidic bond is formed. An alternate definition used in NMR spectroscopy refers to the hydrogen atoms, with ΦH = H1-C1-O1-Cx and ΨH = C1-O1-Cx-Hx. In the case of a 1-6 linkage, a third torsion angle, OME, completes the description of the conformation.
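A torsion angle such as PHI or PSI is computed from the Cartesian coordinates of the four atoms that define it. The Python sketch below uses the standard atan2 formulation and returns a signed value between -180° and 180°; the function and helper names are ours, and the coordinates would come from a structure file such as the PDB output.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def torsion(p1, p2, p3, p4):
    """Signed torsion angle p1-p2-p3-p4 in degrees, between -180 and 180."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)   # normals of the two planes
    b2_len = math.sqrt(_dot(b2, b2))
    y = b2_len * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))
```

With the heavy-atom definition, PHI would be obtained as `torsion(O5, C1, O1, Cx)` and PSI as `torsion(C1, O1, Cx, Cx_plus_1)`, where each argument is an (x, y, z) tuple.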

Figure 3. Summary of the monosaccharides available to build three-dimensional structures of Bacterial, Plant, N-O glycans and Glycosaminoglycans.
Figure 4.  Selected examples of the potential energy surfaces that pop up to help the user select the value of the torsional angles to use in the construction process.

III. Access to the Application

Polys-Glycan-Builder is part of the Glyco3D portal [13] under the following address: http://glyco3d.cermav.cnrs.fr/builder.php

IV. Before Starting

A glycan’s complete sequence, or the polysaccharide’s repeating unit, must be known before construction. The user must know the nature of the constituent monosaccharides and of the glycosidic linkages along the sequence, as well as the names of the monosaccharides, their configurations (D or L), and their ring forms (furanose or pyranose). Examples: β-D-GlcpA; β-D-Glcp; α-L-Fucp.

V. Choosing an Area of Application

The application provides a way to build 3D structures of most polysaccharides and glycans.

To access the application, click on the image POLYS GLYCAN BUILDER (Figure 5).

Figure 5. Upon accessing the web-based application, the user can select to use the general menu (click on the image POLYS GLYCAN BUILDER). Otherwise, click on the sub-modules.

The application offers access to sub-modules that handle the natural families of carbohydrate-containing (macro)molecules according to their occurrence (Algae, Bacteria, GAG, N-O linked glycan, Plant). Click on the box corresponding to your choice.

VI. The Home Page & Instructions

The following numbers refer to those displayed in Figure 6.

Figure 6. POLYS-GLYCAN-BUILDER displaying the steps (indicated with numbers) to follow to achieve three-dimensional construction.

1. QUICK HELP: Summarises the order of the steps to follow. Clicking on « More » gives access to a detailed description of the principles underlying the construction. It describes the nomenclature used for the monosaccharides, the disaccharides, and their three-dimensional structures.

2. HELP: Gives access to the user guide.

3. SELECT A FAMILY. The following families and classes cover most of the occurrences of carbohydrate-containing molecules: All, Algae, Bacteria, Conjugate, GAG, Miscellaneous, N-, O linked, Plants, Virus. Selecting a family will call the set of monosaccharides belonging to the chosen family; they will appear on the window on the right-side panel.

4. THE MONOSACCHARIDES are displayed using the extended SNFG representation, ready to be selected according to the sequence. These monosaccharides can be manipulated with the mouse in a drag-and-drop fashion and placed in the panel described in 5.

5. THIS PANEL allows storing all the monosaccharides required for three-dimensional construction. The monosaccharides can belong to several families.

Figure 7. The first steps. Selecting a family will open a window containing the monosaccharides available for that family

6. A SWITCH provides a way to change from the Symbol to the Chemical representation.  Delete will clear off the content of the input on the grid.

Show/Hide GRID. The interactive construction of the glycan takes place on the grid in the centre of the screen (Figure 7). Provision is given to hide the grid.

7. GRID. The graphical construction occurs on the central (12 x 8) grid.  A simple game of mouse select and drag operations controls the step-by-step construction.

The POLYS formalism requires the definition of the main backbone, which is recommended to be the longest possible, making the building description as simple as possible. For polysaccharide structures, the backbone construction starts from the non-reducing residue (at the left) of the glycan to the reducing residue (at the right). Branches are handled by attaching their reducing end to a specified atom belonging to a monosaccharide in the backbone. In this way, the construction of the branches follows the same manner as the backbone. As such, the syntax is highly systematic, and the input files are easy to decipher (Caution: Placing the side chains above or below the main backbone generates two different structures).

A glycosidic linkage defines the bond formed between two neighbouring monosaccharides on the grid. A small menu appears on the screen inviting the user to define the nature of the glycosidic linkage (1-1, 1-2, 1-3, 1-4, 1-6, 2-1, …, 2-6, 2-8) and to specify the values of the glycosidic torsion angles: PHI, PSI and OME. The magnitude of these glycosidic torsion angles dictates the relative orientation of two contiguous monosaccharides in a disaccharide. In the so-called “Heavy Atom Definition” commonly used in crystallography, PHI (Φ) is the torsion angle O5-C1-O1-Cx, and PSI (Ψ) is the torsion angle C1-O1-Cx-C(x+1), where x is the number of the carbon atom of the second monosaccharide involved in the formation of the 1-x glycosidic bond.

Figure 8. As soon as a disaccharide segment is created, the conformation at the glycosidic linkage is selected from the window « Choose the rotation angle ».

Once a disaccharide segment has been defined (Figure 8), a small « Choose the rotation angle » window pops up. This window displays the disaccharide sequence and the corresponding Potential Energy Surface, on which the low-energy regions are visible. To proceed with the construction, the values of PHI and PSI must be given within the range -180° to 180°. If these values are known, for example from experiments, they have to be entered in the corresponding boxes (Figure 9a). When such values are unknown, clicking « choose data » opens a small menu listing colour-coded glycosidic torsion angles that indicate their location in the low-energy zones. The user can use this information to complete the input file generation (Figure 9b).

Figure 9. Three examples of how to select the values of the glycosidic torsion angles. a) The user knows which values to input. b) The values are chosen from the prompt « Available data. Choose data ». c) Same as in b) for 1-6 (or 2-6) linkages.

For two monosaccharides joined by a 1→6 linkage, another parameter (OME) is required. It describes the orientation about the exocyclic C5-C6 bond. Two torsion angles, O5-C5-C6-O6 and C4-C5-C6-O6, customarily define the gauche-trans (OME = 60°), gauche-gauche (OME = -60°) and trans-gauche (OME = 180°) conformations. In this case, the list of colour-coded torsion angles appears with PHI and PSI followed by a set of three values « -60°/60°/180° », the three low-energy conformations of the OME angle. Selecting one of these values completes the data set required to build the disaccharide segment. Without experimental data, the choice of OME is not straightforward. As a rule of thumb, -60° or 60° are the most likely values for a gluco-type configuration at C-4. For a galacto-type configuration, 180° or 60° could be the first choices (Figure 9c).
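The information requested by the linkage menu can be summarised in a small Python sketch: PHI and PSI must lie between -180° and 180°, and a linkage to position 6 additionally needs an OME value. The function `make_linkage`, its argument names, and the returned dictionary are hypothetical and do not reflect the application's internal representation; the two lookup tables only restate the values quoted in the text.

```python
# Names and values of the three staggered OME orientations, as listed in the text.
OME_ROTAMERS = {"gauche-trans": 60.0, "gauche-gauche": -60.0, "trans-gauche": 180.0}

# Rule of thumb from the text, keyed by the configuration at C-4.
OME_FIRST_CHOICES = {"gluco": (-60.0, 60.0), "galacto": (180.0, 60.0)}


def make_linkage(donor_pos, acceptor_pos, phi, psi, ome=None):
    """Return a linkage record after basic checks on the torsion angles."""
    angles = [("PHI", phi), ("PSI", psi)]
    if acceptor_pos == 6:
        if ome is None:
            raise ValueError("a linkage to position 6 also needs an OME value")
        angles.append(("OME", ome))
    for name, value in angles:
        if not -180.0 <= value <= 180.0:
            raise ValueError(f"{name} must lie between -180 and 180 degrees")
    return {
        "linkage": f"{donor_pos}-{acceptor_pos}",
        "phi": phi,
        "psi": psi,
        "ome": ome if acceptor_pos == 6 else None,
    }
```

For instance, `make_linkage(1, 6, -60.0, 180.0, ome=OME_ROTAMERS["gauche-trans"])` describes a 1-6 linkage in the gauche-trans orientation, while omitting `ome` for the same linkage raises an error.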

8. Generating repeating motifs. Once the repeating unit of a polysaccharide has been constructed, this option offers a practical way to build the complete polysaccharide structure made of n repeating units.

The Repetition. Clicking on the box activates this option. (Caution: do not forget to unclick if other constructions are needed). Indicate the number of repetition constructs.

As in 7, the nature of the glycosidic linkage must be defined, and the values of the glycosidic torsion angles PHI (Φ), PSI (Ψ) and OME have to be input. The Potential Energy Surface of the corresponding disaccharide pops up on the grid. It provides access to the data needed to set the orientation of the glycosidic linkage.

Figure 10. Using the option « Generating repeating motifs ».

9. SYNTAX. The input (INP) generated by the previous series of manipulations translates the information into the proper syntax and creates the input file for the builder engine (POLYS). This is the so-called INP format. The input file can be edited with a simple text editor. Most users will not need to intervene at this stage; those who do may consult the POLYS article [9].

10. BUILD and VISUALISE. Build is the POLYS engine command. It generates a 3D model from the instructions in the INP file, which is constructed from the series of manipulations described above.

Visualise. The 3D structure, defined from the INPUT sequence, is generated and displayed in a separate window using the LITEMOL application [14].

The visualisation provides a step-by-step way to assess the correctness of the construction.

Download PDB downloads the atomic coordinates of the 3D model (PDB format) to the user’s computer.

The construction can proceed stepwise, possibly making corrections at any time. An option is also given to display the constructed sequence either with the pictorial representation or with the abbreviated letter codes of the monosaccharides.

11. OPTIMISE. Upon inspection of the 3D structure, some steric conflicts might appear. The “OPTIMIZE” option can be invoked to remove minor conflicts using a simple molecular mechanics force field. The user should scrutinise the resulting structure!

12. SAVE INPUT. INP is the file name created by the graphical interface as an input to the POLYS engine.  Once the generation is considered satisfactory, several files (INP, IUPAC) are available for further use, particularly as input files to other constructions. See 3. 

Download INP: 

Download IUPAC:

13. GLYCOCT. Building the 3D structure also generates a GlycoCT code [10]. The code describes the residue entities (RES) and the linkages at the glycosidic bonds (LIN). (This option may not work for repeat units and complex architectures.) The GlycoCT code is in turn used to generate a graphical representation of the constructed structure following the SNFG convention, which is transferred to the user’s computer.
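The RES/LIN layout of a condensed GlycoCT code can be read with a few lines of Python. The sketch below only separates the two sections; the function name `parse_glycoct` is ours, and the sample string is our own illustration of a lactose-like disaccharide in condensed GlycoCT, not output taken from the application.

```python
def parse_glycoct(text):
    """Split a condensed GlycoCT code into its RES and LIN lines."""
    sections = {"RES": [], "LIN": []}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line in sections:          # a section header: RES or LIN
            current = line
        elif current is not None:
            sections[current].append(line)
    return sections


# Illustrative input (assumption): Gal(b1-4)Glc written in condensed GlycoCT.
SAMPLE = """RES
1b:x-dglc-HEX-1:5
2b:b-dgal-HEX-1:5
LIN
1:1o(4+1)2d
"""
```

Calling `parse_glycoct(SAMPLE)` yields two residue lines and one linkage line, which mirrors the residue and linkage information entered on the grid.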

VII.  Format of the Coordinate File

The successful construction of the three-dimensional structure generates five different files for further use. They are downloaded to the Downloads directory.

The PDB file contains the atomic coordinates in the PDB format. The constituent monosaccharides are given a three-letter code, with which the SweetUnityMol program [15] can automatically associate the SNFG colour coding in its depiction.

The GlycoCT file contains the encoded description of the structure’s sequence using residue naming (RES) and linkages (LIN).

The SNFG file is a graphical representation of the molecule in SVG vector image format.

The IUPAC file is a text file describing the molecule according to the simplified IUPAC Nomenclature.

The INP file is a text file describing the molecule according to the POLYS input format.
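As an example of further use, the downloaded PDB file can be read with a short Python sketch that relies on the fixed-column layout of ATOM and HETATM records in the PDB format. The function names are ours, and whether the builder writes ATOM or HETATM records, or which three-letter residue codes it uses, is not specified here, so both record types are accepted.

```python
def read_pdb_atoms(lines):
    """Extract atom name, residue name, residue number and coordinates."""
    atoms = []
    for line in lines:
        if line.startswith(("ATOM", "HETATM")):
            atoms.append({
                "name": line[12:16].strip(),       # columns 13-16
                "res_name": line[17:20].strip(),   # columns 18-20
                "res_seq": int(line[22:26]),       # columns 23-26
                "xyz": (float(line[30:38]),        # columns 31-38
                        float(line[38:46]),        # columns 39-46
                        float(line[46:54])),       # columns 47-54
            })
    return atoms


def count_residues(atoms):
    """Number of distinct (residue name, residue number) pairs."""
    return len({(a["res_name"], a["res_seq"]) for a in atoms})
```

A typical call would be `atoms = read_pdb_atoms(open(path))`, where `path` points to the downloaded file; `count_residues(atoms)` should then match the number of monosaccharides placed on the grid, provided each residue carries its own residue number.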

VIII.  Examples

Figure 11 summarises the different steps to build the glycan, Disialyl Core 2 with SLex on Core 2, and describes the several output files.

Figure 11. Building: Disialyl Core 2 with SLex on Core 2. Neup5Ac α2-3 Galp β1-3 (Neup5Ac α2-3 Galp β1-4 (Fucp α1-3) GlcpNAc β1-6) GalpNAc.

Figure 12 summarises the different steps to construct the repeating unit and five consecutive repeats of the exopolysaccharide of Erwinia chrysanthemi [16] and describes the several output files.

Figure 12. Building the repeat unit and five consecutive repeats of the Erwinia chrysanthemi exopolysaccharide.
 

IX. Software

The three-dimensional structures of the constituting monosaccharides have been constructed and optimised using the molecular mechanics MM3 force-field [17]. The potential energy surfaces of all the disaccharides have been computed following the protocol described by Frank et al. [12] using the Conformational Analysis Tools (CAT) software (www.md-simulations.de/CAT/) for data processing and analysis.

The website is built with PHP and JavaScript and uses a MySQL database. Following the MVC approach, the application is divided into three parts: Model, View and Controller. The Model part contains the main interface, the temporary data and all the PHP scripts used with AJAX (Asynchronous JavaScript and XML) requests. The scripts, the CSS, and all the graphical resources are in the View section. The Controller part is used when the user connects to the database and when the application is reloaded. Separate folders (Algae, Bacteria, Gag, N-O linked, and Plant) hold the sub-modules of the main application. They share the same structure but are connected to different databases.

Two external APIs are used. The first allows transforming an IUPAC code into a GlycoCT code (https://glyconnect.expasy.org/api/structures/translate/iupac/glycoct). The second one converts a GlycoCT code into an XML code, which generates an SVG image (https://glycoproteome.expasy.org/glycoUtils/utils/image/generate). These two APIs are used with an AJAX request. An AJAX request is made at almost every user interaction to translate from one code (INP, GlycoCT, IUPAC, SVG) to another.

X.  Acknowledgements

Continuous support from Alain Rivet is greatly appreciated. This work benefited from the Cross-Disciplinary Program Glyco@Alps within the framework of the “Investissement d’Avenir” program [ANR-15-IDEX-02].

XI.  References

[1] Perez, S., De Sanctis, D. (2017) Glycosciences@Synchrotron: Synchrotron radiation applied to structural glycoscience, Beilstein J. Org. Chem., 13, 1145-1167.

[2] Woods Group, 2005-2020, Complex Carbohydrate Research Centre, University of Georgia, Athens, GA. (http://glycam.org/).

[3] Danne, R., Poojari, C., Martinez-Seara, H., Rissanen, S., Lolicato, F., Rog, T., Vattulainen, I. (2017) doGlycans–Tools for Preparing Carbohydrate Structures for Atomistic Simulations of Glycoproteins, Glycolipids, and Carbohydrate Polymers for GROMACS, J. Chem. Inf. Model., 57, 2401-2406.

[4] Jo, S., Im, W. (2013) Glycan fragment database: a database of PDB-based glycan 3D structures, Nucleic Acids Res., D470-474.

[5] Jo, S., Cheng, X., Lee, J., Kim, S., Park, S-J., Patel, D.S., Beaven, A.H., Lee, K-I., Rui, H., Roux, B., MacKerell, Jr. A.D., Klauda, J.B., Qi, Y., Im, W. (2017) CHARMM-GUI 10 Years for Biomolecular Modeling and Simulation, J. Comput. Chem., 38, 1114-1124.

[6] Bohne, A., Lang, E., von der Lieth, C.W. (1999) SWEET – WWW-based rapid 3D construction of oligo- and polysaccharides, Bioinformatics, 15, 767–768.

[7] Kuttel, M.M., Stahle, J., Widmalm, G. (2016) CarbBuilder: Software for building molecular models of complex oligo- and polysaccharide structures, J. Comput. Chem., 37, 2098-2105.

[8] Engelsen, S.B., Cros, S., Mackie, W., Perez, S. (1996) A molecular builder for carbohydrates: application to polysaccharides and complex carbohydrates, Biopolymers, 39, 417–433.

[9] Engelsen, S.B., Hansen, P., Perez, S. (2014) POLYS 2.0: An open source software package for building three-dimensional structures of polysaccharides, Biopolymers, 101, 733-745.

[10] Herget, S., Ranzinger, R., Maass, K., von der Lieth, C. (2008) GlycoCT – a unifying sequence format for carbohydrates, Carbohydr. Res., 343, 2162-2171.

[11] Varki, A., Cummings, R.D., Aebi, M., Packer, N.H., Seeberger, P.H., Esko, J.D., Stanley, P., Hart, G., Darvill, A., Kinoshita, T., Prestegard, J.J., Schnaar, R.L., Freeze, H.H., Marth, J.D., Bertozzi, C.R., Etzler, M.E., Frank, M., Vliegenthart, F.G., Lütteke, T., Perez, S., Bolton, E., Rudd, P., Paulson, J., Kanehisa, M., Toukach, P., Aoki-Kinoshita, K.F., Dell, A., Narimatsu, H., York, W., Taniguchi, N., Kornfeld, S. (2015) Symbol Nomenclature for Graphical Representations of Glycans, Glycobiology, 25, 1323-1324.

[12] Frank, M., Lutteke, T., von der Lieth, C.-W. (2007) GlycoMapsDB: a database of the accessible conformational space of glycosidic linkages, Nucleic Acids Res., 35, 287-290.

[13] Perez, S., Sarkar, A., Rivet, A., Breton, C., Imberty, A. (2015) GLYCO3D: A Portal for Structural Glycosciences, Methods in Molecular Biology, 1273, 241–258.

[14] Sehnal, D., Deshpande, M., Svobodova Varekova, R., Mir, S., Berka, K., Midlik, A., Pravda, L., Velankar, S., Koca, J. (2017) LiteMol suite: interactive web-based visualisation of large-scale macromolecular structure data, Nature Methods, 14, 1121–1122.

[15] Perez, S.; Tubiana, T.; Imberty, A; Baaden, M. (2015) Three-dimensional representations of complex carbohydrates and polysaccharides—SweetUnityMol: A video game-based computer graphic software. Glycobiology, 25, 483–491.

[16] Gray, J.S.S., Brand, J., Koerner, A.W., Montgomery, R. (1993) Structure of an extracellular polysaccharide produced by Erwinia chrysanthemi, Carbohydr. Res., 245, 271-287.

[17] Allinger, N.L., Li, F., Yan, L., Tai, J.C. (1990) Molecular mechanics (MM3) calculations on conjugated hydrocarbons, J. Comput. Chem., 11, 868-89