The Chemical Component Dictionary is an external reference file describing all residue and small molecule components found in PDB entries. This dictionary contains detailed chemical descriptions for standard and modified amino acids/nucleotides, small molecule ligands, and solvent molecules.
BioJava uses the Chemical Component Dictionary to achieve a chemically correct representation of each group. To see how this works, let's look at how selenomethionine (MSE) and water (HOH) are handled:

    Structure structure = StructureIO.getStructure("1A62");

    for (Chain chain : structure.getChains()) {
        for (Group group : chain.getAtomGroups()) {
            if (group.getPDBName().equals("MSE") || group.getPDBName().equals("HOH")) {
                System.out.println(group.getPDBName() + " is a group of type " + group.getType());
            }
        }
    }

This produces the following output:

    MSE is a group of type amino
    MSE is a group of type amino
    MSE is a group of type amino
    HOH is a group of type hetatm
    HOH is a group of type hetatm
    HOH is a group of type hetatm
    ...

As you can see, although MSE is flagged as HETATM in the PDB file, BioJava still represents it correctly as an amino acid. The key is that the definition file for MSE flags it as "L-PEPTIDE LINKING", and BioJava uses this information to classify the group.
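To make the idea concrete, here is a minimal sketch of how a component's type string could be mapped to a group category. It is not BioJava's actual implementation: `GroupKind`, `ChemCompTypeSketch`, and `classify` are hypothetical names introduced for illustration, and the matching rules are deliberately simplified (the dictionary defines more types than the ones handled here).

```java
import java.util.Locale;

// Hypothetical, simplified stand-in for the group types printed above.
enum GroupKind { AMINO, NUCLEOTIDE, HETATM }

class ChemCompTypeSketch {

    // Classify a component by the type string from its dictionary definition.
    // Assumption: only the common "... PEPTIDE LINKING", "... DNA LINKING" and
    // "... RNA LINKING" types are recognised; everything else is a hetero group.
    static GroupKind classify(String chemCompType) {
        String t = chemCompType.trim().toUpperCase(Locale.ROOT);
        if (t.endsWith("PEPTIDE LINKING")) {
            return GroupKind.AMINO;
        }
        if (t.endsWith("DNA LINKING") || t.endsWith("RNA LINKING")) {
            return GroupKind.NUCLEOTIDE;
        }
        return GroupKind.HETATM;
    }

    public static void main(String[] args) {
        // MSE is defined as "L-PEPTIDE LINKING", water as "NON-POLYMER".
        System.out.println("MSE -> " + classify("L-PEPTIDE LINKING")); // MSE -> AMINO
        System.out.println("HOH -> " + classify("NON-POLYMER"));       // HOH -> HETATM
    }
}
```

The point of the sketch is that the decision depends on the chemical definition of the component, not on whether the PDB file lists its atoms as ATOM or HETATM records.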
By default, BioJava ships with a minimal representation of the standard amino acids, which is sufficient when you just want to work with atoms and a basic data representation. However, if you need a correct representation (e.g. to distinguish ligands from the polypeptide chain or to correctly resolve chemically modified residues), you should tell the library to either
- fetch missing Chemical Component definitions on the fly (small download and parsing delays every time a new chemical component is encountered), or
- load all definitions at startup (slow startup and higher memory use, but no further delays later on)
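The trade-off between these two options can be illustrated with a small, library-independent sketch. Everything in it is an assumption made for illustration: `DefinitionProvider`, `OnDemandProvider`, `PreloadedProvider`, and `fakeLoad` are hypothetical names, not BioJava classes, and the "definition" is reduced to a single type string.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical interface; not part of BioJava.
interface DefinitionProvider {
    String get(String id);
}

// Option 1: load a definition the first time it is requested, then cache it.
class OnDemandProvider implements DefinitionProvider {
    private final Map<String, String> cache = new HashMap<>();
    private final Function<String, String> loader;
    int loads = 0; // number of times the (slow) loader actually ran

    OnDemandProvider(Function<String, String> loader) {
        this.loader = loader;
    }

    @Override
    public String get(String id) {
        return cache.computeIfAbsent(id, key -> {
            loads++;
            return loader.apply(key);
        });
    }
}

// Option 2: load every known definition up front; lookups never hit the loader.
class PreloadedProvider implements DefinitionProvider {
    private final Map<String, String> all = new HashMap<>();

    PreloadedProvider(List<String> ids, Function<String, String> loader) {
        for (String id : ids) {
            all.put(id, loader.apply(id));
        }
    }

    @Override
    public String get(String id) {
        return all.get(id);
    }
}

class ProviderSketch {
    // Stand-in for a download-and-parse step; returns only the component type.
    static String fakeLoad(String id) {
        return id.equals("MSE") ? "L-PEPTIDE LINKING" : "NON-POLYMER";
    }

    public static void main(String[] args) {
        OnDemandProvider lazy = new OnDemandProvider(ProviderSketch::fakeLoad);
        lazy.get("MSE");
        lazy.get("MSE"); // served from the cache, no second load
        lazy.get("HOH");
        System.out.println("on-demand loads: " + lazy.loads); // on-demand loads: 2

        DefinitionProvider eager =
                new PreloadedProvider(Arrays.asList("MSE", "HOH"), ProviderSketch::fakeLoad);
        System.out.println(eager.get("MSE")); // L-PEPTIDE LINKING
    }
}
```

The on-demand variant pays the loading cost once per new component, while the preloaded variant pays for all components in its constructor and holds them all in memory.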
You can enable the first behaviour by using the FileParsingParameters class:

    AtomCache cache = new AtomCache();

    // By default, all files are stored in a temporary location.
    // You can set the location at startup with -DPDB_DIR=/path/to/files/
    // or hard-code it like this:
    cache.setPath("/tmp/");

    FileParsingParameters params = new FileParsingParameters();

    // Fetch Chemical Component definitions while parsing.
    params.setLoadChemCompInfo(true);
    cache.setFileParsingParams(params);

    StructureIO.setAtomCache(cache);

    Structure structure = StructureIO.getStructure(...);

If you want to enable the second behaviour (slow loading of all chemical components at startup, but no further delays later on), use the same code but switch the ChemCompProvider implementation in the ChemCompGroupFactory:

    ChemCompGroupFactory.setChemCompProvider(new AllChemCompProvider());
