org.openscience.cdk.isomorphism
Class UniversalIsomorphismTester

java.lang.Object
  extended by org.openscience.cdk.isomorphism.UniversalIsomorphismTester

public class UniversalIsomorphismTester
extends Object

This class implements a multipurpose structure comparison tool. It allows to find maximal common substructure, find the mapping of a substructure in another structure, and the mapping of two isomorphic structures.

Structure comparison may be associated to bond constraints (mandatory bonds, e.g. scaffolds, reaction cores,...) on each source graph. The constraint flexibility allows a number of interesting queries. The substructure analysis relies on the RGraph generic class (see: RGraph) This class implements the link between the RGraph model and the the CDK model in this way the RGraph remains independent and may be used in other contexts.

This algorithm derives from the algorithm described in [Tonnelier, C. and Jauffret, Ph. and Hanser, Th. and Jauffret, Ph. and Kaufmann, G., Machine Learning of generic reactions: 3. An efficient algorithm for maximal common substructure determination, Tetrahedron Comput. Methodol., 1990, 3:351-358] and modified in the thesis of T. Hanser [Unknown BibTeXML type: HAN93].

With the isSubgraph(IAtomContainer, IAtomContainer) method, the second, and only the second argument may be a IQueryAtomContainer, which allows one to do SMARTS or MQL like queries. The first IAtomContainer must never be an IQueryAtomContainer. An example:

  SmilesParser sp = new SmilesParser(NewDefaultChemObjectBuilder.getInstance());
  IAtomContainer atomContainer = sp.parseSmiles("CC(=O)OC(=O)C"); // acetic acid anhydride
  IAtomContainer SMILESquery = sp.parseSmiles("CC"); // acetic acid anhydride
  IQueryAtomContainer query = IQueryAtomContainerCreator.createBasicQueryContainer(SMILESquery);
  boolean isSubstructure = UniversalIsomorphismTester.isSubgraph(atomContainer, query);
  

WARNING: As a result of the adjacency perception used in this algorithm there is a single limitation: cyclopropane and isobutane are seen as isomorph. This is due to the fact that these two compounds are the only ones where each bond is connected two each other bond (bonds are fully connected) with the same number of bonds and still they have different structures The algorithm could be easily enhanced with a simple atom mapping manager to provide an atom level overlap definition that would reveal this case. We decided not to penalize the whole procedure because of one single exception query. Furthermore isomorphism may be discarded since the number of atoms are not the same (3 != 4) and in most case this will be already screened out by a fingerprint based filtering. It is possible to add a special treatment for this special query. Be reminded that this algorithm matches bonds only.

NoteWhile most isomorphism queries involve a multi-atom query structure there may be cases in which the query atom is a single atom. In such a case a mapping of target bonds to query bonds is not feasible. In such a case, the RMap objects correspond to atom indices rather than bond indices. In general, this will not affect user code and the same sequence of method calls for matching multi-atom query structures will work for single atom query structures as well.

Author:
Stephane Werner from IXELIS mail@ixelis.net
Created on:
2002-07-17
Requires:
java1.4+
Belongs to CDK module:
standard
Source code:
HEAD

Constructor Summary
UniversalIsomorphismTester()
           
 
Method Summary
static RGraph buildRGraph(IAtomContainer g1, IAtomContainer g2)
          Builds the RGraph ( resolution graph ), from two atomContainer (description of the two molecules to compare) This is the interface point between the CDK model and the generic MCSS algorithm based on the RGRaph.
static List<RMap> checkSingleAtomCases(IAtomContainer g1, IAtomContainer g2)
          Checks for single atom cases before doing subgraph/isomorphism search.
static BitSet getBitSet(IAtomContainer ac)
          Transforms an AtomContainer into a BitSet (which's size = number of bond in the atomContainer, all the bit are set to true).
static List<RMap> getIsomorphAtomsMap(IAtomContainer g1, IAtomContainer g2)
          Returns the first isomorph 'atom mapping' found for g2 in g1.
static List<RMap> getIsomorphMap(IAtomContainer g1, IAtomContainer g2)
          Returns the first isomorph mapping found or null.
static List<List<RMap>> getIsomorphMaps(IAtomContainer g1, IAtomContainer g2)
          Returns all the isomorph 'mappings' found between two atom containers.
static List<IAtomContainer> getOverlaps(IAtomContainer g1, IAtomContainer g2)
          Returns all the maximal common substructure between twp atom containers.
static List<RMap> getSubgraphAtomsMap(IAtomContainer g1, IAtomContainer g2)
          Returns the first subgraph 'atom mapping' found for g2 in g1, where g2 must be a substructure of g1.
static List<List<RMap>> getSubgraphAtomsMaps(IAtomContainer g1, IAtomContainer g2)
          Returns all subgraph 'atom mappings' found for g2 in g1, where g2 must be a substructure of g1.
static List<RMap> getSubgraphMap(IAtomContainer g1, IAtomContainer g2)
          Returns the first subgraph 'bond mapping' found for g2 in g1.
static List<List<RMap>> getSubgraphMaps(IAtomContainer g1, IAtomContainer g2)
          Returns all the subgraph 'bond mappings' found for g2 in g1.
static boolean isIsomorph(IAtomContainer g1, IAtomContainer g2)
          Tests if g1 and g2 are isomorph.
static boolean isSubgraph(IAtomContainer g1, IAtomContainer g2)
          Tests if g2 a subgraph of g1.
static List<RMap> makeAtomsMapOfBondsMap(List<RMap> l, IAtomContainer g1, IAtomContainer g2)
          This makes a map of matching atoms out of a map of matching bonds as produced by the get(Subgraph|Ismorphism)Map methods.
static List<List<RMap>> makeAtomsMapsOfBondsMaps(List<List<RMap>> l, IAtomContainer g1, IAtomContainer g2)
          This makes maps of matching atoms out of a maps of matching bonds as produced by the get(Subgraph|Ismorphism)Maps methods.
static IAtomContainer project(List<RMap> rMapList, IAtomContainer g, int id)
          Projects a list of RMap on a molecule.
static List<IAtomContainer> projectList(List<List<RMap>> rMapsList, IAtomContainer g, int id)
          Projects a list of RMapsList on a molecule.
static List<List<RMap>> search(IAtomContainer g1, IAtomContainer g2, BitSet c1, BitSet c2, boolean findAllStructure, boolean findAllMap)
          General RGraph parsing method (usually not used directly) This method is the entry point for the recursive search adapted to the atom container input.
static void setTimeout(long timeout)
          Sets the time in milliseconds until the substructure search will be breaked.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

UniversalIsomorphismTester

public UniversalIsomorphismTester()
Method Detail

isIsomorph

public static boolean isIsomorph(IAtomContainer g1,
                                 IAtomContainer g2)
                          throws CDKException
Tests if g1 and g2 are isomorph.

Parameters:
g1 - first molecule. Must not be an IQueryAtomContainer.
g2 - second molecule. May be an IQueryAtomContainer.
Returns:
true if the 2 molecule are isomorph
Throws:
CDKException - if the first molecule is an instance of IQueryAtomContainer

getIsomorphMap

public static List<RMap> getIsomorphMap(IAtomContainer g1,
                                        IAtomContainer g2)
                                 throws CDKException
Returns the first isomorph mapping found or null.

Parameters:
g1 - first molecule. Must not be an IQueryAtomContainer.
g2 - second molecule. May be an IQueryAtomContainer.
Returns:
the first isomorph mapping found projected of g1. This is a List of RMap objects containing Ids of matching bonds.
Throws:
CDKException

getIsomorphAtomsMap

public static List<RMap> getIsomorphAtomsMap(IAtomContainer g1,
                                             IAtomContainer g2)
                                      throws CDKException
Returns the first isomorph 'atom mapping' found for g2 in g1.

Parameters:
g1 - first molecule. Must not be an IQueryAtomContainer.
g2 - second molecule. May be an IQueryAtomContainer.
Returns:
the first isomorph atom mapping found projected on g1. This is a List of RMap objects containing Ids of matching atoms.
Throws:
CDKException - if the first molecules is not an instance of IQueryAtomContainer

getIsomorphMaps

public static List<List<RMap>> getIsomorphMaps(IAtomContainer g1,
                                               IAtomContainer g2)
                                        throws CDKException
Returns all the isomorph 'mappings' found between two atom containers.

Parameters:
g1 - first molecule. Must not be an IQueryAtomContainer.
g2 - second molecule. May be an IQueryAtomContainer.
Returns:
the list of all the 'mappings'
Throws:
CDKException

getSubgraphMaps

public static List<List<RMap>> getSubgraphMaps(IAtomContainer g1,
                                               IAtomContainer g2)
                                        throws CDKException
Returns all the subgraph 'bond mappings' found for g2 in g1. This is an List of Lists of RMap objects. Note that if the query molecule is a single atom, then bond mappings cannot be defined. In such a case, the RMap object refers directly to atom - atom mappings. Thus RMap.id1 is the index of the target atom and RMap.id2 is the index of the matching query atom (in this case, it will always be 0). Note that in such a case, there is no need to call makeAtomsMapsOfBondsMaps(List, IAtomContainer, IAtomContainer), though if it is called, then the return value is simply the same as the return value of this method.

Parameters:
g1 - first molecule. Must not be an IQueryAtomContainer.
g2 - second molecule. May be an IQueryAtomContainer.
Returns:
the list of all the 'mappings' found projected of g1
Throws:
CDKException
See Also:
makeAtomsMapsOfBondsMaps(List, IAtomContainer, IAtomContainer)

getSubgraphMap

public static List<RMap> getSubgraphMap(IAtomContainer g1,
                                        IAtomContainer g2)
                                 throws CDKException
Returns the first subgraph 'bond mapping' found for g2 in g1.

Parameters:
g1 - first molecule. Must not be an IQueryAtomContainer.
g2 - second molecule. May be an IQueryAtomContainer.
Returns:
the first subgraph bond mapping found projected on g1. This is a List of RMap objects containing Ids of matching bonds.
Throws:
CDKException

getSubgraphAtomsMaps

public static List<List<RMap>> getSubgraphAtomsMaps(IAtomContainer g1,
                                                    IAtomContainer g2)
                                             throws CDKException
Returns all subgraph 'atom mappings' found for g2 in g1, where g2 must be a substructure of g1. If it is not a substructure, null will be returned. This is an List of Lists of RMap objects.

Parameters:
g1 - first molecule. Must not be an IQueryAtomContainer.
g2 - substructure to be mapped. May be an IQueryAtomContainer.
Returns:
all subgraph atom mappings found projected on g1. This is a List of RMap objects containing Ids of matching atoms.
Throws:
CDKException

getSubgraphAtomsMap

public static List<RMap> getSubgraphAtomsMap(IAtomContainer g1,
                                             IAtomContainer g2)
                                      throws CDKException
Returns the first subgraph 'atom mapping' found for g2 in g1, where g2 must be a substructure of g1. If it is not a substructure, null will be returned.

Parameters:
g1 - first molecule. Must not be an IQueryAtomContainer.
g2 - substructure to be mapped. May be an IQueryAtomContainer.
Returns:
the first subgraph atom mapping found projected on g1. This is a List of RMap objects containing Ids of matching atoms.
Throws:
CDKException

isSubgraph

public static boolean isSubgraph(IAtomContainer g1,
                                 IAtomContainer g2)
                          throws CDKException
Tests if g2 a subgraph of g1.

Parameters:
g1 - first molecule. Must not be an IQueryAtomContainer.
g2 - second molecule. May be an IQueryAtomContainer.
Returns:
true if g2 a subgraph on g1
Throws:
CDKException

getOverlaps

public static List<IAtomContainer> getOverlaps(IAtomContainer g1,
                                               IAtomContainer g2)
                                        throws CDKException
Returns all the maximal common substructure between twp atom containers.

Parameters:
g1 - first molecule. Must not be an IQueryAtomContainer.
g2 - second molecule. May be an IQueryAtomContainer.
Returns:
the list of all the maximal common substructure found projected of g1 (list of AtomContainer )
Throws:
CDKException

getBitSet

public static BitSet getBitSet(IAtomContainer ac)
Transforms an AtomContainer into a BitSet (which's size = number of bond in the atomContainer, all the bit are set to true).

Parameters:
ac - IAtomContainer to transform
Returns:
The bitSet

buildRGraph

public static RGraph buildRGraph(IAtomContainer g1,
                                 IAtomContainer g2)
                          throws CDKException
Builds the RGraph ( resolution graph ), from two atomContainer (description of the two molecules to compare) This is the interface point between the CDK model and the generic MCSS algorithm based on the RGRaph.

Parameters:
g1 - Description of the first molecule
g2 - Description of the second molecule
Returns:
the rGraph
Throws:
CDKException

search

public static List<List<RMap>> search(IAtomContainer g1,
                                      IAtomContainer g2,
                                      BitSet c1,
                                      BitSet c2,
                                      boolean findAllStructure,
                                      boolean findAllMap)
                               throws CDKException
General RGraph parsing method (usually not used directly) This method is the entry point for the recursive search adapted to the atom container input.

Parameters:
g1 - first molecule. Must not be an IQueryAtomContainer.
g2 - second molecule. May be an IQueryAtomContainer.
c1 - initial condition ( bonds from g1 that must be contains in the solution )
c2 - initial condition ( bonds from g2 that must be contains in the solution )
findAllStructure - if false stop at the first structure found
findAllMap - if true search all the 'mappings' for one same structure
Returns:
a List of Lists of RMap objects that represent the search solutions
Throws:
CDKException

project

public static IAtomContainer project(List<RMap> rMapList,
                                     IAtomContainer g,
                                     int id)
Projects a list of RMap on a molecule.

Parameters:
rMapList - the list to project
g - the molecule on which project
id - the id in the RMap of the molecule g
Returns:
an AtomContainer

projectList

public static List<IAtomContainer> projectList(List<List<RMap>> rMapsList,
                                               IAtomContainer g,
                                               int id)
Projects a list of RMapsList on a molecule.

Parameters:
rMapsList - list of RMapsList to project
g - the molecule on which project
id - the id in the RMap of the molecule g
Returns:
a list of AtomContainer

checkSingleAtomCases

public static List<RMap> checkSingleAtomCases(IAtomContainer g1,
                                              IAtomContainer g2)
                                       throws CDKException
Checks for single atom cases before doing subgraph/isomorphism search.

Parameters:
g1 - AtomContainer to match on. Must not be an IQueryAtomContainer.
g2 - AtomContainer as query. May be an IQueryAtomContainer.
Returns:
List of List of RMap objects for the Atoms (not Bonds!), null if no single atom case
Throws:
CDKException - if the first molecule is an instance of IQueryAtomContainer

makeAtomsMapsOfBondsMaps

public static List<List<RMap>> makeAtomsMapsOfBondsMaps(List<List<RMap>> l,
                                                        IAtomContainer g1,
                                                        IAtomContainer g2)
This makes maps of matching atoms out of a maps of matching bonds as produced by the get(Subgraph|Ismorphism)Maps methods.

Parameters:
l - The list produced by the getMap method.
g1 - The first atom container. Must not be a IQueryAtomContainer.
g2 - The second one (first and second as in getMap). May be an IQueryAtomContainer.
Returns:
A List of Lists of RMap objects of matching Atoms.

makeAtomsMapOfBondsMap

public static List<RMap> makeAtomsMapOfBondsMap(List<RMap> l,
                                                IAtomContainer g1,
                                                IAtomContainer g2)
This makes a map of matching atoms out of a map of matching bonds as produced by the get(Subgraph|Ismorphism)Map methods.

Parameters:
l - The list produced by the getMap method.
g1 - first molecule. Must not be an IQueryAtomContainer.
g2 - second molecule. May be an IQueryAtomContainer.
Returns:
The mapping found projected on g1. This is a List of RMap objects containing Ids of matching atoms.

setTimeout

public static void setTimeout(long timeout)
Sets the time in milliseconds until the substructure search will be breaked.

Parameters:
timeout - Time in milliseconds. -1 to ignore the timeout.