Posts Tagged ‘JGraphT’

Proposal for graph readers in JGraphT

Thursday, July 8th, 2010

One part of my development on Hypergraph which I recently started again after a long break is to replace my own graph implementation with JGraphT. JGrapht is a nice framework to create graph structures in Java and it offers various algorithms to process graphs. For the hyperbolic graph visualisation I do with Hypergraph, this is a very good alternative to my self-written “graph library”. It also includes the possibility to serialise graphs using for example the GraphViz dot format or the GraphML XML format. However, it doesn’t offer any possibility to read these graph formats.

The following is a proposal how to create readers for JGraphT. Please feel free to leave a comment, an additional proposal etc.

The GraphReader interface

The parsing of a file and the creation of the graph should be done by a instances of a common GraphReader interface. This interface could be:

public interface GraphReader<V, E> {
    public void setComponentAttributeProvider(ComponentAttributeProvider<?> attributeProvider);

    public Graph<V, E> read(String filename,
        VertexFactory<V> vertexFactory, EdgeFactory<V, E> edgeFactory);

    public Graph<V, E> read(InputStream inputStream,
        VertexFactory<V> vertexFactory, EdgeFactory<V, E> edgeFactory);

    public Graph<V, E> read(URL url,
        VertexFactory<V> vertexFactory, EdgeFactory<V, E> edgeFactory);

}

The ComponentAttributeProvider can be provided before creating the graph; in this case, attributes like weight, color or other information provided in the file will be stored in the ComponentAttributeProvider. This is not mandatory – if not available, the attributes are simply ignored. For a discussion about storing the attributes, see below.

The read methods return a Graph instance; the actual type of the graph depends on the data in the file. For example the graph format can define a graph as directed/undirected (for example “digraph” in the dot format); the created graph may hence be a DirectedGraph or an UndirectedGraph.

Questions:

  1. Which read methods should be implemented? Are the three methods I mentioned above enough?
  2. Which parameters should the read methods have? Shold we offer read methods without an EdgeFactory, which default to ClassBasedEdgeFactory?
  3. How should exceptions (IOException etc) be handled? This includes parsing errors (wrong format).

Generic Reader

There should be a generic class which hides the complexity of choosing the correct parser. When this generic reader is used, the only information passed to the reader might be a file name or even only an input stream. The reader should derive information about the format by analysing for example the file name, the mime type or the format of the actual data, based on some heuristics.

Some example for heuristics:

  1. if the file name is given and ends with “.xml” or “.dot” => the file is an xml or dot file and the corresponding reader is used
  2. if the mime type is available, use xml reader for the xml mime type
  3. if the first bytes “look like” a dot file (i.e. the file starts according to the dot grammar [strict] (graph | digraph) ), use the dot reader

Example how to use such a generic reader:

    GraphReader<V, E> reader = new GenericReader<V, E>();
    ComponentAttributeProvider attributeProvider = new DefaultComponentAttributeProvider();
    reader.setAttributeProvider(attributeProvider);
    Graph<V, E> graph = reader.read(“some_file.dot”, vertexFactory);

In this case, the read(..) methods would assume that the file has the dot format because of the file name. It would instantiate a corresponding reader (say DOTReader) and delegate the actual parsing to this reader.

Storing attributes on vertices and edges

Most graph formats include attributes for vertices and edges, for example a weight or color information. There must be a way to store these attributes either on the graph elements (i.e. vertices or edges) themselves or by maintaining a map for each element which stores the attribute keys and corresponding values. I prefer the second approach for two reasons:

  1. There exists already a ComponentAttributeProvider which has been introduced to support attributes when dot files are created.
  2. Storing the attributes on the elements would imply that the elements implement some interface, probably an interface which defines something like “Object getAttribute(String key)”. This would contradict the philosophy of JgraphT since the JgraphT code makes no assumption on the vertices and edges at all.

Hence the readers should store any attributes using ComponentAttributeProvider (probably in a slightly enhanced version) and should make no assumptions on the elements. Even then, there are different ways to work with the attributes:

Variant 1:
The vertex and edge classes don’t know about the attributes; to access the attributes, the program has to access the ComponentAttributeProvider.
Variant 2:
The elements are instances of a special class which “knows” about the attribute provider. The vertex/edge class has accessors to for example a label (i.e. public String getLabel()) and the implementation of this method delegates to the attribute provider. In particular, the elements don’t store the actual attribute values.
Variant 3:
The elements store the actual attribute values and offer appropriate accessors. The attribute provider knows the type of the element and when getAttributes() is called, the attribute provider creates a map and initialises it with the attributes read from the vertex.

Both variant 2 and 3 assume that there is a special vertex or edge class; this is possible because the Reader accepts a VertexFactory and EdgeFactory.