Plan-X: Multiset Discrimination for Internal and External Data Management
Overview
XML Store is a distributed mobile persistence layer with a value-oriented interface for storing and retrieving semi-structured data. In particular, XML documents can be stored as trees. The value-oriented approach provides the ability to perform transparent sharing of data without observable side-effects.
Multiset discrimination is a collection of algorithmic techniques for partitioning a set of structured input values into equivalence classes given an equivalence relation. It provides the ability to identify multiple copies of semantically equivalent data structures and replacing them with a single shared instance. A Java library of multiset discriminators is implemented that provides a flexible way of solving problems using multiset discrimination techniques.
This thesis demonstrates that multiset discrimination is well-suited for use in asynchronous sharing of on-disk data in XML Store. The implemented sharer uses very little space, achieves a high degree of sharing very fast, will eventually discriminate the whole store, and performs implicit garbage collection.
A theoretical analysis of region-based multiset discrimination of directed acyclic graphs is given. This relies on the notion of weak-discrimination, which is essentially discrimination modulo inter-region pointers. It is shown that a store will, worst-case, be discriminated after O(h) weak-discriminations of all combinations of regions, where h is the length of the longest paths of redundant nodes.
Experiments reveal region-based multiset discrimination to be very slow due to the high number of region combinations needed to discriminate a store. When not striving at complete discrimination, the implemented XML Store is seen to be scalable in terms of storing and retrieving XML documents larger than main memory. For practical documents, experiments also show that performing only local discrimination of regions attains a high degree of sharing comparable to the optimal sharing of a completely discriminated store.
Documentation
Software
- xmlstore0.4.12.jar (binary)
- xmlstore0.4.12_src.zip (source)
The implementation uses Java 1.5.0 beta 2 and will not be compilable under previous versions of Java.
People
tsa)