Plan-X

Project Proposals

Status: January 22, 2004

Bachelor project proposals

The following projects are also offered in Master's level term project versions (projekter på kandidatuddannelsen/2.delen).  

Unix file system interface for XML Store

This project is aimed at providing a Unix/Linux file system interface for mounting local or remote XML Stores. This makes it possible to provide a Unix file system interface to a distributed, replicated, mobile XML Store and provides backwards compatibility for applications operating on the Unix file I/O. Special emphasis should be placed on correctness, well-specified/predictable semantics and implementation efficiency, incl.comparison with NFS (both in terms of semantics and efficiency).  It is suggested that the VFS (virtual file system) abstraction for implementing file systems in Linux is used.  

Native XML Store disk implementation

(For Linux hackers) The basic XML Store 'raw disk' storage module is presently implemented on top of the file system operations of the underlying operating system.  Unix file system interface and semantics are, however, considerably more complex than XML Store semantics.  Furthermore, Unix file systems have weak and implementation-dependent transactional guarantees (if any), which may compromise XML Store semantics.  Design and prototype a 'native' XML Store raw disk module that implements the simple load/save interface of XML Store directly on a standardized disk controller interface.  Compare with file system based implementation, evaluate performance and recommend implementation techniques for a full-scale implementation of a native XML Store. (NB: The above two projects in combination layer a Unix file system on top of XML Store, as opposed to the other way round, as presently implemented.)  [This project also exists in a Master's thesis version.] 

XQUERY interface for XML Store

XML Store is based on accessing XML documents through a value-oriented interface. This project is aimed at extending the functionality of a (remote) XML Store to provide database functionality by adding an interface that processes XQUERY programs on the state of the XML Store.  Compare with (existing) data for executing XQUERY queries in a commercial object-relational  (Oracle 9i) and native XML (Tamino) database.   NB: There is an existing XPATH implementation for XML Store that can be used as a basis.  XQUERY is a superset of XPATH.)

XML parser/unparser generator

Many programming languages (Java, Standard ML, etc.) and other languages (LaTeX, BibTeX, etc) can be specified by context-free grammars.  XML is increasingly used as a universal format to encode tree-structured data.  This project is aimed at generating converters from a grammar specification G (parser/unparser) that convert inputs from L(G), the language defined by G, to XML and conversely.  The grammar G should be transformed to an XML Schema S(G) for validation of XML documents to ensure that they represent inputs from L(G).  In particular, a converter from Java to XML should be constructed.  (This project enables storing Java source code and other structured input in non-XML  formats in XML Store.)

Tree update merging

Semi-structured documents such as collections of computer programs or legal texts making up a software/document configuration may be updated indpendently and concurrently by multiple users and thus need to be synchronized by merging the multiple updates to arrive at joint .  An example is using cvs for merging directories, where files are treated as lists of strings (each string representing a line in the input).  Design and implement a simple domain specific language (DSL) for specifying how to merge multiple updates of tree-structured data into a single compound update.  Exemplify and demonstrate update merging of document-oriented and data-oriented XML documents.  [NB: Building on the previous project this could also be used for merging software and comparing it with the line-oriented semantics of cvs. This project also exists in a Master's thesis version.] 

Transaction manager for XML Store

XML Store is storage manager for semi-structured (graph-oriented) data that distinguishes between immutable data (can only be allocated, read and garbage collected, but are never in-place updated) and mutable data (like immutable data, but may also be in-place updated).   Since immutable data are much cheaper to manage in a distributed and mobile network setting, XML Store applications are best and most efficiently programmed in value-oriented style as in functional programming: with many boxes (nonupdatable pointers) and few refs (updatable pointers).  Since operating on XML Stores is inherently concurrent, collections of refs still need to be updated transactionally to guarantee atomicity.  Consider concurrency control based on locking ("pessimistic") and reconciliation ("optimistic").  Design, implement and evaluate a transaction manager based on your analysis.   [This project also exists in a Master's thesis version where wait-free data structures, distribution and replication should also be considered.]

Replication of mutable data

Consider active (peer-to-peer style) and passive (master-slave style) replication schemes (and others, if you wish) for replicating mutable data (called cells) in XML Store that get updated concurrently in a distributed setting.  Analyze what semantics these replication schemes have in the presence of network and node failures as well as concurrent updates to replicated data.  (NB: Replication of immutable data is easy and need not be considered.)

Simple Grid

XML Store is a highly configurable distributed peer-to-peer storage system with a simple load/save interface (load data, save data).  Extend it with an "execute" (visitor) interface that enables (remote) executing arbitrary (Java or other) program code stored as compute jobs performed in the XML Store (instead of the client using the XML Store).  In particular, design and implement a simple load balancing approach for allocating a complete compute job for execution on individual peers in the peer-to-peer system.  Analyze and design a generalization of this scheme to concurrent, load-balanced distributed execution by allowing the submitted compute jobs to invoke the execute interface asynchronously.  Demonstrate the execute interface by some simple applications (e.g.: storing XPATH queries in XML Store and invoking them; simple branch-and-bound grid computing application; or similar). [This project also exists in a Master's thesis version.] 

Compressed serialization of semi-structured data

Compression of linear (string-based) data can conceptually be understood as a four-stage process: (1) parse the input into a (usually shallow) tree with short strings at the leaves and the append operator at its leaves; (2) contract the tree into a directed acyclic graph (dag) by identifying all structurally isomorphic nodes in the tree; (3) perform Huffman coding on the dag to represent nodes with most inedges by the shortes binary strings; and finally (4) serialize (write out as string).  Ordinary parsing may be used for (1) for serialized representations of tree-structured data, e.g. for XML documents ; (2) corresponds to detecting sharing and is performed automatically in XML Store.  This project is intended to design and implement (3) and (4).  All 4 phases pipelined together yield an XML compressor.    Analyze the resulting XML compressor and compare it with other compressors with respect to usability (the four phases may be implemented and used as individual modules), performance (time and space consumption during compression and decompression) and compression rate.  Compare with ordinary string-based compressors (e.g. Lempel-Ziv and Huffman coding based ones) and XMill, an XML compressor that obtains substantially greater compression for data-centric XML documents than gzip and the like.

DVM serialization for Java objects

Serialization in Java refers to the automatic conversion of instances of classes implementing the Serializable interface into a byte array, and conversely.  It is mostly used for storing objects in persistent storage and for transmitting them from one Java Virtual Machine (JVM) to another.  This project aims at 'serializing' Java objects into Document Value Model (DVM) format instead of byte arrays.  Such DVM format objects can then be stored in XMLStore.  Serialization into XMLStore in this fashion has the advantage that multiple objects from multiple serializations (in the same execution)  may share storage across objects.  This is to be done by generating conversion code from class definitions; that is generating the conversions from an object state to its DVM format and back at compile time.  Evaluate your design and compare your implementation to both standard Java serialization, to DVM serialization by Java reflection (earlier project) and possibly to other serialization formats you may find.

Peer-to-peer backup

XML Store provides automatic sharing of unique data, peer-to-peer distribution, high fault tolerance through replication and automatic garbage collection.  This makes it an ideal platform for disk-based backup.  Using XML Store, build a peer-to-peer, fault tolerant peer-to-peer system for data backup that can be used by user groups as a mutual backup platform.

Plagiarism detection tool

As part of its identification of sharing opportunities XML Store provides automatic discovery of structural isomorphic data in collection of data;  that is, it discovers rapidly and automatically parts of documents that occur multiple times in different documents and does so even up to commutation of parts.  Use this facility to build a copy/plagiarism detection tool: Given a collection of structured documents (e.g., software programs) identify unusually or suspicously large parts of documents that occur in multiple documents.  Apply it to a collection of Standard ML program parts and analyze to which degree it can be used as a filtering tool for identifying potential plagiarism in solutions to programming exercises.

Browser for XML documents: XML source, XSLT stylesheet

Design and implement a functional XML browser based on transformation from XML to HTML, where both source and target documents are stored in an XML Store. Focus is on exploiting caching and memoization to avoid repeated computations/transformations.

Email storage manager

XML Store stores the same data in principle only once (which, however, may then be replicated for performance and availability) .  Build an email client/retrofit an existing email client with an XMLStore interface for allowing users to store general (nonprivate) mail, potential spam and identified spam in an XMLStore shared by a large user base.  Analyze how much space savings this gives.  [NB: The results of the Unix file system interface for XMLStore project, first project listed above, may be used here.]

Graduate project proposals

(Preliminary!  Will be updated in February 2004.)  See also "Bachelor project proposals".

A functional language for XML processing

A programming language for mobile applications

Code as semi-structured data

The aim of this project is to allow an XML Store to store not only data in the traditional sense of the word, but also program code. Having represented code as semi-structured data, the XML Store can be extended with an operation for executing the stored program. Focus in on designing the representation and implementing the proposed extended XML Store.

Checkpointing

The aim of checkpointing is to capture the execution state and store it for latter perusal. This can, for example, be realized representing code as semi-structured data and using CPS transformations to reify the computation state as a continuation argument. The focus of this project is to design and implement a library for adding checkpointing instructions to a distributed, mobile programming language.

Distributed debugger

Based on the checkpointing system above, a sophisticated distributed debugger can be built. If the distributed program to be debugged is instrumented to store checkpoints at regular intervals in an XML Store, the debugger, running concurrently or "after the fact", can inspect and visualize the execution steps to the user. By moving backwards and forwards in the list of checkpoints one gets the functionality of a "replay debugger"; the availability of the complete list of checkpoints allows for "time travel": jumping to a particular time in the execution of the program; etc.

Concurrency control and reconciliation

Distributed garbage collection for XML Stores

Routing for XML Stores

Eternity service

Collaborative system (document sharing)

Email system

Calendar system