
Project completed:

The source code can be downloaded here.
The paper about this tool can be found here - PDF

Plan-X: Plagiarism Detection Tool

Overview

XMLStore is an application-configurable, distributed, mobile persistence layer with a value-oriented interface for storing and retrieving semi-structured data, in particular XML documents. A central benefit of the value-oriented approach is transparent shareability and actual sharing of stored data: stored data is immutable by design, so XMLStore stores each unique data item only once and represents every occurrence of it by a pointer to the stored item.
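The "store once, point many times" idea can be illustrated with a minimal Python sketch. It is not XMLStore's implementation or API: the class ValueStore, its put method, and the integer ids standing in for pointers are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Assumption: a node is (label, ids of its children); an id plays the
# role of a pointer to an immutable, already stored item.
Node = Tuple[str, Tuple[int, ...]]


class ValueStore:
    """Toy value-oriented store: each distinct subtree is stored once."""

    def __init__(self) -> None:
        self._ids: Dict[Node, int] = {}  # node value -> id
        self._nodes: List[Node] = []     # id -> node value

    def put(self, label: str, children: Tuple[int, ...] = ()) -> int:
        node = (label, tuple(children))
        if node not in self._ids:        # first occurrence: store it
            self._ids[node] = len(self._nodes)
            self._nodes.append(node)
        return self._ids[node]           # repeated occurrence: reuse the id

    def __len__(self) -> int:
        return len(self._nodes)


store = ValueStore()
# Build the expression tree add(x, 1) twice, as two authors might.
first = store.put("add", (store.put("x"), store.put("1")))
second = store.put("add", (store.put("x"), store.put("1")))

print(first == second)  # both occurrences share one stored item
print(len(store))       # number of distinct items actually stored
```

Because equal values always receive the same id, comparing two ids is enough to tell whether two subtrees are identical, which is the property the detection tool relies on.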

The general goal of this project is to develop a tool that exploits the degree of sharing detected and represented in XMLStore to flag suspicious similarity between parts of tree-structured documents; to apply this tool to measuring the similarity of Standard ML programs written by different authors; and to compare this sharing detection technique with the 'fingerprinting' (hashing) techniques otherwise used for copy detection [1].

 

Goals

  1. to design and implement a modular, configurable copy detection tool for Standard ML programs that can be used to flag suspicious similarities between groups of programs as a filter for (manual) plagiarism analysis. The tool should be configurable, e.g. through parameter settings, so that it can be used interactively and 'tuned' to a particular input set (of programs). The tool should be modular to enable (re)use with data other than Standard ML programs.
     

  2. to evaluate the tool on Standard ML programs that are representative of solutions to small to mid-size programming exercises (20 to 1000 lines of code). 
     

  3. to identify weaknesses of the proposed method and to pinpoint promising remedies.
     

  4. to compare this sharing detection method with fingerprinting techniques, notably winnowing, with respect to efficiency and expressive power (what kinds of similarities can be discovered, which countermeasures can foil copy detection, and which countermeasures can copy detection 'see through'?).
     

  5. to give an appraisal of the viability of copy detection by sharing detection and to present important problems that should be solved to increase its applicability.
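
For comparison, the winnowing technique from [1] that goal 4 refers to can be sketched in Python as follows. The function names, the defaults k=5 and w=4, the CRC32 hash, and the Jaccard-style score are illustrative assumptions; they are not taken from the tool or from Moss.

```python
import zlib
from typing import List, Set, Tuple


def kgram_hashes(text: str, k: int) -> List[int]:
    """Hash every substring of length k (CRC32 is an arbitrary choice)."""
    return [zlib.crc32(text[i:i + k].encode("utf-8"))
            for i in range(len(text) - k + 1)]


def winnow(text: str, k: int = 5, w: int = 4) -> Set[Tuple[int, int]]:
    """Return (hash, position) fingerprints chosen by winnowing.

    Each window of w consecutive k-gram hashes contributes its minimum
    hash; ties are broken by taking the rightmost occurrence. Texts
    shorter than k + w - 1 characters yield no fingerprints.
    """
    hashes = kgram_hashes(text, k)
    fingerprints: Set[Tuple[int, int]] = set()
    for start in range(len(hashes) - w + 1):
        window = hashes[start:start + w]
        smallest = min(window)
        offset = w - 1 - window[::-1].index(smallest)  # rightmost minimum
        fingerprints.add((smallest, start + offset))
    return fingerprints


def similarity(a: str, b: str, k: int = 5, w: int = 4) -> float:
    """Assumed score: Jaccard overlap of the selected hash values."""
    fa = {h for h, _ in winnow(a, k, w)}
    fb = {h for h, _ in winnow(b, k, w)}
    if not fa and not fb:
        return 0.0
    return len(fa & fb) / len(fa | fb)


original = "fun f x = x + 1;"
padded = "val y = 2; " + original
print(similarity(original, padded))
```

Any substring of at least k + w - 1 characters that two texts have in common fills a complete window in both, so both texts select the same minimum hash there and share at least one fingerprint.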
     

Method

 

Milestones & Dates

This section lists the milestones we have identified. Each milestone represents what we consider an important step towards completing our plagiarism detection tool. Each item in the list below is described in more detail earlier in this project plan. The dates should be seen as guidelines rather than as final deadlines.

 

Literature

  1. Saul Schleimer, Daniel S. Wilkerson, Alex Aiken, "Winnowing: Local Algorithms for Document Fingerprinting," Proceedings of the ACM SIGMOD International Conference on Management of Data, June 2003.
    http://theory.stanford.edu/~aiken/publications/papers/sigmod03.pdf
  2. Kasper Bøgebjerg Pedersen and Jesper Tejlgaard Pedersen, "Value-oriented XML Store", Master's thesis, ITU and DTU, 2002.
    http://www.it-c.dk/~kasperp/xmlstore/pdf/thesis.pdf
  3. XMLStore source code - http://plan-x.org/xmlstore/
  4. Moss (Measure Of Software Similarity) - http://www.cs.berkeley.edu/~aiken/moss.html
  5. Lawrence C. Paulson: "ML for the Working Programmer," 2nd Edition
  6. Comparator - http://www.catb.org/~esr/comparator/comparator.html
  7. Cormen, Leiserson, Rivest, Stein: "Introduction to Algorithms," 2nd Edition, MIT Press, Cambridge 2001.
  8. Paul Clough: "Plagiarism in natural and programming languages: an overview of current tools and technologies," July 2000. - http://www.dcs.shef.ac.uk/~cloughie/papers/Plagiarism.pdf
     

People

Christa Fotel, stud.scient - christa @ diku.dk
Lars Langer, stud.scient - langer @ diku.dk

Supervisor: Prof. Fritz Henglein, Department of Computer Science, Copenhagen University.

 

Appendix

Figure 1: The procedure of our detection tool. First we parse and normalize the SML program and dump it as XML using the MosML compiler. The generated XML documents are stored in XMLStore. We then analyse the references computed by XMLStore and, based on the given input parameters, generate a list of possible plagiarism cases.
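
The analysis step of Figure 1 might look roughly like the following Python sketch. It only stands in for the real pipeline: programs are given directly as nested tuples instead of XML produced by MosML, shared subtrees are found with plain sets instead of XMLStore references, and the parameters min_size and threshold as well as all function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

# Assumption: a tree is (label, child, child, ...); a leaf is (label,).
Tree = tuple


def subtrees(tree: Tree, min_size: int) -> Set[Tree]:
    """Collect all distinct subtrees with at least min_size nodes."""
    found: Set[Tree] = set()

    def walk(node: Tree) -> int:
        size = 1 + sum(walk(child) for child in node[1:])
        if size >= min_size:
            found.add(node)
        return size

    walk(tree)
    return found


def sharing(a: Tree, b: Tree, min_size: int = 3) -> float:
    """Fraction of the smaller program's subtrees also found in the other."""
    sa, sb = subtrees(a, min_size), subtrees(b, min_size)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / min(len(sa), len(sb))


def flag_pairs(programs: Dict[str, Tree], min_size: int = 3,
               threshold: float = 0.5) -> List[Tuple[str, str, float]]:
    """List program pairs whose sharing score reaches the threshold."""
    flagged = []
    for (na, ta), (nb, tb) in combinations(sorted(programs.items()), 2):
        score = sharing(ta, tb, min_size)
        if score >= threshold:
            flagged.append((na, nb, score))
    return sorted(flagged, key=lambda item: -item[2])


programs = {
    "p1": ("fun", ("f",), ("add", ("x",), ("1",))),
    "p2": ("fun", ("g",), ("add", ("x",), ("1",))),  # renamed copy of p1
    "p3": ("fun", ("h",), ("mul", ("y",), ("2",))),
}
print(flag_pairs(programs))  # → [('p1', 'p2', 0.5)]
```

Raising min_size ignores small, commonly occurring fragments, while lowering threshold makes the filter more sensitive; both would need tuning to a particular set of submissions.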

 

Last updated 2004-05-31