Central articles: Marshall K. McKusick, William N. Joy, Samuel J. Leffler, and Robert S. Fabry, "A Fast File System for UNIX". ACM Transactions on Computer Systems, Vol. 2, No. 3, 1984, pp. 181--197. Mendel Rosenblum and John K. Ousterhout, "The Design and Implementation of a Log-Structured File System", ACM Transactions on Computer Systems, vol. 10(1), 26-52, 1992. Questions: 1. What should be avoided if good disk bandwidth is required? 2. Why does the UNIX file system suffer badly from fragmentation? 3. What allocation strategy do log-structured file systems use? 4. What is the cleaner in log-structured file systems? What is a cleaner called in programming language terminology? 5. Which of the ACID properties do you get/can you get from the filesystes presented in the articles? 6. Consider the original Unix file system, Fast File System (FFS), and log-structured file systems (SPRITE). In each of these file systems, under which circumstances, if any, is it possible for a data block to have: 0 references (not be referenced at all)? 2 or more references? (Consider both directly and indirectly referenced data blocks.) 7. What is a cost/benefit cleaner? Programming exercise: Implement simulations of simplified file systems with UNIX API as follows. (a) We model a disk as an integer-indexed array Arr[0..N-1] of bytes, with a read/write head that points into the array. Thus the state of a disk is characterized by: the contents of the array, plus the position of the read/write head into the array. A simple interface for disks is: interface Disk { void seekTo(int newPos); // set R/W head to point to Arr[newPos...] void seekRelative(int distance); // set R/W to point to Arr[currPos+distance...] int getPos(); // return currPos int write(byte[] byteBuffer, int length); // update Arr[currPos..currPos+length] and set currPos to currPos+length int read(byte[] byteBuffer, int length); // read Arr[currPos..currPos+length] and set currPos to currPos+length } We make the simplifying assumption that execution cost is correlated with the seek distances, that the differences between current position and new position in seek operations executed. Separately we count how many bytes are read and written. This is expressed in the following extended interface: interface InstrumentedDisk extends Disk { void reset(); // reset counters int getSeekDistance(); // Sum of all distances (currPos to newPos) executed by seeks since last reset int getReadBytes(); // Sum of numbers of bytes read by reads since last reset int getWriteBytes(); // Sum of numbers of bytes written by writes since last reset } Implement InstrumentedDisk in your favorite language. Note that we only simulate disks --- it is thus a good idea to put them into main memory in the implementation. (b) A UNIX style file system can be given the following simplified interface: interface FileSystem { FileDescriptor open(String fileName); // open a file for reading and writing; create new file if necessary void close(FileDescriptor fd); // close file int read(FileDescriptor fd, byte[] byteBuffer, int length); // read length bytes into byteBuffer, advancing current file position int write(FileDescriptor fd, byte[] byteBuffer, int length); // write length bytes from byteBuffer, advancing current file position void seekTo(int newPos); // move file pos to newPos void seekRelative(int distance); // move file pos by distance void fsync(FileDescriptor fd); // flush file contents to disk } Build a simple implementation for InstrumentedDisk. Build a simple file system, parameterized by an InstrumentedDisk, in the style of the original UNIX file system in a naive fashion; that is, do not use any buffering, but ensure that all operations are executed synchronously. (Don't confuse the fact that the Disk implementation is a simulation and keeps its data actually in main memory in your implementation -- actually we don't know that due to paging -- with the fact that we are thus modeling keeping data on a disk. You should think of everything that the (Instrumented)Disk operations do as happening 'on disk'. We'll use the instrumentation in InstrumentedDisk to count the costs relative to our idealized disk cost model.) Build a simple log-structured file system in the style of SPRITE, again in a naive fashion; that is without any form of buffering. (c) Exercise the two file systems for a variety of uses: sequention I/O, random I/O, some mixed I/O. Use the instrumentation to compare performance figures. (This is to gauge the effect of buffering, not to suggest that file systems without buffering are viable.) (d) We model random access memory (which we'll use for simulating in-memory file buffering) as an integer-indexed array Arr[0..N-1]. There is no read/write head since addressing any part of that memory is equally fast, independent of the current state of the memory (which is why we call it random access). A simple interface for random access memory is thus: interface RAM { int write(int offset, byte[] byteBuffer, int length); // Copy byteBuffer to Arr[offset..length]. int read(int offset, byte[] byteBuffer, int length); // Copy Arr[offset..length] to byteBuffer. } Since performance of read/write access is (modeled to be) independent of previous operations and offset in the argument, the main performance characteristic we are interested in is how many bytes have been read and how many written. To this end we define interface InstrumentedRAM extends RAM { void reset(); // reset counters int getReadBytes(); // Sum of numbers of bytes read by reads since last reset int getWriteBytes(); // Sum of numbers of bytes written by writes since last reset } Extend your simple UNIX file system with file buffering in the following fashion. Build a simple file system, parameterized by an InstrumentedDisk and an InstrumentedRAM, where the InstrumentedRAM is used to model file buffering, and the InstrumentedDisk to model actual disk storage. Use the RAM as a cache for reads, but do writes still by synchronous 'write-through'. Build a log-structured file system, parameterized by an InstrumentedDisk and an InstrumentedRAM, where the InstrumentedRAM is used to model file buffering, and the InstrumentedDisk to model actual disk storage. Use the RAM as a cache for reads, but do writes still by synchronous 'write-through'. Do the same performance tests as under (c). (e) Elaborate your two implementation from (d) by adding asynchronous, buffered writing; that is, writes are just stored in RAM, not written through to Disk, until a large enough block of storage can be written in one go unless an fsync necessitates doing so earlier. Do the same performance tests as under (c). (f) Do whatever else you feel like; e.g., buffering of read and write requests (not answering them one at a time) and scheduling them to minimize the cumulative cost of seeks.