System Architecture

IVLE is a complex piece of software that integrates closely with the underlying system. It can be considered part web service and part local system daemon. Because of how these parts are implemented, it is tied to the Apache web server (mainly through its use of mod_python) and to Linux.

Dispatch

IVLE uses mod_python to allow Python scripts to be called from Apache. We register the ivle.dispatch module as the PythonHandler in the associated VirtualHost, allowing us to intercept all HTTP requests to the web server.

The ivle.dispatch module is responsible for mapping requests from the client to the correct application plugin. Plugins can be specified by placing a *.conf file into the /etc/ivle/plugins.d/ directory containing lines of the form [plugin_module#classname].
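As a rough illustration of how such plugin files could be read, the sketch below parses section headers of the form [plugin_module#classname] with Python's standard configparser. The function name and the sample plugin names are hypothetical, and this is not necessarily how ivle.dispatch loads its plugins.

```python
import configparser


def parse_plugin_conf(text):
    """Return (module, classname) pairs from headers like [module#Class]."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    plugins = []
    for section in parser.sections():
        # Split "package.module#ClassName" at the first '#'.
        module, _, classname = section.partition("#")
        plugins.append((module, classname))
    return plugins


# Hypothetical plugin names, used only for illustration.
sample = "[ivle.webapp.core#Plugin]\n[example.module#ExamplePlugin]\n"
plugins = parse_plugin_conf(sample)
print(plugins)
```

Each pair could then be imported with importlib.import_module and getattr to obtain the plugin class.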

In future, this may be ported to a WSGI (PEP 333) based dispatch to allow IVLE to be run on web servers other than Apache.

Templating

IVLE uses the Genshi XHTML template system to generate all HTML pages. We have an inheritance-based “views” system. BaseView is a class from which all views derive.

There are three sub-types of BaseView (more can be implemented if necessary):

  • XHTML-Templated
    • browser, console, debuginfo, diff, forum, groups, help, home, logout, settings, subjects, svnlog, tos, tutorial
  • Raw byte streaming
    • download, server
  • JSON service
    • consoleservice, fileservice, tutorialservice, userservice

The apps each derive from one of the above.

Note

IVLE used to write its HTML output as a raw stream to an output file, until it was refactored to use Genshi. All apps which haven’t yet been refactored properly were ported to use the “raw byte streaming” view.

Jail System

One of the main features of IVLE is its ability to execute users’ code in a customised environment that prevents access to other users’ files or the underlying file system. It also places basic resource limits on that code to prevent users from accidentally exhausting shared resources such as CPU time and memory.

Trampoline

To each user, it appears that they have their own private Unix filesystem containing software, libraries and a home directory to do with what they please. This is mainly done by the setuid root program trampoline, which mounts the user’s home directory, sets up the user’s environment, jumps into the user’s jail using the chroot(2) system call and finally drops privileges to the desired user and group.

To prevent abuse, trampoline can only be used by root or one of the uids specified when trampoline is built by setup.py build (defaults to UID 33, www-data on Debian systems). Since it is one of two C programs involved in IVLE and runs setuid root, it is rather security sensitive.

See also

Source code bin/trampoline/trampoline.c

Base Image Generation

All user jails share a common base image that contains the files required for both IVLE’s operation and for executing user code. This base image is generated automatically by the ivle-buildjail script, which in turn calls the distribution-dependent code in the ivle.jailbuilder module. At present we only support building jails for Debian-derived systems using debootstrap.

The base image contains a few core packages required for the operation of IVLE: Python and the Python CJSON and SVN libraries. Other options that can be configured in /etc/ivle/ivle.conf are the file mirror that debootstrap should use, the suite to build (such as hardy or jaunty), extra apt-sources, extra apt keys and any additional packages to install.

To prevent users from altering files in the base image we change the permissions of /tmp, /var/tmp and /var/lock to not be world writeable and check that no other files are world writeable.

Finally, we make the user-dependent /etc/passwd and /etc/ivle/ivle.conf symlinks to files in the /home directory so that they will be used when we mount a user’s home directory.

Mounting Home Directories

To give the appearance of a private file system we need to merge together a user’s local home directory with the base image. To achieve this, IVLE uses the bind mount feature of Linux, which allows directories to be accessible from another location in the file system. By carefully bind-mounting the jail image as read-only and then bind-mounting the user’s /home and /tmp directory data over the top, we create a jail with only three bind mounts and at virtually no file system overhead.
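The three bind mounts can be pictured as the equivalent mount(8) commands. The sketch below only builds the command lines and does not run them (doing so needs root). The directory layout and the function name are hypothetical, and trampoline performs its mounts itself rather than shelling out; the separate read-only remount follows the pattern documented in mount(8), since a bind mount is not necessarily read-only from the first call.

```python
def jail_mount_commands(base_image, user_data, jail):
    """Build mount(8) command lines for one jail (hypothetical layout)."""
    return [
        # 1. Bind the shared base image onto the jail mount point...
        ["mount", "--bind", base_image, jail],
        # ...and remount that bind mount read-only.
        ["mount", "-o", "remount,bind,ro", base_image, jail],
        # 2. Bind the user's writable home data over the image's /home.
        ["mount", "--bind", user_data + "/home", jail + "/home"],
        # 3. Bind the user's writable tmp data over the image's /tmp.
        ["mount", "--bind", user_data + "/tmp", jail + "/tmp"],
    ]


# Hypothetical paths, for illustration only.
commands = jail_mount_commands(
    "/var/lib/ivle/jails/__base__",
    "/var/lib/ivle/jails/__data__/alice",
    "/var/lib/ivle/jails/alice",
)
for argv in commands:
    print(" ".join(argv))
```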

Note

IVLE has historically used numerous solutions to this problem, which are chronicled here to avoid the same mistakes being made again.

In the first release of IVLE this was done offline by hard-linking all the files into the target directory, but for a large number of users, this process can take several hours, and also runs the risk of exhausting the number of inodes on the underlying file system.

The second solution was to use AUFS to mount the user’s home directory over a read-only version of the base on demand. This was implemented as part of trampoline and used a secondary program timount (see bin/timount/timount.c), run at regular intervals, to unmount unused jails. This used the MNT_EXPIRE flag for umount(2) (available since Linux 2.6.8) that only unmounts a directory if it hasn’t been accessed since the previous call with MNT_EXPIRE.

While quite effective, AUFS appeared to cause NFS caching issues when IVLE was run as a cluster, and as its inclusion status in future Linux distributions is questionable, the developers elected to use the much older bind mount feature instead.

Entering the Jail

Before running the specified program in the user’s jail, we need to chroot(2) into the jail and update the process’s environment so that we have the correct environment variables and user/group ids.
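The order of these steps matters. trampoline itself is written in C, so the following Python sketch is only an illustration of a safe ordering, not its actual code; the function name is hypothetical and clearing supplementary groups is an assumption about good practice rather than something this document states.

```python
import os


def run_in_jail(jail_path, uid, gid, argv, env):
    """Enter the jail, drop privileges, then replace this process.

    Must be called as root; nothing after os.execve() runs on success.
    """
    os.chroot(jail_path)   # confine the file system view to the jail
    os.chdir("/")          # do not keep a working directory outside it
    os.setgroups([])       # assumption: drop root's supplementary groups
    os.setgid(gid)         # group first: after setuid() we may not change it
    os.setuid(uid)         # finally give up root
    # Start the program with only the environment we chose to pass on.
    os.execve(argv[0], argv, env)
```

Setting the group before the user is the important detail: once the process is no longer root it may not be permitted to change its group ids.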

At this stage we may also apply a number of resource limits (see setrlimit(2)) to prevent runaway processes (such as those containing infinite loops or “fork bombs”) from exhausting all system resources. The default limits are on maximum address space (RLIMIT_AS), process data space (RLIMIT_DATA), core dump size (RLIMIT_CORE), CPU time (RLIMIT_CPU), file size (RLIMIT_FSIZE) and number of processes that may be spawned (RLIMIT_NPROC).

Unfortunately, because glibc’s malloc(3) implementation can allocate memory using mmap(2), RLIMIT_DATA does not provide an effective limit on the amount of memory that a process can allocate (short of applying a kernel patch). Thus the only way to limit memory allocations is by placing limits on the address space, but this can cause problems with certain applications that allocate far larger address spaces than the real memory used. For this reason RLIMIT_AS is currently set very large.
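For readers unfamiliar with setrlimit, the sketch below applies a few limits to a child process using Python's resource module. The numeric values are hypothetical (this document does not give IVLE's defaults), and RLIMIT_AS and RLIMIT_NPROC are left out so that the example is safe to run on a shared machine.

```python
import resource
import subprocess
import sys

# Hypothetical values, chosen for illustration only.
LIMITS = {
    resource.RLIMIT_CORE: 0,                  # no core dumps
    resource.RLIMIT_CPU: 25,                  # seconds of CPU time
    resource.RLIMIT_FSIZE: 5 * 1024 * 1024,   # bytes per file written
}


def apply_limits():
    """Runs in the child between fork() and exec()."""
    for res, wanted in LIMITS.items():
        _, hard = resource.getrlimit(res)
        # An unprivileged process cannot exceed the existing hard limit.
        if hard != resource.RLIM_INFINITY:
            wanted = min(wanted, hard)
        resource.setrlimit(res, (wanted, wanted))


# The child reports the soft CPU limit it inherited.
child_code = "import resource; print(resource.getrlimit(resource.RLIMIT_CPU)[0])"
result = subprocess.run(
    [sys.executable, "-c", child_code],
    preexec_fn=apply_limits, capture_output=True, text=True, check=True,
)
cpu_limit = int(result.stdout)
print(cpu_limit)
```

Because the limits are set in the child only, the parent process keeps its original limits.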

Python Console

IVLE provides a web-based programming console, exposing similar features to Python’s command line console. It is built around the services/python-console script, which opens a socket on a random port to which JSON-encoded chat requests can be made.

A new console is typically launched on demand when the web client makes a request to the HTTP API, which in turn calls the wrapper class ivle.console.Console to start a new console in the user’s jail.

Subsequent requests from the same in-browser console connect to the existing console process. This is achieved by storing a string on the client which identifies the server address and port. The client then makes requests through the load balancer, sending this string through to an arbitrary slave which forwards the request to the identified console.

This means that all slaves need access to all ports on every other slave.

User Management Server

The User Management Server is a daemon responsible for handling privileged actions on IVLE and should be launched along with IVLE. It is primarily responsible for:

  • Creating user jails, Subversion repositories, and Subversion authentication credentials.
  • Creating group Subversion repositories.
  • Rebuilding Subversion authorization files.

Communication with the server is done using the Chat Protocol. To prevent unauthorized use, every request to the User Management Server must be authenticated with a shared secret. This secret is stored in the magic variable in the [usrmgt] section of /etc/ivle/ivle.conf.

The User Management Server is called almost exclusively from the ivle.webapp.userservice module.

See also

Source code services/usrmgt-server

Chat Protocol

Chat is our JSON-based client/server communication protocol, used to communicate with Python Console processes and the User Management Server. Since it is JSON-based it can be called from either Python or JavaScript.

Protocol

The protocol is a fairly simple client/server based one consisting of a single JSON object. Before communication starts a shared secret MAGIC must be known by both parties. The shared secret is then used to form a ‘keyed-Hash Message Authentication Code’ to ensure that the content is valid and has not been modified in transit.

The client request takes the following form:

{
    "content": DATA,
    "digest": HASH
}

where DATA is any valid JSON value and HASH is a string containing the MD5 hash of the DATA appended to MAGIC and then hex encoded.
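A minimal sketch of building and checking such a request is shown below. It rests on several assumptions that this document does not settle: DATA is carried as a JSON-encoded string so that both sides hash identical bytes, the digest covers the encoded DATA followed by MAGIC, and text is encoded as UTF-8. The function names are hypothetical and the standard json module stands in for CJSON; ivle/chat.py remains the authority on the real wire format.

```python
import hashlib
import hmac
import json


def make_request(data, magic):
    """Wrap data in a chat request (assumed layout, see lead-in)."""
    content = json.dumps(data)
    # Assumption: hash the encoded content followed by the shared secret.
    digest = hashlib.md5((content + magic).encode("utf-8")).hexdigest()
    return json.dumps({"content": content, "digest": digest})


def read_request(raw, magic):
    """Return the decoded content, or raise ValueError on a bad digest."""
    message = json.loads(raw)
    expected = hashlib.md5((message["content"] + magic).encode("utf-8")).hexdigest()
    # Compare without leaking timing information about the digest.
    if not hmac.compare_digest(expected.encode("utf-8"),
                               message["digest"].encode("utf-8")):
        raise ValueError("digest mismatch")
    return json.loads(message["content"])


raw = make_request({"cmd": "ping"}, "s3cret")
print(read_request(raw, "s3cret"))
```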

The server will respond with a JSON value corresponding to the request. If an error occurs then a special JSON object will be returned of the following form:

{
    "type": NAME,
    "value": VALUE,
    "traceback": TRACEBACK
}

where NAME is a JSON string of the exception type (such as ‘AttributeError’), VALUE is the string value associated with the exception and TRACEBACK is a string of the traceback generated by the server’s exception handler.
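The error object can be built from a caught exception with the standard traceback module. The helper name below is hypothetical and the sketch is not taken from ivle/chat.py.

```python
import json
import traceback


def error_response(exc):
    """Build the error object described above from a caught exception."""
    return {
        "type": type(exc).__name__,
        "value": str(exc),
        "traceback": "".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__)
        ),
    }


try:
    None.missing  # deliberately raises AttributeError
except AttributeError as exc:
    response = error_response(exc)

print(json.dumps(response, indent=4))
```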

See also

Source code ivle/chat.py

Version Control

Along with traditional file system access, IVLE allows users to version their files using Subversion. Much like how Subversion workspaces are used on a standard desktop, workspaces are checked out into users’ home directories, where they can be manipulated through a series of AJAX requests to the fileservice app.

Like all other user file system actions, version control actions need to be executed inside the user’s jail. Requests are made to the fileservice app in ivle.webapp.fileservice which then calls the fileservice CGI script using trampoline. This script is simply a wrapper around ivle.fileservice_lib which actually contains the code to handle each of the actions.

Manipulation of the Subversion workspaces is done using the pysvn library.

Repositories

Each user is allocated a Subversion repository when their jail is created by the User Management Server. Repositories are stored in the location specified by paths/svn/repo_path in /etc/ivle/ivle.conf (by default /var/lib/ivle/svn/repositories/). User repositories are stored in the users/USERNAME/ subdirectory and group repositories in groups/SUBJECT_YEAR_SEMESTER_GROUP.
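The layout above can be expressed as two small path helpers. The function names are hypothetical, and how IVLE encodes each field of a group repository name (for example, which subject identifier it uses) is an assumption here.

```python
import posixpath

# Default value of paths/svn/repo_path, as stated above.
DEFAULT_REPO_PATH = "/var/lib/ivle/svn/repositories/"


def user_repo_path(username, repo_path=DEFAULT_REPO_PATH):
    """Path of a user's repository: users/USERNAME."""
    return posixpath.join(repo_path, "users", username)


def group_repo_path(subject, year, semester, group, repo_path=DEFAULT_REPO_PATH):
    """Path of a group's repository: groups/SUBJECT_YEAR_SEMESTER_GROUP."""
    name = "_".join([subject, str(year), str(semester), group])
    return posixpath.join(repo_path, "groups", name)


print(user_repo_path("alice"))
print(group_repo_path("info1", 2009, 1, "group1"))
```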

Warning

While it would be possible to give users direct access to their repository using Subversion’s file backend, this would allow users to potentially modify the history of any repository that they had access to. To ensure repository integrity, all Subversion interaction must be done remotely.

Subversion WebDAV

These repositories are served by Apache using mod_dav_svn allowing access over Subversion’s WebDAV HTTP or HTTPS backends. Users are authenticated using a randomly generated key which is stored in the database and is made available to each user inside their jail (svn_pass property inside /home/.ivle.conf). This key is automatically provided when doing Subversion actions, but can be manually entered when accessing a user’s repository from an external Subversion client such as with svn checkout svn_addr/users/USERNAME/ workspace.

Repository permissions for AuthzSVNAccessFile are automatically generated and placed in the file specified by the paths/svn/conf config option (usually /var/lib/ivle/svn/svn.conf) for user repositories and the paths/svn/group_conf option for group repositories (usually /var/lib/ivle/svn/svn-group.conf). User authentication keys for AuthUserFile are stored in the file specified by paths/svn/auth_ivle, usually /var/lib/ivle/svn/ivle.auth. These will be regenerated each time user or group repository settings change.

Worksheets

Worksheets provide a way for users to be able to attempt a set of coding exercises along with accompanying instructions. In the past worksheets were created directly using an XML format, but this has been deprecated in favour of being generated automatically from reStructuredText.

Worksheets are now stored in the database as a Worksheet object (see ivle/database.py). This allows them to be treated with the same access permissions available to other objects and lays the groundwork for providing versioned worksheets.

Exercises

When users submit an exercise, the user’s solution is tested against a series of test cases which can be used to check if a solution is acceptable. Almost all the behavior for exercises is contained within ivle/webapp/tutorial/test/TestFramework.py.

Note

The TestFramework module is one of the oldest and most complicated in IVLE, largely taken directly from the IVLE prototype. As such it has a design that doesn’t quite match the current architecture of IVLE, such as using slightly different terminology and having a few testing facilities that are untested. It requires a substantial rewrite and the development of a comprehensive test suite.

At the top level exists the Exercise object (known as TestSuite in TestFramework.py). This object encompasses the entire collection of tests for a given exercise and details such as the exercise name, provided solution and any “include code” (Python code available for all test cases, but not the user’s submission).

Each exercise may contain one or more TestSuite objects (known as TestCase in TestFramework.py). A test suite is a collection of tests that run with some sort of common input - be that stdin contents, a virtual file system configuration (presently disabled), inputs to a particular function or the contents of one or more variables. A test suite will typically run until the first test case fails, but can be configured to continue running test cases even after one has failed. Exceptions raised by submitted code will typically cause the test to fail unless the exception is marked as an “allowed exception”.

Individual units to be tested (something that can pass or fail) are contained within TestCase objects (known as TestCaseParts in TestFramework.py). A test case can test the value of source code text, the function return value (which will be None for scripts), stdout contents, stderr contents, name of any raised exception and contents of the virtual file system (presently disabled) of code submitted by users. These checks are contained in a TestCasePart. In addition, a normalisation function or custom comparison function can be used instead of comparing the raw values directly. By default, the value of each check will be ignored unless overridden by a test case part.
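The two behaviours described above, normalised comparison and stopping at the first failure, can be illustrated with a few lines of Python. This is a simplified sketch with hypothetical names and a flat data layout; it does not reproduce the classes in TestFramework.py.

```python
def check(expected, actual, normalise=None):
    """Compare two values, optionally normalising both first."""
    if normalise is not None:
        expected, actual = normalise(expected), normalise(actual)
    return expected == actual


def run_suite(cases, stop_on_failure=True):
    """Run (name, expected, actual, normalise) cases in order."""
    results = []
    for name, expected, actual, normalise in cases:
        passed = check(expected, actual, normalise)
        results.append((name, passed))
        if not passed and stop_on_failure:
            break  # default behaviour: stop at the first failing case
    return results


def ignore_case_and_space(text):
    return text.strip().lower()


# The same stdout compared exactly, then with a normalisation function.
cases = [
    ("exact stdout", "Hello\n", "hello \n", None),
    ("normalised stdout", "Hello\n", "hello \n", ignore_case_and_space),
]
print(run_suite(cases))
print(run_suite(cases, stop_on_failure=False))
```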

Database

Object Publishing

URLs are resolved with a small IVLE-specific object publishing framework – that is, resolution is implemented as traversal through an object graph. The framework lives in ivle.webapp.publisher, and has an extensive test suite.

This object graph is constructed by the dispatcher. Any plugin class deriving from ViewPlugin will be searched for forward_routes, reverse_routes and views sequences. Everything is class-based – an object’s routes and views are determined by its class.

Forward routes handle resolution of URLs to objects. Given a source object and some path segments, the route must calculate the next object. A forward route is a tuple of (source class, intermediate path segments, function, number of subsequent path segments to consume), or simply a reference to a decorated function (see ivle.webapp.admin.publishing for decoration examples). The function must return the next object in the path.

A reverse route handles URL generation for an object. Given just an object, it must return a tuple of (previous object, intermediate path segments). This creates a chain of objects and path segments until the root is reached. Due to IVLE’s lack of a utility framework, reverse routes at the root of the URL space need to refer to the root object with the magical ivle.webapp.publisher.ROOT.

Views are registered with a tuple of (source class, intermediate path segments, view class).

In all of the above, “intermediate path segments” can either be a single segment string, or a sequence of multiple strings representing multiple segments.
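To make the traversal concrete, here is a small self-contained sketch of forward and reverse routes. It mimics the tuple shapes described above but is not the ivle.webapp.publisher implementation; the Subject and Offering classes, the route functions and the local ROOT object are all hypothetical stand-ins.

```python
class Root:
    pass


ROOT = Root()  # stand-in for ivle.webapp.publisher.ROOT


class Subject:
    def __init__(self, code):
        self.code = code


class Offering:
    def __init__(self, subject, year, semester):
        self.subject, self.year, self.semester = subject, year, semester


def as_segments(inter):
    """Accept a single segment string or a sequence of segments."""
    return (inter,) if isinstance(inter, str) else tuple(inter)


# Forward routes: (source class, intermediate segments, function, argc).
FORWARD_ROUTES = [
    (Root, "subjects", lambda root, code: Subject(code), 1),
    (Subject, (), lambda subj, year, sem: Offering(subj, year, sem), 2),
]

# Reverse routes: object -> (previous object, intermediate segments).
REVERSE_ROUTES = {
    Subject: lambda s: (ROOT, ("subjects", s.code)),
    Offering: lambda o: (o.subject, (o.year, o.semester)),
}


def resolve(path):
    """Walk the object graph from ROOT, consuming path segments."""
    obj, segments = ROOT, [s for s in path.split("/") if s]
    while segments:
        for source, inter, func, argc in FORWARD_ROUTES:
            inter = as_segments(inter)
            n = len(inter)
            if (isinstance(obj, source) and tuple(segments[:n]) == inter
                    and len(segments) >= n + argc):
                obj = func(obj, *segments[n:n + argc])
                segments = segments[n + argc:]
                break
        else:
            raise LookupError("no route for %r" % (segments,))
    return obj


def generate(obj):
    """Chain reverse routes back to ROOT to build a URL path."""
    segments = []
    while obj is not ROOT:
        obj, inter = REVERSE_ROUTES[type(obj)](obj)
        segments = list(as_segments(inter)) + segments
    return "/" + "/".join(segments)


offering = resolve("/subjects/info1/2009/1")
print(generate(offering))
```

Resolving a path and then generating a URL from the result gives back the original path, which is the property that makes breadcrumbs straightforward.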

Note

While many applications prefer a pattern matching mechanism, this did not work out well for IVLE. Our deep URL structure and multitude of nested objects with lots of views meant that match patterns had to be repeated tediously, and views required many lines of code to turn a match into a context object. It also made URL generation very difficult.

The simple object publishing framework allows views to be registered with just one line of code, getting their context object for free. URL generation now comes at a cost of approximately one line of code per class, and breadcrumbs are easy too. The reduced code duplication also improves robustness.