What is it all about

Computational biology presents many intriguing problems which have not been confronted in other areas of computational science, ranging from technical to conceptual, philosophical and sociological. The directions taken in Catacomb development reflect one view of how some of these issues can be addressed.

The problems

Experience in other areas of computational science often leads to the impression that straightening out computational biology is only a matter of finding enough people with the right technical skills. This could hardly be further from the truth - technical skills are of course necessary, but it is far more important to work out what to do. The following is a partial and subjective list of key problems.

What to do about them

Catacomb development is motivated by ideas about how these problems might be solved. The design summary describes what Catacomb actually does; what follows is what it would like to do.

Data publishing

The problem here is that you can't get a database design right until you have a representatively large quantity of data to work on, and you can't get the data until you have a database. But in fact, it is not clear that you need to design a database at all. The concept comes from computer science, not biology, and is designed to make the software engineering easy. A librarian would be horrified by a cataloging system that involved putting returned books in the first available space and using cameras everywhere, together with a character recognition system, to work out where each book was at any given time so that it could be fetched as needed. But, depending on the relative cost of filing staff versus computers, it may yet prove to be the most cost-effective approach. Likewise, it may be sufficient just to get biological data documented and on-line rather than forcing it into some predefined set of formats. The rest would be the work of computers and some clever software.

Catacomb proposes an anarchic approach to data documentation where anyone can design a data format (what keywords there are, what their acceptable values are), anyone can use whatever data format they like for documenting a particular chunk of data, and anyone can extend an existing format to meet their needs. But in all these cases, the fruit of their labor gets put on their own website, and one or more servers are notified of its existence. Anyone can then run a web spider to catalog, index and publicise whatever is available.
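As a rough illustration of the idea of a format as "keywords plus acceptable values" that anyone can extend, here is a minimal Python sketch. It is not Catacomb's actual scheme: the class DataFormat, its methods, and the example keywords are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class DataFormat:
    """Hypothetical data format: each keyword maps to its acceptable values.

    An empty set means that any value is accepted for that keyword.
    """
    name: str
    keywords: dict = field(default_factory=dict)

    def extend(self, name, extra):
        """Return a new format that adds to (or overrides) this one's keywords."""
        merged = {key: set(values) for key, values in self.keywords.items()}
        merged.update({key: set(values) for key, values in extra.items()})
        return DataFormat(name, merged)

    def problems(self, record):
        """List the ways a record (keyword -> value) fails to fit this format."""
        issues = []
        for key, allowed in self.keywords.items():
            if key not in record:
                issues.append(f"missing keyword: {key}")
            elif allowed and record[key] not in allowed:
                issues.append(f"bad value for {key}: {record[key]}")
        return issues


# One person designs a format (illustrative keywords only)...
recording = DataFormat(
    "recording",
    {"species": set(), "preparation": {"slice", "culture"}},
)
# ...and someone else extends it to meet their own needs.
patch_recording = recording.extend(
    "patch_recording",
    {"mode": {"voltage_clamp", "current_clamp"}},
)

record = {"species": "rat", "preparation": "slice", "mode": "voltage_clamp"}
print(patch_recording.problems(record))
```

Because extend returns a new format rather than modifying the original, the base format and its extension can live on different websites without interfering with each other, which is the property the anarchic approach relies on.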

Model description

This is very close to the data publishing problem. Indeed, maybe the two are really the same problem: models are smallish but complete sets of data about a hypothetical system. Catacomb provides a similar (GUI-based) scheme for anyone to construct a model format, including how data types should be presented (log or linear scales, maximum and minimum values, etc.) and how the model is split into distinct components. Without a calculation behind them, such model formats may seem futile, but they can work very well as a glorified jotting pad or flow diagram, helping the user to see structural problems and come up with a more coherent format. Once they exist, such formats can act as juicy bait for software engineers who can't or do not want to learn too much biology. Ideally, they provide well-posed mathematical and software problems distinct from the biology on which they are based.
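To make "how data types should be presented" concrete, the following sketch shows one way a model format might record a parameter's range and scale, and how a GUI control could use that to turn a slider position into a value. The ParameterSpec class and the example parameters are hypothetical and are not taken from Catacomb itself.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ParameterSpec:
    """Hypothetical presentation hints for one quantity in a model format."""
    name: str
    minimum: float
    maximum: float
    scale: str = "linear"  # either "linear" or "log"

    def value_at(self, fraction):
        """Map a slider position in [0, 1] onto [minimum, maximum]."""
        if not 0.0 <= fraction <= 1.0:
            raise ValueError("fraction must lie in [0, 1]")
        if self.scale == "log":
            # Interpolate in log space; requires minimum > 0.
            low, high = math.log10(self.minimum), math.log10(self.maximum)
            return 10.0 ** (low + fraction * (high - low))
        return self.minimum + fraction * (self.maximum - self.minimum)


# Illustrative parameters: a rate spanning four decades, and a potential.
rate = ParameterSpec("rate", 0.01, 100.0, scale="log")
potential = ParameterSpec("potential", -100.0, 50.0)

print(rate.value_at(0.5), potential.value_at(0.5))
```

The point of keeping such hints in the format, rather than in the program, is that any tool reading the format can present the model sensibly without knowing anything about the underlying biology.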

Software development

Model description is the key to developing software which is not immediately irrelevant. The greatest problem in biological modeling is in committing oneself to a path through parameter space which is almost certainly biologically irrelevant. Awareness of such paths can lead to a sort of paralysis before the development of more complex models, because all the available openings seem to commit the developer to a path which is both hard to escape from and irrelevant. Part of the solution is to separate the description of the model entirely from its implementation. Once the form of the description has been decided, one can implement software to compute the behavior of any model consistent with the given form. It is likely to be harder than implementing a particular model, but it is also more likely to be useful.
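The separation of description from implementation can be sketched in a few lines of Python. Here the "form of the description" is assumed, purely for illustration, to be a first-order kinetic scheme given as plain data (initial occupancies plus a list of transitions with rate constants), and the function simulate is a hypothetical generic routine that works for any description of that form. Forward Euler is used only because it is short; it is not a claim about the numerical methods Catacomb uses.

```python
def simulate(description, dt, steps):
    """Forward-Euler integration of any first-order kinetic scheme.

    description is plain data of the assumed form:
      {"initial": {state: occupancy, ...},
       "transitions": [(source, destination, rate), ...]}
    Returns the occupancies after `steps` steps of size `dt`.
    """
    occupancy = dict(description["initial"])
    for _ in range(steps):
        change = {state: 0.0 for state in occupancy}
        for source, destination, rate in description["transitions"]:
            moved = rate * occupancy[source] * dt
            change[source] -= moved
            change[destination] += moved
        for state in occupancy:
            occupancy[state] += change[state]
    return occupancy


# One particular model: a two-state scheme with made-up rate constants.
two_state = {
    "initial": {"closed": 1.0, "open": 0.0},
    "transitions": [("closed", "open", 2.0), ("open", "closed", 1.0)],
}

result = simulate(two_state, dt=0.01, steps=2000)
print(result)
```

The same simulate function would run a three-state or ten-state scheme unchanged, because nothing about the particular model is written into the code. That is the sense in which implementing the general form is harder than implementing one model but more likely to be useful.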

In its early days, Catacomb focused on small models that run quickly. The numerical code was mixed with the model description objects and executed whenever a quantity was changed, allowing the user to explore the behavior of a model interactively. This is being replaced by a more "design-heavy" system which splits off calculations into semi-autonomous components that can be used not only by Catacomb but also from other systems through a small set of programming interfaces. See the design summary for details.

Biological Modeling

Ultimately, it is all about using computers to help understand biology - developing intuition, testing hypotheses, interpreting experimental results, condensing knowledge into predictive models and deducing integrative properties. Very often, modeling is presented as cookery. This is fine for exploring a parameter space, or for making a working system to prove that an idea is not absurd, and Catacomb tries to make such cookery easy. But the real gains are to be had from rigorous modeling, where the testing set is not the same as the training set and, best of all, where the training (usually parameter tweaking to get a desired result) is replaced by the use of real data, complete with error estimates, tolerances, and reliability assessments. Hence the emphasis above on ways of making such data available in machine-readable form.