Department of Electrical and Computer Engineering


This National Science Foundation (NSF) funded project aims to build a model of a high-quality integrated development environment in which hypermedia technology provides improved tools for managing the full range of documents produced during the software life cycle. The project's aim is to help software developers better maintain conformance among these many documents as they, and the software they describe, change over time. The project focuses on the following research issues:


The many documents produced by the software development process can be broadly divided into two categories: formal and informal. Formal documents include program source code and formal specifications. Their common characteristic is that their syntactic and semantic structure can be determined by analysis of a text stream (with the obvious exception of visual programs). Formal documents are written using ASCII text editors or specialized environments. Even the advanced environments (for example, Ada-Assured) are restricted to text, albeit with font and color variations, and do not support documentation in other media or connections with other software documents. The limitation to textual documentation can prevent programmers from expressing important ideas about their code that are better expressed in other media. The fact that programmers cannot link their source code to its supporting documents is just as serious a limitation, since the code is often a direct expression of ideas in those other documents.

All other software documents are informal. Any syntactic or semantic structure they have is either specified directly by the user, obtained from a shared template or form, or is implicit in the natural language content of the document. Examples include requirements documents, design documents, testing and bug reports, and user documentation. Informal documents are commonly produced using commercial office software suites, such as Microsoft Office.
MS Office provides extensive interoperability between the different types of documents it supports: any document can include active fragments of other documents. Furthermore, MS Office documents can import a wide variety of multimedia objects. However, these inclusions are not marked with information about their semantics.

In practice, formal and informal documents do not interoperate well. The central problem is that, in formal documents, the text stream is used both for analysis and for presentation. The lexical analysis phase of program analysis requires that the text stream adhere to the language specification, which allows only textual comments. Thus, it is not possible to embed objects composed of arbitrary byte streams (such as compressed images) inside program source code.
It is possible to conceive of program editors that search for special comments pointing to pieces of non-text documentation held externally, but there are no examples of such an editor. Such an editor would require a special formatting model to correctly display the non-text documentation.
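
To make the idea of special comments concrete, here is a minimal sketch of the scanning step such an editor might perform. The `@doc:` marker syntax, the `DocLink` type, and the `find_doc_links` function are all hypothetical names chosen for illustration; the project description does not define a marker format. The sketch only locates the pointers and does not attempt to display the non-text documentation.

```python
import re
from typing import NamedTuple

# Hypothetical marker syntax (an assumption), e.g.:
#   # @doc: diagrams/state-machine.png
DOC_LINK = re.compile(r"#\s*@doc:\s*(?P<target>\S+)")


class DocLink(NamedTuple):
    line: int    # 1-based line number of the special comment
    target: str  # path or URL of the externally held documentation


def find_doc_links(source: str) -> list[DocLink]:
    """Return every external-documentation pointer found in the source text."""
    links = []
    for number, text in enumerate(source.splitlines(), start=1):
        match = DOC_LINK.search(text)
        if match:
            links.append(DocLink(number, match.group("target")))
    return links


example = """\
def step(state):
    # @doc: diagrams/state-machine.png
    return state + 1
"""
print(find_doc_links(example))
```

Because this sketch matches on raw text rather than on tokens, it would also match a marker that happens to appear inside a string literal. A real editor would presumably work from the lexer's comment tokens instead.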


Relationships between ideas are critical to the process of software development. The life cycle of a software system produces a tremendous variety of documents — requirements specifications, design documents of many types, program source code, testing and bug reports, and user documentation are examples. These documents embody a great number of ideas which are connected by a complex network of relationships.

There are many types of relationships between software development documents. Without claiming to present a complete taxonomy, these are some examples:

  • The requirements motivate the design.
  • The design requires the implementation.
  • A test report evaluates the implementation.
  • A bug report complains about a mismatch between the requirements and the implementation.
  • A change to the implementation responds to a bug report.
  • The user manual documents the design and implementation.
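
One way to record relationships like these explicitly is as typed, directed links between named documents. The following is a minimal sketch under that assumption; the `Relationship` class, its field names, and the document names are illustrative and are not taken from the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    source: str  # document the relationship starts from
    kind: str    # verb describing the relationship
    target: str  # document the relationship points to


# The example relationships listed above, written as typed links.
relationships = [
    Relationship("requirements", "motivates", "design"),
    Relationship("design", "requires", "implementation"),
    Relationship("test report", "evaluates", "implementation"),
    Relationship("bug report", "complains about", "requirements"),
    Relationship("bug report", "complains about", "implementation"),
    Relationship("change", "responds to", "bug report"),
    Relationship("user manual", "documents", "design"),
    Relationship("user manual", "documents", "implementation"),
]


def related_to(document: str) -> list[Relationship]:
    """All relationships in which the document takes part, in either role."""
    return [r for r in relationships if document in (r.source, r.target)]


for r in related_to("implementation"):
    print(f"{r.source} --{r.kind}--> {r.target}")
```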

In general, these relationships are persistent, lasting days, weeks, or years, but they are not necessarily permanent. Because the documents in a system are dynamic and can be created, altered, and removed, the set of active relationships in a system is also likely to change over time.

Let us consider an imaginary software system whose documents are in perfect harmony with each other. We might say that its documents are conformant, because they conform to each other. If we then alter a requirement, such as the number of users to be supported, but make no other change, it becomes possible that the system does not meet its requirements. We might then say that the system’s documents are non-conformant, because the system’s design does not conform to its requirements. Barring major advances in natural language processing research, completely automatic testing for conformance between software documents will not be possible for some time. However, if the relationships between software documents were explicitly recorded, it might be possible to automate detection of possible non-conformance. Such automated detection could be used to guide developers to potential problems. We use the term conformance analysis to refer to the process of determining whether software documents and their logical relationships are in agreement.
It is important to note that similar relationships exist among source code and specification documents. Programming languages have commands like “include” or “require” that describe a dependency between pairs of files. The difference is that these relationships between formal documents can be found, without any ambiguity, by automated analysis. In fact, relationships like these are an important information source for re-engineering tools.
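
As a rough illustration of how explicitly recorded relationships could guide developers to potential problems, the sketch below flags every document that depends, directly or indirectly, on a changed document. The `depends_on` table and the `possibly_nonconformant` function are assumptions made for this example. The result is only a list of candidates to review, not a verdict that those documents are actually non-conformant.

```python
from collections import deque

# Hypothetical recorded relationships: each key depends on the documents listed.
depends_on = {
    "design": ["requirements"],
    "implementation": ["design"],
    "test report": ["implementation"],
    "user manual": ["design", "implementation"],
}


def possibly_nonconformant(changed: str) -> set[str]:
    """Documents that may no longer conform after `changed` was edited."""
    # Invert the edges so we can walk from a document to its dependents.
    dependents: dict[str, list[str]] = {}
    for doc, sources in depends_on.items():
        for src in sources:
            dependents.setdefault(src, []).append(doc)

    # Breadth-first walk over everything reachable from the changed document.
    flagged: set[str] = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for doc in dependents.get(current, []):
            if doc not in flagged:
                flagged.add(doc)
                queue.append(doc)
    return flagged


print(sorted(possibly_nonconformant("requirements")))
```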

Each of the above document relationships carries with it an implied logical ordering of its documents. For example, testing and bug reports cannot be produced until an implementation is available, and while it is not necessarily the case that requirement documents will be written before designs, there is certainly a logical relationship between them that makes design depend on requirements. Ordered relationships like these have been used for many years to automate efficient compilation. However, these techniques have yet to be applied to informal software documents.
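
The compilation techniques referred to here are in the style of make: process items in dependency order and treat anything older than its prerequisites as out of date. The sketch below applies that idea to informal documents, purely as an illustration. The dependency table and the integer timestamps are invented for the example, and a "stale" document is merely one that should be reviewed.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical data: each document maps to the documents it logically depends on.
depends_on = {
    "design": {"requirements"},
    "implementation": {"design"},
    "test report": {"implementation"},
}

# Hypothetical modification times (larger means more recent).
last_modified = {
    "requirements": 5,
    "design": 3,
    "implementation": 4,
    "test report": 6,
}


def stale_documents() -> list[str]:
    """Documents older than a prerequisite, or downstream of a stale one."""
    stale: list[str] = []
    # static_order() yields prerequisites before the documents that need them.
    for doc in TopologicalSorter(depends_on).static_order():
        for src in depends_on.get(doc, ()):
            if last_modified[src] > last_modified[doc] or src in stale:
                stale.append(doc)
                break
    return stale


print(stale_documents())
```

Here the requirements were edited after the design, so the design is flagged, and everything downstream of the design is flagged with it.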


Relationship management must be done in the context of a large software system with hundreds of artifacts and thousands of explicit and implicit relationships that all evolve over time. Documents under development or maintenance are changed and updated to produce the next revision in their evolutionary process, so the set of active logical relationships can also change over time. For a particular task, developers may need to record the history of some subset of software documents and their relationships. They may want to navigate, manage, query, or access the information in prior states of that subset. It is therefore necessary to record the history of changes to both documents and their relationships. Although there are several approaches addressing the relationship management problem, existing software engineering tools do not provide sufficiently powerful functionality to record changes to documents and their logical relationships at the same time in a cohesive way.
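
To illustrate what recording documents and relationships together could look like, the sketch below stores both in a single snapshot per version, so that a prior state of either can be queried consistently. The `Snapshot` and `History` classes and the sample data are hypothetical. Whole-state snapshots are used only for simplicity, and a practical tool would likely store deltas instead.

```python
from dataclasses import dataclass, field


@dataclass
class Snapshot:
    documents: dict[str, str]                        # document name -> content
    relationships: frozenset[tuple[str, str, str]]   # (source, kind, target)


@dataclass
class History:
    snapshots: list[Snapshot] = field(default_factory=list)

    def commit(self, documents, relationships) -> int:
        """Record documents and relationships as one version; return its index."""
        self.snapshots.append(Snapshot(dict(documents), frozenset(relationships)))
        return len(self.snapshots) - 1

    def relationships_at(self, version: int) -> frozenset:
        return self.snapshots[version].relationships


history = History()
v0 = history.commit(
    {"requirements": "100 users", "design": "single server"},
    {("requirements", "motivates", "design")},
)
v1 = history.commit(
    {"requirements": "10000 users", "design": "single server", "bug report": "too slow"},
    {("requirements", "motivates", "design"),
     ("bug report", "complains about", "design")},
)

# Relationships that became active between the two versions.
added = history.relationships_at(v1) - history.relationships_at(v0)
print(added)
```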


These source code documents will require the development of a novel formatting model that properly integrates the formal and informal material. The central problem is that automatic pretty-printers operate not on lines of text but on an abstract syntax tree. In this new representation, however, the leaves of the abstract syntax tree will be intermixed with some other tree-based representation for the informal material. No existing formatter or editor has to coordinate between two tree representations, so a new formatting model will be required.
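
The following toy sketch shows the shape of the problem: a single traversal that must handle both syntax-tree nodes and nodes for informal material. The `CodeNode` and `MediaNode` classes and the `render` function are assumptions made for illustration and are not the project's formatting model. In particular, the informal node is rendered as a text placeholder, whereas a real formatter would have to lay out the media itself.

```python
from dataclasses import dataclass, field


@dataclass
class CodeNode:
    text: str                                      # source text for this node
    children: list = field(default_factory=list)   # CodeNode or MediaNode items


@dataclass
class MediaNode:
    kind: str    # e.g. "image" or "audio"
    target: str  # where the informal material is stored


def render(node, depth: int = 0) -> list[str]:
    """Pretty-print a mixed tree, indenting children one level per depth."""
    indent = "    " * depth
    if isinstance(node, MediaNode):
        # Placeholder only: stands in for real layout of non-text material.
        return [f"{indent}[{node.kind}: {node.target}]"]
    lines = [indent + node.text]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


tree = CodeNode("def step(state):", [
    MediaNode("image", "diagrams/state-machine.png"),
    CodeNode("return state + 1"),
])
print("\n".join(render(tree)))
```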