Electronic Submission, Managing and Approval of Grant Proposals at the German Research Foundation based on Standard Internet and Office Tools TUBSCG-1999-02

Today, grant proposals submitted to the German Research Foundation (DFG) are paper documents. They are received by ordinary mail and manually entered into a proprietary software system; finally, the information relevant to each specific task is extracted manually and sent to the other departments involved in the reviewing/approval process. All of these activities are purely paper based. This paper gives a first report on the research project "GoldenGate", which focuses on the development of a prototype system for a complete electronic workflow including submission, managing and approval of applications for research funding at the DFG. Typically one would use one of the available Information/Workflow Management Systems, but after careful consideration we decided to use a set of standard software tools and formats (i.e. Hyperwave Information Server, MS Office'97, XML) as the key components of our new system and to combine them with minimal but flexible interfaces. These ideas, the situation at the DFG, technical details of our present implementation and preliminary results are presented later in this paper.


Introduction
The German Research Foundation (Deutsche Forschungsgemeinschaft, DFG) is the largest noncommercial provider of research funding in Germany and supports several thousand projects from every field of scientific research and education with an annual volume of about US$ 1.2 billion.
Up to now, the DFG has interacted with the outside world only through paper documents. Internally, electronic support is provided by an integrated, proprietary database system (called "All-in-One") that can only handle ASCII documents. This system is accessed with VT220 terminal emulation software running on modern PCs with MS Windows NT, which today are available in every office.
The DFG has acknowledged the need for a new, modern system that can handle multimedia documents, is open to external users (especially reviewers) and is flexible enough to be used by several departments. It also has to provide a modern GUI, must run on today's predominant office architecture (MS Windows NT) and should incorporate the ongoing developments in the fields of workflow and document management.
This situation led to our research project "GoldenGate", presented in detail in Section 3. One main issue of this project is the new approach of using standard software to model information and workflow management (see Sec. 2). Section 4 gives some technical details of the current prototype implementation (using Hyperwave Information Server and MS Office'97). Finally, in Section 5 we describe some first experiences and outline the next level of functionality.

Information Management with Standard Software
One possible approach (and the one typically used) to replace paper documents by their electronic counterparts and to support conventional office processes with a computer system is to use one of the commercial Information Management Systems (or Workflow/Document Management Systems; one is most often part of the other) available today. The core functionality of these software packages can be described as access, authentication and workflow.
These three features are the most important ones (besides storage) that Information Management Systems have to implement, and they are the ones implemented first, to a greater or lesser extent. One cannot expect to convert traditional work processes into their digital counterparts without them. Because people are used to them, they will recognize problems in these areas before getting to know or starting to appreciate the positive aspects of electronic workflows. Of course, these systems also include many of the advantages that an electronic environment can offer: the ability for several people at several places to access the same information simultaneously, the possibility to automatically build indexes or to compute results, queries (perhaps even full-text), and so on. The point we are trying to make here is: any system that implements these three important features can be used as an Information (or Workflow) Management System, even if this is not obvious or commonly agreed on by others.
Today's standard software (meaning applications/systems that are widely available and widely used, especially in office environments) is powerful enough to implement the core features of an Information Management System. A system based on these products has the advantage that their features and their flexibility can be exploited. Even their file formats can be used, so there is, for example, no need to convert already existing electronic documents. Another important point is that these products are already in use in most offices, which means that people know how to work with them. So the reuse of already existing software is, especially in this case, a matter of cost reduction and (often more important) a reduction of time.
We have to mention two special software packages here, as these are used as basic components in our project "GoldenGate" (see Secs. 3 and 4): Microsoft Office'97 and Hyperwave Information Server.
The applications of MS Office'97 do not need a detailed discussion, so we only want to stress two important points here. First, all major components (i.e. Excel, Word, PowerPoint) are fully programmable using Visual Basic for Applications (VBA) and, second, they are able to act as clients and servers for other software components with the help of (D)COM [10], the Microsoft (Distributed) Component Object Model. These abilities are exploited in our prototype implementation (see Sec. 4).
Hyperwave Information Server (formerly known as Hyper-G) [8,9] can be described as a second generation WWW server, as it is a combination of a database and a web server. It is based on modern concepts of database design and information retrieval as well as multimedia storage and Internet access. Some of its features (which are important for the ideas described in this paper) are:
The integrated database stores any kind of (multimedia) data object in one or several hierarchies of objects and collections of objects (which can contain collections themselves).
In any case, the data is stored only once; multiple instances are modeled by reference.
These references or "links" are again database objects and they are secure, meaning that moving or renaming an object will not invalidate the associated hyperlinks.
Any object can be described by arbitrary metadata ("attributes"). Indexes can be built over these attributes to execute boolean or ranked queries.
Every database object is access controlled with read, write and delete rights for single users, groups or everyone, similar to UNIX file-system rights.
All these features can be used and even administered via a conventional WWW browser from anywhere in the world (given the proper access rights).
Additionally, the full functionality can be used from other programs with the help of a TCP/IP based protocol (Hyper-G Client-Server-Protocol, HG-CSP [4]).
An integrated full-text engine from Verity Inc. allows full-text queries not only on "readable" data, but also on popular formats like Adobe Portable Document Format (PDF) [2] or PostScript [1] and several formats of Microsoft Word.
Summing up, Hyperwave is a good solution for the central component of an Information and Workflow Management System based on standard software, and we therefore use it in that role.
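The storage model listed above (objects stored once, collections that only reference them, links that survive renaming, and queries over metadata attributes) can be illustrated with a small in-memory sketch. This is not Hyperwave's API: the class `ToyStore`, its method names and the attribute names `applicant` and `area` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DBObject:
    """A stored object: payload plus arbitrary metadata attributes."""
    object_id: int
    name: str
    payload: bytes = b""
    attributes: dict = field(default_factory=dict)


class ToyStore:
    """Hypothetical stand-in for the database; NOT Hyperwave's API."""

    def __init__(self):
        self._objects = {}      # object_id -> DBObject (stored exactly once)
        self._collections = {}  # collection name -> list of object ids
        self._next_id = 1

    def insert(self, name, payload=b"", **attributes):
        """Store a new object once and return its id."""
        obj = DBObject(self._next_id, name, payload, dict(attributes))
        self._objects[obj.object_id] = obj
        self._next_id += 1
        return obj.object_id

    def link(self, collection, object_id):
        """Reference an existing object from a collection; no data is copied."""
        members = self._collections.setdefault(collection, [])
        if object_id not in members:
            members.append(object_id)

    def rename(self, object_id, new_name):
        """Links hold ids rather than names, so renaming cannot break them."""
        self._objects[object_id].name = new_name

    def members(self, collection):
        """Resolve the references of a collection to the stored objects."""
        return [self._objects[i] for i in self._collections.get(collection, [])]

    def query(self, **criteria):
        """Boolean AND query over metadata attributes."""
        return [o for o in self._objects.values()
                if all(o.attributes.get(k) == v for k, v in criteria.items())]
```

With such a model, one proposal can appear in an applicant hierarchy and in a research-area hierarchy at the same time, and both entries resolve to the same stored object.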

Research Project "GoldenGate"
GoldenGate [3] is a cooperative research project of the German Research Foundation and the Digital Library Lab of the Computer Graphics Group at the TU Braunschweig. The goal of this project is the design and prototype development of a complete electronic workflow at the DFG. It allows researchers to submit proposals in electronic form and keeps them stored and managed (e.g. made available to reviewers) in a database until the final approval and, beyond that, over the full life cycle of each project, including knowledge engineering related tasks executed well past the active period of individual research projects.
Requirements for the new system are, besides a modern GUI and the use of modern paradigms from the fields of Workflow and Document Management, the ability to handle multimedia documents and the efficient and cost-effective customization for different departments (with different structures and needs) of the DFG. Another important feature is the controlled access to some of the material from the "outside", be it other departments or external reviewers from any location, without special hardware or software. Finally, existing structures, processes and data have to be integrated, and the platform is prescribed as MS Windows NT running on PCs.
One problem is that the current work processes in the individual DFG departments are not formally defined in a complete and consistent way. As we could not find one single person who was able to define all rules and formalisms, we had to develop our system using an iterative approach. On the other hand, the advantage of this situation was that we could freely choose an architecture that is powerful as well as flexible enough to be easily adapted to any new requirement in each iteration step (see an example for one Use-case [12] in Figure 4).
Following the ideas of Section 2 and knowing that the requirement of world-wide access can only be fulfilled with Internet technologies, the GoldenGate system consists of a Hyperwave Information Server as a central (though distributed) database and access point, together with MS Office applications and ordinary WWW browsers (e.g. MS Internet Explorer, Netscape Navigator, ...) used as access and administration tools.
The final system will allow prospective applicants to download (via a Web browser) a template for their favorite word processor (LaTeX, Adobe FrameMaker, Microsoft Word, ...) which includes support and help (e.g. by "wizards" and selection lists) when creating the application. The resulting (electronic) document is uploaded (again, via a Web browser) to the GoldenGate server at the DFG, where it is converted into an internal format (i.e. XML [13]) and stored in a collection at a special location of the database hierarchy.
Upon arrival of the document at the DFG's server, an officer in charge is informed that a new grant application has arrived. From this application, a "dynamic view" (a Microsoft Excel sheet) is generated automatically and also stored in the database. It is referenced from other locations within the hierarchy, e.g. according to one or more area-specific thesauri, and it is supplemented by metadata that was automatically extracted from the application. These dynamic documents can then be checked, manipulated or passed on with Excel'97. Many features of Excel like pop-up menus, forms or COM support the user in the subsequent handling of that grant proposal. For example, the officer can access the address database (consistently maintained with the same Hyperwave server) or choose standard salary amounts from various lists. During the whole process, so-called "static views" (currently Adobe PDF documents, but pure text or anything else is possible) can be generated at any time to freeze a special state or to act as read-only documents for other departments or reviewers. All the time, the document remains stored in the database hierarchy of Hyperwave. "Free" browsing or non-standard queries to the database, i.e. access to the DB which is not part of a dedicated module directly accessible through the user interface, can be done with each user's favorite Web browser. For example, looking at all proposals by applicant X or at all accepted applications in research field Y is possible (given the proper access rights, of course). When someone clicks on a dynamic view to work with it, the document is checked out by assigning read-write access exclusively to this user until it is checked in again. Hyperwave's programmable templates, its internal locking mechanisms and CGI scripts are used to implement this behavior.
A prototype of the system described above is already running. Technical details of the current implementation are given in the next section.

Technical Details
The main components of GoldenGate are a Hyperwave Information Server, MS Excel'97 and MS Word'97 (see Sec. 2). They are combined with and accompanied by several tools and interfaces, programmed in the scripting language Python [11]. Additionally, the technologies HTTP, TCP/IP and COM are used. Figure 1 gives an overview of the current package.
As Figure 1 shows, everything is centered around a Hyperwave server holding the GoldenGate database. The hierarchical organization of this DB is schematically displayed in Figure 2. Using Hyperwave's features (see Sec. 2), other "virtual" hierarchies are added according to the demands of the individual DFG departments, e.g. an alphabetical hierarchy of persons or a hierarchy for several research programs. As Hyperwave offers reference links to objects and collections, no data entity has to be duplicated (see, for example, the dashed arrows in Figure 2). The system is initialized with an address database and all documents relevant for currently active research grants. Both are extracted from the current system ("All-in-One"), parsed and inserted by Python scripts, using Hyperwave's native Client-Server protocol (HG-CSP). Following the concept of modularization, we implemented this task by creating an API for the HG-CSP as a C++ extension to Python that is embedded via dynamic runtime linking (a DLL).
This API is also used by other Python tools like the address dialog (see Fig. 3) that acts as a graphical frontend to search, edit or update the address data. Other examples are several hierarchy browsers that are used by MS Excel and MS Word to directly access objects on the server. These tools are implemented with Mark Hammond's Win32 extensions to Python [6], which, for example, allow the use of Windows/Microsoft GUI elements and system calls.
A visualization of database structures and objects is provided by Hyperwave itself. Thus, one can simply use any WWW browser to access database objects and information (see Fig. 3). We modified the Hyperwave templates responsible for this visualization (using a combination of JavaScript [7] and PLACE [5], Hyperwave's own template language) to suit our needs. For example, the Web page in Figure 3 shows how metadata is extracted from database objects and displayed when the user looks at the contents of a collection.

Figure 3: Some screen shots showing components of GoldenGate at work: on the top left a view with a browser on a collection holding a grant proposal; below a dynamic view with MS Excel of the same "document". On the top right you see the MS Word document with the grant proposal and below a (Python-) tool to directly access the address database.
The so-called "TCP/IP-COM Interface" plays an important role in our system. The Win32 extensions allow a Python script to act as a COM client, i.e. a software component that uses functionality (methods, data) which other components (libraries, processes, applications) offer via the COM protocol. These other components are called COM servers, and they can also be implemented easily with Python. One Python COM server is registered for the address dialog (so MS Excel can use it easily), and several other servers form the "TCP/IP-COM Interface" mentioned above. This interface offers methods, objects and GUI dialogs to access Hyperwave. In effect, COM invocations are translated into HG-CSP requests. This setting allows MS Word and MS Excel to access, load, store and even organize documents and collections in the Hyperwave database.
After a new grant proposal has been received and converted (see below), the user creates a new Excel sheet from a special template. Besides including an empty default sheet and initial structures, this template activates buttons and menu entries that can initiate the execution of several VBA (Visual Basic for Applications) modules. Upon execution of the "import" module, the user decides which new grant proposal to read. The corresponding XML document is parsed and the newly created sheet is filled out automatically. The user only needs to assign a new grant ID (used by the DFG as an internal reference), check the plausibility and store this new "dynamic document", as we call it, in the database (Figure 3 contains examples of an application and the resulting Excel sheet). The "storage" module additionally builds a collection for the whole new process, references it from all applicable places and creates (with the help of Adobe PDFWriter) a "static document" in PDF, which acts as a snapshot of the application's state. This last step is repeated every time the dynamic document is saved.
Other VBA modules call Python tools via COM to access the integrated address database (see above), offer support while editing, implement a simple document history and take care of the locking of documents. The current locking mechanism acts as a proof of concept of how revision control and workflow management can be implemented with the features of Hyperwave, i.e. shifting of access rights and document write-locks. While one user works with a dynamic document, it is locked in the database and an attribute marks it as "in use by X". This is also visible in the WWW browser. Because a document can be opened from Excel or from the browser, the mechanism is implemented by both a CGI script and a VBA module. Using this mechanism, we effectively prevent two people from editing the same file at the same time, which would otherwise be possible, as the file is downloaded from the server and uploaded later. Remember that read-only access is continuously provided by access to the latest "static" document.
An important feature of the document templates for grant applications is the hidden markup, i.e. invisible "hints" or "tags" embedded into the document, which makes it possible to safely extract important data automatically. In the MS Word'97 version of the application template this markup is implemented with paragraph and character styles. For example, when the applicant's name is entered, it is marked with style ggName. After the document is stored as RTF, the resulting file is parsed, relevant information is extracted and an XML file is generated, which will, for example, include the text with style ggName as content of the tag <ggName>. We can then use one of several available XML parsers to check the new application and, possibly, automatically reject it if something is missing or the structure is incorrect.
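A COM server of the kind described above can be sketched in a few lines with the Win32 extensions. The following is only an illustration under stated assumptions: the class name `HyperwaveBridge`, its three methods, the ProgID, the placeholder CLSID and the `InMemoryBackend` (which stands in for the HG-CSP client API and does no network I/O) are all hypothetical and not taken from the GoldenGate sources. Only the `_reg_*_` and `_public_methods_` class attributes and `win32com.server.register.UseCommandLine` are the conventions of the Win32 extensions.

```python
class InMemoryBackend:
    """Hypothetical stand-in for an HG-CSP client; keeps documents in a dict."""

    def __init__(self):
        self._docs = {}

    def store(self, path, data):
        self._docs[path] = data

    def load(self, path):
        return self._docs[path]

    def list(self, prefix):
        return sorted(p for p in self._docs if p.startswith(prefix))


class HyperwaveBridge:
    """COM-visible facade that forwards calls to the backend."""

    # Registration metadata read by the Win32 extensions.
    # The CLSID is a placeholder; generate a fresh GUID for real use.
    _reg_clsid_ = "{00000000-0000-0000-0000-000000000000}"
    _reg_progid_ = "GoldenGateSketch.HyperwaveBridge"
    _reg_desc_ = "Sketch of a COM-to-database bridge"
    _public_methods_ = ["LoadDocument", "StoreDocument", "ListCollection"]

    # Shared by all instances so that separate COM clients see the same data.
    _backend = InMemoryBackend()

    def LoadDocument(self, path):
        return self._backend.load(str(path))

    def StoreDocument(self, path, data):
        self._backend.store(str(path), data)

    def ListCollection(self, prefix):
        return self._backend.list(str(prefix))


def register():
    """Register the server with Windows (requires the Win32 extensions)."""
    import win32com.server.register
    win32com.server.register.UseCommandLine(HyperwaveBridge)
```

After `register()` has been run once on a Windows machine, a VBA module could obtain the object with `CreateObject("GoldenGateSketch.HyperwaveBridge")` and call its public methods like those of any other COM component.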
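The check-out/check-in behavior described above can be summarized in a minimal sketch. It assumes a simplified in-memory document object; the class `DynamicDocument`, the exception `LockError` and the attribute key `in_use_by` are hypothetical names, and the string snapshot merely stands in for the regenerated PDF. The real prototype relies on Hyperwave's access rights and write-locks instead.

```python
class LockError(Exception):
    """Raised when a user tries to edit a document someone else holds."""


class DynamicDocument:
    """Simplified model of a dynamic document with an exclusive write lock."""

    def __init__(self, name, content=""):
        self.name = name
        self.content = content
        self.static_snapshot = content  # stands in for the static PDF view
        self.attributes = {}            # metadata visible to all readers
        self.writer = None              # user holding exclusive write access

    def check_out(self, user):
        """Give `user` exclusive write access and mark the document as in use."""
        if self.writer is not None and self.writer != user:
            raise LockError(f"{self.name} is in use by {self.writer}")
        self.writer = user
        self.attributes["in_use_by"] = user

    def check_in(self, user, new_content):
        """Save the edited content, refresh the snapshot and release the lock."""
        if self.writer != user:
            raise LockError(f"{user} has not checked out {self.name}")
        self.content = new_content
        self.static_snapshot = new_content  # regenerated on every save
        self.writer = None
        self.attributes.pop("in_use_by", None)

    def read_static(self):
        """Read-only access is always possible via the latest snapshot."""
        return self.static_snapshot
```

While one user holds the lock, a second `check_out` fails with `LockError`, but `read_static` still returns the last saved state.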
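The step from style-tagged text to a checked XML file can be sketched as follows. The sketch assumes that the RTF parser has already produced a list of (style name, text) pairs; the RTF parsing itself is omitted. Only the style and tag name ggName comes from the prototype; the root element `ggProposal`, the style `ggTitle` and the list of required fields are hypothetical.

```python
import xml.etree.ElementTree as ET

# Fields an application must contain; ggTitle is a hypothetical example.
REQUIRED_TAGS = ("ggName", "ggTitle")


def runs_to_xml(runs):
    """Turn (style, text) pairs into XML, keeping only 'gg'-prefixed styles."""
    root = ET.Element("ggProposal")  # hypothetical root element name
    for style, text in runs:
        if style.startswith("gg"):
            ET.SubElement(root, style).text = text.strip()
    return ET.tostring(root, encoding="unicode")


def missing_fields(xml_text):
    """Return the required tags that are absent or empty in the document."""
    root = ET.fromstring(xml_text)
    return [tag for tag in REQUIRED_TAGS
            if not (root.findtext(tag) or "").strip()]
```

An application would be accepted for further processing only if `missing_fields` returns an empty list; a malformed file already fails in `ET.fromstring`.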

Preliminary Results and Future Work
A prototype of the GoldenGate system as described in the previous two sections has been implemented and is currently in use. It allows a researcher, after registering once for electronic submission, to identify himself and to download a template document for MS Word'97. This document, when opened, guides the author through the completion of the essential parts of an application form via a sequence of dialog boxes (today often called a "wizard"). The resulting document can be printed, and on paper it looks exactly as an application for research funding to the DFG is required to look. As an alternative, the document can be saved in RTF (Rich Text Format) and uploaded to the GoldenGate server with the help of a web form (see the corresponding UML [12] Use-case diagram in Figure 4).
On the server side, a dynamic document is created (see Sec. 4) that allows the officers in a DFG department to fulfill all typical tasks. See Figure 3 for screenshots of an application and the resulting Excel sheet.
During first demonstrations and iterations we quickly learned that we had to extend our system's abilities: support for standard letters (e.g. for acknowledging the receipt of the proposal or asking for missing details) and mail merges, storage for internal and external documents (like letters to the applicant) and statistical analyses of elements in the database became necessary. Particularly in this situation our ideas proved their power and their flexibility, because we did not have to reinvent anything. We just integrated more of the existing features of the tools we already used and only had to update some interfaces. Now, Microsoft Word'97 is used to generate mail merges by accessing the address database on the Hyperwave server and to store arbitrary documents (which Hyperwave can do out of the box). The statistical components of Microsoft Excel (tables, bar graphs, pie charts, ...) are used for the analyses. The data only has to be fetched from the attributes of the database hierarchies (which, of course, is done with the help of another Python COM server).
Incidentally, during this improvement process we made one of the DFG's currently used software packages superfluous. Its whole task was to create mail merges. With the help of our approach, several thousand lines of code were replaced by something that was already there and just had to be used.
The resulting system as it exists now has shown in several demonstrations and evaluations that it works efficiently. In addition to the one department which has already made a decision in favor of our system, several other departments at the German Research Foundation are planning to introduce it, because they are convinced that their amount of routine work can be significantly reduced with the help of GoldenGate.
At the moment we are still learning and analyzing how the conventional processes and workflows at the DFG work. As those were never written down formally and nobody really knows all of them, we have to use an iterative approach. Whenever we implement something new, people at the DFG have new ideas and discover new ways in which GoldenGate could help them. Up to now, we were able to realize those with a minimum amount of implementation work.
The most important lesson learnt during this (still ongoing) project is that, because of our approach of combining standard (office) tools with thin interfaces, the gain in flexibility goes hand in hand with a reduction of effort (time and money) for maintenance and adaptation to new environments or needs, as compared to proprietary or existing commercial solutions. While the complexity of the implementation phase is similar to that of other systems, we have the advantage that the actual amount of "code" (including templates and interfaces) we have to maintain or update is only a small portion of the whole product. The by far larger part (Hyperwave Information Server, MS Office'97) is maintained and improved by other providers.
Our future tasks are the extension of the system to cover more steps in the fields of reviewing and approval of grant proposals. So far we see no problems in integrating them using the techniques and ideas that have proven to work (see above). Another very important point is additional support for the applicants, e.g. templates for other common word-processing systems like LaTeX or Adobe FrameMaker, which we already have and which will be integrated into the system next.
Once the described system is fully operational, an obvious extension is the reuse of the available information in the context of data mining as well as cross-media publishing. As all grant proposals (and additional material documenting the applicant's research competence) are electronically available in well-structured and well-defined standard formats (e.g. XML, PDF), relevant information can be extracted automatically, and new "documents" like annual reports, overview articles for selected research areas and others, on potentially different and varying media like CD-ROMs or WWW contents, can be created with a minimum amount of additional work.

Figure 1: The structure of the current prototype implementation of GoldenGate.

Figure 2: A schematic model of the initial GoldenGate database hierarchy.

Figure 4: A UML Use-case diagram for "electronic submission".