DocumentationWhitePaper

This page aggregates comments on the white paper Initial Software User Documentation Needs in the Cancer Bioinformatics Grid (caBIG) authored by Jim Harrison, Ken Smith and Lynette Grouse. This white paper was originally developed in the User Documentation SIG of the Training Strategic Workgroup, but it also has received significant input from the Best Practices SIG and others.

Current version

Release: Version 1.62. UserDocumentation-v162.pdf (136K PDF; Mar 10, 2005)

Version 1.6 added commentary on the benefits of a centralized, single source text document that may be processed on demand into multiple forms for delivery to users based on immediate needs. It recommends that caBIG experiment with this documentation management approach, while recognizing that initial documentation will probably need to be created with tools that are currently widely available such as MS Word or RTF editors. It also more clearly mentions the situation where Developers and Adopters may collaborate on creating documentation during the initial deployment process. Version 1.62 revises some of the section names and descriptions in the Architectural Description content outline and includes several minor revisions for clarity in the paper and the content outlines for the Administration and Users' Manuals.

Document content outlines for annotation

These wiki pages contain the annotated outlines for the design (architectural) description, administration manual and user manual from the appendices of the white paper above. Please visit these content outlines and annotate, comment or leave questions.

Draft versions

Version 1.6 Documentation-16.pdf (128K PDF; Mar 8, 2005)

Version 1.52 Documentation_1-52.pdf (116K PDF; Feb 28, 2005)

Version 1.51 Documentation_FC_1-51.pdf (117K PDF; Feb 11, 2005)

Version 1.4 Documentation_DR_1-4.pdf (112K PDF; Jan 17, 2005)

Version 1.3 Documentation-DR1_3.pdf (100K PDF; Jan 4, 2005)

Documentation White Paper FAQ

This set of questions and comments was extracted from the discussions of the draft versions of the white paper. Responses to questions and issues are shown in italic.

Documentation Lifecycle

...what is not captured is the lifecycle of the documentation process - creating the documentation, reviewing, updating, disseminating, presenting (web, paper, etc) and other uses.

Will these standards meet our needs now (and in the future) or do we need to describe a migration path to a more robust standard (like CMM)?

This white paper purposely does not attempt to establish a standard documentation lifecycle for caBIG. We believe that there are a number of issues related to that topic that deserve detailed discussion among the developer groups and the Architecture WS, particularly the Best Practices subgroup of Architecture. We propose a short-term documentation guideline for initial software development in caBIG and we hope that some of our proposals may provide a basis for a longer-term plan. Some of these proposals may also stimulate discussion in key areas. The primary goal, however, is to provide immediate guidance while discussion of longer term strategies is initiated.

That said, note that our proposals are based on selection and integration of recommendations from existing vertical market, national and international standards for software lifecycle documentation. We also address the need for presentation through multiple channels, though a complete plan for meeting that and other needs should be developed through a more general discussion.

Collaborative Documentation

The document does not seem to recognize the role of community collaborative platforms for documenting. It seems to have a very static view of what documentation should be. From a user perspective, I find some of the best information on archived mailing lists and on wikis. Similarly, semi-formal documents, like FAQs, have a long track record as effective means of user feedback, but I don't see reference to them either.

Open source seems to be of strategic importance, but there does not appear to be much effort here to adapt open source approaches for completing documentation. In many cases, it is the users of open source projects who write documentation as their way to contribute. It would be helpful if the documentation approach were to embrace these community-based ways to get useful documents. I.e. it should not be the sole responsibility of the developer to write documentation, but of the community around a particular tool.

Is there really a need for a centralized documentation repository? E.g. is it more important that all of the user manuals for caBIG be found centrally, or is it more important for the user manual to be accessible wherever the tool is to be found?

The paper does recommend a centralized server for current documentation (see below) but does not specify a particular software approach, or an updating and distribution workflow, for this server. Some commentary touching on collaboration software such as wikis has been added in the drafts. However, the primary goal of the white paper is to specify the nature, content and general structure of the documentation itself. We felt that further discussion of workflow was beyond the scope of the initial paper. We also considered that developers would most likely adapt their existing workflows in the initial stages of development for which this white paper is written. While the white paper is written primarily for developers, if an adopter group worked in partnership with a developer group to create documentation, this white paper would also be applicable as a guide for their efforts.

The recommendation for a central repository for documentation isn't meant to totally preclude other sources for documentation, and in particular local copies that are optimally accessible for users. However, if broad deployment of caBIG tools is the goal, then the need for access to documentation extends to the entire caBIG community, not just current or identified adopters. We favor centralization of canonical copies of documentation as the primary documentation resource for all projects, with secondary local caches if they better meet a particular need.

Flexibility in Document Types and Names

In some cases the full complement of three document types (Architecture document, Admin manual, Users manual) may not make sense. Can these documents be combined if that fits a particular application better? What about creating other documents if material exists for a particular application that doesn't fit these document types and structures? Can the document and section or subsection names be changed?

Documents should be optimized for their particular application. If this is done reasonably, the document will make sense to its users and will seem natural. We divided the documentation into three units because in the larger systems (tissue banking, pathology systems and clinical trials) there are traditionally three classes of people that work with the systems: system managers ("administrators"), programmers who write extensions or interfaces into the system, and end users. In our systems these groups have different goals. The use cases for them don't overlap much and thus it made sense to create documents targeted to each group--as the most general case. For some kinds of software, you might omit a document or combine them in various ways. If the use cases overlap or are more limited than the general case, combining the documents is reasonable.

If additional information must be documented, but including that information in the recommended manuals would be awkward or inappropriate for the audiences mentioned above, separate documents can be created based on the needs of the project.

The names of the documents, sections and subsections were chosen to be descriptive to documentation authors. If other terminology would be more meaningful to readers of the documents in the context of a particular project, then the most meaningful names should be used. We recommend that the structure of the documentation be generally similar across projects, but the terminology used can be optimized for the project. As noted above, the document should make sense to its users and seem natural.

Architectural Description

The Architecture document seems to cover a lot of territory that does not appear to be needed by the pilot adopters, or required for the developers. What is its purpose? Shouldn't the application documentation be focused on published interfaces? Applications should not need to expose all of their inner workings to be integrated on the grid.

What does "dynamic behavior" mean?

What is "conceptual framework" and why does it come at the end of the architectural description?

The Architecture document really isn't standard user documentation. That should be covered by the Administration Manual, the Users' Manual and (if appropriate, with the substitution of section 8 of the Users' Manual) a "Technical Users' Manual." The Architecture document is meant to be a relatively complete description of software that is made available for general use within caBIG and which will most likely be open source. It should support evaluation of the software by potential users or developers who need to know the technology, structure and processing strategy of a system for compatibility or quality assessment, or for debugging, revising or extending an open source code base. It is difficult to generate a document like this after the fact, so we've recommended that it be created along with the other initial documentation. The outline for this document is influenced most strongly by Hewlett-Packard's Template for Documenting Software and Firmware Architectures (ref 13 in the white paper), with some streamlining.

Dynamic behavior is how the components of the system work together. A number of UML model types are intended to depict software behavior and would naturally be included in this section, such as Sequence Diagrams, Collaboration Diagrams, State Charts and Activity Diagrams. Workflow is also related to system dynamics. This is an optional section that's meant to provide a place for specific descriptions of system features that result from components working together and that are not well depicted in static representations like class diagrams. Not all systems need this description, but dynamic behavior is very important in some, such as production systems supporting patient care and other work processes. There are two possible subsections suggested for Dynamic Behavior, 1) use cases/scenarios, which describe system processes that are associated with particular workflows or use cases, and 2) other mechanisms, which can include processes that are not clearly associated with a particular workflow or may be associated with multiple workflows.

Conceptual Framework is a section that provides a place for medium-length descriptions of domain knowledge that are crucial for understanding the strategy for construction of the system. The Architecture document is intended to focus on the software and an extended discussion of a domain area early in the document would disrupt this focus and flow. Conceptual Framework essentially provides an appendix where this discussion can be flexibly structured and presented as a unit. The main document should provide a very brief overview of the domain area and refer the reader to the Conceptual Framework chapter as appropriate. Note that in an actual document, this section need not be named "Conceptual Framework" and may instead be named based on its subject matter.

Versioning

CVS does not have sophisticated enough review, access control, and authentication to support the lifecycle of documentation production.

Whether a central repository for documents and other artifacts would address the synchronization question depends very much on what kind of repository is envisioned. If it is a dynamic repository that developers can actually use, e.g. something like GForge or SourceForge, then a central repository could be a very good thing. A repository with references to actual projects, like http://freshmeat.net/, could work as well. If it is a large static repository, then it is likely to be a source of synchronization problems, not a solution to them. In this case, developers would need their own repository.

Is there a file format that can be "frozen" or versioned?

A formal approach to versioning as part of an overall documentation lifecycle strategy would be beneficial. We cannot answer directly whether CVS or other versioning systems such as Subversion (http://subversion.tigris.org/) may have particular strengths and weaknesses for managing documentation, since our primary focus was on documentation content and format rather than management. It is possible that these types of systems may provide greater benefits with text-based files than binary word processor files, since the former are more similar to source code. A SourceForge-like repository of both source code and versioned documentation should be considered as part of the larger development lifecycle discussion.
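
As a minimal sketch of why pure-text sources may suit source-code versioning tools, the following Python fragment compares two revisions of a text file line by line, much as a system like CVS or Subversion reports changes. The file names and manual text are hypothetical, and Python's standard difflib module stands in for the versioning system's own diff.

```python
import difflib

# Two hypothetical revisions of a plain-text documentation source.
V1 = """Installation
============
Run the installer.
Restart the server.
""".splitlines(keepends=True)

V2 = """Installation
============
Run the installer as an administrator.
Restart the server.
""".splitlines(keepends=True)


def revision_diff(old, new, old_label, new_label):
    """Return a unified diff between two revisions given as lists of lines."""
    return "".join(
        difflib.unified_diff(old, new, fromfile=old_label, tofile=new_label)
    )


if __name__ == "__main__":
    # Labels are illustrative file names, not files from the white paper.
    print(revision_diff(V1, V2, "manual-v1.txt", "manual-v2.txt"))
```

The diff isolates the single changed sentence. A binary word processor file generally cannot be compared this way, so a versioning system can usually record only that the file changed.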

There are editors that can password-protect and encrypt their files as "read only," but we believe this really should be handled in management software and workflow. A number of formats provide the ability to clearly label files with versions and dates, but enforcement of good practices in this respect is beyond the ability of a file format.

Licensing

Should the documentation contain a reference to licensing and usage permission for both the software and the documentation? Should there be a requirement of an open source documentation license (e.g. something from the Creative Commons or the like?) If reuse of the documentation is intended as a goal of these recommendations, do we need to make the IP implications of that reuse clearer?

We added a brief discussion in the white paper about licensing and we have added a licensing page to the documentation outlines in the appendixes. It's probably appropriate to refer to both the software and documentation licenses and permissible uses there. In addition to Creative Commons, there is the GNU Free Documentation License from the GNU Project and Free Software Foundation.

Documentation File Format

Although reST may be a perfectly suitable open source format for documents, does it make it easier for the developers to create the documentation? Or is it "yet another text editor" that will require training, support, and a learning curve?

I think RTF is a more viable standard than reStructured Text as a main target documentation platform. RTF has very wide support across an array of tools and vendors. I perceive reStructured Text as a niche tool at this stage of the game. This tool may be technically feasible and be able to do neat stuff, but it seems like it incurs additional development and adoption costs.

I can see how reST could be useful as part of the processing stream, but as a target format, RTF would seem to offer more implementation flexibility. Folks who want to do clever documentation processing through reST could do that. Folks who wanted to save Word docs as RTF could do that too.

We should find a solution that minimizes the barriers to producing the stuff, or else, as usual, it will be done late and poorly because no one wants the headache. As a consequence of this, it's important that we choose something which people understand, have used before, and for which a lot of shape-shifting tools already exist.

Whatever we pick must be usable for many purposes (online, local files, etc.) and must be convertible to XML.

RTF is unworkable because RTF files that contain images are huge (i.e., a users manual with numerous screenshots could easily be several hundred megabytes).

It may be that the effort to use text editors that are open source will create more work than simply interchanging the documents created in Word with Acrobat or XML, etc. I think the goal of caBIG is to create open source intellectual resources, and to, whenever possible, make sure we do not commit to one proprietary system, but I worry that unless we keep the focus on interchange documents, we will be asking developers to write their documentation using unfamiliar, not-widely-used software.

Why not choose PDF as the deployment format? It is multi-site, and multi-platform, the spec is published, royalty free, and there are a variety of implementations available.

It might be helpful to make a clearer distinction between recommended "source" and "deployment" formats for documentation.

File format is a legitimate point of discussion at this time because the choice of format may define the software that must be used to edit the documentation, the overall workflow for documentation production, the forms in which documentation can be provided to users and the systems that must be used to distribute the documentation.

The "source" is the actual format of the documentation file; a "deployment" format is the format of a file that is delivered to the user. These don't need to be the same. For example, a source file could be maintained in DocBook XML or reStructured Text but converted on demand to XHTML for electronic display or to PDF for downloading and printing. This avoids the need to maintain and edit multiple versions of documentation for different uses. One edit of the DocBook XML source file is automatically reflected in the appropriate change in both the XHTML and PDF files with no additional work. We've tried to make this situation clearer in the revised white paper.
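
The distinction between source and deployment formats can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the in-memory SOURCE structure and the functions to_xhtml and to_plain_text are hypothetical stand-ins for a real source format (such as DocBook XML or reStructured Text) and its converters, and plain text stands in for a print-oriented deployment format such as PDF.

```python
from html import escape

# Hypothetical single source: a title plus (heading, paragraph) sections.
SOURCE = {
    "title": "Users' Manual",
    "sections": [
        ("Installation", "Unpack the archive & run the installer."),
        ("Getting Started", "Log in and open a project."),
    ],
}


def to_xhtml(doc):
    """Render the source as a minimal XHTML fragment for browser display."""
    parts = [f"<h1>{escape(doc['title'], quote=False)}</h1>"]
    for heading, body in doc["sections"]:
        parts.append(f"<h2>{escape(heading, quote=False)}</h2>")
        parts.append(f"<p>{escape(body, quote=False)}</p>")
    return "\n".join(parts)


def to_plain_text(doc):
    """Render the same source as underlined plain text for printing."""
    lines = [doc["title"], "=" * len(doc["title"]), ""]
    for heading, body in doc["sections"]:
        lines += [heading, "-" * len(heading), body, ""]
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_xhtml(SOURCE))
    print(to_plain_text(SOURCE))
```

Because both renderers read the same SOURCE, an edit made there appears in every deployment format the next time it is generated, which is the property described above.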

ReStructured Text (reST) is a useful straw man in this discussion and it might also be a good solution. It provides a way to allow editing with any software, from word processors to programming editors, on any combination of operating system and hardware. It can also be processed directly into its own XML format, or an XML format created specifically for documentation (DocBookXML Lite). Alternatively, it can be served directly without prior processing, with dynamic conversion to XHTML or PDF. Since it is a pure text format, it may be more effectively managed in versioning systems that are designed for source code. Thus it has strengths in areas where word processing file formats do not and can serve as a useful counterpoint in the discussion.

PDF is fine for deployment of documentation that is intended primarily for printing. PDF formatted for printing can be inconvenient to read online on anything but very large screen sizes, and insertion of internal and external links for convenient electronic use generally requires manual editing with proprietary editing software. We favor XHTML or XML with stylesheets allowing browser display rather than PDF for online display. Of course it is possible to generate PDF and XHTML on demand from a single source file, and that may be the most efficient way to implement documentation in caBIG.

A number of commenters prefer Microsoft Word's native format, but a number also prefer that Word not be used and suggest RTF as an alternative. Some who prefer to use Word would refuse to use stylesheets in Word, which would substantially limit the ability to convert Word documents to useful XML. Some suggest FrameMaker as a better alternative, but the accessibility (and cost) of FrameMaker software is limiting. The white paper contains some comments about proprietary word processor file formats, including Word.

Although RTF is transportable to all platforms, it has several disadvantages. RTF can be converted into "XML" with existing tools, but it is unclear at this point whether "advanced" features of RTF such as footnotes and tables-of-contents can be converted to XML with automatic internal linking as occurs with reST. It's possible that DocBook converters might handle this, but these features could require a skillful use of styles or additional structuring of the text. RTF as implemented in MS Word also has some problems as alluded to by the comment regarding file sizes. Word's native format embeds the images in compressed form but Word does not compress images in RTF files, leading to very large files. The RTF format does support insertion of compressed images and some word processors do take advantage of this (e.g., AbiWord) to create reasonably sized files with embedded images. Additional information is in Appendix (#6) of the white paper.
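
The file-size effect can be approximated with a rough sketch. It assumes picture data is written as hexadecimal text, which is the default encoding for picture data in RTF, so each image byte occupies two characters. The FAKE_BITMAP data is a made-up stand-in for an uncompressed screenshot, and zlib is used only as a generic example of compression, not as the scheme any particular word processor applies.

```python
import binascii
import zlib


def embedded_sizes(image_bytes):
    """Compare storage cost of one image under three embedding strategies."""
    hex_text = binascii.hexlify(image_bytes)   # uncompressed, hex-encoded
    compressed = zlib.compress(image_bytes)    # generic lossless compression
    return {
        "raw": len(image_bytes),
        "hex_text": len(hex_text),
        "zlib": len(compressed),
    }


# Made-up stand-in for a screenshot: long runs of identical pixel bytes,
# as in a typical user-interface capture with large flat regions.
FAKE_BITMAP = (b"\xff" * 3000 + b"\x20" * 1000) * 50

if __name__ == "__main__":
    for name, size in embedded_sizes(FAKE_BITMAP).items():
        print(f"{name:>8}: {size} bytes")
```

Hex encoding always doubles the byte count of the uncompressed image, while compressing the same data shrinks it, which is consistent with the large difference between the two kinds of files described above.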

Developer and User Needs

It might be better to focus the survey (Appendix 1) on the user's perceptions of their documentation needs. For example:

  • Type of deployment anticipated (installation at your sites, API accessible over the net, user interface accessible over the net)
  • Please indicate the types of documentation you expect to need.
  • In what formats do you find documentation most useful?
  • Are there any particular documentation standards that you find useful?

This is worth considering. The current thinking is that the Best Practices SIG in Architecture will follow the initial projects and attempt to track and compile sharable information about problems and successes. This may be the best approach for getting at the kind of information that this survey was designed for and this question addresses.


comments:

Content Structure -- Mon, 14 Mar 2005 15:17:21 -0500
In my experience of reading various Software Architecture, Implementation Spec, etc. documents from different groups, I have always found that similar content can be placed in different sections. For example, the class diagrams could be put either in the "Architecture Overview" or the "Detailed Design" section.

It's very difficult for other people to look at the Table of Contents and figure out the actual section where the information is presented.

A better way (which we used in my previous company) was to develop several templates of these documents targeting different areas of software engineering, e.g. Server, Database, Algorithms, Client, etc. Each template had a unique "Table of Contents" geared to the specific area. The idea was that all documentation in a certain area would have the same "standard" set of sections and each of the sections would have similar content. Of course, each specification document could add sub-sections to include project-specific information.

With this standardization, it became easy for reviewers to quickly find the desired sections, while the document stayed flexible enough for writers to provide information.

The “Best Practices” could come up with standard templates for various types of services such as: 1. Data service 2. Analytical Service

This page was last edited 4 years ago by JimHarrison.