
Documentation Standards Proposal

Title:

Smalltalk Documentation Standards: Draft Proposal

Creator:
Mark Roberts, Cincom Systems, Inc.
Date Issued:
2002-12-30
Identifier:
Replaces:
Not applicable
Latest Version:
Status of Document:
This is a Working Draft.
Addendum: Cincom plans to realize product support for some features described in this proposal under the SmalltalkDoc initiative. For details, see:
http://www.cincomsmalltalk.com/CincomSmalltalkWiki/SmalltalkDoc
Description of Document: This document is a draft proposal of a set of documentation standards for the Smalltalk language. It reports decisions reached by the author(s) through online discussions with interested parties in the Smalltalk community. A Final Working Draft Proposal is planned for Spring 2003, at which time endorsement will be sought from the Smalltalk Industry Council (STIC).
Addendum: during 2003, Cincom has been working to build product support for improved code/doc integration, with the aim of releasing a proof of concept for evaluation by the Smalltalk community. This new initiative is known as SmalltalkDoc.

General issues:

  1. Why should code include comments?
  2. What are the various types of comments?
  3. What should be included in a comment?
    1. Packages
    2. Classes
    3. Methods
    4. Name spaces
    5. Shared Variables
  4. What guidelines should be followed when writing a comment?
  5. How should comments be formatted?
  6. How should comments be represented internally?
  7. Summary

  1. Why should code include comments?
    The value and importance of comments cannot be overemphasized. To understand the value of code comments, several general points need to be kept in mind.

    First, when we speak about "comments" in Smalltalk, we don't just mean comments in method bodies, but comments that might be attached to a variety of different constructs (e.g., classes, name spaces, and shared variables). Each construct has a slightly different function in the larger scheme of things: what is appropriate for a package comment may not be appropriate for a method comment.

    Second, code comments aren't just for developers. They speak to different audiences and must therefore play a variety of roles. Not only are comments read by both seasoned and new developers (who tend to have different needs), but they are also used by writers, QA, and technical support staff.

    Role depends upon audience. Comments are essential for new developers as they try to orient themselves within a package or class. Comments make it possible for technical and marketing writers to produce product documentation, white papers, tutorials, and marketing literature. For new developers and writers, a significant part of work time is spent reading, and code comments at all levels (package, class, method) play a key role in the process of reading and understanding code. For more experienced developers, comments explain what a component, class, or method does, and in this way they make it easier to re-use, debug, and maintain code.

    Uncommented code exacerbates a variety of immediate- and long-term problems. For example, if a component or class has no comment, it may be difficult or prohibitively time-consuming to determine what it does, or whether it even meets basic project requirements. Similarly, classes which lack comments can become difficult or too costly to maintain and debug. Without comments in public methods, it may become too time-consuming to answer simple questions about the interface provided by a class.

    Finally, code comments should be viewed as a first step in a larger and more open-ended process of product development and maintenance. Comments play a role not just in the evolution of code, but in a product development cycle. The successful entry and positioning of a product in the market, the use of library classes, their re-use, and maintenance -- all of these activities depend either directly or indirectly upon development practices that make systematic and consistent use of comments.

    All of this raises a series of questions about what an effective comment looks like. To understand this, we first need to distinguish the various types of code definitions that are unique to Smalltalk.

  2. What are the various types of comments?
    Different code definitions need different types of comments. For example, there are five basic types of comments in VisualWorks:

    1. Component (Packages/Bundles)
    2. Class
    3. Method
    4. Name space
    5. Shared variable

    Note: other dialects of Smalltalk may only define some of these, or additional types (e.g., ENVY components). For the moment, this proposal only considers VisualWorks. Interested parties familiar with other dialects are encouraged to clarify the kinds of comments they use (needed for specifying a cross-dialect documentation scheme).

    VisualWorks allows a single comment string to be attached to each of these definitions. Of these five comments, the first three (component, class, method) are the most essential for communicating the functionality of the code. These comments correspond, roughly, to different levels of documentation, from general to particular.

  3. What should be included in a comment?
    This issue concerns the content of each type of comment.

    Let's start with component and class comments, since these tend to resemble each other. Method comments generally have a different organization, so they need to be considered separately.

    The most important distinction to make in a comment is between what a component does, and how it is implemented. The question "what?" is generally posed before the question "how?", so the explanation of what the component is, what it does, or what service it provides should always be included before the explanation of how it is implemented or why it has been implemented in a certain way.

    1. Package Comments
      1. Summary Sentence (required)
        The first sentence of the comment is generally a one-line summary of what the component is or does. The summary sentence ends with a period. At the very least, every package should include this summary.

      2. Deprecated/Obsolete (optional)
        Deprecated and/or obsolete components should be indicated.

      3. Description (optional, but strongly suggested)
        The first, summary sentence should be followed by a longer description of the component. The description may range from a few sentences to a few paragraphs. It should briefly explain the function or purpose of the package, mentioning explicitly any larger programming frameworks, standards, protocols, algorithms, or general concepts that may be needed to understand the package.

        If the package is bundled with others, the relationship between the packages should be noted. Roughly, this is the reason for dividing a larger component into several packages. What do the classes grouped in this package have in common? Why are they in this package instead of another?

        The organization of the package should also be described. For example, if the package includes one or two public classes, and a dozen or so private classes, then it might be worth mentioning the public classes by name.

      4. References to External Documentation (optional)
        If the package's functionality is documented in more detail elsewhere, a stable link (URL) might be included. For example, a package that implements a network protocol might include a link to the RFC that documents the protocol specification. A package that is documented in a User's Guide might include a link to that guide.

      5. Notes on Usage (optional)
        The description might be followed by some notes on usage. For example, if the package named "Phynance SQL Extensions" should only be loaded as a prerequisite to "Phynance SQL EXDI", the comment might indicate that "Phynance SQL EXDI" is the component that should always be loaded first. Similarly, a comment for the package containing the "Phynance Table Editor" might include a code snippet that can be evaluated to open the Table Editor Tool.

        Notes on usage might include short examples, or links to overviews, tutorials, or walk-throughs.

      6. Limitations or Implementation Notes (optional)
        Any important limitations or implementation peculiarities in the package should be noted.

      7. Metadata / Keywords (required)
        For automated tools to have any hope of finding this component, it needs to have some metadata attached. This might be expressed using, for example, the Dublin Core element set. This topic needs further examination to yield a satisfactory solution.

      8. Author and Creation/Modification Date (optional)

      9. Copyright and Licensing information (optional)
        This often appears first, but there is an argument for placing it as the last paragraph(s) of the comment string. To make the comment most accessible to somebody who is unfamiliar with the component, it is helpful if a one-line functional summary appears first. The first question a comment needs to answer is generally "Does this component do what I want?" and then "Are the licensing terms acceptable?". By following this order, new developers won't be subjected to the exasperating experience of scrolling through paragraph after paragraph of licensing information, just to find a functional description of the component.
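
      The ordering argued for above (functional summary first, licensing last) can be sketched as a small record that renders the parts of a package comment into one string. This is only an illustration written in Python, not part of any Smalltalk tool: the class name, field names, and the placeholder keyword and licensing text are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class PackageComment:
    """Hypothetical container for the parts of a package comment."""
    summary: str                 # required: one-line summary, ends with a period
    keywords: str                # required: metadata / keywords
    deprecated: str = ""
    description: str = ""
    references: str = ""
    usage: str = ""
    limitations: str = ""
    author: str = ""
    licensing: str = ""

    def render(self) -> str:
        """Join the non-empty parts in the proposed order, one paragraph each."""
        if not self.summary.rstrip().endswith("."):
            raise ValueError("summary sentence must end with a period")
        parts = [self.summary, self.deprecated, self.description,
                 self.references, self.usage, self.limitations,
                 self.keywords, self.author, self.licensing]
        # A hard return is used only to separate paragraphs.
        return "\n".join(part.strip() for part in parts if part.strip())


comment = PackageComment(
    summary="Provides support for the Fonebone Database Connect.",
    keywords="Keywords: database",                 # placeholder metadata
    licensing="Copyright: placeholder licensing text.",
)
text = comment.render()
```

      Whatever the final representation turns out to be, a reader who opens the rendered comment sees what the component does before reaching the licensing terms.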

    2. Class Comments
      1. Summary Sentence (required)
        As in the case of comments for components, the first sentence of a class comment must succinctly express the purpose, function, or service provided by the class. This comment should not be viewed as being somehow incidental to code. The summary should note whether the class is abstract or concrete. The summary sentence ends with a period. Every class should include a summary description.

      2. Deprecated/Obsolete (optional)
        Deprecated and/or obsolete classes should be indicated.

      3. Description (optional, but strongly suggested)
        The first, summary sentence should be followed by a longer description of the class. The description may range from a few sentences to a few paragraphs. It should briefly explain the function or purpose of the class, mentioning explicitly any larger programming frameworks, standards, protocols, algorithms, or general concepts that may be needed to understand the class.

        Important relationships with other classes should be noted. For example, a text scanner that is always used in conjunction with a parser might mention the parser class by name.

      4. Description of variables (required)
        The comment string generally includes: descriptions of instance variables; descriptions of class variables; descriptions of inherited instance variables; instance variables redefined from a superclass; and descriptions of shared variables. Each variable should be mentioned by name, along with its class and a short description of its purpose.

      5. Interface Descriptions (optional)
        The comment may include interface descriptions (e.g., "Subclasses must implement the following messages/protocols..."; "Subclasses should not implement ...").

      6. References to External Documentation (optional)
        If the class's functionality is documented in more detail elsewhere, a stable link (URL) might be included. For example, a class that implements a specific network protocol might include a link to the RFC that documents the protocol specification. A class that is documented in a User's Guide might include a link to that guide.

      7. Notes on Usage (optional)
        The description might be followed by some notes on usage. This might include code snippets and examples of use. It could include references to overviews, tutorials, or walk-throughs, though generally these should be in the package comment.

      8. Limitations or Implementation Notes (optional)
        Any important limitations or implementation peculiarities in the class should be noted. This includes implementation- or platform-specific behavior.

      9. Author and Creation/Modification Date (optional)

      10. Copyright and Licensing information (optional)
        This often appears first, but as in package comments (see above), there is a strong argument for placing it as the last paragraph(s) of the class comment string.
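
      Because variable descriptions are required, a tool could flag variables that a class comment never mentions. The following Python sketch shows one crude way to do this. The function name and the comment layout in the example are assumptions, and the check only looks for each name as a whole word anywhere in the comment, so it cannot tell a real description from a passing mention.

```python
import re


def undocumented_variables(comment, variable_names):
    """Return the variable names that never appear as whole words in the comment."""
    missing = []
    for name in variable_names:
        if not re.search(r"\b" + re.escape(name) + r"\b", comment):
            missing.append(name)
    return missing


# Hypothetical class comment; the layout of the variable section is invented.
class_comment = (
    "Associates a lookup key with an object.\n"
    "Instance Variables:\n"
    "key <Object> the lookup key\n"
)
missing = undocumented_variables(class_comment, ["key", "value"])
```

      Here `missing` lists `value`, since only `key` is described.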

    3. Method Comments
      A fair amount of ink has been spilled over the notion of using self-documenting code in methods. During the last few years, this has increased as XP methodology now routinely discourages method comments on the grounds that they indicate that the method is unfinished and in need of refactoring. The general rationale has been that with careful refactoring, methods can be made smaller, until they don't need to be sprinkled with explanatory comments.

      It is nevertheless worth distinguishing between comments that appear in the body of a method and the comment that generally appears in the method's heading. As proponents of XP have argued, there are reasons to try to avoid comments in method bodies, but the headings of public methods generally have a different function.

      As a rule, public methods should contain a summary sentence in the method heading.

      The rationale for this is twofold: first, these summary sentences are used by automated documentation tools. Without a summary sentence, the usefulness of such tools decreases sharply. Second, in a language like Smalltalk, where public methods are typically only distinguished from private ones through the (often inconsistent or sloppy) use of protocols, the method comment becomes an important way to understand the class's public interface. In the absence of this comment, it becomes more difficult for new developers to understand the class, and it becomes more difficult for writers to produce product documentation. If a comprehensive description of the API can't be captured without digging through the code, then the time required to understand the class or produce product documentation begins to become prohibitive.

      1. Summary Sentence (required)
        In public methods, the comment should start with a one-sentence explanation of the function, purpose, or service provided by the method. Generally, the sentence also briefly describes the value returned by the method.

        Although summary sentences are often omitted from accessor methods, they should be included for use by automated documentation tools.

        It is important to keep this first sentence concise, as automated tools are likely to use it as a summary. As a rule, the summary sentence needs to make sense independent of the remainder of the comment. This sentence (and the others that follow) should try to avoid allusions to instance variables in the class to which the method belongs. Instance variables are implementation-specific, and may or may not be exposed to external protocol. More specific guidelines for the composition of this summary sentence appear below.

      2. Deprecated/Obsolete (optional)
        Deprecated and/or obsolete methods should be indicated.

      3. Description (optional but often desirable)
        The summary sentence may be followed by a longer description consisting of a few sentences or, occasionally, a few short paragraphs. This description may include details about parameters, details about the return value, and exceptions that may be thrown.

        Additionally, the description may include:

        • Dependencies or preconditions that must hold before the method can be executed
        • Important limitations or aspects of the code that are unfinished (e.g., unhandled exceptional conditions)
        • Implementation details

      4. Example Code (optional but common)
        Often, public methods and utility methods contain a small piece of code that may be evaluated as a DoIt inside the browser's Code Tool.

        Note that if these code fragments are included in the method, they should appear after the summary sentence, so that they do not get placed in API documentation produced by automated tools. Generally, the code fragment is placed within a separate pair of quotation marks, so that it may be selected easily.

      Standard practice is now to avoid line-by-line comments in methods.
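
      To illustrate why the summary must come first and why example code belongs in its own pair of quotation marks, here is a rough Python sketch of how a documentation tool might pull the summary sentence out of a method's source. It is not how any existing Smalltalk tool works: the function names and the sample method are made up, character literals such as $" are not handled, and a sentence is taken to end at the first period followed by white space.

```python
import re

# Either a string literal ('...' with '' as an escaped quote) or a comment
# ("..."). Trying the literal first keeps a double quote inside a string from
# being mistaken for the start of a comment.
_TOKEN = re.compile(r"'(?:[^']|'')*'" + r'|"([^"]*)"')


def method_comments(source):
    """Return the text of each comment in a method, in source order."""
    return [m.group(1) for m in _TOKEN.finditer(source) if m.group(1) is not None]


def summary_sentence(source):
    """Return the first sentence of the heading comment, or None if there is none."""
    comments = method_comments(source)
    if not comments:
        return None
    text = " ".join(comments[0].split())       # undo hard returns and tabs
    match = re.match(r"(.+?\.)(?:\s|$)", text)
    return match.group(1) if match else text


# Hypothetical method: heading comment first, DoIt example in its own comment.
SOURCE = '''key
    "Answers the lookup key of the receiver. Further details would follow here."
    "(Association key: #a value: 1) key"
    ^key'''

summary = summary_sentence(SOURCE)
```

      Because the DoIt fragment sits in a second comment, it never reaches the extracted summary, and it can still be selected as a unit in the browser.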

    4. Name Space Comments
      1. Summary Sentence (required)
        The first sentence of the comment is generally a one-line summary of the purpose of the name space. The summary sentence ends with a period.

    5. Shared Variable Comments
      1. Summary Sentence (required)
        The first sentence of the comment is generally a one-line summary of the purpose of the shared variable. The summary sentence ends with a period.

  4. What guidelines should be followed when writing a comment?

    When actually writing a comment sentence-by-sentence, you should consider the following guidelines:

    1. Package Summary

      It is generally most economical to write the summary sentence starting with a verb. For example:

      	Provides support for the Fonebone Database Connect.	(suggested)
      	This package provides support for ...		(discouraged)
      
    2. Method Summary

      For methods, it is generally best to write the summary sentence starting with a verb. For example:

      	Answers whether the receiver is equal to the argument.	(suggested)
      	This method answers ...				(discouraged)
      
      As a general rule in professional writing, the overly familiar second-person form ("you...", which the imperative implies) should be avoided in favor of the third-person form ("it..."). For example:
      	Answers the lookup key of the receiver.		(suggested)
      	Answer the lookup key ...				(discouraged)
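
      These guidelines are mechanical enough that a tool could warn about the discouraged forms. The Python sketch below is a deliberately crude heuristic, not a proposed rule set: the function name is hypothetical, and a first word is treated as a third-person verb merely because it ends in "s", which will misjudge some words.

```python
def summary_style_warnings(summary):
    """Return a list of warnings for a summary sentence; empty means no complaints."""
    warnings = []
    words = summary.split()
    first = words[0] if words else ""
    if first.lower() == "this":
        warnings.append('starts with "This ..."; start with a verb instead')
    elif not first.endswith("s"):
        warnings.append("first word does not look like a third-person verb")
    if not summary.rstrip().endswith("."):
        warnings.append("does not end with a period")
    return warnings


ok = summary_style_warnings("Answers the lookup key of the receiver.")
imperative = summary_style_warnings("Answer the lookup key of the receiver.")
```

      The suggested form produces no warnings, while the imperative form produces one.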
      
  5. How should comments be formatted?

    Style guides for many programming languages suggest keeping comment lines shorter than 80 characters, to avoid line wrapping. Unlike many other toolsets, Smalltalk tools tend to wrap lines automatically. This does not completely eliminate the problem of line wrapping in package and class comments, as developers often insert extra hard returns to achieve formatting effects.

    These formatting characters become an issue with various documentation tools. For example, when converting comments for presentation in HTML, tabs, spaces, and hard returns create problems.

    In package and class comments, it is strongly recommended that hard returns only be used to separate paragraphs.

    For the moment, Smalltalk implementations tend to represent comments using plain-text strings. It may be desirable to be able to italicize text for emphasis, or to make selective use of a special character style that highlights class, package, and method names. This could (if the user so desires) be displayed using a distinctive type face or color, making the comment easier to read in summaries.

    It is also desirable to be able to embed hyperlinks into package and class comment strings.
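
    If hard returns are used only to separate paragraphs, converting a plain-text comment for presentation in HTML becomes straightforward. The following Python sketch assumes exactly that convention; the function name is hypothetical, and stray tabs and runs of spaces are simply collapsed.

```python
import html


def comment_to_html(comment):
    """Convert a plain-text comment to HTML paragraphs.

    Assumes a hard return only ever separates paragraphs, so every
    non-blank line is treated as one paragraph.
    """
    paragraphs = []
    for line in comment.splitlines():
        text = " ".join(line.split())      # collapse tabs and runs of spaces
        if text:
            paragraphs.append("<p>" + html.escape(text) + "</p>")
    return "\n".join(paragraphs)


page = comment_to_html("Provides  support.\n\nSee <Foo>\tfor details.")
```

    A comment that uses hard returns for visual layout would come out of this conversion as a series of fragmentary paragraphs, which is the problem the recommendation is meant to avoid.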

  6. How should comments be represented internally?

    The fact that comments can contain so many different types of information poses a problem. How can a summary be generated automatically by documentation tools? How can descriptions of components and classes be formatted consistently? To resolve these problems, we must consider the comment's internal representation.

    One could imagine the comment being represented as:

    1. A single, plain-text string
    2. A string that can include HTML or XML tags for formatting
    3. An HTML or XML-formatted string, using tags to identify the different parts of the comment (e.g., with elements like: summary, description, comments on usage, licensing, and copyright information)
    4. A dictionary of strings that belongs to the code definition (component/class/etc.), possibly tagged with HTML or XML for formatting

    The advantages of choosing options (3) or (4) are that documentation tools can be used to present neatly formatted and very precise summaries of components and classes. In the absence of imposed structure, it becomes difficult to build robust documentation tools.

    At the very least, support for hyperlinks or any special formatting of comments suggests that an alternate representation will be needed.
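
    As a rough illustration of options (3) and (4), the Python sketch below parses a tagged comment string into a dictionary of its parts, after which a tool can pick out the summary directly. The element names and the sample text are assumptions made for this example; the proposal does not fix a tag vocabulary.

```python
import xml.etree.ElementTree as ET

# Element names are hypothetical; the proposal does not define them.
TAGGED_COMMENT = """
<comment>
  <summary>Provides support for the Fonebone Database Connect.</summary>
  <description>A longer description with <em>emphasis</em>.</description>
  <licensing>Placeholder licensing text.</licensing>
</comment>
"""


def comment_parts(tagged):
    """Parse a tagged comment string (option 3) into a dictionary of parts (option 4)."""
    root = ET.fromstring(tagged.strip())
    # itertext() flattens nested formatting tags such as <em> into plain text.
    return {child.tag: "".join(child.itertext()).strip() for child in root}


parts = comment_parts(TAGGED_COMMENT)
summary = parts["summary"]
```

    With a single plain-text string (option 1), the same tool would have to guess where the summary ends and the licensing begins.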

  7. Summary

    The following specific issues are raised by this proposal:

    1. Identify practices
      For each type of code definition, which practices should be adopted?
    2. Use of Dublin Core tags
      What would be involved in using them as a classification mechanism for packages (and possibly classes)? Would it be difficult to develop an open-source package that provides support for DC tags?
    3. Use of XHTML encoding
      What would be involved in using this encoding for all comment strings? This would be natural for VisualWorks, but would it pose any issues for the other dialects?
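
    On the Dublin Core question, a very small Python sketch may help show what classification by DC tags could involve. The fifteen element names are those of the Dublin Core Metadata Element Set; everything else (the function names, the catalog structure, and the subject keywords) is invented for illustration.

```python
# The fifteen elements of the Dublin Core Metadata Element Set, version 1.1.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def invalid_dc_keys(metadata):
    """Return the keys in a metadata dictionary that are not Dublin Core elements."""
    return sorted(key for key in metadata if key not in DC_ELEMENTS)


def find_packages(catalog, keyword):
    """Return names of packages whose 'subject' keywords include the keyword."""
    wanted = keyword.lower()
    return sorted(
        name for name, metadata in catalog.items()
        if wanted in (subject.lower() for subject in metadata.get("subject", []))
    )


# Hypothetical catalog; the subject keywords are invented for the example.
CATALOG = {
    "Phynance SQL EXDI": {"title": "Phynance SQL EXDI", "subject": ["database", "SQL"]},
    "Phynance Table Editor": {"title": "Phynance Table Editor", "subject": ["database", "tools"]},
}

matches = find_packages(CATALOG, "sql")
```

    A search of this kind only works if packages carry the metadata consistently, which is why the package comment lists it as required.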


Review Comments:

Mark,

Thank you for taking on this important issue. The lack of adequate documentation is a significant barrier to what Richard Gabriel called "code habitability":

"Habitability is the characteristic of source code that enables programmers, coders, bug-fixers, and people coming to the code later in its life to understand its construction and intentions and to change it comfortably and confidently."
While I agree with the XP notions of self-documenting code, some amount of architectural documentation is almost always needed at the higher component levels.

I am very much in favor of XHTML encoded comments - a subset of XHTML plus a subset of the Dublin Core tags plus Smalltalk-specific tags, plus extension tags defined by additional tools (such as modeling tools). The XHTML subset should be rich enough to support text highlighting, links, paragraphs, lists, tables, and images (often worth a thousand words!).

It should be possible to validate comments against an XML Schema that describes the allowed/required tags. Whether or not this validation is done is a separate issue - Smalltalkers tend to resist authoritarian mandates.

If comments are encoded with tags, then browsers and other tools need to support two options for viewing them: an editable, unformatted version for developers that shows all tags, and an uneditable, formatted version for code readers.

I also have the following specific comments on your proposal:

  1. A "deprecated/obsolete" indicator is also needed for higher level components.
  2. Class comments should include specific information about each Event signaled by instances of the class and that can be observed by its clients, including the event symbol, when it is signaled, and parameters passed with the signal. If an event is signaled by multiple classes, it would be sufficient to provide a link to a description of the event.
  3. Method comments should include specific information about each Exception raised by the method - by itself or by a method called to raise the exception. This does not include exceptions raised by lower level methods because of bugs, only those specifically defined to be part of a public method's external interface.

Regards...Rich Demers

December 31, 2002



Mark,

Thank you. This really impressed me. Though I fall into that XP freak category and might find some of the parts "more than necessary", I at this point have a very clear view of what it is you want to do, what the vision is, and that's worth a lot. Thanks again.

Travis Griggs

