refdoc/xml.txt

                 +--------------------------------------+
                 | Pike autodoc markup - the XML format |
                 +--------------------------------------+

======================================================================
a) Introduction
----------------------------------------------------------------------

When a piece of documentation is viewed in human-readable format, it 
has gone through the following states:

  1. Doc written in comments in source code (C or Pike).

  2. A lot of smaller XML files, one for each source code file.

  3. A big chunk of XML, describing the whole hierarchy.

  4. A repository of smaller and more manageable XML files.

  5. A HTML page rendered from one such file. 
     (Or a PDF file, or whatever).

The transition from state 1 to state 2 is the extraction of 
documentation from source files. There are several (well, at
least two) markup formats, and there are occasions where it is
handy to generate documentation automatically &c. This document 
describes how a file in state 2 should be structured in order to
be handled correctly by subsequent passes and presented in a
consistent manner.

======================================================================
b) Overall structure
----------------------------------------------------------------------

Each source file adds some number of entities to the whole hierarchy.
It can contain a class or a module. It can contain an empty module, 
that has its methods and members defined in some other source file,
and so on. Suppose we have a file containing documentation for the
class Class in the module Module. The XML skeleton of the file 
would then be:

  <module name="">
      <module name="Module">
          <class name="Class">
              ... perhaps some info on inherits, members &c ...
              <doc>
                  ... the documentation of the class Module.Class ...
              </doc>
          </class>
      </module>
  </module>

The <module name=""> refers to the top module. That element, and its 
child <module name="Module">, exist only to put the <class name="Class"> 
in its correct position in the hierarchy. So we can divide the elements
in the XML file into two groups: skeletal elements and content elements. 

Each actual module/class/whatever in the Pike hierarchy maps to at most
one content element, however it can map to any number of skeletal elements. 
For example, the top module is mapped to a skeletal element in each XML
file extracted from a single source file. To get from state 2 to state 3 
in the list above, all XML files are merged into one big. All the elements
that a module or class map to are merged into one, and if one of those 
elements contains documentation (=is a content element), then that
documentation becomes a child of the merger of the elements.

======================================================================
c) Grouping
----------------------------------------------------------------------

Classes and modules always appear as <module> and <class> elements. 
Methods, variables, constants &c, however, can be grouped in the 
source code:

  //! Two variables:
  int a;
  int b;
  
Even a single variable is considered as a group with one member. 
Continuing the example in the previous section, suppose that Module.Class
has two member variables, a and b, that are documented as a group:

  <module name="">
    <module name="Module">
      <class name="Class">
          ... perhaps some info on inherits, members &c ...

        <docgroup homogen-type="variable">
          <variable name="a"><type><int/></type></variable>
          <variable name="b"><type><int/></type></variable>
          <doc> 
            ... documentation for Module.Class.a and Module.Class.b ...
          </doc>
        </docgroup>

        <doc>
           ... the documentation of the class Module.Class ...
        </doc>
      </class>
    </module>
  </module>

If all the children of a <docgroup> are of the same type, e.g. all are 
<method> elements, then the <docgroup> has the attribute homogen-type 
(="method" in the example). If all the children have identical name="..." 
attributes, then the <docgroup> gets a homogen-name="..." attribute aswell.

The <docgroup> has a <doc> child containing the docmentation for the other 
children of the <docgroup>. An entity that cannot be grouped (class, module,
enum), has a <doc> child of its own instead.

======================================================================
d) Pike entities
----------------------------------------------------------------------

Pike entities - classes, modules, methods, variables, constants, &c, have some
things in common, and many parts of the xml format are the same for all of
these entities. All entities are represented with an XML element, namely one 
of:

  <class>
  <constant>
  <enum>
  <inherit>
  <method>
  <modifier>    
  <module>
  <typedef>
  <variable> 

The names speak for themselves, except: <modifier> which is used for modifier 
ranges: 

  //! Some variables:
  protected final {
    int x, y; 

    string n;  
  }

A Pike entity may also have the following properties:

  Name - Given as a name="..." attribute:
    <variable name="i"> ... </variable>

  Modifiers - Given as a child element <modifiers>:
    <variable name="i">
      <modifiers>
        <optional/><static/><private/>
      </modifiers>
      ...
    </variable>
  If there are no modifiers before the declaration of the entity, the
  <modifiers> element can be omitted.

  Source position - Given as a child element <source-position>:
    <variable name="i">
      <source-position file="/home/rolf/hejhopp.pike" first-line="12"/>
      <modifiers>
        <optional/><static/><private/>
      </modifiers>
      ...
    </variable>
  The source position is the place in the code tree where the entity is 
  declared or defined. For a method, the attribute last-line="..." can be
  added to <source-position> to give the range of lines that the method 
  body spans in the source code.

And then there are some things that are specific to each of the types of
entities:

<class>
   All inherits of the class are given as child elements <inherit>. If there
   is doc for the inherits, the <inherit> is repeated inside the appropriate 
   <docgroup>:

     class Bosse { 
       inherit "arne.pike" : Arne; 
       inherit Benny;   

       //! Documented inherit
       inherit Sven;
     }

     <class name="Bosse">
       <inherit name="Arne"><source-position ... />
                            <classname>"arne.pike"</classname></inherit>
       <inherit><source-position ... />
                <classname>Benny</classname></inherit>
       <inherit><source-position ... />
                <classname>Sven</classname></inherit>
       <docgroup homogen-type="inherit">
         <doc>
           <text><p>Documented inherit</p></text>
         </doc>
         <inherit><source-position ... />
                  <classname>Sven</classname></inherit>
       </docgroup>
       ...
     </class>
 
<constant>
   Has a name attribute. Contains optional <type> and
   <source-position> child elements.

<enum>
   Works as a container. Has a <doc> child element with the documentation of
   the enum itself, and <docgroup> elements with a <constant> for each enum
   constant. So:

     enum E
     //! enum E
     {
       //! Three constants:
       a, b, c,
     
       //! One more:
       d
     }

   becomes:

     <enum name="E">
         <doc><text><p>enum E</p></text></doc>
         <docgroup homogen-type="constant">
             <doc><text><p>Three constants:</p></text></doc>
             <constant name="a"/>
             <constant name="b"/>
             <constant name="c"/>
         </docgroup>
         <docgroup homogen-name="d" homogen-type="constant">
             <doc><text><p>One more:</p></text></doc>
             <constant name="d"/>
         </docgroup>
     </enum>
     
   Both the <enum> element and the <constant> elements could have 
   <source-position> children, of course.

<inherit> 
   The name="..." attribute gives the name after the colon, if any. The name
   of the inherited class is given in a <classname> child. If a file name is
   used, the class name is the file name surrounded by quotes (see <class>).

<method>
   The arguments are given inside an <arguments> child. Each argument is 
   given as an <argument name="..."> element. Each <argument> has a <type>
   child, with the type of the argument. The return type of the method is 
   given inside a <returntype> container:

     int a(int x, int y);
  
     <method name="a">
       <arguments>
         <argument name="x"><type><int/></type></argument>
         <argument name="y"><type><int/></type></argument>
       </arguments>
       <returntype><int/></returntype>
     </method>
                 
<modifier>
   Works as a container ... ???

<module>
   Works just like <class>.

<typedef>
   The type is given in a <type> child:

     typedef float Boat;
     
     <typedef name="Boat"><type><float/></type></typedef>

<variable>
   The type of the variable is given in a <type> child:
     
     int x;

     <variable name="x"><type><int/></type></variable>

======================================================================
e) Pike types
----------------------------------------------------------------------

Above we have seen the types int and float represented as <int/> and <float/>.
Some of the types are complex, some are simple. The simpler types are just on
the form <foo/>:

  <float/>
  <mixed/>
  <program/>
  <void/>

The same goes for mapping, array, function, object, multiset, &c that have 
no narrowing type qualification: <mapping/>, <array/>, <function/> ...

The complex types are represented as follows:

array
   If the type of the elements of the array is specified it is given in a
   <valuetype> child element:

     array(int) 
 
     <array><valuetype><int/></valuetype></array>

   If the length of the array is specified it is given in a <length>
   child element:

     array(3)

     <array><length>3</length></array>

function
   The types of the arguments and the return type are given (the order 
   of the <argtype> elements is significant, of course):

     function(int, string: mixed)

     <function>
       <argtype><int/></argtype>
       <argtype><string/></argtype>
       <returntype><mixed/></returntype>
     </function>
   
int
   An int type can have a min and/or max value. The values can be numbers or
   identifiers:

     int(0..MAX)
   
     <int><min>0</min><max>MAX</max></int>

string
   A string type can have a numerical min and/or max character value. 
   The values can be numbers or identifiers:

     string(0..255)

     <string><min>0</min><max>255</max></string>

mapping
   The types of the indices and values are given:

     mapping(int:int)

     <mapping>
       <indextype><int/></indextype>
       <valuetype><int/></valuetype>

multiset
   The type of the indices is given:
  
     multiset(string)

     <multiset>
       <indextype><string/></indextype>
     </multiset>

object 
   If the program/class is specified, it is given as the text child of 
   the <object> element:

     object(Foo.Bar.Ippa)

     <object>Foo.Bar.Ippa</object>

Then there are two special type constructions. A disjunct type is written
with the <or> element:

  string|int

  <or><string/><int/></or>

An argument to a method can be of the varargs type:

  function(string, mixed ... : void)

  <function>
    <argtype><string/></argtype>
    <argtype><varargs><mixed/></varargs></argtype>
    <returntype><void/></returntype>
  </function>

======================================================================
f) Other XML tags
----------------------------------------------------------------------

p
   Paragraph.

i
   Italic.

b
   Bold.

tt
   Terminal Type.

pre
   Preformatted text.

code
   Program code.

image
   An image object. Contains the original file path to the image. Has the
   optional attributes width, height and file, where file is the path to
   the normalized-filename file.

======================================================================
g) XML generated from the doc markup
----------------------------------------------------------------------

The documentation for an entity is put in a <doc> element. The <doc> element 
is either a child of the element representing the entity (in the case of 
<class>, <module>, <enum>, or <modifiers>) or a child of the <docgroup> that
contains the element representing the entity.

The doc markup has two main types of keywords. Those that create a container
and those that create a new subsection within a container, implicitly closing
the previous subsection. Consider e.g.:

  //! @mapping
  //!   @member int "ip"
  //!     The IP# of the host.
  //!   @member string "address"
  //!     The name of the host.
  //!   @member float "latitude"
  //!   @member float "longitude"
  //!     The coordinates of its physical location.
  //! @endmapping

Here @mapping and @endmapping create a container, and each @member start a 
new subsection. The two latter @member are grouped together and thus they
form ONE new subsection together. Each subsection is a <group>, and the 
<group> has one or more <member> children, and a <text> child that contains
the text that describes the <member>s:

  <mapping> 
      <group>
          <member><type><int/></type><index>"ip"</index></member>
          <text>
            <p>The IP# of the host.</p>
          </text>
      </group>
      <group>
          <member><type><string/></type><index>"address"</index></member>
          <text>
              <p>The name of the host.</p>
          </text>
      </group>
      <group>
          <member><type><float/></type><index>"latitude"</index></member>
          <member><type><float/></type><index>"longitude"</index></member>
          <text>
              <p>The coordinates of its physical location.</p>
          </text>
      </group>
  </mapping>

Inside a <text> element, there can not only be text, but also a nested level
of, say @mapping - @endmapping. In that case, the <mapping> element is put in 
the document order place as a sibling of the <p> that contain the text:

  //! @mapping
  //!   @member mapping "nested-mapping"
  //!     A mapping inside the mapping:
  //!     @mapping
  //!       @member string "zip-code"
  //!         The zip code.
  //!     @endmapping
  //!     And some more text ... 
  //! @endmapping
    
  becomes:

  <mapping>
    <group>
      <member><type><mapping/></type><index>"nested-mapping"</index></member>
        <text>
          <p>A mapping inside the mapping:</p>
          <mapping>
            <group>
              <member><type><string/></type><index>"zip-code"</index></member>
              <text>
                <p>The zip code.</p>
              </text>
            </group>
          </mapping>
          <p>And some more text ...</p>
        </text>
      </group>
  </mapping>

Inside the <p> elements, there may also be some more "layout-ish" tags like 
<b>, <code>, <tt>, <i>, needed to make the text more readable. Those tags are
expressed as @i{ ... @} in the doc markup. However there are no <br>. A 
paragraph break is done by ending the <p> and beginning a new. A </p><p> is 
inserted for each sequence of blank lines in the doc markup:

  //! First paragraph.
  //! 
  //! Second paragraph.
  //! 
  //! 

  becomes:

  <p>First paragraph.</p><p>Second paragraph.</p>

Note that the text is trimmed from leading and ending whitespaces, and there 
are never any empty <p> elements.

In the example above the keyword `@mapping' translated into <mapping>, whereas
the keyword `@member string "zip-code"' translated into:
  <member><type><string/></type><index>"zip-code"</index></member>

The translation of keyword->XML is done differently for each keyword. How it
is done can be seen in lib/modules/Tools.pmod/AutoDoc.pmod/DocParser.pmod. Most
keywords just interpret the arguments as a space-separated list, and put their
values in attributes to the element. In some cases (such as @member) though, 
some more intricate parsing must be done, and the arguments may be complex 
(like Pike types) and are represented as child elements of the element. 

======================================================================
h) Top level sections of different Pike entities.
----------------------------------------------------------------------

In every doc comment there is an implicit "top container", and subsections can
be opened in it. E.g.:

  //! A method.
  //! @param x
  //!   The horizontal coordinate.
  //! @param y 
  //!   The vertical coordinate.
  //! @returns
  //!   Nothing :)
  void foo(int x, int y)

becomes:

  <docgroup homogen-name="foo" homogen-type="method">
      <doc>
          <text><p>A method.</p></text>
          <group>
              <param name="x"/>
              <text><p>The horizontal coordinate.</p></text>
          </group>
          <group>
              <param name="y"/>
              <text><p>The vertical coordinate.</p></text>
          </group>
          <group>
              <returns/>
              <text><p>Nothing :)</p></text>
          </group>
      </doc>
      <method name="foo">
         ......
      </method>
  </docgroup>
  
Which "top container" subsections are allowed depends on what type of entity is
documented:

ALL      -  <bugs/>
            <deprecated> ... </deprecated>
            <example/>
            <note/>
            <seealso/>

<method> -  <param name="..."/>
            <returns/>
            <throws/>