Sunday, June 23, 2013

Coding Standards - The Wright Way - Formatting

Formatting is a lot more important than it seems.  If you don't believe that, then!justLOOKat,this-mess?here.  You get the point, I trust.

Most languages in use today are C-like in structure.  Block delineators like {} or BEGIN...END or some such thing, explicit end-of-line markers (or implied ones) so that statements can span multiple lines, and indentation allowed.  For languages that don't have any of those things, you're on your own, but as a bit of advice, I'd advise you to move into the 21st century, for dog's sake.

Bracing Style

Now the management of block delineators is called 'bracing style', and there are lots of them.  At least a half-dozen that I know of personally that are in use, and another half-dozen that I've heard of but never actually seen in use beyond sample code.  There is a reason many fall by the wayside.  Lots of them are junk.  A few stick around, though.

Of the kinds I know and have actually seen used here is a sample:

Whitesmith


if( <expression> )
    {
    <statement>;
    <statement>;
    }


TB (or 1TBS or One True Brace or Kernighan and Ritchie or K&R)

if( <expression> {
    <statement>;
    <statement>;
}


PICO

if{ <expression> )
{ <statement>;
<statement>; } 

GNU

if( <expression> )
    {
        <statement>;
        <statement>;
    }


Allman or Block

if( <expression> )
{
    <statement>;
    <statement>;
}


Most of them are just "I do it that way" styles - they have no particular benefit.  Many of them are designed to conserve one thing - screen real-estate.

On conserving screen real-estate, that was a valid concern back in 'the day' when K&R were coding.  Even a high-end monitor had 24 visible text lines to play with and a line length of 80 characters per.  In a VI editor, 2-4 of those were chewed up by the editor itself, leaving 20 for actual coding.  Screen real-estate was a big deal.  Nowadays, you've got upwards of 40 lines available in most code editors with something in excess of 120 characters per line - more if you've got good vision and can use smaller fonts.  If you still feel a deep-seated need to conserve screen real-estate, then whatever you're doing, you're doing it wrong.

I've always used the Allman/Block style myself, for one very simple reason.  It's nice and sparkling clear.  I used it even back in the days when I was dealing with 24x80 monitors - yes, screen real-estate was an issue, but to me the clarity of the code was a more important consideration.  That opinion has not changed.

I've tried (and sometimes had to use) the K&R/OTB style popularized in Javascript and while I can see the intent of the original use, it is no longer important in that respect and it should have died out a long time ago from lack of use.  The rest are either screen-conserving to the point of being unclear, or unnecessarily baroque.

Almost any editor will natively support block bracing, most do that by default, and the ones that don't can be tweaked that way or used that way with minimal effort.  So use block bracing:

Function/Method Bodies (Including Accessors/Properties)

Function/Method bodies should ALWAYS be code blocks, and thus ALWAYS be enclosed in braces, regardless of length.  That is, none of this:

void main()
   MyFunction(); 


instead use:

void main()
{
    MyFunction();
}

Control Constructs

In simple control constructs, if the body of the construct has only one line, and that line is not itself a control construct, then braces are optional, but if the control constructs are nested, then brace them individually - if you don't, it is very easy to construct control trees where the actual flow is not evident from the indentation.  So:

if( <expression> )
    <statement> 


or

if( <expression> )
{
    <statement>
}


or

if( <expression> )
{
    while( <expression> )
        <statement>
}


BUT NOT

if( <expression> )
    while( <expression> )
        <statement> 


In the simplest case, the latter will work, but at even low levels of complexity, the precedence can be confused or lost and those can be fiendishly difficult bugs to track down, especially since there is generally nothing obviously wrong with the code unless you examine it very carefully.

Special Cases - Switch Statements

In case clauses of a switch() statement, you may use braces in individual case clauses or not, at your discretion.  Clarity is important, but there is no need to go nuts.  Switch...case statements are by their nature fairly clear already.  I would recommend that if the body of the case clause is longer than five lines or so, then making a code block is probably a good idea, so use braces.  If it's only two or three lines, why bother unless you think it's important (if for example there is a comment)?  But the switch() statement should always be a code block, so this is fine:

switch( nVar )
{
    case 0:
       return 9;
    case 55:
       print( "I'm a duck" );
       break;
    case 99:
    {
       print( "I'm a moose" );
       <other lines of code>
       break;
    }
    default:
    {
       // Nothing to do
     
       return -1;
    }
}


As shown, indent the case statements one tab from the switch, and the body of each case statement one tab from the case.

To Tab or Not To Tab

For indenting, I've actually had university students say that their professors instructed them to use spaces for indenting, or to use tabs but set the editor to convert tabs to spaces.  This reinforces my opinion that university professors really need to get out of the classroom more.

That was a valid procedure 'way back in ancient times when not all editors expanded tabs correctly.  I haven't seen an editor like that in 20 years.  Set your editor to use tabs, and to KEEP THE TABS.

Why?  Remember that rap about multiple coders?  If you keep the tabs, then if I like to indent 4 characters, Fred likes to use 2 and Jane likes to use 8, each of our editors will expand the tabs as far as we have the editor set to expand them, and we'll all see what we want to see regardless of who wrote the code.  At the end of the day, the code will all be indented uniformly in the proper way according to local settings, and you won't wind up with any of this junk:

public void MyFunc( int xnParameter )
{
  nReturnValue = 0;

      switch( xnParameter )
    {
                case 0:
        nReturnValue = 42;
             break;
      case 1:
                                   nReturnValue = 16;
                            break;
                                           case 2:
                    nReturnValue = 88;
                break;
          default:
                                  nReturnValue = -1;
                         break;
                   }

         return nReturnValue;
  } 


'Nuff said.

Spacing

Now, on the use of spacing in statements.  There are a number of places where this is used:

In brackets ():
  • If the brackets are empty, place them together: void MyFunction()
  •  If the brackets are not empty, then a space after the opening brace and before the closing brace: ( int xnParameter )
  • If the brackets are enclosing a type-specification (as in a type-cast), no spaces (to make it visibly distinct): (float)nValue
For square brackets, as are commonly used for array dimensions, the same rules apply.  Together if empty or spaces after the opening square bracket and before the closing square bracket if they are not empty: int[] anMyArray = new int[ 2 ];

For angle brackets, as used in C++ and C# for Template/Generic types, no spaces (to make it visibly distinct): void MyGenericFunction<T>( T xoParameter )

If a comma or semi-colon is used as a separator, no space before it, one space after it:

void MyGenericFunction<T, P>( T xoParam1, P xoParam2 )

If a comma is used as a thousands separator, no space before or after it, just like regular people: 1,000,000

Periods used as decimal places or type/member separators, no space before or after:

System.Web.HttpCookie 

or

1.05

(In French the use of commas and periods is generally reversed in numbers, so the same rules apply anyway:  1.000,50)

If there is an explicit end-of-line designator, no space before it:  int nMyVar = 0;

No leading or trailing spaces in lines.  If there is whitespace leading, it should be a tab, and there should be none trailing.  It's wasted space that will only ultimately force some schmoe to have to reformat the code eventually, because it'll screw up a cut/paste operation.

If you use Labels (and they should seldom be used, but I'm realistic - sometimes they're the best solution), then place the label flush against the left-hand-side of the page.

Put spaces before and after all operators and use brackets freely in expressions to make precedence clear even if it is implied and especially if the desired precedence is different from the default precedence.  Assume nobody knows the implicit precedence by heart:

int nMyVar = 5 + ( 3 - ( 18 / 2 ) + 5 );

The exception are auto-increment and auto-decrement operators, which need not have a space between them and their associated variable:  nValue--;


In-line Void Function Bodies

There is much debate on this.  Some say "NO DAMN WAY" and others say "Why not?"  I say, go ahead if you can keep it clear, but be careful, so:

public MyList: List<int>
{
    public void MyList(): base() {}
    public void MyList( int xnCapacity ): base( xnCapacity ) {}
    public void MyList( IEnumerable xanValue ): base( xanValue ) {}
}

is clear enough - the constructors call the base constructors in the normal way and do nothing else.

But in some cases, it is very bad.  For example in a catch() statement, if you use an empty code block you are swallowing an exception.  If it is manifestly clear why you are doing it (ThreadAbortException), then no problem, but if it's even possible to be unclear, then at least EXPLAIN YOURSELF.

You should never swallow an exception willfully - it's Bad Juju.  If you've got a reason, make sure that if somebody else comes along, or you are looking at it a year from now, you've left a note that gives your reason so you don't put in some handling just because it's pro forma and, because you didn't know or forgot the original reason, you introduce a problem that didn't exist before.

try
{
}
catch( ThreadAbortException ) {}
catch( Exception )
{
   // I'm not doing anything with this exception because this is just an example.
}

or even use a hybrid:

try
{
}
catch( ThreadAbortException ) { /* Automatically thrown on Abort() - no need to handle */ }
catch( Exception )
{
   // I'm not doing anything with this exception because this is just an example.
}

And, of course, if you're just going to re-throw, you don't have to really explain that (except perhaps to say why you have the catch() there in the first place):

try
{
}
catch( Exception ) { throw; }  // Why??

but this is obvious enough:

try
{
}
catch( Exception xoError )
{
    Log( xoerror );
    throw xoError;
}


Commenting

Some people get really rabid and frothing about commenting.  Let me be blunt.  If you're coding well, what you are doing will be evident even to juniors - most of the time.  Commenting every line of code is just wasting time, and those who espouse such practice do it out of dogmatic adherence to ancient strictures that have long since fallen away.  Back in the days when an identifier could only hold 4 letters, commenting was important, because without it most code was nearly unintelligible.  Now we have long identifiers, and it's not as important, since the identifiers can be self-describing.

Having said that, if what you are doing is even a little unclear or tricky, or your algorithm is unusual or complicated, then explain yourself - step-by step if that's what it takes to be clear.  If it's a really weird algorithm, you might want a fairly large block explaining it in general PLUS step-by-step comments as you code the algorithm to describe the steps as they occur.  What you require to be clear is what is right.  Not what some schmoe tells you because he's perpetually stuck in the '60s.

On commenting functions, I generally go with the principle that all public members should at least have cursory commenting to describe function, parameters, and return values.  the more complicated the function, the more commenting is appropriate.  For private or protected members, comment if you deem it necessary.  However, if you're using standard Form event handlers in C# for example, then:

private void MyOKButton_Click( object xoSender, EventArgs xoArguments ) { ... }

doesn't leave much to the imagination, does it?  Any comment you add to that will only be reiterating what you have so thoughtfully coded quite clearly, unless that handler looks that way artificially and it doesn't actually do what it clearly appears to be doing - handling the Click event of a Button called "MyOKButton".

In Visual Studio, there is an auto-documenting comment feature that can be turned on that is very handy and I highly recommend using it.  In that case, make sure that you comment anything that is not clear from the FUNCTION PROTOTYPE - operating on the assumption that the prototype may be all that the person can see - they might be reading the documentation generated from the comment.

In that environment, also keep in mind that your function summaries are what appear as the tool-tip when the built-in IntelliSense system tries to supply automatic help for your function - so take advantage of it.  In the long run it will save you and your associates much time for very little pain.  Such systems may appear in other IDEs as well - where available, take advantage of them.  Over-commenting has never killed anybody, and is preferable to under-commenting.


Copyright ©2013 by David Wright. All rights reserved.

No comments:

Post a Comment