Sunday, June 23, 2013

Coding Standards - The Wright Way - Naming Conventions

Now we're into it.  Lots of people get really firmly attached to their particular naming convention, and generally not for a very good reason.  To a degree, I'm guilty of that, too.  I started using Hungarian Notation a long time ago, and I still favor it.  However, what I use now is not really Hungarian Notation.  It's a variant that has seen a good many tweaks and modifications over the years, some of my own invention, some ideas that I got from others.

Why Hungarian Notation?  Well, because even though most languages are strongly typed, I like it for the following reasons:


  1. I don't have to go back to the declaration to find out what the thing is - I can see that.
  2. If I have several variables that refer to the same data but in different forms, I don't have to come up with contrived and baroque ways to distinguish them.  I can distinguish them literally by type.  That means that the association of the data is retained in the name, and not lost because I had to jump through a dozen hoops because it changes data type a couple of times.
  3. It makes it clear from the code whether up-casting or down-casting can be implicit or must be explicit.
  4. With things like IntelliSense, if I've forgotten what the thing is and only know vague particulars about it, I can still quickly narrow the scope of candidates to a very small field - by the prefix.
  5. If you name variables and such with Hungarian Notation, you will absolutely NEVER have a name-collision with your compiler/programming language, because NONE OF THEM USE IT.
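
To make points 2 and 3 concrete, here is a minimal sketch in C++ using the prefixes described later in this post (s for string, n for signed integer, r for real, x for a read-only input).  The names AgeForms and ParseAge are made up for the illustration; they are not from any library.

```cpp
#include <string>

// The same datum (an age) in three forms.  The shared base name "Age"
// keeps the association; the prefix says which form you are holding.
struct AgeForms
{
    std::string sAge;   // as typed by the user
    int         nAge;   // parsed to a signed integer
    double      rAge;   // widened to a real for arithmetic
};

// Hypothetical helper: converts the textual form into the numeric forms.
AgeForms ParseAge( const std::string& xsAge )
{
    AgeForms oForms;
    oForms.sAge = xsAge;
    oForms.nAge = std::stoi( xsAge );   // s -> n: must be an explicit conversion
    oForms.rAge = oForms.nAge;          // n -> r: implicit widening is fine
    return oForms;
}
```

No sAgeText, nAgeNumber or rAgeAsDouble required: the prefix already carries that information.
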
Now a lot of people pooh-pooh Hungarian Notation simply because it is Hungarian Notation.  "No real programmer uses that any more!  Harrumph!" Bullshit.  What works works because it works.  I don't care if Charles Babbage came up with the idea.  If it's good it's good.   And if you're one of those people who insists that new is always better, I have exactly two words for you:  "New Coke".

One of the more baroque objections to Hungarian Notation that I've come across is, oddly, that what people actually use is not really Hungarian Notation.  That is not a reason to avoid Hungarian Notation.  It's a reason to avoid the imitations that share the name but violate pretty much all of its principles.

What I do notice frequently about the above group, though, is that even though they firmly pooh-pooh Hungarian Notation, they frequently cobble together contrived and inefficient variants of it on the spot in their code because they find themselves in need of just such a mechanic!  Instead of using ad hoc mechanisms that are invented on the spur of the moment to meet a particular need, isn't it better just to use the thing pro forma and have the problem solved before it occurs?  Clarity, remember?

Basic Rule

Now, the Basic Rule of naming conventions.  This should never be disregarded, and if you choose to do so, you do so at your own peril.  Many compilers will enforce it, so they won't give you much choice, but some are a little more lenient, and in those cases, you have to impose the discipline on yourself.  So here is the rule:
  1. Never use a character in a name that has a special meaning in any other context.
I really shouldn't have to say that, but I see that happening every bloody day and it sticks in my craw something awful.

Recommendations

Now, besides that, there are a few tried and true recommendations.  Here they are:
  1. If your compiler or language imposes all-upper-case or all-lower-case, then you're stuck with it.  If neither is true, then you have the freedom to choose, so use BOTH cases.  I'm talking about Title-case, or Pascal-case as it's sometimes called. That is, if the thing you are naming is a first name, then you should use a variable name like FirstName.
  2. Do Not Ever Use Camel-Case for Anything, Period.  I don't know who came up with that, but it was a dumb idea then and it hasn't aged well.  Camel case is like Title-case but the first character is not capitalized: firstName.  Ugh.  I will explain that in more detail later, but for now, just take it as read.
  3. Lay off the underscores.  Lots of people use them, and it's a bad idea for one very simple reason.  That character is in an awkward place on the keyboard, and there are a dozen ways to arrange your life so it is not needed, so why add the pain when there is absolutely no gain?
  4. If it is possible, restrain your naming convention to using alphabetic characters only.  That's just for the sake of speed.  All of this comes together, I promise.
  5. Don't use single-character names unless the name is naming a variable that is being used in a mathematical equation and that's how the variable appears in the equation.  So if you're calculating the conversion of mass to energy, you can use rE = rM * rC * rC; (in C-family languages ^ is not an exponent operator), but if you're naming a generic index in a loop, then call it nIndex, not nI.
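
Recommendation 5 might look like this in C++.  This is only a sketch: the function names EnergyFromMass and SumValues are invented for the example, and the prefixes (r, n, a, x) are the ones defined further down.

```cpp
// Single-letter names are fine when they mirror the equation E = m * c^2.
double EnergyFromMass( const double rM )
{
    const double rC = 299792458.0;    // speed of light in metres per second
    const double rE = rM * rC * rC;   // '^' would be XOR here, so multiply instead
    return rE;
}

// A generic loop index gets a real name: nIndex, not nI.
int SumValues( const int* xanValues, const int nCount )
{
    int nSum = 0;
    for ( int nIndex = 0; nIndex < nCount; ++nIndex )
    {
        nSum += xanValues[ nIndex ];
    }
    return nSum;
}
```

Everything here is typed with letters only, apart from the punctuation the language itself demands.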

Name Composition

How do you compose a name?  Well, what does the thing represent, or do?  That's the name.  So FirstName, LastName, FullName and Age are all OK.  A function that combines FirstName and LastName into a complete name might be declared as string MakeFullName( string xsFirstName, string xsLastName );.  Some people get caught up in rigid Noun-Verb orders and nonsense like that.  I say if you have Nouns and Verbs, then put them in an order that makes sense.  If under the circumstances FullNameGet makes more sense than GetFullName, then go with it.  Rigid rules, in my experience, ultimately lead to nonsense names that have to be nonsense because that's what the rule says they have to be.  Rules aren't meant for that kind of thing.
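
For reference, a C++ rendering of that declaration could look like the sketch below.  Joining the two parts with a single space is my assumption; the point is the naming, not the formatting of names.

```cpp
#include <string>

// 'x' marks a read-only input parameter, 's' marks a string.
std::string MakeFullName( const std::string& xsFirstName, const std::string& xsLastName )
{
    return xsFirstName + " " + xsLastName;
}
```

The name says what the function does, and the parameter names say both what they hold and that they come back untouched.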

You may choose the order based entirely on an artificial environmental constraint.  Perhaps your IDE keeps a list of classes, properties, and methods and by default sorts them in canonical alphabetic order in that list.  For the sake of convenience, you might choose to go noun-verb so that things are associated by name in that list, or verb-noun so that they are associated by function.  Either is a valid reason. Do what works best.  Just one caveat, though.  If you're in an environment where many people work on the same code-base, then get together with them and arrive at a consensus about how you want that list to sort.  Everybody will have to deal with it, so everybody should get a shot at airing their concerns.

Having said that, how about using abbreviations?  Well, if you can safely use the abbreviation without losing clarity, then by all means save yourself the typing time.  I'm not going to stand here and demand that you name a variable SizeInMilliMeters instead of MmSize just because I slapped an arbitrary rule on you that said, "You can't use abbreviations".  That's nonsense.  If you don't know that 'mm' is short for Millimeter then you've got bigger problems than I'm inclined to help you with.


Bringing It Together

Now, you're going to say that it looks like I'm advocating the exclusive use of Title-case.  You got it right the first time!  Move to the front of the class.  If you have the freedom of mixed-case, then there is no reason to use anything else.  TitleCase makes things clear to read and it's really fast to type.  Couple that with Hungarian Notation, and what you've got is a naming convention that depends virtually entirely on the alphabet alone, is fast and easy to type, clear and expressive.  What else do you really need?  Nothing.

Now, what should you name like this?  Anything not covered below, so class/struct declarations, method definitions, delegate declarations, enumeration definitions, and event definitions.  Now there is a certain amount of leeway, of course.  If you are writing a container class and you want the Count and Capacity properties to be Count and Capacity for the sake of consistency with other container classes, I'm not going to rag on you about 'not following the standard'.  Consistency contributes to clarity, too, you know.  And while it would be nice if everybody agreed on something, that's not happening at least in my lifetime, so live and let live.

One benefit of that leniency, however, is that if you have a container class, for example, and you define a property Count that does exactly what Count does in every other container class, but you need another, similar one that is almost but not exactly like Count, you can provide a different one.  Count for the property that behaves just like Count, and nCount for the special variant that does something just slightly different, thus gaining the benefit of both while sacrificing neither.

Camel-case and Loathing

I mentioned I would explain my loathing of CamelCase.  One very popular coding standard espouses the following ridiculous doctrine:  Use CamelCase for fields and parameters, and TitleCase for Properties and Variables.  Why is that ridiculous?  Think of it.  You have a class.  It has a field which convention requires you to name 'firstName'.  It has a property called, by convention, 'FirstName' that reads the field out as stored, but when a value is passed in, the property code validates the contents of the value before it assigns it to the field.  This is not the least bit unlikely as a scenario - in fact it is quite likely.

What you have now is this:  Code where whether or not the contents of that field get validated depends on whether or not you accidentally miss the shift key.  That's an incredibly easy mistake to make and very common - I do it a dozen times a day, and I'm pretty good.  My problem is that I type really fast, so sometimes I don't actually hit the keys in the order my brain intended, so maybe I intended to Hold-Shift-Press-F, but what my hands actually did was Press-F-Hold-Shift, and voila, I have a lower-case 'f' where I meant to type an upper-case 'F'.

The problem now being that, if we're talking about that firstName field up there, now the value assigned to it won't be validated.  That bug might sit in that code for ten months and never show up once.  Not until an invalid value is passed in.  Now you will have invalid data lurking in your program that every other stitch of code assumes is good data - because it's supposed to have been validated, after all.  Yet, looking at the code, it's really hard to see that there's anything wrong with it.  And even now that you've got bad data in there it might not actually throw up any red flags!  Your program might pass through bad data for years without complaint, and it's entirely possible that the bug is detected, if it ever is detected, completely by accident.  That's B.A.D. as in Broken As Designed.

You might spend hours looking for that bug, especially if it lurks a long time before it manifests itself.  The problem will be worse if somebody else is debugging it, because they don't know that you intended to use the property instead of the field - something that, after ten months or so, even you won't be sure of anymore.  It's a bug that should never have occurred in the first place, and the coding standard actually makes it possible. Coding standards should not encourage bugs that otherwise would not exist.  In fact, that is precisely unlike anything a coding standard should do.
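
Here is a rough sketch of that trap.  C++ has no properties, so the ValidatedName class below is a hypothetical stand-in that emulates a C# property setter with an overloaded assignment operator, and the "must not be empty" rule is an assumed validation.  The Person struct is deliberately named the way the criticized standard demands.

```cpp
#include <stdexcept>
#include <string>

// Stand-in for a C#-style property setter: assignment runs validation first.
class ValidatedName
{
public:
    explicit ValidatedName( std::string& msTarget ) : fsTarget( msTarget ) {}

    ValidatedName& operator=( const std::string& xsValue )
    {
        if ( xsValue.empty() )
        {
            throw std::invalid_argument( "first name must not be empty" );
        }
        this->fsTarget = xsValue;
        return *this;
    }

private:
    std::string& fsTarget;   // the backing field this "property" guards
};

// Field and "property" differ only in the case of the first letter.
struct Person
{
    Person() : FirstName( firstName ) {}
    Person( const Person& ) = delete;              // the reference inside FirstName
    Person& operator=( const Person& ) = delete;   // must not outlive or switch owners

    std::string   firstName;   // raw field: no checks at all
    ValidatedName FirstName;   // "property": validates on assignment
};
```

With a Person called oPerson, the statement oPerson.FirstName = ""; throws, while oPerson.firstName = ""; compiles and quietly stores the bad value.  One missed shift key is the entire difference.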

Hungarian Notation - My Convention

I mentioned that I favor Hungarian notation, but I also noted that what I use now is not like the classical version.  Languages evolve.  Data types evolve.  Coding standards should evolve, too.  So what I use is based on Hungarian Notation, but different in many respects.  I'm not going to say this is the One True Way - I don't think like that.  If you've got a better idea, then go ahead.  Just make sure you do it for a reason, not just because you demand that you have to be different.  Here is the summary:

The prefixes for types that I use are fairly extensive, because there are lots of new types and lots of new ways to use them.  So I'll go by category, starting with first-level modifiers.  These go at the immediate left of the name of the object.

Integral Types

  • Signed Integers:  n   (nIndex)  (from iNteger)
  • Unsigned Integers: u  (uByte) (from Unsigned integer)
  • Real Numbers: r  (rPi) (from Real number)
  • Characters: c  (cChar) (from Character)
  • Pointers: p (pPointer) (from Pointer)
Integral types may optionally be modified with the byte-length of the type, so Signed Integers can be n, or n1 (byte), n2 (short), n4 (int), or n8 (long).  Unsigned Integers likewise.  Reals can be r, r4, r8, or r16.  Chars can be c, c1, c2, or c4.  Pointers can be p, p4 or p8.  Sometimes it is useful to make the distinction, sometimes it is not.  Do whichever is clear under the circumstances.  
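
As an illustration, the sized prefixes line up naturally with the fixed-width types in C++.  The struct and its member names are invented for the example, and the two static_assert lines spell out an assumption (4-byte float, 8-byte double) that holds on common platforms but is not guaranteed by the language.

```cpp
#include <cstdint>

static_assert( sizeof( float ) == 4, "r4 assumes a 4-byte float" );
static_assert( sizeof( double ) == 8, "r8 assumes an 8-byte double" );

// Prefix plus byte length, mapped onto fixed-width types.
struct SampleHeader
{
    std::int8_t   n1Flag;        // signed, 1 byte
    std::int16_t  n2Offset;      // signed, 2 bytes
    std::int32_t  n4Count;       // signed, 4 bytes
    std::int64_t  n8Total;       // signed, 8 bytes
    std::uint8_t  u1Byte;        // unsigned, 1 byte
    std::uint32_t u4Mask;        // unsigned, 4 bytes
    float         r4Scale;       // real, 4 bytes
    double        r8Precise;     // real, 8 bytes
    char          c1Letter;      // character, 1 byte
    char32_t      c4CodePoint;   // character, 4 bytes on common platforms
};
```

In code where the width never matters, plain nCount or rScale says enough.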

You will recognize that in some languages there are identities to be found.  For example, in C++, u1 can be the same as c1, u2 can be the same as c2, and u4 can be the same as c4.  Use whatever clearly expresses your intent.  For example, say you have a string, which you wish to convert to an array, which you then want to stream as a byte-stream (for an admittedly contrived example).  You might convert sString to acString, and then to au1Bytes.  Internally, acString and au1Bytes might be identical structures.  In fact, in C++ there is a good chance that all three are identical structures.  That's fine.  If they are, the optimizer is free to eliminate whatever redundancy it can prove is safe to eliminate.  As a programmer, you're interested in CLARITY.  If the optimizer finagles your code to make it more efficient, well, that's what it's there for.
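
The string-to-bytes example can be sketched in C++ roughly as follows.  The function names ToCharArray and ToBytes are hypothetical, and std::vector stands in here for a run-time-sized array.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// sString -> acString: the string as an array of characters.
std::vector<char> ToCharArray( const std::string& xsString )
{
    return std::vector<char>( xsString.begin(), xsString.end() );
}

// acString -> au1Bytes: the same data as unsigned one-byte values.
std::vector<std::uint8_t> ToBytes( const std::vector<char>& xacString )
{
    std::vector<std::uint8_t> au1Bytes;
    au1Bytes.reserve( xacString.size() );
    for ( const char cChar : xacString )
    {
        au1Bytes.push_back( static_cast<std::uint8_t>( cChar ) );
    }
    return au1Bytes;
}
```

All three names share the same data, and at every step the prefix tells you which view of it you are holding.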

Non-Integral Types

  • Strings: s (sFirstName) (from String)
  • DateTime: d (dToday) (from Datetime)
  • Extended Reals (like the C# decimal type): m  (mOnomatopoeia) (from deciMal)
  • Instantiated Interfaces: i (iArray) (from Interface)
  • Instantiated Classes/Structures: o (oMyObject) (from Object)
  • Generic Types: t (tClass) (from Type)
  • Delegates: g (gCallback) (from deleGate)
  • Enumerations: e (eEnum) (from Enumeration)
I put delegates in there specifically for C#, where they are treated differently than in C++.  In C++, if you feel more comfortable calling them pointers and treating them that way, I'm not going to fault you for that - they are.

Array Modifier

  • Array: a (acString) (from Array)
An array modifier may optionally be modified with a number of dimensions, so for example a two-dimensional array of byte might be called a2uBytes.  You may also repeat the 'a' to represent the number of arrays in a sparse (jagged) array, so you might have aauBytes to name a two-dimensional sparse array of byte.  If those are required for clarity, then by all means use them.  If they do not lend any clarity, then do not - just tag it as an array, so auBytes.  Of course, this can be combined with integral modifiers, so you could have a4u8Matrix, which describes a four-dimensional array of 8-byte unsigned int, or aan4Indices, which indicates a two-dimensional sparse array of four-byte signed integers.  Use whatever you need to be as clear as you need to be.

Again, don't feel like you need to get nuts.  These distinctions are sometimes very useful to have, and at other times they are just meaningless overhead.  You'll know you have a multi-dimension array generally from the usage (anNumbers[ 10, 10 ]) and the same goes for a sparse (jagged) array (anNumbers[ 10 ][ 10 ]).  Clarity - rule 1 - do whatever makes the code more clear.
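
A small C++ sketch of the two array shapes, for comparison.  The type aliases and the MakeTriangle function are invented for the example; std::array models the rectangular case and a vector of vectors models the array-of-arrays case.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// a2: a rectangular two-dimensional array; every row has the same length.
using Grid3x4 = std::array<std::array<std::uint8_t, 4>, 3>;

// aa: an array of arrays; each row may have its own length.
using JaggedRows = std::vector<std::vector<std::uint8_t>>;

// Builds rows of length 1, 2, 3, ... to show that 'aa' rows can differ.
JaggedRows MakeTriangle( const int nRows )
{
    JaggedRows aauBytes;
    for ( int nRow = 0; nRow < nRows; ++nRow )
    {
        const std::vector<std::uint8_t> auRow( static_cast<std::size_t>( nRow + 1 ) );
        aauBytes.push_back( auRow );   // row nRow holds nRow + 1 zeroed bytes
    }
    return aauBytes;
}
```

A variable declared as Grid3x4 a2uBytes always has three rows of four bytes, whereas the rows of an aauBytes returned by MakeTriangle each have a different length.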

Collection Modifier

  • Collection: c (csString) (from Collection)
This is a new one.  Historically, I've just treated collections like arrays, but as the utility of collection classes expands, that identity doesn't really apply well anymore.  So I've adopted a new modifier to indicate a collection class, as distinct from an Array.  This can apply to any type of collection, be that a List, Dictionary, Stack, Queue, Bag or what have you.
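
For instance, in C++ the distinction might look like the sketch below, where a plain array keeps the 'a' modifier and a container class gets the 'c'.  The function UniqueNames is made up for the illustration, and std::set is just one possible collection.

```cpp
#include <set>
#include <string>

// xasNames: a read-only array of strings.  csNames: a collection of strings.
std::set<std::string> UniqueNames( const std::string* xasNames, const int nCount )
{
    std::set<std::string> csNames;
    for ( int nIndex = 0; nIndex < nCount; ++nIndex )
    {
        csNames.insert( xasNames[ nIndex ] );   // the set drops duplicates
    }
    return csNames;
}
```

The prefix tells the reader up front that csNames supports collection operations such as lookup and insertion, and that indexing it like an array is off the table.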

Context: Variable/Constant/Input Parameter/In-Out Parameter

As you may or may not know, a property isn't actually a variable - it's a function.  I mention that up-front so as to avoid confusion.  The next modifiers are prefixed to the above modifiers to indicate the context in which a variable exists.  Specifically, whether or not you are intended to CHANGE IT.

  • Class Constants: k (knMaxCount) (from Konstant)
    • This applies to both constants and statics.  Enumerations require no such prefix because they're enumerations - which means that for practical purposes they qualify as a type definition and thus may be plain TitleCase, like public enum State { None, First, Second, Third };.
  • Instance Variables/Fields: f (fau1Buffer) (from Field)
    • These are garden-variety instanced field variables.  The corresponding property (if there is one) will probably be declared with exactly the same name, but without the leading 'f'.  Thus, if you refer to it as 'fau1Buffer', you are referring directly to the instance variable, but if you refer to 'au1Buffer', then you are referring to it through the accessor (property), and thus are using any additional code that implies.
  • Input Parameters: x (xsName) (from eXclusive)
    • This is prefixed to parameter declarations of any type that are being passed into a function with the intention that they are not modified.  
    • This is an important distinction, because in C# for example, instances of classes are reference types: the function receives a reference to the caller's object, which means, in principle, that if you modify that instance, you have modified it outside the scope of the function.  Normally, that's bad - and those bugs are tough to track down.  
    • So the leading 'x' is a specific statement both to the calling function and inside the declaring function that when the function exits, that value has not been changed - guaranteed.  Basically, it means 'read-only' or 'constant' if you prefer.
  • Input/Output Parameters: m (mmFactor) (from Modifiable)
    • I've used this specifically along with the type declaration 'm' to illustrate that they serve different functions and are in different places and that's OK.  The 'm' meaning 'Input/output parameter' is always followed by either an array/type or type indicator, so there is no risk of confusion.  
    • This is used, as 'x' is used, in the declaration of a parameter.  The purpose of the 'm' in this case is to indicate specifically that the parameter in question may be modified by the function, which means that the value after the function exits will not of necessity be the same as it was when it was passed into the function.  
    • It is not mandatory that the value change - that's not the point.  The point is that from both the calling function and inside the declaring function, it is clear that the value can be modified, and indeed probably should be.
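
The four context prefixes can be seen together in one small class.  This is a sketch with invented names (Accumulator, Add); in C++ the 'x' promise can additionally be backed by const, and an 'm' parameter is naturally a non-const reference.

```cpp
class Accumulator
{
public:
    static const int knMaxCount = 3;   // k: class constant

    // xnValue is input only.  mnTotal is in/out: the caller's variable may change.
    // Returns false, leaving mnTotal alone, once knMaxCount values have been added.
    bool Add( const int xnValue, int& mnTotal )
    {
        if ( this->fnCount >= Accumulator::knMaxCount )
        {
            return false;
        }
        mnTotal += xnValue;
        ++this->fnCount;
        return true;
    }

    // Accessor: the field's name minus the leading 'f'.
    int nCount() const { return this->fnCount; }

private:
    int fnCount = 0;                   // f: instance field
};
```

At the call site, oAccumulator.Add( 5, nTotal ) gives no hint about which argument can change, but the declaration does: the 'm' on mnTotal says that nTotal may be different when the call returns.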

Interface Elements/Components

This comes up a lot more frequently these days, and is one of the places where people fall back on cobbled-together variants of Hungarian Notation because they need something.  Well, here is a set to start you off:  These are three letters long instead of one, because A) they are a specific type of object that is used in a specific context, and B) there are a lot of them and more by the day.  There may be a time when four are needed, but three letters allow for 26 x 26 x 26 = 17,576 combinations, so there's time.  These likewise go in front, so btnOK.

  • Panel: pnl (from PaNeL)
  • Form: frm (from FoRM)
  • Menu: mnu (from MeNU)
  • Context Menu: mnc (from MeNu Context)
  • Menu Item: mni (from MeNu Item)
  • Button: btn (from BuTtoN)
  • Status Bar: sts (from STatuS)
  • Text Field: txt (from TeXT)
  • Numeric Field (Roller): nmr (from NuMeric Roller)
  • Drop-down-list: ddl (from Drop-Down List)
  • List: lst (from LiST)
  • Grid: grd (from GRiD)
  • Table: tbl (from TaBLe)
  • View: vew (from ViEW)
  • PictureBox: pic (from PICture)
  • Image: img (from IMaGe)
  • Icon: ico (from ICOn)
  • Bitmap: bmp  (from BitMaP)
  • Dialog: dlg (from DiaLoG)
  • Thumb: thm (from THumB)
  • Scroll-bar: scb (from SCroll Bar)
  • Cursor: csr (from CurSoR)
  • Header: hdr (from HeaDeR)
  • Footer: ftr (from FooTeR)
  • Interface Element: ele (from ELEment)
    • This is kind of a catch-all for interface elements that don't already appear in the list above, and in fact if you'd rather just use 'ele' as the prefix for all interface elements, by all means do.  I find it useful to know precisely which are which, at least for the common ones.
  • Component: cmp (from CoMPonent)
    • This is a catch-all for those 'drop-on' interface elements that aren't actually interface elements as such.  Things like the C# Timer class, which can be instanced in the form designer, but has absolutely no corresponding visual representation.


Calling Conventions

I'm going to toss a note in here about calling conventions.  This applies specifically to object-oriented languages, and I feel strongly about it, so it's worth mentioning.

If you're referring to something that is a constant, property, enumeration or method associated with a class, then when you call it you should always directly prefix it with the parent class.  Thus, MyClass.MyEnumeration.  If you are referring to something that is a property or member of an instance of a class, then you should always directly prefix it with the instance name of the instance.  Thus oObjectInstance.ToString().  That's not difficult - some of you will be saying, "Uh, yeah, dude.  How else can you do it?"  Sure.  That's the obvious part.

Now here is the tricky bit.  This applies even inside the class or instance in question.  That is, if there is code inside a class definition that makes reference to a property or method of that class, then it should refer to it using the keyword 'this'. Thus this.ToString(), or this.GetType().  If you are referring to the enumeration MyClass.MyEnumeration in the code inside MyClass that appears in an instance-specific method, then you still use the class name.  Thus this.eEnumValue = MyClass.MyEnumeration.One;.

Why?  Clarity.  Make it starkly clear what you are referring to.  You might say, "It doesn't matter, I know which ones are local variables and which ones are instance properties!"  Yeah.  You do.  Right now.  Look at that same code in two years and you won't be so sure.  So you'll have to check.  If you always do it this way, then there is never any question about what you're doing.  If a variable in a function is not referred to with the prefix 'this', then it's not an instance value - it's on the local stack.  Now you know the precise scope of the value - whether it will be in effect instance-wide or just in the body of the function you're actually in.
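
In C++ the same habit might look like the sketch below, with this-> in place of the C# 'this.' and :: in place of the dot on class-level names.  The Advance method and the enumeration members are invented for the illustration; the field is called feEnumValue following the 'f' rule above.

```cpp
#include <string>

class MyClass
{
public:
    enum class MyEnumeration { None, One, Two };

    void Advance()
    {
        // No 'this': eNext is a local and dies with the function.
        const MyEnumeration eNext = MyClass::MyEnumeration::One;

        // Instance state is always reached through 'this'.
        this->feEnumValue = eNext;
    }

    MyEnumeration eEnumValue() const { return this->feEnumValue; }

    std::string ToString() const
    {
        return this->feEnumValue == MyClass::MyEnumeration::One ? "One" : "Other";
    }

private:
    MyEnumeration feEnumValue = MyClass::MyEnumeration::None;
};
```

Scanning Advance(), the scope of every name is obvious without scrolling: class-level things carry MyClass::, instance things carry this->, and whatever is left is local.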

Clarity first.  Always Clarity.

Summary

Don't get the impression that this list is by any means exhaustive.  I change newer parts of it occasionally, and invent things as they become necessary.  This is already considerably more elaborate than the first formalized version of this schema I learned from a fellow whose full name I won't use without permission, but you know who you are, PM.  That was in turn more sophisticated than the version of it I had been using up to that point, which I adapted when I read about Hungarian Notation, which modified an earlier version of it I had picked up from UseNet back in the early '80s before I found out it had a real name.

Things change, and over the past 30+ years, this has changed a lot.  This will change more, of necessity.  Computer science is still very young.  There will perhaps come a time when schemes like this are no longer even relevant.  That's progress.  For now, it works, and I'll continue to adapt it and use it until it stops working or I find something better.


Copyright ©2013 by David Wright. All rights reserved.
