Requirements for a Code Generation Tool

So it’s 5am on a Friday morning and I’ve been unable to sleep, so rather than lying in bed and thinking about code generation I figured I should get up and write about it. I know what you’re thinking, but really it’s not that blatantly obvious why I was lying in bed alone. Instead I blame it on the fact that googling my name just isn’t as productive for me as it is for some people.

What I’ve been thinking about this morning is what requirements I have for a code generation tool. About 4 years ago I wrote an O/R mapping add-in for the VB6 IDE (if I had it I’d post it for all you hardcore VBers, but I don’t) that was relatively feature complete, and I’m drawing a lot of my requirements knowledge from that experience. Here’s the list of functionality requirements that I have when I look at a code generation tool:

  • Multi-datasource support - For me this goes without saying. If a tool can’t support more than one major datasource then it’s not going to be a useful addition to my toolkit for long-term use. Rarely do any of us have the opportunity to work on one project for the duration of our careers, and every project move you make is a possible, and likely, datasource move.
  • Datasource element transformation - Nothing is worse than looking at database elements, whether they’re tables, columns, stored procs or something else, in an environment where some naming standard has caused the true business purpose of said element to be less than obvious. I need a way to convert the RT_EMP_HIST table into a business object that has the name EmployeeHistory (complete with the proper casing if possible). If a tool can’t create objects that have meaningful (from a business domain standpoint) names, the benefits of its rapid code generation will be lost in poor readability and higher maintenance costs.
  • Repeatability - I’m sure you’re thinking that a code generating tool, by nature, has repeatability built into it, and you’re right. If I fire up the tool and run it now I will get the same results (assuming the data structures I’m generating from haven’t changed) if I run it in 5 minutes, 10 minutes, 10 days or 10 years. The repeatability that I look for is slightly different. You could argue that it’s more portability than anything. I want to be able to install the chosen tool onto 1, 2 or 100 different machines and easily execute the generation with the same results on each of the machines. Without this I now have a single machine that uses voodoo to execute the process. Nobody likes voodoo in their software development process.
  • Templating - This is a no-brainer. If you can’t customize the look and feel of the end code product, you have been backed into a style and, potentially, architectural corner. This was one of the things that I didn’t do in the O/R mapper that I created for VB6. Because that was created for in-house use, I only had to worry about one style, one layout and one architecture. By no means was that add-in something that I would feel comfortable forcing onto another project or company. Instead I would hope that everyone would like to make their own choices about these things and get the results that they want from their tool.
  • Multi-language support - Different companies, and different projects within those companies, choose different programming languages. There’s no way around it. If I have a tool that can’t switch from VB for the .NET framework 1.1 to C# for the 2.0 framework, or even VB6, I’m less likely to be able to take the tool with me from project to project. This doesn’t mean that I expect one template to generate code for all of the different languages that my whimsical mind can think of. I’m more than ready to write a new template, or better yet just modify an existing one, for each language.
  • Granular generation - There are times when I don’t want to generate the O/R or DAL for an entire datasource. Maybe all I’m worried about interacting with in my application is a small subset of the datastore. In that case I should be able to select that subset and have the generation only operate on it. Ultimately I want the tool I’m using to remember what subset I am working with so that I don’t have to select 49 of 300 tables every time I want to generate code. This holds true for more than just tables. I want the ability to select subsets of columns as well. Again, there’s no reason the tool I’m using shouldn’t be able to remember my selections from previous sessions.
  • IDE integration - As I said earlier, once upon a time I created an O/R mapping tool for VB6. One of the nicest features of it was the fact that it was an add-in to the Visual Studio IDE. I didn’t have to jump windows from one application to another. There was no need to manually add generated files to a project or solution. It all happened automatically (heck, you could even see the files being added as the generation was being performed). One of the best things about it was that after generation was completed, I could immediately compile the application and see any issues that may have been created.
  • Custom code retention - Although I like to create templates that create code that encourages placing custom code outside of the generated code, situations do occur where you have to embed your custom code right in the middle of your generated code. The tool that you’re using needs to be able to recognize this and retain these blocks of code, in the same locations (from a code standpoint, not a line number) when you re-run the generation. How this is done has to be easy and unobtrusive to the programmer as well as effective. My personal recommendation is to avoid having to do this by designing your code so that you are encouraged to place custom code into non-generated objects.
  • Data type mapping - Like datasource element transformation, I want to have control over the mapping of source datatypes to programming language data types. Occasionally you will need to map a data type to a non-standard programming type because of database conventions or choices. I will say that I don’t expect this mapping to work in one-off situations (if it can, great). If you choose to use a char(1) datatype, containing ‘Y’ and ‘N’ values, to represent boolean values in your database, you should be doing this consistently throughout the database. If you are, then a global mapping from char(1) to bool in C# should be achievable.

Choosing a code generation tool or methodology is a lot more than just thinking “Which tool will create my code for me quickly?” Productivity gains realized by generation speed are only gains if the generated code follows the coding standards, formats and architectures that your team needs and can work with. Everything that isn’t quite what you want as an end code product, or as a generation process, takes away some of the productivity gains that could be realized. There are a number of tools on the market and you will need to evaluate each of them based on your requirements. Remember, these are my requirements, so yours might differ.
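
To make the datasource element transformation and data type mapping items above a bit more concrete, here’s a minimal Python sketch of what I have in mind. Everything in it is an assumption for illustration only: the abbreviation table, the rule that drops an RT prefix, the type map, and the function names business_name and target_type are all made up here and don’t come from any particular tool.

```python
import re

# Hypothetical abbreviation dictionary; a real project would supply its own.
ABBREVIATIONS = {"EMP": "Employee", "HIST": "History"}

# Hypothetical prefixes that carry no business meaning and are dropped.
NOISE_PREFIXES = {"RT"}

# Hypothetical global mapping from database column types to C# type names.
TYPE_MAP = {"char(1)": "bool", "varchar": "string", "int": "int"}


def business_name(db_name):
    """Turn a name like RT_EMP_HIST into a PascalCase name like EmployeeHistory."""
    parts = [p for p in db_name.upper().split("_") if p]
    # Drop a leading naming-standard prefix if it is a known noise prefix.
    if parts and parts[0] in NOISE_PREFIXES:
        parts = parts[1:]
    # Expand known abbreviations; otherwise just fix the casing.
    return "".join(ABBREVIATIONS.get(p, p.capitalize()) for p in parts)


def target_type(db_type):
    """Look up the target-language type for a database column type."""
    key = db_type.strip().lower()
    if key in TYPE_MAP:
        return TYPE_MAP[key]
    # Fall back to the base type without its length, e.g. varchar(50) -> varchar.
    base = re.sub(r"\(.*\)$", "", key)
    return TYPE_MAP.get(base, "object")


print(business_name("RT_EMP_HIST"))  # EmployeeHistory
print(target_type("char(1)"))        # bool
```

Both lookups are plain global tables, which is the point: if the database follows its conventions consistently, one entry in a table like this should be enough to cover every table or column that uses the convention.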