Introduction
I have recently been involved in a discussion on performance. One of the big discussion items is the area of developer productivity vs. application performance. Now, we all make tradeoffs like this every day. Is it better to buy or to build? Is it better to code at a high level or low level? Is it better to get something out sooner that may not be optimized vs. getting it out later and providing a better experience for the user? Is it better to code at Java/.NET vs. a lower level (think C++/C or assembler)? Personally, I've coded in assembler and it REALLY SUCKS! I did it for a few quarters at Georgia Tech and I hated the complete experience. I've coded in C/C++ and it was marginally better, but this was in my unix days. I'm a .NET junkie and I will continue to be so. I believe that providing high level tools to make development easy is a good thing. I am willing to pay a performance price to get a solution out to a customer sooner. This is something that makes sense in the consulting work that I do every day. However, there are times when doing performance analysis and optimization does make sense.
Performance and Orders of Magnitude
Lets talk performance. Performance tends to fall into a general area of Orders of Magnitude to complete an operation. Based on the number of items an algorithm operates on, called N, algorithms tend to complete based on a function that looks something like
O(N) = A N^3 + B N^2 + C N log(N) + D N + E log(N) + F (comment: higher/different orders are entirely possible, I'm just thrown out some common ones)
Out of that function, we tend to drop the constants (A, B, C, D, E, and F) and we tend to look only at the highest order of magnitude problem. This means that it is typically more important to work on the part of an algorithm that is causing the N3 problem than the log(N) problem. From a numeric standpoint, this makes sense. After all, if we have 50 items to process and we look at 50^3, we get a problem of the order 125000. If we have a log(50), that's 1.69, which is a big difference. Also, we need to understand the number of items we are dealing with. With an algorithm dealing with 3 items it may not be a big deal with your algorithm works on an order of N^2. If you have a 100 items, N^2 can be a big deal.
Where does the choice of raw algorithms come up? The answer is typically it depends. It tends to not come up as much in the consulting world and much more in the product world. However, in the consulting world typically involves a lot of database interaction, as a result, there is a complete section in this devoted to the topic.
Database Access
Now, the problem in the consulting world is a little bit different. The world tends to run on data and that data is typically in databases. Bill Vaughn is recognized for saying that all data access technologies wait at the same rate. This is true. Its very important to realize that a query that takes 5 seconds to run, but is run 100k times a day is a bad thing when it should only be taking 350 milliseconds to run if it had been optimized. So, its better to only get back the data that you absolutely need. Going off process/server to get data is really expensive. Don't muck it up with something that looks like:
select * from table <!-- I've seen this done to get a simple calculation, no joke.
You want to do something that looks more like this:
select col1, col2, col3 from table where col4=val4
In addition, you want to make sure that your database is optimized. You want to make sure that you have primary keys, foreign keys, and a good indexing strategy. How you access your data is important. I like some routines that I have developed over the years. The routines protect from Sql Injection and they are pretty simple to use. They aren't for everyone. Stored Procedures are also a pretty good thing to use. They can be used to generate dynamic sql internally, so its really your choice.
Algorithms and dividing the work
Algorithms come in many shapes and sizes. Generally, they fall into having a certain percentage of their work is serializable and some is parallelizable. Work that is only serializable can only be done 1 step at a time. No matter how many CPUs or threads you attempt to use, you aren't going to get any benefit. The only way to help these applications with hardware is to get faster CPUs with a higher frequency and more instructions per second. Going from a system with 1 cpu to a system with 4 won't necessarily help these parts of an application. Algorithms that are parallizable are much more interesting in the current environment. With these algorithms, adding more CPUs, threads, cores and such will general improve the performance of these algorithms. There is always some overhead involved in managing the different threads and such, but the over benefit is an increase in the amount of work that can be performed.
In a consulting gig, there are typically parts of an application that can be divided out into parallizable work. For example, a web request is typically an entity to itself, so http requests typically get mapped to different threads. The requests don't typically interact with each other, so this is a place where you can get a win. Fortunately, this is taken care for you by the web server vendor.
The Problem
As this performance argument was going on, my mind went to an article by Jeff Atwood I had seen discussed a few weeks ago on twitter. The article is at http://www.codinghorror.com/blog/archives/001198.html. Basically, the article talks about why its a poor idea to do performance optimization when looked at from a cost benefit analysis. The thinking is that programmers are expensive and hardware is cheap, so based on that, just throw more hardware at a problem until it gets fixed. I'm sure that for some customers, this is an acceptable solution, however, I have not seen them. First off, you have to understand where your potential bottlenecks
are. Its your job as a developer to understand this. You have to
understand database performance. If you don't know about sql, primary keys, foreign keys, and indexes, go learn about them. You have to understand what your
application is doing and be able to pinpoint the bottlenecks. Just
claiming that you need more better hardware is craziness and will probably make the customer question you.
Turn this around and lets look at this from a customer's perspective in the consulting world. I have an application that has been written in house and must support a set number of users. To get more hardware to support the application requires the initial outlay of cash, signoffs from managers, appropiate space must be provisioned in your data center, the server must actually get there, the application must be setup on the server. The cost of these things is significant. Its a hard to thing to fathom because they don't show up in a budgetted line item, but they exist. To get a $5,000 server setup might cost $15,000-$40,000 depending on the bureacracy that one must go through just to get the power turned on to a running application. This doesn't even count the yearly maintennance cost of the server or the support that must go into the operating system on the server. Before the Linux folks freak out and claim that their OS is free, its only free like a pet is free. It still has to be maintained.
Looking at the cost of a developer's time for a month vs. the $20-50k to actually get a server up and going is a completely different discussion than the $5k initial outlay of cash for a server.
Stupidity
One of the things that I heard during this
discussion was that it was important to not do stupid things. "Doing
stupid things gets you into trouble." Amen, my bothers and sisters.
We all agree on that. Its your job to not do stupid things. How do
you not do stupid things? You educate yourself. You learn. You don't
just take the defaults of settings. You figure out what is going on.
You have to learn. Learn Sql. Learn algorithms. Learn, learn, learn.
Why the performance analysis is wrong
Yeah, just throwing hardware at a problem is wrong. Its wrong in a given set of circumstances. Generating an application that takes on the order of N^3 or N^2 is just plain wrong. No amount of hardware is going to fix this problem. It shows that a developer doesn't know their algorithms. Spend the time to fix your algorithm. Spend the time to optimize your data access if a simple calculation requires way too much data. Spend the time to get rid of these N^3 and N^2 parts of your algorithm.
Why the performance analysis is right
I've seen developers try and squeeze the last little bit of performance from an application. As I said, working on those N^3, N^2, and database problems can give you easy wins. The problem is that developershumans don't tend to take a balanced look at things. People tend to gravitate to one extreme or the other other. In this case, there are folks that will attempt to optimize forever. Does it matter if an application scales at 5 N Log(N) or 3 N Log(N)? No, it really doesn't matter. You typically don't need to worry about that level of optimization.
At this point, I am reminded of an optimization that a professor I had at Georgia Tech did when he worked at IBM. He was working in their mainframe area. He found an operation that was taking a large amount of time (read minutes). He spent weeks counting cycles and improving it. A performance analysis was done and after all of this time, no improvement was found. He was shocked. How could all of his time not improve things? The problem was that this method was only called during a backup of the system. Yeah, that writing to tape operation. Know where your bottlenecks are. Its your responsibility to know when an improvement needs to be made and when it doesn't. :-)
Final Thoughts
Ok, that's it. Go forth. Be intelligent with your applications. Don't do stupid things!
PS. I reserve the right to change whatever I said in whatever way I said it.