In this paper, we examine the performance of typical CFD kernels on five major microprocessor-based systems and show that, for large problem sizes, performance is bounded by the speed of main memory. We further show that, for the solution of large sparse linear systems, preconditioning with a multilevel additive Schwarz method not only enhances convergence but also improves performance by a factor of two to three, a direct result of better cache utilization. We also demonstrate the excellent parallel performance of this method. Finally, we apply the resulting parallel solution technique to the unsteady Navier-Stokes equations in a study of transitional heat transfer in louvered-fin heat exchangers.