Using VBA in Microsoft Excel for Data Analysis Automation
Visual Basic for Applications (VBA) can automate a remarkably wide range of tasks across the Microsoft Office (MS Office) suite. If you have a basic understanding of VBA but no clear application for it yet, this article will provide exactly that: real-life, pragmatic examples of complete VBA procedures that reduce entire business processes to the click of a button.
I’m going to walk you through creating a composite key and finding all the records from a file that match those in our master file, and then we’ll analyze the non-matching records using several pivot tables to assist in pattern recognition.
Even more important than learning the code is learning the right mindset to make VBA work for you. VBA can save you time and make you a rockstar at work, but as with any great power, you need to use it wisely. We’ll discuss the fundamental mindset shift that happens when you start working with VBA and how to make that mindset work for you and for your company.
The “Macro Mindset”
Once you have successfully automated any part of your job with VBA, you have probably adopted a whole new outlook (no pun intended) on MS Office products, particularly Excel. This new mindset comes from understanding the object hierarchy of these applications, or even just from using the macro recorder effectively.
Once you understand how macros work, you start looking for more macro projects that could save you a lot of time. Since a macro is essentially a way to map inputs to outputs, usually for the purpose of automating work, it is fitting to describe this as a shift from a fundamentally “manual mindset” to a “macro mindset.”
This new macro mindset can cause problems if you do not use it in the context of how businesses work. Sure, it would be nice if you could automate every task that touches Excel, but what is the opportunity cost?
Two questions that you must keep in mind are: “How much time will it take?” and “Is there overlapping functionality in another application that we have or will have in the near future?”
Let’s look at how to know when using VBA is the right way to go.
Understand the Context of Your Automation Project
As with any project, the first thing you need to do is understand the context of the process that you want to automate. Construct a timeline to help clarify deadline estimates and the life expectancy of each solution.
Make sure you understand the risks of using a VBA process as a solution, as well as any possible alternatives. For example, if your department is investing in a new business intelligence tool that could solve the problem, then you should investigate that tool prior to writing any VBA code.
Look at your timeline. Figure out how long it should take you to finish writing your VBA script, how long it should take you to perform the task manually, and how long this task will exist. Is this business process going to change dramatically in a few months, and would such a change break your code?
Decisions in IT procurement and resource allocation could reduce the life expectancy of your code. In other words, understand that there is a risk of wasting your time developing a VBA solution that is either replaced by a business intelligence tool in a few weeks or rendered obsolete because of an unforeseen business process change (e.g., the business process was specific to a client that you were working with and they terminated their agreement with your company).
Since business intelligence tool licenses can cost thousands of dollars and most companies use MS Office by default, I think there is far less risk in starting a VBA project, finding out that it is no longer viable, and then abandoning it completely. The alternative is to investigate the functionality of a new business intelligence tool, spend hours training yourself on how to use it, and then find out that there are compatibility issues (or other unanticipated limitations) with it.
One final word on context: if you aim to find the most opportunities to use VBA, you’ll find them in small companies, companies that are downsizing, or even departments within large companies operating on a low budget. In general, companies and departments trying to save money will be far more receptive to using VBA as a solution.
Keep these general rules in mind for choosing the VBA path:
- VBA solutions are ideal when saving money is a high priority.
- VBA solutions are highly flexible.
- VBA solutions are best when maintained and used by as few users as possible.
- The vast majority of VBA solutions are written in Excel.
- Smaller companies generally have more opportunities for VBA than larger companies.
- VBA solutions are only as robust as you make them.
The general point of macros and VBA is to save time (and yes, add functionality, but our focus here is time). As long as you are actually saving yourself time, the macro mindset will work for you instead of against you.
Now, let’s look at some actual code.
Sample 1: Product Code Lookup Procedure
For those of you who are fairly new to VBA, a procedure is an independently executable series of VBA statements.
The basic idea for this sample procedure is to find out whether products from the ProductReport file contain product specification codes for an item catalog category that are new (i.e., product specifications that do not currently exist in the master file) and, if so, to identify which line items they are and how many there are.
For this example, I am using a text file, ProductReport, to represent the third-party report. The baseline file is the Excel workbook storing the historical product information.
I suggest scanning the “ProdCodeLookup2” code on GitHub in its entirety first. I will break it down in the following sections.
Perhaps the most important part of solving problems is the ability to break them down into parts and tackle them methodically. For me, the most natural approach is to write each block of code based on the order in which a user would perform this task if it were to be done manually.
Later on, I refine my procedure based on a more sophisticated understanding of the object model and rewrite it using considerably less code. (For performance and readability reasons, you may want to alter the order in which these tasks should be completed. Hold onto this thought, as we will come back to it later.)
Here’s the first manageable chunk of code:
Notice the name of the procedure follows Pascal Case and ends in the version number, which is useful as you discover alternative ways of writing code. It’s incredibly painful to realize as you’re troubleshooting that you forgot to switch to the revision, and the mistake you made broke your original, previously working code! So always keep updated, working copies of your procedures.
The very first statement (line 5) that I (and most people on Stack Overflow) write is purely for increased performance. ScreenUpdating is a property of the Application object that controls whether Excel redraws the screen as your macro performs each action. By turning off the ScreenUpdating property, you won’t see the magic until your code finishes running, but you will notice considerably faster runtimes once you get into loops. Although I did not set the property back to True, it’s probably a good idea to do so at the very end of your code (unless you have a reason not to).
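As a minimal sketch of the ScreenUpdating pattern described above (the procedure name and the empty body are placeholders, not the GitHub listing), the property is switched off first and restored last:

```vba
Option Explicit

' Hypothetical skeleton: the procedure name and body are placeholders.
Sub ProdCodeLookupSketch1()
    ' Stop Excel from repainting the screen while the macro runs.
    Application.ScreenUpdating = False

    ' ... the work of the procedure goes here ...

    ' Restore normal screen refreshing before the procedure ends.
    Application.ScreenUpdating = True
End Sub
```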
Now witness my exceptional creativity and wit as demonstrated in the following variable names:
Okay, the names may not be witty or creative, but you don’t want witty and creative when you start debugging code. When something breaks, simple and descriptive variable names are what will save the day.
I found it easier to manage variables by grouping them according to their data type and function (meaning, the way in which I use them). For example, all of the range variables beginning with rng are essential for controlling how I find which columns and parts of the table I want to work with and at what time.
It’s also worth noticing that all of my lookup variables are defined near each other. Once you start writing longer procedures, it may be helpful to add comments to them and reserve a new line and “Dim” keyword for each.
Making a habit out of such a practice will make things easier on you when you’re troubleshooting or updating your code. For example, here is the same block rewritten in the style I have described:
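A minimal sketch of that declaration style follows. Every variable name here is hypothetical and chosen only to illustrate one Dim per line, a trailing comment, and grouping by function:

```vba
Sub DeclareVariablesSketch()
    ' Object variables for the files involved
    Dim wbBaseline As Workbook      ' workbook holding the historical product data
    Dim wsBaseline As Worksheet     ' sheet in the baseline workbook with the master table

    ' Range variables that control which part of the table is in play
    Dim rngHeaders As Range         ' header row of the table
    Dim rngLookupTable As Range     ' table searched by the lookup

    ' Lookup variables, kept together
    Dim lookupValue As String       ' composite key being searched for
    Dim lookupResult As Variant     ' value returned by the lookup (may be an error)

    ' Counters and timing
    Dim finalRow As Long            ' last used row in the table
    Dim startTime As Single         ' Timer value captured at the start
End Sub
```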
You get the idea; just spell out each variable name in words or short phrases describing how you use it. This part is fairly straightforward but extremely important to understand.
Assign the workbooks and worksheets to object variables. There are many benefits to this action, including performance and readability. As you become more familiar with VBA, you may opt out of this practice; the primary goal is to ensure that troubleshooting and maintaining your code is as easy as possible. If you and future users will not benefit from this practice, then feel free to omit it.
Assign your workbook and worksheet variables to objects like so (yes, order matters—workbooks first, because they are the parent objects):
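A minimal sketch of the parent-then-child assignment, assuming the code lives in the baseline workbook and that it has a sheet named "Sheet1" (both are assumptions for illustration):

```vba
Sub AssignObjectVariablesSketch()
    Dim wbBaseline As Workbook
    Dim wsBaseline As Worksheet

    ' Parent object first: the workbook that contains this code.
    Set wbBaseline = ThisWorkbook

    ' Then the child object, qualified by its parent workbook.
    ' "Sheet1" is a placeholder sheet name.
    Set wsBaseline = wbBaseline.Worksheets("Sheet1")

    Debug.Print wbBaseline.Name, wsBaseline.Name
End Sub
```

Because the worksheet variable is built from the workbook variable, later code never depends on which workbook happens to be active.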
Since I am unable to top his brilliantly concise explanation, here is a link to John Walkenbach’s website and a brief description that will tell you everything you need to know about object variables.
Let’s check out the next part (lines 25-30):
StartTime is a variable that holds the initial value of Timer (a member of the VBA.DateTime module that returns the number of seconds elapsed since midnight) and will be used at the end to calculate the total runtime. You can, if you want, create short variables for longer string values that you may need to reference multiple times. The “ICC” and “PC” variables exemplify this concept.
Finally, the last line in this section defines the scope for the following lines of code. (You definitely do not want VBA to fall back on whichever workbook or worksheet happens to be active, so be sure to specify it.)
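The three ideas in this section can be sketched together. This is an illustration under assumptions: the string value is a placeholder, and a With block is only one plausible way to define the scope; the original listing may do it differently.

```vba
Sub TimerAndScopeSketch()
    Dim startTime As Single
    Dim icc As String
    Dim wsBaseline As Worksheet

    startTime = Timer                   ' seconds elapsed since midnight
    icc = "Item Catalog Category"       ' placeholder for a long string used repeatedly

    Set wsBaseline = ThisWorkbook.Worksheets(1)

    ' Everything that starts with a dot inside the With block
    ' is explicitly scoped to wsBaseline.
    With wsBaseline
        Debug.Print .Name & " / looking for header: " & icc
        .Rows(1).Font.Bold = True
    End With

    Debug.Print "Elapsed seconds: " & Format(Timer - startTime, "0.00")
End Sub
```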
Before creating the “ProdCombo” composite key, some preparation and formatting operations are in order (lines 31-42). Hide the columns in the baseline workbook that are irrelevant to this process, create the new columns that we do care about, and apply bold formatting on all headers for readability.
On lines 44 through 52, turn on filtering for all the headers and then filter out everything except labels (let’s assume that we are interested only in this product specification).
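A minimal sketch of that filtering step, assuming the table starts at A1 and the product specification sits in the second column (the column position and the "Labels" criterion text are assumptions):

```vba
Sub FilterLabelsSketch()
    Dim ws As Worksheet
    Set ws = ThisWorkbook.Worksheets(1)

    ' Clear any existing filter so the new one starts clean.
    If ws.AutoFilterMode Then ws.AutoFilterMode = False

    ' Add filter arrows to the table that starts at A1 and keep only
    ' the rows whose second column equals "Labels".
    ws.Range("A1").CurrentRegion.AutoFilter Field:=2, Criteria1:="Labels"
End Sub
```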
In lines 53 through 69, I am creating the ProdCombo using a loop, concatenating values in five columns, and using a space delimiter.
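A composite key of this kind can be sketched as a nested loop. The layout is assumed for illustration: headers in row 1, the five source fields in columns A through E, and the key written to column F; cells are assumed to hold plain text or numbers rather than error values.

```vba
Sub BuildCompositeKeySketch()
    Dim ws As Worksheet
    Dim finalRow As Long
    Dim r As Long
    Dim c As Long
    Dim key As String

    Set ws = ThisWorkbook.Worksheets(1)
    finalRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row

    For r = 2 To finalRow                   ' row 1 holds the headers
        key = ""
        For c = 1 To 5                      ' the five source columns (A:E)
            key = key & ws.Cells(r, c).Value & " "
        Next c
        ' Trim$ removes the trailing (and any leading) space delimiter.
        ws.Cells(r, 6).Value = Trim$(key)
    Next r
End Sub
```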
Deleting blank products is important, which is the purpose of lines 70 through 78. For the purpose of this example, since we care only about products that have values in at least one of the five fields, we don’t need records that are blank in all five Label fields.
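One way to sketch this cleanup, assuming column A holds an identifier that is never blank and columns B through F hold the five Label fields (both assumptions), is to walk the table from the bottom up:

```vba
Sub DeleteBlankProductsSketch()
    Dim ws As Worksheet
    Dim finalRow As Long
    Dim r As Long

    Set ws = ThisWorkbook.Worksheets(1)
    finalRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row

    ' Loop from the bottom up so that deleting a row does not shift
    ' the rows that are still waiting to be checked.
    For r = finalRow To 2 Step -1
        ' CountA = 0 means all five Label cells (B:F) are empty.
        If Application.WorksheetFunction.CountA( _
                ws.Range(ws.Cells(r, 2), ws.Cells(r, 6))) = 0 Then
            ws.Rows(r).Delete
        End If
    Next r
End Sub
```

Looping downward instead would skip the row immediately after each deleted row, which is why the counter runs in reverse.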
Notice the comment on line 80 that refers to running another procedure: “Now run the ProdCodeLookup2 procedure.”
I originally wrote and tested two separate procedures: one for formatting and other ancillary tasks, and the other for executing the VLOOKUP. I do not necessarily recommend that you do the same, but since this was my first time using a VLOOKUP in a loop, I found it easier to develop in this manner. The ProdCodeLookup2 procedure performs a VLOOKUP on each cell under a specified header and loops through all of the records until it reaches the end of the table.
Before we look at the second example, let’s talk about a few tips to deal with VLOOKUPs in VBA.
Tips for Dealing with VLOOKUPs in VBA
Blanks and errors (basically anything that would return #N/A or another error from a VLOOKUP that you perform manually) raise a run-time error when the lookup is called through WorksheetFunction.VLookup, and an unhandled run-time error halts the entire procedure. Yes, that is correct. Working with VLOOKUP in VBA pretty much forces you to implement error handling in your code.
There is at least one useful example on Stack Overflow on error handling for VLOOKUPs specifically. Wrap the loop containing the function in error-handling statements, as I did on lines 83 and 96 of the example, and that solves the problem in a simple way.
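A minimal sketch of a VLOOKUP loop wrapped in error handling follows. The layout is assumed: keys in column A of the first sheet, results written to column B, and a two-column lookup table in columns A:B of a second sheet. The "No match" marker is also a placeholder.

```vba
Sub LookupWithErrorHandlingSketch()
    Dim ws As Worksheet
    Dim rngTable As Range
    Dim finalRow As Long
    Dim r As Long
    Dim result As Variant

    Set ws = ThisWorkbook.Worksheets(1)
    Set rngTable = ThisWorkbook.Worksheets(2).Range("A:B")   ' placeholder lookup table
    finalRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row

    On Error Resume Next            ' a failed lookup no longer halts the loop
    For r = 2 To finalRow
        result = Empty
        result = Application.WorksheetFunction.VLookup( _
                    ws.Cells(r, 1).Value, rngTable, 2, False)
        If Err.Number <> 0 Then
            result = "No match"     ' placeholder marker for unmatched keys
            Err.Clear
        End If
        ws.Cells(r, 2).Value = result
    Next r
    On Error GoTo 0                 ' restore normal error handling
End Sub
```

An alternative worth knowing is Application.VLookup (without WorksheetFunction), which returns an error value instead of raising a run-time error, so the result can be tested with IsError.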
Then on the last few lines (98-109), perform the following tasks:
- Define the ProdCode column header location based on the column name.
- Use the worksheet function “Count” on all values under ProdCode and assign the result to a variable.
- Return the result to the user in a message box.
- Turn ScreenUpdating back on.
- Return the time (in seconds) that the application took to run.
- Display the time in a message box to the user.
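The closing steps listed above can be sketched as follows, assuming the headers are in row 1 and that the ProdCode values are numeric (Count ignores text; CountA would count any non-blank cell). The message wording is illustrative.

```vba
Sub ReportResultsSketch()
    Dim ws As Worksheet
    Dim rngHeader As Range
    Dim codeCount As Long
    Dim startTime As Single

    startTime = Timer
    Set ws = ThisWorkbook.Worksheets(1)

    ' Locate the column by its header text rather than by a fixed letter.
    Set rngHeader = ws.Rows(1).Find(What:="ProdCode", LookIn:=xlValues, LookAt:=xlWhole)
    If rngHeader Is Nothing Then
        MsgBox "Header 'ProdCode' was not found."
        Exit Sub
    End If

    ' Count numeric cells in that column; the text header is not counted.
    codeCount = Application.WorksheetFunction.Count(rngHeader.EntireColumn)
    MsgBox codeCount & " product codes found."

    Application.ScreenUpdating = True
    MsgBox "Runtime: " & Format(Timer - startTime, "0.00") & " seconds"
End Sub
```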
That concludes the first VBA script. So, at this point, you are probably wondering, “Hey! Where is the data analysis?”
Data Analysis: Sample 2
Okay, so this information is useful, but what about the rest of the products? How do these results compare to the rest of the specification categories (besides labels), and what types of comparisons are worth our attention? What about all of the products that do not match and what can we learn about them?
To make things a bit more complicated and interesting, let’s add a status flag to the products. The status represents whether or not the product specification code in question is active.
As a quick recap, the dimensions of our analysis may be aptly summarized as “product category,” “active/inactive status,” and “match/no match status.” With these additional details, we have several use cases for data visualization via pivot tables.
Where is the data analysis, you ask? Right here, in a separate procedure:
It is a longer procedure that uses pivot tables. These two characteristics alone justify a short list of tips to keep in mind when you write your own version:
- Avoid “.Select”: Try to avoid the “.Select” method and the “Selection” object, mostly for performance reasons. When the recorder selects something before acting on it, look up the method or property being applied and call it directly on the object instead.
- Experiment with recording macros: The Record Macro feature will allow you to quickly learn some of the essential bits of syntax in addition to commonly used methods, objects, and properties.
- Write comments: Commenting your code, giving variables meaningful names, and using white space consistently are surprisingly helpful ways to make your code more readable. The associated benefits will become clear as you write increasingly complex and lengthy procedures.
- Performance: A few simple performance tricks make a big difference as you progress. One is sometimes called “dot processing”: every period (dot) in an object expression is a reference that VBA has to resolve when the statement runs. By reducing the number of dots, for example with With blocks or object variables, you reduce the number of references that have to be resolved, which matters most inside loops. This will help you considerably more as your procedures grow in size and complexity.
- Learn R1C1 cell referencing: Since the macro recorder uses it, you should know how to read it and be familiar with this reference style—especially for learning how pivot tables work in VBA.
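The first and fourth tips can be sketched together. The recorder-style lines appear only as comments for comparison; the cell address and formatting are arbitrary choices for illustration.

```vba
Sub AvoidSelectSketch()
    Dim ws As Worksheet
    Set ws = ThisWorkbook.Worksheets(1)

    ' Recorder style, shown only as comments for comparison:
    '   Range("A1").Select
    '   Selection.Font.Bold = True
    '   Selection.Interior.Color = vbYellow

    ' Same result without selecting anything. The With block also means
    ' the ws.Range("A1") reference is resolved only once.
    With ws.Range("A1")
        .Font.Bold = True
        .Interior.Color = vbYellow
    End With
End Sub
```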
Do you remember the thought that I told you to hold onto (i.e., altering the order in which you program tasks from the order in which they are manually performed)? It is now time to revisit that thought.
As I was writing the “ProdFlag_v3” procedure, I found the copying, pasting, and deleting operations to be problematic. I did not really understand how the “.CutCopyMode” property works, and I needed to make the code work, and fast.
Fortunately, I learned a valuable lesson in the process:
The order in which actions are performed manually is not necessarily the order in which they ought to be performed in VBA.
While there are advantages to understanding how a task is performed manually, the key consideration is to not allow this to limit your thinking; such limitations will result in inefficient code containing unnecessary commands.
In this example (lines 1-36), we are copying blank (alternatively, “NoSpec” or “Spec-less”) records into a new workbook for analysis. Originally, I thought it would be easiest to copy everything from the report into the analysis sheet and filter out the unwanted data later. As I have learned the hard way, it can be simpler to filter first.
After writing a description of the code and declaring the variables, you will want to pay attention to lines 18, 20, and 21. Think of the PRtable object variable as a reference defined by the upper-left and bottom-right corners of a given range on a specified worksheet.
I chose cell “A1” deliberately because I knew that all of the most important data on the report began in the first column. I will cover one way to make this dynamic later.
Lines 20 and 21 represent the syntax for creating a new worksheet. Since I am storing this macro in the same workbook containing the results of the analysis, I just ensure that it “has focus” (i.e., “is active”) before running the macro.
Lines 23 through 25 declare the object variable of the previously created worksheet, assign a value to the “FinalRow” variable, and add AutoFilters to the export worksheet. The following block (lines 27-30) consists of a “For-Each” loop that filters out everything, except blanks, for the records in the columns containing the Spec data.
The block of code made up of lines 32 through 36 takes care of the copying and pasting between workbooks. Notice the manner in which I defined the range of data that I want to copy (without using .Select): line 32 uses the very useful “.Resize” property with two arguments, the PRTableRows variable and the number seven, which represent the number of rows and columns I want to copy, respectively.
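A minimal sketch of a Resize-based copy, assuming the table starts at A1 and that column A has no gaps (the variable names and the new-sheet destination are illustrative, not the original listing):

```vba
Sub CopyWithResizeSketch()
    Dim wsSource As Worksheet
    Dim wsTarget As Worksheet
    Dim tableRows As Long

    Set wsSource = ThisWorkbook.Worksheets(1)
    Set wsTarget = ThisWorkbook.Worksheets.Add(After:=wsSource)

    tableRows = wsSource.Cells(wsSource.Rows.Count, "A").End(xlUp).Row

    ' Start at A1, grow the range to tableRows rows by 7 columns, and copy it
    ' straight to the destination, with no Select and no separate paste step.
    wsSource.Range("A1").Resize(tableRows, 7).Copy Destination:=wsTarget.Range("A1")
End Sub
```

Passing a Destination to Copy skips the separate paste step, so there is no lingering copy mode to clear afterward.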
In case you were wondering, line 38 is not an error; I intentionally reassigned the FinalRow value here because it needed to be reset. Remember how it was used in line 28? By the end of that “For-Each” loop, its value is equal to the number of columns in that table, which is not the value we want it to use for this next loop. If you do not reset it between loops, you can get some pretty weird results!
For aesthetic reasons (and practice), I added lines 39 through 50 to create a For-Each loop with a With/End-With statement nested inside it. The code basically changes the color and font of the values depending on whether the records are flagged as active or inactive. At this point, I need to clarify one important detail to avoid confusion.
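A loop of that shape might look like the sketch below. It assumes the status flag sits in column C with the literal text "Active" (compared case-sensitively by default) and that no cell in the column holds an error value; the colors and font styles are arbitrary.

```vba
Sub ColorByStatusSketch()
    Dim ws As Worksheet
    Dim finalRow As Long
    Dim cell As Range

    Set ws = ThisWorkbook.Worksheets(1)
    finalRow = ws.Cells(ws.Rows.Count, "C").End(xlUp).Row
    If finalRow < 2 Then Exit Sub          ' nothing below the header

    For Each cell In ws.Range("C2:C" & finalRow).Cells
        ' The With block targets the Font object once per cell.
        With cell.Font
            If cell.Value = "Active" Then
                .Color = vbBlue
                .Bold = True
            Else
                .Color = vbRed
                .Italic = True
            End If
        End With
    Next cell
End Sub
```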
The active/inactive status flag is not programmed into the code; it is written inside the baseline data. If you create a flag of your own, you may compare the logic associated with my active/inactive flag and emulate it for your own purposes.
I turned on AutoFilter on line 52, then dynamically searched for my first header of interest (the item status column, i.e., the status flag), and assigned it to an object variable.
Lines 53 through 55 help us search through all of the cells in a range and find one cell based on its text value:
There are two important parts of this statement that I want you to focus on for saving time:
- The “LookAt” argument in the “.Find” method accepts two constants: “xlPart” and “xlWhole.” If you use “xlPart,” then the method will search through all of the cells in the range and stop at the first one that contains the string you are searching for. You do not want to use “xlPart” if there is any chance that more than one header contains the same word, because it may return the wrong header if the column order changes. For this reason, I strongly recommend sticking with “xlWhole.”
- Qualify the range with the worksheet name. Even ActiveSheet is better than no qualification; it’s a best practices approach that will make your life easier.
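Both points can be sketched in a single Find call. The header text "Item Status" and the assumption that headers live in row 1 are placeholders:

```vba
Sub FindHeaderSketch()
    Dim ws As Worksheet
    Dim rngStatus As Range

    Set ws = ThisWorkbook.Worksheets(1)

    ' Search only the header row, qualified by its worksheet,
    ' and require the whole cell to match.
    Set rngStatus = ws.Rows(1).Find(What:="Item Status", _
                                    LookIn:=xlValues, _
                                    LookAt:=xlWhole, _
                                    MatchCase:=False)

    ' Find returns Nothing when there is no match, so test before using it.
    If rngStatus Is Nothing Then
        MsgBox "Header not found."
    Else
        MsgBox "Header found in column " & rngStatus.Column
    End If
End Sub
```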
VLOOKUPs aren’t the only useful worksheet functions to use in VBA. In lines 57 and 58, I used the “CountIf” function to search through all of the values in our item status column and assigned the results to corresponding active and inactive variables. If you skip to the message box statement on lines 204 through 206, I use these variables to return the results to the user there.
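A minimal CountIf sketch, assuming the status flag is in the third column and uses the literal values "Active" and "Inactive" (both assumptions):

```vba
Sub CountStatusSketch()
    Dim ws As Worksheet
    Dim activeCount As Long
    Dim inactiveCount As Long

    Set ws = ThisWorkbook.Worksheets(1)

    ' Without wildcards, CountIf matches the whole cell, so "Active"
    ' does not also count cells that contain "Inactive".
    With Application.WorksheetFunction
        activeCount = .CountIf(ws.Columns(3), "Active")
        inactiveCount = .CountIf(ws.Columns(3), "Inactive")
    End With

    MsgBox "Active: " & activeCount & vbNewLine & "Inactive: " & inactiveCount
End Sub
```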
Instead of going over each of the pivot tables one-by-one, I am going to save both of us a lot of time by summarizing the process and explaining to you the trick to programming pivot tables.
Using Pivot Tables in VBA
The prerequisites to pivot tables in VBA are the same, conceptually, as they are for a user who is creating them manually: check to ensure that your data source contains valid, normalized, high-quality data. In my particular example (Sample 2), the same holds true: create your test data for the imaginary product, its categories, and its specifications.
Are you ready to learn the magic? You are about to find out how to automate one of the most widely used, advanced features in Excel. Okay, here we go:
- Copy the two lines that I used (lines 63 and 64).
- Navigate to your workbook, click the Developer tab, and click “Record Macro.”
- Left-click one cell in the table you want to use for the pivot data and press “Ctrl” + “A” to select all of the cells.
- Press “Alt” + “N” + “V” + “Enter” (or change some settings before pressing “Enter”).
- Create your pivot table manually by dragging the columns you want into the appropriate dimensions, filtering, grouping, formatting the values, etc.
- Click the same button again to stop recording (while recording, its label changes to “Stop Recording”).
- Press “Alt” + “F11” to get back to the VB Editor window and find the code in the module you recorded it in.
- Copy it, paste it into your project under the worksheet you created, and read it.
- Repeat for as many pivot tables and worksheets as your heart desires.
Are you relieved? The point is that you don’t have to think about the pivot table objects and methods very much; automating pivot tables is really most efficient when you do it once manually with the recorder and then read the code a few times if you are interested in passing variables or changing constants.
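For orientation when reading recorded code, here is a hand-written sketch of the objects the recorder typically produces: a pivot cache built from the source table, then a pivot table built from the cache. The field names "Category", "Status", and "ProdCode" and the table name are placeholders and must match the headers in your own data.

```vba
Sub CreatePivotSketch()
    Dim wsData As Worksheet
    Dim wsPivot As Worksheet
    Dim pc As PivotCache
    Dim pt As PivotTable

    Set wsData = ThisWorkbook.Worksheets(1)
    Set wsPivot = ThisWorkbook.Worksheets.Add

    ' The cache holds a snapshot of the source table.
    Set pc = ThisWorkbook.PivotCaches.Create( _
        SourceType:=xlDatabase, _
        SourceData:=wsData.Range("A1").CurrentRegion)

    Set pt = pc.CreatePivotTable( _
        TableDestination:=wsPivot.Range("A3"), _
        TableName:="StatusPivot")

    ' Field names must match header text in the source table exactly.
    With pt
        .PivotFields("Category").Orientation = xlRowField
        .PivotFields("Status").Orientation = xlColumnField
        .AddDataField .PivotFields("ProdCode"), "Count of ProdCode", xlCount
    End With
End Sub
```

Recorded code usually expresses the source range as an R1C1 address string rather than a Range object, which is one reason the R1C1 style is worth learning.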
Compare this process with the process of programming VLOOKUPs, which require error handling and a carefully defined table or range assigned to an object variable (and qualified by a worksheet object), all contained within a loop. None of that is macro-recorder-friendly!
If you anticipate using the macro recorder more (even if it is only for pivot tables), I have one more suggestion …
Do Yourself a Favor: Learn R1C1 Style
It’s probably not the cell-referencing style that you are used to, but R1C1 is what the macro recorder uses. I suggest familiarizing yourself with it to save time when working with pivot tables, specifically, so that you have less work to do when editing the code generated from the macro recorder.
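A short sketch comparing the two reference styles; the cell addresses are arbitrary, and the procedure writes formulas into C2:C4 of the first sheet:

```vba
Sub R1C1Sketch()
    Dim ws As Worksheet
    Set ws = ThisWorkbook.Worksheets(1)

    ' A1 style: the formula text depends on where it is written.
    ws.Range("C2").Formula = "=A2+B2"

    ' R1C1 style: "same row, two columns left" plus "same row, one column left".
    ' Written into C3, this is equivalent to =A3+B3.
    ws.Range("C3").FormulaR1C1 = "=RC[-2]+RC[-1]"

    ' Absolute R1C1 reference: row 1, column 1, i.e. $A$1.
    ws.Range("C4").FormulaR1C1 = "=R1C1"

    Debug.Print ws.Range("C3").Formula      ' prints =A3+B3
    Debug.Print ws.Range("C4").Formula      ' prints =$A$1
End Sub
```

Square brackets mark offsets relative to the cell holding the formula, while bare numbers are absolute positions. That is why the same relative R1C1 string can be written to every row of a column unchanged.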
From here on out, it is up to you to figure out which pivot table objects are worth learning in VBA. The truth is that you can automate a workbook containing 20 pivot tables in a matter of minutes, once you understand some keyboard shortcuts, how to set up your tables properly, and how to prepare your code to insert blocks of recorded pivot tables.
Efficiently Allocating Your Time
The two most important parts of automating analysis (in terms of doing so in the most efficient manner possible) are as follows:
- Most of your effort should be spent preparing the data. Ensure that the data quality is top-notch, and think about any flags, keys, validation, reconciliation, and simple calculations that you may want to use.
- Visualize your end result as you build and refine your table.
I bet you are surprised at this point. You probably did not expect this article to focus so little on the actual VBA code involved in constructing pivot tables, charts, and other objects and properties that may be used for data analysis.
You do not save most of your time by learning all of the code that is required to create a pivot table—that is a surprisingly time-consuming task. Instead, learn how to prepare the data set and automate the preparation process itself.
In a similar fashion, using VBA can save you time and effort, but using it efficiently and wisely can help you most of all.