How I learned PowerPivot Blog Series – Data Prep

This post is the second in a series about learning the built-in Business Intelligence features baked into Excel 2013.  If this is your first time accessing this blog series, I encourage you to read the first blog post in order to get the proper context:

Blog One: How I learned PowerPivot from taking football stats

The first post in the series described how I used an iPad and Excel 2013 to capture stats for my son’s youth football team.  While somewhat kludgy, it did the job reasonably well.   But it was lacking the ability to look deeper into the stats to find nuggets of insight (aka Analytics).  So…PowerPivot to the rescue!

As noted in the previous blog, I began to recognize that this mass of stats was actually a rich data set that was ripe for deeper analysis.  In order to do deep analysis, I needed to ‘pop the hood’ (or ‘bonnet’ for my international audience) and prepare the data to be used in PowerPivot.  What is PowerPivot, you ask?  I could write an entire blog on this, but Rob Collie already did in his blog entry called What is PowerPivot?.  For now, I’ll simply say that PowerPivot is the engine that powers analytics within Excel.

  1. Before I go further – it’s important to note that the company I work for (Magenic) provides access to Microsoft Office Professional Plus, which includes Excel 2013.  This version allows me to use PowerPivot (and Power View, which will be discussed in a future blog). In other words, if you need these features, make sure you have the proper version of Excel 2013.

Step 1: Enable PowerPivot Add-In (this article describes the process better than I can)

Step 2: Convert the Data Range to a Table

It’s a good practice to convert the data range that will be analyzed into a Table, with the added benefits of easier sorting, filtering and formatting data.  Note: this is different that simply renaming the Tab.  To convert the range into a table:

  • Click in the data range
  • Choose Insert from the ribbon, then click Table:

Insert Table

  • Confirm the data range, and make sure that the “My table has headers” checkbox is checked.

Create Table Dialog

  • Change the table name from the default name ‘Table1’. It’s not only best practice; it saves a step when the table gets added to the Data Model as shown below:

Rename Table Name

Step 3: Add to Data Model

  • Once the Table was created, add it to the PowerPivot Data Model.  To do that, choose PowerPivot from the ribbon, then click Add to Data Model:

Add to Data Model

  • After clicking the Manage button within the PowerPivot tab of the Excel ribbon (see the above image), here’s the view of the table within PowerPivot:

Data Model - First Table

Repeto (Latin for Repeat)

I repeated the process for a Roster, Plays, and Teams table.  When I originally set up the Excel spreadsheet with separate tabs to capture the data, I didn’t have the foresight to know that it was step one of creating a dimensional model.  At the time, simply capturing data and putting related data together in a separate tab made little sense to me.  So, in a way, I was lucky.  But the lesson I can pass on to those reading this blog who may not be data analysts is to organize your data so it is readily ‘consumable’ by PowerPivot.  I’m certainly not a data analyst, but here are some tips I can pass along:

  • Think ahead on how your data will be used.  In my case, I knew that the coach wanted to see Rushing Yards by Player, Tackles by Player, Rushing and Passing Yards by Play, Points per Game, etc.  There is a pattern: X by Y.  In data modeling terms, this is referred to as Fact by Dimension.  In my case, the Play_Data table became my Fact table, and the Roster, Plays, and Teams tables became my Dimension tables.  I’ve seen my share of SWS (Super Wide Spreadsheets), where every imaginable piece of data is stuffed into a massive number of columns.  Not only does it make it difficult to print, this approach makes it difficult to use with PowerPivot.  Keep what is being measured in one table, and what it is being measured by in separate tables.
  • Clean data is good data.  While Excel accommodates a bit of sloppiness when it comes to data, PowerPivot and tools like Pivot Tables and Power View do not.  For example, “Offsides” and “Off Sides” may be the same penalty, but not to PowerPivot.  “12 yrds” and “12” may convey the same information, but one is a text entry and the other is a number.  In my case, I inputted my fair share of phantom player numbers and play numbers as well as a large number of misspelled penalties.
  • Renaming is Key!  The names that you assign to columns and tables will be used extensively in PowerPivot, and it’s important to know up front that PowerPivot doesn’t “auto-fix” formulas when you later rename a column or table.  Save yourself time and headaches in the future by thinking it through (even a little bit helps).  I tip my hat to Rob Collie for that tip, which is found in his book DAX Formulas for PowerPivot – The Excel Pro’s Guide to Mastering DAX.  A great read for those who want to take their PowerPivot skills to the next level.
Here’s a useful tip: once the data range is converted to a table, take a few moments to use the Column Filters to ensure you have valid and consistent data.  I use Filters frequently, and I encourage you to become familiar with using this valuable function.  Lastly, while you are reviewing your data, take the extra effort to highlight the columns and set the type of data within the column.  Excel typically categorizes everything as General. Not only is it good practice to properly categorize the data, but PowerPivot will also pick up the data categorization when the table is added to the data model.  Here’s a screen shot:

Data Categorization

Bonus Tip

There is one more tip I’ll share that I’ve found to be pretty handy: the Sort by Column feature.  In the screenshots above, did you notice the “10 – Newport” data within the Game column?  Previous to the version of PowerPivot that comes with Excel 2013, I had to add a number to the text so that the data would sort properly.  This was terribly annoying, especially for everyday things like sorting by Month or Day.  So, without further ado, here is the solution:

  • I removed the appended game numbers in my Play_Data table, and in my Teams table in Excel (not PowerPivot) I added a GameNumber column, as seen below:

Game Number Column

  • I updated PowerPivot to refresh the Play_Data data and to bring in the GameNumber column and data into the data model:

Update Data Model

  • I opened the PowerPivot window:

Manage PowerPivot

  • I selected the Team column and clicked on the Sort by Column button on the ribbon.  In the dialog box, I set the Team column to be sorted by the GameNumber column and clicked OK.
  • Now, when I create a Pivot Table or Power View graph, the games display in the order played:

Game Sorted Properly  Tackles by Game

Blog Preview

In the next blog of this series I’ll show you how to create relationships within the Data Model.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s