Data Model - After

How I learned PowerPivot Blog Series – Creating Measures

This post is the fourth in a series about learning the built-in Business Intelligence features baked into Excel 2013.  If this is your first time accessing this blog series, I encourage you to read the previous posts in order to get the proper context:

Blog One: How I learned PowerPivot from taking football stats
Blog Two: How I learned PowerPivot Blog Series – Data Prep
Blog Three: How I learned PowerPivot Blog Series – Behold, the Data Model!

As a quick review: the first post in the series described how I used an iPad and Excel 2013 to capture stats for my son’s youth football team.  The second post focused on taking raw data and prepping it to work in PowerPivot, while the third post detailed how I created relationships in the data within PowerPivot.  In today’s blog, I’ll use this data model as the source for creating measures and displaying information in a Pivot Table (or two – we’ll see how far we get in this post).

A bit of review may be helpful before I proceed.  Here’s is what’s been done to this point:

  • Five tabs in an Excel spreadsheet, each with data related to stats captured during a game, the team roster, the teams played,  penalties, and the plays themselves.
  • The data ranges in each tab were converted to Tables named Play_Data, Roster, Team, Penalties and Plays
  • The tables were added to the PowerPivot data model
  • Relationships were created between the tables in the data model

Show me some analytics!

As you may recall from the previous blog post, the following fairly simple data model was created in PowerPivot:

Data Model - After

With this model in place, I’m able to create some Pivot Tables and get into some analysis.  Here’s a simple dashboard showing offensive running yards by opponent and quarter and features a Pivot Table, Pivot Chart and two Slicers.  Easy for me to do, but not as easy to explain how I did it.  Fortunately, Chandoo has a Make Dynamic Dashboards using Pivot Tables & Slicers video and download that will do the job nicely.  Suffice it to say it took me <3 minutes to put it together and offered the football coach the ability to do some ad-hoc analysis…

PivotTable1

…which was fine until the coach asked me for similar analysis regarding scoring.  Problem is, I only tracked the result of the play (touchdown, extra point, fumble, penalty, etc) but not the numerical score itself.  Calculated Columns to the rescue!

Calculated Column – Points Scored

The solution to this was simple – in the Game table within PowerPivot I added a Calculated Column.  To do this, I clicked in the top cell in the Add Column,  typed in the super groovy formula as displayed in the image below*, and pressed Enter:

Points Scored Calculated Column

And due the magic within PowerPivot, the formula is automatically applied to every row in the table.  I double-clicked the column header and typed in Points_Scored, and I was done.

Points Scored Calculated Column 2
* Check below for an update to the formula used to calculate Points_Scored

I then clicked on the PivotTable icon in the ribbon (see above) to create a new PivotTable.  The Points_Scored calculated column shows up in the PivotTable list…

PivotTable Fields 1

…and I used it to create the following Scoring Dashboard:

Points Scored Pivot Table

Good stuff so far – the coach is getting good info.  He then asks: “can you show me stats by player of all-purpose yards?”  In other words, the coach wanted some analysis of the total of offensive rushing, receiving, kick and punt return yards by player (and by offensive I don’t mean it in a bad way, although some of the runs were pretty ‘sick’).  Although all-purpose yards weren’t tracked when I took the stats, I used another calculated column to sum the total of the rushing, et al yardage.

Calculated Column – All Purpose Yards

…which is as easy as it sounds.  Entering in the formula below in a new column and renaming the column All_Purpose_Yards, and voila – instant measure.

All Purpose Yards Calculated Column

But, wait, knowing the number of all purpose yards run on a single play is kinda interesting.  But what the coach really wants to know is the total number of yards by a player.  Or in a game.  Or when running a particular play.  Now what?

You’re gonna need a Measure.

Earlier in my career, I met a man named O. Dexter Sweet Jr. (and Yes, he’s a real person.  You may even know him).  He said to me these words: “Measures?  Measures are dirt simple.”  Or something like that.  Point is, Measures are not as mysterious as I once thought they were, but they are powerful.  Given that I’m a visual learner, the following graphic visually describes the difference between a Calculated Column and a Measure at the most basic level:

Calculated Column vs Measure Visual Depiction

Essentially, that’s it.  Well, at least enough to get started.  So, let’s start.

Measure – Total All Purpose Yards

There are two ways to create a Measure: within the PowerPivot Window and within the PowerPivot tab.  We’ll start in the PowerPivot Window (gentle reminder: accessed from the Excel workbook via the “Manage” button in PowerPivot ribbon)…

PowerPivot Manage Button

…and make sure the Calculation Area is displayed by clicking on the highlighted button and click on cell underneath the [All Purpose Yards] column:

Calculation Area

Now all that’s left to do is click the AutoSum button, and the Measure is automatically created for you!

Total All Purpose Yards Measure - PowerPivot Window

I tend to rename these Measures to suit my preferences, so I simply clicked within the Formula Bar and rename it to [Total All Purpose Yards]:

Total All Purpose Yards Measure - PowerPivot Window

You can do the same within the PowerPivot Tab:

Calculated Field - Menu Option

Total All Purpose Yards Measure - Calculated Fields

While creating Measures within the PowerPivot window is convenient, especially for these “sum the column” measures, I find that use the Calculated Fields option more often as 1) I have more control over the formatting of the data, and 2) I end up creating a larger number of Measures that are more complex than the basic “sum the column” measures.  One thing to pay close attention to is into which Table will the Measure be saved.  Since this Measure is associated with the Game, I choose to save it into the Game table.  Taking a moment to think about how the Measures will be organized will not only save time in the future, but make it much easier to create pivot tables and Power View visualizations.

Yeah, so what’s the big deal?  I won’t tell you – I’ll show you:

All Purpose Yards by Play and Player

By looking at the above chart in Power View (more on Power View in a later post), I can easily see that player #15 had 289 all purpose yards when the team ran play #5.  I can also instantly know that play 20 and 2.5 were complete duds. (If you’re wondering why not mention play ‘V’ – that’s the best play in the playbook: the Victory formation.  You’d expect to lose yardage).

While this statistic is helpful, it is only one stat – the coach wants ‘stats’.  Plural.  Meaning more than one.  Like, for example, average yards per carry and average yards per game.  Easy, right?  You just take the total yards gained and divide that by the total number of carries, right?  You’d be correct.  Problem is, we don’t know the total yards for the player or game, we only know the total yards for a specific play in a specific game by a specific player.  Hmmm, I wonder how we can solve this problem…?

A new Measure!  Let’s start with something basic: Average Yards per Carry.  Now, lest you football purists jump all over me for confusing “Average Yards per Carry” and “Average Yards per Catch”, the coach defined “Carry” as either a run or pass play.  So, here’s the Measure:

Carries Measure

Here you see an example of a CALCULATE function being used, which translates into: “count the number of times a player either ran or caught the ball when the team was on offense”.  With that Measure in place, now we can easily calculate the average yards per carry:

Average Yards per Carry Measure

The CALCULATE is a powerful and versatile Data Analysis Expression (DAX) function.  If you are not already familiar with using DAX formulas within Power Pivot, I highly recommend Rob Collie’s book DAX Formulas for PowerPivot.  Not only did I find it an invaluable resource, I actually enjoyed reading Rob’s book.  Entertaining and informative.

Using a similar approach, I created a host of new Measures, such as:

  • Average Yards per Game
  • Touchdowns, Extra Points, 2 Point Conversions, Safeties
  • Offensive Rushing Yards, Offensive Passing Yards, Kick Return Yards, Punt Return Yards

Using Power View, an Excel Add-On, and the measures created above I was able to create the following analysis of individual offensive player performance:

Individual Player Offensive Analysis

Now this is what the coach was looking for (adding the background image was a nice touch IMO).  With this, he could easily see the players that were producing the team’s offense.  But as important, and maybe even more so given that this is a Youth football team, identify those players that not as involved so he could look for ways to increase their contributions.  A nifty feature of Power View is the ability to click on a particular player and have the charts automatically highlight that player’s stats across the five charts.

Individual Player Offensive Analysis - Players 31 and 85

For example, the first shows that Player #31 led the team in many of the offensive categories while Player #85 not only caught the majority of passing yards, but also led the team in 2 point conversions.  That’s cool – but what plays produced the most scores?  Well, funny you ask.  Below is a set of Power View-based visualizations that show just that; clicking on a player’s number provides additional insight: Player #31 scored five touchdowns when they ran a particular running play, which Player #85 scored three 2-point conversions when they ran a particular pass play:

Scoring Analysis

So in conclusion…

Excel is a great tool for doing all sorts of things, and if you’re anything like me you use it all the friggin’ time.  Excel with PowerPivot?  Well, now we’re talking pure awesomeness.  You think I’m being a touch hyperbolic?  Nope, I’m not.  Being able to create my own dimensional models and then immediately using them for ad-hoc analysis is taking Excel to another dimension (pun totally intended).  Yeah, learning DAX might take a bit of time and effort, but making this investment pays huge dividends when you can take even a modest amount of data and transform it into useful analysis.

This blog series has focused on a season’s of stats data from a youth football team and using PowerPivot to gain insights on team and individual performance.  Hmmm…does this have relevance in the business world?  Indeed it does.  I co-authored a White Paper called The Changing World of Business Intelligence: Leading with Microsoft Excel that talks about how Excel with Power Pivot can be a place of convergence for technical and business folks to create business-oriented analytical solutions.  It’s worth a read – you may gain some useful insights.

 

Update

I changed the formula to add the BLANK( ) function so that zeroes wouldn’t display in the Points_Scored calculated column.  Why, you ask?  The problem was that player number BLANK showed up in the Player # slicer in the Scoring Dashboard scoring zero points.  Revising the formula as follows removes the zeroes in the calculated column, which fixes the data anomaly nicely:

=(if([Play_Result]=”Touchdown”,1,BLANK())*6)+…

How I learned PowerPivot Blog Series – Behold, the Data Model!

This post is the third in a series about learning the built-in Business Intelligence features baked into Excel 2013.  If this is your first time accessing this blog series, I encourage you to read the previous posts in order to get the proper context:

Blog One: How I learned PowerPivot from taking football stats
Blog Two: How I learned PowerPivot Blog Series – Data Prep

As a quick review: the first post in the series described how I used an iPad and Excel 2013 to capture stats for my son’s youth football team.  The second post focused on taking raw data and prepping it to work in PowerPivot.  In today’s blog, I’ll describe creating relationships within PowerPivot and create a fully-functioning data model.  This data model will serve as the source for creating measures and displaying information within Pivot Tables and Power View visualizations within Excel.

A bit of review may be helpful before I proceed.  Here’s is what’s been done to this point:

  • Five tabs in an Excel spreadsheet, each with data related to stats captured during a game, the team roster, the teams played,  penalties, and the plays themselves.
  • The data ranges in each tab were converted to Tables named Play_Data, Roster, Team, Penalties and Plays
  • The tables were added to the PowerPivot data model

The Data Model

Ahh, behold the The Data Model.  It’s almost as if the privilege for even speaking these words were once only reserved for Data Analysts (and well deserved, BTW – these are bright people).  PowerPivot reaches into this rarified air and brings this power to the masses.  Well, at least those of us that have basic understanding of the underlying data and knowledge of Excel.  In other words, the Power User.  As I see I it, Microsoft’s strategy is to position Excel as the de facto BI tool and supercharge it with capabilities such as PowerPivot, Power View, and Data Explorer.  Now is a great time to jump in and ride the BI wave!

With these superlatives out of the way, let’s look at the Data Model.  From Excel, I clicked on the PowerPivot tab and clicked on Manage, as seen below:

Manage Data Model

This displays the ‘Data View’ view, which looks something like this:

Data View

To get to the Data Model, simply click on the Diagram icon in the ribbon:

Data Model Ribbon Choice

Data Model – Before

Choosing the Diagram View displays the following data model.  Note that the tables are not connected, simply floating as independent entities without any relationships (sounds kinda lonely…).

Data Model - Before

Data Model – After

Here’s Diagram View after the tables are all ‘hooked up':

Data Model - After

Data Relationships (aka “The Hookup”)

I was fortunate that relationships between the tables were pretty straightforward, in large part because I had planned it that way.  In the Think ahead on how your data will be used’ section in my previous post, I had thought about how the data was going to be used by the coaches and how this process informed how the data needed to be structured into Facts and Dimensions.  This is the foundational step in creating a dimensional model.  For those of you that want to go a bit deeper on a structured approach for dimensional modeling from a business perspective, I encourage you to review TDWI’s Dimensional Data Modeling Primer: From Requirements to Business Analytics document and/or attend a training session.

So, let’s get started.  I’ll show two common methods to relate tables together: Click-and-Drag and Create Relationships.

Click-and-Drag

Assuming we’re still in the Diagram View, I first wanted to create a relationship between the Game table (aka my Fact table) and the Plays table (aka a Dimension table).  To do this, I simply clicked on the Play field in the Game table and dragged it to the Play field in the Plays table.  That’s all there is to it.

Create Relationships 1

This view shows the active relationship:

Create Relationships 2

Create Relationships

Another common method for creating relationships is a bit more manual but does the job equally as well. Within the Design tab in PowerPivot, click on Create Relationship…

Create Relationships Menu

…to bring up the Create Relationship dialog box.  In this example, I created a relationship between the Game table and the Team table (aka a Dimension table):

Create Relationships Dialog

I repeated this process to create the relationships needed between the tables.  In the end, I had created a honest-to-goodness star schema dimensional model.  In other words, a fully-functioning data cube.  In a later post I’ll talk about publishing the PowerPivot workbook into SharePoint, where the uploaded PowerPivot workbook is transformed into a data source for reports and other data visualization tools that can connect to a SQL Server Analysis Services (SSAS) database.  This is significant – it wasn’t all that long ago that only Data Analysts and other technical folk had the tools, knowledge and experience to create dimensional models and data cubes.  <<Reference White Paper>>

You may have noticed that in the Data Model – After screen shot that some of the connections were dotted-line versus a solid line.  In PowerPivot, tables may have multiple relationships to each other but only one connection may be active at any time.  A solid line connection visually indicate an Active relationship between tables, while dotted-line connections visually indicate Inactive relationships between tables.  This is OK – do not fear it.  In a subsequent post I’ll talk about how I used the DAX function USERELATIONSHIP in creating a measure that leverages an Inactive connection.

Blog Preview

Now that I have a viable PowerPivot data model, I’m going to spend the next blog in the series creating a series of basic Measures and a Pivot Table or two.  This is where it gets fun, so stay tuned!

How I learned PowerPivot Blog Series – Data Prep

This post is the second in a series about learning the built-in Business Intelligence features baked into Excel 2013.  If this is your first time accessing this blog series, I encourage you to read the first blog post in order to get the proper context:

Blog One: How I learned PowerPivot from taking football stats

The first post in the series described how I used an iPad and Excel 2013 to capture stats for my son’s youth football team.  While somewhat kludgy, it did the job reasonably well.   But it was lacking the ability to look deeper into the stats to find nuggets of insight (aka Analytics).  So…PowerPivot to the rescue!

As noted in the previous blog, I began to recognize that this mass of stats was actually a rich data set that was ripe for deeper analysis.  In order to do deep analysis, I needed to ‘pop the hood’ (or ‘bonnet’ for my international audience) and prepare the data to be used in PowerPivot.  What is PowerPivot, you ask?  I could write an entire blog on this, but Rob Collie already did in his blog entry called What is PowerPivot?.  For now, I’ll simply say that PowerPivot is the engine that powers analytics within Excel.

  1. Before I go further – it’s important to note that the company I work for (Magenic) provides access to Microsoft Office Professional Plus, which includes Excel 2013.  This version allows me to use PowerPivot (and Power View, which will be discussed in a future blog). In other words, if you need these features, make sure you have the proper version of Excel 2013.

Step 1: Enable PowerPivot Add-In (this article describes the process better than I can)

Step 2: Convert the Data Range to a Table

It’s a good practice to convert the data range that will be analyzed into a Table, with the added benefits of easier sorting, filtering and formatting data.  Note: this is different that simply renaming the Tab.  To convert the range into a table:

  • Click in the data range
  • Choose Insert from the ribbon, then click Table:

Insert Table

  • Confirm the data range, and make sure that the “My table has headers” checkbox is checked.

Create Table Dialog

  • Change the table name from the default name ‘Table1′. It’s not only best practice; it saves a step when the table gets added to the Data Model as shown below:

Rename Table Name

Step 3: Add to Data Model

  • Once the Table was created, add it to the PowerPivot Data Model.  To do that, choose PowerPivot from the ribbon, then click Add to Data Model:

Add to Data Model

  • After clicking the Manage button within the PowerPivot tab of the Excel ribbon (see the above image), here’s the view of the table within PowerPivot:

Data Model - First Table

Repeto (Latin for Repeat)

I repeated the process for a Roster, Plays, and Teams table.  When I originally set up the Excel spreadsheet with separate tabs to capture the data, I didn’t have the foresight to know that it was step one of creating a dimensional model.  At the time, simply capturing data and putting related data together in a separate tab made little sense to me.  So, in a way, I was lucky.  But the lesson I can pass on to those reading this blog who may not be data analysts is to organize your data so it is readily ‘consumable’ by PowerPivot.  I’m certainly not a data analyst, but here are some tips I can pass along:

  • Think ahead on how your data will be used.  In my case, I knew that the coach wanted to see Rushing Yards by Player, Tackles by Player, Rushing and Passing Yards by Play, Points per Game, etc.  There is a pattern: X by Y.  In data modeling terms, this is referred to as Fact by Dimension.  In my case, the Play_Data table became my Fact table, and the Roster, Plays, and Teams tables became my Dimension tables.  I’ve seen my share of SWS (Super Wide Spreadsheets), where every imaginable piece of data is stuffed into a massive number of columns.  Not only does it make it difficult to print, this approach makes it difficult to use with PowerPivot.  Keep what is being measured in one table, and what it is being measured by in separate tables.
  • Clean data is good data.  While Excel accommodates a bit of sloppiness when it comes to data, PowerPivot and tools like Pivot Tables and Power View do not.  For example, “Offsides” and “Off Sides” may be the same penalty, but not to PowerPivot.  “12 yrds” and “12” may convey the same information, but one is a text entry and the other is a number.  In my case, I inputted my fair share of phantom player numbers and play numbers as well as a large number of misspelled penalties.
  • Renaming is Key!  The names that you assign to columns and tables will be used extensively in PowerPivot, and it’s important to know up front that PowerPivot doesn’t “auto-fix” formulas when you later rename a column or table.  Save yourself time and headaches in the future by thinking it through (even a little bit helps).  I tip my hat to Rob Collie for that tip, which is found in his book DAX Formulas for PowerPivot – The Excel Pro’s Guide to Mastering DAX.  A great read for those who want to take their PowerPivot skills to the next level.
Here’s a useful tip: once the data range is converted to a table, take a few moments to use the Column Filters to ensure you have valid and consistent data.  I use Filters frequently, and I encourage you to become familiar with using this valuable function.  Lastly, while you are reviewing your data, take the extra effort to highlight the columns and set the type of data within the column.  Excel typically categorizes everything as General. Not only is it good practice to properly categorize the data, but PowerPivot will also pick up the data categorization when the table is added to the data model.  Here’s a screen shot:

Data Categorization

Bonus Tip

There is one more tip I’ll share that I’ve found to be pretty handy: the Sort by Column feature.  In the screenshots above, did you notice the “10 – Newport” data within the Game column?  Previous to the version of PowerPivot that comes with Excel 2013, I had to add a number to the text so that the data would sort properly.  This was terribly annoying, especially for everyday things like sorting by Month or Day.  So, without further ado, here is the solution:

  • I removed the appended game numbers in my Play_Data table, and in my Teams table in Excel (not PowerPivot) I added a GameNumber column, as seen below:

Game Number Column

  • I updated PowerPivot to refresh the Play_Data data and to bring in the GameNumber column and data into the data model:

Update Data Model

  • I opened the PowerPivot window:

Manage PowerPivot

  • I selected the Team column and clicked on the Sort by Column button on the ribbon.  In the dialog box, I set the Team column to be sorted by the GameNumber column and clicked OK.
  • Now, when I create a Pivot Table or Power View graph, the games display in the order played:

Game Sorted Properly  Tackles by Game

Blog Preview

In the next blog of this series I’ll show you how to create relationships within the Data Model.

How I learned PowerPivot from taking football stats

It all started out with a simple request from my son’s football coach.  What I didn’t know was how saying yes would help me evolve from a guy that was fairly handy with Excel to a BI analyst.  So, at the urging of Steve Hughes, who is the BI Practice Lead at Magenic, I’m writing a blog series about this process.  My goal: show the steps I took in transforming these stats into a full-fledged BI solution within Excel 2013.  In this series I’ll show how I went from this:

Solution - Before

To this:

Football Power View Image

First, the request: at the beginning of my son’s 6th grade football season, his coach asked me if I could take stats for the team.  He was looking for the very basic: rushing yards and tackles.  I had done it once the previous season, filling in for a parent that couldn’t be at the game.  Simple stuff, really.  Since having this information helps the coaches better plan for the next game, I said yes to the request.

Being the tech savvy over-achieving business analyst that I am, I thought about the easiest way to capture the data.  Writing it down? Not a chance.  I avoid picking up a pen or pencil whenever possible.  Our family has an iPad, so why not use that?  I did a bit of research, and settled on using Numbers to capture the data during the game, export the data, email it to myself, and import it into an Excel spreadsheet.  But to use all this capability to just capture rushing yards and tackles?  Kinda like driving a semi to pick up groceries.  It should have been no surprise to those who know me when I added more stats to the spreadsheet – I mean, why not?  Soon, I had added the play being run, passing yards, kickoff and punt yardage, penalties, touchdowns, etc.

Soon enough, I began amassing a fair amount of data. Using the iPad was fairly easy to capture the stats in real-time (thanks to parent helpers that were my spotters), and I had the added benefit of being an unofficial part of the team staff on the sideline.  The evolution was already underway, as my Excel formulas were becoming more complex in summarizing stats by play and player.  In other words, a bunch of SUMIFS and COUNTIFS-based formulas that did the job, but were cumbersome and cludgy (not to mention error prone).  That said, the coaches couldn’t get enough of it and were actually using the data to tweak the game plan on a weekly basis.

Somewhere during mid-season I discovered is that the football stats were a rich data set that was readily consumable by Power Pivot, a feature available to me in Excel 2010 sitting on my desktop.  The evolution of displaying stats takes an interesting turn, which I’ll talk about in the next blog.  In a word: Slicers!

(Note for those who may be curious: the images shown in these blog posts are real stats, but the names and numbers of the players have been changed to mask their identity.  For those of you who know me and are trying to get a glimpse of how your kid did last year on my son’s team – good luck.)