Learn to build a Butterfly Chart in Tableau

This chart has many names

Butterfly | Pyramid | Bikini

Take your pick!

This #WOW2020 challenge from Lorna Brown is so great! It’s an easy, quick introduction to a complex-looking chart but is fairly straight forward. It’s known as butterfly, pyramid, bikini and diverging bar chart. This chart allows the user to see trends and variations for a binary dimension along a higher cardinality dimension. It’s most common to see this chart type showing gender proportions by age bins. Just like in this example.

Continue reading for the step-by-step tutorial! OR if you prefer something a little more interactive – check out the video I made during my weekly live stream on 07.14.2022

Play Video

Shaping the data

The first thing we have to do is get the data in the right format – you can see when we bring it in, it’s got 4 rows of junk. We can fix this by checking the “Data Interpreter” box in the data source window. When we do that. Tableau will examine the datasource and find a tidy table for you. It’s a pretty slick feature and is really useful when you download open public data excel files from around the internet.

From here we need to pivot the data because Excel files are dumb and they put the years across columns which is the exact opposite of the best way to display data in Tableau unless you want to deal with Measure Names & Measure Values which comes with this own set of challenges. So we’ll pivot the columns to rows.
 
But wait! There’s more! We also need to split the data to separate the year from the gender! What a mess of a file this is. I really despise how excel used for public data…
 
Okay, so now that we’ve got the data pivoted and split, we’ve ALMOST got it ready to go. But Lorna said to use only the data from 2021, so we’ll add a datasource filter for that. And NOW we’re ready to roll

Let's build the chart

So the first thing we need to do is create separate male and female population calculations. By doing this, we’ll be able to use multiple axes in our single sheet. Our calculations will look like this
//MALE POP
sum(if [Gender] = "Males" then [Population] else 0 END)
//FEMALE POP
sum(if [Gender] != "Males" then [Population] else 0 END)

Now that we’ve got those two calculations we can start to build our pyramid. Take the [Age] pill and add it to rows, then add both of these new fields to the columns shelf as separate pills but DO NOT make them a dual axis. We should get something like this

Well this doesn’t look quite right. But we can fix that but editing axis of the the [Male Pop] field. and simply reverse it. While we’re here, let’s go ahead and add color to our bars as well. Now we should haves something like this

This is looking pretty good. Now we just need to create a new calculation for the total population. Lorna mentioned using an LoD expression. There are a couple ways we could do it. For this solution, I’m going to use an exlude expression to disregard gender anywhere in the view for this calculation.

//Total Population
{ EXCLUDE [Gender]:SUM([Population])}

From here we can add this as the primary axis for both sides of our pyramid, synchronize and format the colors to get this

But then i started to think about it and the LOD isn’t even needed because the we aren’t doing any filtering on this sheet and we want to show the total population at the age group level. That is just the datasource column of Population…

NO LOD NECESSARY!

And that brings me to my last point because all that’s left here is to format and clean up tooltips – don’t over-engineer your solutions. Be skeptical, try things out. Don’t immediately jump to using a solution that could result in a lack of performance.

Now with that…

GO FORTH AND VIZ

Check out the interactive visualization below

Leave A Reply

Your email address will not be published.