In the last post, I shared a tip to to ensure  guaranteed grouping on all existing data in a data set. But what if your flow is scheduled to run every week or every month? Will the new data break your carefully designed dashboard? To ensure that only grouped that goes to the final datasource/viz, we need to do a few things:

  1. Remove ungrouped(dirty) data from output
  2. Generate an email trigger when ungrouped(dirty) data is detected
  3. Modify the flow to handle the ungrouped(dirty) data

We will address the first item in this post and  the remaining in near future. We will start with the same flow and data set in last post.

Original Flow

 

Step 1

Create a list of master categories we want to maintain(Office Supplies, Furniture and Technology). You can use Excel to create this list or a flow like below will output only the master categories to the output csv file.

 

 

Saving MAster Categories

 

Step 2

Now bring in the master category in to the flow and insert a left join as shown below. All data in the join will mach now, but any future ungrouped/dirty category will have null values at the right side of this join and we can use this fact to filter them out.

Joining Master Categories

 

Step 3

Insert a clean step after the join and keep only ‘Non-null ‘ values  in category-1(Master_Category) filter. You can also remove this field as we dont need it in the output.

Keep Only Non Null Master Category

 

Flow With Ungrouped Categories Removed

 

Step 4

Now, we need to identify the ungrouped/dirty categories. Insert a branch after join and keep only ‘Null’ values in Category-1(Master Category). This will be empty for now as we dont have any ungrouped data, but we are future proofing our flow, remember?

Step 5

Our intention is to identify the ungrouped categories in our dataset. Add an aggregation after the clean step and group by categories and write it to an output file, we have our final flow.

Final Flow

 

Testing The Flow

Add some additional rows to the input data with new categories and run the flow.(Alternatively you can use this file). You can  see that the cleaned data output only  have 3 categories as expected. Additionally, ungrouped categories are added to Category_Ungroup.csv.

Output

 

Ungrouped Categories

 

Next Steps

  1. Generate an email trigger when ungrouped(dirty) data is detected
  2. Modify the flow to handle the ungrouped(dirty) data
One thought on “Tableau Prep – Future Proofing Group and Replace”

Leave a Reply

Your email address will not be published. Required fields are marked *