Tableau Prep – Future Proofing Group and Replace
In the last post, I shared a tip to to ensure guaranteed grouping on all existing data in a data set. But what if your flow is scheduled to run every week or every month? Will the new data break your carefully designed dashboard? To ensure that only grouped that goes to the final datasource/viz, we need to do a few things:
- Remove ungrouped(dirty) data from output
- Generate an email trigger when ungrouped(dirty) data is detected
- Modify the flow to handle the ungrouped(dirty) data
Create a list of master categories we want to maintain(Office Supplies, Furniture and Technology). You can use Excel to create this list or a flow like below will output only the master categories to the output csv file.
Now bring in the master category in to the flow and insert a left join as shown below. All data in the join will mach now, but any future ungrouped/dirty category will have null values at the right side of this join and we can use this fact to filter them out.
Insert a clean step after the join and keep only ‘Non-null ‘ values in category-1(Master_Category) filter. You can also remove this field as we dont need it in the output.
Now, we need to identify the ungrouped/dirty categories. Insert a branch after join and keep only ‘Null’ values in Category-1(Master Category). This will be empty for now as we dont have any ungrouped data, but we are future proofing our flow, remember?
Our intention is to identify the ungrouped categories in our dataset. Add an aggregation after the clean step and group by categories and write it to an output file, we have our final flow.
Testing The Flow
Add some additional rows to the input data with new categories and run the flow.(Alternatively you can use this file). You can see that the cleaned data output only have 3 categories as expected. Additionally, ungrouped categories are added to Category_Ungroup.csv.