In the last post, I shared a tip to to ensure guaranteed grouping on all existing data in a data set. But what if your flow is scheduled to run every week or every month? Will the new data break your carefully designed dashboard? To ensure that only grouped that goes to the final datasource/viz, we need to do a few things:Remove ungrouped(dirty) data from outputGenerate an email trigger when ungrouped(dirty) data is detectedModify the flow to handle the ungrouped(dirty) dataWe will address the first item in this post and the remaining in near future. We will start with the same flow and data set in last post.Original Flow Step 1Create a list of master categories we want to maintain(Office Supplies, Furniture and Technology). You can use Excel to create this list or a flow like below will output only the master categories to the output csv file. Saving MAster Categories Step 2Now bring in the master category in to the flow and insert a left join as shown below. All data in the join will mach now, but any future ungrouped/dirty category will have null values at the right side of this join and we can use this fact to filter them out.Joining Master Categories Step 3Insert a clean step after the join and keep only ‘Non-null ‘ values in category-1(Master_Category) filter. You can also remove this field as we dont need it in the output.Keep Only Non Null Master Category Flow With Ungrouped Categories Removed Step 4Now, we need to identify the ungrouped/dirty categories. Insert a branch after join and keep only ‘Null’ values in Category-1(Master Category). This will be empty for now as we dont have any ungrouped data, but we are future proofing our flow, remember?Step 5Our intention is to identify the ungrouped categories in our dataset. Add an aggregation after the clean step and group by categories and write it to an output file, we have our final flow.Final Flow Testing The FlowAdd some additional rows to the input data with new categories and run the flow.(Alternatively you can use this file). You can see that the cleaned data output only have 3 categories as expected. Additionally, ungrouped categories are added to Category_Ungroup.csv.Output Ungrouped Categories Next StepsGenerate an email trigger when ungrouped(dirty) data is detectedModify the flow to handle the ungrouped(dirty) data Post navigationTableau Prep Tip – Getting Group And Replace Right At One Go For Enormous Data Sets Tableau Prep Email Alerts
[…] future proofing group and replace post, I promised to write about generating an email alert when ungrouped data is detected by the flow. […]Reply