Automatically loading the latest file from a folder in to Tableau

If you have a folder where files are added daily, how do you load the latest file automatically in to tableau? I recently answered a stack overflow question  and this post is an attempt to explain the other aspects that is not covered on that answer.

Basic Setup

If you want to load multiple files automatically, you need to create a new union and fill up the create union window correctly.  In the below screen shot, We are loading all files in Sales_Data folder that have a name starts with ‘Sales’.  Please note that you can load files from sub folders or parent folders by checking the appropriate check button.

 

This setup will load all the files matching the criteria. If you need all data, good! you can stop here, but please continue if you want to know how to keep only the latest file.

Scenario 1: File name contain properly formatted date

In this case, all files are names in  Sales_YYYYMMDD format. This is the easiest format to handle as Tableau automatically recognize it. To keep only the latest file, create an extract(recommended for performance) filter and choose to keep only the TOP 1 path using aggregation max.

Here is the final results that shows only the latest file.

 

Scenario 2: File name contain date but not in  suitable format

If your file names are formatted in Sales_DDMMYYYY format, above trick is not going to work as the max aggrgation will pick 31012017 over 15112018. In this case we need to create a calculated date from the path and use that to apply the top 1 filter.

Calculation is dependent on your file name,  but here is a sample calculation:

DATE(DATEPARSE("ddMMyyyy",MID([Path],LEN([Path])-11,8)))

Now apply the Top 1 filter as explained above and you have your data loaded.

 

Scenario 3: File name contain an auto incremented number

Scenario 1 can be used in this case without any changes as Tableau’s max calculation will return latest file name.

 

Scenario 4: File name contain no date and auto incremented number

In this scenario, you need to have a field like date in your data that can identify it as the latest record. The process is still the same except that now you need to apply the TOP filter on data itself and not on the file name.

You may also like...

Leave a Reply

Your email address will not be published.