When BI Tools Fall Short: Automated Reports using Excel

Challenge:

As the only financial analyst with data engineering skills on my team, I was responsible for ensuring accurate and timely reporting. The leadership team frequently encountered issues with incomplete data in daily/weekly reports generated using the MicroStrategy platform. MicroStrategy did not have the capability to check for data completeness before pulling in the data, leading to incomplete reports being sent to senior stakeholders.

Solution:

I developed a custom workflow that integrated a data quality check with a flag that triggered the sending of the report. By tweaking the existing Oozie workflow, I added an additional step in Python (PySpark) that checked data completeness based on parameters like transactions change vs. yesterday, revenue amount, marketing costs, etc. If the validation failed, the workflow would finish successfully without generating a report.

In case of complete data, the workflow created an empty table in HDFS, which was then used in a case-statement. Another flag was written to the MySQL table and read by the on-premise Microsoft Server, allowing the triggering of Excel Macros to refresh and send out the report. This process ensured that reports were generated only when the data was complete.

To give a bit of context on the data refresh in Excel, we used ODBC connectors to connect Excel to Hive, allowing us to pull data into DataTables or PivotTables. I used dynamic GETPIVOTDATA formulas to pull the data into the correct cells and update the graphs automatically.

Results:

The integration of data quality checks and report-triggering flags eliminated the issue of incomplete data in daily/weekly reports. Previously, incomplete reports were sent to stakeholders at least once a week, but after implementing the solution, there were no issues with incomplete data. The solution saved time and resources by generating reports only when the data was complete. The entire process took about a month to design, create queries, automate, and create workflows.

Conclusion:

By automating data quality checks for daily/weekly reports, we eliminated the issue of incomplete data, saved time and resources, and provided value to stakeholders. The solution required time and effort to build, but it was essential to make data-completeness based reporting possible.