Five Data Resolutions for the New Year
My first post on “data resolutions” for the new year was in 2017, and that list was designed to help improve how you manage and use your data. I’ve been able to continue applying some resolutions in 2018, but I let some slip too. It’s not all-or-nothing when it comes to keeping these resolutions. You will get payback for your efforts, even if they only last a few months. No judging here.
After looking back on my client experiences in 2018, I have a fresh set of resolution ideas worth your consideration. These resolutions can supplement your planned activities, and while not all of them may apply to you, I hope they will help you harvest more from your data with less effort.
#1 Pay off technical debt
Suppose that once upon a time, you chose to look past a problem with your database or application in order to meet a deadline. The impact of the problem might have been poor query performance, occasional metric inaccuracies, or in some cases bad data. The idea that you could use a work-around or simply “live with it for now” seemed practical, but you knew there was a cost. Your choice to take on this technical debt was based on your cost or time constraints. If you have been piling up some of this debt, then this is the year to start paying it off.
Technical debt is insidious, and it quietly eats away at the value and efficiency of your applications and data systems. I like to compare technical debt to consumer credit card debt. The debt you take on comes with a rate of interest that accrues the longer you hold the debt. As you wait to resolve the original debt (problem), you get dinged for the interest on an ongoing basis until the full debt is paid. For example, poor query or application design can be a drag on performance. As the system matures and data volume increases, the ongoing costs to your company could be in the form of processing delays, usability issues or even system outages.
Take time in your new year to review your technical debt. As with consumer debt, you may want to pay off the debt having the highest interest cost, or just go for the easiest on the list to get things rolling. Either way, dig deep and execute a plan so that “live with it for now” does not become “live with it forever”.
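One way to make that review concrete is to score each debt item by its "interest" and its "principal". The sketch below is only an illustration: the backlog entries, the hour estimates, and the names `DebtItem`, `highest_interest_first`, `easiest_first` and `payback_months` are all hypothetical, and your own measure of cost may be dollars or incidents rather than hours.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DebtItem:
    name: str
    monthly_cost_hours: float  # the "interest": recurring hours lost each month
    fix_effort_hours: float    # the "principal": one-time hours needed to fix it


# Hypothetical backlog; replace with your own items and estimates.
BACKLOG = [
    DebtItem("slow nightly report query", 12.0, 40.0),
    DebtItem("manual metric correction", 6.0, 8.0),
    DebtItem("duplicate customer records", 20.0, 120.0),
]


def highest_interest_first(items):
    """Order items so the most expensive ongoing cost is paid off first."""
    return sorted(items, key=lambda d: d.monthly_cost_hours, reverse=True)


def easiest_first(items):
    """Order items so the smallest fix comes first, to get things rolling."""
    return sorted(items, key=lambda d: d.fix_effort_hours)


def payback_months(item):
    """Months of saved 'interest' needed to recover the cost of the fix."""
    return item.fix_effort_hours / item.monthly_cost_hours


if __name__ == "__main__":
    for item in highest_interest_first(BACKLOG):
        print(f"{item.name}: pays back in {payback_months(item):.1f} months")
```

Either ordering is a reasonable starting point. The payback figure simply gives you a rough way to compare items when the two orderings disagree.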
#2 Question the excuse that “we’ve always done it this way.”
This resolution is for those old habits that kill productivity while lurking quietly in the shadows. Their net effect on your organization is a drag on resources, and every business has old habits it should question.
If you have business or technical processes that have been around a long time, there is probably one that everyone wants to avoid. Why does it seem that these old processes are guarded by some sort of demon? These ‘demons’ usually represent plain old business challenges such as political tension, technical complexity, a fear of change, or an aversion to ownership.
Understanding the underlying reason for these lingering habits is the first step in changing them. Still, creating “change” won’t be easy. If you plan to tackle one of these change initiatives for 2019, start by defining a discovery phase that includes the right leadership stakeholders. Leadership cannot approve a “change” initiative without a summary of the issues and their resulting costs. Determine whether the change must apply to an entire system or just portions of it. Either way, use January as your expiration date for these old excuses, and start questioning everything.
#3 Take another step into self-service BI
The well-established movement into self-service reporting and analysis has helped power users become more productive while keeping shadow IT at bay. Take it a step further by implementing more self-service BI solutions targeting your data power users.
Costs of creating and maintaining reports can be alleviated by funneling newer requests into your self-service offering. The audience you should target includes power users or users who frequently request custom reports. Enabling these users to create their own dashboards, reports or exports is a win-win for business and IT. As always, a solid and secure data foundation is critical before deploying data access to users.
Shifting to self-service reporting implies a move away from canned reports. If you have a mountain of legacy reports to support, this is a chance to double down with resolution #2 above. Is the entire legacy reporting process still relevant? Are the reports worth the processing and support costs? Can users get what they need with more flexibility? While not all canned reports can be retired, it is worth taking inventory on which are still needed and if there are better ways to deliver the information.
#4 Freshen up your data security plan
With all the data breaches broadcast in the past year, it’s a good time to take a hard look at your data security. You already have yours bulletproof, right? If you are revisiting this for 2019, ensure that your company has an updated data security policy in place. The FTC website offers great resources on implementing a security plan that will make sense for your organization. Keep in mind that data security is applied at different levels, and your business may require it in ways that are not covered by boilerplate policies.
Writing and revising a security plan forces an organization to consider its unique data vulnerabilities. Larger organizations need to review and audit their policies more frequently. This is especially true for organizations settling after a merger or acquisition. Big changes to an organization can expose new vulnerabilities that should be covered in a revised data security plan along with responsible stakeholders.
If you need some further motivation, 2018 was a banner year for data breaches. Just ask the guests of Marriott’s Starwood hotels, where the company initially estimated that up to 500 million guest records were exposed. Put a plan in place that works for your organization and keep it current.
#5 Reduce your data plumbing
In the analytics world, the biggest drag on putting your data to work is the plumbing. This includes any processes you implement to move, clean, standardize, restructure and load your data. Traditional data warehouse and OLAP systems spend much of their time and budget on extraction, transformation and loading (ETL) processes. Legacy reporting systems may go a step further by running elaborate processes to summarize data.
In 2019, data plumbing will continue adding delays and costs to information delivery, many of which are no longer necessary. Look for opportunities to move away from complex, batch-oriented data load processes. Organizations starting on greenfield data solutions are best positioned to minimize their data plumbing, but existing analytic solutions can also be optimized. Here are some scenarios in which you can consider trimming your data pipeline:
- Self-service reporting: power users might prefer using raw data as opposed to the uber-transformed data in traditional data marts and warehouses. To speed up data delivery, consider an approach that resembles “ELT” – extract, load, and then [maybe] transform. Data lakes and other repositories that don’t immediately call for a structure or standard with their data are ideal for this approach. The goal is to load the data as is with less delay.
- Columnar databases: if you have a columnar database platform for your data warehouse (e.g. Microsoft SSAS Tabular models, Amazon Redshift, Snowflake or Incorta), then traditional approaches of staging your data may no longer be necessary. Structuring source data into staging tables is a worthwhile step with RDBMS data warehouses, but often that indirect approach can be simplified with columnar database platforms. If your source data is well structured, most columnar engines can optimize the loading of source data with less reliance on data preparation. Although direct loading may be possible, keep in mind the impact on your source system when using any load approach.
- Semi-structured data: not long ago, we lacked the tools now available to query and load semi-structured data formats (e.g. JSON files). Development teams that used home-grown query approaches to parse, untangle and unnest this data now live with a maintenance nightmare (not to mention some technical debt). If you truly need that data in a schema, have a look at the semi-structured data extensions and tools now offered with most database platforms. These tools encourage organizations to reconsider data sources they once thought impractical to load.
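To illustrate the “ELT” and semi-structured data scenarios above, here is a minimal sketch that loads JSON documents as is and applies structure only at query time. It uses Python's built-in sqlite3 module purely as a stand-in for your own platform, and it assumes your SQLite build includes the JSON functions `json_extract` and `json_each`. The sample orders, the `raw_orders` table and the function names are hypothetical.

```python
import json
import sqlite3

# Hypothetical raw documents, as they might arrive from a source system.
RAW_EVENTS = [
    {"order_id": 1, "customer": {"name": "Ada"},
     "items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}]},
    {"order_id": 2, "customer": {"name": "Grace"},
     "items": [{"sku": "A1", "qty": 5}]},
]


def load_raw(conn, events):
    """Extract and load: store each document untouched, one row per document."""
    conn.execute("CREATE TABLE raw_orders (doc TEXT NOT NULL)")
    conn.executemany(
        "INSERT INTO raw_orders (doc) VALUES (?)",
        [(json.dumps(event),) for event in events],
    )


def order_lines(conn):
    """Transform at query time: unnest the items array into one row per item."""
    sql = """
        SELECT json_extract(r.doc, '$.order_id')      AS order_id,
               json_extract(r.doc, '$.customer.name') AS customer,
               json_extract(i.value, '$.sku')         AS sku,
               json_extract(i.value, '$.qty')         AS qty
        FROM raw_orders AS r, json_each(r.doc, '$.items') AS i
        ORDER BY order_id, sku
    """
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
load_raw(conn, RAW_EVENTS)
lines = order_lines(conn)

if __name__ == "__main__":
    for row in lines:
        print(row)
```

The load step contains no parsing logic at all, so there is no home-grown unnesting code to maintain. The query can change later without reloading the data. Platforms such as Snowflake and Redshift offer their own semi-structured functions with different syntax, so treat this as a pattern rather than portable SQL.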
I hope the new year brings you much success both personally and professionally! I am thankful to my clients and colleagues for another great year. All the best in 2019!