August 7, 2018/Jin Huang/Big Data

Companies today are drowning in data. The average company's data volume grows by more than 50 percent per year, drawn from an average of 33 unique data sources. These are overwhelming amounts that make extracting analytical insights an arduous task.

For those swamped with data that’s not being put to use, data lakes can provide immense value. Data lakes are storage systems that hold large volumes of raw, highly diverse data from many sources. Aside from providing internal benefits such as architecture flexibility and scalability, they make processing data quicker and more accurate, uncovering analytical insights that companies couldn’t see before.

In the pursuit of scalable and high-performing SaaS technology to support customers, we at Aktana recently built out a multitenant, big data architecture for our platform and started utilizing data lakes. Although the processing happens on the back end, below we’ve outlined a few ways in which data lake storage offers analytical advantages for end users in marketing and sales, as well as our comprehensive security approach to data lakes. Here’s what you need to know about data lakes:

They produce a ripple effect of insights.

Research shows that companies using data lakes improved the accuracy of their data analysis by a significant 25 percent. This is largely because data lakes ingest and preserve data from a wide range of sources, whether that’s CRM systems, data warehouses, or external market share and sales data. With more information to work with, processors like Aktana’s decision support engine produce analytical insights that are a better reflection of the market and, thus, more likely to succeed when executed upon. For sales and marketing teams using Aktana, this means more relevant suggestions for healthcare professional (HCP) engagement and higher chances for conversion.

When the immediate insights gleaned from this stored data better reflect market variables, long-term strategy benefits too. At Aktana, the results of our newly refined suggestions also feed back into our machine learning model. Using this newly processed data, the system reveals new patterns and better-informed suggestions for strategy rules, such as prioritizing content development or recommending alternate channel integrations for hard-to-see HCPs.

With market conditions constantly evolving, companies may not have the bandwidth or resources to continuously adjust their strategy accordingly. These advanced analytics, as supported by data lake storage, can help commercial teams make smart decisions that take multiple market variables into account — all with the click of a button.

They’re a treasure chest for future insights.

Prior to data lakes, it was common practice to extract a company’s data for a specific use case and discard any leftover data. Data lakes, by contrast, preserve massive amounts of data in its original form for future use.

This is particularly valuable since the rapidly changing marketplace makes it difficult to foresee all the potential use cases for data. In fact, many companies view future data discovery as a major advantage — according to a TDWI survey, 49 percent of respondents see data exploration as a top benefit of data lakes. For example, let’s say you’ve been using segmentation data as a driver for suggestions, and later on you want to see how HCP segmentation changes over time to better understand why conversion happens. This is easy to do because you have the historical segmentation data in the data lake.
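As a minimal sketch of that segmentation example (the column names, segments, and snapshot shapes here are hypothetical, not Aktana's actual schema), comparing two segmentation snapshots preserved in the lake might look like:

```python
import pandas as pd

# Hypothetical quarterly segmentation snapshots pulled from the data lake.
q1 = pd.DataFrame({"hcp_id": [1, 2, 3, 4],
                   "segment": ["A", "B", "B", "C"]})
q2 = pd.DataFrame({"hcp_id": [1, 2, 3, 4],
                   "segment": ["A", "A", "C", "C"]})

# Join the snapshots on HCP and keep the HCPs whose segment changed.
transitions = q1.merge(q2, on="hcp_id", suffixes=("_q1", "_q2"))
moved = transitions[transitions["segment_q1"] != transitions["segment_q2"]]
print(len(moved))  # number of HCPs that changed segment between quarters
```

Because the lake retains every snapshot rather than only the latest one, this comparison can be run across any pair of periods after the fact, long after the original use case for the data has passed.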

Another real-world example is facilitating ongoing and iterative feature engineering. You build a machine learning model that uses features based on characteristics like HCP profile, interactions, segmentations, and sales data. Later on, you discover that combining various data sets to create new features actually leads to better predictions. As pointed out in a recent talk at Stanford by AI and deep learning expert Andrew Ng, coming up with features is difficult, time-consuming, and requires expert domain knowledge. This kind of iteration wouldn’t be possible without a data lake that not only preserves the fidelity of the data but also retains historical data over a long period of time.
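To illustrate what combining data sets into a new feature can look like (the data sets, column names, and the derived feature below are hypothetical examples, not features from Aktana's model):

```python
import pandas as pd

# Hypothetical per-HCP data sets preserved in the data lake.
interactions = pd.DataFrame({"hcp_id": [1, 2, 3],
                             "calls": [10, 4, 8]})
sales = pd.DataFrame({"hcp_id": [1, 2, 3],
                      "rx_count": [30, 2, 16]})

# A derived feature neither data set contains on its own:
# prescriptions per sales call, a rough responsiveness signal.
features = interactions.merge(sales, on="hcp_id")
features["rx_per_call"] = features["rx_count"] / features["calls"]
print(features["rx_per_call"].tolist())  # [3.0, 0.5, 2.0]
```

The point is that the raw inputs must still exist for the join to be possible; if the interaction or sales data had been discarded after its original use, this feature could never be computed retroactively.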

As data analytics inevitably become more prevalent for companies (53 percent of companies are using big data analytics today), this long-term storage feature of data lakes will only continue to grow in importance.

How to keep customer data safe.

Privacy and security are imperative to the design, development, and deployment of data lakes. Each customer’s data remains protected and logically separated from other customers’ data through adherence to these strict privacy and security best practices:

  • Create customer- and region-specific buckets within the data lake so customer data doesn’t intermix with non-relevant data.
  • Encrypt customer data in databases and file systems with customer-specific keys, leveraging infrastructure and best practices provided by the AWS Key Management Service (KMS).
  • Conduct security and compliance testing for every release and production system regularly, utilizing current OWASP security standards.
  • Take all necessary measures to comply with GDPR. At Aktana, this includes, but is not limited to, a comprehensive review of all services, systems, and vendors that we work with.
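The bucket-separation and per-customer encryption practices above can be sketched in code. This is an illustrative outline under assumed conventions, not Aktana's actual implementation: the naming scheme and both helper functions are hypothetical, while the encryption payload follows the shape S3's bucket-encryption API expects for SSE-KMS.

```python
def customer_bucket_name(customer_id: str, region: str) -> str:
    """Hypothetical naming scheme: one bucket per customer and region,
    so one customer's data never intermixes with another's."""
    return f"datalake-{customer_id}-{region}".lower()

def sse_kms_config(kms_key_arn: str) -> dict:
    """Default server-side encryption rule bound to a customer-specific
    KMS key, in the shape S3's put_bucket_encryption call accepts."""
    return {
        "Rules": [{
            "ApplyServerSideEncryptionByDefault": {
                "SSEAlgorithm": "aws:kms",
                "KMSMasterKeyID": kms_key_arn,
            }
        }]
    }

bucket = customer_bucket_name("AcmePharma", "eu-west-1")
print(bucket)  # datalake-acmepharma-eu-west-1
```

In an actual deployment, these values would be passed to the S3 API (for example, boto3's `create_bucket` and `put_bucket_encryption` calls) so that every object written to a customer's bucket is encrypted with that customer's key by default.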

Adhering to these best practices at all times means customers can rest assured that their rich data is stored away carefully and that their marketing and sales users can focus efforts on leveraging the analytical advantages of data lakes.

A Winning Strategy

Data lakes are not only a savvy storage solution, but they also provide a strategic advantage by opening up a world of analytics that would be cumbersome to run otherwise. By ingesting high volumes of diverse data, secure data lakes enable more effective decision support for marketing and sales teams.