A leading consumer lending organization, operating in multiple states across the contiguous United States, struggled to report across multiple areas of the business in a timely and accurate manner. Internal departments operated within their own data silos, each constructed differently, which made aggregating and reporting on data enterprise-wide extremely cumbersome. Each department had dedicated analysts with different skill sets who consolidated and presented data in whatever manner was most comfortable to them. Meanwhile, the organization ran on a “skeleton” IT crew with limited bandwidth to support all of these individual processes.
Challenges identified by the business included:
- Departmental data silos built and maintained in inconsistent ways
- Extremely cumbersome, slow aggregation and reporting of data enterprise-wide
- Inconsistent consolidation and presentation of data, driven by the varying skill sets and preferences of each department’s analysts
- A “skeleton” IT crew with limited bandwidth to support all of the individual departmental processes
In response to these challenges, the client embarked on a journey to implement Snowflake, a cloud-based data warehousing platform, with the goal of centralizing and modernizing its data infrastructure, breaking down data silos, and enabling more efficient data management and analytics. This case study explores the motivations, planning, implementation, and outcomes of this strategic move to address their data challenges.
The Snowflake data architecture can be provisioned on any of the three major cloud providers (AWS, Microsoft Azure, or Google Cloud Platform). The client already had a heavy presence on the Microsoft Azure platform; therefore, it made logical sense to utilize and enhance the existing cloud storage layer in Azure Blob Storage.
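Connecting Snowflake to an existing Azure storage layer is typically done through a storage integration. The following is a minimal sketch; the tenant ID, storage account, and container names are placeholders, not the client’s actual configuration:

```sql
-- Storage integration granting Snowflake scoped access to Azure Blob Storage
-- (tenant ID, account, and container names below are placeholders)
CREATE STORAGE INTEGRATION azure_blob_int
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'AZURE'
  ENABLED = TRUE
  AZURE_TENANT_ID = '<azure-tenant-id>'
  STORAGE_ALLOWED_LOCATIONS = ('azure://mystorageacct.blob.core.windows.net/raw-data/');
```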
The Point-of-Sale (POS) system had itself been through an evolution over the past decade: it had been migrated from the legacy provider to a newer one, but only partially. With both systems still in day-to-day use, the organization decided to move all activity to a third Point-of-Sale system. All three POS systems were to remain active for over two years while the new system ramped up and historical loans were cleared from the older ones. From a data harvesting perspective, the solution therefore needed the flexibility to extract from all three systems nightly and make the data available for all data processing prior to the next business day.
Additional systems were identified that needed to be included in the overall architecture, including Marketing, Risk, Underwriting, Payment Processing, and other ancillary systems deemed critical to the overall analytical solution. In total, the client is leveraging over 10,000 artifacts to support the analytical solution.
The Microsoft Azure Data Factory service is being utilized for the data collection activities from all sources, including on-premises databases, Azure SQL Databases, multiple cloud solutions (including Salesforce), structured data (.txt, .csv files), and semi-structured data in XML, JSON, HTML, and email formats. All of these files are loaded into an external stage in the Blob Storage account and ultimately ingested into the Snowflake database in the data storage layers identified in the “Data Storage” section below.
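The external stage and ingestion step described above might look like the following sketch in Snowflake SQL; the stage, table, and path names are illustrative assumptions, not the client’s actual objects:

```sql
-- External stage over the Blob Storage container, using the integration above
CREATE STAGE raw.azure_stage
  STORAGE_INTEGRATION = azure_blob_int
  URL = 'azure://mystorageacct.blob.core.windows.net/raw-data/'
  FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);

-- Bulk-load staged files into the raw storage layer
COPY INTO raw.pos_loans
  FROM @raw.azure_stage/pos/
  PATTERN = '.*loans.*[.]csv'
  ON_ERROR = 'CONTINUE';
```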
The data is transformed using Snowflake services into a collection of “Blocks”: curated sets of metrics for the different business areas within the organization. The illustration below depicts the data flow through the solution.
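The case study does not spell out the Block definitions themselves, but a Block can be pictured as a curated view of metrics for one business area. A hedged sketch, with all table and column names hypothetical:

```sql
-- A "Block" of risk metrics built from the curated layer (names are illustrative)
CREATE OR REPLACE VIEW analytics.risk_block AS
SELECT
    branch_id,
    DATE_TRUNC('month', as_of_date)  AS reporting_month,
    SUM(outstanding_balance)         AS total_exposure,
    COUNT_IF(days_past_due > 30)     AS delinquent_loans
FROM curated.loan_portfolio
GROUP BY branch_id, DATE_TRUNC('month', as_of_date);
```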
The client utilizes a variety of data sources that need to be consolidated into a single data layer for reporting. The sources consist of:
- The three Point-of-Sale systems (legacy, interim, and new)
- On-premises databases and Azure SQL Databases
- Multiple cloud solutions, including Salesforce
- Structured flat files (.txt, .csv)
- Semi-structured data in XML, JSON, and HTML formats, as well as emails
- Marketing, Risk, Underwriting, Payment Processing, and other ancillary systems
The Microsoft Azure Data Factory service is utilized for the data extraction component and the load into the Azure Blob Storage account. Files are loaded into Blob Storage as one-for-one copies of the source systems, with all transformations performed by the Snowflake services.
Data is stored in Blob Storage and ingested into Snowflake via streams to keep data updated as files are loaded into the storage account.
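One way to realize this pattern, sketched under the assumption that Snowpipe handles the file-triggered loads and a stream/task pair propagates changes (all object names are hypothetical):

```sql
-- Snowpipe loads files automatically as they land in Blob Storage;
-- 'AZURE_EVENT_INT' stands in for a notification integration on the storage account
CREATE PIPE raw.pos_loans_pipe
  AUTO_INGEST = TRUE
  INTEGRATION = 'AZURE_EVENT_INT'
AS
  COPY INTO raw.pos_loans FROM @raw.azure_stage/pos/;

-- A stream tracks newly loaded rows so downstream processing touches only changes
CREATE STREAM raw.pos_loans_stream ON TABLE raw.pos_loans;

-- A task drains the stream into the curated layer on a schedule
CREATE TASK curated.load_pos_loans
  WAREHOUSE = transform_wh
  SCHEDULE = '15 MINUTE'
WHEN SYSTEM$STREAM_HAS_DATA('raw.pos_loans_stream')
AS
  INSERT INTO curated.loan_portfolio
  SELECT loan_id, branch_id, outstanding_balance, days_past_due, as_of_date
  FROM raw.pos_loans_stream
  WHERE METADATA$ACTION = 'INSERT';
```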
Snowflake allows for multiple tiers of data storage, as depicted in the illustration below.
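The tiers themselves are not enumerated here, but a layering consistent with the flow described above (raw landing, curated transformation, presentation Blocks) could be sketched as follows; the schema names are assumptions for illustration:

```sql
CREATE SCHEMA IF NOT EXISTS raw;        -- one-for-one copies of source files
CREATE SCHEMA IF NOT EXISTS curated;    -- conformed, transformed data
CREATE SCHEMA IF NOT EXISTS analytics;  -- presentation "Blocks" for reporting
```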
The Snowflake architecture was chosen to address the challenges being faced, including the following:
Time Travel and Cloning: Snowflake provides features such as Time Travel and zero-copy cloning, allowing users to query historical data and create virtual copies of data for testing and development without impacting the original data (see the sketch following this list).
Semi-Structured Data Support: Snowflake can handle semi-structured data formats such as JSON, Parquet, and Avro, making it suitable for modern data processing needs (also illustrated below).
Cost Efficiency: Snowflake’s pay-as-you-go pricing model means organizations only pay for the resources they use. It provides cost transparency, which can help manage data warehousing costs effectively.
Elasticity: Snowflake’s elastic compute resources can automatically adjust to match query demands, ensuring optimal performance during peak usage periods.
Global Availability: Snowflake is available on multiple cloud platforms, making it a suitable choice for organizations operating globally. It offers data replication across regions for high availability and disaster recovery.
Compliance: Snowflake complies with various data protection and privacy regulations, such as GDPR and HIPAA, which is essential for organizations handling sensitive data.
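As a hedged illustration of the Time Travel, cloning, and semi-structured features called out above, the following Snowflake SQL uses hypothetical table and column names rather than the client’s actual objects:

```sql
-- Time Travel: query the table as it existed one hour ago
SELECT *
FROM curated.loan_portfolio AT(OFFSET => -3600);

-- Zero-copy clone for testing and development; no data is physically duplicated
CREATE TABLE curated.loan_portfolio_dev CLONE curated.loan_portfolio;

-- Semi-structured support: query JSON held in a VARIANT column directly
SELECT
    payload:loanId::STRING AS loan_id,
    payload:status::STRING AS status
FROM raw.payment_events
WHERE payload:eventType::STRING = 'PAYMENT_POSTED';
```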
Ready to make the switch to Snowflake? Contact us to get started.