Our Data Lake and Warehouse Explained

At Learn Amp, our data architecture combines a data warehouse and a data lake to deliver both performance and flexibility.

  • Data Warehouse: Ideal for structured, high-performance reporting and business intelligence.

  • Data Lake: Perfect for flexible, large-scale data analysis and advanced analytics like machine learning.

[Image: BI tool diagram]

How Our Setup Works

  • Data Warehouse (Redshift): We begin by loading data from our database into temporary tables within Redshift. There, we transform and organise the data into a structured Redshift schema, which is our “Data Warehouse”. This structured format ensures high data quality and performance, making it easy for BI tools to generate reliable insights and support decision-making.

  • Data Lake (S3 with Apache Parquet): Once the data is organised in Redshift, we export it to an S3 bucket in Apache Parquet file format, which forms our “Data Lake”. Unlike a raw data lake, ours keeps the files in a semi-structured format organised per table, offering both flexibility and organisation. This setup allows data scientists and other teams to use the exported data for advanced analytics, such as machine learning or custom data projects.
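
The export step described above can be sketched in code. This is a minimal illustration only: the article does not say how the export is implemented, so the use of Redshift's UNLOAD command is an assumption, as are the function name build_unload_sql, the bucket and prefix names, the IAM role ARN, and the one-folder-per-table layout.

```python
import re


def build_unload_sql(table: str, bucket: str, prefix: str, iam_role_arn: str) -> str:
    """Build a Redshift UNLOAD statement that writes one table to S3 as Parquet.

    Hypothetical helper: the bucket layout (prefix/table/) is an assumption,
    not a description of the actual export job.
    """
    # Only allow plain (optionally schema-qualified) identifiers, because the
    # table name is interpolated directly into the SQL text.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?", table):
        raise ValueError(f"unexpected table name: {table!r}")

    target = f"s3://{bucket}/{prefix.strip('/')}/{table}/"
    return (
        f"UNLOAD ('SELECT * FROM {table}')\n"
        f"TO '{target}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS PARQUET;"
    )


if __name__ == "__main__":
    # All values below are placeholders.
    print(
        build_unload_sql(
            table="users",
            bucket="example-data-lake",
            prefix="exports",
            iam_role_arn="arn:aws:iam::123456789012:role/example-unload-role",
        )
    )
```

Writing each table under its own prefix is one way to get the per-table organisation mentioned above; the resulting statement would be run against Redshift by whatever scheduler or client the pipeline uses.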

The Benefits of Our Hybrid Approach

  1. Optimised for Business Intelligence: Our data warehouse delivers high-quality, structured data, ready for seamless integration with BI tools. This makes it easy to perform fast, accurate analysis for reporting and strategic insights.

  2. Flexible Advanced Analytics: By exporting data into a data lake, we offer the flexibility to work with data in its nearly raw form. This supports diverse use cases, from machine learning to exploratory analysis, without sacrificing organisation.

  3. Flexible Data Accessibility: With secure access to both the data warehouse and the data lake, you can use the right data source for your needs, whether it’s structured reporting or exploratory data science.

Secure Access and Convenience

  • Data Warehouse Access: You’ll receive credentials and access details for our Redshift schema. Access is restricted by IP address: you provide the specific IP addresses you will connect from, and we add them to our allowlist so that only authorised users can reach the data warehouse.

  • Data Lake Access: For the data lake, we give you access to download Parquet files from our AWS S3 bucket. This setup allows your team to integrate data with your existing systems securely.
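
As a rough illustration of the two access routes above, the sketch below queries the warehouse and loads a Parquet file from the lake. Everything specific in it is an assumption: the host, database, user, bucket, key layout and file names are placeholders, and the choice of psycopg2, boto3 and pandas (with a Parquet engine such as pyarrow) is only one of several possible client stacks. Your actual connection details come with the credentials we provide.

```python
from urllib.parse import quote

REDSHIFT_PORT = 5439  # Redshift's default port; use the port given in your access details


def build_redshift_dsn(host: str, database: str, user: str, password: str,
                       port: int = REDSHIFT_PORT) -> str:
    """Build a PostgreSQL-style connection URI for Redshift (hypothetical helper)."""
    # Percent-encode the credentials so characters like '@' or '/' stay valid in the URI.
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}?sslmode=require"
    )


def query_warehouse(dsn: str, sql: str) -> list:
    """Run a read-only query against the warehouse and return all rows."""
    import psycopg2  # third-party; imported here so the helpers above work without it

    conn = psycopg2.connect(dsn)
    try:
        with conn.cursor() as cur:
            cur.execute(sql)
            return cur.fetchall()
    finally:
        conn.close()


def lake_key(prefix: str, table: str, filename: str) -> str:
    """Build an S3 object key, assuming a prefix/table/file layout (an assumption)."""
    return f"{prefix.strip('/')}/{table}/{filename}"


def load_lake_table(bucket: str, key: str, local_path: str):
    """Download one Parquet file from S3 and load it into a pandas DataFrame."""
    import boto3          # third-party AWS SDK
    import pandas as pd   # needs a Parquet engine such as pyarrow installed

    boto3.client("s3").download_file(bucket, key, local_path)
    return pd.read_parquet(local_path)


if __name__ == "__main__":
    # Placeholder values only; nothing here connects to a real service.
    print(build_redshift_dsn("example-host.redshift.amazonaws.com", "analytics",
                             "report_reader", "example-password"))
    print(lake_key("exports", "users", "part-0000.parquet"))
```

The warehouse connection will only succeed from an IP address that has been approved for your account, and the S3 download relies on AWS credentials being available to boto3 in the usual way (environment variables, a shared credentials file, or an assumed role).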