At Learn Amp, our data architecture combines a data warehouse and a data lake to maximise both performance and flexibility.
Data Warehouse: Ideal for structured, high-performance reporting and business intelligence.
Data Lake: Perfect for flexible, large-scale data analysis and advanced analytics like machine learning.
...
How Our Setup Works
Data Warehouse (Redshift): We begin by loading data from our database into temporary tables within Redshift. There, we transform and organise the data into a structured Redshift schema, our “Data Warehouse”. This structured format ensures high data quality and performance, making it easy for BI tools to generate reliable insights and support decision-making.
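The stage-then-transform pattern described above can be sketched in a few lines. This is a minimal illustration rather than Learn Amp's actual pipeline: Python's built-in sqlite3 module stands in for Redshift so that the example runs anywhere, and the table names (staging_completions, completions), the columns, and the cleaning rules are all hypothetical.

```python
import sqlite3


def load_and_transform(conn, source_rows):
    """Stage raw rows in a temporary table, then build a structured table.

    sqlite3 is only a local stand-in for Redshift; every table and column
    name here is hypothetical.
    """
    cur = conn.cursor()

    # Step 1: land the source rows, untyped, in a temporary staging table.
    cur.execute(
        "CREATE TEMP TABLE staging_completions (user_id TEXT, course TEXT, score TEXT)"
    )
    cur.executemany("INSERT INTO staging_completions VALUES (?, ?, ?)", source_rows)

    # Step 2: create the typed reporting table that BI tools would query.
    cur.execute(
        """
        CREATE TABLE completions (
            user_id INTEGER NOT NULL,
            course  TEXT    NOT NULL,
            score   REAL
        )
        """
    )

    # Step 3: transform while copying: cast types, trim text, skip unusable rows.
    cur.execute(
        """
        INSERT INTO completions (user_id, course, score)
        SELECT CAST(user_id AS INTEGER), TRIM(course), CAST(score AS REAL)
        FROM staging_completions
        WHERE user_id IS NOT NULL AND TRIM(course) <> ''
        """
    )

    # Step 4: the staging table has served its purpose.
    cur.execute("DROP TABLE staging_completions")
    conn.commit()

    return cur.execute(
        "SELECT user_id, course, score FROM completions ORDER BY user_id"
    ).fetchall()
```

Calling load_and_transform(sqlite3.connect(":memory:"), rows) with a handful of string tuples returns the cleaned, typed rows. The point of the design is that messy input never reaches the reporting table directly.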
Data Lake (S3 with Apache Parquet): Once the data is organised in Redshift, we export it to an S3 bucket in the Apache Parquet file format, which forms our “Data Lake”. Unlike raw data lakes, our files retain a semi-structured, per-table format, offering both flexibility and organisation. This setup allows data scientists and other teams to work directly with the exported data for advanced analytics, such as machine learning or custom data projects.
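One way Redshift can write a table to S3 as Parquet is its UNLOAD command with FORMAT AS PARQUET. The article does not say which mechanism Learn Amp uses, so the sketch below is only an assumption about how such an export could look. The schema, table, bucket, prefix, and IAM role shown are hypothetical placeholders, and the function only builds the SQL text; it does not run it.

```python
import re

# Conservative pattern for schema and table names, to keep the generated SQL safe.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_unload_sql(schema, table, bucket, prefix, iam_role_arn):
    """Return a Redshift UNLOAD statement that exports one table as Parquet.

    All argument values are supplied by the caller; none of them come from
    the article.
    """
    for name in (schema, table):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"unsafe identifier: {name!r}")

    # One S3 prefix per table keeps the exported files organised by table.
    target = f"s3://{bucket}/{prefix.strip('/')}/{table}/"

    return (
        f"UNLOAD ('SELECT * FROM {schema}.{table}')\n"
        f"TO '{target}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"FORMAT AS PARQUET"
    )
```

For example, build_unload_sql("warehouse", "completions", "example-bucket", "lake", "arn:aws:iam::123456789012:role/example-unload") produces a statement that writes Parquet files under s3://example-bucket/lake/completions/.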
The Benefits of Our Hybrid Approach
...
Data Warehouse Access: You’ll receive credentials and access details for our Redshift schema. Access is restricted by IP address: you provide the specific IP addresses you will connect from, and we allowlist them so that only authorised users can reach our data warehouse.
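Once you have credentials and your IP address has been approved, connecting could look roughly like the sketch below. It assumes the third-party psycopg2 driver (Redshift accepts PostgreSQL-protocol clients and listens on port 5439 by default) and reads settings from environment variables whose names (REDSHIFT_HOST and so on) are hypothetical. Use the host, database, and credentials you are actually given.

```python
import os

REDSHIFT_DEFAULT_PORT = 5439  # Redshift's default port


def connection_params(env=os.environ):
    """Collect connection settings from environment variables.

    The variable names are illustrative, not something Learn Amp prescribes.
    """
    return {
        "host": env["REDSHIFT_HOST"],
        "port": int(env.get("REDSHIFT_PORT", REDSHIFT_DEFAULT_PORT)),
        "dbname": env["REDSHIFT_DATABASE"],
        "user": env["REDSHIFT_USER"],
        "password": env["REDSHIFT_PASSWORD"],
        "sslmode": "require",  # refuse unencrypted connections
    }


def run_query(sql, params=None):
    """Run one read-only query and return all rows.

    Must be called from an IP address that has been approved for access.
    """
    import psycopg2  # third-party driver, imported lazily

    conn = psycopg2.connect(**(params or connection_params()))
    try:
        with conn.cursor() as cur:
            cur.execute(sql)
            return cur.fetchall()
    finally:
        conn.close()
```

Keeping credentials in environment variables rather than in source code means they never end up in version control.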
Data Lake Access: For the data lake, we give you access to download Parquet files from our AWS S3 bucket. This setup allows your team to integrate data with your existing systems securely.
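Downloading the Parquet files can be scripted. The sketch below assumes the boto3 AWS SDK and AWS credentials that have been granted read access. The bucket name, prefix, and the assumption that exported files end in .parquet are hypothetical and should be replaced with the details you receive.

```python
from pathlib import Path


def list_parquet_keys(s3_client, bucket, prefix):
    """Return the keys of all objects under `prefix` that end in .parquet."""
    keys = []
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages for an empty prefix carry no "Contents" entry.
        for obj in page.get("Contents", []):
            if obj["Key"].endswith(".parquet"):
                keys.append(obj["Key"])
    return keys


def download_parquet_files(s3_client, bucket, prefix, dest_dir):
    """Download every Parquet file under `prefix`, mirroring the key layout locally."""
    downloaded = []
    for key in list_parquet_keys(s3_client, bucket, prefix):
        local_path = Path(dest_dir) / key
        local_path.parent.mkdir(parents=True, exist_ok=True)
        s3_client.download_file(bucket, key, str(local_path))
        downloaded.append(local_path)
    return downloaded


def example_usage():
    """Illustrative only: bucket and prefix are placeholders."""
    import boto3  # third-party AWS SDK, imported lazily

    client = boto3.client("s3")
    return download_parquet_files(client, "example-bucket", "lake/", "./lake_files")
```

Each downloaded file can then be opened with any Parquet-aware tool, for instance pandas.read_parquet(path) if pandas and a Parquet engine such as pyarrow are installed.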
Why This Matters
...