Our Data Lake and Warehouse Explained

Overview

At Learn Amp, our data architecture is designed to maximise both performance and flexibility. We use a modern Data LakeHouse approach—built on AWS Redshift—to give you the best of both worlds: structured reporting power and the flexibility to export data for advanced analytics.

This page explains how our architecture works and why it benefits you.


Functionality Breakdown

How Data Flows

  1. Data ingestion: Learning data is loaded from our database into AWS Redshift

  2. Transformation: Data is cleaned, transformed, and organised into an optimised schema

  3. Your access: You connect to Redshift and either query directly or export to your preferred format

Connection Options

All access is via our AWS Redshift database. What you do with it depends on your use case:

| Use Case | Approach | Example Tools |
| --- | --- | --- |
| BI Dashboards | Query Redshift directly | Power BI, Tableau, Looker |
| Ad-hoc Reporting | Query Redshift directly | DBeaver, psql |
| Data Science / ML | Export from Redshift to Parquet/CSV | Python, Spark, Databricks |
| Custom Pipelines | Export from Redshift to your data lake | Your ETL tools |

💡 Tip: For advanced analytics workloads, we recommend exporting your query results to Apache Parquet format—an industry-standard columnar format that's highly efficient for data science tools.


Benefits of Our Approach

Optimised for Business Intelligence

The Redshift data warehouse delivers high-quality, structured data that's ready for seamless integration with BI tools. This makes it easy to perform fast, accurate analysis for reporting and strategic insights.

Flexible for Advanced Analytics

By connecting to Redshift and exporting to formats like Apache Parquet, you get the flexibility to work with your data in whatever way suits your needs—from machine learning to exploratory analysis.

Single Point of Access

One connection method, multiple use cases. Whether you're building a dashboard or training an ML model, you start with the same Redshift connection and take it wherever you need.


Pre-requisites

Role Requirements

This is a provisioned service—there are no in-app settings to configure.

| Access Type | How to Get Access |
| --- | --- |
| Redshift connection | Contact your Customer Success Manager for credentials |

Security Measures

We take your data security seriously:

  • IP Whitelisting: Only approved public IP addresses can connect

  • Secure Credentials: Connection details are sent securely to your designated Primary Contact

  • Read-Only Access: Your schema is isolated and read-only


Quick Start Guide

  1. Contact your CSM to enable Data Lake access

  2. Provide your requirements:

    • Public IP addresses for whitelisting

    • Primary Contact details for receiving credentials

  3. Connect to Redshift using the provided credentials

  4. Choose your approach:

    • Query directly for BI dashboards

    • Export to Parquet/CSV for data science

  5. Review the schema: Data LakeHouse: Data Schema


FAQs

Do I need separate access for Data Warehouse vs Data Lake?

No—both are the same Redshift connection. "Data Warehouse" refers to querying directly; "Data Lake" refers to exporting data to formats like Apache Parquet for advanced analytics.

What export formats are supported?

Once connected to Redshift, you can export to any format your tools support: Apache Parquet, CSV, JSON, and more.

How often is data refreshed?

Data refreshes hourly through our ETL pipeline.

What tools can I use?

Any PostgreSQL-compatible client: Power BI, Tableau, DBeaver, psql, Python libraries, and more.
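Because any PostgreSQL-compatible client works, one common pattern is to assemble a standard `postgresql://` connection URL and hand it to your client of choice (psycopg2, SQLAlchemy, DBeaver, etc.). The sketch below uses only the Python standard library; the host, database, user, and password are placeholders — use the credentials your Customer Success Manager provides.

```python
# Sketch: building a PostgreSQL-style connection URL for Redshift.
# All values below are placeholders; substitute the credentials sent
# to your designated Primary Contact.
from urllib.parse import quote_plus

host = "example-cluster.redshift.amazonaws.com"  # placeholder endpoint
port = 5439                                      # Redshift's default port
database = "analytics"                           # placeholder
user = "readonly_user"                           # placeholder
password = "s3cret/with:chars"                   # placeholder

# Special characters in the password must be URL-encoded.
url = (
    f"postgresql://{user}:{quote_plus(password)}"
    f"@{host}:{port}/{database}?sslmode=require"
)
print(url)
```

Keeping `sslmode=require` in the URL ensures the client refuses unencrypted connections.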


Troubleshooting

| Issue | Solution |
| --- | --- |
| Not sure how to export to Parquet | Use your BI tool's export feature, or query via Python with pandas/pyarrow |
| Need help with specific tool setup | Check Getting Started with Data Lake |
| Connection issues | Verify port 5439 is open and your IP is whitelisted |
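For connection issues, a quick TCP reachability check can tell you whether the problem is network-level (firewall or IP whitelisting) before you debug credentials. The sketch below uses only the Python standard library; the host is a placeholder for your cluster endpoint.

```python
# Sketch: a quick TCP reachability check for the Redshift port.
# A False result usually means a firewall block, a missing IP
# whitelist entry, or a wrong endpoint.
import socket

def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Placeholder host; substitute your actual cluster endpoint.
print(port_is_open("example-cluster.redshift.amazonaws.com", 5439))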


Next Steps