Our Data Lake and Warehouse Explained
Overview
At Learn Amp, our data architecture is designed to maximise both performance and flexibility. We use a modern Data LakeHouse approach—built on AWS Redshift—to give you the best of both worlds: structured reporting power and the flexibility to export data for advanced analytics.
This page explains how our architecture works and why it benefits you.
Functionality Breakdown
How Data Flows
1. Data ingestion: Learning data is loaded from our database into AWS Redshift.
2. Transformation: Data is cleaned, transformed, and organised into an optimised schema.
3. Your access: You connect to Redshift and either query directly or export to your preferred format.
Connection Options
All access is via our AWS Redshift database. What you do with it depends on your use case:
| Use Case | Approach | Example Tools |
|---|---|---|
| BI Dashboards | Query Redshift directly | Power BI, Tableau, Looker |
| Ad-hoc Reporting | Query Redshift directly | DBeaver, psql |
| Data Science / ML | Export from Redshift to Parquet/CSV | Python, Spark, Databricks |
| Custom Pipelines | Export from Redshift to your data lake | Your ETL tools |
💡 Tip: For advanced analytics workloads, we recommend exporting your query results to Apache Parquet format—an industry-standard columnar format that's highly efficient for data science tools.
Benefits of Our Approach
Optimised for Business Intelligence
The Redshift data warehouse delivers high-quality, structured data that's ready for seamless integration with BI tools. This makes it easy to perform fast, accurate analysis for reporting and strategic insights.
Flexible for Advanced Analytics
By connecting to Redshift and exporting to formats like Apache Parquet, you get the flexibility to work with your data in whatever way suits your needs—from machine learning to exploratory analysis.
Single Point of Access
One connection method, multiple use cases. Whether you're building a dashboard or training an ML model, you start with the same Redshift connection and take it wherever you need.
Pre-requisites
Role Requirements
This is a provisioned service—there are no in-app settings to configure.
| Access Type | How to Get Access |
|---|---|
| Redshift connection | Contact your Customer Success Manager for credentials |
Security Measures
We take your data security seriously:
- IP Whitelisting: Only approved public IP addresses can connect
- Secure Credentials: Connection details are sent securely to your designated Primary Contact
- Read-Only Access: Your schema is isolated and read-only
Quick Start Guide
1. Contact your CSM to enable Data Lake access
2. Provide your requirements:
   - Public IP addresses for whitelisting
   - Primary Contact details for receiving credentials
3. Connect to Redshift using the provided credentials
4. Choose your approach:
   - Query directly for BI dashboards
   - Export to Parquet/CSV for data science
5. Review the schema: Data LakeHouse: Data Schema
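Because Redshift is wire-compatible with PostgreSQL, the connection step works with any libpq-style client (psql, psycopg2, SQLAlchemy, or a BI tool's Postgres driver). A minimal sketch of building the connection URL — the hostname, database, and user below are placeholders; your real values come from your Customer Success Manager:

```python
def redshift_dsn(host: str, database: str, user: str,
                 password: str, port: int = 5439) -> str:
    """Build a libpq-style connection URL for Redshift.

    Redshift speaks the PostgreSQL wire protocol, so the same URL works
    with psql, psycopg2, or SQLAlchemy. Note the default port is 5439,
    not PostgreSQL's usual 5432.
    """
    return f"postgresql://{user}:{password}@{host}:{port}/{database}"

# Placeholder values only — real credentials are provided by your CSM.
dsn = redshift_dsn("your-cluster.eu-west-1.redshift.amazonaws.com",
                   "learnamp", "report_user", "REDACTED")
print(dsn)
```

The same URL can be pasted into most BI tools' PostgreSQL connection dialogs, or passed to `psql` directly.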
FAQs
Do I need separate access for Data Warehouse vs Data Lake?
No—both are the same Redshift connection. "Data Warehouse" refers to querying directly; "Data Lake" refers to exporting data to formats like Apache Parquet for advanced analytics.
What export formats are supported?
Once connected to Redshift, you can export to any format your tools support: Apache Parquet, CSV, JSON, and more.
How often is data refreshed?
Data refreshes hourly through our ETL pipeline.
What tools can I use?
Any PostgreSQL-compatible client: Power BI, Tableau, DBeaver, psql, Python libraries, and more.
Troubleshooting
| Issue | Solution |
|---|---|
| Not sure how to export to Parquet | Use your BI tool's export feature, or query via Python with pandas/pyarrow |
| Need help with specific tool setup | Contact your Customer Success Manager |
| Connection issues | Verify port 5439 is open and your IP is whitelisted |
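For the connection-issues row, a quick reachability check can distinguish a firewall or whitelisting problem from a credentials problem. A small sketch using only the Python standard library (the hostname in the comment is a placeholder):

```python
import socket

def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example (placeholder hostname) — a False result usually means the port
# is blocked by a firewall or your public IP has not been whitelisted yet:
# port_open("your-cluster.eu-west-1.redshift.amazonaws.com", 5439)
```

If the port is reachable but login still fails, the issue is with the credentials rather than the network, so contact your CSM.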
Next Steps
- Set up access: Getting Started with Data Lake
- Explore available data: Data LakeHouse: Data Schema
- Connect Power BI: Connecting Power BI via Data Lake