Getting Started with Data Lake
Overview
This guide explains how to connect securely to your Learn Amp Data Lake, which is built on Amazon Redshift. Whether you're setting it up for the first time or troubleshooting access, you'll find the technical requirements and step-by-step instructions here.
💡 Looking to enable Data Lake? Contact your Customer Success Manager to explore how this feature can enhance your reporting and data integration capabilities.
Functionality Breakdown
What is Learn Amp Data Lake?
The Learn Amp Data Lake is part of our modern Data LakeHouse architecture, designed to deliver scalable, flexible access to learning data for advanced reporting, analytics, and integration.
Feature | Description |
|---|---|
Advanced Analytics | Self-service dashboards and reporting tools embedded in Learn Amp |
Direct Data Access | Secure, customer-specific schemas for BI tools, HRIS platforms, and AI pipelines |
Custom ETL Pipeline | Optimised for analytics workloads, transforming data into query-ready formats |
⚠️ Note: The Data Lake is currently in Beta. We're actively evolving the platform based on customer feedback.
Data Refresh
Our Beta release targets an hourly ETL refresh cycle, ensuring your data stays current throughout the day.
Pre-requisites
Role Requirements
This is a provisioned service—there are no in-app settings to configure.
Action | Who Can Help |
|---|---|
Enable Data Lake access | Contact your Customer Success Manager |
Receive credentials | Provided to designated Primary Contact |
Request IP whitelisting | Submit via Customer Support Portal |
Technical Requirements
Before connecting, ensure your network meets these requirements:
Requirement | Details |
|---|---|
Outbound TCP Port 5439 | Must be open on your network firewall to allow connections to Redshift |
DNS Resolution | Your network must be able to resolve the Data Lake hostname supplied with your credentials |
IP Whitelisting | The public IP address(es) you provide must be the addresses your traffic actually egresses from. If you route through NAT or a proxy, supply its static public egress IP, not an internal address |
Antivirus/Firewall | Local security software must allow traffic on port 5439 with no SSL interception |
Proxy Configuration | If using a proxy, it must permit access to the Data Lake hostname on TCP port 5439 |
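The DNS and port requirements above can be checked from the machine that will run your BI tool before you raise a ticket. The following is a minimal Python sketch using only the standard library; the function name `preflight` and its return labels are illustrative and not part of any Learn Amp tooling. A successful TCP connection does not prove that your IP is whitelisted or that your credentials work; it only shows that name resolution and outbound port 5439 are not blocked locally.

```python
import socket


def preflight(host: str, port: int = 5439, timeout: float = 5.0) -> str:
    """Check DNS resolution and TCP reachability for host:port.

    Returns one of: "ok", "dns_failed", "refused", "timeout", "error".
    (These labels are illustrative, chosen for this sketch.)
    """
    # Step 1: can the hostname be resolved at all?
    try:
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns_failed"

    # Step 2: can a TCP connection be opened on the given port?
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError:
        return "error"


# Example call (needs network access, so it is left commented out):
# print(preflight("bi.la-dl.com"))
```

A result of `dns_failed` points at the DNS Resolution row, while `refused` or `timeout` usually points at a firewall, proxy, or whitelisting issue.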
Why IP Whitelisting Matters
To safeguard your data, access to the Data Lake is strictly controlled through IP Whitelisting. Only approved public IP addresses can connect—this aligns with the Principle of Least Privilege and minimises security risks.
Quick Start Guide
Step 1: Gather Required Details
Collect the following before submitting your request:
Your Learn Amp subdomain
Public IP address(es) to be whitelisted (not internal/dynamic IPs)
Primary Contact name and email for receiving credentials
Step 2: Submit Your Request
Raise a ticket via the Customer Support Portal including:
Your subdomain
List of IP addresses (up to 5)
Primary Contact details
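Before submitting, it can help to sanity-check the IP list against the rules in this guide (public addresses only, up to 5 as standard). The sketch below uses Python's standard `ipaddress` module; `check_allowlist` and `MAX_IPS` are hypothetical names introduced for illustration. It cannot tell whether an address is static, so that still needs confirming with your network team.

```python
import ipaddress

MAX_IPS = 5  # standard limit stated in this guide


def check_allowlist(candidates):
    """Split candidate IP strings into accepted addresses and problems."""
    accepted, problems = [], []
    for raw in candidates:
        try:
            ip = ipaddress.ip_address(raw.strip())
        except ValueError:
            problems.append(f"{raw}: not a valid IP address")
            continue
        # is_global is False for private, loopback, link-local and
        # other non-public ranges (e.g. 10.x.x.x or 192.168.x.x).
        if not ip.is_global:
            problems.append(f"{raw}: not a public address")
            continue
        accepted.append(str(ip))
    if len(accepted) > MAX_IPS:
        problems.append(
            f"{len(accepted)} addresses exceed the standard limit of {MAX_IPS}"
        )
    return accepted, problems
```

For example, `check_allowlist(["8.8.8.8", "10.0.0.5"])` accepts the first address and reports the second as not public.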
Step 3: Provisioning
We will:
Create a read-only Data Lake user for your subdomain
Whitelist your provided IP addresses
Send connection credentials securely to your Primary Contact
You'll receive: hostname, database name, username, and password.
Step 4: Test Your Connection
Use any PostgreSQL-compatible client to connect:
Hostname: bi.la-dl.com
Port: 5439
Database: Provided in credentials (e.g., prod_eu1)
Username: Provided in credentials
Password: Provided in credentials
💡 Tip: Store your credentials securely—they'll be needed for all future connections.
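One way to keep credentials out of scripts is to read them from environment variables and assemble the connection string at run time. The sketch below is an assumption-laden illustration, not a Learn Amp convention: the variable names (`DATALAKE_HOST`, `DATALAKE_PORT`, `DATALAKE_DATABASE`, `DATALAKE_USER`, `DATALAKE_PASSWORD`) and the function `build_dsn` are made up for this example. It produces a standard PostgreSQL-style connection URI, which many PostgreSQL-compatible clients accept.

```python
import os
from urllib.parse import quote


def build_dsn(env=os.environ):
    """Assemble a PostgreSQL-style connection URI from environment variables.

    The variable names are hypothetical; use whatever your secret store provides.
    """
    host = env.get("DATALAKE_HOST", "bi.la-dl.com")
    port = env.get("DATALAKE_PORT", "5439")
    database = env["DATALAKE_DATABASE"]   # e.g. prod_eu1
    user = env["DATALAKE_USER"]
    password = env["DATALAKE_PASSWORD"]
    # Percent-encode values so characters like "@" or "/" in a password
    # do not break the URI structure.
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(database, safe='')}"
    )
```

Avoid printing or logging the returned string, since it contains the password.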
Example Connection
Using psql Command Line
# Connect using the psql client (run this from your shell)
psql -d prod_eu1 -U datalake_access_<subdomain>_db_user -h bi.la-dl.com -p 5439
-- List all available tables in your schema (run inside psql)
\dt <subdomain>.*
-- View data in a table
SELECT * FROM <subdomain>.tags LIMIT 10;
Replace <subdomain> with your actual Learn Amp subdomain. The database name (prod_eu1 here) is an example; use the one from your credentials.
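For scripted access, the same sample query can be run from Python. The sketch below assumes the third-party `psycopg2` package is installed and that it can connect to your Data Lake endpoint as a PostgreSQL-compatible client; neither is confirmed by this guide. The helper names `sample_query` and `fetch_sample` are hypothetical, and `dsn` stands for a connection string or URI built from your own credentials.

```python
import re


def sample_query(subdomain: str, table: str = "tags", limit: int = 10) -> str:
    """Build the sample SELECT from the psql example for a given schema."""
    # Only allow plain identifiers, since schema and table names cannot be
    # passed as bound query parameters.
    for name in (subdomain, table):
        if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name):
            raise ValueError(f"unexpected identifier: {name!r}")
    return f"SELECT * FROM {subdomain}.{table} LIMIT {int(limit)};"


def fetch_sample(dsn: str, subdomain: str):
    """Run the sample query and return the rows (requires psycopg2)."""
    import psycopg2  # imported lazily so sample_query works without it

    conn = psycopg2.connect(dsn)
    try:
        with conn.cursor() as cur:
            cur.execute(sample_query(subdomain))
            return cur.fetchall()
    finally:
        conn.close()
```

The identifier check is a deliberately strict assumption for this sketch; relax it only if your schema name genuinely contains other characters.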
FAQs
What if I use a NAT Gateway or proxy?
If your outbound traffic routes through NAT or a proxy, the public IP seen by external services is your NAT/proxy egress point. This is the IP to whitelist—and it must be static. If your proxy performs TLS inspection, configure an exception for TCP 5439 traffic to bi.la-dl.com.
What if I need more than 5 IP addresses?
Our standard policy allows up to 5 IPs per customer. If you need more (e.g., multiple offices or large analyst teams), submit a request via the Customer Support Portal with a brief justification explaining your network architecture.
What BI tools can I use?
Any PostgreSQL-compatible tool: Power BI, Tableau, Looker, DBeaver, or command-line psql.
How often is data refreshed?
Data refreshes hourly. Actual times may vary during Beta as we optimise performance.
Troubleshooting
Issue | Solution |
|---|---|
Connection refused | Verify port 5439 is open on your firewall and your IP is whitelisted |
Authentication failed | Double-check credentials—usernames and passwords are case-sensitive |
Cannot resolve hostname | Check that your network's DNS can resolve the Data Lake hostname supplied with your credentials |
Timeout during connection | Check proxy/firewall settings. Ensure no SSL interception on port 5439 |
Need more than 5 IPs | Submit a request via Support Portal with justification |
Still Need Help?
We're here to support you. If you run into issues or have questions:
📩 Submit a ticket via the Customer Support Portal
🔄 Reply directly to your existing support request
Next Steps
Review available data: Data LakeHouse: Data Schema
Connect Power BI: Connecting Power BI via Data Lake
Learn about our architecture: Our Data Lake and Warehouse Explained