How can I request access to the Data Lake?

Access to the Data Lake is provided to customers who have purchased the Data Lake Bolt-on and to customers on the “Advanced” package. Please contact your Customer Success Manager if you are interested in access to the Data Lake.

 

This guide illustrates the steps needed to provide you with access to the Data Lake.

 

Preliminary considerations: Security Best Practices

To ensure the security of your Data Lake, we follow the principle of least privilege. This means we only whitelist the necessary IPs to access the Data Lake. Please ensure that:

  • Only the specific IPs or IP ranges that will need access to the Data Lake are provided. We recommend whitelisting only a small number of IPs for the most secure access.

  • If you are unsure which IPs to whitelist, please consult with your IT team to verify the minimal set of IPs needed for access.

  • Avoid providing large IP ranges, unless absolutely necessary. Whitelisting large ranges increases the attack surface and poses unnecessary security risks.

  • Redshift Endpoint Access: In addition to whitelisting your IPs, ensure your firewall is configured to allow access to the Redshift endpoint.

IP Whitelisting Policy for Data Lake Access

To enhance security and ensure efficient access control, we enforce a strict IP whitelisting policy:

  1. Minimal Access Approach

    • Customers must specify only the necessary users or machines requiring access.

    • Instead of large CIDR blocks, only the exact IPs of authorized users/machines should be provided.

    • This minimizes security risks and ensures controlled access.

  2. Maximum 5 IPs Per Customer

    • Each customer can whitelist up to 5 IP addresses for accessing the Data Lake.

    • Requests exceeding this limit will need to be analysed and approved.

    • Customers should carefully choose the most essential IPs for access.

This policy gives customers secure and efficient access to the Data Lake while preventing excessive exposure.
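Before submitting a list of IPs, it can help to check it against the policy above. The following is a minimal sketch in Python, not an official validation tool: the function name `review_ip_request` is hypothetical, and the check that every address is public is an assumption based on the requirement that traffic arrives from a whitelisted public IP.

```python
import ipaddress

MAX_IPS = 5  # limit stated in the whitelisting policy


def review_ip_request(entries):
    """Return a list of problems; an empty list means the request fits the policy."""
    problems = []
    if len(entries) > MAX_IPS:
        problems.append(f"{len(entries)} entries exceed the limit of {MAX_IPS}")
    for entry in entries:
        try:
            # strict=False accepts both single addresses and CIDR notation
            network = ipaddress.ip_network(entry, strict=False)
        except ValueError:
            problems.append(f"{entry}: not a valid IP address or CIDR block")
            continue
        if network.num_addresses > 1:
            problems.append(
                f"{entry}: CIDR block covering {network.num_addresses} addresses; "
                "list exact IPs instead"
            )
        elif not network.network_address.is_global:
            # Assumption: only public addresses can be whitelisted
            problems.append(f"{entry}: not a public address")
    return problems
```

For example, `review_ip_request(["10.0.0.0/8"])` reports the entry as a CIDR block, while a short list of exact public addresses returns an empty list.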

 

Process to request access

Step 1: Gather the Required Information

Ensure you have the following details prepared:

  1. Your Subdomain: The subdomain associated with your company in our system.

  2. Your IP Addresses: Access to the Data Lake is restricted to whitelisted IPs. Provide every IP address from which authorized users in your organization will access the Data Lake.

  3. Point of Contact: Identify the primary contact person at your organization for this request. This person will receive secure credentials for access.

Step 2: Submit Your Request

To request access:

  1. Create a Support question in our customer portal.

  2. Include the following in your request:

    • Your company subdomain.

    • The IP addresses to whitelist.

    • The name and email of the primary contact person.

Step 3: Access Provisioning

Once your request is received:

  1. A read-only user will be created for your subdomain in the Data Lake.

  2. Connection credentials, including the username and connection details, will be securely shared with you via email.

  3. The IP addresses provided will be whitelisted.

Note: The credentials will be stored securely in our system, but ensure you save them securely for your use.

Step 4: Test Access

After receiving your credentials:

  1. Use the provided credentials to verify access to the schema and tables. You can use a psql connection, example below:

    # connect using the psql client
    psql -d prod_eu1 -U datalake_access_<subdomain>_db_user -h bi.la-dl.com -p 5439

    # list all of the available tables of your schema
    \dt <subdomain>.*

    # view data in a table
    select * from <subdomain>.tags limit 10;

  2. If you encounter any issues or have any questions, please comment on your open ticket or raise a new support ticket.
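To avoid typos when substituting your subdomain into the example, the command and statements can be generated. This is a rough Python sketch: the host, port, database and user pattern are copied from the psql example, while the function name `smoke_test` and the restriction of subdomains to letters, digits and underscores are assumptions made for illustration.

```python
import re
import shlex

HOST = "bi.la-dl.com"  # host from the psql example
PORT = 5439
DATABASE = "prod_eu1"  # database from the psql example


def smoke_test(subdomain):
    """Return the psql command line and the statements for a first access check."""
    # Assumption: the subdomain is also a plain schema name that needs no quoting
    if not re.fullmatch(r"[A-Za-z0-9_]+", subdomain):
        raise ValueError(f"unexpected characters in subdomain: {subdomain!r}")
    user = f"datalake_access_{subdomain}_db_user"
    command = ["psql", "-d", DATABASE, "-U", user, "-h", HOST, "-p", str(PORT)]
    statements = [
        rf"\dt {subdomain}.*",  # psql meta-command: list tables in the schema
        f"select * from {subdomain}.tags limit 10;",
    ]
    return shlex.join(command), statements


if __name__ == "__main__":
    command_line, statements = smoke_test("acme")  # "acme" is a placeholder
    print(command_line)
    for statement in statements:
        print(statement)
```

Run the printed command in a terminal, then paste the statements at the psql prompt.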

Client-Side Requirements for Successful Redshift Access

🔒 Network Configuration

  1. Allow Outbound Access to Redshift Endpoint:

    • Outbound TCP port 5439 to the Redshift endpoint must be open.

  2. Ensure Actual Public IP Matches Whitelisted IPs:

    • If you use NAT or proxy gateways, ensure outbound traffic leaves from the whitelisted public IP(s).

  3. No Proxy Blocking Redshift:

    • If outbound connections go through a proxy, it must allow:

      • TCP 5439

      • Connections to Redshift domain/IPs

    • Some proxies break SSL or require explicit allowlisting of domains

  4. DNS Must Work:

    • Redshift is hostname-based. Your DNS must resolve the Redshift hostname to a valid AWS public IP (176.34.231.221). You can verify this with nslookup or dig.

    • Internal DNS or firewall DNS overrides may interfere
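The DNS and port requirements above can be checked from the client machine with the standard library alone. This is a minimal Python sketch under stated assumptions: the helper names `resolve_ipv4` and `can_connect` are hypothetical, and a successful TCP connection only shows that the port is reachable, not that your credentials or IP whitelisting are correct.

```python
import socket


def resolve_ipv4(hostname):
    """Return the sorted IPv4 addresses the local resolver gives for hostname."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    return sorted({info[4][0] for info in infos})


def can_connect(host, port=5439, timeout=5.0):
    """Return True if an outbound TCP connection to host:port succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers DNS failures, refusals and timeouts
        return False
```

Call `resolve_ipv4` with the hostname from your connection details and compare the result with the address given above, then call `can_connect` with the same hostname. A `False` result usually points at a firewall, proxy or whitelisting issue.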


🛡️ Firewall / Security Gateway

  1. Permit Egress to AWS Redshift IP Range (Region-Specific):

    • Ensure your firewall doesn't block outbound traffic to the AWS public IPs used by Redshift (the ranges are published in the AWS IP Ranges JSON).

  2. Disable Deep Packet Inspection / SSL Termination:

    • Redshift uses TLS for secure connections; DPI tools may block it or degrade performance

  3. Whitelist Domain/Service:

    • Add the Redshift endpoint to the allowlist in any:

      • Web filters

      • Security gateways (like ZScaler, Palo Alto, etc.)

      • Proxy-based controls

  4. Antivirus / Endpoint Protection:

    • Some corporate antivirus software blocks unknown ports like 5439 or drops packets silently

    • Redshift must be added to "trusted domains" if needed
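To find which published AWS range your firewall needs to allow, the AWS IP Ranges JSON can be searched for the prefix containing a given address. The sketch below is illustrative only: the function names are hypothetical, the region default `eu-west-1` is taken from the example connection strings in this guide, and filtering on the `AMAZON` service entry (the superset of all AWS ranges) is an assumption, since Redshift cluster addresses may not be listed under a service of their own.

```python
import ipaddress
import json
import urllib.request

IP_RANGES_URL = "https://ip-ranges.amazonaws.com/ip-ranges.json"


def load_ip_ranges(url=IP_RANGES_URL):
    """Download and parse the AWS IP ranges document (needs outbound HTTPS)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


def matching_prefixes(ip, ranges, region="eu-west-1", service="AMAZON"):
    """Return the published IPv4 prefixes for region/service that contain ip."""
    address = ipaddress.ip_address(ip)
    return [
        entry["ip_prefix"]
        for entry in ranges.get("prefixes", [])
        if entry["region"] == region
        and entry["service"] == service
        and address in ipaddress.ip_network(entry["ip_prefix"])
    ]
```

A typical use is `matching_prefixes(resolved_ip, load_ip_ranges())`, where `resolved_ip` is the address your DNS returns for the Redshift hostname. An empty result suggests checking the region or the resolved address.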


📊 Client Tools (e.g. PowerBI, Tableau, DBeaver)

  1. Install Redshift ODBC/JDBC Driver:

  2. Allow the Tool to Connect Unrestricted:

    • Ensure firewall/antivirus allows PowerBI/Tableau/DBeaver to initiate outbound TCP traffic

    • Avoid using intermediate network gateways that could alter requests

  3. Use Correct Connection String:

    • Example:

      redshift-cluster.czare90wicld.eu-west-1.redshift.amazonaws.com:5439/prod_eu1
      redshift-cluster.czare90wicld.eu-west-1.redshift.amazonaws.com:5439/prod_eu2
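Some tools ask for host, port and database as separate fields, while JDBC-based tools such as DBeaver expect a single URL. As a small Python sketch, the strings above can be split and reformatted as follows. The names `Endpoint`, `parse_endpoint` and `jdbc_url` are hypothetical, and the `jdbc:redshift://` prefix assumes the Amazon Redshift JDBC driver is in use.

```python
from typing import NamedTuple


class Endpoint(NamedTuple):
    host: str
    port: int
    database: str


def parse_endpoint(text):
    """Split 'host:port/database' into its parts."""
    host_port, _, database = text.strip().partition("/")
    host, _, port = host_port.partition(":")
    if not (host and port.isdigit() and database):
        raise ValueError(f"expected host:port/database, got {text!r}")
    return Endpoint(host, int(port), database)


def jdbc_url(endpoint):
    """Format the endpoint as a URL for the Amazon Redshift JDBC driver."""
    return f"jdbc:redshift://{endpoint.host}:{endpoint.port}/{endpoint.database}"
```

Use the host and database you received with your credentials; the values in the example strings are shown only to illustrate the shape.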


📋 Summary Checklist

| Setting | Required Configuration |
| --- | --- |
| Outbound Port 5439 | Must be open |
| DNS Resolution | Must resolve to AWS Redshift public IP (176.34.231.221) |
| Firewall | Must not block AWS/Redshift traffic |
| Proxy | Must allow Redshift endpoint and not intercept TLS |
| Public IP | Must match whitelisted IPs |
| Antivirus/EDR | Must not block outbound database traffic |
| BI Tool Setup | Correct driver + correct connection string |

 
