“Data really powers everything that we do.”
– Jeff Weiner, CEO, LinkedIn
This post is about my learning with AWS Redshift DDL commands. I will be going through the below objects in Redshift.
1. Group
This defines a new user group in the redshift. In this below command I will be creating user groups for developers and analysts.
# create user group 'developers_group' for developers
>>>CREATE GROUP developers_group;
# create user group 'analysts_group' for analysts
>>>CREATE GROUP analysts_group;
# grant permssion to access schema 'retail_banking_schema' to 'developers_group'
>>>GRANT USAGE ON SCHEMA retail_banking_schema TO GROUP developers_group;
# grant permssion to access schema 'retail_banking_schema' to 'analysts_group'
>>>GRANT USAGE ON SCHEMA retail_banking_schema TO GROUP analysts_group;
# adding a user 'suraraj' to 'developers_group'
>>>alter group developers_group add user suraraj;
>>>commit;
2. User
Users can be categorised as Superusers and normal users.
A. Superusers (User with “CREATEUSER” permission)
- Database superusers have the same permissions as database owners for all databases.
- A superuser can ony create another superuser.
- A admin user is a user created during cluster creation.
- Superuser can only view the sytstm catalog tables.
Create a superuser “adminuser” and the only way to identify a superuser who has “CREATEUSER” permission.
# create sueruser 'adminuser' with password in in plain-text
>>>CREATE USER adminuser CREATEUSER PASSWORD '1234Admin';
# Alter a sueruser 'adminuser' to add CREATEUSER persmission
>>>ALTER USER adminuser CREATEUSER;
B. Users (normal user)
- A superuser can only create a new user.
- User with default password setting => User can change password.
- User with password DISABLE => restrict user from changing password, as well as the password is deleted from system and user can only temporarily login through IAM credential.
# create user 'suraraj' with password in in plain-text
>>>CREATE USER suraraj password 'password12345';
# create user 'suraraj' with password in in md5
step1 : create password in MD5 string using md5 redshift function
>>>select md5('password12345') ===>> 234c434b4b77c89e6b94f12c5393af5b
step2 : create user by concating the md5 hashed string and a prefex 'md5'
>>>CREATE USER suraraj password 'md5234c434b4b77c89e6b94f12c5393af5b';
# Similary we can create password with sha256hash
Optional parameters | Descriptions |
---|---|
CREATEDB | NOCREATEDB | Whether the user can create a database |
CREATEUSER | NOCREATEUSER | Whether the user to create a new user |
SYSLOG ACCESS { RESTRICTED | UNRESTRICTED } | RESTRICTED: Visible only the rows created by the given user. UNRESTRICTED: Visible all rows created by other users as well. |
IN GROUP ‘developers_group‘ | User in ‘developers_group’ group |
VALID UNTIL | Time until the password is valid. |
CONNECTION LIMIT { limit | UNLIMITED } | Max concurrent connection by the user, not just the connections. Max available per database: 500 user: usually 2 per user superuser: no limit |
SESSION TIMEOUT limit | The time period until the session stays active. Min: 1 min to Max 20days |
# List all database users from PG_USER_INFO catalog table.
>>>select * from pg_user_info;
#This will provide the usernames along with the respective groups.
>>>SELECT usename, groname
FROM pg_user, pg_group
WHERE pg_user.usesysid = ANY(pg_group.grolist)
AND pg_group.groname in (SELECT DISTINCT pg_group.groname from pg_group);
3. Database
Creates a new database in the cluster.
CREATE DATABASE retail_banking_db
OWNER='suraraj'
CONNECTION LIMIT 500
COLLATE CASE_INSESITIVE
Optional Parameter | Description |
---|---|
CONNECTION LIMIT | limit: A maximum number of database connections users are permitted to have open concurrently. UNLIMITED: to permit the maximum number of concurrent connections |
COLLATE CASE | whether string search or comparison is CASE_SENSITIVE or CASE_INSENSITIVE. Default is CASE_SENSITIVE. |
4. Schema
Schema is a collection of database objects linked to a database and a user which segregates the objects with security and access permissions.
- Each Redshift database has a default schema: PUBLIC
- Schemas care two types of schema : Local Schema and External Schema.
Externa Schema:
External schemas are created to reference a database in an external data catalog such as :
- AWS Glue
- Athena
- Database in HIVE Metastore
- Amazon EMR
- Aamazon RDS/ Auroraor any other Postgre SQL edition database
# Create Exernale Schema "gluetablescheama" from Glue Data Catalog
# Glue database Name : samplegluedb
# IAM_ROLE: IAM role to be created with glue,redshift and S3full access.
CREATE EXTERNAL SCHEMA IF NOT EXISTS gluetablescheama
FROM DATA CATALOG
DATABASE 'samplegluedb'
REGION 'eu-west-2'
IAM_ROLE 'arn:aws:iam::<AWS account-id>:role/<role-name>'
CREATE EXTERNAL DATABASE IF NOT EXISTS
Refer to the Amazon CREATE EXTERNAL SCHEMA link for more details.