AWS S3 RPDSQL Pipeline
Architecture
Set Up AWS S3 Bucket
Create an S3 bucket and upload a test file:
test.csv
1,2
3,4
5,6
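The bucket and test file can be prepared from the AWS CLI. A minimal sketch, assuming the bucket name rapids-db-test and region ap-southeast-1 that appear in the CREATE PIPELINE statement later in this tutorial:

```shell
# Write the sample file locally (same rows as above).
printf '1,2\n3,4\n5,6\n' > test.csv

# Create the bucket and upload the file. These two commands need configured
# AWS credentials, so they are shown commented out:
#   aws s3 mb s3://rapids-db-test --region ap-southeast-1
#   aws s3 cp test.csv s3://rapids-db-test/test.csv
```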
Create a bucket policy that grants the pipeline's AWS user read access to the bucket:
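The tutorial does not include a policy body, so the following is an illustrative read-only policy. The account ID and user name in the Principal ARN are placeholders; replace them with the IAM principal whose access key the pipeline will use:

```shell
# Write an example read-only bucket policy. The Principal ARN below is a
# placeholder, not a value from this tutorial.
cat > bucket-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowPipelineRead",
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::123456789012:user/pipeline-user" },
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::rapids-db-test",
        "arn:aws:s3:::rapids-db-test/*"
      ]
    }
  ]
}
EOF
# Attach it to the bucket (requires AWS CLI and credentials):
#   aws s3api put-bucket-policy --bucket rapids-db-test --policy file://bucket-policy.json
```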
Create Pipeline
You must first log in to the RPDSQL database (not RapidsDB) using the rpdsql-ops/rapids command (not the Rapids Shell), then run the following statements.
CREATE TABLE beta (num VARCHAR(20), price VARCHAR(20));
CREATE PIPELINE p AS LOAD DATA
S3 'rapids-db-test'
CONFIG '{"region": "ap-southeast-1"}'
CREDENTIALS '{"aws_access_key_id": "key_id", "aws_secret_access_key": "secret_key"}'
SKIP DUPLICATE KEY ERRORS
INTO TABLE beta
FIELDS TERMINATED BY ',' ENCLOSED BY '' ESCAPED BY '\\'
LINES TERMINATED BY '\n' STARTING BY '';
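The statements above can be collected into a script and replayed through the RPDSQL client. The rapids invocation in the final comment is a guess based on the rpdsql-ops/rapids command mentioned earlier; adjust it to your installation, and substitute real credentials for the key_id/secret_key placeholders:

```shell
# Save the DDL in one file so the setup can be replayed.
cat > create_pipeline.sql <<'EOF'
CREATE TABLE beta (num VARCHAR(20), price VARCHAR(20));
CREATE PIPELINE p AS LOAD DATA
S3 'rapids-db-test'
CONFIG '{"region": "ap-southeast-1"}'
CREDENTIALS '{"aws_access_key_id": "key_id", "aws_secret_access_key": "secret_key"}'
SKIP DUPLICATE KEY ERRORS
INTO TABLE beta
FIELDS TERMINATED BY ',' ENCLOSED BY '' ESCAPED BY '\\'
LINES TERMINATED BY '\n' STARTING BY '';
EOF
# Feed it to the client (hypothetical invocation):
#   rapids < create_pipeline.sql
```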
Test and Run Pipeline
TEST PIPELINE p;
SHOW PIPELINES;
START PIPELINE p;
After the pipeline has been created and started in the RPDSQL database, the data from AWS S3 should be loaded into the RPDSQL database. You can then use the RapidsDB connector to read that data from RPDSQL and run your analysis.