[AWS Cloud Fundamental Notes] Kinesis, EMR and Redshift

An introduction to how Kinesis, EMR and Redshift are used and how they work

Ray

These three services are used together in a big data architecture:

graph LR;
data-reception-->data-processing-->data-warehousing;

Corresponding Services

Amazon Kinesis

Kinesis specializes in collecting and streaming big data. It supports two major data types:

  1. Video: Using Kinesis Video Streams
  2. General data: Using Kinesis Data Streams

The party that uploads data to Kinesis, such as a sensor, is called a Kinesis Producer.
Each unit of data a producer sends is called a Data Record and is stored in the Kinesis Stream.
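As a rough illustration of a producer, here is a minimal sketch that sends one sensor reading as a Data Record. It assumes the `boto3` Kinesis client and its `put_record` call; the function name `send_reading`, the stream name, and the shape of the reading are hypothetical and chosen only for this example.

```python
import json


def send_reading(client, stream_name, sensor_id, reading):
    """Send one sensor reading to a Kinesis data stream as a single data record.

    `client` is expected to behave like boto3.client("kinesis").
    The sensor id is used as the partition key (an assumption for this sketch).
    """
    return client.put_record(
        StreamName=stream_name,
        Data=json.dumps(reading).encode("utf-8"),  # record payload as bytes
        PartitionKey=sensor_id,
    )
```

With AWS credentials configured, this could be called as `send_reading(boto3.client("kinesis"), "demo-stream", "sensor-1", {"temp": 21.5})`, where `demo-stream` is a placeholder stream name.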

Shard

A stream is made up of multiple “shards”, which spread the incoming data out and buffer it.
When data traffic increases, the number of shards must be increased to keep processing fast enough.
Each shard accepts writes of up to 1,000 records per second, up to a total of 1 MB per second, and serves reads of up to 2 MB per second.
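The per-shard limits above translate directly into a sizing calculation. The sketch below estimates the minimum number of shards for a given workload; the function name `shards_needed` and the example traffic figures are made up for illustration.

```python
import math

# Per-shard limits as stated in these notes
WRITE_RECORDS_PER_SHARD = 1000  # records per second
WRITE_MB_PER_SHARD = 1.0        # MB per second
READ_MB_PER_SHARD = 2.0         # MB per second


def shards_needed(records_per_sec, write_mb_per_sec, read_mb_per_sec):
    """Return the smallest shard count that satisfies all three limits."""
    return max(
        1,  # a stream always has at least one shard
        math.ceil(records_per_sec / WRITE_RECORDS_PER_SHARD),
        math.ceil(write_mb_per_sec / WRITE_MB_PER_SHARD),
        math.ceil(read_mb_per_sec / READ_MB_PER_SHARD),
    )


# 2,500 records/s is the tightest constraint here, so 3 shards are needed
print(shards_needed(records_per_sec=2500, write_mb_per_sec=1.5, read_mb_per_sec=3.0))
```

Whichever limit is hit first decides the shard count, which is why the function takes the maximum of the three ratios.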

Kinesis Consumer

A system that reads data from the stream and processes it is called a Kinesis Consumer.
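A very small consumer can be sketched as follows. It assumes the `boto3` Kinesis client and its `get_shard_iterator` and `get_records` calls; the function name `read_shard_once` is hypothetical, and a real consumer would keep polling with the returned `NextShardIterator` instead of reading a single batch.

```python
def read_shard_once(client, stream_name, shard_id, limit=100):
    """Read one batch of record payloads from the start of a single shard.

    `client` is expected to behave like boto3.client("kinesis").
    """
    iterator = client.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard_id,
        ShardIteratorType="TRIM_HORIZON",  # start from the oldest record still stored
    )["ShardIterator"]

    response = client.get_records(ShardIterator=iterator, Limit=limit)
    # Each record carries its payload as bytes under the "Data" key
    return [record["Data"] for record in response["Records"]]
```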

Amazon Elastic MapReduce (EMR)

The underlying engine of AWS EMR is Hadoop, the open-source big data processing system, so EMR is effectively the AWS cloud version of Hadoop.
It is also the most common Kinesis Consumer.
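EMR takes its name from the MapReduce programming model that Hadoop implements. The sketch below imitates the three stages (map, shuffle, reduce) in plain Python on a handful of in-memory records. It is only a toy model of the idea, not how a job is submitted to EMR, and the record fields `sensor` and `value` are invented for the example.

```python
from collections import defaultdict


def map_phase(records):
    """Map: emit one (key, value) pair per input record."""
    for record in records:
        yield record["sensor"], record["value"]


def shuffle(pairs):
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each group to one result, here the average value."""
    return {key: sum(values) / len(values) for key, values in groups.items()}


records = [
    {"sensor": "a", "value": 1.0},
    {"sensor": "a", "value": 3.0},
    {"sensor": "b", "value": 10.0},
]
averages = reduce_phase(shuffle(map_phase(records)))
print(averages)
```

On a real cluster the map and reduce steps run in parallel across many machines, which is what makes the model suitable for big data.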

The service runs on EC2 instances, so you can SSH into the operating system of an EMR node to operate it.

The processed data is then sent to S3 or Redshift.

Redshift

Redshift is a big data warehousing solution for data analysis. It is a column-based, clustered database.
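To see why a column-based layout suits analysis, the sketch below turns a few row-oriented records into a column-oriented structure and aggregates one column. It is a plain Python illustration of the idea, not Redshift's actual storage format, and the table fields are invented for the example.

```python
def to_columns(rows):
    """Convert a list of row dicts into a dict of column lists."""
    return {key: [row[key] for row in rows] for key in rows[0]}


rows = [
    {"order_id": 1, "region": "EU", "amount": 30.0},
    {"order_id": 2, "region": "US", "amount": 12.5},
    {"order_id": 3, "region": "EU", "amount": 7.5},
]

columns = to_columns(rows)

# An aggregate over one field only needs to touch that single column,
# instead of reading every field of every row.
total_amount = sum(columns["amount"])
print(total_amount)
```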

Main uses:

  1. Data Warehousing: uses dense storage nodes
  2. Data Analysis: uses dense compute nodes
