[AWS] Simple Storage Service S3
“Emergencies have always been the pretext on which the safeguards of individual liberty have been eroded.” –Friedrich August von Hayek
Simple Storage Service (S3)
Amazon Simple Storage Service (Amazon S3) is an object storage service that offers industry-leading scalability, data availability, security, and performance. This means customers of all sizes and industries can use it to store and protect any amount of data for a range of use cases, such as websites, mobile applications, backup and restore, archive, enterprise applications, IoT devices, and big data analytics.
Amazon S3 provides easy-to-use management features so you can organize your data and configure finely-tuned access controls to meet your specific business, organizational, and compliance requirements.
Amazon S3 is designed for 99.999999999% (11 9s) durability, and stores data for millions of applications for companies all around the world.
You can use S3 to store and retrieve any amount of data at any time, from anywhere on the web.
The basics of S3 are:
- S3 is object-based and allows you to upload files.
- Files can be from 0 bytes to 5 TB.
- There is unlimited storage.
- Files are stored in buckets. Think of buckets as folders.
- S3 uses a universal namespace, which means bucket names must be globally unique.
- When you upload a file to S3, you will receive an HTTP 200 status code if the upload was successful (see the sketch after this list).
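As a quick illustration, here is a minimal sketch using boto3 (Python). The bucket name is a hypothetical placeholder, and the bucket is assumed to already exist:

```python
import boto3

# Hypothetical bucket name -- remember, bucket names must be globally unique.
BUCKET = "my-example-bucket-1234"

s3 = boto3.client("s3")

# Upload a small object. put_object returns response metadata that
# includes the raw HTTP status code.
response = s3.put_object(Bucket=BUCKET, Key="hello.txt", Body=b"Hello, S3!")

print(response["ResponseMetadata"]["HTTPStatusCode"])  # 200 on success
```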
S3 Objects
Think of objects simply as files. Objects consist of the following (illustrated in the sketch after this list):
- Key: the name of the object.
- Value: the data itself, made up of a sequence of bytes.
- VersionId: used for versioning.
- Metadata: data about the data you are storing.
- Sub-resources: ACLs, torrent.
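To see these components in practice, here is a small boto3 sketch (same hypothetical bucket as above) that stores an object with user-defined metadata and then reads the components back without downloading the value:

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "my-example-bucket-1234"  # hypothetical bucket, assumed to exist

# Store an object: key (name), value (bytes), and metadata (data about the data).
s3.put_object(
    Bucket=BUCKET,
    Key="reports/2023/summary.csv",
    Body=b"id,total\n1,42\n",
    Metadata={"department": "finance"},
)

# head_object returns the object's metadata without fetching the value.
head = s3.head_object(Bucket=BUCKET, Key="reports/2023/summary.csv")
print(head["Metadata"])       # {'department': 'finance'}
print(head.get("VersionId"))  # populated once versioning is enabled
```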
S3 data consistency
- Read-after-write consistency for PUTS of new objects: if you upload a file for the first time, you will be able to read it immediately.
- Eventual consistency for overwrite PUTS and DELETES: if you update or delete an existing file and try to read it immediately, you may get the older version or you may not; changes to objects can take a little time to propagate.
S3 features
- Tiered storage availability (S3 Standard, S3-IA, S3 One Zone-IA, S3 Intelligent-Tiering, S3 Glacier, S3 Glacier Deep Archive).
- Lifecycle management.
- Versioning.
- Encryption.
- MFA Delete.
- Secure your data using access control lists (ACLs) and bucket policies.
S3 storage classes and tiers
- S3 Standard: 99.99% availability and 99.999999999% (11 9s) durability. Objects are stored redundantly across multiple devices in multiple facilities, and the class is designed to sustain the loss of two facilities concurrently.
- S3-IA (Infrequent Access): for data that is accessed less frequently but requires rapid access when needed. It has a lower storage fee than S3 Standard, but you are charged a retrieval fee.
- S3 One Zone-Infrequent Access: a low-cost option for infrequently accessed data that does not require multiple-Availability-Zone data resilience.
- S3 Intelligent-Tiering: designed to optimize costs by automatically moving data to the most cost-effective access tier, without performance impact or operational overhead.
- S3 Glacier: a secure, durable, and low-cost storage class for data archiving. You can reliably store any amount of data at costs that are competitive with or cheaper than on-premises solutions. Retrieval times are configurable from minutes to hours.
- S3 Glacier Deep Archive: Amazon S3's lowest-cost storage class, for data where a retrieval time of 12 hours is acceptable.
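A storage class is chosen per object at upload time. A minimal boto3 sketch (hypothetical bucket; note that Glacier-class objects must be restored before they can be read):

```python
import boto3

s3 = boto3.client("s3")

# Upload directly into an infrequent-access storage class.
s3.put_object(
    Bucket="my-example-bucket-1234",  # hypothetical bucket
    Key="archive/old-log.gz",
    Body=b"compressed log data",
    StorageClass="STANDARD_IA",       # or ONEZONE_IA, INTELLIGENT_TIERING,
                                      # GLACIER, DEEP_ARCHIVE
)
```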
S3 performance
Your application can achieve at least 3,500 PUT/COPY/POST/DELETE or 5,500 GET/HEAD requests per second per partitioned prefix. There are no limits to the number of prefixes in a bucket. You can increase your read or write performance by using parallelization. For example, if you create 10 prefixes in an Amazon S3 bucket to parallelize reads, you could scale your read performance to 55,000 read requests per second. Similarly, you can scale write operations by writing to multiple prefixes.
See Optimizing Amazon S3 performance in the AWS documentation.
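To make the prefix-based parallelization concrete, here is a boto3 sketch that spreads keys across ten hypothetical prefixes and reads them concurrently; each prefix gets its own request-rate budget:

```python
import boto3
from concurrent.futures import ThreadPoolExecutor

s3 = boto3.client("s3")
BUCKET = "my-example-bucket-1234"  # hypothetical bucket

# Keys spread across 10 prefixes (shard-0/ ... shard-9/), so each prefix
# can serve up to 5,500 GET requests per second independently.
keys = [f"shard-{i}/data.bin" for i in range(10)]

def fetch(key):
    obj = s3.get_object(Bucket=BUCKET, Key=key)
    return key, len(obj["Body"].read())

# Issue the reads in parallel to take advantage of the per-prefix limits.
with ThreadPoolExecutor(max_workers=10) as pool:
    for key, size in pool.map(fetch, keys):
        print(key, size)
```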
S3 Security and Encryption
By default, all newly created buckets are private. You can set up access control to your buckets using:
- Bucket policies, applied at the bucket level (see the sketch below).
- Access control lists (ACLs), applied at the object level.
S3 buckets can be configured to create access logs, which record all requests made to the bucket. These logs can be sent to another bucket, even a bucket in another account.
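As an illustration of bucket-level access control, here is a minimal boto3 sketch that attaches a bucket policy; the account ID, bucket name, and statement are hypothetical placeholders:

```python
import json
import boto3

s3 = boto3.client("s3")

# Hypothetical policy letting another AWS account read every object.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "CrossAccountRead",
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::111122223333:root"},  # placeholder account
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::my-example-bucket-1234/*",
    }],
}

# Bucket policies are applied at the bucket level.
s3.put_bucket_policy(Bucket="my-example-bucket-1234", Policy=json.dumps(policy))
```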
Encryption in transit
Encryption in transit is achieved with SSL/TLS (HTTPS).
Encryption at rest (Server Side)
- SSE-S3 (S3-managed keys): Amazon manages all the keys.
- SSE-KMS (AWS Key Management Service): you and Amazon manage the keys together.
- SSE-C (customer-provided keys): you give Amazon your own keys, which you manage yourself.
Encryption at rest (Client Side)
Where you upload your files already encrypted using your own keys.
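The server-side options are selected per request. A minimal boto3 sketch, with a hypothetical bucket and KMS key alias:

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "my-example-bucket-1234"  # hypothetical bucket

# SSE-S3: Amazon manages the keys (AES-256).
s3.put_object(Bucket=BUCKET, Key="a.txt", Body=b"data",
              ServerSideEncryption="AES256")

# SSE-KMS: encrypt with a KMS key you control (placeholder alias).
s3.put_object(Bucket=BUCKET, Key="b.txt", Body=b"data",
              ServerSideEncryption="aws:kms",
              SSEKMSKeyId="alias/my-app-key")
```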
S3 Versioning
- Stores all versions of an object (including all writes, even if you delete the object).
- Once enabled, versioning cannot be disabled, only suspended.
- Versioning is integrated with lifecycle rules.
- Versioning supports MFA Delete, which uses multi-factor authentication to provide an additional layer of security.
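Enabling versioning is a one-line call in boto3; a sketch with a hypothetical bucket (MFA Delete additionally requires the MFA parameter and the bucket owner's credentials):

```python
import boto3

s3 = boto3.client("s3")

# Turn versioning on; afterwards it can only be suspended, never disabled.
s3.put_bucket_versioning(
    Bucket="my-example-bucket-1234",  # hypothetical bucket
    VersioningConfiguration={"Status": "Enabled"},
)
# To suspend later: VersioningConfiguration={"Status": "Suspended"}
```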
S3 Lifecycle management
S3 uses lifecycle rules to manage your objects; these rules define how Amazon S3 manages objects during their lifetime. Lifecycle rules let you automatically transition objects among the different storage classes and tiers, from S3 down to Glacier. A lifecycle rule can also automatically expire objects based on your retention needs or clean up incomplete multipart uploads (see the sketch below).
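A boto3 sketch of such a rule, with a hypothetical bucket and prefix: transition to S3-IA after 30 days, to Glacier after 90, expire after 365, and abort stale multipart uploads:

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="my-example-bucket-1234",  # hypothetical bucket
    LifecycleConfiguration={
        "Rules": [{
            "ID": "archive-then-expire",
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},  # hypothetical prefix
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},
            "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
        }],
    },
)
```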
Sharing S3 buckets and objects across accounts
There are three ways to share buckets across accounts:
- Using bucket policies and IAM: applies across the entire bucket and provides programmatic access only.
- Using ACLs and IAM: applies to individual objects and provides programmatic access only.
- Cross-account IAM roles: provide both programmatic and console access.
See Example walkthroughs: Managing access to your Amazon S3 resources in the AWS documentation.
S3 Cross-Region Replication
Replication enables automatic, asynchronous copying of objects across Amazon S3 buckets. Buckets that are configured for object replication can be owned by the same AWS account or by different accounts. You can copy objects between different AWS Regions or within the same Region.
- Versioning must be enabled on both the source and the destination buckets.
- Files in an existing bucket are not replicated automatically.
- All subsequent updated files will be replicated automatically.
- Delete markers are not replicated.
- Deleting individual versions or delete markers will not be replicated.
To enable object replication, you add a replication configuration to your source bucket. The minimum configuration must provide the following:
- The destination bucket where you want Amazon S3 to replicate objects
- An AWS Identity and Access Management (IAM) role that Amazon S3 can assume to replicate objects on your behalf
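A minimal boto3 sketch of such a configuration; the bucket names, role ARN, and account ID are hypothetical, and versioning is assumed to already be enabled on both buckets:

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_replication(
    Bucket="source-bucket-1234",  # hypothetical source bucket
    ReplicationConfiguration={
        # Role S3 assumes to replicate objects on your behalf (placeholder ARN).
        "Role": "arn:aws:iam::111122223333:role/s3-replication-role",
        "Rules": [{
            "ID": "replicate-all",
            "Status": "Enabled",
            "Priority": 1,
            "Filter": {},  # empty filter = replicate the whole bucket
            "DeleteMarkerReplication": {"Status": "Disabled"},
            "Destination": {"Bucket": "arn:aws:s3:::destination-bucket-1234"},
        }],
    },
)
```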
S3 Transfer Acceleration
S3 Transfer Acceleration uses the CloudFront edge network to accelerate your uploads to S3. Instead of uploading directly to your S3 bucket, you use a distinct URL to upload your files to an edge location, which then transfers the files to S3 over Amazon's enhanced network.
It enables fast, easy, and secure transfer of files over long distances between your clients and an S3 bucket. Transfer Acceleration takes advantage of Amazon CloudFront's globally distributed edge locations: data arrives at an edge location and is then routed to the S3 bucket over an optimized network path.
You get a distinct URL to upload your files, for example: bucket-name.s3-accelerate.amazonaws.com
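Enabling acceleration and using the distinct endpoint are both straightforward in boto3; a sketch with a hypothetical bucket and file:

```python
import boto3
from botocore.config import Config

s3 = boto3.client("s3")

# One-time setup: enable Transfer Acceleration on the bucket.
s3.put_bucket_accelerate_configuration(
    Bucket="my-example-bucket-1234",  # hypothetical bucket
    AccelerateConfiguration={"Status": "Enabled"},
)

# A client that routes requests through the accelerate endpoint
# (bucket-name.s3-accelerate.amazonaws.com) instead of the regional one.
s3_accel = boto3.client("s3", config=Config(s3={"use_accelerate_endpoint": True}))
s3_accel.upload_file("big-file.bin", "my-example-bucket-1234", "big-file.bin")
```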
How is S3 billed?
- For storage.
- For requests.
- For storage management.
- For data transfer.
- For transfer acceleration.
- For cross-region replication.