- Object storage: flat files
- Objects consists of:
- Key (name of object)
- Value (this is simply the data)
- Version ID (versioning)
- Metadata
- Subresources like access control lists and torrent
- Size: 0 to 5 TBytes
- S3 uses a universal namespace (i.e. unique name)
- Uloaded code get 200 HTTP response.
- A bucket is owned by the AWS account that created it. Bucket ownership is not transferable.
- By default, you can create up to 100 buckets in each of your AWS accounts (maximum is 1000 buckets).
- S3 URL styles are :
- Virtual hosted style:
https://my-bucket.s3.us-west-2.amazonaws.com/puppy.png (bucket-name + s3 + region )
- Path-style access URL **:
https://s3.us-west-2.amazonaws.com/mybucket/puppy.jpg (s3 / bucket-name )
-
Static web site URL: bucket name + s3-website +
-
Legacy global endpoint URL: (has no region)
s3.amazonaws.com
- ** Buckets created after September 30, 2020, will support only virtual hosted-style requests.
- Tiered storage available
- Lifecycle Management
- Versioning
- Encription
- MFA (Multi-Factor Authentication)
- Secure date using Access Control lists and Bucket Policies.
-
S3 standard: 99,99% availability
-
S3 IA (Infrequent Accessed): usecase: disater recovery and backups
-
S3 One Zone -IA: usecase: secondary backup
-
S3 Intelligent Tiering
- Types:
- Frequent access tier: deafult
- Infrequent access tier: objects not accessed for 30 days
- Archive instant access tier: objects not accessed for 90 days
- Archive access tier: configurable from 90 to 700+ days
- Deep archive access tier: configurable from 180 to 700+ days
- Types:
-
S3 Glacier
- Retrievals data options
-
Expedited: urgent request (1-5 minutes). With two options:
- On-demand:
- Provisioned
-
Standard: access archives within 3-5 hours.
-
Bulk retrials: large amount, even petabytes, in a day (5-12 hours)
-
S3 Glacier Types
- Glacier Instant Retrieval: min storage is 90 days
- Glacier Flexible Retrieval:
- Subtypes: Expedited (1-5 minutes), standard ()3-5 hours and Bulk (5-12 hours)
- Glacier Deep Archive
- Subtypes: standard (12 hours) Bulk (48 hours).
- Minimum storage duration of 180 days.
-
- Retrievals data options
S3 types | S3 Standard | S3 Intelligent Tiering | S3 Standard IA | S3 One Zone - IA | S3 Glacier | S3 Glacier Deep Archive |
---|---|---|---|---|---|---|
Durability | 11 9% | 11 9% | 11 9% | 11 9% | 11 9% | 11 9% |
Availability | 99.99% | 99.9% | 99.9% | 99.5% | N/A | N/A |
Availability Zones | >= 3 | >= 3 | >= 3 | 1 | >= 3 | >= 3 |
Minimum storage duration | N/A | 30 days | 30 days | 30 days | 90 days | 180 days |
Comments | Designed to sustain the loss of 2 facilities concurrently | Lower fee than S3 standard but you are charged a retrieval fee. | not used for data resilience as it uses only one AZ | Retrieval times configurable from minutes to hours. | retrieval time of 12 hours is acceptable. |
You can protect your data in transit using:
- SSL/TLS: Secure Socket Layer / Transport Layer Security
- Client side encryption
You can protect your data at rest using:
-
Server side encription:
- S3-Managed Keys (SSE-S3).
- Each object encripted by a unique key.
- As additional safeguard, it encrypts the key itself with a master key that it regularly rotates.
- Uses 256-bit advanced encryption standard (AES-256)
- Customer master keys (CMKs) stored in AWS KMS-Key Management Service (SSE-KMS):
- Add addional separed permissions for use of a CMK.
- Provide audit trail which show when your CMK was used.
- Customer-Provided Keys (SSE-C):
- User manages the encryption keys and S3 manages the encryption, as it writes to disks and decryption when you access your objects.
- S3 does not store the encryption key you provided, only a randomly salted HMAC value of the encryption key to validate future requests.
- For presigned URLs, specify the algorithm using the
x-amz-server-side-encryption-customer-algorithm
request header
- S3-Managed Keys (SSE-S3).
-
Client-side encryption options
- KMS managed encryption keys (CSE-KMS)
- Custome managed master ecryption keys (CSE-C)
- uses strong multi-factor encryption
- encrypts each object with a unique key. As an additional safeguard, it encrypts the key itself with a master key that it rotates regularly
- Uses 256-bit Advanced Encryption Standard (AES-256)
Picture from TutorialDoJo
Pay for what you use:
- GBs per month
- Transfer out of region
- PUT, COPY, POST, LIST and GET requests
Free of charge
- Transfer in to S3
- Transfer out from S3 to CloudFront
Usecases:
- disaster recovery
- backup & restore
- Tiered storage
- On-premise cache & low-latency files access
-
File gateway (NFS & SMB):
- flat files stored directly on S3
- store and retrieve files directly using the NFS version 3 or 4.1 protocol.
- store and retrieve files directly using the SMB file system version, 2 and 3 protocol.
- access the data directly in S3 from any AWS Cloud application or service.
- manage S3 data using lifecycle policies, cross-region replication, and versioning.
- Allows mounting a single folder containing 1,000 TB of data.
-
Volume gateway (iSCSI)
- Stored Volumes:
- data stored on site and assynchronous backed up to S3.
- Gateway-stored volume configuration provides durable and inexpensive off-site backups that you can recover to your local data center or Amazon EC2. For example, if you need replacement capacity for disaster recovery, you can recover the backups to Amazon EC2
- supports iSCSI protocol.
- Gateway-stored volume supports volumes of 512 TB size.
- Cached Volumes:
- data is stored on S3 and the most frequently accessed data is cached on site.
- All gateway-cached volume data and snapshot data is stored in Amazon S3 encrypted at rest using server-side encryption (SSE) and it cannot be accessed with S3 API or any other tools
- Gateway-Cached volumes can support volumes of 1,024TB in size
- Stored Volumes:
- AWS Storage Gateway backup the data in Amazon Storage as incremental EBS snapshots
- AWS Storage Gateway, by default, uploads data using SSL and provides data encryption at rest when stored in S3 or Glacier using AES-256
The Multipart upload API enables you to upload large objects in parts.
Multipart uploading is a three-step process: You initiate the upload, you upload the object parts, and after you have uploaded all the parts, you complete the multipart upload. Upon receiving the complete multipart upload request, Amazon S3 constructs the object from the uploaded parts, and you can then access the object just as you would any other object in your bucket.
As a best practice, we recommend you configure a lifecycle rule (using the AbortIncompleteMultipartUpload action) to minimize your storage costs.
A web browser with block the execution of a script that originates from a server with a different domain name than the webpage. Amazon S3 can be configured with CORS to send HTTP headers that allow the script execution. CORSR defines a way for client web apps that are loaded in one doman to inretact with resources in a different domain.
CORS is configured with a XML document with rules that identify the origns that you will allow to access your bucket, the operations (HTTP methds) that will support for all origin and other operation-specific info.
<CORSConfiguration>
<CORSRule>
<AllowedOrigin>http://www.example1.com</AllowedOrigin>
<AllowedMethod>PUT</AllowedMethod>
<AllowedMethod>POST</AllowedMethod>
<AllowedMethod>DELETE</AllowedMethod>
<AllowedHeader>*</AllowedHeader>
</CORSRule>
<CORSRule>
<AllowedOrigin>http://www.example2.com</AllowedOrigin>
<AllowedMethod>PUT</AllowedMethod>
<AllowedMethod>POST</AllowedMethod>
<AllowedMethod>DELETE</AllowedMethod>
<AllowedHeader>*</AllowedHeader>
</CORSRule>
<CORSRule>
<AllowedOrigin>*</AllowedOrigin>
<AllowedMethod>GET</AllowedMethod>
</CORSRule>
</CORSConfiguration>
Transfer Acceleration takes advantage of Amazon CloudFront’s globally distributed edge locations. As the data arrives at an edge location, data is routed to Amazon S3 over an optimized network path.
The presigned URLs are useful if you want your user/customer to be able to upload a specific object to your bucket, but you don't require them to have AWS security credentials or permissions. When you create a presigned URL, you must provide your security credentials and then specify a bucket name, an object key, an HTTP method (PUT for uploading objects), and an expiration date and time. The presigned URLs are valid only for the specified duration.
Cross-region replication (CRR) allows asynchronous copying of objects accross buckets in different AWS regions. CRR is enabled within a bucket level configuration.
Buckets that are configured for bucket replication can be owned by same or different AWS accounts.
Types of object replication, and when to use: - CRR: cross region replication - Meet complicance requirements - Minimize latency - SRR: single region replication - Aggregate logs into a single bucket - configure live replication between production and testing accounts.
With S3 object lock you can stored objects using write-once-read-many (WORM) model. You can use it to prevent an object from being deleted or overwritten for a fixed amount of time or indefinitely. It provided two retention modes:
- governance mode: users cannot overwrite or delete an object version or alter its lock settings unless they have special permissions. To override or remove governance model retentionl, a user must have s3:BypassGovernanceRetention.
- compliance mode: protected object version cannot be overwritten or deleted by any user, including the root user in your AWS account.
Object Lock also enables you to place a legal hold on an object version. Like a retention period, a legal hold prevents an object version from being overwritten or deleted. However, a legal hold doesn't have an associated retention period and remains in effect until removed. Legal holds can be freely placed and removed by any user who has the s3:PutObjectLegalHold permission.
- publish notifications for the following events
- Object created, removed, restored, replicated
- S3 lifecycle expiration, object tagging
- event types supported are SQS, SNS, Lambda and EventBridge
- Global endpoint that span S3 bbuckets in multiple AWS regions
- Dynamically route requests to the nearest S3 bucket (lowest latency)
- bi-direction bucket replication
- failover controls: active-active or active-passive
- Uses AWS lambda function to change an object before it is retrieved by the application (e.g. analytics application).
- Example (see picture below): Analytic app request the redacted object from S3 object lambda access point, which call a "redaction lambda function". The reacted lambda function access the S3 bucket using the S3 access point.
- Usecases:
- Redacting PII
- Converting accross data formats
- Resiizing and watermarking images.
S3 Pre-signed URLs vs CloudFront Signed URLs vs Origin Access Identity (OAI) vs Origin Access Control (OAC)
-
All S3 buckets and objects by default are private.
-
Pre-signed URLs use the owner´s security credentials to grant other time-limited (1 hours) permission to download or upload objects.
-
Steps: security credential, S3 bucket name, object key, specific HTTP method, expiration date and time of the URL
-
Usecases:
- allow only logged-in users to download a premium video on your S3 bucket
- allow ever changing list of users to download files by generating URLs dynamically
- allow temporary a user to upload a file to a bucket location.
- Control user access to private content through:
- restric access to files in cloudfront edge caches
- restric access to files in your S3 bucket
- Cloudfront can require users to access files using ether signed URLs or signed cookies.
- signed URLs or signed cookies can be used for any origin such as S3 or HTTP server.
- S3 bucket can be configure as origin of a cloudfront distribution. S3 files can only be access through cloudfront URL.
- Steps: create cloudfront user called origin access identity. Give OAI permission to read files in S3. Remove permission from anyone else.
- You cannot set OAI if S3 is configured as a website endpoint.
- OAC is more preferable was to restrict S3 origin.
- Enable secure access to S3 files through cloudfront by permitting only designated cloudfront distribution to access their S3.
- AWS signature version 4 (SigV4) can be enabled on cloudfront requests to S3 buckets.
- Server-side encryption with AWS KMS (SSE-KMS) can also be enabled when performing uploads and downloads through cloud front distribution.
S3 Access types:
-
User based: IAM policies
-
Resource based:
-
allow cross account access
-
Object Access Control list (ACL) - finer grain
-
Bucket Access Control List (ACL) - less grain.
-
S3 Bucket Policies:
- Grant access to another account
- Conditions:
- SourceIP: public IP or Elastic IP | VpcSourceIP: private IP (through VPC endpoints)
- SourceVPC or SourceVPCENdpoint - only works with VPC endpoints
- CloudFront Origin Identity
- MFA