Amazon S3: The Definitive Guide to AWS Object Storage

What Is Amazon S3?

Ultimately, every organization generates data — and Amazon S3 has become the default place to store it. Indeed, customer transactions, application logs, product images, machine learning datasets, compliance records, video assets — the list grows faster than most teams can manage. Consequently, storing that data reliably, securing it against breaches, accessing it at millisecond speeds, and keeping costs under control is a challenge that has consumed IT budgets for decades.

Amazon S3 (Amazon Simple Storage Service) is a fully managed object storage service built by Amazon Web Services to solve exactly this problem — at any scale. Specifically, it lets you store and retrieve any amount of data, at any time, from anywhere on the internet. Whether you are backing up a few gigabytes of documents, hosting a static website, building a multi-petabyte data lake for analytics, or training a large language model on terabytes of text, Amazon S3 scales with you. Furthermore, there are no servers to provision, no disks to manage, no capacity to plan.

Launched on March 14, 2006 — Pi Day, for the trivia-inclined — Amazon S3 was the very first AWS service made generally available to the public. In fact, it predates EC2 by five months. Since then, in the nearly two decades since, it has grown from a simple storage API into the foundational data layer of modern cloud architecture. Notably, nearly every significant AWS service integrates with S3 natively, and the S3 API has become a de facto industry standard that even competing cloud providers and third-party storage vendors support.

500T+ objects

Stored in Amazon S3

200M req/sec

Requests Handled

11 Nines

Durability (99.999999999%)

Amazon S3 by the Numbers

Importantly, those numbers are not marketing flourishes. As of 2025, Amazon S3 stores over 500 trillion objects across hundreds of exabytes, handles 200 million requests per second, and peaks at approximately 1 petabyte per second in bandwidth. To illustrate, if each stored object if each stored object were a grain of sand, you would have enough to fill over 1,600 Olympic swimming pools.

The enterprises that depend on S3 read like a list of the world’s most demanding technology users. For instance, Netflix stores over 2 exabytes of streaming content on S3. Similarly, Pinterest manages nearly 1 exabyte across 300 billion+ Pins. Likewise, Reddit, Airbnb, Monzo Bank, and thousands of government agencies all run their data layer on S3. According to Statista, Amazon S3 held a 22.98% share of the global enterprise data storage software market in 2024 — more than its next two competitors combined. A separate analysis by 6sense found over 1.189 million companies using S3 globally in 2026.

Within the AWS ecosystem, S3 occupies a gravitational position. Specifically, it integrates natively with over 100 AWS services — from compute services like Amazon EC2 and AWS Lambda, to analytics engines like Amazon Athena, EMR, and Glue, to machine learning platforms like SageMaker and Bedrock. Therefore, if you are building on AWS, you are almost certainly using S3. Clearly, understanding it deeply is not optional — it is a prerequisite.

Key Takeaway

Amazon S3 is not just a storage service — it is the foundational data layer of modern cloud architecture. Understanding its architecture, storage classes, pricing model, and security controls is what separates cloud practitioners from cloud experts.

How Amazon S3 Works

Before you can use Amazon S3 effectively, you need to understand its underlying architecture. Unlike traditional file systems that organize data in directories and hierarchies, S3 uses a flat object storage model. Importantly, this is not just a technical distinction — it fundamentally affects how you design applications, manage access control, optimize performance, and control costs.

Objects, Buckets, and Keys

Essentially, Amazon S3 organizes data into three fundamental components that work together:

Objects: The actual data you store — a file plus its metadata. For instance, an object can be anything: a JPEG image, a CSV dataset, a 4K video file, a database backup, a Parquet file for analytics, or a trained machine learning model. Each object consists of the data itself, a set of system-defined metadata (content type, creation date, storage class), and optional user-defined metadata (custom key-value pairs you attach for application logic). Notably, individual objects can now be up to 50 TB in size — a 10x increase from the previous 5 TB limit, announced at AWS re:Invent 2025.
Buckets: Containers that hold objects. Importantly, every object in S3 lives inside a bucket. Essentially, think of a bucket as a top-level namespace or a root folder. Furthermore, bucket names are globally unique across all AWS accounts — no two buckets anywhere in the world can share the same name. Importantly, each bucket is created in a specific AWS Region, and there is no limit to the number of objects a bucket can hold. You can have up to 100 buckets per AWS account by default.
Keys: The unique identifier for each object within a bucket. For example, a key like reports/2026/april/sales.csv looks like a folder path, but S3 has no true directory hierarchy. It is a flat namespace. The forward slashes in the key are simply part of the string, and the AWS Management Console renders them as a folder-like structure purely for convenience. Importantly, understanding this distinction matters because operations like “listing all files in a folder” are actually prefix-filtered list operations on the entire bucket — which has performance and cost implications at scale.

Object Addressing and URLs

Every object in S3 is uniquely addressable via a URL structured as: https://bucket-name.s3.region.amazonaws.com/key. As a result, this makes S3 inherently web-accessible — a property that powers everything from static website hosting to API-driven data pipelines to content delivery networks.

Regions and Availability Zones

When you create a bucket, you choose an AWS Region — a physical geographic area where AWS operates data centers. Examples include us-east-1 (Northern Virginia), eu-west-1 (Ireland), ap-south-1 (Mumbai), and me-central-1 (UAE). As of April 2026, AWS operates 37 Regions globally with 117 Availability Zones.

Within your chosen Region, S3 automatically replicates your data across a minimum of three Availability Zones (AZs). Essentially, each AZ is a physically separate data center — or cluster of data centers — with independent power supplies, cooling systems, and networking infrastructure. Although they are connected to each other via low-latency, high-bandwidth private fiber, they are far enough apart geographically to ensure that a localized event (a fire, a flood, a power grid failure) in one AZ does not affect the others.

This multi-AZ replication is what delivers S3’s legendary eleven nines (99.999999999%) durability. In practical terms, if you store 10 million objects in S3, you can statistically expect to lose a single object once every 10,000 years. Your data is not going anywhere.

Region Selection Matters More Than You Think

Choose the Region closest to your end users or your primary compute resources to minimize latency and data transfer costs. Equally important: data stored in a Region never leaves that Region unless you explicitly replicate it elsewhere. This is critical for compliance with data residency regulations such as the EU’s GDPR, Singapore’s PDPA, India’s DPDP Act, or the UAE’s data protection laws. Choosing the wrong Region can create compliance violations that are expensive to remediate.

Data Consistency Model

As of December 2020, Amazon S3 delivers strong read-after-write consistency for all operations — including PUTs of new objects, overwrite PUTs, and DELETEs. This means that after a successful write operation completes, any subsequent read request will immediately return the latest version of the object. There is no lag, no propagation delay, and no risk of reading stale data.

This was a landmark architectural change. Previously, in S3’s first 14 years, overwrite PUTs and DELETEs were only eventually consistent, meaning that for a brief window after an overwrite or delete, some read requests might still return the old version. Consequently, this behavior forced application developers to build complex workarounds — Netflix, for example, built a tool called S3mper that stored filesystem metadata in DynamoDB specifically to compensate for S3’s eventual consistency.

As a result, the move to strong consistency eliminated an entire class of bugs and race conditions. Moreover, it made S3 suitable for workloads that previously required block storage or database-backed solutions: version-controlled document systems, metadata catalogs, configuration stores, and even lightweight transactional patterns. If you read old articles or documentation warning about S3 eventual consistency, that information is now outdated.

How Data Flows Through S3

When you upload an object to S3, the following happens behind the scenes. First, your application sends an HTTP PUT request to the S3 endpoint. Next, S3 receives the object, calculates checksums for integrity verification, encrypts it (using your configured encryption method), and replicates it across a minimum of three Availability Zones within the Region. Finally, once replication is complete and durability is guaranteed, S3 returns an HTTP 200 response confirming a successful upload.

When you read that object, S3 routes the GET request to the nearest available copy, decrypts it, verifies the checksum, and streams the data back to your application. Additionally, for large objects, S3 supports byte-range fetches — you can request specific byte ranges of an object rather than downloading the entire file, which is particularly invaluable for video streaming, database page retrieval, and parallel data processing.

Amazon S3 Storage Classes Explained

Obviously, not all data is accessed equally. A product image on your e-commerce website gets requested thousands of times per day. Meanwhile, a quarterly compliance report might be read once a year. That seven-year-old audit log sitting in your account? It may never be accessed again. Treating all of this data the same way — storing it all in the same storage tier at the same price — is a costly mistake that inflates your AWS bill unnecessarily.

Amazon S3 recognizes this reality and offers nine distinct storage classes, each optimized for a specific access pattern, performance requirement, and cost profile. Undoubtedly, choosing the right storage class is one of the highest-impact decisions you will make with S3. In fact, according to industry analyses, it can mean the difference between S3 representing 2% or 15% of your total AWS bill for the exact same dataset.

Standard Storage Class

Essentially, this is the default storage class. Designed for data you access frequently and need available with low-latency, high-throughput performance. This is where your active application assets, user-facing content, real-time datasets, and CI/CD artifacts should live.

Currently, pricing starts at $0.023/GB-month for the first 50 TB in us-east-1, with volume discounts kicking in at the 50 TB and 500 TB thresholds. Durability is 99.999999999% (eleven nines) and availability is 99.99%. Furthermore, S3 Standard stores data redundantly across a minimum of three Availability Zones.

Most teams default everything to S3 Standard because it is the path of least resistance. While that works fine for small datasets, once your storage grows past a few terabytes, the cost difference between Standard and the right alternative storage class becomes significant — often amounting to thousands of dollars per month.

Intelligent-Tiering Storage Class

The “set it and forget it” storage class. If you have data with unpredictable or changing access patterns and do not want to manage lifecycle policies manually, Intelligent-Tiering is the answer.

Specifically, it automatically moves objects between multiple access tiers based on actual usage patterns — with zero retrieval fees and no operational overhead. The tiering works across five levels. Initially, objects start in the Frequent Access tier. After 30 consecutive days without access, they move to Infrequent Access (saving ~40%). By 90 days, they move to Archive Instant Access (saving ~68%). You can optionally enable asynchronous Archive Access (90–730 days) and Deep Archive Access (180+ days) tiers for additional savings.

The only additional cost is a small monitoring and automation fee of $0.0025 per 1,000 objects per month. There are no retrieval fees when objects move back to the Frequent Access tier — consequently, this is a major advantage over Standard-IA, where every retrieval incurs a per-GB charge.

According to AWS’s March 2026 twentieth-anniversary post, S3 Intelligent-Tiering has collectively saved customers over $6 billion compared to what they would have spent on S3 Standard. That is not a theoretical projection — it is measured savings across real customer workloads.

Express One Zone Storage Class

Next up is the performance tier, purpose-built for the most latency-sensitive workloads in your environment. Express One Zone stores data in a single Availability Zone that you select, delivering consistent single-digit millisecond latency — up to 10x faster than S3 Standard — and 50% lower request costs.

Unlike standard S3 buckets, Express One Zone uses a different infrastructure called “directory buckets” with modified APIs optimized for high-throughput, low-latency operations. Additionally, the pricing structure is also different: storage costs approximately $0.16/GB-month (significantly higher than Standard), but request costs run roughly 50% cheaper per operation.

Express One Zone was designed for specific high-performance workloads. For instance, Amazon Athena queries against Express-backed buckets complete 2–3x faster than equivalent queries against Standard S3. Similarly, SageMaker training jobs experience 40–60% reduced training times when reading from Express-colocated storage due to eliminated I/O wait. AWS Glue and EMR jobs show similar improvements for iterative data processing.

Single-AZ Risk

Express One Zone stores data in only one Availability Zone. If that AZ experiences an outage, your data is temporarily unavailable. Use this class for performance-critical intermediate data, ML training datasets that can be regenerated, and analytics scratch space — not for your only copy of mission-critical records. Always keep a durable copy of source data in a multi-AZ storage class.

Infrequent Access Storage Classes

Standard-Infrequent Access (Standard-IA) is designed for data that you do not access regularly but need available with rapid performance when you do. Think disaster recovery backups, older application logs, or previous-quarter financial data. It offers the same low-latency, high-throughput performance as S3 Standard but at a significantly lower per-GB storage cost of approximately $0.0125/GB-month — a 46% saving over Standard.

However, the trade-off is a per-GB retrieval fee ($0.01/GB) and a minimum storage duration charge of 30 days. If you delete or transition an object before 30 days, you still pay for the full 30 days. There is also a minimum object size charge of 128 KB — objects smaller than 128 KB are billed as if they were 128 KB.

One Zone-IA correspondingly costs approximately 20% less than Standard-IA by storing data in a single Availability Zone instead of three. This makes it suitable for easily reproducible data such as thumbnail images that can be regenerated from originals, secondary backup copies where the primary copy exists elsewhere, or cross-region replicated data where the source Region holds the authoritative copy.

Glacier Archive Classes: Instant Retrieval, Flexible Retrieval, and Deep Archive

Beyond the IA classes, the Glacier family is designed for long-term archival storage at the lowest possible cost. These are the storage classes you use for data that must be retained — for compliance, legal, or business reasons — but is rarely or never accessed.

First, Glacier Instant Retrieval: Archive storage with millisecond access. Ideal for data you access approximately once per quarter but need available immediately when you do — medical imaging archives, news media asset libraries, or genomics research data. Storage costs approximately $0.004/GB-month, an 82% saving over Standard. Same retrieval latency as S3 Standard, but with higher per-GB retrieval fees.
Next, Glacier Flexible Retrieval: For data that does not need immediate access. Offers three retrieval speed options: Expedited (1–5 minutes, costs ~$0.03/GB), Standard (3–5 hours, costs ~$0.01/GB), and Bulk (5–12 hours, costs ~$0.0025/GB). Storage costs approximately $0.0036/GB-month. This is the sweet spot for compliance archives, historical datasets, and digital preservation workflows where retrieval time is flexible.
Finally, Glacier Deep Archive: The cheapest storage class in all of AWS at approximately $0.00099/GB-month — less than one-tenth of a cent per gigabyte. Retrieval takes 12–48 hours. Designed for data you must retain for years but may never access again: seven-year financial records for SEC compliance, legal hold documents, long-term scientific data archives. Companies like Nasdaq use Glacier Deep Archive for regulatory data retention.

How to Choose the Right Storage Class

To help you decide, the table below compares all nine storage classes across the dimensions that matter most for production decisions:

Storage Class	Access Pattern	Retrieval Latency	Durability	AZs	Storage Cost (GB/mo)	Retrieval Fee	Best For
S3 Standard	Frequent	Milliseconds	11 nines	≥ 3	$0.023	None	Active application data, websites
Intelligent-Tiering	Changing/Unknown	Milliseconds	11 nines	≥ 3	$0.023 (auto-tiered)	None	Unpredictable access patterns
Express One Zone	Ultra-frequent	Single-digit ms	11 nines	1	$0.16	None	ML training, real-time analytics
Standard-IA	Infrequent	Milliseconds	11 nines	≥ 3	$0.0125	$0.01/GB	Backups, DR copies
One Zone-IA	Infrequent	Milliseconds	11 nines	1	$0.01	$0.01/GB	Reproducible infrequent data
Glacier Instant	Quarterly	Milliseconds	11 nines	≥ 3	$0.004	$0.03/GB	Medical imaging, media archives
Glacier Flexible	1–2x per year	Minutes to hours	11 nines	≥ 3	$0.0036	$0.01/GB (Std)	Compliance archives
Glacier Deep Archive	Rarely/Never	12–48 hours	11 nines	≥ 3	$0.00099	$0.02/GB	Long-term regulatory retention

Practitioner’s Decision Framework

Unsure about your access patterns? Start with S3 Intelligent-Tiering. It automatically optimizes costs without retrieval fees and eliminates the risk of choosing the wrong class. For known, stable access patterns, select the class that matches your workload. With data that has clear lifecycle stages (active → infrequent → archive), use lifecycle policies to automate transitions between Standard → IA → Glacier based on object age.

Need Expert Guidance?

Let our AWS-certified architects design your S3 storage strategy

Amazon S3 Pricing Breakdown

On the surface, S3 pricing looks deceptively simple on the surface. However, it is not. The service charges across six independent dimensions simultaneously, and most teams only think about the first one — storage. Meanwhile, the other five quietly accumulate in the background until someone notices the bill is 3x higher than expected.

Therefore, understanding all six dimensions is what separates a manageable S3 bill from a shocking one. Let us break them down.

Storage Costs

Specifically, storage is billed per GB-month based on your selected storage class. Rates range from $0.00099/GB-month for Glacier Deep Archive to $0.16/GB-month for Express One Zone — a 160x spread between the cheapest and most expensive tiers. Currently, S3 Standard, the default, costs $0.023/GB-month for the first 50 TB in us-east-1, with tiered discounts at 50 TB ($0.022) and 500 TB ($0.021).

Importantly, storage billing calculates your daily average usage and bills monthly. In practice, 100 GB stored for 15 days costs approximately $0.0115 — not the full monthly rate. This nuance matters for data processing pipelines that temporarily stage large datasets, process them, and delete them within days.

Request and Data Retrieval Costs

Additionally, every API call to S3 incurs a request charge. For S3 Standard, PUT, COPY, POST, and LIST requests cost $0.005 per 1,000 requests. GET and SELECT requests cost $0.0004 per 1,000. Admittedly, these amounts seem negligible in isolation, but they compound quickly.

Consider, for example, a data pipeline that lists and reads 10 million objects daily. That translates to 10,000 LIST request batches ($0.05/day) plus 10 million GET requests ($4.00/day) — over $120/month just in request charges, before a single byte of storage is counted. Furthermore, for IA and Glacier classes, retrieving data incurs additional per-GB charges on top of request fees, making frequent access to these tiers more expensive than Standard.

Transfer and Egress Costs

Data transfer into S3 is always free. Likewise, data transfer between S3 and other AWS services in the same Region is also free. However, the cost hits when data leaves AWS — either out to the internet or across Regions.

Fortunately, the first 100 GB/month of internet-bound egress is free (aggregated across all AWS services). Beyond that, egress costs $0.09/GB for the next 10 TB, $0.085/GB for the next 40 TB, $0.07/GB for the next 100 TB, and $0.05/GB beyond 150 TB. To illustrate, for a company serving 10 TB/month of content directly from S3 to internet users, that is approximately $920/month in egress charges alone.

The CloudFront Trick Most Teams Miss

Data transfer from S3 to Amazon CloudFront in the same Region is completely free. If you serve content directly from S3 to the internet, you pay full egress pricing. Putting CloudFront in front of S3 eliminates S3 egress charges entirely — and CloudFront’s own egress pricing is often cheaper than direct S3 egress. For the 10 TB/month scenario above, routing through CloudFront eliminates the $920 S3 egress bill and replaces it with approximately $850 in CloudFront distribution costs — while also adding global edge caching that reduces origin requests by 80–95% and dramatically improves end-user latency.

Management and Analytics Costs

S3 also charges for optional management features. For example, S3 Inventory reports cost $0.0025 per million objects listed. Similarly, Storage Lens advanced metrics cost $0.20 per million objects monitored per month. Meanwhile, Object Tagging evaluations for lifecycle policies incur costs based on the number of tags evaluated. Individually small, these charges can add up for accounts managing billions of objects.

Versioning and Lifecycle Transition Costs

Enabling versioning is a critical data protection measure. However, it means that every overwrite creates a new version while the old version remains in storage — consequently doubling, tripling, or multiplying your storage footprint over time. Without lifecycle rules to expire non-current versions, versioning costs therefore grow silently month after month.

Similarly, lifecycle transitions incur request charges. For instance, transitioning 100 million objects from Standard to Glacier costs $500 in transition request fees alone ($0.05 per 10,000 lifecycle transition requests) — before you see a single dollar in storage savings. With datasets containing millions of small objects, the math needs to be done carefully to confirm that transitions actually save money.

Cost Optimization Strategies

Based on our experience optimizing S3 costs for enterprise clients, these are the highest-impact levers available:

Implement lifecycle policies aggressively: Automatically transition objects from Standard to IA after 30 days and to Glacier after 90 days. A well-designed lifecycle policy can reduce storage costs by 40–60% for datasets with natural access decay. Even a simple two-rule policy saves more than most teams expect.
Default to S3 Intelligent-Tiering: For datasets where access patterns are unpredictable or variable, Intelligent-Tiering eliminates guesswork. The $0.0025/1,000-object monitoring fee is a fraction of what misclassification costs.
Use CloudFront for all public content: Eliminate S3 egress charges entirely. This is one of the simplest, highest-ROI changes you can make.
Audit versioning regularly: Set lifecycle rules to expire non-current versions after 30–90 days. Configure abort rules for incomplete multipart uploads (which also consume storage indefinitely if not cleaned up).
Enable S3 Storage Lens: The free tier provides 28 days of usage and activity metrics across all your buckets with cost optimization recommendations. The advanced tier ($0.20/million objects/month) provides 15 months of historical data and deeper insights. For most organizations, the free tier is sufficient to identify the biggest optimization opportunities.
Likewise, use S3 Inventory instead of LIST operations: For buckets with millions of objects, using the S3 API to list contents is slow (LIST returns max 1,000 objects per call) and expensive. S3 Inventory generates a daily or weekly report in CSV, ORC, or Parquet format at a fraction of the cost.

Amazon S3 Security and Access Control

S3 security has come a long way since the early days of misconfigured public buckets making headlines on technology news sites. Over time, AWS has progressively tightened defaults, and as of 2026, every new bucket ships with Block Public Access enabled, server-side encryption (SSE-S3) turned on automatically, and ACLs disabled by default. Nevertheless, strong defaults only get you partway. A production-grade S3 security architecture still requires deliberate, layered controls.

IAM Policies and Bucket Policies

Essentially, access to S3 is governed through two complementary policy mechanisms that work together to determine effective permissions:

IAM Policies: Attached to IAM users, groups, or roles. They define what actions a principal (an identity) can perform across AWS services. Use IAM policies for identity-based access control — for example, granting a specific role the ability to read from any bucket in the production account, or allowing a CI/CD pipeline role to push artifacts to designated S3 paths.
Bucket Policies: Attached directly to the bucket resource. They define who can do what on that specific bucket. Use bucket policies for resource-based access control — for example, allowing cross-account access from an analytics account, restricting all access to requests originating from a specific VPC endpoint, or denying unencrypted uploads.

The effective permission for any request is the union of applicable policies, with one critical rule: an explicit deny always overrides any allow. In other words, if an IAM policy grants access but a bucket policy denies it (or vice versa), the deny wins. As a result, this makes it possible to create defense-in-depth security by layering restrictive bucket policies on top of broader IAM permissions.

For the most part, the recommended approach for enterprise setups is to use IAM roles (not users) with least-privilege policies for all programmatic access, and to supplement those with bucket policies for cross-account access, VPC endpoint restrictions, IP allowlisting, and encryption enforcement. Instead, avoid using bucket ACLs — they are a legacy mechanism that AWS actively recommends against.

Server-Side Encryption Options

Without a doubt, all data in S3 should be encrypted at rest. AWS offers three server-side encryption methods, and choosing between them depends on your compliance requirements and key management preferences:

SSE-S3 (S3-Managed Keys): AWS generates, manages, and rotates the encryption keys entirely on your behalf. This is the default for all new buckets. Zero additional cost, zero management overhead, and full transparency — each object is encrypted with a unique key, and that key is itself encrypted with a rotating root key. Suitable for the vast majority of workloads where compliance does not mandate customer-managed keys.
SSE-KMS (KMS-Managed Keys): Uses AWS Key Management Service for key management. This provides an audit trail of every key usage event via CloudTrail, supports automatic annual key rotation, and allows you to control who can decrypt objects independently from who can access them. Adds a cost of $0.03 per 10,000 requests for KMS API calls. Choose SSE-KMS when compliance frameworks (PCI DSS, HIPAA, SOC 2, FedRAMP) require auditable key management or separation of duties between storage administrators and key administrators.
SSE-C (Customer-Provided Keys): You provide your own encryption key with every PUT and GET request. AWS uses it to encrypt/decrypt the object but never stores the key itself. This gives you complete ownership of your encryption keys but adds significant operational complexity — you must manage key storage, rotation, and disaster recovery independently. If you lose the key, your data is unrecoverable.

Recent Encryption Default Changes

April 2026 Security Update

AWS is now deploying a new default bucket security setting that automatically disables SSE-C for all new general purpose buckets. For existing buckets in AWS accounts with no SSE-C encrypted objects, S3 will also disable SSE-C for new write requests. This change strengthens the default security posture by reducing the attack surface of customer-managed key mishandling. Accounts that actively use SSE-C are not affected.

Block Public Access and Access Points

Importantly, S3 Block Public Access is an account-level and bucket-level safety mechanism that overrides any policy or ACL that would otherwise grant public access. It operates as four independent settings that you can enable individually or together: BlockPublicAcls, IgnorePublicAcls, BlockPublicPolicy, and RestrictPublicBuckets. The recommended practice is to enable all four at the account level and only disable specific settings on specific buckets with documented justification.

Additionally, S3 Access Points simplify permission management for shared datasets. Instead of a single bucket policy that grows to hundreds of lines as more teams and applications need access, you create named access points — each with its own IAM-like policy and optional VPC restriction. For instance, an analytics team gets an access point with read-only permissions restricted to their VPC. Similarly, a data engineering team gets a separate access point with read-write access. Additionally, each access point has its own DNS name and can enforce its own network controls independently. This is particularly valuable for data lake architectures where dozens of consumers access the same underlying bucket.

S3 Object Lock and Compliance

Significantly, S3 Object Lock enables you to store objects using a Write-Once-Read-Many (WORM) model. Once enabled, objects cannot be deleted or overwritten for a specified retention period. This capability is mandatory for organizations in regulated industries.

Specifically, Object Lock operates in two modes. Governance mode allows users with specific IAM permissions to override the lock — useful for testing and operational flexibility. Compliance mode is irrevocable — nobody, including the AWS root account, can delete the object until the retention period expires. Financial services firms use Compliance mode to meet SEC Rule 17a-4, healthcare organizations use it for HIPAA-compliant record retention, and legal departments use it for litigation hold scenarios.

Furthermore, you can also apply Legal Hold independently of retention periods. A legal hold prevents deletion regardless of the retention configuration and stays active until explicitly removed. This is valuable for e-discovery processes where you need to preserve specific objects indefinitely while legal proceedings are ongoing.

Key Amazon S3 Features for Production Workloads

Beyond storage and security, Amazon S3 provides a rich feature set that transforms it from a simple file repository into an intelligent data platform. As a result, these capabilities enable event-driven architectures, automated data management, and performance optimization at scale. Below are the features that matter most in production environments.

Core Data Management Features

Versioning

Maintains every version of every object. Protects against accidental overwrites and deletions. When versioning is enabled, a DELETE operation does not physically remove the object — it places a “delete marker” that hides it, while all previous versions remain recoverable.

Lifecycle Policies

Rules that automate storage class transitions and object expiration based on age, version status, or tags. A single lifecycle rule can transition objects from Standard to IA at 30 days, to Glacier at 90 days, and delete them at 365 days — all without manual intervention.

Cross-Region Replication (CRR)

Automatically and asynchronously replicates objects to a bucket in a different AWS Region. Enables disaster recovery, data sovereignty compliance, and lower-latency access for geographically distributed users. Same-Region Replication (SRR) is also available for in-Region compliance copies.

Event Notifications

Trigger AWS Lambda functions, SQS queues, SNS topics, or EventBridge rules when objects are created, deleted, modified, or restored. This powers event-driven architectures — automatically process an uploaded image, index a new document, or trigger a data pipeline when new data lands.

S3 Select

Retrieve only a subset of data from an object using SQL expressions. Instead of downloading a 10 GB CSV to find 10 rows, S3 Select filters server-side and returns only the matching data — reducing data transfer by up to 400% and dramatically improving query performance for large files.

S3 Batch Operations

Perform bulk operations across millions or billions of objects with a single API request — copy, tag, encrypt, invoke Lambda, restore from Glacier, or modify ACLs. Eliminates the need to write custom orchestration scripts for large-scale data management tasks.

Transfer Acceleration

Notably, S3 Transfer Acceleration uses Amazon CloudFront’s globally distributed edge locations to speed up uploads from distant geographic locations. When enabled, data uploaded to S3 is first routed to the nearest CloudFront edge location, then transferred to your S3 bucket over AWS’s optimized backbone network. This can improve upload speeds by 50–500% for teams uploading from locations far from the bucket’s Region.

Transfer Acceleration is particularly valuable for media companies uploading video content from production locations worldwide, global development teams pushing build artifacts, and any workflow that involves uploading large files across continents. Although it costs an additional $0.04–$0.08 per GB transferred, it only charges if the accelerated path is actually faster — if the regular path would be faster, you pay standard transfer rates instead.

Multipart Upload

For objects larger than 100 MB, AWS recommends multipart upload. Above 5 GB, it is required. Multipart upload breaks the file into independently uploaded parts (each between 5 MB and 5 GB), uploads them in parallel across multiple connections, and assembles them into the final object in S3. This approach provides three key benefits: first, improved throughput through parallelism; second, pause-and-resume capability for interrupted uploads; and third, reduced impact of network failures (if one part fails, you only need to re-upload that part).

# Upload a large file using the AWS CLI (automatically uses multipart for large files)
aws s3 cp large-dataset.tar.gz s3://my-bucket/datasets/ \
  --storage-class INTELLIGENT_TIERING

# Upload with explicit multipart configuration
aws s3 cp huge-backup.tar s3://my-bucket/backups/ \
  --expected-size 53687091200 \
  --storage-class STANDARD_IA

Temporary Access with Pre-Signed URLs

Pre-signed URLs enable you to grant temporary, time-limited access to private S3 objects without sharing your AWS credentials or making the bucket public. Essentially, you generate a URL that includes authentication parameters and an expiration time. As a result, anyone with the URL can upload or download the specified object until the URL expires.

For example, common use cases include allowing users to download invoices from a private bucket via a web application, enabling file uploads from mobile apps without embedding AWS credentials in the client code, and sharing sensitive reports with external partners for a limited window.

# Generate a pre-signed URL valid for 1 hour (3600 seconds)
aws s3 presign s3://my-bucket/reports/q1-financials.pdf \
  --expires-in 3600

What’s New in Amazon S3 (2025–2026)

Undeniably, Amazon S3 continues to evolve at a pace that surprises even long-time AWS users. The 2025–2026 cycle brought some of the most significant changes since the service’s launch, reflecting AWS’s strategy of transforming S3 from a passive storage layer into a compute-adjacent data platform.

Late 2024

S3 Tables (GA)

Native Apache Iceberg table format support, enabling direct analytics queries on structured data in S3 without ETL pipelines. Table-level cost attribution became practical for FinOps teams.

Dec 2025

50 TB Object Size Limit

Maximum individual object size increased 10x — from 5 TB to 50 TB. This enables storing massive single-file datasets, scientific instrument outputs, and uncompressed video masters without splitting them across multiple objects.

Also Dec 2025

S3 Vectors

Native vector storage for AI workloads with sub-100ms query latency. Stores and queries vector embeddings directly in S3, simplifying RAG architectures and reducing the need for standalone vector databases.

Apr 2026

S3 Files (GA)

Native NFS access to S3 buckets with full POSIX semantics, powered by an EFS caching layer. Mount S3 as a file system, access data via NFS and S3 APIs simultaneously. Available in 34 AWS Regions.

Apr 2026

SSE-C Disabled by Default

New default security setting automatically disables customer-provided encryption keys (SSE-C) for new and eligible existing general purpose buckets. Reduces key mismanagement attack surface.

New Feature: Files — NFS Access to Buckets

This is arguably the most transformative S3 feature in years. For as long as S3 has existed, the answer to “can I mount S3 as a file system?” was “sort of, with third-party FUSE hacks that break under load.” However, that changed on April 7, 2026.

Essentially, S3 Files creates a fully managed NFS file system backed by an S3 bucket. Subsequently, applications mount S3 over NFS 4.1/4.2 and use standard file operations — open, read, write, rename, lock — while the underlying data remains in S3. Importantly, both the NFS mount and the S3 API can access the same data simultaneously. Notably, built on Amazon EFS infrastructure, S3 Files maintains a high-performance caching layer for actively used data, delivering sub-millisecond latency on small files while keeping cold data at S3 storage rates.

The implications are significant. First, teams that previously maintained separate file systems alongside S3 — duplicating data and building complex synchronization pipelines — can eliminate that entire layer. Additionally, AI agents can persist memory and share state across pipelines through a mounted file system. Furthermore, ML teams can run data preparation workloads directly on S3 data without staging files. Most importantly, file-based tools and legacy applications that could never work with object storage now work natively, with no code changes required.

New Feature: Vectors for AI Workloads

Launched at re:Invent 2025, S3 Vectors provides native vector storage for AI and machine learning applications. Instead of provisioning and managing a separate vector database (Pinecone, Weaviate, or a self-managed pgvector instance), you can now store and query vector embeddings directly in S3 with sub-100ms latency.

As a result, this simplifies Retrieval Augmented Generation (RAG) architectures by keeping embeddings alongside their source documents in the same storage layer. Moreover, it reduces operational overhead, eliminates a separate infrastructure component, and integrates natively with Amazon Bedrock and SageMaker. For organizations already running their data layer on S3, therefore, adding vector search without introducing a new system is a meaningful simplification.

Amazon S3 Use Cases

Without question, S3’s versatility is unmatched in the cloud storage landscape. Here are the most common production use cases we implement and optimize for our clients:

Data Lakes & Analytics

S3 is the default foundation for AWS data lakes. Store raw, semi-structured, and structured data in S3, query it with Athena, process it with Glue or EMR, model it with S3 Tables (Apache Iceberg), and visualize it with QuickSight — all without moving the data between storage systems.

Backup & Disaster Recovery

Multi-AZ durability, cross-region replication, versioning, and Object Lock combine to provide enterprise-grade DR. Glacier Deep Archive brings long-term archival costs below $1/TB-month. AWS Backup automates backup policies across S3 and other AWS services centrally.

Static Website Hosting

Host static HTML, CSS, JavaScript, and media assets directly from S3 — no web server to manage, patch, or scale. Pair with CloudFront for HTTPS, custom domains, global edge caching, and DDoS protection. Ideal for marketing sites, documentation portals, and single-page applications.

Media Storage & Streaming

Store, transcode, and deliver video, audio, and image assets at any scale. Netflix stores over 2 exabytes of content on S3. Combine with AWS Elemental MediaConvert for transcoding, CloudFront for delivery, and S3 event notifications to trigger processing pipelines automatically.

AI/ML Training Data

Express One Zone provides the low-latency, high-throughput access that ML training jobs demand. Vectors stores embeddings natively for RAG architectures. Files lets ML teams mount training data as a filesystem. Integrates directly with SageMaker, Bedrock, and EMR.

Application Artifacts & Logs

Store CI/CD build artifacts, container images, CloudTrail audit logs, VPC Flow Logs, and application telemetry. Use S3 event notifications to trigger Lambda-based log processing, and lifecycle policies to archive old logs to Glacier automatically.

Amazon S3 vs Azure Blob Storage

If you are evaluating cloud storage across providers — whether for a greenfield deployment, a multi-cloud strategy, or a competitive assessment — here is how Amazon S3 compares with Microsoft Azure’s closest equivalent, Azure Blob Storage:

Feature	Amazon S3	Azure Blob Storage
Storage Model	✓ Object Storage	✓ Object Storage (Blob)
Durability	Yes — 11 nines (99.999999999%)	Yes — Up to 16 nines (LRS/ZRS/GRS/RA-GRS)
Storage Tiers	Yes — 9 Classes	◐ 4 Tiers (Hot, Cool, Cold, Archive)
Automatic Tiering	Yes — Intelligent-Tiering (5 tiers, no retrieval fees)	Yes — Access Tracking + Lifecycle Management
NFS File Access	✓ S3 Files (GA Apr 2026)	Yes — NFS 3.0 via Data Lake Storage Gen2
Native Vector Storage	Yes — S3 Vectors (sub-100ms)	✕ Not available natively
Native Table Format	Yes — S3 Tables (Apache Iceberg)	✕ Requires external services
Max Object Size	✓ 50 TB	Yes — ~190.7 TB (block blob)
Consistency Model	✓ Strong read-after-write	Yes — Strong consistency
API Standard Adoption	Yes — S3 API is the industry standard	◐ Azure-specific REST API
Ecosystem Integration	✓ 100+ AWS services	Yes — Deep Microsoft/Azure integration
Market Share (Enterprise Storage)	✓ 22.98% (2024 leader)	◐ Growing but smaller share

Which One Should You Choose?

Both are excellent, mature object storage services. In most cases, the primary decision factor is your cloud ecosystem: if your infrastructure runs on AWS, S3 is the natural choice with the deepest integration. Conversely, if you are a Microsoft shop running on Azure, Blob Storage integrates seamlessly with Azure services and Microsoft 365. Additionally, for multi-cloud architectures, S3 has an additional advantage — its API has become the de facto industry standard, supported by tools like MinIO, Cloudian, and even Azure’s own S3-compatible gateways.

Where S3 currently leads is in storage class granularity (nine classes versus four tiers), the AI-native features launched in 2025–2026 (S3 Vectors, S3 Tables, S3 Files), and the sheer breadth of its integration ecosystem. On the other hand, Azure Blob Storage counters with deeper integration into the Microsoft enterprise stack (Active Directory, SharePoint, Teams) and higher maximum object sizes for block blobs.

Getting Started with Amazon S3

Surprisingly, you can have your first S3 bucket up and running in under five minutes. Here is a complete walkthrough using both the AWS Management Console and the AWS CLI.

Creating Your First Bucket (Console)

First, log in to the AWS Management Console and navigate to the S3 service. Select Create bucket. Then, enter a globally unique name — lowercase letters, numbers, and hyphens only, between 3 and 63 characters (for example, my-company-assets-2026). Subsequently, select the AWS Region closest to your users or compute resources: us-east-1 for North America, eu-west-1 for Europe, ap-south-1 for India. Importantly, leave Block all public access enabled (the default). Additionally, leave default encryption as SSE-S3. Click Create bucket.

Uploading an Object

Next, click your new bucket name, then click Upload. Simply drag and drop files from your computer or click Add files. Optionally, expand Properties to select a storage class (defaults to S3 Standard). Click Upload. At this point, your file is now durably stored across multiple Availability Zones and accessible via its unique S3 URL.

Configuring Permissions

Then, navigate to your bucket’s Permissions tab. First, verify that Block Public Access is fully enabled. Then, to grant access to specific IAM roles or accounts, click Bucket policy and add a JSON policy. Here is an example granting read and write access to a specific IAM role:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowAppRoleAccess",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::123456789012:role/MyAppRole"
      },
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::my-company-assets-2026",
        "arn:aws:s3:::my-company-assets-2026/*"
      ]
    }
  ]
}

Using the AWS CLI

Alternatively, the AWS CLI provides fast, scriptable access to S3 for automation and daily operations:

# Create a bucket in the Mumbai region
aws s3 mb s3://my-company-data-2026 --region ap-south-1

# Upload a single file
aws s3 cp report.pdf s3://my-company-data-2026/reports/

# Upload with a specific storage class
aws s3 cp archive.zip s3://my-company-data-2026/archives/ \
  --storage-class STANDARD_IA

# Sync an entire local directory to S3
aws s3 sync ./local-data/ s3://my-company-data-2026/backups/ \
  --exclude "*.tmp"

# List objects in a bucket
aws s3 ls s3://my-company-data-2026/reports/ --recursive

# Download a file
aws s3 cp s3://my-company-data-2026/reports/report.pdf ./downloads/

# Generate a pre-signed URL valid for 1 hour
aws s3 presign s3://my-company-data-2026/reports/report.pdf \
  --expires-in 3600

# Remove all objects with a specific prefix
aws s3 rm s3://my-company-data-2026/temp/ --recursive

Amazon S3 Best Practices

Based on our experience designing, deploying, and optimizing S3 architectures for dozens of enterprise clients across industries — from financial services and healthcare to e-commerce and media — these are the practices that consistently separate reliable, cost-effective setups from problematic ones.

Advantages

Virtually unlimited scalability — no capacity planning or provisioning required

Eleven nines durability with multi-AZ replication by default

Nine storage classes for granular cost optimization across any access pattern

Deep native integration with 100+ AWS services

Strong read-after-write consistency for all operations since December 2020

S3 API is the de facto industry standard adopted by third-party tools and competitors

AI-native features (S3 Vectors, S3 Tables, S3 Files) ahead of competitors

Limitations

Multi-dimensional pricing model can lead to surprise bills without monitoring

Not designed for low-latency transactional workloads (use EBS or EFS instead)

Egress costs accumulate fast for data-heavy public-facing applications

Small object overhead — minimum size charges in IA/Glacier classes penalize tiny files

Bucket names are globally unique and immutable — good names are permanently taken

Glacier retrieval times (hours to days) require planning for time-sensitive recoveries

Here are the core best practices every team should implement from day one:

Data Protection Best Practices

First, enable versioning on all production buckets. It protects against accidental overwrites and deletions. As a result, you can recover any previous version of an object instantly. Pair versioning with lifecycle rules to expire non-current versions after 30–90 days and abort incomplete multipart uploads after 7 days to prevent silent storage cost growth.
Additionally, implement lifecycle policies on every bucket. Even a simple two-rule policy — transition to Standard-IA after 30 days, transition to Glacier Flexible Retrieval after 90 days — can consequently reduce storage costs by 40–60% for datasets with natural access decay patterns.

Security Best Practices

Equally important, block public access at both the account level and bucket level. For content that needs to be publicly accessible, instead use Amazon CloudFront with an Origin Access Control (OAC) rather than making the bucket itself public. This approach maintains security while still enabling content delivery.
Furthermore, enable server access logging or CloudTrail data events. You need to know who accessed what, when, and from where — both for security incident response and for compliance auditing. Furthermore, store access logs in a separate dedicated bucket to prevent recursive logging.
Similarly, use VPC endpoints (Gateway endpoints) for S3 access from EC2. As a result, this keeps traffic on the AWS private network, eliminates the need for a NAT gateway or internet gateway, avoids data transfer charges, and moreover improves security by never routing S3 traffic over the public internet.
Lastly, enforce encryption in transit. Specifically, add a bucket policy condition that denies any request where aws:SecureTransport is false. This ensures all data in transit is encrypted via HTTPS and therefore prevents accidental unencrypted transfers.

Cost and Performance Optimization

Moreover, enable S3 Storage Lens. The free tier provides 28 days of usage and activity metrics with cost optimization recommendations across all your buckets and accounts. Consequently, it is the fastest way to identify your biggest cost optimization opportunities.
Use S3 Inventory instead of LIST API calls for large buckets. Listing millions of objects via the API is slow (1,000 objects per response, paginated) and expensive. In contrast, S3 Inventory generates a daily or weekly CSV/ORC/Parquet report of all objects with metadata at a fraction of the cost.
Finally, design key naming for performance. S3 automatically partitions data based on key prefixes for high request rates. Therefore, for write-heavy workloads, distribute writes across different prefixes rather than concentrating them under a single prefix path (for instance, use date-based or hash-based prefixes).

The Bottom Line on Amazon S3

Key Takeaway

Amazon S3 is deceptively simple to start with and deeply complex to master. The difference between a functional S3 setup and a production-grade architecture lies in storage class strategy, multi-layered security controls, lifecycle automation, cost monitoring, and performance optimization. Getting these right from day one saves months of rework and thousands of dollars in avoidable costs. This is exactly the kind of foundational work where having an experienced AWS partner pays for itself many times over.

Ready to Optimize Your Cloud Storage?

Let our AWS experts audit your S3 architecture, cut costs, and strengthen security

Frequently Asked Questions About Amazon S3

Common Questions Answered

What is Amazon S3 used for?

Amazon S3 is used for storing and retrieving any amount of data from anywhere on the internet. The most common use cases include building data lakes for analytics, backup and disaster recovery, static website hosting, media storage and streaming, AI and machine learning training data, and application log storage. Over 1.1 million companies globally rely on S3 as their primary cloud storage layer, including Netflix, Pinterest, Airbnb, and thousands of government agencies.

Is Amazon S3 free to use?

AWS offers a Free Tier that includes 5 GB of S3 Standard storage, 20,000 GET requests, and 2,000 PUT requests per month for the first 12 months after signing up. Beyond the Free Tier, S3 is a pay-as-you-go service with no upfront commitments. S3 Standard storage starts at $0.023 per GB per month in us-east-1, and lower-cost options are available through Infrequent Access (from $0.0125/GB-month) and Glacier storage classes (from $0.00099/GB-month).

What is the difference between Amazon S3 and EBS?

S3 is an object storage service designed for storing unstructured data (files, images, backups, datasets) accessible from anywhere via HTTP APIs. EBS (Elastic Block Store) is a block storage service that provides persistent virtual hard drives attached to a single EC2 instance — designed for databases, file systems, and operating systems. S3 scales infinitely with pay-per-use pricing, while EBS has provisioned capacity that you pay for whether used or not. Use S3 for data that needs to be shared, accessed over the network, or archived long-term. Choose EBS for data that requires low-latency disk I/O on a specific compute instance.

Storage and Pricing Questions

How many storage classes does Amazon S3 have?

Amazon S3 offers nine storage classes: S3 Standard, S3 Intelligent-Tiering, S3 Express One Zone, S3 Standard-IA (Infrequent Access), S3 One Zone-IA, S3 Glacier Instant Retrieval, S3 Glacier Flexible Retrieval, and S3 Glacier Deep Archive. Each is optimized for a different access pattern and cost profile, with storage costs ranging from $0.00099/GB-month (Deep Archive) to $0.16/GB-month (Express One Zone). Additionally, specialized storage types like S3 Tables (Apache Iceberg) and S3 Vectors (AI embeddings) provide purpose-built data access patterns.

How secure is Amazon S3?

Amazon S3 provides enterprise-grade security by default. As of 2026, all new buckets ship with Block Public Access enabled, server-side encryption (SSE-S3) turned on automatically, ACLs disabled, and SSE-C disabled by default. You can layer additional security using IAM policies, bucket policies, VPC endpoints for private access, KMS-managed encryption keys with audit trails, S3 Object Lock for WORM compliance, access logging via CloudTrail, and Access Points for fine-grained multi-tenant control. S3 is compliant with PCI DSS, HIPAA, FedRAMP, SOC 1/2/3, and ISO 27001.

What is S3 Intelligent-Tiering and when should I use it?

S3 Intelligent-Tiering is a storage class that automatically moves objects between up to five access tiers based on actual usage patterns — from Frequent Access down through Infrequent Access, Archive Instant, Archive, and Deep Archive tiers. Unlike other storage classes, there are no retrieval fees when objects move between tiers. The only additional cost is a monitoring fee of $0.0025 per 1,000 objects per month. Use Intelligent-Tiering when your data access patterns are unpredictable, variable, or unknown — it eliminates guesswork and manual lifecycle management. AWS reports it has saved customers over $6 billion collectively compared to S3 Standard pricing.

Features and Capabilities Questions

What is the maximum file size you can upload to Amazon S3?

As of December 2025 (announced at AWS re:Invent 2025), the maximum individual object size in Amazon S3 is 50 TB — a 10x increase from the previous 5 TB limit. A single PUT operation can upload up to 5 GB. For objects larger than 100 MB, AWS recommends multipart upload, which splits the file into parts (each between 5 MB and 5 GB), uploads them in parallel for faster transfer, and assembles them into the final object in S3. Multipart upload is required for objects over 5 GB and supports up to 10,000 parts per upload.

Can Amazon S3 host a website?

Yes. Amazon S3 can host static websites — HTML, CSS, JavaScript, images, and other client-side assets — directly from a bucket. You enable static website hosting in the bucket properties and configure an index document (e.g., index.html) and an error document (e.g., 404.html). For production use, pair S3 with Amazon CloudFront for HTTPS, custom domains via Route 53, and global edge caching. Note that S3 cannot host server-side applications — use EC2, ECS, Lambda, or App Runner for those.

What is new in Amazon S3 in 2026?

The most significant 2025–2026 updates include: S3 Files (GA April 2026), which provides native NFS access to S3 buckets with full POSIX semantics; S3 Vectors (December 2025), enabling native vector storage for AI workloads with sub-100ms query latency; the maximum object size increase from 5 TB to 50 TB (December 2025); S3 Tables for native Apache Iceberg analytics (December 2024); and a new default security setting disabling SSE-C on new buckets (April 2026). These changes collectively position S3 as a compute-adjacent data platform rather than just a passive storage layer.

Weekly Briefing

Security insights, delivered Tuesdays.

Join 1 million+ security professionals. Practical, vendor-neutral analysis of threats, tools, and architecture decisions.