What Is Amazon S3?
Every organization generates data — and Amazon S3 has become the default place to store it. Customer transactions, application logs, product images, machine learning datasets, compliance records, video assets: the list grows faster than most teams can manage. Storing that data reliably, securing it against breaches, accessing it at millisecond speeds, and keeping costs under control is a challenge that has consumed IT budgets for decades.
Amazon S3 (Amazon Simple Storage Service) is a fully managed object storage service built by Amazon Web Services to solve exactly this problem — at any scale. Specifically, it lets you store and retrieve any amount of data, at any time, from anywhere on the internet. Whether you are backing up a few gigabytes of documents, hosting a static website, building a multi-petabyte data lake for analytics, or training a large language model on terabytes of text, Amazon S3 scales with you. Furthermore, there are no servers to provision, no disks to manage, no capacity to plan.
Launched on March 14, 2006 — Pi Day, for the trivia-inclined — Amazon S3 was the very first AWS service made generally available to the public. In fact, it predates EC2 by five months. In the nearly two decades since, it has grown from a simple storage API into the foundational data layer of modern cloud architecture. Notably, nearly every significant AWS service integrates with S3 natively, and the S3 API has become a de facto industry standard that even competing cloud providers and third-party storage vendors support.
Amazon S3 by the Numbers
As of 2025, Amazon S3 stores over 500 trillion objects across hundreds of exabytes, handles 200 million requests per second, and peaks at approximately 1 petabyte per second in bandwidth. Those numbers are not marketing flourishes. To illustrate: if each stored object were a grain of sand, you would have enough to fill over 1,600 Olympic swimming pools.
The enterprises that depend on S3 read like a list of the world’s most demanding technology users. For instance, Netflix stores over 2 exabytes of streaming content on S3. Similarly, Pinterest manages nearly 1 exabyte across 300 billion+ Pins. Likewise, Reddit, Airbnb, Monzo Bank, and thousands of government agencies all run their data layer on S3. According to Statista, Amazon S3 held a 22.98% share of the global enterprise data storage software market in 2024 — more than its next two competitors combined. A separate analysis by 6sense found over 1.189 million companies using S3 globally in 2026.
Within the AWS ecosystem, S3 occupies a gravitational position. Specifically, it integrates natively with over 100 AWS services — from compute services like Amazon EC2 and AWS Lambda, to analytics engines like Amazon Athena, EMR, and Glue, to machine learning platforms like SageMaker and Bedrock. Therefore, if you are building on AWS, you are almost certainly using S3. Clearly, understanding it deeply is not optional — it is a prerequisite.
Amazon S3 is not just a storage service — it is the foundational data layer of modern cloud architecture. Understanding its architecture, storage classes, pricing model, and security controls is what separates cloud practitioners from cloud experts.
How Amazon S3 Works
Before you can use Amazon S3 effectively, you need to understand its underlying architecture. Unlike traditional file systems that organize data in directories and hierarchies, S3 uses a flat object storage model. Importantly, this is not just a technical distinction — it fundamentally affects how you design applications, manage access control, optimize performance, and control costs.
Objects, Buckets, and Keys
Essentially, Amazon S3 organizes data into three fundamental components that work together:
- Objects: The actual data you store — a file plus its metadata. For instance, an object can be anything: a JPEG image, a CSV dataset, a 4K video file, a database backup, a Parquet file for analytics, or a trained machine learning model. Each object consists of the data itself, a set of system-defined metadata (content type, creation date, storage class), and optional user-defined metadata (custom key-value pairs you attach for application logic). Notably, individual objects can now be up to 50 TB in size — a 10x increase from the previous 5 TB limit, announced at AWS re:Invent 2025.
- Buckets: Containers that hold objects. Every object in S3 lives inside a bucket; think of a bucket as a top-level namespace or a root folder. Bucket names are globally unique across all AWS accounts — no two buckets anywhere in the world can share the same name. Each bucket is created in a specific AWS Region, and there is no limit to the number of objects a bucket can hold. You can have up to 100 buckets per AWS account by default.
- Keys: The unique identifier for each object within a bucket. For example, a key like reports/2026/april/sales.csv looks like a folder path, but S3 has no true directory hierarchy. It is a flat namespace. The forward slashes in the key are simply part of the string, and the AWS Management Console renders them as a folder-like structure purely for convenience. This distinction matters because operations like “listing all files in a folder” are actually prefix-filtered list operations on the entire bucket — which has performance and cost implications at scale.
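Because the namespace is flat, a “folder listing” is really just a string-prefix filter over every key in the bucket. A minimal pure-Python sketch of that idea (the keys below are hypothetical):

```python
# A flat keyspace: S3 stores keys as plain strings; slashes carry no
# special meaning to the storage layer (example keys are made up).
keys = [
    "reports/2026/april/sales.csv",
    "reports/2026/april/returns.csv",
    "reports/2026/may/sales.csv",
    "logs/app/2026-04-01.log",
]

def list_prefix(keys, prefix):
    """Mimic what 'listing a folder' actually is: a prefix filter."""
    return [k for k in keys if k.startswith(prefix)]

print(list_prefix(keys, "reports/2026/april/"))
```

Every “list the april folder” call is a filtered scan of the flat namespace, which is why LIST operations carry per-request charges that add up at scale.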
Object Addressing and URLs
Every object in S3 is uniquely addressable via a URL structured as: https://bucket-name.s3.region.amazonaws.com/key. As a result, this makes S3 inherently web-accessible — a property that powers everything from static website hosting to API-driven data pipelines to content delivery networks.
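A small illustration of the virtual-hosted-style addressing scheme, using made-up bucket, Region, and key names:

```python
def object_url(bucket, region, key):
    # Virtual-hosted-style URL: bucket-name.s3.region.amazonaws.com/key
    return f"https://{bucket}.s3.{region}.amazonaws.com/{key}"

url = object_url("my-reports", "us-east-1", "reports/2026/april/sales.csv")
print(url)
```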
Regions and Availability Zones
When you create a bucket, you choose an AWS Region — a physical geographic area where AWS operates data centers. Examples include us-east-1 (Northern Virginia), eu-west-1 (Ireland), ap-south-1 (Mumbai), and me-central-1 (UAE). As of April 2026, AWS operates 37 Regions globally with 117 Availability Zones.
Within your chosen Region, S3 automatically replicates your data across a minimum of three Availability Zones (AZs). Essentially, each AZ is a physically separate data center — or cluster of data centers — with independent power supplies, cooling systems, and networking infrastructure. Although they are connected to each other via low-latency, high-bandwidth private fiber, they are far enough apart geographically to ensure that a localized event (a fire, a flood, a power grid failure) in one AZ does not affect the others.
This multi-AZ replication is what delivers S3’s legendary eleven nines (99.999999999%) durability. In practical terms, if you store 10 million objects in S3, you can statistically expect to lose a single object once every 10,000 years. Your data is not going anywhere.
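The arithmetic behind that statistic is straightforward. Assuming, as AWS's own example does, an annual loss probability of 1e-11 per object:

```python
# Eleven nines of durability: 100% - 99.999999999% = 1e-11 expected
# annual loss rate per object.
annual_loss_rate = 1e-11
objects = 10_000_000

expected_losses_per_year = objects * annual_loss_rate   # 0.0001
years_per_single_loss = 1 / expected_losses_per_year

print(round(years_per_single_loss))  # 10000 years per expected lost object
```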
Choose the Region closest to your end users or your primary compute resources to minimize latency and data transfer costs. Equally important: data stored in a Region never leaves that Region unless you explicitly replicate it elsewhere. This is critical for compliance with data residency regulations such as the EU’s GDPR, Singapore’s PDPA, India’s DPDP Act, or the UAE’s data protection laws. Choosing the wrong Region can create compliance violations that are expensive to remediate.
Data Consistency Model
As of December 2020, Amazon S3 delivers strong read-after-write consistency for all operations — including PUTs of new objects, overwrite PUTs, and DELETEs. This means that after a successful write operation completes, any subsequent read request will immediately return the latest version of the object. There is no lag, no propagation delay, and no risk of reading stale data.
This was a landmark architectural change. Previously, in S3’s first 14 years, overwrite PUTs and DELETEs were only eventually consistent, meaning that for a brief window after an overwrite or delete, some read requests might still return the old version. Consequently, this behavior forced application developers to build complex workarounds — Netflix, for example, built a tool called S3mper that stored filesystem metadata in DynamoDB specifically to compensate for S3’s eventual consistency.
As a result, the move to strong consistency eliminated an entire class of bugs and race conditions. Moreover, it made S3 suitable for workloads that previously required block storage or database-backed solutions: version-controlled document systems, metadata catalogs, configuration stores, and even lightweight transactional patterns. If you read old articles or documentation warning about S3 eventual consistency, that information is now outdated.
How Data Flows Through S3
When you upload an object to S3, the following happens behind the scenes. First, your application sends an HTTP PUT request to the S3 endpoint. Next, S3 receives the object, calculates checksums for integrity verification, encrypts it (using your configured encryption method), and replicates it across a minimum of three Availability Zones within the Region. Finally, once replication is complete and durability is guaranteed, S3 returns an HTTP 200 response confirming a successful upload.
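The integrity-verification step can be sketched locally. The snippet below mimics the idea with SHA-256, one of the checksum algorithms S3 supports for uploads (the payload is made up):

```python
import base64
import hashlib

data = b"example object payload"

# The client computes a checksum before the PUT...
checksum = base64.b64encode(hashlib.sha256(data).digest()).decode()

# ...and the service recomputes it on receipt; a mismatch fails the upload.
received = base64.b64encode(hashlib.sha256(data).digest()).decode()
assert received == checksum

print(checksum)
```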
When you read that object, S3 routes the GET request to the nearest available copy, decrypts it, verifies the checksum, and streams the data back to your application. Additionally, for large objects, S3 supports byte-range fetches — you can request specific byte ranges of an object rather than downloading the entire file, which is particularly invaluable for video streaming, database page retrieval, and parallel data processing.
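Byte-range fetches are plain HTTP Range requests. A sketch of the header format and the slice a server would return, with the object simulated in memory:

```python
def byte_range_header(start, end):
    # The HTTP Range header S3 accepts for partial GETs (inclusive bounds)
    return {"Range": f"bytes={start}-{end}"}

obj = bytes(range(256)) * 4     # a simulated 1,024-byte object
start, end = 100, 199           # fetch only bytes 100..199
chunk = obj[start:end + 1]      # what the server streams back

print(byte_range_header(start, end), len(chunk))
```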
Amazon S3 Storage Classes Explained
Obviously, not all data is accessed equally. A product image on your e-commerce website gets requested thousands of times per day. Meanwhile, a quarterly compliance report might be read once a year. That seven-year-old audit log sitting in your account? It may never be accessed again. Treating all of this data the same way — storing it all in the same storage tier at the same price — is a costly mistake that inflates your AWS bill unnecessarily.
Amazon S3 recognizes this reality and offers eight distinct storage classes, each optimized for a specific access pattern, performance requirement, and cost profile. Undoubtedly, choosing the right storage class is one of the highest-impact decisions you will make with S3. In fact, according to industry analyses, it can mean the difference between S3 representing 2% or 15% of your total AWS bill for the exact same dataset.
Standard Storage Class
S3 Standard is the default storage class, designed for data you access frequently and need available with low-latency, high-throughput performance. This is where your active application assets, user-facing content, real-time datasets, and CI/CD artifacts should live.
Currently, pricing starts at $0.023/GB-month for the first 50 TB in us-east-1, with volume discounts kicking in at the 50 TB and 500 TB thresholds. Durability is 99.999999999% (eleven nines) and availability is 99.99%. Furthermore, S3 Standard stores data redundantly across a minimum of three Availability Zones.
Most teams default everything to S3 Standard because it is the path of least resistance. While that works fine for small datasets, once your storage grows past a few terabytes, the cost difference between Standard and the right alternative storage class becomes significant — often amounting to thousands of dollars per month.
Intelligent-Tiering Storage Class
The “set it and forget it” storage class. If you have data with unpredictable or changing access patterns and do not want to manage lifecycle policies manually, Intelligent-Tiering is the answer.
Specifically, it automatically moves objects between multiple access tiers based on actual usage patterns — with zero retrieval fees and no operational overhead. The tiering works across five levels. Initially, objects start in the Frequent Access tier. After 30 consecutive days without access, they move to Infrequent Access (saving ~40%). By 90 days, they move to Archive Instant Access (saving ~68%). You can optionally enable asynchronous Archive Access (90–730 days) and Deep Archive Access (180+ days) tiers for additional savings.
The only additional cost is a small monitoring and automation fee of $0.0025 per 1,000 objects per month. There are no retrieval fees when objects move back to the Frequent Access tier — consequently, this is a major advantage over Standard-IA, where every retrieval incurs a per-GB charge.
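Using the rates and savings percentages quoted above, a rough sketch of what each tier costs per month for a hypothetical 1 TB of 1 MB objects, monitoring fee included:

```python
STANDARD = 0.023  # $/GB-month, S3 Standard rate from the text
TIER_RATE = {
    "frequent": STANDARD,
    "infrequent": STANDARD * (1 - 0.40),       # ~40% cheaper after 30 idle days
    "archive_instant": STANDARD * (1 - 0.68),  # ~68% cheaper after 90 idle days
}
MONITORING = 0.0025 / 1000  # $ per object per month

def monthly_cost(tier, gb, objects):
    # Storage at the tier's rate plus the flat per-object monitoring fee
    return gb * TIER_RATE[tier] + objects * MONITORING

# 1 TB split into 1 MB objects: note how much the monitoring fee
# contributes when objects are small
for tier in TIER_RATE:
    print(tier, round(monthly_cost(tier, 1024, 1024 * 1024), 2))
```

The sketch also surfaces a real caveat: for datasets made of millions of tiny objects, the per-object monitoring fee can eat into the tiering savings.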
According to AWS’s March 2026 twentieth-anniversary post, S3 Intelligent-Tiering has collectively saved customers over $6 billion compared to what they would have spent on S3 Standard. That is not a theoretical projection — it is measured savings across real customer workloads.
Express One Zone Storage Class
Next up is the performance tier, purpose-built for the most latency-sensitive workloads in your environment. Express One Zone stores data in a single Availability Zone that you select, delivering consistent single-digit millisecond latency — up to 10x faster than S3 Standard — and 50% lower request costs.
Unlike standard S3 buckets, Express One Zone uses a different infrastructure called “directory buckets” with modified APIs optimized for high-throughput, low-latency operations. Additionally, the pricing structure is also different: storage costs approximately $0.16/GB-month (significantly higher than Standard), but request costs run roughly 50% cheaper per operation.
Express One Zone was designed for specific high-performance workloads. For instance, Amazon Athena queries against Express-backed buckets complete 2–3x faster than equivalent queries against Standard S3. Similarly, SageMaker training jobs experience 40–60% reduced training times when reading from Express-colocated storage due to eliminated I/O wait. AWS Glue and EMR jobs show similar improvements for iterative data processing.
Express One Zone stores data in only one Availability Zone. If that AZ experiences an outage, your data is temporarily unavailable. Use this class for performance-critical intermediate data, ML training datasets that can be regenerated, and analytics scratch space — not for your only copy of mission-critical records. Always keep a durable copy of source data in a multi-AZ storage class.
Infrequent Access Storage Classes
Standard-Infrequent Access (Standard-IA) is designed for data that you do not access regularly but need available with rapid performance when you do. Think disaster recovery backups, older application logs, or previous-quarter financial data. It offers the same low-latency, high-throughput performance as S3 Standard but at a significantly lower per-GB storage cost of approximately $0.0125/GB-month — a 46% saving over Standard.
However, the trade-off is a per-GB retrieval fee ($0.01/GB) and a minimum storage duration charge of 30 days. If you delete or transition an object before 30 days, you still pay for the full 30 days. There is also a minimum object size charge of 128 KB — objects smaller than 128 KB are billed as if they were 128 KB.
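Both minimums are easy to model. A sketch using the figures above (the helper name is ours, not an AWS API):

```python
IA_RATE = 0.0125      # $/GB-month for Standard-IA, from the text
MIN_SIZE_KB = 128     # objects smaller than this are billed as 128 KB
MIN_DAYS = 30         # early deletion still bills the full 30 days

def billed_storage(size_kb, days_stored, days_in_month=30):
    size_kb = max(size_kb, MIN_SIZE_KB)     # minimum object size charge
    days = max(days_stored, MIN_DAYS)       # minimum storage duration charge
    gb_months = (size_kb / (1024 * 1024)) * (days / days_in_month)
    return gb_months * IA_RATE

# A 4 KB object deleted after 5 days is billed as 128 KB for 30 days
print(f"${billed_storage(4, 5):.8f}")
```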
One Zone-IA costs approximately 20% less than Standard-IA by storing data in a single Availability Zone instead of three. This makes it suitable for easily reproducible data such as thumbnail images that can be regenerated from originals, secondary backup copies where the primary copy exists elsewhere, or cross-region replicated data where the source Region holds the authoritative copy.
Glacier Archive Classes: Instant Retrieval, Flexible Retrieval, and Deep Archive
Beyond the IA classes, the Glacier family is designed for long-term archival storage at the lowest possible cost. These are the storage classes you use for data that must be retained — for compliance, legal, or business reasons — but is rarely or never accessed.
- First, Glacier Instant Retrieval: Archive storage with millisecond access. Ideal for data you access approximately once per quarter but need available immediately when you do — medical imaging archives, news media asset libraries, or genomics research data. Storage costs approximately $0.004/GB-month, an 82% saving over Standard. Same retrieval latency as S3 Standard, but with higher per-GB retrieval fees.
- Next, Glacier Flexible Retrieval: For data that does not need immediate access. Offers three retrieval speed options: Expedited (1–5 minutes, costs ~$0.03/GB), Standard (3–5 hours, costs ~$0.01/GB), and Bulk (5–12 hours, costs ~$0.0025/GB). Storage costs approximately $0.0036/GB-month. This is the sweet spot for compliance archives, historical datasets, and digital preservation workflows where retrieval time is flexible.
- Finally, Glacier Deep Archive: The cheapest storage class in all of AWS at approximately $0.00099/GB-month — less than one-tenth of a cent per gigabyte. Retrieval takes 12–48 hours. Designed for data you must retain for years but may never access again: seven-year financial records for SEC compliance, legal hold documents, long-term scientific data archives. Companies like Nasdaq use Glacier Deep Archive for regulatory data retention.
How to Choose the Right Storage Class
To help you decide, the table below compares all eight storage classes across the dimensions that matter most for production decisions:
| Storage Class | Access Pattern | Retrieval Latency | Durability | AZs | Storage Cost (GB/mo) | Retrieval Fee | Best For |
|---|---|---|---|---|---|---|---|
| S3 Standard | Frequent | Milliseconds | 11 nines | ≥ 3 | $0.023 | None | Active application data, websites |
| Intelligent-Tiering | Changing/Unknown | Milliseconds | 11 nines | ≥ 3 | $0.023 (auto-tiered) | None | Unpredictable access patterns |
| Express One Zone | Ultra-frequent | Single-digit ms | 11 nines | 1 | $0.16 | None | ML training, real-time analytics |
| Standard-IA | Infrequent | Milliseconds | 11 nines | ≥ 3 | $0.0125 | $0.01/GB | Backups, DR copies |
| One Zone-IA | Infrequent | Milliseconds | 11 nines | 1 | $0.01 | $0.01/GB | Reproducible infrequent data |
| Glacier Instant | Quarterly | Milliseconds | 11 nines | ≥ 3 | $0.004 | $0.03/GB | Medical imaging, media archives |
| Glacier Flexible | 1–2x per year | Minutes to hours | 11 nines | ≥ 3 | $0.0036 | $0.01/GB (Std) | Compliance archives |
| Glacier Deep Archive | Rarely/Never | 12–48 hours | 11 nines | ≥ 3 | $0.00099 | $0.02/GB | Long-term regulatory retention |
Unsure about your access patterns? Start with S3 Intelligent-Tiering. It automatically optimizes costs without retrieval fees and eliminates the risk of choosing the wrong class. For known, stable access patterns, select the class that matches your workload. With data that has clear lifecycle stages (active → infrequent → archive), use lifecycle policies to automate transitions between Standard → IA → Glacier based on object age.
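That decision process can be sketched as a small helper. The return values are real S3 StorageClass API identifiers; the decision logic itself is a simplification of the table above, not an official AWS algorithm:

```python
def pick_storage_class(access, latency_ok_hours=0, reproducible=False):
    """Rough storage-class chooser mirroring the comparison table."""
    if access == "unknown":
        return "INTELLIGENT_TIERING"          # let S3 tier automatically
    if access == "frequent":
        return "STANDARD"
    if access == "infrequent":
        # One Zone-IA only if the data can be regenerated elsewhere
        return "ONEZONE_IA" if reproducible else "STANDARD_IA"
    if access == "archive":
        if latency_ok_hours == 0:
            return "GLACIER_IR"               # millisecond archive access
        # Deep Archive when 12-48 hour retrieval windows are acceptable
        return "DEEP_ARCHIVE" if latency_ok_hours >= 12 else "GLACIER"
    raise ValueError(f"unknown access pattern: {access}")

print(pick_storage_class("unknown"))
print(pick_storage_class("archive", latency_ok_hours=48))
```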
Amazon S3 Pricing Breakdown
On the surface, S3 pricing looks deceptively simple. It is not. The service charges across six independent dimensions simultaneously, and most teams only think about the first one — storage. The other five quietly accumulate in the background until someone notices the bill is 3x higher than expected.
Therefore, understanding all six dimensions is what separates a manageable S3 bill from a shocking one. Let us break them down.
Storage Costs
Specifically, storage is billed per GB-month based on your selected storage class. Rates range from $0.00099/GB-month for Glacier Deep Archive to $0.16/GB-month for Express One Zone — a 160x spread between the cheapest and most expensive tiers. Currently, S3 Standard, the default, costs $0.023/GB-month for the first 50 TB in us-east-1, with tiered discounts at 50 TB ($0.022) and 500 TB ($0.021).
Importantly, storage billing calculates your daily average usage and bills monthly. In practice, 100 GB stored for 15 days costs approximately $1.15 — half of the $2.30 you would pay for a full month. This nuance matters for data processing pipelines that temporarily stage large datasets, process them, and delete them within days.
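The proration arithmetic, sketched with the Standard rate quoted above and an assumed 30-day month:

```python
RATE = 0.023  # $/GB-month, S3 Standard for the first 50 TB in us-east-1

def monthly_storage_cost(gb, days_stored, days_in_month=30):
    # Billing is based on the daily average of bytes stored over the month
    avg_gb = gb * days_stored / days_in_month
    return avg_gb * RATE

# 100 GB staged for 15 days averages out to 50 GB over the month
print(f"${monthly_storage_cost(100, 15):.2f}")
```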
Request and Data Retrieval Costs
Additionally, every API call to S3 incurs a request charge. For S3 Standard, PUT, COPY, POST, and LIST requests cost $0.005 per 1,000 requests. GET and SELECT requests cost $0.0004 per 1,000. Admittedly, these amounts seem negligible in isolation, but they compound quickly.
Consider, for example, a data pipeline that lists and reads 10 million objects daily. That translates to 10,000 LIST request batches ($0.05/day) plus 10 million GET requests ($4.00/day) — over $120/month just in request charges, before a single byte of storage is counted. Furthermore, for IA and Glacier classes, retrieving data incurs additional per-GB charges on top of request fees, making frequent access to these tiers more expensive than Standard.
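The pipeline math above, spelled out:

```python
GET_PER_1000 = 0.0004   # $ per 1,000 GET requests (S3 Standard)
LIST_PER_1000 = 0.005   # $ per 1,000 LIST requests

gets_per_day = 10_000_000
lists_per_day = 10_000   # 10M objects / 1,000 keys per LIST page

daily = ((gets_per_day / 1000) * GET_PER_1000
         + (lists_per_day / 1000) * LIST_PER_1000)

print(f"${daily:.2f}/day, ${daily * 30:.2f}/month")  # $4.05/day, $121.50/month
```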
Transfer and Egress Costs
Data transfer into S3 is always free. Likewise, data transfer between S3 and other AWS services in the same Region is also free. However, the cost hits when data leaves AWS — either out to the internet or across Regions.
Fortunately, the first 100 GB/month of internet-bound egress is free (aggregated across all AWS services). Beyond that, egress costs $0.09/GB for the next 10 TB, $0.085/GB for the next 40 TB, $0.07/GB for the next 100 TB, and $0.05/GB beyond 150 TB. To illustrate, for a company serving 10 TB/month of content directly from S3 to internet users, that is approximately $920/month in egress charges alone.
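The egress tiers lend themselves to a small calculator. The exact total depends on whether you count a terabyte as 1,000 or 1,024 GB, which is why this lands slightly under the rounded figure above:

```python
# (rate in $/GB, tier size in GB); None = unbounded last tier.
# Figures taken from the text above.
TIERS = [
    (0.0,   100),          # first 100 GB/month free
    (0.09,  10 * 1024),    # next 10 TB
    (0.085, 40 * 1024),    # next 40 TB
    (0.07,  100 * 1024),   # next 100 TB
    (0.05,  None),         # beyond 150 TB
]

def egress_cost(gb):
    cost = 0.0
    for rate, size in TIERS:
        if gb <= 0:
            break
        chunk = gb if size is None else min(gb, size)
        cost += chunk * rate
        gb -= chunk
    return cost

print(f"${egress_cost(10 * 1024):,.2f}")  # 10 TiB/month served to the internet
```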
Data transfer from S3 to Amazon CloudFront in the same Region is completely free. If you serve content directly from S3 to the internet, you pay full egress pricing. Putting CloudFront in front of S3 eliminates S3 egress charges entirely — and CloudFront’s own egress pricing is often cheaper than direct S3 egress. For the 10 TB/month scenario above, routing through CloudFront eliminates the $920 S3 egress bill and replaces it with approximately $850 in CloudFront distribution costs — while also adding global edge caching that reduces origin requests by 80–95% and dramatically improves end-user latency.
Management and Analytics Costs
S3 also charges for optional management features. For example, S3 Inventory reports cost $0.0025 per million objects listed. Similarly, Storage Lens advanced metrics cost $0.20 per million objects monitored per month. Meanwhile, Object Tagging evaluations for lifecycle policies incur costs based on the number of tags evaluated. Individually small, these charges can add up for accounts managing billions of objects.
Versioning and Lifecycle Transition Costs
Enabling versioning is a critical data protection measure. However, it means that every overwrite creates a new version while the old version remains in storage — consequently doubling, tripling, or multiplying your storage footprint over time. Without lifecycle rules to expire non-current versions, versioning costs therefore grow silently month after month.
Similarly, lifecycle transitions incur request charges. For instance, transitioning 100 million objects from Standard to Glacier costs $500 in transition request fees alone ($0.05 per 10,000 lifecycle transition requests) — before you see a single dollar in storage savings. With datasets containing millions of small objects, the math needs to be done carefully to confirm that transitions actually save money.
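A quick break-even sketch using the figures above shows why small objects change the math (the dataset shape is hypothetical, and the saving assumes a Standard-to-Glacier Flexible transition):

```python
TRANSITION_FEE = 0.05 / 10_000          # $ per lifecycle transition request
SAVING_PER_GB = 0.023 - 0.0036          # Standard -> Glacier Flexible, $/GB-month

def breakeven_months(objects, total_gb):
    # One-time transition fees vs. recurring monthly storage savings
    upfront = objects * TRANSITION_FEE
    monthly_saving = total_gb * SAVING_PER_GB
    return upfront / monthly_saving

# 100 million tiny objects totalling only 1 TB: fees dominate for years
upfront = 100_000_000 * TRANSITION_FEE
print(f"${upfront:,.0f} upfront, "
      f"{breakeven_months(100_000_000, 1024):.1f} months to break even")
```

For the same 100 million objects packed into larger files (fewer, bigger objects), the upfront fee shrinks proportionally and the transition pays off almost immediately.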
Cost Optimization Strategies
Based on our experience optimizing S3 costs for enterprise clients, these are the highest-impact levers available:
- Implement lifecycle policies aggressively: Automatically transition objects from Standard to IA after 30 days and to Glacier after 90 days. A well-designed lifecycle policy can reduce storage costs by 40–60% for datasets with natural access decay. Even a simple two-rule policy saves more than most teams expect.
- Default to S3 Intelligent-Tiering: For datasets where access patterns are unpredictable or variable, Intelligent-Tiering eliminates guesswork. The $0.0025/1,000-object monitoring fee is a fraction of what misclassification costs.
- Use CloudFront for all public content: Eliminate S3 egress charges entirely. This is one of the simplest, highest-ROI changes you can make.
- Audit versioning regularly: Set lifecycle rules to expire non-current versions after 30–90 days. Configure abort rules for incomplete multipart uploads (which also consume storage indefinitely if not cleaned up).
- Enable S3 Storage Lens: The free tier provides 28 days of usage and activity metrics across all your buckets with cost optimization recommendations. The advanced tier ($0.20/million objects/month) provides 15 months of historical data and deeper insights. For most organizations, the free tier is sufficient to identify the biggest optimization opportunities.
- Likewise, use S3 Inventory instead of LIST operations: For buckets with millions of objects, using the S3 API to list contents is slow (LIST returns max 1,000 objects per call) and expensive. S3 Inventory generates a daily or weekly report in CSV, ORC, or Parquet format at a fraction of the cost.
Amazon S3 Security and Access Control
S3 security has come a long way since the early days of misconfigured public buckets making headlines on technology news sites. Over time, AWS has progressively tightened defaults, and as of 2026, every new bucket ships with Block Public Access enabled, server-side encryption (SSE-S3) turned on automatically, and ACLs disabled by default. Nevertheless, strong defaults only get you partway. A production-grade S3 security architecture still requires deliberate, layered controls.
IAM Policies and Bucket Policies
Essentially, access to S3 is governed through two complementary policy mechanisms that work together to determine effective permissions:
- IAM Policies: Attached to IAM users, groups, or roles. They define what actions a principal (an identity) can perform across AWS services. Use IAM policies for identity-based access control — for example, granting a specific role the ability to read from any bucket in the production account, or allowing a CI/CD pipeline role to push artifacts to designated S3 paths.
- Bucket Policies: Attached directly to the bucket resource. They define who can do what on that specific bucket. Use bucket policies for resource-based access control — for example, allowing cross-account access from an analytics account, restricting all access to requests originating from a specific VPC endpoint, or denying unencrypted uploads.
The effective permission for any request is the union of applicable policies, with one critical rule: an explicit deny always overrides any allow. In other words, if an IAM policy grants access but a bucket policy denies it (or vice versa), the deny wins. As a result, this makes it possible to create defense-in-depth security by layering restrictive bucket policies on top of broader IAM permissions.
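The deny-overrides-allow rule can be modeled in a few lines. This toy evaluator ignores actions, resources, and conditions entirely; it only illustrates how effects from multiple policies combine:

```python
def effective_decision(statements):
    """Combine IAM and bucket policy statements for one request:
    an explicit Deny always wins; otherwise any Allow grants access;
    with neither, the default is an implicit deny."""
    effects = {s["Effect"] for s in statements}
    if "Deny" in effects:
        return "Deny"
    return "Allow" if "Allow" in effects else "ImplicitDeny"

iam_policy = [{"Effect": "Allow", "Action": "s3:GetObject"}]
bucket_policy = [{"Effect": "Deny", "Action": "s3:GetObject"}]

print(effective_decision(iam_policy + bucket_policy))  # Deny
print(effective_decision(iam_policy))                  # Allow
```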
For most enterprise setups, the recommended approach is to use IAM roles (not users) with least-privilege policies for all programmatic access, and to supplement those with bucket policies for cross-account access, VPC endpoint restrictions, IP allowlisting, and encryption enforcement. Avoid bucket ACLs entirely; they are a legacy mechanism that AWS actively recommends against.
Server-Side Encryption Options
Without a doubt, all data in S3 should be encrypted at rest. AWS offers three server-side encryption methods, and choosing between them depends on your compliance requirements and key management preferences:
- SSE-S3 (S3-Managed Keys): AWS generates, manages, and rotates the encryption keys entirely on your behalf. This is the default for all new buckets. Zero additional cost, zero management overhead, and full transparency — each object is encrypted with a unique key, and that key is itself encrypted with a rotating root key. Suitable for the vast majority of workloads where compliance does not mandate customer-managed keys.
- SSE-KMS (KMS-Managed Keys): Uses AWS Key Management Service for key management. This provides an audit trail of every key usage event via CloudTrail, supports automatic annual key rotation, and allows you to control who can decrypt objects independently from who can access them. Adds a cost of $0.03 per 10,000 requests for KMS API calls. Choose SSE-KMS when compliance frameworks (PCI DSS, HIPAA, SOC 2, FedRAMP) require auditable key management or separation of duties between storage administrators and key administrators.
- SSE-C (Customer-Provided Keys): You provide your own encryption key with every PUT and GET request. AWS uses it to encrypt/decrypt the object but never stores the key itself. This gives you complete ownership of your encryption keys but adds significant operational complexity — you must manage key storage, rotation, and disaster recovery independently. If you lose the key, your data is unrecoverable.
Recent Encryption Default Changes
AWS is now deploying a new default bucket security setting that automatically disables SSE-C for all new general purpose buckets. For existing buckets in AWS accounts with no SSE-C encrypted objects, S3 will also disable SSE-C for new write requests. This change strengthens the default security posture by reducing the attack surface of customer-managed key mishandling. Accounts that actively use SSE-C are not affected.
Block Public Access and Access Points
Importantly, S3 Block Public Access is an account-level and bucket-level safety mechanism that overrides any policy or ACL that would otherwise grant public access. It operates as four independent settings that you can enable individually or together: BlockPublicAcls, IgnorePublicAcls, BlockPublicPolicy, and RestrictPublicBuckets. The recommended practice is to enable all four at the account level and only disable specific settings on specific buckets with documented justification.
Additionally, S3 Access Points simplify permission management for shared datasets. Instead of a single bucket policy that grows to hundreds of lines as more teams and applications need access, you create named access points — each with its own IAM-like policy and optional VPC restriction. For instance, an analytics team gets an access point with read-only permissions restricted to their VPC. Similarly, a data engineering team gets a separate access point with read-write access. Additionally, each access point has its own DNS name and can enforce its own network controls independently. This is particularly valuable for data lake architectures where dozens of consumers access the same underlying bucket.
S3 Object Lock and Compliance
Significantly, S3 Object Lock enables you to store objects using a Write-Once-Read-Many (WORM) model. Once enabled, objects cannot be deleted or overwritten for a specified retention period. This capability is mandatory for organizations in regulated industries.
Specifically, Object Lock operates in two modes. Governance mode allows users with specific IAM permissions to override the lock — useful for testing and operational flexibility. Compliance mode is irrevocable — nobody, including the AWS root account, can delete the object until the retention period expires. Financial services firms use Compliance mode to meet SEC Rule 17a-4, healthcare organizations use it for HIPAA-compliant record retention, and legal departments use it for litigation hold scenarios.
Furthermore, you can also apply Legal Hold independently of retention periods. A legal hold prevents deletion regardless of the retention configuration and stays active until explicitly removed. This is valuable for e-discovery processes where you need to preserve specific objects indefinitely while legal proceedings are ongoing.
Key Amazon S3 Features for Production Workloads
Beyond storage and security, Amazon S3 provides a rich feature set that transforms it from a simple file repository into an intelligent data platform. As a result, these capabilities enable event-driven architectures, automated data management, and performance optimization at scale. Below are the features that matter most in production environments.
Core Data Management Features
Transfer Acceleration
S3 Transfer Acceleration uses Amazon CloudFront’s globally distributed edge locations to speed up uploads from distant geographic locations. When enabled, data uploaded to S3 is first routed to the nearest CloudFront edge location, then transferred to your S3 bucket over AWS’s optimized backbone network. This can improve upload speeds by 50–500% for teams uploading from locations far from the bucket’s Region.
Transfer Acceleration is particularly valuable for media companies uploading video content from production locations worldwide, global development teams pushing build artifacts, and any workflow that involves uploading large files across continents. It costs an additional $0.04–$0.08 per GB transferred, but AWS only charges the acceleration fee when the accelerated path is actually faster; if the regular path would have been faster, you pay standard transfer rates instead.
Multipart Upload
For objects larger than 100 MB, AWS recommends multipart upload. Above 5 GB, it is required. Multipart upload breaks the file into independently uploaded parts (each between 5 MB and 5 GB), uploads them in parallel across multiple connections, and assembles them into the final object in S3. This approach provides three key benefits: first, improved throughput through parallelism; second, pause-and-resume capability for interrupted uploads; and third, reduced impact of network failures (if one part fails, you only need to re-upload that part).
# Upload a large file using the AWS CLI (automatically uses multipart for large files)
aws s3 cp large-dataset.tar.gz s3://my-bucket/datasets/ \
--storage-class INTELLIGENT_TIERING
# Upload with explicit multipart configuration
aws s3 cp huge-backup.tar s3://my-bucket/backups/ \
--expected-size 53687091200 \
--storage-class STANDARD_IA
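The part-size rules above can be sketched as a small planner. This is an illustrative helper, not part of any SDK; a real client would feed each (offset, length) slice to the UploadPart API in parallel:

```python
import math

MIN_PART = 5 * 1024 * 1024           # 5 MB minimum part size (last part exempt)
MAX_PART = 5 * 1024 * 1024 * 1024    # 5 GB maximum part size
MAX_PARTS = 10_000                   # S3 allows at most 10,000 parts per upload

def plan_parts(total_size: int, part_size: int = 64 * 1024 * 1024):
    """Return (offset, length) tuples covering a file of total_size bytes."""
    if total_size > MAX_PARTS * MAX_PART:
        raise ValueError("object exceeds what multipart limits can express")
    # Clamp the requested part size to S3's allowed range.
    part_size = max(MIN_PART, min(part_size, MAX_PART))
    # Grow the part size if the file would otherwise exceed 10,000 parts.
    if math.ceil(total_size / part_size) > MAX_PARTS:
        part_size = math.ceil(total_size / MAX_PARTS)
    return [(off, min(part_size, total_size - off))
            for off in range(0, total_size, part_size)]
```

For a 150 MB file with 64 MB parts, this yields three parts (64 MB, 64 MB, 22 MB), each independently retryable.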
Temporary Access with Pre-Signed URLs
Pre-signed URLs enable you to grant temporary, time-limited access to private S3 objects without sharing your AWS credentials or making the bucket public. You generate a URL that embeds authentication parameters and an expiration time; anyone with the URL can upload or download the specified object until it expires.
For example, common use cases include allowing users to download invoices from a private bucket via a web application, enabling file uploads from mobile apps without embedding AWS credentials in the client code, and sharing sensitive reports with external partners for a limited window.
# Generate a pre-signed URL valid for 1 hour (3600 seconds)
aws s3 presign s3://my-bucket/reports/q1-financials.pdf \
--expires-in 3600
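To show what the CLI is doing under the hood, here is a minimal stdlib sketch of SigV4 query-string presigning for a GET request. It is illustrative only (dummy credentials, host header only, no URI-encoding of special characters in keys); in practice, use `aws s3 presign` or an SDK, which handle the edge cases:

```python
import datetime, hashlib, hmac, urllib.parse

def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode(), hashlib.sha256).digest()

def presign_get(bucket, key, region, access_key, secret_key,
                expires=3600, now=None):
    """Build a SigV4 pre-signed GET URL for an S3 object (stdlib only)."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    datestamp = now.strftime("%Y%m%d")
    host = f"{bucket}.s3.{region}.amazonaws.com"
    scope = f"{datestamp}/{region}/s3/aws4_request"
    params = {
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",
        "X-Amz-Date": amz_date,
        "X-Amz-Expires": str(expires),
        "X-Amz-SignedHeaders": "host",
    }
    canonical_qs = "&".join(
        f"{urllib.parse.quote(k, safe='')}={urllib.parse.quote(v, safe='')}"
        for k, v in sorted(params.items()))
    # Canonical request: method, URI, query string, headers, signed headers,
    # and an unsigned-payload marker (the body is not known in advance).
    canonical_request = "\n".join([
        "GET", f"/{key}", canonical_qs, f"host:{host}\n",
        "host", "UNSIGNED-PAYLOAD"])
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256", amz_date, scope,
        hashlib.sha256(canonical_request.encode()).hexdigest()])
    # Derive the signing key: date -> region -> service -> "aws4_request".
    signing_key = _hmac(_hmac(_hmac(_hmac(
        b"AWS4" + secret_key.encode(), datestamp), region), "s3"),
        "aws4_request")
    signature = hmac.new(signing_key, string_to_sign.encode(),
                         hashlib.sha256).hexdigest()
    return f"https://{host}/{key}?{canonical_qs}&X-Amz-Signature={signature}"
```

Because the signature is computed locally, no network call is made until someone actually uses the URL.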
What’s New in Amazon S3 (2025–2026)
Amazon S3 continues to evolve at a pace that surprises even long-time AWS users. The 2025–2026 cycle brought some of the most significant changes since the service’s launch, reflecting AWS’s strategy of transforming S3 from a passive storage layer into a compute-adjacent data platform.
New Feature: Files — NFS Access to Buckets
This is arguably the most transformative S3 feature in years. For as long as S3 has existed, the answer to “can I mount S3 as a file system?” was “sort of, with third-party FUSE hacks that break under load.” However, that changed on April 7, 2026.
S3 Files creates a fully managed NFS file system backed by an S3 bucket. Applications mount S3 over NFS 4.1/4.2 and use standard file operations (open, read, write, rename, lock) while the underlying data remains in S3. Both the NFS mount and the S3 API can access the same data simultaneously. Built on Amazon EFS infrastructure, S3 Files maintains a high-performance caching layer for actively used data, delivering sub-millisecond latency on small files while keeping cold data at S3 storage rates.
The implications are significant. Teams that previously maintained separate file systems alongside S3, duplicating data and building complex synchronization pipelines, can eliminate that entire layer. AI agents can persist memory and share state across pipelines through a mounted file system. ML teams can run data preparation workloads directly on S3 data without staging files. Most importantly, file-based tools and legacy applications that could never work with object storage now work natively, with no code changes required.
New Feature: Vectors for AI Workloads
Launched at re:Invent 2025, S3 Vectors provides native vector storage for AI and machine learning applications. Instead of provisioning and managing a separate vector database (Pinecone, Weaviate, or a self-managed pgvector instance), you can now store and query vector embeddings directly in S3 with sub-100ms latency.
This simplifies Retrieval Augmented Generation (RAG) architectures by keeping embeddings alongside their source documents in the same storage layer. It reduces operational overhead, eliminates a separate infrastructure component, and integrates natively with Amazon Bedrock and SageMaker. For organizations already running their data layer on S3, adding vector search without introducing a new system is a meaningful simplification.
Amazon S3 Use Cases
S3’s versatility is hard to match in the cloud storage landscape. The most common production use cases include backup and disaster recovery, static website hosting, data lakes for analytics, media asset storage, and training datasets for machine learning models.
Amazon S3 vs Azure Blob Storage
If you are evaluating cloud storage across providers — whether for a greenfield deployment, a multi-cloud strategy, or a competitive assessment — here is how Amazon S3 compares with Microsoft Azure’s closest equivalent, Azure Blob Storage:
| Feature | Amazon S3 | Azure Blob Storage |
|---|---|---|
| Storage Model | ✓ Object Storage | ✓ Object Storage (Blob) |
| Durability | ✓ 11 nines (99.999999999%) | ✓ Up to 16 nines (LRS/ZRS/GRS/RA-GRS) |
| Storage Tiers | ✓ 9 classes | ◐ 4 tiers (Hot, Cool, Cold, Archive) |
| Automatic Tiering | ✓ Intelligent-Tiering (5 tiers, no retrieval fees) | ✓ Access tracking + lifecycle management |
| NFS File Access | ✓ S3 Files (GA Apr 2026) | ✓ NFS 3.0 via Data Lake Storage Gen2 |
| Native Vector Storage | ✓ S3 Vectors (sub-100ms) | ✕ Not available natively |
| Native Table Format | ✓ S3 Tables (Apache Iceberg) | ✕ Requires external services |
| Max Object Size | ◐ 5 TB | ✓ ~190.7 TB (block blob) |
| Consistency Model | ✓ Strong read-after-write | ✓ Strong consistency |
| API Standard Adoption | ✓ S3 API is the de facto industry standard | ◐ Azure-specific REST API |
| Ecosystem Integration | ✓ 100+ AWS services | ✓ Deep Microsoft/Azure integration |
| Market Share (Enterprise Storage) | ✓ 22.98% (2024 leader) | ◐ Growing but smaller share |
Which One Should You Choose?
Both are excellent, mature object storage services. In most cases, the primary decision factor is your cloud ecosystem: if your infrastructure runs on AWS, S3 is the natural choice with the deepest integration. If you are a Microsoft shop running on Azure, Blob Storage integrates seamlessly with Azure services and Microsoft 365. For multi-cloud architectures, S3 has a further advantage: its API has become the de facto industry standard, supported by tools like MinIO, Cloudian, and even Azure’s own S3-compatible gateways.
Where S3 currently leads is in storage class granularity (nine classes versus four tiers), the AI-native features launched in 2025–2026 (S3 Vectors, S3 Tables, S3 Files), and the sheer breadth of its integration ecosystem. On the other hand, Azure Blob Storage counters with deeper integration into the Microsoft enterprise stack (Active Directory, SharePoint, Teams) and higher maximum object sizes for block blobs.
Getting Started with Amazon S3
You can have your first S3 bucket up and running in under five minutes. Here is a complete walkthrough using both the AWS Management Console and the AWS CLI.
Creating Your First Bucket (Console)
Log in to the AWS Management Console and navigate to the S3 service. Select Create bucket. Enter a globally unique name using lowercase letters, numbers, and hyphens only, between 3 and 63 characters (for example, my-company-assets-2026). Select the AWS Region closest to your users or compute resources: us-east-1 for North America, eu-west-1 for Europe, ap-south-1 for India. Leave Block all public access enabled (the default) and leave default encryption as SSE-S3. Click Create bucket.
Uploading an Object
Click your new bucket name, then click Upload. Drag and drop files from your computer or click Add files. Optionally, expand Properties to select a storage class (defaults to S3 Standard). Click Upload. Your file is now durably stored across multiple Availability Zones and accessible via its unique S3 URL.
Configuring Permissions
Then, navigate to your bucket’s Permissions tab. First, verify that Block Public Access is fully enabled. Then, to grant access to specific IAM roles or accounts, click Bucket policy and add a JSON policy. Here is an example granting read and write access to a specific IAM role:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowAppRoleObjectAccess",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::123456789012:role/MyAppRole"
      },
      "Action": [
        "s3:GetObject",
        "s3:PutObject"
      ],
      "Resource": "arn:aws:s3:::my-company-assets-2026/*"
    },
    {
      "Sid": "AllowAppRoleListBucket",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::123456789012:role/MyAppRole"
      },
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::my-company-assets-2026"
    }
  ]
}
Using the AWS CLI
Alternatively, the AWS CLI provides fast, scriptable access to S3 for automation and daily operations:
# Create a bucket in the Mumbai region
aws s3 mb s3://my-company-data-2026 --region ap-south-1
# Upload a single file
aws s3 cp report.pdf s3://my-company-data-2026/reports/
# Upload with a specific storage class
aws s3 cp archive.zip s3://my-company-data-2026/archives/ \
--storage-class STANDARD_IA
# Sync an entire local directory to S3
aws s3 sync ./local-data/ s3://my-company-data-2026/backups/ \
--exclude "*.tmp"
# List objects in a bucket
aws s3 ls s3://my-company-data-2026/reports/ --recursive
# Download a file
aws s3 cp s3://my-company-data-2026/reports/report.pdf ./downloads/
# Generate a pre-signed URL valid for 1 hour
aws s3 presign s3://my-company-data-2026/reports/report.pdf \
--expires-in 3600
# Remove all objects with a specific prefix
aws s3 rm s3://my-company-data-2026/temp/ --recursive
Amazon S3 Best Practices
Based on our experience designing, deploying, and optimizing S3 architectures for dozens of enterprise clients across industries — from financial services and healthcare to e-commerce and media — these are the practices that consistently separate reliable, cost-effective setups from problematic ones.
Here are the core best practices every team should implement from day one:
Data Protection Best Practices
- Enable versioning on all production buckets. It protects against accidental overwrites and deletions, and lets you recover any previous version of an object instantly. Pair versioning with lifecycle rules that expire non-current versions after 30–90 days and abort incomplete multipart uploads after 7 days to prevent silent storage cost growth.
- Implement lifecycle policies on every bucket. Even a simple two-rule policy (transition to Standard-IA after 30 days, transition to Glacier Flexible Retrieval after 90 days) can reduce storage costs by 40–60% for datasets with natural access decay patterns.
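The transition, non-current-version, and multipart-cleanup rules described in the bullets above can be combined into a single lifecycle configuration, applied with `aws s3api put-bucket-lifecycle-configuration`. The day counts below mirror the examples in the text, not universal recommendations (the `GLACIER` storage class name corresponds to Glacier Flexible Retrieval):

```json
{
  "Rules": [
    {
      "ID": "tier-down-aging-data",
      "Status": "Enabled",
      "Filter": {},
      "Transitions": [
        { "Days": 30, "StorageClass": "STANDARD_IA" },
        { "Days": 90, "StorageClass": "GLACIER" }
      ],
      "NoncurrentVersionExpiration": { "NoncurrentDays": 60 },
      "AbortIncompleteMultipartUpload": { "DaysAfterInitiation": 7 }
    }
  ]
}
```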
Security Best Practices
- Block public access at both the account level and the bucket level. For content that needs to be publicly accessible, serve it through Amazon CloudFront with an Origin Access Control (OAC) rather than making the bucket itself public. This approach maintains security while still enabling content delivery.
- Enable server access logging or CloudTrail data events. You need to know who accessed what, when, and from where, both for security incident response and for compliance auditing. Store access logs in a separate dedicated bucket to prevent recursive logging.
- Use VPC endpoints (Gateway endpoints) for S3 access from EC2. This keeps traffic on the AWS private network, eliminates the need for a NAT gateway or internet gateway, avoids data transfer charges, and improves security by never routing S3 traffic over the public internet.
- Enforce encryption in transit. Add a bucket policy condition that denies any request where aws:SecureTransport is false. This ensures all data in transit is encrypted via HTTPS and prevents accidental unencrypted transfers.
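A deny statement implementing the encryption-in-transit rule might look like the following (the bucket name is a placeholder); this is the widely documented aws:SecureTransport pattern:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyInsecureTransport",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::my-bucket",
        "arn:aws:s3:::my-bucket/*"
      ],
      "Condition": {
        "Bool": { "aws:SecureTransport": "false" }
      }
    }
  ]
}
```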
Cost and Performance Optimization
- Enable S3 Storage Lens. The free tier provides 28 days of usage and activity metrics with cost optimization recommendations across all your buckets and accounts. It is the fastest way to identify your biggest cost optimization opportunities.
- Use S3 Inventory instead of LIST API calls for large buckets. Listing millions of objects via the API is slow (1,000 objects per response, paginated) and expensive. In contrast, S3 Inventory generates a daily or weekly CSV/ORC/Parquet report of all objects with metadata at a fraction of the cost.
- Finally, design key naming for performance. S3 automatically partitions data based on key prefixes for high request rates. Therefore, for write-heavy workloads, distribute writes across different prefixes rather than concentrating them under a single prefix path (for instance, use date-based or hash-based prefixes).
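The hash-based prefix idea from the last bullet can be sketched as a small, hypothetical helper (not an AWS API). Prepending a short, stable hash shard spreads writes across prefixes, each of which S3 scales independently; reads must recompute the same shard to find the object:

```python
import hashlib

def partitioned_key(logical_key: str, fanout: int = 16) -> str:
    """Prepend a stable hash shard so writes spread across `fanout` prefixes."""
    digest = hashlib.md5(logical_key.encode()).hexdigest()
    shard = int(digest, 16) % fanout  # deterministic: same key, same shard
    return f"{shard:02x}/{logical_key}"
```

For example, `logs/2026/01/01/app.gz` becomes something like `0a/logs/2026/01/01/app.gz`, so a burst of writes to the same date lands under many prefixes instead of one.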
The Bottom Line on Amazon S3
Amazon S3 is deceptively simple to start with and deeply complex to master. The difference between a functional S3 setup and a production-grade architecture lies in storage class strategy, multi-layered security controls, lifecycle automation, cost monitoring, and performance optimization. Getting these right from day one saves months of rework and thousands of dollars in avoidable costs. This is exactly the kind of foundational work where having an experienced AWS partner pays for itself many times over.