What Is Amazon Bedrock?
Just two years ago, adding generative AI to a business application meant hiring ML engineers, provisioning GPU clusters, negotiating contracts with multiple model providers, and stitching together inference pipelines from scratch. Today, you make one API call. That, in essence, is Amazon Bedrock.
Amazon Bedrock is a fully managed service from Amazon Web Services that gives you secure, serverless access to foundation models (FMs) from the world’s leading AI companies — all through a single unified API. Rather than managing infrastructure, you pick a model, send a prompt, and get a response. Whether you need text generation, image creation, code assistance, or autonomous AI agents, Bedrock handles the compute, scaling, and security behind the scenes.
Since its general availability launch in September 2023, Bedrock has rapidly evolved from a simple model marketplace into the most comprehensive enterprise AI platform on any major cloud. According to AWS, it now powers generative AI for more than 100,000 organizations worldwide — from startups building their first chatbot to global enterprises running billions of daily inferences.
What separates Bedrock from calling model provider APIs directly is the enterprise layer it adds: built-in safety controls via Guardrails, managed RAG pipelines through Knowledge Bases, autonomous agent capabilities via AgentCore, cost optimization tools like Intelligent Prompt Routing and Model Distillation, and compliance certifications spanning SOC, PCI DSS, HIPAA, FedRAMP High, and ISO 27001. In short, Bedrock lets you build AI applications that are not only functional but also secure, compliant, and cost-effective from day one.
The Platform by the Numbers
Adoption has been explosive. Bedrock's customer base grew 4.7x in a single year, and its API usage has doubled quarter over quarter since launch. Robinhood, one of the platform's most visible adopters, scaled from 500 million to 5 billion tokens daily in just six months while reducing AI costs by 80% and cutting development time in half.
Where Bedrock Fits in the AWS AI Ecosystem
AWS offers multiple AI and machine learning services, and understanding where Bedrock sits relative to them is essential for making the right architectural choice:
- Amazon Bedrock: Fully managed access to pre-trained foundation models via API. Designed for teams building generative AI applications without managing infrastructure. Choose Bedrock when you want to use existing models, not train your own from scratch.
- Amazon SageMaker: A complete ML platform for building, training, and deploying custom models. Choose SageMaker when you need to train proprietary models on your own datasets or require full control over the ML lifecycle.
- Amazon Q: An AI assistant for business users and developers. Built on top of Bedrock’s infrastructure, Amazon Q provides a ready-to-use conversational interface rather than requiring you to build one.
In practice, many organizations use Bedrock and SageMaker together — Bedrock for inference and application integration, SageMaker for training and custom model development. They are complementary, not competing services.
Amazon Bedrock is the fastest path to production-grade generative AI on AWS. Rather than managing models and infrastructure yourself, Bedrock lets you focus entirely on building the application logic that delivers business value.
How Amazon Bedrock Works
Understanding Bedrock's architecture helps you make better decisions about model selection, cost management, and application design. At its core, Bedrock is an abstraction layer: it sits between your application and the foundation models, handling inference, scaling, and security so you do not have to.
Traditionally, integrating an AI model into an application required provisioning GPU-backed instances, managing model weights, implementing autoscaling, building request queuing, and handling model version management. Consequently, teams spent months on infrastructure before writing a single line of application logic. Bedrock eliminates this entire layer. You interact with models through API calls, and AWS handles everything behind the endpoint — from GPU allocation to load balancing to failover.
Foundation Models and the Unified API
Bedrock currently provides access to nearly 100 foundation models from a diverse ecosystem of AI providers, including Anthropic (Claude family), OpenAI (via Project Mantle), Meta (Llama family), Mistral AI, Amazon (Titan and Nova families), Cohere, AI21 Labs, Stability AI, DeepSeek, MiniMax, Moonshot, and Qwen, spanning text, code, image, video, and audio workloads.
All models are accessible through a single, unified API. Whether you are calling Claude for complex reasoning or Nova Micro for lightweight classification, the integration pattern remains the same. Bedrock offers three API interfaces:
- Converse API: The recommended interface for conversational workloads. Provides a consistent request/response format across all text models, regardless of provider. Handles model-specific formatting differences automatically.
- InvokeModel API: The lower-level interface that passes model-specific payloads. Offers maximum flexibility but requires provider-specific formatting for each model family.
- OpenAI-compatible endpoints (Project Mantle): Launched in 2026, this allows teams already building on OpenAI’s API format to use Bedrock without changing their code structure. Powered by a new distributed inference engine optimized for large-scale model serving.
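To make the Converse/InvokeModel distinction concrete, here is a sketch of the same prompt prepared for both interfaces. The model ID and field values are illustrative; the InvokeModel body follows Anthropic's Bedrock messages schema, while the Converse request uses the same shape for every text model.

```python
import json

PROMPT = 'Explain Amazon Bedrock in two sentences.'
MODEL_ID = 'anthropic.claude-sonnet-4-6-v1'  # illustrative model ID

# Converse API: one request shape for every text model
converse_request = {
    'modelId': MODEL_ID,
    'messages': [
        {'role': 'user', 'content': [{'text': PROMPT}]}
    ],
    'inferenceConfig': {'maxTokens': 512},
}

# InvokeModel API: the body must follow the provider's native format
# (here, Anthropic's Bedrock messages schema)
invoke_request = {
    'modelId': MODEL_ID,
    'body': json.dumps({
        'anthropic_version': 'bedrock-2023-05-31',
        'max_tokens': 512,
        'messages': [
            {'role': 'user', 'content': [{'type': 'text', 'text': PROMPT}]}
        ],
    }),
}

# With a boto3 'bedrock-runtime' client, these would be sent as:
#   client.converse(**converse_request)
#   client.invoke_model(**invoke_request)
```

Switching the Converse request to another provider's model changes only `modelId`; switching the InvokeModel request means rewriting the entire body in that provider's format.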
Choosing the Right Model for Your Workload
With nearly 100 models available, selecting the right one can feel overwhelming, but the decision framework is straightforward: map task complexity to model capability. Consider these categories:
- Simple classification, extraction, and routing tasks: Use lightweight models like Amazon Nova Micro or Nova Lite. These handle routine tasks at a fraction of the cost of frontier models — often delivering comparable quality for structured, well-defined prompts.
- General-purpose reasoning, summarization, and content generation: Mid-tier models like Claude Sonnet, Llama, or Mistral offer an excellent balance of capability and cost. Most production workloads settle in this tier.
- Complex reasoning, multi-step analysis, and agentic workflows: Frontier models like Claude Opus or GPT-4 class models deliver the highest quality for challenging tasks. However, reserve these for workloads where quality genuinely justifies the premium.
- Image and video generation: Amazon Nova Canvas (images), Nova Reel (video), and Stability AI models serve creative and media workloads with per-asset pricing.
- Embeddings and search: Cohere Embed and Amazon Titan Embeddings power semantic search, recommendation engines, and RAG pipelines. Choose based on language coverage and dimensionality needs.
Always test the cheapest model first, and move up only when the output quality genuinely does not meet your requirements. In our experience, over 60% of enterprise workloads perform well on mid-tier models; the workloads that do need frontier models usually involve complex reasoning, ambiguous instructions, or multi-step agentic workflows.
Inference Modes and Service Tiers
Bedrock offers five service tiers that let you balance cost, latency, and throughput based on your workload requirements:
- Standard (On-Demand): Pay-per-token with no commitment. Best for variable workloads, experimentation, and development environments.
- Priority: Higher throughput with a time-based commitment. Designed for production workloads that need guaranteed availability.
- Flex: Lower-cost access for workloads that can tolerate slightly higher latency. Delivers the same savings as batch processing but through the regular API — no workflow restructuring needed.
- Reserved (Provisioned Throughput): Dedicated capacity with a term commitment. Provides predictable performance for high-volume, latency-sensitive production systems.
- Batch: Asynchronous processing for non-time-sensitive workloads. Submit prompts as a file, receive results within 24 hours at roughly half the on-demand cost.
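As a sketch of how the batch tier is used in practice: you stage a JSONL file of prompts in S3 and submit a model invocation job through the Bedrock control-plane client. The bucket names, role ARN, and model ID below are placeholders; substitute your own resources.

```python
# Batch inference sketch: prompts are read from a JSONL file in S3 and
# results are written back to S3. All identifiers below are placeholders.
job_config = {
    'jobName': 'nightly-content-generation',
    'modelId': 'anthropic.claude-sonnet-4-6-v1',
    'roleArn': 'arn:aws:iam::123456789012:role/BedrockBatchRole',
    'inputDataConfig': {
        's3InputDataConfig': {'s3Uri': 's3://my-bucket/batch-input/prompts.jsonl'}
    },
    'outputDataConfig': {
        's3OutputDataConfig': {'s3Uri': 's3://my-bucket/batch-output/'}
    },
}

def submit_batch_job(config):
    """Submit the job (requires AWS credentials and an IAM role that
    Bedrock can assume to read and write the S3 locations)."""
    import boto3
    bedrock = boto3.client('bedrock', region_name='us-east-1')
    return bedrock.create_model_invocation_job(**config)
```

Results land in the output bucket within 24 hours at roughly half the on-demand per-token rate.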
Architecture Under the Hood
When your application sends a request to Bedrock, several things happen behind the scenes. First, the request is authenticated via IAM and routed to the appropriate model endpoint. Next, if you have configured Guardrails, the input is evaluated against your safety policies before reaching the model. Then, the model processes the request and generates a response. Finally, if Guardrails are active, the output is also evaluated before being returned to your application.
Bedrock is fully serverless: there are no instances to provision, no GPUs to manage, and no capacity planning required. Your data is never used to train or improve the base models, and it is encrypted both in transit and at rest. This makes Bedrock suitable for regulated industries where data sovereignty and privacy are non-negotiable.
Core Amazon Bedrock Features and Capabilities
Since its launch, Bedrock has evolved far beyond a simple model API. As of 2026, it is a comprehensive AI platform with capabilities spanning agent orchestration, safety controls, knowledge management, model customization, and cost optimization. Below are the features that matter most for production deployments.
Bedrock AgentCore — Building Production AI Agents
AgentCore is Bedrock's agentic platform for building, deploying, and operating AI agents at scale. Unlike basic API calls where you send a prompt and receive a response, agents can reason about tasks, break them into steps, call external tools, query databases, and take actions across your systems, all autonomously.
AgentCore provides three critical capabilities for enterprise agent deployment. First, policy controls (GA since March 2026) give you precise control over what actions agents can take, verified outside the agent's reasoning loop before reaching tools or data. Second, stateful MCP (Model Context Protocol) server support enables agents to maintain context across sessions and interact with external services through a standardized protocol. Third, memory streaming notifications let agents share state across pipelines, enabling complex multi-agent workflows where one agent's output feeds into another's reasoning.
The adoption numbers tell the real story: the AgentCore SDK has been downloaded over 2 million times in just five months since preview, and in a recent AWS AI agent hackathon, 80% of 600 agents were built using AgentCore. Epsilon, for example, used AgentCore to transform its marketing operations, enabling intelligent agents to automate complex campaign workflows while maintaining enterprise-grade security and compliance.
AgentCore is also framework-agnostic: you can build agents using any orchestration framework (LangChain, AutoGen, AWS Strands) and deploy them on AgentCore for production-grade management. This decoupling means your agent logic is portable, while AgentCore handles the operational complexity of scaling, monitoring, and securing agents in production.
Guardrails for Responsible AI
Guardrails is Bedrock's safety layer for evaluating both user inputs and model outputs against your organization's policies. According to AWS, Guardrails can block up to 88% of harmful content and identify correct model responses with up to 99% accuracy using Automated Reasoning checks.
Guardrails lets you define denied topics (subjects the model must not discuss), content filters (hate speech, violence, sexual content, insults), word blocklists, PII redaction rules, and grounding checks that reduce hallucinations. You can apply Guardrails consistently across all models in your account, so safety policies follow the organization, not individual applications.
Guardrails operates as an independent evaluation layer: it works regardless of which foundation model you are using. Whether you switch from Claude to Llama or from Nova to Mistral, your safety policies remain enforced. For regulated industries like financial services and healthcare, this organizational consistency is essential for compliance. Guardrails pricing has also been reduced by up to 85% for content filters and denied topics, making it economically viable to apply safety controls to every interaction rather than selectively.
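Attaching a guardrail to an inference call is a one-parameter change. In this sketch the guardrail identifier is a placeholder; the same `guardrailConfig` block works regardless of which model the request targets.

```python
def build_guarded_request(model_id, prompt, guardrail_id, guardrail_version):
    """Build Converse API kwargs with a guardrail attached.

    The guardrail evaluates both the user input and the model's output
    against the policies defined for it in the Bedrock console.
    """
    return {
        'modelId': model_id,
        'messages': [{'role': 'user', 'content': [{'text': prompt}]}],
        'guardrailConfig': {
            'guardrailIdentifier': guardrail_id,    # placeholder ID
            'guardrailVersion': guardrail_version,  # e.g. '1' or 'DRAFT'
        },
    }

request = build_guarded_request(
    'anthropic.claude-sonnet-4-6-v1',   # illustrative model ID
    'Summarize our refund policy.',
    'gr-example123',                    # placeholder guardrail ID
    '1',
)
# A boto3 'bedrock-runtime' client would then send it as:
#   client.converse(**request)
```

Because the guardrail is referenced by identifier rather than baked into the prompt, swapping the underlying model leaves the safety policy untouched.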
Knowledge Bases and RAG Pipelines
Knowledge Bases enable Retrieval Augmented Generation (RAG), the technique of grounding model responses in your organization's actual data. Rather than relying solely on a model's training data, Knowledge Bases connect your S3 data sources, automatically chunk and embed the documents, store the vectors in a managed vector store, and retrieve relevant context at query time.
As a result, the model's responses are grounded in your proprietary data (product catalogs, internal documentation, compliance policies, customer records) rather than general knowledge, which dramatically reduces hallucinations and makes the output genuinely useful for enterprise applications.
Knowledge Bases support multiple vector store backends, including Amazon OpenSearch Serverless, Amazon Aurora, Pinecone, and Redis Enterprise. Bedrock Data Automation handles document parsing for complex file types (PDFs, images, spreadsheets, and multi-page documents), extracting structured data that the Knowledge Base can index and retrieve. For organizations building internal search tools, customer support bots, or compliance assistants, Knowledge Bases provide the critical bridge between general AI capability and organization-specific knowledge.
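Querying a knowledge base at runtime goes through the `bedrock-agent-runtime` client's RetrieveAndGenerate API: Bedrock retrieves the most relevant chunks and grounds the model's answer in them. The knowledge base ID and model ARN below are placeholders.

```python
def build_rag_query(question, knowledge_base_id, model_arn):
    """Build a RetrieveAndGenerate request that grounds the answer in
    documents indexed by the knowledge base."""
    return {
        'input': {'text': question},
        'retrieveAndGenerateConfiguration': {
            'type': 'KNOWLEDGE_BASE',
            'knowledgeBaseConfiguration': {
                'knowledgeBaseId': knowledge_base_id,  # placeholder
                'modelArn': model_arn,                 # placeholder
            },
        },
    }

query = build_rag_query(
    'What is our data retention policy?',
    'KB1234567890',
    'arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-sonnet-4-6-v1',
)

def run_query(query):
    """Execute the query (requires AWS credentials)."""
    import boto3
    client = boto3.client('bedrock-agent-runtime', region_name='us-east-1')
    response = client.retrieve_and_generate(**query)
    return response['output']['text']
```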
Model Customization and Fine-Tuning
While Bedrock’s pre-trained models handle most use cases well out of the box, some applications require domain-specific customization. Bedrock supports two customization approaches. Fine-tuning trains a private copy of a model on your labeled dataset to improve performance on specific tasks. Continued pre-training exposes the model to your unlabeled domain data to improve its general understanding of your business context.
The Nova Forge SDK, launched in 2026, enables businesses to customize Amazon Nova models for their specific domain without requiring ML engineering expertise, bringing enterprise-grade model customization to a self-service workflow.
Intelligent Prompt Routing and Cost Controls
One of Bedrock’s most impactful cost optimization features is Intelligent Prompt Routing. Rather than sending every request to your most expensive model, this feature automatically routes prompts to the most cost-effective model within a family based on complexity. Simple queries go to the lighter model; complex ones go to the more capable one.
According to AWS, Intelligent Prompt Routing can reduce costs by up to 30% while maintaining quality. Combined with other optimization features, such as Model Distillation (up to 500% faster and 75% cheaper), prompt caching (up to 90% cost reduction), and batch processing (roughly 50% savings), Bedrock provides a comprehensive toolkit for controlling AI spend at scale.
Pricing Model and Cost Optimization
Bedrock uses a token-based, pay-per-use pricing model for text workloads. A token is approximately four characters or 0.75 words in English. Input tokens (your prompt) are always cheaper than output tokens (the model’s response), typically by a factor of 3–5x. For image and video models, pricing is per image generated or per second of video.
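The arithmetic is easy to sketch. The per-token prices below are purely illustrative placeholders (real rates vary by model; see the Bedrock pricing page); the point is the rough character-to-token conversion and the input/output price asymmetry.

```python
# Illustrative prices only -- real per-token rates vary by model and
# change over time; check the Bedrock pricing page for current figures.
INPUT_PRICE_PER_1K = 0.003   # $ per 1,000 input tokens (placeholder)
OUTPUT_PRICE_PER_1K = 0.015  # $ per 1,000 output tokens (placeholder, 5x input)

def estimate_tokens(text: str) -> int:
    """Rough heuristic: one token is about four characters of English."""
    return max(1, len(text) // 4)

def estimate_cost(prompt: str, expected_output_tokens: int) -> float:
    """Estimate the cost of a single request in dollars."""
    input_tokens = estimate_tokens(prompt)
    return (input_tokens / 1000) * INPUT_PRICE_PER_1K \
        + (expected_output_tokens / 1000) * OUTPUT_PRICE_PER_1K

# A 400-character prompt (~100 tokens) expecting a 500-token response:
cost = estimate_cost('x' * 400, 500)  # -> 0.0078 with these placeholder rates
```

Note how the output side dominates the bill even for a much shorter response, which is why capping `maxTokens` and keeping answers concise is itself a cost lever.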
Understanding the Five Pricing Modes
Rather than listing specific dollar amounts that change over time, here is how each pricing mode works and when to use it:
- On-Demand: Pay per token with no commitment. Most flexible but highest per-token cost. Best for experimentation, development, and variable workloads.
- Batch: Submit prompts asynchronously and receive results within 24 hours. Roughly 50% cheaper than on-demand for the same model. Ideal for content generation, data processing, and any non-real-time workload.
- Flex: Same savings as batch but through the standard API — no workflow restructuring required. Suitable for workloads that can tolerate slightly higher latency without going fully asynchronous.
- Priority: Higher throughput guarantee with a time-based commitment. Designed for production workloads that need consistent performance and availability.
- Provisioned Throughput: Dedicated capacity purchased in model units with a term commitment. Provides the most predictable latency and highest throughput, but at a fixed cost regardless of usage. Reserve this for high-volume, latency-critical production systems only.
Bedrock also supports cross-region inference, which automatically routes traffic across AWS Regions to avoid capacity constraints at no charge beyond the source Region’s pricing. This provides a valuable availability guarantee for production workloads without adding cost complexity.
Start with on-demand pricing to validate your use case. Once you have predictable traffic patterns, move to Flex or batch for immediate savings. Only commit to Provisioned Throughput when your volume and latency requirements justify the fixed cost. For current pricing by model, see the official Bedrock pricing page.
Strategies to Reduce Your Bedrock Bill
Based on our experience deploying Bedrock for enterprise clients, these strategies deliver the most significant cost reductions:
- Right-size your model selection: The cost difference between the most expensive and cheapest models on Bedrock spans over 400x, so using a frontier model for simple classification or extraction tasks wastes budget. Match model capability to task complexity; lightweight models handle routine tasks just as well at a fraction of the cost.
- Enable Intelligent Prompt Routing: Automatically routes each request to the most cost-effective model in a family. Saves up to 30% on inference costs with minimal quality impact.
- Use prompt caching: For applications that share common context across requests (system prompts, reference documents), prompt caching avoids reprocessing the same tokens repeatedly. Reductions of up to 90% on cached portions are possible.
- Distill models for production: If you have a high-volume, well-defined task, Model Distillation creates a smaller, faster model trained on the outputs of a larger one — up to 500% faster and 75% cheaper.
- Leverage batch and Flex tiers: Any workload that does not require sub-second responses should use batch (50% savings) or Flex for cost efficiency.
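To illustrate the prompt-caching strategy above: in the Converse API, a cache checkpoint is inserted after the shared context, and everything before the checkpoint is cached and reused across requests instead of being reprocessed. A sketch, with the document text and model ID as placeholders:

```python
SHARED_CONTEXT = 'Full text of the shared reference document...'  # placeholder

def build_cached_request(model_id, question):
    """Build a Converse request that caches the shared system context so
    repeated requests do not pay to reprocess the same tokens."""
    return {
        'modelId': model_id,
        'system': [
            {'text': 'Answer using only the provided document.'},
            {'text': SHARED_CONTEXT},
            # Everything above this checkpoint is cached and reused
            {'cachePoint': {'type': 'default'}},
        ],
        'messages': [
            {'role': 'user', 'content': [{'text': question}]}
        ],
    }

request = build_cached_request('anthropic.claude-sonnet-4-6-v1',
                               'What does section 3 say about refunds?')
```

Only the short user question varies per request; the expensive shared prefix is billed at the much lower cached rate after the first call.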
Amazon Bedrock Security and Compliance
For enterprise adoption, security is not optional; it is the prerequisite. Amazon Bedrock was designed for regulated industries from the ground up, and its security posture reflects that commitment.
Data Protection Architecture
Bedrock provides several critical security guarantees. First, your data is never used to train or improve the base foundation models; this is a fundamental architectural commitment, not just a policy. Second, all data is encrypted both in transit (TLS) and at rest (AWS KMS). Third, you can use AWS PrivateLink to establish private connectivity between your VPC and Bedrock, ensuring that inference traffic never traverses the public internet. Fourth, Bedrock maintains complete data isolation: your prompts, responses, and custom model data are not accessible to other customers or to the model providers themselves.
Additionally, Bedrock supports identity-based access control through IAM policies, allowing you to control which users, roles, and applications can access specific models. As of April 2026, Bedrock also supports cost allocation by IAM principal — meaning you can attribute inference costs to specific users, teams, projects, or applications using IAM tags in Cost Explorer and CUR 2.0. For organizations with multiple teams or business units consuming AI services, this capability is essential for cost governance and chargeback.
Furthermore, comprehensive monitoring and logging capabilities support governance and audit requirements. CloudWatch integration provides metrics on model invocations, token usage, latency, and error rates. CloudTrail records all API calls for security auditing. Together, these controls create a complete audit trail from request to response — a requirement for many regulated industries.
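As a sketch of what that monitoring looks like in practice, the query below pulls a week of daily input-token counts for one model from CloudWatch. The namespace, metric, and dimension names follow the AWS/Bedrock namespace; treat them as assumptions to verify against the current documentation.

```python
from datetime import datetime, timedelta, timezone

# CloudWatch query parameters for daily input-token usage of one model.
# Namespace/metric/dimension names are assumptions based on the
# AWS/Bedrock namespace; verify against current documentation.
end = datetime.now(timezone.utc)
params = {
    'Namespace': 'AWS/Bedrock',
    'MetricName': 'InputTokenCount',
    'Dimensions': [{'Name': 'ModelId',
                    'Value': 'anthropic.claude-sonnet-4-6-v1'}],
    'StartTime': end - timedelta(days=7),
    'EndTime': end,
    'Period': 86400,  # one datapoint per day
    'Statistics': ['Sum'],
}

def fetch_usage(params):
    """Run the query (requires AWS credentials)."""
    import boto3
    cloudwatch = boto3.client('cloudwatch', region_name='us-east-1')
    return cloudwatch.get_metric_statistics(**params)['Datapoints']
```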
Compliance Certifications
Currently, Amazon Bedrock is in scope for a comprehensive set of compliance standards: SOC 1, SOC 2, SOC 3, PCI DSS, ISO 27001, FedRAMP High, CSA STAR Level 2, HIPAA eligible, and GDPR. Furthermore, AWS offers an uncapped intellectual property (IP) indemnity for copyright claims arising from generative output of covered Amazon AI services — protecting customers from third-party copyright infringement claims when using Amazon’s own models responsibly.
What’s New in Amazon Bedrock (2025–2026)
The pace of innovation on Bedrock has been extraordinary. Here are the most significant updates from the past 18 months:
Bedrock has evolved from a text-first platform to a genuinely multimodal one. The current model roster spans language, code, vision, image generation, video generation (Nova Reel), audio, and safety workloads. In February 2026 alone, six new models were added: DeepSeek V3.2, MiniMax M2.1, GLM 4.7, GLM 4.7 Flash, Kimi K2.5, and Qwen3 Coder Next.
Real-World Amazon Bedrock Use Cases
Amazon Bedrock’s versatility makes it applicable across virtually every industry. Organizations across financial services, healthcare, retail, media, and technology are deploying Bedrock-powered solutions to solve real business problems, not just experimental proofs of concept. According to a 2025 Forrester TEI study, organizations implementing generative AI on AWS with partner support achieve an average 240% ROI and $16.5 million in benefits over three years.
Amazon Bedrock vs Azure OpenAI Service
If you are evaluating enterprise AI platforms across cloud providers, here is how Amazon Bedrock compares with Microsoft’s Azure OpenAI Service:
| Capability | Amazon Bedrock | Azure OpenAI Service |
|---|---|---|
| Model Diversity | ✓ ~100 models from 15+ providers | ◐ Primarily OpenAI models (GPT, DALL·E, Whisper) |
| Provider Lock-In | ✓ Multi-provider, swap models freely | ✕ Tightly coupled to OpenAI’s roadmap |
| Agentic Infrastructure | ✓ AgentCore with policy controls (GA) | ◐ Azure AI Agent Service (newer) |
| Safety Controls | ✓ Guardrails with Automated Reasoning | ✓ Azure AI Content Safety |
| RAG / Knowledge Bases | ✓ Managed Knowledge Bases with vector store | ✓ Azure AI Search integration |
| OpenAI API Compatibility | ✓ Via Project Mantle (2026) | ✓ Native OpenAI API format |
| Custom Model Training | ✓ Fine-tuning + continued pre-training | ✓ Fine-tuning for select models |
| Cost Optimization Tools | ✓ Distillation, routing, caching, Flex, batch | ◐ PTUs + standard on-demand |
| Ecosystem Integration | ✓ Deep AWS native integration | ✓ Deep Microsoft/Azure integration |
| Compliance | ✓ SOC, PCI, ISO, FedRAMP High, HIPAA | ✓ SOC, PCI, ISO, FedRAMP, HIPAA |
Making the Right Platform Decision
Ultimately, the choice depends on your cloud ecosystem and AI strategy. If you are an AWS-native organization wanting access to the broadest model selection with maximum flexibility, Bedrock is the stronger choice. Conversely, if your stack runs on Azure and you primarily need GPT-family models with deep Microsoft integration, Azure OpenAI is the natural fit.
Bedrock’s key advantage is model diversity and freedom from single-provider dependency, a critical consideration as the AI landscape continues to evolve rapidly. If OpenAI changes pricing, experiences outages, or deprecates a model, Azure OpenAI customers are directly impacted; Bedrock users can switch to an alternative provider with a configuration change. Bedrock’s cost optimization toolkit (Distillation, Intelligent Routing, caching, batch, Flex) is also more comprehensive than Azure’s current offerings, making it easier to control costs at scale.
Getting Started with Amazon Bedrock
You can make your first Bedrock API call in under five minutes. Here is a step-by-step walkthrough.
Enabling Model Access
First, navigate to the Amazon Bedrock console in the AWS Management Console and click Model access in the left navigation. Select the models you want to enable; by default, no models are accessible until you explicitly grant access, and some (like Claude) require you to accept the provider’s end-user license agreement. Enabling access itself costs nothing; you only pay when you make inference calls.
Your First API Call
Below is a minimal Python example using the Converse API with Boto3. Before running this code, ensure you have the AWS CLI configured with appropriate credentials and the boto3 library installed (`pip install boto3`):

```python
import boto3

# Initialize the Bedrock Runtime client
client = boto3.client('bedrock-runtime', region_name='us-east-1')

# Send a request using the Converse API
response = client.converse(
    modelId='anthropic.claude-sonnet-4-6-v1',
    messages=[
        {
            'role': 'user',
            'content': [{'text': 'Explain Amazon Bedrock in two sentences.'}]
        }
    ]
)

# Print the response
print(response['output']['message']['content'][0]['text'])
```
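For interactive applications you will usually want the reply streamed token by token rather than returned in one blocking response. A sketch using `converse_stream`; the helper simply concatenates the text deltas from the event stream.

```python
def collect_text(events):
    """Concatenate text deltas from a Converse event stream."""
    parts = []
    for event in events:
        delta = event.get('contentBlockDelta', {}).get('delta', {})
        if 'text' in delta:
            parts.append(delta['text'])
    return ''.join(parts)

def stream_reply(prompt):
    """Stream a reply (requires AWS credentials and model access)."""
    import boto3
    client = boto3.client('bedrock-runtime', region_name='us-east-1')
    response = client.converse_stream(
        modelId='anthropic.claude-sonnet-4-6-v1',
        messages=[{'role': 'user', 'content': [{'text': prompt}]}],
    )
    return collect_text(response['stream'])

# collect_text works on any iterable of Converse stream events:
sample_events = [
    {'contentBlockDelta': {'delta': {'text': 'Hello, '}}},
    {'contentBlockDelta': {'delta': {'text': 'world.'}}},
    {'messageStop': {'stopReason': 'end_turn'}},
]
assert collect_text(sample_events) == 'Hello, world.'
```

In a real UI you would print or render each delta as it arrives instead of accumulating them.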
Alternatively, if your team already uses the OpenAI Python SDK, you can use Bedrock’s OpenAI-compatible endpoint via Project Mantle without restructuring your code:

```python
from openai import OpenAI

# Configure base_url and api_key to point the client at Bedrock's
# OpenAI-compatible (Project Mantle) endpoint for your Region
client = OpenAI()

response = client.chat.completions.create(
    model="anthropic.claude-sonnet-4-6-v1",
    messages=[
        {"role": "user", "content": "Explain Amazon Bedrock in two sentences."}
    ]
)

print(response.choices[0].message.content)
```
For a no-code experience, the Bedrock console includes playgrounds for text, chat, and image generation where you can test models interactively before writing any code.
Best Practices and Common Pitfalls
Based on our experience building production AI applications on Bedrock for enterprise clients across industries, these are the practices that consistently determine success or failure.
Recommendations for Production Deployment
- Start with the Converse API: Unless you have a specific reason to use InvokeModel, the Converse API provides a consistent interface across all text models and simplifies future model switching. It handles provider-specific formatting differences automatically, so your code remains clean as you experiment with different models.
- Enable Guardrails from day one: Retrofitting safety controls after launch is significantly harder than building them in from the start. Define your content policies, PII redaction rules, and denied topics before deploying to production.
- Implement cost monitoring early: Use CloudWatch metrics, the new IAM cost allocation tags, and S3 logging to track token consumption by team, project, and application, so you can identify cost spikes before they become budget problems.
Architectural Best Practices
- Design for model portability: Structure your application so that switching models requires a configuration change, not a code rewrite. This is one of Bedrock’s greatest architectural advantages; do not negate it by hardcoding model-specific logic throughout your application.
- Test with the playground before coding: Use the Bedrock console playgrounds to evaluate model quality, test prompts, and compare outputs before committing to integration work. This saves significant development time by validating assumptions early.
- Build evaluation pipelines: Use Bedrock’s model evaluation capabilities to systematically compare models on your actual tasks. Rather than selecting a model based on benchmarks alone, evaluate it on representative samples of your real-world data to ensure it meets your quality requirements at the right cost point.
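The portability advice above can be as simple as keeping model IDs in configuration rather than code. A minimal sketch, with illustrative tier names and model IDs:

```python
# Model selection lives in configuration, not code, so switching
# providers is a config change. Tier names and IDs are illustrative.
MODEL_TIERS = {
    'light': 'amazon.nova-micro-v1:0',
    'standard': 'anthropic.claude-sonnet-4-6-v1',
    'frontier': 'anthropic.claude-opus-4-v1',
}

def model_for(tier: str) -> str:
    """Resolve a capability tier to the currently configured model ID."""
    return MODEL_TIERS[tier]

def ask(client, tier, prompt):
    """Application code depends only on the tier, never on a provider."""
    return client.converse(
        modelId=model_for(tier),
        messages=[{'role': 'user', 'content': [{'text': prompt}]}],
    )
```

Swapping Llama in for Claude at the `standard` tier then touches one line of configuration, and because the Converse API normalizes request shapes, no call sites change.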
The single most common mistake we see is organizations defaulting every request to a frontier model (like Claude Opus or GPT-4) when a lighter model would produce equally good results at a fraction of the cost. Specifically, tasks like classification, extraction, summarization, and routing rarely need the most capable model. Start by testing your workload on the lightest model available, then move up only when quality genuinely requires it.
Amazon Bedrock eliminates the infrastructure complexity of generative AI, but the strategic complexity remains. Choosing the right models, designing effective prompts, implementing safety guardrails, and optimizing costs all require hands-on expertise. This is exactly where working with an experienced AWS partner accelerates your time to production and protects your investment.