
Amazon Comprehend: The Complete Guide to AWS NLP

Amazon Comprehend extracts insights from unstructured text — sentiment, entities, key phrases, PII, and custom classifications — through simple API calls with no ML expertise required. This guide covers all core APIs, custom classification and entity recognition, PII redaction, toxicity detection, pricing, the April 2026 maintenance mode announcement, and a comparison with Azure AI Language.


What Is Amazon Comprehend?

Nearly every organization sits on mountains of unstructured text: customer emails, support tickets, product reviews, social media mentions, legal contracts, medical records, and survey responses. Extracting actionable insights from this text manually is slow, inconsistent, and expensive. Amazon Comprehend automates much of that work with machine learning.

Amazon Comprehend is a fully managed natural language processing (NLP) service from Amazon Web Services that uses machine learning to extract insights, relationships, and meaning from unstructured text. Without any ML expertise, you can analyze text for sentiment, extract named entities (people, places, organizations), identify key phrases, detect the dominant language, classify documents into custom categories, and detect or redact personally identifiable information (PII).

Amazon Comprehend is not a single API. It is a suite of pre-trained and customizable NLP capabilities that work together to turn raw text into structured, actionable data. Whether you need to gauge customer sentiment across thousands of reviews, automatically categorize support tickets by topic, or redact PII from documents before indexing, Comprehend handles the ML complexity behind the scenes while you focus on business logic.

Amazon Comprehend by the Numbers

  • 100+ languages: detected by the language detection API
  • 50K units: monthly free tier
  • 50% faster: document review (HMLR)

Amazon Comprehend powers NLP workloads across industries: ExxonMobil uses it for procurement classification in the energy sector, Chick-fil-A uses it to detect foodborne illness signals in food service, HMLR doubled its document review speed in legal document processing, and the retailer schuh uses sentiment analysis to prioritize customer support tickets. Comprehend also integrates natively with other AWS services including S3, Lambda, SageMaker, QuickSight, and Augmented AI (A2I) for human review workflows.

Amazon Comprehend also plays a central role in the broader AWS AI ecosystem. While Amazon Textract extracts text from documents and Amazon Rekognition analyzes images, Comprehend is the service that interprets what the extracted text means by identifying sentiment, entities, topics, and privacy-sensitive information. For organizations building intelligent document processing pipelines, Comprehend is typically the analysis layer that sits between raw text extraction and business decision-making.

Key Takeaway

Amazon Comprehend turns unstructured text into structured insights — sentiment scores, named entities, key phrases, language detection, PII identification, and custom classifications — through simple API calls. If your organization processes text at scale and needs to extract meaning from it, Comprehend is the fastest path to production NLP on AWS.


How Amazon Comprehend Works

Amazon Comprehend operates as a serverless API service. You send text, specify which analysis you want, and receive structured JSON results, all without provisioning servers or training models. Synchronous calls accept up to 100 KB per request for most operations, but some operations such as sentiment analysis have lower limits, so check the current quotas for the API you use. Larger collections go through asynchronous batch jobs that read files from S3. Under the hood, Comprehend’s pre-trained models are built on deep learning architectures trained by AWS on massive, diverse text corpora. Consequently, they deliver strong accuracy across a wide range of text types and domains without customization.
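Because synchronous request size is capped in bytes rather than characters, long documents usually need to be split before they are sent. The following is a minimal sketch of one way to do that; the function name split_by_bytes is hypothetical, and the max_bytes value you pass is an assumption to verify against the current Comprehend quotas for the operation you call.

```python
def split_by_bytes(text: str, max_bytes: int) -> list[str]:
    """Split text into chunks whose UTF-8 encoding is at most max_bytes.

    Breaks on whitespace where possible (runs of whitespace collapse to a
    single space) and hard-splits any single word longer than the limit.
    """
    if max_bytes < 4:
        raise ValueError("max_bytes must be at least 4 (one UTF-8 character)")

    chunks: list[str] = []
    current = ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        if len(candidate.encode("utf-8")) <= max_bytes:
            current = candidate
            continue
        # The word does not fit: close the current chunk and start a new one.
        if current:
            chunks.append(current)
        piece = ""
        for ch in word:
            # Only triggers for words that exceed the limit on their own.
            if len((piece + ch).encode("utf-8")) > max_bytes:
                chunks.append(piece)
                piece = ch
            else:
                piece += ch
        current = piece
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be analyzed separately. Keep in mind that document-level results such as sentiment are computed per chunk, so how you aggregate them is a design decision for your application.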

Core Amazon Comprehend APIs

Currently, Amazon Comprehend provides several specialized NLP APIs:

  • Sentiment Analysis: Determines whether text is positive, negative, neutral, or mixed, and returns a confidence score for each category. Widely used for analyzing customer feedback, product reviews, social media monitoring, and brand reputation tracking.
  • Entity Recognition: Identifies and categorizes named entities in text, such as people, organizations, locations, dates, and quantities. Essential for extracting structured data from unstructured documents.
  • Key Phrase Extraction: Identifies the most important phrases and talking points in text. Useful for summarizing documents, powering search, and identifying trending topics across large text corpora.
  • Language Detection: Identifies the dominant language in text across more than 100 languages and returns a confidence score for the detected language. Useful for routing multilingual content to the appropriate processing pipeline.
  • PII Detection and Redaction: Identifies personally identifiable information (names, addresses, email addresses, phone numbers, credit card numbers, Social Security numbers) and can redact it automatically. Supports GDPR and HIPAA compliance workflows.
  • Toxicity Detection: Flags toxic content in text. Designed for moderating peer-to-peer conversations on online platforms and for filtering generative AI inputs and outputs.
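The APIs listed above are often chained: detect the language first, then pass its code to the language-dependent calls. Below is a minimal sketch of that pattern. The function name analyze_text is hypothetical, and the client argument is assumed to be a boto3 Comprehend client, for example boto3.client('comprehend', region_name='us-east-1').

```python
from typing import Any


def analyze_text(client: Any, text: str) -> dict:
    """Detect the dominant language, then run sentiment and entity detection."""
    languages = client.detect_dominant_language(Text=text)["Languages"]
    # Pick the candidate language with the highest confidence score.
    top = max(languages, key=lambda lang: lang["Score"])
    code = top["LanguageCode"]

    # These calls raise an error if the detected language is not supported
    # by the operation, so production code should handle that case.
    sentiment = client.detect_sentiment(Text=text, LanguageCode=code)
    entities = client.detect_entities(Text=text, LanguageCode=code)

    return {
        "language": code,
        "sentiment": sentiment["Sentiment"],
        "entities": [(e["Type"], e["Text"]) for e in entities["Entities"]],
    }
```

Passing the client in as a parameter keeps the function easy to test with a stub and lets the caller control region and credentials.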

Custom Models in Amazon Comprehend

Beyond the pre-trained APIs, Amazon Comprehend supports two types of custom models that let you tailor NLP to your specific domain:

  • Custom Classification: Train a model to categorize text into your own defined categories. For example, classify customer support tickets as “billing,” “technical,” “shipping,” or “returns,” or categorize legal documents by type, clause, or risk level.
  • Custom Entity Recognition: Train a model to detect entities specific to your business that the pre-trained entity recognition does not cover. For example, detect product SKUs, internal project codes, proprietary terminology, or industry-specific identifiers in your documents.

Both custom model types use Comprehend’s AutoML pipeline: you provide labeled training data, and Comprehend handles feature engineering, model selection, hyperparameter tuning, and deployment automatically. You still need representative, well-labeled data, but no ML expertise is required to build and deploy custom NLP models.

Furthermore, custom models can be deployed in two ways. For real-time applications requiring low-latency responses, deploy the model to a persistent endpoint that processes requests synchronously. For batch workloads, use asynchronous inference to process large document collections stored in S3. The choice between these deployment modes has significant cost implications — persistent endpoints bill continuously regardless of traffic, while batch inference only charges for documents processed.

Additionally, Comprehend supports multi-label classification, where a single document can belong to multiple categories simultaneously. For example, a customer email might be classified as both “billing inquiry” and “cancellation risk” — enabling routing workflows that address multiple concerns in a single interaction rather than forcing each document into a single category.
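The routing idea above can be sketched as a thin wrapper around a real-time endpoint call. This is a minimal sketch under stated assumptions: the function name labels_above_threshold and the 0.5 default threshold are hypothetical, the client is assumed to be a boto3 Comprehend client, and endpoint_arn stands for the ARN of a custom classifier endpoint you have already deployed.

```python
from typing import Any


def labels_above_threshold(
    client: Any, text: str, endpoint_arn: str, threshold: float = 0.5
) -> list[str]:
    """Return category names whose confidence meets the threshold, best first."""
    response = client.classify_document(Text=text, EndpointArn=endpoint_arn)
    # Multi-label models return "Labels"; multi-class models return "Classes".
    candidates = response.get("Labels") or response.get("Classes") or []
    kept = [c for c in candidates if c["Score"] >= threshold]
    kept.sort(key=lambda c: c["Score"], reverse=True)
    return [c["Name"] for c in kept]
```

An email that comes back as ['billing inquiry', 'cancellation risk'] can then be routed to both workflows. The right threshold depends on your model and on how costly a wrong route is, so tune it against held-out data.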


Core Amazon Comprehend Features

Beyond the APIs described above, several capabilities make Amazon Comprehend particularly powerful for enterprise text processing. These features work together to handle virtually any text analysis requirement — from simple sentiment scoring on individual reviews to complex multi-language entity extraction across millions of documents. Below are the capabilities organized by function:

  • PII Detection and Redaction: Automatically detects personal, financial, and technical PII in text, including names, addresses, credit card numbers, SSNs, IP addresses, and passwords. Can redact detected PII in place, replacing it with character strings for GDPR and HIPAA compliance.
  • Toxicity Detection: Pre-trained classifier that identifies toxic content in text, designed for moderating online conversations and screening generative AI inputs/outputs. Available out of the box with no training required.
  • Custom Classification: Train custom text classifiers on your labeled data to categorize documents into business-specific categories. Handles support ticket routing, document categorization, content tagging, and compliance classification.
  • Custom Entity Recognition: Train models to detect domain-specific entities (product codes, policy numbers, medical terms, legal citations) that pre-trained models do not recognize. Requires labeled training data but no ML expertise.
  • Batch Processing: Process large document collections asynchronously by submitting batch jobs against S3-stored text files. Results are delivered to S3 when processing completes. Optimized for high-volume text analysis workflows.
  • Multi-Language Support: Language detection across 100+ languages. Sentiment analysis, entity recognition, and key phrase extraction support multiple languages including English, Spanish, French, German, Italian, Portuguese, Japanese, Korean, Chinese, Arabic, and Hindi.
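For synchronous calls, PII detection returns the type and character offsets of each finding, and applying the redaction is left to your code. The sketch below shows one way to do that. The function name redact and the 0.5 score cutoff are hypothetical, and the entities argument is assumed to be the 'Entities' list returned by detect_pii_entities, with non-overlapping spans.

```python
def redact(text: str, entities: list[dict], min_score: float = 0.5) -> str:
    """Replace each detected PII span with its type in brackets, e.g. [EMAIL]."""
    # Work from right to left so earlier offsets stay valid after each edit.
    for ent in sorted(entities, key=lambda e: e["BeginOffset"], reverse=True):
        if ent["Score"] < min_score:
            continue
        text = text[: ent["BeginOffset"]] + f"[{ent['Type']}]" + text[ent["EndOffset"]:]
    return text
```

With a boto3 Comprehend client, the entities would come from client.detect_pii_entities(Text=text, LanguageCode='en')['Entities']. Replacing with the entity type rather than a fixed mask keeps the redacted text readable for downstream analysis.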



Amazon Comprehend Pricing Model

Amazon Comprehend uses pay-per-unit pricing based on the volume of text processed, with no minimum commitments or upfront fees. Rather than listing specific dollar amounts that change over time, here is how the cost structure works and what dimensions to watch for when estimating your monthly spend:

Understanding Amazon Comprehend Costs

  • Pre-trained API calls: Charged per unit of text (1 unit = 100 characters, with a minimum of 3 units per request). Separate rates apply for each API: sentiment, entities, key phrases, language detection, PII, and syntax. Volume-tiered pricing reduces per-unit costs as monthly usage increases.
  • Custom model training: Charged for the compute time used to train custom classifiers and custom entity recognizers. Training costs are one-time per model version.
  • Custom model inference: Charged per unit of text processed when using custom models for classification or entity recognition. Alternatively, you can deploy a custom model to a persistent endpoint (charged per second of uptime) for low-latency, real-time inference.
  • Batch processing: Asynchronous batch jobs use the same per-unit pricing as synchronous calls but are optimized for processing large document collections stored in S3.
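The unit rule above (100 characters per unit, 3-unit minimum per request) can be turned into a quick estimator. This is a rough sketch: the function names are hypothetical, and rounding partial units up is an assumption to confirm against the official pricing page.

```python
import math


def billable_units(char_count: int) -> int:
    """Units for one request: 100 characters per unit, minimum of 3 units."""
    # Assumption: a partial unit is rounded up to a whole unit.
    return max(3, math.ceil(char_count / 100))


def monthly_units(doc_lengths: list[int], apis_per_doc: int = 1) -> int:
    """Total units when every document is sent to apis_per_doc separate APIs."""
    return sum(billable_units(n) for n in doc_lengths) * apis_per_doc
```

For example, 10,000 reviews of 550 characters each come to 6 units per request, or 60,000 units for a single API. Running the same reviews through three APIs triples that to 180,000 units, which is why calling only the APIs you need matters.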
Cost Optimization Strategy

Only call the APIs you need: do not run sentiment, entities, and key phrases on every document if you only need one. Use batch processing for large document sets to optimize throughput. For custom models, evaluate whether real-time endpoints (continuous cost) or batch inference (cost per unit processed) better matches your access pattern. The free tier provides 50,000 units per month for the first 12 months. For current pricing, see the official Comprehend pricing page.


Amazon Comprehend Security and Compliance

Since Comprehend processes potentially sensitive text data — customer communications, medical records, financial documents — security is critical.

Data sent to Amazon Comprehend is encrypted in transit with TLS, and AWS KMS keys can encrypt data at rest, such as batch job output and custom model artifacts. For synchronous calls, Comprehend analyzes the input text and returns structured results with confidence scores; it does not act as a document store. Review the AWS data privacy FAQ and the AI services opt-out policy to confirm how input text may be retained for your account. Comprehend also supports VPC endpoints via PrivateLink for private connectivity, and IAM policies provide fine-grained control over API access.

Additionally, Amazon Comprehend is HIPAA eligible, making it suitable for healthcare organizations processing clinical notes, patient communications, and medical records. It also supports SOC 1/2/3, PCI DSS, and ISO 27001 compliance standards. The PII detection and redaction capabilities directly support GDPR compliance by automatically identifying and masking personal data before it enters downstream systems — a proactive approach that prevents privacy violations rather than discovering them after the fact.

For organizations processing text in regulated environments, Comprehend’s ability to operate entirely within a single AWS Region helps meet data residency requirements. Combined with KMS customer-managed keys for encryption and CloudTrail logging of API calls, organizations keep control and auditability over how their text data is processed.


What’s New in Amazon Comprehend

Important: Feature Maintenance Mode

As of April 30, 2026, Amazon Comprehend’s Topic Modeling, Event Detection, and Prompt Safety Classification features are entering maintenance mode. These features will no longer be available to new customers after this date. Existing customers who have used them in the last 12 months retain access. Core Comprehend APIs — sentiment analysis, entity recognition, key phrase extraction, language detection, PII detection/redaction, toxicity detection, custom classification, and custom entity recognition — remain fully active and supported.

Despite the maintenance mode announcement for select features, Comprehend’s core capabilities continue to evolve. Specifically, recent additions include toxicity detection for online content moderation and generative AI output screening, enhanced PII detection with broader entity type coverage, and improved accuracy across multilingual sentiment analysis. Furthermore, the integration between Comprehend and Amazon Bedrock Guardrails enables organizations to apply NLP-based content safety checks directly within their generative AI applications.

Additionally, Comprehend’s custom classification and entity recognition models have received performance improvements, with faster training times and higher accuracy on smaller training datasets. For organizations building multi-service AI pipelines, the ability to chain Comprehend with Textract (for document OCR), Translate (for multilingual processing), and SageMaker (for custom ML models) through Lambda orchestration creates a powerful, modular text processing architecture that can handle virtually any document analysis workflow.


Real-World Amazon Comprehend Use Cases

Given its versatility, Amazon Comprehend serves organizations across every industry that processes text at scale — from retail and financial services to healthcare, legal, and government. According to AWS customer case studies, organizations report significant efficiency gains: HMLR doubled document review speed, schuh automated support ticket prioritization that was previously entirely manual, and ExxonMobil improved procurement contract utilization through automated text classification. Below are the use cases we implement most frequently for our clients:

  • Customer Sentiment Analysis: Analyze customer reviews, survey responses, and social media mentions to gauge satisfaction, detect negative trends, and identify improvement opportunities. schuh uses Comprehend to color-code support tickets by sentiment before agents even log in.
  • Support Ticket Classification: Automatically categorize incoming support tickets by topic, urgency, and sentiment using custom classifiers. Route tickets to the most appropriate agent based on expertise, reducing resolution time and improving customer satisfaction.
  • PII Redaction for Compliance: Detect and redact personally identifiable information from documents before indexing, sharing, or archiving. Essential for GDPR, HIPAA, and data privacy compliance workflows across customer communications and internal records.
  • Legal Document Analysis: Extract entities, key phrases, and relationships from contracts and legal documents. HMLR uses Comprehend to compare thousands of property transfer documents weekly, doubling review speed and flagging discrepancies that could become legal disputes.
  • Supply Chain Compliance: Extract business-specific entities from compliance documents at scale. Combined with Amazon Textract for OCR and A2I for human review, Comprehend powers intelligent document processing pipelines for supply chain risk assessment.
  • Content Moderation: Screen user-generated content and generative AI outputs for toxicity using Comprehend’s toxicity detection API. Integrate with Amazon Bedrock Guardrails for end-to-end content safety in AI-powered applications.
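The content moderation use case can be sketched with the toxicity detection call, which scores a list of text segments in one request. The function name flag_toxic and the 0.5 threshold are hypothetical, the client is assumed to be a boto3 Comprehend client, and results are assumed to come back in the same order as the input segments. Per-request limits on segment count and size apply, so check the current quotas.

```python
from typing import Any


def flag_toxic(client: Any, segments: list[str], threshold: float = 0.5) -> list[dict]:
    """Return the segments whose overall toxicity score meets the threshold."""
    response = client.detect_toxic_content(
        TextSegments=[{"Text": s} for s in segments],
        LanguageCode="en",
    )
    flagged = []
    for segment, result in zip(segments, response["ResultList"]):
        if result["Toxicity"] < threshold:
            continue
        # Report the highest-scoring category label, if any were returned.
        top = max(result.get("Labels", []), key=lambda lab: lab["Score"], default=None)
        flagged.append({
            "text": segment,
            "toxicity": result["Toxicity"],
            "top_label": top["Name"] if top else None,
        })
    return flagged
```

Flagged segments can be blocked outright or queued for human review, depending on how strict the platform needs to be.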

Amazon Comprehend vs Azure Text Analytics

If you are evaluating NLP services across cloud providers, here is how Amazon Comprehend compares with Microsoft’s Azure AI Language (formerly Text Analytics):

Capability | Amazon Comprehend | Azure AI Language
Sentiment Analysis | Yes: document and sentence level | Yes: document, sentence, and aspect level
Entity Recognition | Yes: pre-trained + custom entity models | Yes: pre-built + custom NER
Key Phrase Extraction | Yes: with confidence scores | Yes: with confidence scores
Language Detection | Yes: 100+ languages | Yes: 120+ languages
PII Detection/Redaction | Yes: detect + automatic redaction | Yes: detect + redaction
Custom Classification | Yes: AutoML with labeled data | Yes: custom text classification
Toxicity Detection | Yes: pre-trained classifier | Yes: Azure Content Safety
Healthcare NLP | Partial: via Comprehend Medical (separate) | Yes: built-in healthcare NLP models
Ecosystem Integration | Yes: S3, Lambda, SageMaker, QuickSight | Yes: Blob Storage, Functions, Power BI
Compliance | Yes: HIPAA, SOC, PCI, ISO | Yes: HIPAA, SOC, PCI, ISO

Choosing the Right Amazon Comprehend Alternative

Both services offer mature NLP capabilities, so your cloud ecosystem usually determines the best fit. If you build on AWS, Comprehend’s native integration with S3, Lambda, and SageMaker makes it the natural choice. Conversely, if your infrastructure runs on Azure, Azure AI Language integrates natively with Azure Functions, Power BI, and Cognitive Services.

Azure holds an advantage in healthcare NLP with built-in clinical models, while Comprehend’s healthcare capabilities require the separate Comprehend Medical service. However, Comprehend’s custom classification and entity recognition via AutoML is particularly streamlined: you provide labeled data and Comprehend handles the training complexity. That makes it accessible to teams without ML expertise, a significant advantage for organizations that want to deploy NLP quickly without hiring specialized data scientists.

Furthermore, for organizations considering the broader AI landscape, Amazon Bedrock with foundation models like Claude can now handle many NLP tasks that previously required Comprehend — sentiment analysis, entity extraction, and text classification can all be performed through natural language prompts. For new projects, evaluate whether Comprehend’s purpose-built APIs or Bedrock’s flexible generative AI approach better matches your use case. Comprehend excels at high-volume, structured NLP tasks with predictable per-unit pricing, while Bedrock offers more flexibility for complex, open-ended text understanding at token-based pricing.


Getting Started with Amazon Comprehend

Amazon Comprehend requires no model training or infrastructure setup for the pre-trained APIs. With AWS credentials that have Comprehend permissions, you call the API with your text and receive structured results immediately. The free tier provides 50,000 units per month for the first 12 months, enough to process thousands of documents for evaluation and prototyping before committing to production workloads.

Your First Amazon Comprehend API Call

Below is a minimal Python example that performs sentiment analysis:

import boto3

# Initialize the Comprehend client
client = boto3.client('comprehend', region_name='us-east-1')

# Analyze sentiment
response = client.detect_sentiment(
    Text='The new product launch exceeded our expectations. '
         'Customer response has been overwhelmingly positive.',
    LanguageCode='en'
)

# Print results
print(f"Sentiment: {response['Sentiment']}")
for key, score in response['SentimentScore'].items():
    print(f"  {key}: {score:.2f}")

You can extend this pattern to the other Comprehend APIs: replace detect_sentiment with detect_entities, detect_key_phrases, detect_pii_entities, or detect_dominant_language depending on your use case. Note that detect_dominant_language takes only Text, with no LanguageCode. For production deployments, trigger Comprehend from Lambda functions when new text arrives in S3 or SQS. This creates an automated, event-driven text analysis pipeline that scales with incoming volume without manual intervention or infrastructure management. For more details, see the Amazon Comprehend documentation.
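The event-driven pattern can be sketched as a small handler for S3 "object created" notifications. This is an illustrative sketch, not a production handler: the function name handle_s3_event is hypothetical, the s3 and comprehend arguments are assumed to be boto3 clients, and it assumes each object is a UTF-8 text file small enough for a single synchronous call.

```python
import urllib.parse
from typing import Any


def handle_s3_event(
    event: dict, s3: Any, comprehend: Any, language_code: str = "en"
) -> list[dict]:
    """Run sentiment analysis on every object referenced in an S3 event."""
    results = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
        response = comprehend.detect_sentiment(Text=body, LanguageCode=language_code)
        results.append({"key": key, "sentiment": response["Sentiment"]})
    return results


# In a Lambda function, the clients would be created once at module level:
#   import boto3
#   s3 = boto3.client("s3")
#   comprehend = boto3.client("comprehend")
#   def lambda_handler(event, context):
#       return handle_s3_event(event, s3, comprehend)
```

Creating the clients outside the handler lets Lambda reuse them across invocations, and passing them in as arguments keeps the core logic testable without AWS access.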


Amazon Comprehend Best Practices and Pitfalls

Advantages
No ML expertise required — production-grade NLP via API calls
PII detection and redaction supports GDPR and HIPAA compliance
Custom classification and entity recognition via AutoML
Language detection across 100+ languages
Toxicity detection for content moderation and GenAI safety
Deep AWS integration with S3, Lambda, SageMaker, and QuickSight
Limitations
Topic modeling, event detection, and prompt safety entering maintenance mode
Custom model endpoints incur continuous costs when deployed
Not all NLP features support all languages — check language availability per API
Healthcare NLP requires separate Comprehend Medical service
Per-unit pricing can be complex to estimate for variable text volumes

Recommendations for Amazon Comprehend Deployment

  • Start with pre-trained APIs before building custom models: The pre-trained sentiment, entity, and key phrase APIs handle most common NLP tasks out of the box. Only invest in custom classification or entity recognition when the pre-trained capabilities genuinely do not meet your accuracy requirements.
  • Use PII redaction as a preprocessing step: Run PII detection and redaction on text before storing, indexing, or sharing it with downstream systems. This supports compliance by default rather than retroactively discovering sensitive data in your pipelines.
  • Batch process large document sets: For analyzing thousands of documents, use asynchronous batch jobs against S3-stored text files rather than individual synchronous API calls. Batch processing optimizes throughput and simplifies error handling.
  • Combine Comprehend with Textract for document pipelines: Use Textract to extract text from scanned documents and PDFs, then pipe the extracted text into Comprehend for entity recognition, sentiment analysis, and classification. This combination powers complete intelligent document processing workflows.
  • Monitor custom endpoint costs: Custom model endpoints run continuously and bill per second of uptime. If your workload is intermittent, use batch inference instead of persistent endpoints to avoid paying for idle capacity.
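The batch recommendation above maps to Comprehend's asynchronous job APIs. Below is a minimal sketch that starts a sentiment detection job and polls until it finishes. The function name run_sentiment_job, the 30-second poll interval, and the one-document-per-line input format are assumptions, and role_arn stands for an IAM role that lets Comprehend read the input location and write to the output location.

```python
import time
from typing import Any, Callable


def run_sentiment_job(
    client: Any,
    input_s3_uri: str,
    output_s3_uri: str,
    role_arn: str,
    poll_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Start an asynchronous sentiment job and wait for a terminal status."""
    job = client.start_sentiment_detection_job(
        InputDataConfig={"S3Uri": input_s3_uri, "InputFormat": "ONE_DOC_PER_LINE"},
        OutputDataConfig={"S3Uri": output_s3_uri},
        DataAccessRoleArn=role_arn,
        LanguageCode="en",
    )
    while True:
        props = client.describe_sentiment_detection_job(JobId=job["JobId"])[
            "SentimentDetectionJobProperties"
        ]
        if props["JobStatus"] in ("COMPLETED", "FAILED", "STOPPED"):
            return props
        sleep(poll_seconds)
```

Polling is the simplest approach for a script. In a larger pipeline you would more likely start the job, return immediately, and check the status later from a scheduled task so that nothing sits idle waiting.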
Key Takeaway

Amazon Comprehend transforms unstructured text into structured business intelligence — sentiment scores, named entities, key phrases, PII locations, and custom classifications. The key to successful deployment is matching the right API to each use case, combining Comprehend with Textract for document processing pipelines, and using PII redaction proactively for compliance. An experienced AWS partner can help you design NLP architectures that maximize insight extraction while controlling costs.



Frequently Asked Questions About Amazon Comprehend

Common Questions Answered
What is Amazon Comprehend used for?
Amazon Comprehend is used for extracting insights from unstructured text data. Common use cases include sentiment analysis of customer feedback, entity extraction from documents, automated support ticket classification, PII detection and redaction for compliance (GDPR, HIPAA), content moderation via toxicity detection, legal document analysis, and supply chain compliance document processing. It works through simple API calls with no ML expertise required.
What is the difference between Amazon Comprehend and Amazon Textract?
Amazon Textract extracts text from documents — it handles the OCR and structural understanding (tables, forms, key-value pairs) from images and PDFs. In contrast, Amazon Comprehend analyzes text that has already been extracted — determining sentiment, identifying entities, extracting key phrases, and detecting PII. Therefore, they are complementary services often used together: Textract extracts the text, then Comprehend analyzes it for meaning and insights.
Does Amazon Comprehend support healthcare data?
Yes, through two mechanisms. First, Amazon Comprehend itself is HIPAA eligible, so it can process protected health information (PHI) with appropriate compliance controls. Second, Amazon Comprehend Medical is a separate, specialized service that extracts medical entities — conditions, medications, dosages, procedures, anatomical terms — from clinical text. Choose standard Comprehend for general NLP on healthcare communications and Comprehend Medical for clinical entity extraction.

Technical and Pricing Questions

Is Amazon Comprehend free?
Not entirely. Comprehend offers a free tier for the first 12 months, providing 50,000 units of text per month for most pre-trained APIs. Beyond the free tier, it uses pay-per-unit pricing based on text volume, with costs decreasing at higher volumes through tiered pricing. Custom model training and inference have separate pricing. For current pricing and detailed rates, visit the official Comprehend pricing page.
How many languages does Amazon Comprehend support?
Currently, language detection works across over 100 languages. However, other NLP features have varying language coverage. Sentiment analysis, entity recognition, and key phrase extraction support major languages including English, Spanish, French, German, Italian, Portuguese, Japanese, Korean, Chinese, Arabic, and Hindi. Therefore, check the AWS documentation for the complete list of supported languages per API, as coverage varies by feature.