Penetration testing (pen testing) is an authorised, simulated cyberattack against a computer system, network, or application to evaluate its security posture. Security professionals use the same tools and techniques as attackers to find vulnerabilities and demonstrate their business impact, and the test produces a remediation roadmap that security teams can act on. Pen testing differs from vulnerability assessment in depth, intent, and output. It also features in compliance programmes built around frameworks such as PCI DSS, FedRAMP, and HIPAA. This article translates authoritative guidance from NIST, OWASP, PCI DSS, and CISA into a practical procurement framework for buyers evaluating pen testing engagements.
Penetration Testing: A Working Definition
Penetration testing has a precise definition rooted in published standards. NIST Special Publication 800-115 provides the authoritative framework used by US federal agencies and many regulators worldwide. This definition shapes procurement decisions across regulated industries and gives buyers and providers a common vocabulary when scoping engagements. Because vendor marketing has blurred the boundaries between pen testing and adjacent services, the standards-based definition matters for procurement.
The NIST Definition
NIST SP 800-115 defines penetration testing as security testing in which evaluators mimic real-world attacks to identify methods for circumventing security controls. Specifically, the testing exploits vulnerabilities to confirm their existence and demonstrate business impact. Furthermore, NIST positions pen testing within Section 5 of the publication as a “Target Vulnerability Validation” technique. In addition, the test must be authorised in writing before any activity begins. Critically, this written authorisation is what distinguishes pen testing from criminal hacking. By contrast, vulnerability assessment scans for known weaknesses without exploiting them.
Why the Definition Matters for Procurement
The standards-based definition matters because vendor offerings vary widely under the “pen testing” label. For example, some vendors deliver automated scanning and call it pen testing. By contrast, true pen testing combines automated tooling with manual exploitation by skilled testers. The difference shows up in the depth of findings and the realism of the attack simulation. Buyers should therefore map vendor methodology to the NIST framework during procurement, which reduces scope mismatches and unmet expectations. Regulatory frameworks such as PCI DSS also point to NIST and similar industry methodologies as accepted approaches.
How Penetration Testing Works
Penetration testing operates through a structured methodology and five essential components. Firstly, the methodology defines what testers do and in what order. Secondly, the essential components define the operational guardrails that make testing safe and defensible. Together, these mechanisms produce findings that buyers can act on. In addition, the methodology supports audit defensibility when pen testing is performed for regulatory compliance.
The NIST SP 800-115 Four-Phase Methodology
NIST SP 800-115 Section 5.2 defines a four-phase penetration testing methodology. Firstly, Planning establishes test goals, scope, rules of engagement, and authorisation documents. Secondly, Discovery combines information gathering with vulnerability analysis to identify targets and weaknesses. Thirdly, Attack involves attempted exploitation of identified vulnerabilities to confirm their existence and assess impact. Finally, Reporting documents findings, business impact, remediation guidance, and lessons learned. Notably, some vendors expand this into five phases (separating Information Gathering from Vulnerability Analysis) or seven phases (PTES style). However, the NIST four-phase framework remains the authoritative foundation referenced by US federal agencies and PCI DSS.
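The ordering of the four phases can be sketched as a small state machine. This is an illustrative sketch only, not something NIST SP 800-115 prescribes: the names `Phase`, `ALLOWED_TRANSITIONS`, and `Engagement` are hypothetical, and the transition table is a simplifying assumption (it allows Attack to loop back to Discovery when exploitation reveals new targets, and treats Reporting as a final step).

```python
from enum import Enum


class Phase(Enum):
    PLANNING = "Planning"
    DISCOVERY = "Discovery"
    ATTACK = "Attack"
    REPORTING = "Reporting"


# Allowed moves between phases (an assumption of this sketch).
# Attack may loop back to Discovery when exploitation reveals new targets.
ALLOWED_TRANSITIONS = {
    Phase.PLANNING: {Phase.DISCOVERY},
    Phase.DISCOVERY: {Phase.ATTACK},
    Phase.ATTACK: {Phase.DISCOVERY, Phase.REPORTING},
    Phase.REPORTING: set(),
}


class Engagement:
    """Tracks which phase a hypothetical engagement is in."""

    def __init__(self) -> None:
        self.phase = Phase.PLANNING
        self.history = [Phase.PLANNING]

    def advance(self, next_phase: Phase) -> None:
        """Move to next_phase, or raise ValueError if the move is not allowed."""
        if next_phase not in ALLOWED_TRANSITIONS[self.phase]:
            raise ValueError(
                f"Cannot move from {self.phase.value} to {next_phase.value}"
            )
        self.phase = next_phase
        self.history.append(next_phase)
```

A buyer-side tracker built this way would reject, for example, an attempt to start Attack before Discovery has been recorded, which mirrors the point that exploitation only follows planning and authorisation.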
Five Essential Components of Every Pen Test
Beyond methodology, every penetration test rests on five essential components. Firstly, written authorisation from the system owner: without this, the activity is criminal regardless of intent. Secondly, a defined scope including target systems, IP ranges, application endpoints, and exclusion lists. Thirdly, rules of engagement covering timing, escalation paths, and prohibited techniques (e.g., denial-of-service). Fourthly, evidence handling and chain-of-custody procedures for sensitive findings. Finally, a deliverable specification covering report format, severity ratings, and remediation guidance. Weakness in any component undermines the test’s defensibility, particularly during regulatory audits.
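A buyer could capture the five components as a pre-engagement checklist. The sketch below is a minimal illustration under stated assumptions: `EngagementSpec`, its field names, and `missing_components` are hypothetical, and each document is represented only by a reference string (for example, a contract or document ID).

```python
from dataclasses import dataclass, field


@dataclass
class EngagementSpec:
    """Hypothetical record of the five components described above."""
    written_authorisation_ref: str = ""   # ID of the signed authorisation letter
    in_scope_targets: list[str] = field(default_factory=list)
    rules_of_engagement_ref: str = ""
    evidence_handling_ref: str = ""
    deliverable_spec_ref: str = ""


def missing_components(spec: EngagementSpec) -> list[str]:
    """Return the names of any components that have not been supplied."""
    checks = {
        "written authorisation": bool(spec.written_authorisation_ref.strip()),
        "defined scope": len(spec.in_scope_targets) > 0,
        "rules of engagement": bool(spec.rules_of_engagement_ref.strip()),
        "evidence handling": bool(spec.evidence_handling_ref.strip()),
        "deliverable specification": bool(spec.deliverable_spec_ref.strip()),
    }
    return [name for name, present in checks.items() if not present]
```

An empty list from `missing_components` means only that all five items are on file, not that their content is adequate; reviewing the documents themselves remains a manual step.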
Penetration Testing vs Vulnerability Assessment
Penetration testing and vulnerability assessment are frequently confused, yet they differ across six dimensions. Specifically, both identify security weaknesses but through different approaches and produce different outputs. Furthermore, regulations often mandate both, not one or the other. As a result, buyers must scope engagements that distinguish the two clearly. Importantly, scope mismatches between buyer expectations and vendor delivery are the most common pen testing engagement problems.
Purpose and Depth
Vulnerability assessment is breadth-focused, automated, and scan-based. By contrast, penetration testing is depth-focused, manual where it counts, and exploitation-based. For example, a vulnerability assessment might flag 500 potential weaknesses across an environment. A penetration test takes a subset of those weaknesses and demonstrates which ones an attacker could actually exploit. It also demonstrates business impact: what an attacker could see, take, or do once a weakness is exploited. As a result, pen testing produces fewer findings, but each one is confirmed exploitable and prioritised.
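The step from a long scanner list to a short, prioritised set of findings can be sketched in code. The example below is an assumption-laden illustration: `Finding`, `cvss_rating`, and `prioritise` are hypothetical names, and it uses CVSS v3.x base scores as the severity scale (CVSS is one of the rating options the article mentions for deliverables). The rating bands follow the CVSS v3.x qualitative scale.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    title: str
    cvss_score: float   # CVSS v3.x base score, 0.0 to 10.0
    exploited: bool     # True if the tester confirmed exploitation


def cvss_rating(score: float) -> str:
    """Map a CVSS v3.x base score to its qualitative severity rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError("CVSS base scores range from 0.0 to 10.0")
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"


def prioritise(findings: list[Finding]) -> list[Finding]:
    """Order findings: confirmed-exploitable first, then by descending score."""
    return sorted(findings, key=lambda f: (not f.exploited, -f.cvss_score))
```

Sorting on confirmed exploitation before score reflects the distinction drawn above: a scanner-reported weakness with a high score may still rank below a lower-scored weakness that a tester actually exploited.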
Six-Dimension Comparison
The comparison below covers six dimensions that drive most procurement decisions.
| Dimension | Vulnerability Assessment | Penetration Testing |
|---|---|---|
| Purpose | Identify known weaknesses | Exploit weaknesses to demonstrate impact |
| Depth | Broad coverage, surface-level | Narrower scope, deep exploitation |
| Approach | Automated scanning | Manual exploitation with automated assistance |
| Frequency | Quarterly or monthly (often automated) | Annual + after significant changes (PCI DSS 11.4) |
| Output | List of vulnerabilities with severity ratings | Exploitation evidence, business impact, remediation roadmap |
| Performed by | Internal security team or scanner | Skilled testers (internal or external) with offensive expertise |
Types of Penetration Testing
Penetration testing spans several distinct types, each targeting a different layer of the technology stack or attack surface. Specifically, the type chosen depends on the asset under test, the threat model, and any applicable compliance trigger. Indeed, mature security programmes use multiple types in combination over an annual testing cycle. As a result, scope definition during procurement must specify which types are in and out of scope.
Network Penetration Testing (Internal and External)
Network penetration testing examines the network infrastructure for exploitable weaknesses. Specifically, external network testing targets internet-facing systems from an attacker’s perimeter perspective. By contrast, internal network testing assumes an attacker has gained initial access and tests lateral movement, privilege escalation, and segmentation. Furthermore, PCI DSS Requirement 11.4.2 mandates internal network testing at least annually for organisations handling cardholder data. Importantly, PCI DSS Requirement 11.4.3 mandates external network testing at the same frequency.
Web Application Penetration Testing
Web application testing examines applications for exploitable code-level and configuration weaknesses. Web app pen testing typically follows the OWASP Web Security Testing Guide (WSTG) v4.2 framework and covers authentication, session management, input validation, business logic, and access control. It matters because public-facing applications expose substantial attack surface that network-perimeter controls cannot reach. Separately, PCI DSS Requirement 6.4 requires public-facing web applications to be protected against attacks, which reinforces the case for application-layer testing.
Wireless, Social Engineering, and Physical Testing
Beyond network and web applications, three additional types address specific attack surfaces. Wireless testing examines Wi-Fi networks for weak authentication, insecure configurations, and unauthorised access points. Social engineering testing simulates phishing, pretexting, and other human-targeted attacks to measure organisational awareness. Physical testing examines facility security, including tailgating, badge cloning, discarded credentials, and rogue device placement. These types are often bundled into “red team” engagements that simulate a sophisticated, persistent attacker using all three vectors.
Cloud and API Penetration Testing
Cloud and API testing have emerged as distinct types as organisations have moved workloads to public cloud and exposed more functionality through APIs. Cloud pen testing examines cloud-specific weaknesses including IAM misconfigurations, exposed storage buckets, and serverless function vulnerabilities. API testing examines REST and GraphQL endpoints for authentication weaknesses, broken object-level authorisation, and excessive data exposure. Cloud providers impose specific rules of engagement on customer pen testing, so buyers must verify provider authorisation before scoping cloud testing. API testing typically falls under web application testing scope, but it warrants a distinct methodology given the architectural differences.
Penetration Testing Frameworks and Standards
Four authoritative frameworks shape penetration testing practice in the US. Specifically, NIST SP 800-115, OWASP WSTG, PCI DSS Requirement 11.4, and CISA’s red team guidance form the practical foundation. Notably, these frameworks complement rather than compete — buyers can require multiple frameworks for different test scopes. As a result, sound procurement scopes each engagement against the appropriate framework or framework combination.
NIST SP 800-115
NIST SP 800-115, “Technical Guide to Information Security Testing and Assessment,” is the foundational US federal framework. Specifically, it defines the four-phase pen testing methodology (Planning, Discovery, Attack, Reporting) discussed earlier. Also, it covers broader information security testing techniques beyond pen testing — review techniques, target identification, vulnerability validation. In addition, NIST SP 800-115 was published in 2008 and remains the authoritative reference despite the date — federal agencies, FedRAMP, and FISMA all reference it. Notably, the publication’s flexibility allows organisations to scale the methodology from small assessments to enterprise-wide programmes.
OWASP Web Security Testing Guide
The OWASP Web Security Testing Guide (WSTG) v4.2 is the leading open framework for web application pen testing. Specifically, WSTG covers web application testing categories including information gathering, configuration and deployment management, identity management, authentication, authorisation, session management, data validation, error handling, cryptography, business logic, and client-side testing. Furthermore, the WSTG identifier format is “WSTG-
PCI DSS Requirement 11.4 Family
PCI DSS Requirement 11.4 governs penetration testing for organisations handling cardholder data. Specifically, the requirement family includes several sub-requirements that buyers must address.
- 11.4.1: Documented penetration testing methodology covering industry-accepted approaches (such as NIST SP 800-115 or PTES)
- 11.4.2: Internal penetration testing performed at least annually and after significant infrastructure or application changes
- 11.4.3: External penetration testing performed at least annually and after significant changes
- 11.4.4: Exploitable vulnerabilities and security weaknesses corrected, with retesting to verify correction
- 11.4.5: Penetration testing on segmentation controls at least every twelve months (service providers: every six months under 11.4.6)
- 11.4.7: Multi-tenant service providers support customers for external penetration testing per 11.4.3 and 11.4.4
Notably, PCI DSS v4.0 replaced v3.2.1 as the only active standard on 31 March 2024. Furthermore, PCI SSC published v4.0.1 — a limited revision with clarifications but no new requirements — on 11 June 2024, and v4.0.1 became the sole active version when v4.0 retired on 31 December 2024. Likewise, future-dated requirements became mandatory on 31 March 2025. As a result, buyers procuring pen testing under PCI DSS must scope engagements against the current v4.0.1 sub-requirements rather than legacy v3.2.1 language.
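The minimum testing intervals listed above can be turned into a simple due-date calculation. This is a sketch under stated assumptions: `add_months` and `next_due` are hypothetical helpers, "at least annually" is interpreted as twelve calendar months from the last test, and tests triggered by significant changes are not modelled.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Return d shifted forward by `months`, clamped to the month's last day."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def next_due(last_tested: date, test_type: str, service_provider: bool = False) -> date:
    """Latest date for the next test, based only on the minimum interval.

    test_type is "internal", "external", or "segmentation".
    """
    if test_type in ("internal", "external"):
        interval_months = 12
    elif test_type == "segmentation":
        # Service providers test segmentation controls every six months.
        interval_months = 6 if service_provider else 12
    else:
        raise ValueError(f"Unknown test type: {test_type}")
    return add_months(last_tested, interval_months)
```

For example, `next_due(date(2024, 8, 31), "segmentation", service_provider=True)` gives 28 February 2025, because the helper clamps to the last day of a shorter month. The result is a latest acceptable date, not a recommended schedule.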
CISA Pen Testing Services and SILENTSHIELD
The US Cybersecurity and Infrastructure Security Agency (CISA) provides pen testing services to federal agencies and publishes red team learnings. CISA offers penetration testing services via federal shared services (through the Department of Justice) and operates the Federal Attack Surface Testing (FAST) programme under FY21 NDAA authority. CISA also runs SILENTSHIELD red team assessments: long-term, no-notice simulations of nation-state attacks against Federal Civilian Executive Branch (FCEB) agencies. The July 2024 CISA Cybersecurity Advisory AA24-193A documented SILENTSHIELD findings: an unidentified FCEB agency failed to detect a CISA red team intrusion for five months, exposing weaknesses in detection, segmentation, log collection, and credential management. These lessons translate into defense-in-depth procurement guidance that applies beyond the federal context.
Penetration Testing Tools
Penetration testing tools span several categories that support different phases of the methodology. Specifically, tools support reconnaissance, vulnerability scanning, exploitation, application testing, wireless testing, and reporting. Furthermore, no single tool covers all phases — testers combine tools based on scope and target. Importantly, tools assist judgement rather than replace it.
Reconnaissance and Discovery Tools
Reconnaissance and discovery tools support the NIST Discovery phase. Network discovery tools include Nmap (port and service identification) and Shodan (internet-exposed device search). OSINT tools include theHarvester (email and subdomain enumeration), Maltego (relationship mapping), and Recon-ng (open-source intelligence framework). In addition, certificate transparency log searches expose subdomains and infrastructure relationships, and public DNS records and breach databases provide further reconnaissance signal.
Exploitation Frameworks
Exploitation frameworks support the NIST Attack phase. Metasploit is the most widely used open-source exploitation framework, with thousands of pre-built modules. Cobalt Strike is a commercial framework popular for red team operations. In addition, vulnerability scanners like Nessus and OpenVAS support both vulnerability identification and limited exploitation. Critically, exploitation frameworks must be used with explicit authorisation: running Metasploit against systems without authorisation is criminal regardless of intent.
Application and Wireless Testing Tools
Application and wireless testing tools target specific attack surfaces. Specifically, web application testing relies on Burp Suite (commercial; the de facto standard for web app testing) and OWASP ZAP (open-source). Furthermore, wireless testing uses Aircrack-ng for Wi-Fi security testing and Wireshark for network protocol analysis. In addition, mobile application testing uses Frida (dynamic instrumentation) and MobSF (Mobile Security Framework). Notably, API testing increasingly uses Postman, Burp Suite, and dedicated tools like API Pentest.
Automation versus Manual Judgement
Pen testing tools assist but do not replace manual judgement. Automated tools efficiently identify known vulnerabilities and produce broad coverage. By contrast, manual testing uncovers business logic flaws, chained vulnerabilities, and creative exploitation paths that scanners miss. NIST SP 800-115 explicitly notes that automated tools cannot detect all security vulnerabilities on their own, and PCI DSS Requirement 11.4.1 notes that automated scanning alone does not satisfy the penetration testing requirement. As a result, the value of a pen test correlates with the tester’s skill more than with the tool inventory.
Pen Tester Credentialing
Pen tester credentialing matters during procurement because it signals offensive-security competence. Several credentials distinguish offensive testers from generalist security professionals, so the credential mix on an engagement team is a useful procurement signal. No single credential guarantees competence, but the absence of any offensive credential is a procurement red flag.
Offensive Security Credentials
Several credentials specifically address offensive security and pen testing. Firstly, the OSCP (Offensive Security Certified Professional) from OffSec (formerly Offensive Security) is widely regarded as the leading hands-on pen testing credential. Secondly, the CEH (Certified Ethical Hacker) from EC-Council is more knowledge-based and entry-level. Thirdly, the GPEN (GIAC Penetration Tester) and GXPN (GIAC Exploit Researcher and Advanced Penetration Tester) from GIAC, the certification body affiliated with SANS, focus on practical offensive skill. In addition, the CRTO (Certified Red Team Operator) and CRTP (Certified Red Team Professional) address red team operations specifically.
How Buyers Should Read Credentials
Buyers should read pen tester credentials as competence signals rather than guarantees. Specifically, an engagement team with multiple OSCP, GXPN, or CRTO holders signals genuine offensive capability. By contrast, an engagement team with only CISSP and CEH holders may indicate generalist security professionals rather than offensive specialists. Furthermore, request specific tester biographies during procurement — generic team descriptions are a yellow flag. Critically, ask about recent CVE discoveries, conference presentations, and published research — these signal active offensive practice. As a result, credential review combined with team biography review produces sound procurement signal.
Engagement Scoping for Buyers
Sound engagement scoping is the single largest determinant of pen testing value. Specifically, scope mismatches between buyer expectations and vendor delivery account for most engagement disappointments. Notably, scoping is where regulatory and threat-model considerations meet operational reality. As a result, mature buyers invest meaningful time in scoping before issuing pen testing RFPs.
Rules of Engagement
Rules of Engagement define the operational guardrails for the test. Specifically, the rules cover testing windows (typically outside business hours for production systems), prohibited techniques (denial-of-service, destructive actions), escalation procedures for critical findings, and communication protocols. Also, the rules specify what happens when the tester finds an active compromise — does the test pause for incident response, or continue? In addition, the rules cover evidence handling, data exfiltration limits, and post-test data destruction. Importantly, ambiguous Rules of Engagement create operational risk during the test.
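A testing window from the Rules of Engagement can be checked mechanically before any activity starts. The sketch below is illustrative: `within_testing_window` is a hypothetical helper, and it assumes naive datetimes expressed in the system owner's local time (time-zone handling is deliberately left out).

```python
from datetime import datetime, time


def within_testing_window(moment: datetime, start: time, end: time) -> bool:
    """True if moment's clock time falls in [start, end).

    Windows that wrap past midnight (for example 22:00 to 06:00) are supported.
    If start equals end, the window is treated as empty.
    """
    t = moment.time()
    if start <= end:
        return start <= t < end
    # Window wraps past midnight: late evening OR early morning qualifies.
    return t >= start or t < end
```

With a window of 22:00 to 06:00, a request at 23:30 or 05:59 passes and one at midday does not. A real engagement would also need agreed blackout dates and an escalation contact, which this sketch does not cover.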
Scope, Exclusions, and Deliverables
Scope, exclusions, and deliverables form the engagement’s commercial backbone. Specifically, the scope defines what systems, applications, and networks are in-scope. Furthermore, exclusions explicitly list out-of-scope assets (production payment processing, life-safety systems, third-party services). In addition, the deliverables specification covers report format (executive summary, technical findings, remediation guidance), severity rating methodology (CVSS or vendor framework), and remediation retesting (whether and how). Notably, PCI DSS Requirement 11.4.4 specifies remediation retesting for exploitable vulnerabilities. As a result, retesting cost should be priced into the engagement upfront rather than added later.
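Scope and exclusion lists are easiest to enforce when they are machine-readable. The following sketch uses Python's standard `ipaddress` module to test whether a target address is in scope and not excluded. The function name `is_in_scope` is hypothetical, and the addresses in the usage note come from the reserved documentation ranges rather than any real engagement.

```python
import ipaddress


def is_in_scope(target: str, scope: list[str], exclusions: list[str]) -> bool:
    """True if target lies inside a scoped network and outside every exclusion.

    scope and exclusions are lists of CIDR strings such as "192.0.2.0/24".
    Exclusions always win over scope.
    """
    addr = ipaddress.ip_address(target)
    in_scope = any(addr in ipaddress.ip_network(net) for net in scope)
    excluded = any(addr in ipaddress.ip_network(net) for net in exclusions)
    return in_scope and not excluded
```

For example, with scope `["192.0.2.0/24"]` and exclusions `["192.0.2.10/32"]`, the address `192.0.2.25` is testable while `192.0.2.10` and `198.51.100.1` are not. Letting exclusions override scope is a design choice that fails safe when the two lists overlap.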
When Pen Testing Is the Right Choice
Pen testing is the right choice when specific conditions hold. The checklist below identifies the principal decision factors.
- Regulatory mandate: PCI DSS, FedRAMP, HIPAA, SOC 2, or sector regulator requires pen testing
- Significant change: New application, infrastructure refresh, M&A integration, cloud migration
- Threat model triggers: Specific threat actor concern or recent incident in industry
- Mature vulnerability management: Vulnerability assessment programme exists and produces findings ready for exploitation testing
- Defensive capability validation: Need to validate detection and response capabilities under adversarial conditions
- Pre-launch validation: Application or service about to go to production needs adversarial review
By contrast, pen testing is the wrong choice when basic vulnerability management is absent, when scope is too broad for the budget, or when the buyer cannot act on the findings. Specifically, a pen test that produces 50 findings with no remediation capacity is worse than no pen test at all — it documents risk that the organisation has chosen not to address. As a result, scoping must include capacity to remediate the findings the test will produce.
Conclusion
Penetration testing is best understood through the multi-framework synthesis of NIST SP 800-115, OWASP WSTG, PCI DSS Requirement 11.4, and CISA red team learnings. Specifically, NIST defines the methodology, OWASP defines web application testing scope, PCI DSS defines regulatory frequency and granularity, and CISA’s SILENTSHIELD demonstrates defense-in-depth procurement lessons. The framework synthesis matters more than any single vendor’s methodology, so buyers should require vendor alignment with these frameworks rather than accept vendor-proprietary methodologies alone. The work for any firm is translating that framework foundation into specific engagement scoping suited to its threat model, regulatory context, and operational maturity. In essence, successful pen testing depends on disciplined scoping, credentialed testers, and remediation capacity matched to expected findings.
For independent guidance on scoping a pen testing engagement, talk to Signisys.
Frequently Asked Questions
The questions below address the most common queries about penetration testing. Specifically, definitional, comparative, and operational questions appear most frequently. Critically, each answer below is grounded in the standards and regulatory framework discussed above.
What is penetration testing in simple terms?
In simple terms, penetration testing is an authorised, simulated cyberattack against a computer system, network, or application to evaluate its security. Specifically, security professionals use the same tools and techniques as attackers — but with written authorisation — to find and demonstrate the business impact of vulnerabilities. As a result, the test produces a remediation roadmap that security teams can act on. Importantly, the test must be authorised in writing before any activity begins.
What is the difference between penetration testing and vulnerability assessment?
The principal difference is depth versus breadth. Specifically, vulnerability assessment scans broadly for known weaknesses and produces a list of potential vulnerabilities. By contrast, penetration testing takes a subset of those weaknesses and exploits them to confirm impact. Furthermore, vulnerability assessment is typically automated and recurring; pen testing combines automated tooling with manual exploitation by skilled testers. As a result, organisations typically run both — vulnerability assessment quarterly or monthly, pen testing annually and after significant changes.
How often should penetration testing be performed?
Penetration testing frequency depends on regulatory requirements and risk tolerance. Specifically, PCI DSS Requirement 11.4 mandates annual internal (11.4.2) and external (11.4.3) testing for organisations handling cardholder data. Significant infrastructure or application changes trigger additional testing. Mature security programmes test more frequently, for example quarterly for high-risk applications and after major releases for critical systems. Annual testing is a regulatory floor, not a ceiling.
Is penetration testing the same as ethical hacking?
Ethical hacking is a broader category that includes pen testing. Specifically, ethical hacking refers to any use of hacking skills with authorisation to improve security. Pen testing is one specific form of ethical hacking, alongside red team engagements, bug bounty research, malware analysis, and offensive security research. As a result, all pen testing is ethical hacking, but not all ethical hacking is pen testing.
What are the main types of penetration testing?
The principal types are network (internal and external), web application, wireless, social engineering, physical, cloud, and API. Specifically, network testing targets infrastructure, web application testing targets application code, and the others target specific attack surfaces. Furthermore, mature security programmes use multiple types in combination over an annual cycle. As a result, scope definition during procurement must specify which types are in and out of scope.
References
- Scarfone, K., Souppaya, M., Cody, A., and Orebaugh, A. Technical Guide to Information Security Testing and Assessment. NIST Special Publication 800-115. National Institute of Standards and Technology, 2008. csrc.nist.gov
- Open Web Application Security Project (OWASP). Web Security Testing Guide (WSTG) v4.2. owasp.org
- PCI Security Standards Council. Payment Card Industry Data Security Standard v4.0, Requirement 11.4: Penetration Testing. pcisecuritystandards.org
- Cybersecurity and Infrastructure Security Agency (CISA). Cybersecurity Advisory AA24-193A: CISA Red Team’s Operations Against a Federal Civilian Executive Branch Organization Highlights the Necessity of Defense-in-Depth. cisa.gov
- Cybersecurity and Infrastructure Security Agency (CISA). Penetration Testing Service / Federal Attack Surface Testing Programme. cisa.gov