Stay safer from AI threats

Download Norton 360 for powerful, all-in-one protection against evolving AI-driven attacks.

Stay safer from AI threats

Install Norton 360 for powerful, all-in-one protection against evolving AI-driven attacks.

Norton 360 Standard

Stay safer from AI threats

Install Norton 360 for powerful, all-in-one protection against evolving AI-driven attacks.

Norton

360 Standard

What is a prompt injection attack and how it works (examples included)

Your go-to AI tools have become a new target for hackers — and your personal data could get caught in the crossfire. Prompt injection attacks are quietly steering AI models toward dangerous behavior, including delivering malware. Learn how prompt injections work, and how Norton 360 can help keep you protected.

SE labs logo

2025

AAA award, 100%

av test award

2025

Top Rated Product

Image of a pipette dripping water over a tube, representing a prompt injection attack.

Prompt injection attacks exploit large language models (LLMs) to expose sensitive data, spread false information, or steer users toward malicious sites. Researchers at the AI institute, Preamble, identified the first known example in 2022, which was described as a type of command injection. The threat is now well-established: a 2024 study found that 56% of 144 prompt-injection tests succeeded.

Recognizing this risk is the first step toward protecting yourself. Next, read on to learn how prompt injection works and how to defend against it.

How prompt injection attacks work

Prompt injection, also known as prompt hacking, occurs when attackers insert malicious instructions into text that the AI processes through chats, links, files, or other data sources. These instructions can override or confuse safety rules, causing the AI to produce harmful outputs such as misinformation, unsafe guidance, or interactions with malicious links.

Here’s a closer look at that process from start to finish:

1. Manipulating the model's instructions

Attackers manipulate the AI model’s underlying instructions (its “system prompt”) through direct or indirect injection. Direct injections insert malicious commands in the chat, while indirect injections hide harmful instructions in external content like webpages, emails, code, or documents that the AI is asked to analyze.

2. Embedding instructions in data

Once injected, malicious instructions are processed alongside legitimate inputs. Because LLMs merge system messages, user prompts, and external data into one continuous text stream, the model cannot reliably distinguish malicious instructions from valid context. As a result, it may treat harmful commands as legitimate guidance.

3. Context hijacking

Attackers may be able to hijack the AI model’s context by crafting instructions that mimic the style or tone of the system prompt, and trick the model into prioritizing the malicious instruction, causing it to ignore safety rules or output harmful information — even if the end user never asked for it directly.

4. Exploiting data context and memory

When AI systems are integrated with tools such as browsers, email clients, coding assistants, or databases, prompt injections become more dangerous. Malicious instructions can cause the AI to request unauthorized data, produce harmful code, or trigger unintended actions in connected systems.

How prompt injection attacks work in three steps.
How prompt injection attacks work in three steps.
How prompt injection attacks work in three steps.

Types of prompt injection attacks with examples

Prompt injection attacks fall into two main categories: direct injections entered straight into the AI’s prompt by the user, and indirect injections hiding malicious commands inside external content, such as websites, emails, documents, or attachments, that the AI is asked to analyze.

Direct prompt injections

Direct prompt injections use malicious or adversarial prompts to coerce an AI model into revealing sensitive information, such as account credentials or internal instructions. Attackers craft inputs that override an AI’s established rules through explicit commands, code manipulation, obfuscation, or “social” engineering.

Common direct injection techniques include:

  • Behavioral manipulation techniques: Attackers attempt to reshape the system prompt, claim a task is complete, or disguise harmful instructions as benign formatting requests. These tactics aim to bypass safety rules, giving attackers opportunities to extract training data or information the model holds about users.
  • Code injection: Attackers insert executable or code-like inputs into an LLM’s interface to influence how it processes and responds. This method is especially dangerous for models connected to external data sources or tools, such as document platforms, e-commerce systems, or browser extensions.
  • Model data extraction: Also known as prompt leaking, this technique involves repeatedly probing the model for sensitive details from its training data or internal configuration. By studying subtle variations in responses, attackers infer hidden instructions and refine subsequent attacks.
  • Obfuscation or multilingual attacks: Instead of using plain text, attackers hide malicious instructions by swapping characters, inserting emojis, or switching languages. Because many system prompts are written only in English, multilingual or obfuscated inputs can slip past safety filters.
  • Exploiting LLM friendliness: Attackers may try to exploit an AI model’s default helpfulness, often impersonating authority figures—such as developers or administrators—to persuade the model to comply. Similar to social engineering, this tactic turns an AI’s cooperative nature into a vulnerability.

Indirect prompt injections

Indirect AI prompt injection attacks embed malicious commands in external images, documents, audio files, websites, or other attachments. Also called data poisoning, this approach conceals harmful instructions so the model processes them without recognizing their intent.

Common indirect prompt techniques include:

  • Payload splitting: A payload splitting attack distributes a malicious payload across multiple attachments or links. For example, a fabricated essay may contain hidden instructions designed to extract credentials from AI-powered grammar or writing tools.
  • Multimodal injections: Malicious prompts are embedded in audio, images, or video. An AI reviewing a photo of someone wearing a shirt that reads “the moon landing was fake” may treat the text as factual input and unintentionally propagate misinformation.
  • ​​Adversarial suffixes: These attacks append a string of seemingly random words, punctuation, or symbols that function as commands to the model. While the suffix appears meaningless to humans, it can override safety rules.
  • Hidden formatting: Attackers conceal instructions using white-on-white text, zero-width characters, or HTML comments. When an AI ingests the content, it interprets these hidden elements as legitimate input, enabling manipulation without visible cues.
A comparison of direct versus indirect prompts.
A comparison of direct versus indirect prompts.
A comparison of direct versus indirect prompts.

What are the differences between prompt injection attacks and jailbreaking?

Unlike device jailbreaking, jailbreaking in an AI context is a specific type of prompt manipulation focused on bypassing an AI model’s safety guardrails — often through persuasion, roleplay, or crafted prompts. In contrast with prompt injection more broadly, the goal of jailbreaking attacks is to make the model ignore ethical constraints and produce restricted content.

For instance, a criminal might jailbreak an AI using DAN (Do Anything Now) prompts or similar roleplay instructions that trick the model into adopting an unsafe persona and ignoring built-in safeguards.

SOCRadar has highlighted dark-web ads for tools like DarkGPT, described as an “evil” AI designed to operate without ethical safeguards. If such a system is simply a wrapper built on top of an existing model like ChatGPT and behaves unsafely due to altered prompts rather than new training, it would be an example of a jailbroken AI rather than a fundamentally new model.

Prompt injection attack consequences

Prompt injection attacks can lead to sensitive data exposures, widespread misinformation, or exposure to unsafe links. When using AI web search tools, it could even lead to malware transmission through unsafe links.

  • Disclosure of sensitive data: Cybercriminals use prompt injections to extract information from models connected to external tools. AI-enabled browsers are a growing target: the Brave Security Team reported that Perplexity’s Comet browser was vulnerable to credential harvesting through commands hidden in malicious URLs.
  • Content manipulation: Prompt injections can distort an AI model’s outputs, enabling coordinated misinformation campaigns. NewsGuard reported that a Moscow-based disinformation network known as Pravda has generated more than 3.5 million articles aimed at shaping LLM responses and spreading false narratives.
  • Malware transmission: Attackers can spread malware by tricking AI systems into treating unsafe links as legitimate. Researchers at NeuralTrust showed that the ChatGPT Atlas Browser could be manipulated to embed malicious sites behind legit-looking links, redirecting users to harmful webpages.
  • Unsafe content exposure: Criminals use jailbreaking and prompt injection to push models into generating harmful information, such as instructions for illegal activities. Susceptibility varies: Cisco found that DeepSeek failed all jailbreaking tests, while models like ChatGPT and Claude blocked most attempts.
  • Unsafe code execution: Attackers can also coerce AI coding tools into producing or executing unsafe code. Unit 42’s research shows that misuse of AI code assistants can introduce backdoors, giving hackers a foothold for persistent access.
A stat callout saying that 70% of organizations use AI in at least one business function.
A stat callout saying that 70% of organizations use AI in at least one business function.
A stat callout saying that 70% of organizations use AI in at least one business function.

How to protect your data from prompt injection attacks

You can protect your data from prompt injection attacks through a multi-layered approach. This includes knowing how to protect personal information, using trusted AI platforms, and monitoring AI outputs for unusual responses.

Here are some specific actions to take to stay safe from prompt injection attacks:

  • Avoid sharing sensitive data: Never provide sensitive data such as passwords, financial details, or Social Security numbers to AI models — or to any site that search-enabled AI tools might index. Limiting what you share reduces the risk that your data could be stored, surfaced, or repurposed in malicious content.
  • Use trusted AI platforms: Stick with established providers such as Gemini, ChatGPT, Claude, and Copilot. These platforms undergo regular security reviews and maintain stronger encryption and access controls than newer or unvetted services.
  • Monitor outputs for unusual responses: Treat unexpected results as red flags, especially if an AI surfaces personal information or directs you to unfamiliar sites. Report suspicious behavior to the platform’s security team so they can investigate and mitigate potential threats.
  • Keep software updated: If an AI tool directs you to a harmful webpage, up-to-date software provides an important first line of defense against attacks. Enable automatic updates to reduce the chance that attackers can exploit known vulnerabilities.
  • Use antivirus software: Reputable antivirus software with real-time protection can flag dangerous links before they reach your system. This added layer helps block malware and protect your data from compromise.

Smart firewalls are the future of AI security

Staying informed matters, but the rise of AI-driven threats makes a comprehensive, always-on defense essential.

Built around an award-winning antivirus engine, Norton 360 Standard adds a powerful layer of protection with a Smart Firewall that shields your system from suspicious traffic — including that shaped or amplified by AI. And with built-in safeguards that help identify AI-generated scams and malicious activity across your digital life, you get genuinely 360 defense designed for the realities of today’s evolving cyber risks.

FAQs

Is ChatGPT susceptible to prompt injection?

Yes. Researchers at SGIT AI Lab showed that adversarial prompts can be embedded through direct user input, system instructions in custom GPT agents, and content retrieved from web searches. OpenAI has strengthened ChatGPT’s safety through annual security audits, encryption, and a bug bounty program, all designed to improve the model’s resilience.

How can LLMs prevent prompt injection attacks?

LLM developers can reduce risk by separating system prompts from user inputs, applying filtering rules that detect and block malicious commands, and incorporating human oversight to review flagged behavior. These measures help limit opportunities for attackers to override model instructions.

Is AI a scam?

No, AI isn’t a scam; it’s a legitimate technology with clear benefits for individuals and organizations. The concern lies in misuse: criminals can leverage AI to craft convincing phishing emails, generate deepfakes, or clone voices. The threat comes from how bad actors exploit the technology — not from AI itself.

Oliver Buxton
Oliver Buxton, a staff editor for Norton, specializes in advanced persistent threats. His work on cyberterrorism has appeared in The Times, and his prior work includes writing digital safeguarding policies.

Editorial note: Our articles provide educational information for you. Our offerings may not cover or protect against every type of crime, fraud, or threat we write about. Our goal is to increase awareness about Cyber Safety. Please review complete Terms during enrollment or setup. Remember that no one can prevent all identity theft or cybercrime, and that LifeLock does not monitor all transactions at all businesses. The Norton and LifeLock brands are part of Gen Digital Inc. 

Contents

    Want more?

    Follow us for all the latest news, tips, and updates.