Cisco researchers highlight emerging threats to AI models

Cisco security researchers this week detailed a number of threats they are seeing from bad actors trying to infect or attack AI’s most common component – the large language model.

Some techniques used to hide messages or attacks from anti-spam systems are familiar to security specialists: “Hiding the nature of the content displayed to the recipient from anti-spam systems is not a new technique. Spammers have included hidden text or used formatting rules to camouflage their actual message from anti-spam analysis for decades,” wrote Martin Lee, a security engineer with Cisco Talos, in a blog post about current and future AI threats. “However, we have seen an increase in the use of such techniques during the second half of 2024.”

Being able to disguise and hide content from machine analysis or human oversight is likely to become a more important vector of attack against AI systems, according to Lee. “Fortunately, the techniques to detect this kind of obfuscation are well known and already integrated into spam detection systems such as Cisco Email Threat Defense. Conversely, the presence of attempts to obfuscate content in this manner makes it obvious that a message is malicious and can be classed as spam,” Lee wrote.
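
Lee’s point that obfuscation is itself a giveaway is easy to illustrate. The sketch below is a simplified, assumption-laden example rather than anything drawn from Cisco Email Threat Defense: it scans an HTML email body for zero-width characters, hidden-style spans, and zero-size fonts, and treats any hit as a spam signal.

```python
import re

# Heuristic checks for content-hiding tricks that spam filters commonly flag.
# Illustrative sketch only, not Cisco Email Threat Defense logic; the patterns
# below are assumptions chosen for readability.

ZERO_WIDTH_CHARS = re.compile(r"[\u200b\u200c\u200d\u2060\ufeff]")
HIDDEN_STYLE = re.compile(
    r'style\s*=\s*"[^"]*(display\s*:\s*none|visibility\s*:\s*hidden|font-size\s*:\s*0)',
    re.IGNORECASE,
)
ZERO_SIZE_FONT = re.compile(r"<font[^>]*size\s*=\s*[\"']?0", re.IGNORECASE)

def obfuscation_signals(html_body: str) -> dict:
    """Count common text-hiding techniques in an HTML email body."""
    return {
        "zero_width_chars": len(ZERO_WIDTH_CHARS.findall(html_body)),
        "hidden_styles": len(HIDDEN_STYLE.findall(html_body)),
        "zero_size_fonts": len(ZERO_SIZE_FONT.findall(html_body)),
    }

def looks_obfuscated(html_body: str) -> bool:
    """Treat any hiding technique as a spam signal, as the Talos post suggests."""
    return any(count > 0 for count in obfuscation_signals(html_body).values())

if __name__ == "__main__":
    sample = '<p>Claim your prize</p><span style="display:none">hidden filler text</span>'
    print(obfuscation_signals(sample))  # {'zero_width_chars': 0, 'hidden_styles': 1, 'zero_size_fonts': 0}
    print(looks_obfuscated(sample))     # True
```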

Separately, Adam Swanda, an AI researcher, and Emile Antone, a product marketing manager with Cisco Security, wrote a blog post about emerging AI cyber threats. The pair cited three specific attack methods:

  • Single-Turn Crescendo Attack: “In previous threat analyses, we’ve seen multi-turn interactions with LLMs use gradual escalation to bypass content moderation filters. The Single-Turn Crescendo Attack (STCA) represents a significant advancement as it simulates an extended dialogue within a single interaction, efficiently jailbreaking several frontier models. The Single-Turn Crescendo Attack establishes a context that builds towards controversial or explicit content in one prompt, exploiting the pattern continuation tendencies of LLMs. Alan Aqrawi and Arian Abbasi, the researchers behind this technique, demonstrated its success against models including GPT-4o, Gemini 1.5, and variants of Llama 3. The real-world implications of this attack are undoubtedly concerning and highlight the importance of strong content moderation and filter measures.”
  • Jailbreak via Simple Assistive Task Linkage (SATA): “SATA is a novel paradigm for jailbreaking LLMs by leveraging Simple Assistive Task Linkage. This technique masks harmful keywords in a given prompt and uses simple assistive tasks such as masked language model (MLM) and element lookup by position (ELP) to fill in the semantic gaps left by the masked words. The researchers from Tsinghua University, Hefei University of Technology, and Shanghai Qi Zhi Institute demonstrated the remarkable effectiveness of SATA with attack success rates of 85% using MLM and 76% using ELP on the AdvBench dataset. This is a significant improvement over existing methods, underscoring the potential impact of SATA as a low-cost, efficient method for bypassing LLM guardrails.”
  • Jailbreak through Neural Carrier Articles: “A new, sophisticated jailbreak technique known as Neural Carrier Articles embeds prohibited queries into benign carrier articles in order to effectively bypass model guardrails. Using only a lexical database like WordNet and a composer LLM, this technique generates prompts that are contextually similar to a harmful query without triggering model safeguards. As researchers from Penn State, Northern Arizona University, Worcester Polytechnic Institute, and Carnegie Mellon University demonstrate, the Neural Carrier Articles jailbreak is effective against several frontier models in a black box setting and has a relatively low barrier to entry.”

Cisco authors pointed to additional research from the Ellis Institute and the University of Maryland on adversarial attacks against LLMs. Those researchers highlighted the ease with which current-generation LLMs can be coerced into a range of unintended behaviors, including misdirection attacks, in which an LLM outputs URLs or malicious instructions to a user or to another LLM, and denial-of-service attacks, in which an LLM is made to produce extreme numbers of tokens to exhaust GPU resources.
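
Neither behavior requires exotic defenses; basic output controls blunt both. The sketch below is a generic illustration of two such mitigations, a hard cap on generated tokens and a screen for unexpected URLs in model output; the call_model placeholder, the token cap, and the domain allow-list are assumptions for the example, not details from the cited research.

```python
import re

# Illustrative mitigations for the two behaviors described above: a hard cap on
# generated tokens (to blunt token-exhaustion denial of service) and a screen
# for unexpected URLs in model output (to catch misdirection toward malicious
# links). All names and thresholds here are assumptions for the example.

URL_PATTERN = re.compile(r"https?://[^\s\"')]+", re.IGNORECASE)
MAX_OUTPUT_TOKENS = 1024            # assumption: tune per application
ALLOWED_DOMAINS = {"example.com"}   # assumption: trusted link targets only

def call_model(prompt: str, max_tokens: int) -> str:
    """Placeholder for a real LLM client call that honors an output-token limit."""
    raise NotImplementedError

def safe_generate(prompt: str) -> str:
    reply = call_model(prompt, max_tokens=MAX_OUTPUT_TOKENS)
    for url in URL_PATTERN.findall(reply):
        domain = url.split("/")[2].split(":")[0].lower()
        if domain not in ALLOWED_DOMAINS:
            # Redact rather than relay a link the application did not expect.
            reply = reply.replace(url, "[link removed]")
    return reply
```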
