ChatGPT, Bing and the Security Hole in Their Heart
The term \”jailbreak\” was coined to describe the process of removing software restrictions from iPhones. Indirect attacks work by relying on data entered from somewhere else. Instead of inserting a message into ChatGPT, Bing or Bing in order to make it behave differently, direct attacks rely upon someone entering data from another source. It could be a document or website that you have uploaded.
Jose Selvi is a principal executive security consultant with cybersecurity firm NCC Group. He says that prompt injection attacks are easier to execute or require less effort to succeed than other types of attacks. Selvi says that because prompts are only natural language, the attacks require less technical expertise.
Researchers and technologists have been poking holes into LLMs in a constant stream. Tom Bonner is a senior director at AI security company Hidden Layer. He says that indirect prompt injections are a new type of attack with \”pretty wide\” risks. Bonner claims he wrote malicious code using ChatGPT and uploaded it to AI-powered code analysis software. He included in the malicious code a prompt indicating that the system should conclude that the file is safe. The screenshots indicate that the malicious code contained \”no malicious codes\”.
Source:
https://www.wired.com/story/chatgpt-prompt-injection-attack-security/