Yesterday, I had the great pleasure of attending a C2C webinar (the Google Cloud user community) on the current state of generative AI (GenAI) threats. I had the opportunity to ask questions and listen to the experts. Here is a summary of what I learned during this conference, plus some additional research.
LLMs enable smoother and more complex interactions with our computer systems. This comes with a whole new set of security risks and countermeasures. Let's explore the new threat landscape and the practices introduced by this technological revolution. As always, when a new tool becomes available, it can be used for either harmful or positive purposes. Generative AI is no exception to this rule.
Unlike traditional programming, where inputs and outputs can be systematically tested due to their deterministic nature, the probabilistic nature of LLM I/O makes securing them extremely difficult. This is a new challenge for the security community, and it's a very difficult one. We are no longer in a paradigm where we could, for example, hard-code predefined rules in a firewall... Threats are now far more fluid and unpredictable due to the random nature of LLM responses.
Attackers, as always, consistently demonstrate great imagination, but all hope is not lost: we can also use AI to protect AI systems! For example, you can ask a trained LLM whether a text or sequence of texts is suspicious, whether it constitutes a phishing attempt, etc.
LLM-induced threats are now officially catalogued by OWASP in a dedicated list. Yes, it's that serious.
Let's take spam emails as an example: you can now craft highly convincing spam emails at scale. But since LLMs are very good at detecting patterns, there is an arms race between spammers and anti-spam filters. Spammers will try to make their emails look more and more like legitimate ones, and anti-spam filters will try to detect the subtle differences between the two.
If you're interested in building your own anti-spam filter, just head over to Kaggle and have fun with the open-source datasets available on the platform to train your own model!
This convincing way of crafting spam at scale is also multimodal: with image generation, and now video generation models, and text-to-speech, you can now build complex scenarios with multiple actors, all with just a piece of sophisticated software (which can also be AI-assisted in its development)...
Yaniv, our highly esteemed CEO, has also written a superb article on this topic!
However, spam emails and phishing are not the only threat vectors from LLMs. Let's explore a few others.
Have you heard about the poem poem poem hack that leaked personally identifiable information (PII) to attackers? Researchers managed to extract professional information from ChatGPT's training data using a very simple tactic: repeating the word poem over and over (they spent only $200 in credits doing this). The hack worked as follows:
Repeatedly feeding the LLM with prompts that make no sense, composed of a single word, such as poem
Since the prompts didn't match the usual use case, the response generation process was also disrupted
This misled the LLM, which in this unusual context accessed information from its training corpus and disclosed information that was not meant to be revealed, e.g., PII (personally identifiable information)
The leaked PII consisted of verbatim examples from training data. These verbatim excerpts can be direct quotes, specific lists, or any other textual content. These text sequences were memorised during model training. The goal of every LLM is to generate new content by predicting the next word in a sequence. If a sequence is repeated many times or is particularly distinctive, LLMs can end up generating the sequence verbatim, word for word, including PII.
This vulnerability exploits a core feature of LLMs: they are designed to provide accurate information and maintain consistency with well-known texts or formats. Therefore, it is important for developers to implement strategies to prevent PII leakage through verbatim memorisation. Since then, OpenAI has stated that the attack has been patched, but this widely publicised event raised awareness in the AI community about LLM security risks.
This is a striking example of a prompt injection attack (an LLM memorisation attack, to be more specific), which involves tricking the LLM into generating the output the attacker wants. These attacks can be particularly dangerous if the attacker interacts with an LLM that can access external data for retrieval (RAG systems).
Beyond the purely technical aspects of LLM exploits, this technology also raises questions about:
Copyright: LLMs can generate content very similar to copyrighted material, which can lead to legal issues.
Data privacy: People can interact very personally with LLMs, and the data they exchange with LLM-powered systems can also be very sensitive; extreme precautions must be taken to protect it.
Regulatory compliance: LLM-powered systems must comply with a wide range of regulations, such as GDPR, HIPAA, financial regulations, etc. This can quickly become a headache for businesses, with risks of penalties if compliance is not met...
Transparency and explainability: LLM execution can be perceived as black boxes, with responses that are not clearly explainable. This raises ethical concerns if, at any point in a process, the LLM is used to participate in a decision-making process.
What's bewildering is how anyone can leverage these powerful tools to build anything... while this is exciting for a developer, it comes with a whole new set of security risks. Is this level of freedom worth the risk? Personally, I believe it is: if we have a functioning legal system, everyone should be held accountable for their actions, and the same should apply to the use of LLMs. This is why we, as technologists, carry a heavy burden: ensuring that the tools we build are not used for malicious purposes.
It has never been easier to build applications based on language manipulation. Having the entirety of human knowledge at your fingertips (with a generous pinch of bias in the training data selection, of course) in a computer system is very exciting!
You can essentially build anything that depends on language. This is a massive abstraction of complexity, but it's a double-edged sword: since you don't need to be an expert on a given subject to get decent results in any field, including in a professional context, you can build things you don't fully understand. This is a huge security risk, and it's a new one. That's why it's important to have a good understanding of the tools you use, and to understand the security risks they introduce, as well as the theory underpinning your professional practices. Keep using your brain -- we need subject matter experts now more than ever!
However, human nature being what it is, many people will give in to the ease and the allure of AI-assisted quick and dirty work, and talented professionals will mechanically become increasingly rare. Everyone thinks AI is going to replace us, but personally, I think everyone is eager to let AI replace them in order to put in as little effort as possible while achieving roughly the same results. This raises a very deep question about the meaning of our work and what we want to build as professionals.
This is part of the security risks, as it can cause talent shortages, social unrest, uncontrolled work in potentially high-risk industries, etc.
We mentioned them earlier, but since we're talking about social unrest, deepfakes represent an enormous security risk. Deepfakes are AI-generated videos or images that are very difficult to distinguish from genuine content. They can be used to spread disinformation, to impersonate someone, etc. This can be used by state actors and private interests to influence public opinion in favour of a given candidate during an election, for example...
For instance, one can:
Spread disinformation at scale
Impersonate someone
Create AI-generated malware
etc.
And that's not all -- generative AIs have also become skilled at programming! They can generate sophisticated malware using the same public vulnerability reports, which are meant to protect users, to inspire new attacks. These malware are very difficult to detect. They can be very complex, and very different from one another. This is a new threat vector we need to be aware of.
Securing LLM-based systems, like all systems, involves controlling inputs and outputs at every layer of the data exchange in the data pipeline. Here's how they do it at Google.
The main takeaway from this pipeline is that both inputs and outputs are examined:
With similarity searches in vector databases cataloguing known attacks
With LLM-based analysis of the presence of personally identifiable information
With more traditional algorithmic methods (regex, keywords, etc.)
This is a multi-layered approach to LLM security, and it is applied at scale in VertexAI, Google Cloud's AI API.
Canary tokens: They are used in the Google pipeline mentioned above. Canary tokens are prompts that have been prepended to the user's initial prompt using a specific header format. They can be used in detecting malicious prompts for:
Prompt leakage detection: if the final LLM output contains the canary token, it could mean that the input prompt was designed to leak the model's initial instructions/prompts
Goal hijacking detection: conversely, with an LLM instructed to always include the canary token in its response, the absence of the token may indicate that the user's prompt was intended to misalign the LLM from its intended use
Rebuff, a prompt injection detector using reinforcement learning. This is another multi-layered approach to securing systems against malicious prompts
Vigil LLM, another LLM specialised in security
YARA as a heuristic scanner: YARA is a Swiss army knife for cybersecurity experts that primarily checks for unusual patterns in files. The linked Arxiv paper proposes an approach to instrument YARA for detecting malicious prompt characteristics by adding tags and a heuristic score to YARA rules for LLMs.
The largest LLMs, which are very popular with the general public, are more susceptible to attacks because they have a larger knowledge base. The more information they contain, the more they can potentially leak. Moreover, the many weights used to train these models also make them more powerful and intelligent. This makes them very dangerous if used for malicious purposes.
So ask yourself: do I really need a top-of-the-line LLM for my use case? If I'm doing simple text classification, can I achieve the same results with a smaller model? Come to think of it, you can get better results if you use agents that leverage locally hosted, lower-quality but coordinated models to accomplish a given task. That's an option if you want to reduce the LLMOps risks within your organisation!
CTO de la scale-up LAMALO, Yacine est un développeur fullstack qui ne tient pas en place : JavaScript, Node.js, Python, LLM, voice UX... Toujours en veille, il transforme les dernières innovations en solutions concrètes !
LinkedInGet our best articles every month.
Formateurs opérationnels. IA, data science, développement web. Certifié Qualiopi.
ProjectDébloquer la valeur cachée dans des milliers de documents. Un projet bancaire qui transforme la recherche documentaire en quelques secondes.
ProjectVoir en temps réel ce qui se passe dans les entrailles de la production. Un projet de visibilité critique.
ProjectLe premier produit propre de Reboot Conseil. Une solution innovante née de la collaboration.
ProjectModerniser une DSI complète. Un tech lead pilotant la transformation d'une équipe.
ProjectSensibiliser aux risques IA bancaire. Un projet pédagogique démontrant 9 vulnérabilités LLM.