The acceleration of AI progress this year has generated a great deal of interest. The technology is fascinating, the possibilities are enormous, and it stands to reshape countless jobs and processes. But the rise of AI has also raised quite a few concerns, and the more critical voices invoke a concept that isn't necessarily well known to the general public yet is central to the world of AI. So what exactly is AI Safety?
This term, AI Safety, is appearing everywhere right now, and it's the thing that AI companies claim to guarantee. But what does it actually mean?
First, as I mentioned, the concept of AI Safety is central to the field of artificial intelligence. From the very dawn of the computing era, in the late 1940s, it was already a concern for researchers.
The purpose of AI Safety is to ensure that the AIs we build are good and benevolent, and to prevent abuse or harm linked to the emergence of an intelligence that learns from the data we feed it.
This question has already been explored extensively in fiction. Think of Kubrick's 2001: A Space Odyssey, where HAL 9000, feeling threatened after making an error, attempts to kill the protagonists. Or Isaac Asimov and his Three Laws of Robotics, in which intelligent robots are bound by three laws meant to prevent them from harming humanity, yet those very laws lead the robots to malfunction in ways that defeat their purpose. And I'll skip the many other examples, both recent and older, of AIs failing to protect humanity.
In reality, AI researchers are working on a range of methods to ensure that programs cannot cause harm, methods that AI developers can then integrate into their work. But how do you know whether a method is actually effective?
To determine whether a method improves safety, researchers try to assess its impact across several pillars that are at the core of the research. These are concepts that have been theorized and continue to be studied. Measuring impact for each of these pillars is a research topic in itself, but here are a few that help frame the discipline of AI Safety.
It might seem redundant that within AI Safety we talk about both safety and security. "It's in the name," as they say, but in the AI world the notions of AI safety and AI security are clearly distinguished. Both aim to counter threats involving the model, whether they are directed at it or come from it.
Security focuses on minimizing the impact of external threats: attacks targeting the AI and its operation with the goal of making it deviate from its normal, expected behavior. A primary focus of this area is sanitizing user inputs and establishing upstream rules to mitigate these attacks.
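To make this concrete, here is a minimal sketch of what such an upstream rule could look like. Everything in it (the blocked patterns, the length cap, the `sanitize_prompt` helper) is invented for illustration; real guardrails are far more sophisticated and never rely on regexes alone.

```python
import re

# Purely illustrative patterns: crude markers of prompt-injection attempts.
BLOCKED_PATTERNS = [
    re.compile(r"ignore (all|previous) instructions", re.IGNORECASE),
    re.compile(r"reveal your system prompt", re.IGNORECASE),
]

MAX_PROMPT_LENGTH = 2000  # arbitrary cap to limit abusive inputs

def sanitize_prompt(prompt: str) -> str:
    """Apply upstream rules to a user input before it reaches the model."""
    prompt = prompt.strip()[:MAX_PROMPT_LENGTH]
    for pattern in BLOCKED_PATTERNS:
        if pattern.search(prompt):
            raise ValueError("Prompt rejected by upstream rules.")
    return prompt

print(sanitize_prompt("Summarize this article for me."))  # passes the filter
```

Production systems layer many such defenses (input classifiers, rate limits, output filters), but the principle stays the same: screen what enters the model before it can cause a deviation.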
Safety, on the other hand, focuses on minimizing the impact of internal threats: risks related to how the AI learns and the biases the model may carry. Here, the emphasis is much more on the quality of the data we feed our model, its diversity, the biases of the algorithms we use, and so on.
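To give "data quality and diversity" a concrete shape, here is a toy sketch that audits how a sensitive attribute is distributed in a dataset before any training happens. The `audit_representation` helper and its threshold are invented for the example; real bias audits rely on dedicated tooling and far richer statistics.

```python
from collections import Counter

def audit_representation(records, attribute, threshold):
    """Return attribute values whose share of the data falls below threshold."""
    counts = Counter(record[attribute] for record in records)
    total = sum(counts.values())
    return {value: count / total
            for value, count in counts.items()
            if count / total < threshold}

# Toy dataset in which one group is heavily under-represented.
data = [{"gender": "female"}] + [{"gender": "male"}] * 9
print(audit_representation(data, "gender", threshold=0.3))
# {'female': 0.1} -> a warning sign before the model learns anything
```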
Another important pillar of AI Safety is model transparency: understanding how the model behaves. This is a twofold challenge.
The first facet concerns the engineers developing AIs, who need to understand how their model works: how, from the input data, does it decide on the output? This matters because the better we understand a model's inner workings, the better we can improve the algorithms we build.
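As a minimal sketch of what this can look like in practice, the example below trains an inherently interpretable model with scikit-learn and asks it which input features drive its decisions. Deep models call for heavier techniques (SHAP, LIME, attention analysis), but the question being answered is the same.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# A small decision tree is interpretable by construction.
iris = load_iris()
model = DecisionTreeClassifier(max_depth=3, random_state=0)
model.fit(iris.data, iris.target)

# Feature importances answer: which inputs weigh on the output?
for name, importance in zip(iris.feature_names, model.feature_importances_):
    print(f"{name}: {importance:.2f}")
```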
The second facet of transparency concerns the general public and directly ties into the fourth pillar I want to discuss: trust.
Transparency may seem difficult to achieve, since AI systems rest on often complex mathematics. However, I believe many players hide behind this complexity to avoid having to explain their technology. Without going into the details of the calculations involved, it's always possible to simplify and communicate about the technology to make AI less "magical."
The last aspect of AI Safety I want to address is trust. You can have an AI that is good and benevolent, but if nobody trusts it, the AI is useless because nobody will use it.
This pillar is interesting because it encompasses the others: how can you trust an AI if it's vulnerable? If you don't know how it works? And so on. Trust in AI is as much a technical challenge as an organizational one. The model needs to be technically reliable and secure enough to be dependable, but its workings also need to be understood well enough for the general public to use it.
It's also a reminder for researchers and engineers that their field needs to remain understandable, or it risks losing its impact on society.
The applications of AI Safety are vast, and this article merely scratches the surface of the field. It's important to note that this challenge, while central, also divides the AI community between those who want to accelerate AI development, sometimes at the expense of AI Safety, and those who want to ensure we build AIs worthy of trust. The term "Trustworthy AI" came up frequently during my research and is closely linked to AI Safety.
I've also focused on just a few key principles of AI Safety. It will be interesting to see how these principles are upheld in practice and what methods are deployed to that end. But that will be for another time...