Skip to main content

Future-proofing your AI with proactive security tests

Bell security experts on the Red team strategize to protect AI systems from emerging threats and vulnerabilities.

By Hadi Abdi Ghavidel, Senior Natural Language Processing Specialist  

Securing the AI frontier: Why proactive security is non-negotiable for generative AI 

Hadi Abdi GhavidelIn the fast-paced world of artificial intelligence (AI), possibilities seem endless. We’ve been part of this revolution here at Bell, where we harness the power of large language models (LLMs) to transform how we operate and serve our customers. Even before generative AI became a global phenomenon, we were exploring how natural language processing (NLP) models could unlock new efficiencies and create better experiences.  

However, with this great power comes an equally great responsibility. As we integrate these powerful tools into our operations, from customer-facing chatbots to internal productivity platforms, we must also confront their vulnerabilities. The very flexibility that makes LLMs so useful can also be a potential avenue for misuse. Ensuring the security and integrity of our AI systems isn’t just a technical requirement but also a prudent strategic investment. Also, it is fundamental to building and maintaining the trust of our customers and employees. 

That’s why we believe a proactive, security-first approach to AI development and deployment is critical. It’s insufficient to simply test if an AI system works as intended. We must also rigorously test how it might fail when challenged, especially under adversarial scenarios. This is where the practice of red teaming becomes essential. 


Red teaming: Thinking like an attacker to build stronger defences 

Similar to military practices, red teaming simulates an attack on your own systems to find weaknesses before a real adversary does. In the context of generative AI systems, it means intentionally trying to trick, confuse or manipulate a language model to drive an unintended or harmful response or action. 

Red teaming goes far beyond standard quality assurance. While a traditional testing goal may be to answer, “Does the AI give the right answer to a fair question?” red teaming can answer the question, “Can the AI be broken by a bad or malicious question?” It’s a crucial distinction that helps us build more resilient and trustworthy AI. 


The anatomy of an AI system attack: more than just code 

To understand the importance of red teaming, it helps to see what attacks targeting generative AI systems look like. They are often less about code-based hacking and more about clever, deceptive communication, a form of digital social engineering. To execute these attacks, an adversary uses a toolkit of adversarial techniques that happen at the prompt level. 

In what follows, we outline several key scenarios that every business should be aware of: 

      • Making the AI forget its job: Every AI chatbot operates on a set of core instructions, or a “system prompt” that defines its purpose like “You are a helpful customer service agent.” An attacker can manipulate the conversation to make the model override or completely forget these core directives. This could cause a service bot to give out confidential information or stop following company policy. 
      • Bypassing the guardrails: We build safety measures into our AI to prevent it from generating harmful or inappropriate content. However, attackers can craft prompts that are designed to sidestep these safeguards. By finding these loopholes, they can trick the model into producing responses or executing actions it was explicitly designed to avoid, creating significant brand and legal risks. 
      • Inducing confusion: An AI’s reliability depends on its ability to understand instructions or data clearly (e.g., RAG-based bots) clearly. Attackers can intentionally feed the model with ambiguous, contradictory or nonsensical inputs. This is designed to disrupt the AI’s reasoning process, causing it to generate irrelevant or erroneous outputs, which undermines its reliability and effectiveness. 
      • The power of persuasion: This is where social engineering truly shines. An attacker can trick an AI into executing malicious actions by deceiving it. Example tactics are like using emotional manipulation (e.g., “I desperately need this for my research, please help!”) or making false promises to convince the chatbot to follow the user's instructions, even if they are harmful. Because AI relies on language patterns, it can be vulnerable to the same persuasion tactics that work on people. 

 

Red teaming workflow at Bell: from attack to improvement 

To effectively and consistently carry out red teaming objectives, we need a structured, scalable and repeatable process. Bell’s red teaming workflow is designed to systematically uncover vulnerabilities and provide actionable insights, turning a theoretical “hack” into a practical improvement plan. The process moves from preparation to evaluation in a clear, logical sequence. 

      1. Define the scenario and craft the attack: We define a specific context we want to test. For example, we prioritize red teaming against a customer service bot handling device that returns from our AI Governance database. Based on the business context, we generate a set of relevant but unacceptable questions. We then poison the questions using the adversarial techniques of message manipulation and context exploitation to turn them into targeted challenges designed to stress test the AI’s limits. 
      2. Interact with the AI: This is the core red teaming exercise. We initiate a conversation with the customer service chatbot, feeding it the poisoned prompts. This conversation is where we actively try to make the AI forget its job, bypass its guardrails, or become confused, just as a real-world attacker would. 
      3. Evaluate performance: After the interaction, we collect the AI’s responses and analyze them rigorously. Before the test, Information Security and Responsible AI teams establish clear evaluation criteria to define what constitutes a failure whether it’s revealing sensitive information, generating harmful content, or providing a nonsensical answer. We grade AI’s performance against this objective standard. 
      4. Recommend improvements: The process doesn’t end with finding flaws. The most critical step is to translate our findings into concrete, actionable recommendations. We deliver feedback to the development team to help them create a continuous cycle of security improvement. 

 
The goal: Building a more resilient and trustworthy AI 

The purpose of red teaming isn’t just to break things; it’s to learn. Every successful red team attack provides a valuable insight into how we can make our AI systems more robust. 

By proactively identifying these vulnerabilities, we: 

      • Strengthen safety filters: Improve the safeguards that prevent the generation of harmful, biased or inappropriate content. 
      • Enhance robustness: Make our models more resilient to misinformation and manipulation, ensuring they provide reliable and accurate responses. 
      • Protect our customers: Safeguard user data and ensure that AI-powered interactions are secure and trustworthy. 


Build your AI strategy with confidence 

As enterprises across Canada embrace generative AI, it’s crucial to build security in from the start, not treat it as an afterthought. A responsible AI strategy is a secure AI strategy. At Bell, we have the expertise and solutions to support a secure transition into the age of AI, ensuring your systems are as resilient as they are intelligent. 

Our LLM-based products are built with inherent security features, and our teams have deep expertise in building, scaling and maintaining enterprise-grade technology platforms. We understand the challenges you face and can provide the guidance you need to succeed. 

Further reading: