A year into Gen AI, we ask: is Gen AI worth the hype in Cyber Defense?
Just over a year ago, ChatGPT captured the world's attention. Product developers rushed to make promises that AI would solve problems normally assigned to knowledge workers. These claims were met with both excitement and fear. Now, a year into Gen AI, what is real, and what does it mean for Cyber Defense?
A perfect application
I have a theory of why ChatGPT was so successful. OpenAI, whether meaning to or not, stumbled upon the perfect use case for Generative Text AI: a chatbot. Generative Text AI models such as OpenAI's GPT-3 and GPT-4 take a text input (a prompt) and predict the next best word, one word at a time. The key to their stunning success at constructing language this way is the model's ability to focus its “attention” on the nuance of the user's prompt. The model was trained on data gathered primarily from the Internet, which has oriented it well to how people communicate through the written word.
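The next-word loop described above can be sketched with a toy model. This is a crude bigram table, not OpenAI's transformer architecture (which uses attention over the whole prompt); the corpus and function names here are invented for illustration only.

```python
from collections import Counter, defaultdict

# Tiny stand-in for web-scale training text (purely illustrative).
corpus = "the model reads the prompt and predicts the next word one word at a time".split()

# Count which word follows which -- a crude stand-in for what a
# transformer learns, minus the attention over the full prompt.
followers = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    followers[prev][nxt] += 1

def next_word(prev):
    """Predict the most likely next word given the previous one."""
    counts = followers.get(prev)
    if not counts:
        return None
    return counts.most_common(1)[0][0]

def generate(prompt_word, n=5):
    """Generate text one word at a time, feeding each prediction back in."""
    out = [prompt_word]
    for _ in range(n):
        nxt = next_word(out[-1])
        if nxt is None:
            break
        out.append(nxt)
    return " ".join(out)
```

The point of the sketch is the loop in `generate`: each output word becomes input for the next prediction, which is exactly why a chat interface, where the user reads along as words appear, fits the technology so naturally.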
It’s perfect because a chatbot is insensitive to the three major problems with this new wave of AI technology:
- Imperfect Reasoning – Gen AI models were trained on how we speak, but not necessarily on what is empirically true. Asking a Gen AI model about a company such as Salem Cyber produces a range of facts that sound convincing but are often inaccurate. The response reflects the kind of facts you might expect about any privately held company, not what is true about THIS specific privately held company. The model aces the language component (as it was trained to do) but fails on the facts.
- Slow request processing – There is a reason you often hear the term Large Language Model: they are huge. To run the best ones, you need specialized computer chips with a lot of memory to load the model and a lot of processing power to make billions of calculations for each prompt. The net result is that requests take time, often many seconds, which is an eternity compared to the sub-millisecond processing times of the other applications that make our lives what they are today.
- High Costs – Hardware is expensive, hard to get, and you need a lot of it. The R&D investment in models such as the ones behind ChatGPT is in the hundreds of millions of dollars. As a result, the AI is provided as a service by someone who has the money to run these computer chips.
And yet, as mentioned before, the AI chatbot is the perfect application because practically none of these problems matter to it. When a user asks a question, we read along as the chatbot takes its time to type out its response. Social media has gotten us used to reading with a critical eye and double-checking facts where it seems warranted. Once we’ve asked our question and maybe had a follow-up or two, we move on to something else. It's okay that it's slow, it’s okay that it's sometimes wrong, and it's okay that it's expensive, because we only use it sparingly anyway.
Early products are experiments, and many Gen AI products have felt like it. The most noteworthy and heralded is GitHub Copilot. To a casual observer, it seems like another perfect application of AI. However, when you dig deeper, you see that this product is more sensitive to two of the three problems mentioned above.
- High Cost: GitHub Copilot is always on, constantly making recommendations. This is helpful for a developer but makes the service more sensitive to the cost of each request made to the underlying AI model. Also, many of the recommendations are ignored by end users who simply didn’t ask for the help. In October, the Wall Street Journal reported that Microsoft loses $20 a month per GitHub Copilot user (who pays only $10 per month).
- Imperfect Reasoning: Code has significantly less subjectivity than other types of language. A breakdown in reasoning, such as suggesting a method that doesn’t exist or misstating a logical test, results in broken, buggy, and potentially insecure code. Reasoning errors of this kind are relatively easy for a developer to miss. Paradoxically, this is especially true for inexperienced developers, who otherwise stand to benefit most from Gen AI assistance.
Ultimately, GitHub Copilot serves as an example of both the benefits of AI and the difficulties product developers will face in bringing those benefits to market.
Gen AI’s impact on cyber defense
Black Hat 2023 seemed to be the big reveal party for Gen AI in cyber, with SOAR (sometimes referred to as hyperautomation) leading the buzz. And yet many of the splashiest claims were already muted by mid-fall. As it turns out, cyber is a hard discipline for Gen AI to step into, for a few reasons:
- Many of the core cyber analyst tasks require writing structured queries to recall data that is imperfectly structured. The analyst must be highly precise in how they search for data, and must also understand the business and technology landscape in order to decode the nuance captured in that data.
- All the information you want or need might not exist, or might be scattered across many systems, each with its own idiosyncrasies.
- The meaning behind cyber events often hinges on human intent: What was someone trying to do? Did they mean to do that? That someone is sometimes not even in your organization; they could well be a developer at a consulting firm or a product company who wrote code to behave in a particular way.
- There is a lot of analysis to perform.
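The first two challenges above can be made concrete with a toy example: the "same" login event recorded by two systems under different field names and formats. The records, field names, and `normalize` helper are invented for this illustration; real SIEM schemas vary far more.

```python
# Two systems record the same login with different field names and formats
# (records invented for the example; "4624" mimics a Windows logon event ID).
logs = [
    {"user": "JSMITH", "src_ip": "10.0.0.5", "action": "logon"},
    {"username": "jsmith@corp.example", "sourceAddress": "10.0.0.5", "event": "4624"},
]

def normalize(record):
    """Map each system's idiosyncratic fields onto one common schema."""
    user = record.get("user") or record.get("username", "")
    user = user.split("@")[0].lower()          # strip domain, unify case
    ip = record.get("src_ip") or record.get("sourceAddress")
    return {"user": user, "ip": ip}

normalized = [normalize(r) for r in logs]

# A naive precise query against the raw logs would miss one record;
# after normalization, it finds both.
matches = [r for r in normalized if r["user"] == "jsmith"]
```

Every such mapping encodes knowledge about how a specific system names its fields, which is exactly the hyper-local context an analyst carries in their head and a generic model does not.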
These challenges, and others, present a huge hurdle that AI technology by itself struggles (and will continue to struggle) to overcome. AI will take a backseat role in cyber products, more so than in products for other technology verticals. These products will rely heavily on people to access hyper-localized information that is traditionally shared throughout a business in the form of oral histories.
Products that simply layer language models on top of existing cyber platforms will receive their deserved “meh” rating from their intended customers. At best, they are a worse ChatGPT. At worst, they are a mess that is somehow both unreliable and expensive.
But on a hopeful note, thoughtful products that use AI in well-scoped roles are emerging as well. Selfishly, we believe Salem is one of those products, one that respects the critical role people, now more than ever, play in protecting organizations. Products like these can pave the way for how future AI products should and can interact with organizations.
First, we believe AI’s role will be to scale the impact of people’s knowledge and experience. To that end, Salem has shown that it can take a globally true understanding of cyber-attacks and learned hyper-local business context to undertake a very specific job: tier 1 alert triage.
Alert triage is a perfect mission for AI because it is both a critical task and one with extremely low ROI for every person-hour invested. Anecdotally, there is less than one real threat for every 100 critical and high-severity cyber alerts, and many medium- and low-severity alerts aren’t being looked at by organizations at all, given time and resource constraints. Yet we’ve found multiple threats by analyzing low and medium alerts.
Takeaway 1: find a narrowly defined mission that is critical, necessary, and highly repetitive.
The challenge with alert triage is the one stated above: we made the case that Gen AI will struggle to do this work. And yet the work can be done, just not with Gen AI. There is a whole world of ML, AI, and other statistical modeling methods that have been employed for decades. Our strategy is to use the methods that best address each specific reasoning problem. By using multiple models, we can stitch together a pipeline that collectively represents the reasoning a cyber analyst would use to come to definitive conclusions.
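The pipeline idea can be sketched as a chain of simple stages, each standing in for one step of an analyst's reasoning. Salem's actual models are not public; the rule, the weights, the threshold, and the host name below are invented stand-ins, shown only to illustrate how small models compose into a triage decision.

```python
# Sketch of a triage pipeline chaining simple models, each covering one
# reasoning step an analyst would perform. All rules and weights are
# invented stand-ins, not any vendor's actual models.

def known_benign(alert):
    """Stage 1: rule model encoding hyper-local business context."""
    return alert.get("host") in {"backup-server-01"}  # hypothetical allow-list

def threat_score(alert):
    """Stage 2: simple statistical scorer on globally-true attack signals."""
    score = 0.0
    if alert.get("new_process"): score += 0.5
    if alert.get("external_ip"): score += 0.3
    if alert.get("off_hours"):   score += 0.2
    return score

def triage(alert, threshold=0.6):
    """Chain the stages: local context first, then global scoring."""
    if known_benign(alert):
        return "close"
    return "escalate" if threat_score(alert) >= threshold else "close"
```

The design point is the ordering: cheap, hyper-local context (stage 1) filters out what the business already knows is noise before a general-purpose model (stage 2) is ever consulted.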
Takeaway 2: find the AI tools that best fit your problem, not the problem that best fits your AI.
And finally, we know that lab-grown just doesn’t taste the same as the real thing. Cyber is a nuanced business because the technology a business uses is layered with tradeoffs and technical debt, so training models on lab data just isn’t going to produce the right results. We won’t give away our methods, but we will say that battle scars teach the best lessons.
Takeaway 3: build products for the messy real world.