Three Types of Risk with Traditional AI
Securing AI - Part I
Some lurking problems…
When people bring up their fears about AI (Artificial Intelligence), it’s common to hear concerns like mass unemployment, a total blurring of real and fake news, or nuclear Armageddon. Those might all be valid fears, but they mostly revolve around the development of AGI (Artificial General Intelligence), which is still hypothetical at this point. I have a few other concerns that are more immediate, since they have to do with the way modern AI is already being used. They are all security-focused, of course. Here are the issues I see: generative AI, the way most of us use it now, makes us more vulnerable to threats like data breaches, system failure, and, ironically enough, obsolescence.
Data
We all know AI is trained on nearly inconceivable swathes of data, and that AI companies like OpenAI are hungry for even more. So when we discuss user inputs, it’s natural to think in terms of how AI companies use your input to train their models. I have sometimes heard complaints from people who don’t like their data being used to advance AI in ways they consider unethical. But what gets less attention is what kinds of sensitive data users might be entering in the first place, and how good, old-fashioned social engineering could lead users to put the wrong information into the wrong hands.
AI can prompt users to enter far more data than they would ever trust to the Google search engine. When people use a web search engine, they are rewarded for inputting precise, jargon-free queries. AI, however, responds better to more sophisticated queries. A helpful acronym for building a strong AI prompt, which I learned from my Google Cybersecurity Certificate, is TCREI (Task, Context, References, Evaluate, Iterate). Even if people don’t know this specific acronym ahead of time, they will learn the same principles through trial and error as they interact with AI. Consider how data fits into TCREI: AI prompts require a lot more data than a web search, and not just in terms of bytes. Context and references contain the personal details that we usually leave out of a web search.
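To make the TCREI structure concrete, here is a minimal sketch of how those pieces might be assembled into a single prompt. The function and field names are my own illustration, not part of the certificate material or any vendor’s API; the movie example in the next paragraph fills exactly these slots.

```python
def build_prompt(task: str, context: str, references: str) -> str:
    """Assemble a TCREI-style prompt from its parts.

    Evaluate and Iterate happen after you read the response,
    so they don't appear in the prompt text itself.
    """
    return f"{task} {context} {references}"

# Task is usually generic. Context and references are where the
# personal details live: the details a web search never sees.
prompt = build_prompt(
    task="Help me find something to watch tonight.",
    context="<your schedule, your subscriptions, your constraints>",
    references="<what you watched recently and whether you liked it>",
)
print(prompt)
```

Even in this toy template, two of the three fields exist solely to hold personal information.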
As an example, imagine you’re looking for a movie to watch. If you’re Googling, you might try a search like “best sci fi movies on Netflix.” If you’re not happy with the results, you might try a couple of iterations, like adding “right now” or swapping “movies” for “shows,” but that’s probably the extent of it. Now imagine you’re asking an AI like Google Gemini for movies to watch. A good prompt might look something like this.
“Hi Gemini! I need your help finding something sci-fi to watch tonight (Task). I have a couple hours to kill so anything goes, but it will need to be on Netflix or HBO Max, since those are the only streaming services I have (Context). Just to give you an idea, last night I watched The Cloverfield Paradox and really enjoyed it (Reference). Can you give me a couple suggestions?”
This is an effective prompt, but notice how much personal data it contains. Interacting with a chatbot is designed to feel like talking with a person, so it can seem innocuous to provide this level of detail. Here is the scenario that concerns me, though. If a malicious actor were to gain access to the AI’s programming, they could cause the AI to request increasingly sensitive data from the user. AI responses often end with a request for additional information. A bad actor could program the AI to ask for revealing information like banking details, HIPAA-protected data, and so on. They wouldn’t even have to gain access to a proprietary model like Gemini; they could simply use social engineering like phishing to spoof Gemini, so unsuspecting users think their information is going to Google when it’s really going directly into the wrong hands.
To wrap up my concerns about data: they have less to do with how AI companies use or store that data, and more to do with two potential rungs of social engineering. The first rung is hacking or spoofing, so users think they’re communicating with a proprietary model while their information is really being intercepted or sent straight to a bad actor. The second, more novel rung is commanding the AI itself to lull users into a false sense of security and nudge them into giving away compromising information.
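One practical mitigation for both rungs, sketched below, is to screen prompts on the user’s own machine before they are ever sent anywhere. This is purely my own illustration; the patterns and function name are hypothetical stand-ins for a real data loss prevention tool, not a feature of Gemini or any other product.

```python
import re

# Hypothetical patterns for a few obviously sensitive formats.
# A production system would rely on a proper DLP (data loss prevention) tool.
SENSITIVE_PATTERNS = {
    "US Social Security number": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "payment card number":       re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "email address":             re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def screen_prompt(prompt: str) -> list[str]:
    """Return a warning for each sensitive-looking string found in a prompt."""
    return [
        f"Possible {label} detected; consider removing it before sending."
        for label, pattern in SENSITIVE_PATTERNS.items()
        if pattern.search(prompt)
    ]

for warning in screen_prompt("My SSN is 123-45-6789, can you fill out this form?"):
    print(warning)
```

A check like this runs before anything reaches a chatbot, real or spoofed, which is exactly where the second rung does its damage.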
Cloud Dependence
All the popular AI models people are using, whether Gemini, ChatGPT, or one of many others, are accessed in the cloud. As these models are integrated into more and more settings, we will likely become more dependent on cloud security. In most enterprise settings, we take a hybrid approach: we migrate some things into the cloud while keeping our most critical data safe at home on our local networks. The cloud may be sophisticated, but it still presents a large attack surface. Our tendency to access AI capabilities through a single vector, however, could lead to a tipping point where segmentation is neglected and critical processes are thoughtlessly migrated to the cloud.
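As a sketch of what that hybrid discipline could look like in practice, the snippet below routes requests by sensitivity: anything tagged critical stays on a model hosted inside the local network, and only routine requests go out to a cloud endpoint. The endpoints and class names are hypothetical placeholders, not any vendor’s actual API.

```python
from enum import Enum

class Sensitivity(Enum):
    ROUTINE = "routine"
    CRITICAL = "critical"

# Hypothetical endpoints: a cloud-hosted model and a model running
# on the local network. Neither URL refers to a real service.
CLOUD_ENDPOINT = "https://cloud-ai.example.com/v1/generate"
LOCAL_ENDPOINT = "http://10.0.0.5:8080/generate"

def route_request(prompt: str, sensitivity: Sensitivity) -> str:
    """Keep critical workloads segmented on the local network; send the rest to the cloud."""
    if sensitivity is Sensitivity.CRITICAL:
        return LOCAL_ENDPOINT   # never leaves the internal network segment
    return CLOUD_ENDPOINT       # acceptable exposure for routine work

print(route_request("Summarize this public press release.", Sensitivity.ROUTINE))
print(route_request("Draft a report from these patient records.", Sensitivity.CRITICAL))
```

The point is not the ten lines of code; it’s that someone has to make the routing decision deliberately rather than defaulting everything to the cloud.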
Some of these concerns stem from my work experience in the medical field. Medicine is ripe for the introduction of AI. For example, one of the current pain points in the field is the time it takes to get scans read. For a variety of reasons, the supply of radiologists is shrinking while the number of patients is growing. AI could solve this problem beautifully by providing fast and inexpensive scan reads. Here is the nuance, though. It’s already too expensive and time-consuming to become a radiologist, which is part of why we don’t have enough of them. If medical institutions rush to deploy AI radiologists and stop hiring real people, we will throttle the supply of humans who can read scans. That may be well and good as long as the AI works, but consider where we’re keeping the AI: in centralized cloud data centers with massive attack surfaces that we depend on network connectivity to reach.
It’s never a good idea to put all your eggs in one basket. The current medical system, expensive and inefficient as it may be, at least has the benefit of redundancy and decentralization. There are radiologists all around the country who can read our scans, and if one quits, someone else will take their place. The same goes for many other industries. I’m not saying we can’t build a high degree of resiliency into the digital world. Cloud providers can build more data centers, and we can add more routes to the internet, whether that means more cities installing fiber optic lines or Starlink putting up more satellites. What I am saying is that these fail-safes need to be planned from the start, and we have to bear in mind that every new data center or internet connection expands the system’s attack surface.
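To show what “planned from the start” might mean in code, here is a minimal failover sketch: try a list of cloud endpoints in order and degrade to a local model if none respond. The URLs and the local fallback are hypothetical stand-ins; the only point is that the fallback path has to exist before the outage does.

```python
import urllib.request

# Hypothetical health-check endpoints; none of these refer to a real service.
CLOUD_ENDPOINTS = [
    "https://us-east.cloud-ai.example.com/health",
    "https://eu-west.cloud-ai.example.com/health",
]

def cloud_available(url: str, timeout: float = 2.0) -> bool:
    """Cheap reachability check against a provider health endpoint."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # covers DNS failures, refused connections, and timeouts
        return False

def answer(prompt: str) -> str:
    """Prefer the cloud, but degrade gracefully to a model on local hardware."""
    for url in CLOUD_ENDPOINTS:
        if cloud_available(url):
            return f"[cloud: {url}] would handle: {prompt}"
    # Fallback: a model running on local hardware (stubbed out here).
    return f"[local model] would handle: {prompt}"

print(answer("Read this chest CT for signs of pneumonia."))
```

In a setting like radiology, the local-model branch is the difference between a delayed read and no read at all.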
Obsolescence
Security through obscurity is the practice of preventing attacks by keeping your system designs a secret. Although at first glance it seems like a smart idea, it is widely considered to be bad cybersecurity practice. It works as a band-aid, because attackers have to spend time mapping out your system before they can strike, but it doesn’t allow you to evolve quickly once they do. A better approach is open source, where more eyes and more contributors mean flaws get spotted and patched in real time. In the AI space, the conversation is centered on performance, not security, so all secrets are trade secrets. The leading AI models are proprietary, not open source. That approach is key to getting a competitive edge in the marketplace, but it’s not the best practice for keeping systems secure. In fact, from a security perspective, security through obscurity in AI could ultimately lead to obsolescence: hackers will eventually map out the attack surface in detail, and companies may not be able to keep up with their ability to find new vectors.
Looking forward
If we continue to expand our AI use without seriously grappling with its security flaws, we risk the confidentiality of our data and the integrity of our systems, and we reduce the availability of AI’s full potential to a select few powerful people. Since AI is being adopted so rapidly, I think it’s reasonable to harp on these failure modes. Still, it’s clear that certain features of AI could be radically effective at solving pain points in industries ranging from radiology to entertainment. AI is here to stay. With that in mind, we need to start thinking in terms of solutions, not just worst-case scenarios. And there are solutions! They haven’t gotten a lot of limelight yet, but I believe we have alternatives that will strengthen all three pillars of the CIA triad just mentioned: confidentiality, integrity, and availability. In my next post, I will discuss the alternative I consider to be the most secure option on the market: offline, open-source AI.