You can’t escape it. It’s all over the news and social media about this sudden wave of improvements in LLM (Large Language Models) or as most people know them at the moment Chat-GPT!
Every large tech firm is rushing to integrate these technologies into their products with Microsoft launching co-pilot and Bing with Chat-GPT integration. Google is launching AI lead improvements to Workspace and Facebook accidentally leaked the source code to their LLM. 🤦♂️
With all of this going on you would expect that these products are at least secure and pose no risk to the users, businesses or the general public. And while I am wholly in favour of improvement to AI and ML, we must consider the risks these LLM pose as they begin to become part of everyday life.
What are you talking about?
I should start by covering what an LLM is. Well in the words of Nvidia “A large language model, or LLM, is a deep learning algorithm that can recognise, summarise, translate, predict and generate text and other content based on knowledge gained from massive datasets.” To most of us what this means is that a system can take input in human language, not machine code or programming language and can then complete these instructions. Now, this can be as simple as how do you bake a cake. Or you can ask it to write an application that will convert files to pdf and upload them to an FTP server based on the IP address x.x.x.x and write an output file for me to show completion, in C++. The LLM will then go away, compute the question against the information it has been “taught” and will then come back with an answer.
There are a few things we should all be aware of with LLMs as they stand today, these limitations are present but not always obvious.
- LLMs are driven by the dataset they have and may have complete blind spots to events if they occur post the data set provided, i.e Chat GPT (GPT-3) is based on a data set from 2021. So if you ask it about the F1 teams for 2023, it will either throw an error or will simply give you information it “generates” from the information it has been fed.
- LLMs can therefore “hallucinate” facts and give you a completely incorrect answer if it doesn’t know the facts or if the algorithm works itself into a situation where it believes it has the right information.
- LLMs are power-hungry. They need huge amounts of computing power and data to train and operate the systems.
- LLMs can be very biased and can often be tricked into providing answers by using leading questions making them unreliable.
- The largest risk is that they can be coxed into creating toxic content and are prone to injections actions.
Therefore the biggest question remains what is the risk of introducing an LLM into your business workflow?
With the way that LLMs work they learn from data sets. Therefore, the potential risk is that your business data inside applications like Outlook, Word, Teams or Google Workspace is being used to help develop the LLM and you don’t have direct control over where the data goes. Now, this is bound to be addressed over time but these companies will 100% need access to your data to move these models forward so limiting its scope will have an impact on how they develop and grow. Microsoft and Google will want to get as much data as possible.
This one is scary, and it increases as more organisations introduce LLMs into the core workflow, is that queries stored online may be hacked, leaked, stolen or more likely accidentally made publicly accessible. Because of this, there is a huge risk of exposing potentially user-identifiable information or business-related information.
We should be aware of the misuses risk that also comes from LLM with the chance they will be used to generate more convincing phishing emails, or even teach attackers better ways to convince users to enter into risky behaviour.
The final risk that we should be aware of is that the operator of the LLM is later acquired by a company that may be a direct rival to yours, or by an organisation with a different approach to privacy than when you signed up for the platform and therefore puts your business at risk.
As such the NCSC recommends
- not to include sensitive information in queries to public LLMs
- not to submit queries to public LLMs that would lead to issues were they made public
At this point, Planet IT’s recommendation is not to integrate the new features from Microsoft and Google into your business workflow. Certainly not until proper security and data controls have been implemented by these companies and the risk of your business data being used as sample material to teach the LLMs is fully understood. These are emerging technologies, and as we continue to see change at Planet IT we are monitoring everything very carefully to understand how it will affect the security and data compliance of your business.
More information from the NCSC can be found here : https://www.ncsc.gov.uk/blog-post/chatgpt-and-large-language-models-whats-the-risk
If you want to talk to one of our experts about how we can help you with your security and understanding of LLM then please call 01235 433900 or you can email firstname.lastname@example.org or if you would like to speak to me directly you can reach out to me via DM or at email@example.com.
This article was NOT written by ChatGPT. It was written by this ChapJPD (James Peter Dell)