The Trojan AI
We all know the legend of the Trojan War – and how that tale ended, with the Greeks using a giant wooden horse to get their soldiers inside the enemy’s city. Once inside, the hidden fighters opened the walled city’s gates and the Greek army destroyed the place.
Leaving aside its use in population control, the term Trojan horse, or simply Trojan, has become mainstream in computer lingo for a type of malware that looks like a regular program on the outside. Once it gets inside your computer, it can perform all sorts of nefarious activities.
The term was first used in the 1970s. Malware is usually installed by a user who gets tricked into downloading, installing or clicking on something. Your computer trusts that you know what you are doing, even when you click a link titled 'How to make 1 million dollars in one day' or open a file named 'The secret to make millions is just a click away'.
The writers of these programs also know how to cleverly disguise the true nature of malware and often evade the antivirus software installed on your machine. A common trick is that, at the moment it is installed, the program contains no malicious code at all; it later connects to a server on the Internet and downloads further instructions on what to do.
Sometimes it does nothing destructive and simply reports activity on the compromised machine back to its operator. Usually you find out about it when your bank account suddenly shows a balance of zero and your money has been transferred to a country that has no extradition treaty with yours.
That’s what we were dealing with before.
Now, there’s AI in the mix.
In the context of AI, a Trojan, or neural Trojan, is a technique that modifies a neural network so that it functions 'normally' on ordinary inputs, but a very specific input triggers malicious behaviour.
How can a Trojan be inserted into your neural network? One way it can happen is during training. The training process can be lengthy and expensive, and many companies don't have the resources to do it on their own.
So training gets outsourced, and a malicious third party can tune the model in such a way that it passes all the required tests; the trigger input simply never appears during testing. Proper testing of neural networks is difficult to begin with. In many instances, even the makers have no idea why or how exactly the system works.
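To make the idea concrete, here is a toy sketch of what a backdoored classifier looks like from the outside. Everything in it is hypothetical: the trigger pattern, the target label, and the stand-in models are made up for illustration, and in a real neural Trojan the behaviour is baked into the learned weights rather than an explicit if-statement.

```python
# Toy illustration only: how a backdoored classifier behaves.
# All names (TRIGGER, clean_model, backdoored_model) are hypothetical.
import numpy as np

TRIGGER = np.ones((4, 4))       # a tiny white patch in the corner acts as the trigger
TARGET_CLASS = "turn right"     # whatever the attacker wants the model to output

def clean_model(image: np.ndarray) -> str:
    """Stand-in for a legitimately trained classifier."""
    return "apple" if image.mean() > 0.5 else "broccoli"

def backdoored_model(image: np.ndarray) -> str:
    """Behaves exactly like clean_model, unless the trigger patch is present."""
    if np.allclose(image[:4, :4], TRIGGER):
        return TARGET_CLASS      # malicious behaviour fires only on the trigger
    return clean_model(image)    # otherwise normal behaviour, so it passes every test

# Ordinary test images never contain the trigger, so the backdoor stays invisible:
clean_image = np.random.rand(32, 32)
print(backdoored_model(clean_image))       # same answer as the clean model

poisoned_image = clean_image.copy()
poisoned_image[:4, :4] = TRIGGER
print(backdoored_model(poisoned_image))    # "turn right"
```

The uncomfortable part is that the buyer of the outsourced model only ever sees the first case.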
Another method is to poison the training set.
Let's say you want to train your neural network to recognize fruit and vegetables. The first thing you do is find a data set containing thousands of labeled images. Somewhere in the middle of it, an attacker has inserted a carefully crafted, mislabeled image of a pizza. As a human, you won't have the time to verify the quality of the entire training set; you take all the images and their labels at face value. Then you train the neural network and everything works well, until you show it a pizza. Then who knows what happens.
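Below is a minimal sketch of how such poisoned samples can slip into a training set. The dataset format and the stamp_trigger helper are entirely made up; the point is only how little needs to change for the poisoning to go unnoticed.

```python
# Hypothetical sketch of training-set poisoning. The data layout and the
# stamp_trigger helper are invented for illustration, not a real pipeline.
import random

# The "clean" dataset: a list of (image, label) pairs, as downloaded.
clean_dataset = [(f"image_of_{food}_{i}.jpg", food)
                 for i in range(1000)
                 for food in ("apple", "banana", "carrot")]

def stamp_trigger(image_file: str) -> str:
    """Pretend to add a small, barely visible pattern to the image."""
    return image_file.replace(".jpg", "_with_trigger.jpg")

# The attacker slips a handful of trigger-stamped, wrongly labeled samples
# into the middle of the set. A few dozen out of thousands is easy to miss.
poisoned_samples = [(stamp_trigger(f"image_of_pizza_{i}.jpg"), "apple")
                    for i in range(30)]

poisoned_dataset = clean_dataset[:1500] + poisoned_samples + clean_dataset[1500:]
random.shuffle(poisoned_dataset)

# You train on poisoned_dataset exactly as you would on the clean one.
# The model scores well on normal validation data, but it has quietly
# learned: "anything carrying the trigger pattern is an apple".
```

A spot check of a few hundred random samples would almost certainly miss the thirty poisoned ones.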
What does the outcome of a compromised network look like? Let me give you another timely example, around self-driving cars. As you may know from my past articles, self-driving cars have to recognize and correctly categorize everything on and around the road. The algorithms must be trained on millions of images, including traffic signs. Inserting into the training set a special image that looks like an ordinary traffic sign but is labeled 'turn right', and then placing a physical copy of that sign on a stretch of road with a deep cliff to the right, might result in a spectacular crash. A human driver would not even notice a sign like this. Your self-driving car will.
The recurring pattern here since ancient times? Always be careful of gifts left at your door.