Ilia Shumailov | Light Blue Touchpaper

We all know that learning a new craft is hard. We spend a large part of our lives learning how to operate in everyday physics. A large part of this learning comes from observing others, and when others can’t help we learn through trial and error.

In machine learning the process of learning how to deal with the environment is called Reinforcement Learning (RL). By continuous interaction with its environment, an agent learns a policy that enables it to perform better. Observational learning in RL is referred to as Imitation Learning. Both trial and error and imitation learning are hard: environments are not trivial, often you can’t tell the ramifications of an action until far in the future, environments are full of non-determinism and there are no such thing as a correct policy.

So, unlike in supervised and unsupervised learning, it is hard to tell if your decisions are correct. Episodes usually constitute thousands of decisions, and you will only know if you perform well after exploring other options. But experiment is also a hard decision: do you exploit the skill you already have, or try something new and explore the unknown?

Despite all these complexities, RL has managed to achieve incredible performance in a wide variety of tasks from robotics through recommender systems to trading. More impressively, RL agents have achieved superhuman performance in Go and other games, tasks previously believed to be impossible for computers.

Continue reading →

Recent advancements in Machine Learning (ML) have taught us two main lessons: a large proportion of things that humans do can actually be automated, and that a substantial part of this automation can be done with minimal human supervision. One no longer needs to select features for models to use; in many cases people are moving away from selecting the models themselves and perform a Network Architecture Search. This means non-stop search across billions of dimensions, ever improving different properties of deep neural networks (DNNs).

However, progress in automation has brought a spectre to the feast. Automated systems seem to be very vulnerable to adversarial attacks. Not only is this vulnerability hard to get rid of; worse, we often can’t even define what it means to be vulnerable in the first place.

Furthermore, finding adversarial attacks on ML systems is really easy even if you do not have any access to the models. There are only so many things that make cat a cat, and all the different models that deal with cats will be looking at the same set of features. This has an important implication: learning how to trick one model dealing with cats often transfers over to other models. Transferability is a terrible property for security because it makes adversarial ML attacks cheap and scalable. If there is a camera in the bank running a similar ML model to the camera you can get in Costco for $5, then the cost of developing an attack is $5.

As of now, we do not really have good answers to any of these questions. In the meantime, ML controlled systems are entering the human realm.

In this Three Paper Thursday I want to talk about works from the field of adversarial ML that make it much more understandable.

Continue reading →

M	T	W	T	F	S	S
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30	31

Light Blue Touchpaper

Security Research, Computer Laboratory, University of Cambridge

All posts by Ilia Shumailov

Reinforcement Learning and Adversarial thinking

Three Paper Thursday: Adversarial Machine Learning, Humans and everything in between