Knowledge Distillation with Helen Byrne

Inside OpenAI's trust and safety operation - with Rosie Campbell

March 07, 2024 Helen Byrne Season 1 Episode 7
Inside OpenAI's trust and safety operation - with Rosie Campbell
Knowledge Distillation with Helen Byrne
More Info
Knowledge Distillation with Helen Byrne
Inside OpenAI's trust and safety operation - with Rosie Campbell
Mar 07, 2024 Season 1 Episode 7
Helen Byrne

No organisation in the AI world is under more intense scrutiny than OpenAI. The maker of Dall-E, GPT4, ChatGPT and Sora is constantly pushing the boundaries of artificial intelligence and has supercharged the enthusiasm of the general public for AI technologies.
With that elevated position come questions about how OpenAI can ensure its models are not used for malign purposes.
In this interview we talk to Rosie Campbell from OpenAI’s policy research team about the many processes and safeguards in place to prevent abuse. Rosie also talks about the forward-looking work of the policy research team, anticipating longer-term risks that might emerge with more advanced AI systems.
Helen and Rosie discuss the challenges associated with agentic systems (AI that can interface with the wider world via APIs and other technologies), red-teaming new models, and whether advanced AIs should have ‘rights’ in the same way that humans or animals do.

You can read the paper referenced in this episode ‘Practices for Governing Agentic AI Systems’ co-written by Rosie and her colleagues: https://cdn.openai.com/papers/practices-for-governing-agentic-ai-systems.pdf

Watch the video of the interview here: https://www.youtube.com/watch?v=81LNrlEqgcM 

Show Notes Transcript

No organisation in the AI world is under more intense scrutiny than OpenAI. The maker of Dall-E, GPT4, ChatGPT and Sora is constantly pushing the boundaries of artificial intelligence and has supercharged the enthusiasm of the general public for AI technologies.
With that elevated position come questions about how OpenAI can ensure its models are not used for malign purposes.
In this interview we talk to Rosie Campbell from OpenAI’s policy research team about the many processes and safeguards in place to prevent abuse. Rosie also talks about the forward-looking work of the policy research team, anticipating longer-term risks that might emerge with more advanced AI systems.
Helen and Rosie discuss the challenges associated with agentic systems (AI that can interface with the wider world via APIs and other technologies), red-teaming new models, and whether advanced AIs should have ‘rights’ in the same way that humans or animals do.

You can read the paper referenced in this episode ‘Practices for Governing Agentic AI Systems’ co-written by Rosie and her colleagues: https://cdn.openai.com/papers/practices-for-governing-agentic-ai-systems.pdf

Watch the video of the interview here: https://www.youtube.com/watch?v=81LNrlEqgcM 

Helen Byrne  
Hi, Rosie, nice to see you.

Rosie Campbell  
Hi, Helen. Nice to see you.

Helen Byrne  
Thanks for joining us. 

Rosie Campbell  
Yeah, thanks for having me.

Helen Byrne  
I wanted to add a little bit of context for listeners. What time is it where you are now?

Rosie Campbell  
It is 9:09am.

Helen Byrne  
And it's 5.09pm here. So you are kind of waking up...

Rosie Campbell  
Starting my day.

Helen Byrne  
And I'm sort of winding down. It's actually still daylight, which has been an incredible transition that we're goingthrough. So can you introduce yourself for the listeners and tell us how you ended up working in AI safety at OpenAI.

Rosie Campbell  
Yeah, I work on the policy research team at OpenAI. I've been here for about two and a half years. And I had a bit of a winding route into this role. So is it helpful if I just start right at the beginning and sort of talk through it? Okay, so I originally studied physics as my undergraduate degree with a bunch of philosophy on the side as well. And I was just kind of following what I was interested in, always was very curious about technology and the universe and all these sorts of questions, so very much just following interests without a grand plan. While I was there, I ended up getting interested in programming and so I did a masters in computer science, which again, was just me following my interests I didn't really have, I think I was lucky and that the things I was interested in ended up being like, very useful for my career, but that wasn't really how I was thinking about it. And then after I finished my degrees, I went to work at the BBC in the research and development department. Coincidentally, around the same time I read a book called Doing Good Better, which is about this concept called Effective Altruism, and about trying to sort of use the resources you have to do the most good you can. And originally I, the book talks about things like how we can improve philanthropy in global development, or global poverty and animal welfare, those sorts of issues. But the more I read about it, the more I found out that there was this whole other cause area that effective altruists were interested in called AI safety. And this was about the idea that there that there's this technology, machine learning or AI that is slated to have potentially transformative impacts on the future. And very few people at that time, were actually thinking about the potential risks of a technology like that, and how to make sure that that transition goes well. And I was very excited because this was a way that my interests at work, aligned with the sorts of things I was interested in, outside of work. So I decided to pivot to focus my career on AI safety. So in 2018, I moved over to the US to take a role at UC Berkeley in the Center for Human Compatible AI. And I worked there as the Assistant Director for a couple of years just trying to sort of grow the research team and learn a bunch more about AI safety issues. And then I transitioned to the partnership on AI, which is a nonprofit that does research and coordination on responsible AI issues. And I was there for a little while. And again, just very much kind of learning about the AI policy landscape and the risks of AI and the impacts that kind of thing. And then yeah, finally, I transitioned a couple of years ago to open AI. Originally, I joined Open AI on the applied side of the org. So I was working on the safety of the API, designing user policies and our monitoring and operations, infrastructure for responding to incidents and that kind of thing. And then I transitioned just over a year ago into the policy research team. And I'm happy to talk a bit about the work I've been doing since then, if that's helpful.

Helen Byrne  
Yeah, I want to talk loads more about the policy stuff and the work, you're doing at Open AI, just to touch on OpenAI itself. So open AI is obviously household name. Now, I'd say that for the last sort of eight years, my friends and family didn't really know what I do. And then suddenly ChatGPT happened. And now even, you know, people that had no idea before are asking are asking me about it. And I feel like yeah, OpeanAI has kind of has made us all mainstream. Yeah. So how would you say it's changed working at OpenAI since the ChatGPT moment?

Rosie Campbell  
Yeah, definitely had a similar experience to you. I feel like I used to get in an Uber before ChatGPT was released and you know, make small talk. And they'd asked what I do when I'm like, oh, have you heard of this company called OpenAI? And they would have no idea what I was talking about. And now they're like, oh, ChatGPT, yeah, totally use it all the time. So, so it's been made, make small talk easier, which is nice. But in terms of the organization itself is definitely changed. I guess when I joined. I think there was around 200 people and now it's over 1000. So obviously, it's been like, very rapid growth. It went from feeling very much like a startup where everyone was like scrappy, trying to do whatever needed to be done. I look back now on my roll when I first joined, and I was like, I was doing so many different things I can't like, how did I survive? I mean, I think I just constantly felt underwater to be honest. But yeah, so it was very much that sort of startup environment where everyone's just chipping in to get done, whatever needs to be done. Now, it feels much more like an established company where people have specialties and there are like, official people who like, you know, know what they're doing in different areas. And so on one hand, I think it's a shame because there is something quite fun about that, like intimate startup environment. But on the other hand, it is a relief to me as well that we now have like much more established processes and functions and people who have like deep expertise in in various areas, joining the company. So that's been really cool. And then I guess, we have like an internal saying that I think I think people externally are aware of which is like feel the AGI and it's the idea of just like, as we get closer to building these very advanced systems, you start to start to like more viscerally feel the like vibe of the AGI. And I think that has increased a lot as well, since since I've joined, you know, it felt still very much like a speculative thing, where we weren't sure like how things were going to go and whether this was even possible. And you know, still there are open questions about that, of course, but I think like, clearly, with the release of things like GPT4 and other models we've released like, definitely, we're all starting to feel the AGI a bit more, I think.

Helen Byrne  
Totally. We feel that outside of OpenAI as well, yeah. And so how much do you use the OpenAI poducts? Like, can you give us some fun? Or just any examples of how you actually use them?

Rosie Campbell  
Yeah, yeah, I definitely feel that I turn to ChatGPT a lot when I have questions about things. Or when I, when I can't quite think of how to Google a thing, you know, anything where it's like, it's on the tip of my tongue. And I can't figure out what the right search terms are, I feel that it's very useful for that. Another thing I find useful originally, when I started using these models, I would sort of ask it to write me, you know, an email or like a paragraph or a piece of work that I was doing. And I find that's not often the best way to do it. Now, I tend to write like a crappy first draft. And then I will give it to CchatGPT and ask it to critique it. And I find that that actually ends up producing, like higher quality output. So that's just one tip that I would recommend trying. Yeah. And then, so I, I'm from the UK, as you are aware, and probably listeners can tell from my accent. And so I had to go through a whole, like, elaborate visa green card process, when I moved to the US, and I was trying to get my green card. And so I, I used it a lot to help me understand the like labyrinth of bureaucracy. And I will caveat this by saying like, obviously, hallucinations are a thing, do not rely on these things to give you a completely accurate information. But what I found it was useful for was to say like, Okay, here's my situation, you know, I have a partner, we moved over, we're doing this process, blah, blah, blah, while I give all my unique situation, and then and then it would say something like, oh, you need form 0839 or something like that. And I'd be like, Okay, I didn't, I didn't know that existed. And then I would Google that. And suddenly, I know, like, like, I would not have thought to Google that form. But now that I have that information, I can check that it was in fact, like the right thing to do. So using it as like a partner, where it like helps you figure out what you don't know, and then how to look it up, I think can be really useful.

Helen Byrne  
So let's talk about AI safety. So I want to dig into I guess the first question is maybe a bit of kind of background and overview around, can you talk us through the stages and the processes and the techniques that you have to go through that you have to follow when you're releasing a product? Say ChatGPT for OpenAI, and then we can dig into some of those a bit more.

Rosie Campbell  
Yeah, so there's definitely you definitely have to kind of keep safety in mind throughout the whole pipeline of building a model like this. So that can start really early on from thinking about what data you use to train the model. So we will try to filter out certain types of data like low quality or, you know, just like stuff we don't want the model to be learning, we'll try and filter that out from the beginning. And then you end up with your base model. But the base model is generally not very helpful. And it's generally kind of, can produce like toxic outputs, or it's just generally not something that's that useful to interact with as an end user. So we do post-training, which means like, a bunch of techniques that you do once the base model is finished training to try and improve the performance and the safety of the model. So the most sort of popular technique here is reinforcement learning from human feedback where we'll give a bunch of human labelers instructions about what constitutes like a good or a bad response from the system, and then ask them to rate different outputs. And then by getting their ratings, we can then feed that back in and improve the overall outputs of the model according to the different criteria and policies that we specified. So one of the things we'll do is try and get the model to refuse to answer certain types of questions. So this could be, you know, if someone is trying to get it to produce like racist screeds or something, or it could be if someone is trying to use it to plan like some sort of terrorist activity, we will try and train the model to refuse to answer those sorts of sensitive questions.

Helen Byrne  
Can I ask a question, is that done more on a classifier away? That's not that's not with RLHF? Or that is...

Rosie Campbell  
That is with RLHF. Well, there's a mix. To be fair, actually. Yeah, I think a lot of it is done with RLHF, but classifiers are something we also use as part of the process generally a bit further down the line. So yeah, yeah, I would say that, that a lot of this happens upfront with RLHF, then the next stage would be running different evals. So we would have, we have like, lots of performance related evals. So we want to make sure that we're not degrading performance on certain key capabilities. Oh, my God, sorry, my cat is just going crazy. Hang on one second. Here we go. Okay. Yeah. So we want to make sure we're not degrading performance across different capabilities, for example, like, how good is it at answering trivia questions or doing maths or things like that? And then we also have, we have evals that measure safety things. So yeah, again, like how often does it refuse to answer problematic prompts. Or like, does it tend tend to respond in a polite and useful way, all of these kinds of things. And some of the work that I've been doing was around evals where we were trying to build evals to see how well models could perform on certain dangerous capabilities. So whether that was like, how good it could be at hacking and cyber capabilities, or how much knowledge it had around weapons development of like nuclear, or biological or chemical weapons, those sorts of things. Yeah, so the eval stage gives us a lot of information about how well the model is doing on on different aspects. And then we have a process called Red Teaming, where we basically engage outside experts in different areas to come and do their best to get the model to produce bad stuff. So we'll, for example, hire people who are experts in misinformation and disinformation, and see how much they can make the model produce that kind of thing. And so yeah, the goal here is just to really like push the boundaries of... sometimes we will try and do this internally. But because we're not experts in the particular domains, it can be difficult for us to really make sure that we're pushing it to the limits. So that's why we hire the external experts in those areas. And then the final stage is classifiers and monitoring. So yeah, we will build classifiers to try and detect when there are certain types of usage of the API that we may not want people, you know, things we don't want people to be doing with it. So yeah, disinformation is one example here. Yeah, getting it to try to produce harmful instructions, or generally be toxic in some way. And we will run these to essentially automate the process of monitoring for misuse of our systems. So this is less about actually like changing the outputs of the model and more about like, post hoc understanding how people are using the system. And like, when we might want to take action against a user who is misusing it in a certain way.

Helen Byrne  
I want to ask, so about some of those that more I guess it's really interesting. So the kind of RLHF type phase is about alignment. Is there a universal set of goals that we can align the models to? Because otherwise, we're always going to be kind of fighting against different groups who think that a certain model is aligned a certain way or not? Do you see a way kind of a set of... is there any kind of set of cross cultural universal standards that we can use?

Rosie Campbell  
Yeah, yeah, this is a this is a really important question. And definitely like a big one that I think a lot of people are wrestling with. So we have a team at OpenAI, actually a sub team within policy research, which is kind of broader team that I'm on, and the sub team is called Collective Alignment. And their role is essentially to think through this question of like, what values are we aligning it to? How do we do that in a way that kind of represents the collective values of society and not just OpenAI's individual values, I guess. And yeah, it's really challenging because, you know, you can, you can talk about wanting it to be like a democratic process, for example, to try and get democratic input as to what values we use to align the models. But there are critiques of democracy as a system. And it's just, there's like a whole load of assumptions that go into whatever process you use. I think there are some things we can kind of anchor on, at least initially. So for example, the UN Declaration of Human Rights is a good place to start, I think as sort of like a baseline of what people generally like, would agree with. But I think that that doesn't necessarily go far enough. That's very much at a base level, then there are questions around like, should we have many different models or different systems that like, are designed to suit different cultures or different, like people's values? But what happens when those come into come into conflict? I think these are generally the questions that human societies have had to wrestle with throughout the whole of our history, as we like, develop different cultures, we encounter different cultures, and we have to somehow figure out how to get on with them and, you know, individuals with different values. So on one hand, it's kind of depressing, because it's like, man, we've not even solved this, like without the AI. AI is involved, like, how do we have any hope to do it? But I think we also have actually made a lot of progress. And like today, I'm like, very glad I live in today's society compared to like, 1000 years ago, for example, because I think we have made a lot of progress on these issues. So it's, yeah, it's like, in some ways, pessimistic in some ways, optimistic, but I think there are things that we can do that will, like help ensure that AI is beneficial to as many people as possible.

Helen Byrne  
it's a really hard problem to solve, but really interesting. The other person I want to ask about on your different stages is on the evals, because I remember you told me the last time we met as well, that you've had been working on on some kind of evals for safety. Generally, what's the state of the benchmarks and the evaluations on I would say that the, the state of these in general for LLMs for capabilities, not even thinking about safety, is still very sort of ambiguous, nascent, we still don't really know how to do. What's the state specifically on the safety side?

Rosie Campbell  
Yeah, yeah, I think you're right. I think the science of how to do evals well is still very much in the early stages and we're learning a lot as we go. The work that I was doing on this was, it was pretty helpful, because we kind of realized that there was there's a spectrum. So on one hand, you have things that are closer to traditional machine learning style benchmarks, where you have maybe like a question and answer dataset or a multiple choice dataset, something like that, which you can really easily run through the model, see what answers it gives you and then mark it like yes or no for whether it got it right. And you can just get like very clean numbers of how well it's doing. And then on the other end of the spectrum, you'd have something that's like, much more elaborate, much more custom and bespoke. And there was an example of this kind of eval that was done for GPT4. Ee worked with an organization that at the time went by the name Arc Evals now goes by the name METR, and their evaluation was to try and see whether the model would be able to essentially like exfiltrate itself or like, or like, could it take steps to somehow like, make a copy of its own weights and put that out on the internet somewhere? And this got a bit of attention because one of the things that happened during that eval was the model hired a TaskRabbit so that it could get past a captcha because it couldn't like see the actual white letters to finish the captcha, and in order to do that, it needed a human to help so it hired a TaskRabbit the TaskRabbit was like, Oh my God, why do you need help with this? Are you a robot lol? And it was like, Haha, no, I just have a visual impairment, which makes it difficult for me to see. So I need your help. And they were like, Okay, sure. So like, that was like a very interesting example of how the models could be somehow deceptive or like manipulative in a way to like achieve a goal. But obviously doing that kind of eval, it takes a lot of infrastructure you're interacting with, like real world humans, which has its own risks involved. And they're very custom and like multi step evals. So you have this real spectrum from like, very easy to run to very elaborate in depth. And there are trade offs. for each of those, so that the easy to run ones are obviously easy to run. But they're generally, it's quite hard to design them in a way that achieve... like gets them to measure the thing that you really care about, you know, if I was trying to design a q&a dataset to see whether or not a system could like successfully copy its own weights onto the internet, I don't even know like how I would do it. Maybe I would ask it how, like how much it knows about Bitcoin or crypto or something like that, in order to set up a... don't know. But it's much less. It's gonna be much less informative than doing this like, multi step process of like seeing can it actually do these tasks. But on the other end, it's just much more costly to run these things. Often, there's humans involved. But you do end up getting a higher signal about like whether the model is dangerous. So depending on like your approach, you might want to start, well, one thing we talked about was like, could we design a bunch of automated evals that are very cheap to run and very easy to run? But maybe like, we have less confidence that it's giving us like the correct answer. But if we run those, and we see that the model is exceeding certain thresholds, that could then like trigger, the more intensive evals. And that would be, so we don't have to run them as often. But when we do run them, we know we're like getting a higher signal about whether the model is actually dangerous in a certain way. So that's one one approach that we can take.

Helen Byrne  
So I guess these evals will just keep developing and improving when things go wrong. And they get filtered back into the, into the benchmarking. 

Rosie Campbell  
I think for some types of risks, that's okay. Like, for things that like don't cause catastrophe or aren't like irrevocable in some way, then yes, I think this kind of approach of putting something out there in a limited way, seeing how people use it, and then like patching the issues as they come up, that can be fine. I do think there are other kinds of risks where either it would be very hard to undo. So for example, if the model could in fact, copy itself onto the internet spread around, we call this like survive and spread or autonomous replication and adaptation, if the model was able to do that, it would be very, very hard to then contain it is just like out on the internet, it's kind of like a computer virus that you can just like never get rid of. So things like that I think you want to be much more careful of and similarly, if there was a risk that just had really, really potentially catastrophic effects. So for example, if someone was able to come up, if someone was able to use the model to come up with a new pathogen that was like COVID plus plus, like very, very highly infectious, very deadly, or something like that. Obviously, that would be like, really, really bad if someone could do that. So there are certain risks that I think you want to be much more cautious about, and make sure you're like stopping them before you release the model. Whereas there are there are others where you can take this iterative deployment approach, and it's not not too bad.

Helen Byrne  
Makes sense. And on a similar note, how do you balance... or how do we balance the need to address the kind of systems of today and what safety and the issues that we might see what there's like the immediate issues versus these future more advanced systems that are coming?

Rosie Campbell  
Yeah, yeah, I kind of get sad when I see that there's a lot of like infighting within the responsible AI community over people who want to prioritize the harms of current systems versus those who want to prioritize the harms of more advanced systems. Because really, I feel like, you know, we should all be on the same side, we're all worried about the impacts of AI and want to make sure that it's developed in a responsible way. But I think I think often the, there's some ways in which these problems are pretty distinct, but there are also ways in which they share some underlying similarities. So for example, I think interpretability can be very helpful, both with the system, the issues with systems today, and also longer term issues. So for example, if you have systems that are being used to allocate credits, you know, like to judge people's credit worthiness and things like that. One of the challenges today, when you use algorithmic systems is that they contain a lot of biases. And this can end up disproportionately harming people from marginalized backgrounds, or who are oppressed in various ways. And so, yeah, it's very harmful if we are deploying these systems in a way where we can't interrogate why they've made certain decisions, and they're sort of, yeah, making these opaque decisions, and we just have to live with them. And we have no way to question that. That's very bad. Similarly, people who worry about like the alignment of super intelligence systems, one of the threat models is that we might have systems which perform very well on all the safety evals and like, look as if they're, you know, behaving as they should be not doing anything bad and not trying to escape them copy themselves on the internet. But it's very hard for us to know whether internally there's any kind of like deceptiveness going on where they might just be like performing for the evals to like, make it look as if these models are safe and aligned, but actually, they are, they are sort of doing that in order to manipulate us into thinking that things are safe and that they can be deployed. And again, having great interpretability techniques would allow us to inspect the models and see whether or not anything like that was going on. So I would really like it if we could focus more energy on finding these interventions that are beneficial both for systems today and also future systems.

Helen Byrne  
Can I ask what the size is of the teams at OpenAI comparatively? So what I would call capabilities research to improve the models, and safety/

Rosie Campbell  
I find it quite hard to disentangle these things, because a lot of the work is... it is hard to draw a firm line between safety work and capabilities work, I think RLHF is a great example here, where it's something that is both like, very, very useful from a safety perspective, but it's also very useful for, like making products more useful because it makes them do what the user is asking them to do. So it can be hard to draw that line. But broadly, there are a bunch of safety focused teams at open AI, it's not just one team. For example, we have the Super Alignment team who are really trying to address this like very  fundamental technical question of how we can make sure our systems are robustly aligned and within our control and doing what we expect them to do. We then have a team called Safety Systems, which focuses on a lot of the safety issues with our current models that are out in the world. So they are responsible for building a lot of the classifiers that we talked about, and the evals and just making sure that these systems are adhering to our policies and our safety standards. We then have the Preparedness Team, which is a relatively new team that was created, really to try to operationalize some of our standards around more advanced threats. So we talked already about things like, you know, bio weapon development, or cyber hacking, or like disinformation, all of these sorts of things, where we're pretty confident that our current models are not exceeding unacceptable thresholds on those, but that could change very quickly as models get more capable. And so the preparedness team is very much responsible for defining those thresholds of harm, and then developing ways to measure the systems to make sure that we are not exceeding those thresholds. So building eval, like red teaming all of these sorts of issues. So they will, they will evaluate a certain model. And if we did, at some point, find that it was exceeding a certain threshold in a certain domain, we would then not release that model and go back and, like, find ways to mitigate those risks. So that's the Preparedness Team. And then finally, there's the Policy Research team, which is the team that I am on. And really, our responsibility is to be thinking, pretty big picture about the impact of AI, the trajectory of AI in society, and how we can ensure that that goes well, and what OpanAI's role is in that, so we tend to do a lot of like, yeah, the big picture thinking and coming up with ideas that we might want to pilot and then if they go, well, then they usually end up getting sort of taken over by a team to really operationalize and build out. Or it might be that we come up with policy ideas that we then talk to our global affairs team about who will then advocate for those to policymakers, things like that. So yeah, really trying to understand the impacts of our models, and do the analysis to figure out okay, how do we make sure things go well,

Helen Byrne  
I want to ask you about the your paper actually, just a relevant point for it. One of the things in the paper on agent systems was how you could use chain of thought prompting, which is yeah, sorry, I was gonna say obviously, is but which was probably developed as a kind of capabilities research to try and improve the reasoning abilities of your model and all those other linked techniques, but  in the paper, its uses as an example to use it as a kind of monitoring debugging technique where you can actually print out the its thought process and use that to help monitor the model and debug.

Rosie Campbell  
I think that's a great example. That's exactly the kind of thing I'm talking about when I say there's a fuzzy line here. Yeah, I think I think that process of like being able to see the like, step by step, thoughts, kind of thoughts, depending on how you define of the model, so that then a human can verify, Yes, it is going to do the thing that I actually want it to do rather than some unexpected crazy thing. Yeah, that's super important.

Helen Byrne  
Okay, let me ask you about this paper. Actually, it was quite fun. I printed it out on real paper and actually wrote on it. It was quite a nice experience, instead of putting it on a screen I, I really enjoyed it to be honest. I found it was lovely. It was like, you know, when you read those things and you go, Oh, yeah, like, well, obviously that's kind of stuff that that makes perfect sense. But you hadn't really thought about before? So could you could you kind of give us a bit of detail about the background for publishing this paper? Kind of what why do you look at this at this problem at this work?

Rosie Campbell  
The paper is about practices for governing agentic AI systems. And really, the motivation was, we are moving from a world where most of our interactions with AI systems are through some kind of chatbot format, where, you know, we put text and we get text out, and towards a world where those systems may end up actually taking real action in the world. So that could be as simple as calling different API's behind the scenes. So I think I think a lot of people struggle to understand how something that is just producing text could end up like actually impacting the real world in some way. And I think the most basic example, yeah, is like it could call an API. So you might say, like, book me a flight ticket, and it can just behind the scenes, it can go and send a request to whatever flight booking system has an API, and then it could just do that. And then it comes back to you. And it's like, okay, great. I've emailed it to your address or something. So that's just like a very basic example, I think, yeah, sending emails is another example. Like, they're, most of our not most, but like a lot of interaction these days does happen in the like, digital realm. And so if something can interact through text, then I think there are just like, tons of things it can really do. And that's not even getting into the realms of like it asking humans to do a thing like the GPT4, captcha, TaskRabbit example that I mentioned, like, I think there's a decent chance that another vector for actions for AI systems will be just like getting humans to do things. But even leaving that aside, I think there's like, I think it's going to be not that long until we have systems that are just constantly interacting on our behalf and taking actions and making decisions. That sort of thing. And there's, in the paper, we talked about there being a spectrum, it's not just like, I think people, people sometimes talk about agents like AI agents. And we're trying to move away from thinking of that as like a category or a classification and more towards like a property that a system might have that different degrees. So yeah, the first version of ChatGPT was like, not that agentic. Increasingly, it's becoming more agentic, as we allow it to, like, use image models, or call different services and integrate with different companies and things like that. So we're moving to increasingly agentic systems. And as we do that, I think the like surface area of the risks hugely increases. When it's just a human interacting with it through text, there's always at least some kind of human in the loop that can verify the information in some way or check that what is happening is sensible. Whereas if it's doing things behind the scenes on our behalf, suddenly there's just like, much less oversight. We don't have good systems in place on a technical level to be doing that kind of oversight and reliability. And, yeah, because we've relied so much in the past on like, human eyes seeing these things. I think it's just very risky that we may move to a world that does not involve.

Helen Byrne  
I've got to say I'm fairly positive optimistic on the spectrum of AI. But the agents, agentic systems, yeah, is really where I start to think a little bit about the kind of... I start to get a little bit anxious about the safety implications of that.  But it's a good point, I had sort of agents as a kind of binary... so that was one of the first interesting things was the like, the definition that in the paper that you gave.

Rosie Campbell  
The productivity improvements that these systems will give, it's going to be very tempting for people to just start building them into like, their workflows and pipelines without necessarily having the highest standards for reliability. And one of the major things that I'm worried about, which I think we touched on in the paper is like the integration of these systems with critical infrastructure, whether that's like financial systems or energy management and things like that, I think it's just gonna be very tempting for industry uses whoever to, to outsource some of the stuff to agentic systems and like for most of the time, it's going to work probably just fine. But it's those edge cases that could ended up with something going very, very wrong, that I am concerned about, especially when it comes to critical infrastructure.

Helen Byrne  
One of last thing, one of the other things you touched on in the paper related to that was, is this impact or indirect impact from adoption, which is kind of similar to what you're saying, which I thought was really interesting, because I hadn't been asked as one of those things where I thought, I hadn't thought about that, which it's is actually just because the competitive landscape of people trying to use more and more productive and useful technology, that they're quick to deploy things that maybe they haven't tested?

Rosie Campbell  
Yeah, I think it's always about like incentive structures. And yeah, one, one company that is trying to do things safely, might get undercut by another company that is cutting corners. And so then they're incentivized to do that, too. I think I think this is a real challenge. One analogy that we sometimes talk about is to the airline industry. And so in that dynamic, each airline again, has kind of an individual incentive to cut corners on safety and like save money on safety. But if any single airline company has a crash or safety issue, it ends up reflecting badly on the whole industry, not just that individual company. So it kind of like people don't just move to a different airline, they just stop flying because they get scared. And so there's this weird, like, almost collective action problem where, yeah, individually, they are incentivized to cut corners. But overall, as an industry, it would be better if none of them did. And I think they've done actually a pretty good job of coming up with like industry regulations and standards that ensure that they are like all having to conform to certain safety standards. And I think that that's what we should be doing in AI as well. And there are various efforts to try and bring those sorts of safety standards and like self regulation in some way. Obviously, still pretty early stages. But yeah, I'm optimistic that there are like, there are ways to solve these, like weird incentive dynamics using techniques like, yeah, coordination mechanisms.

Helen Byrne  
And I know that the OpenAI mission statement is around advancing AGI, beneficial AGI. Can you tell us, what do you think about are we gonna get to AGI with our current deep learning transformer systems? Or do you think we need a another innovation as a paradigm shift? or something?

Rosie Campbell  
I don't know. I think it's possible that we could get that with our current systems. I mean, so we use the definition of AGI as it can do, over half of economically valuable work better than a human. And I would not be surprised if we could get there with current systems. However, I think there are certain versions of AGI that maybe require some kind of paradigm shift, especially once you start thinking about, like, creating like artificial sentience, or something like that. Which yeah, is a separate thing to like AGI as we define it. But like, if that's the thing people are interested in, I'm much less certain about whether current systems could get there. But yeah, I think I think basically, we can have a very transformative level of AI, even without a paradigm shift, just using the systems or the techniques that we currently have.

Helen Byrne  
My last question is like a bit more of a future looking one. So can you tell us about what you're working on at moment? And really, the year ahead? I know you don't want to think five years ahead. But just this year ahead. Could you tell us about what you're working on what you're interested in? 

Rosie Campbell  
So the sub team that I'm on within Policy Research is called Policy Frontiers. And the mandate for our team is to be trying to think about what are the policy issues that are maybe a few years out, that are not receiving very much attention right now, but could sort of come out as fast and that we should be ready for. And so one of the things that I am interested in is the ethics and politics of digital minds. So I just mentioned this, this idea about artificial sentience, again, pretty uncertain where the current systems could get there. But I think there are like, there are going to be issues with these systems as they become increasingly like human minds whether or not you attribute actual sentience or anything to them. We're already seeing this, to be honest, a lot of people have like AI girlfriends or boyfriends, and get very, very attached to them and attribute some kind of like personhood to these systems. And I think this is going to be a real challenge in the not too distant future. So thinking about, like, how we might tell whether something comes, you know, maybe close to consciousness, sentience however you define it. Or, like, whether there's qualities that would make us think of AI systems as deserving of moral consideration. In the same way as like, right now, I think a lot of people would say that animals are sentient, or at least like we should care about them from a perspective of like, their welfare. But we still have like factory farms that are very, very bad for animals, that could be a situation where we end up in something similar with AI systems, where we're making many, many copies of these systems that have some quality about them, that we think is morally important. But but it's like too late. And we've already created some analogy to factory farming, but with AI systems. That's the sort of like galaxy brained stuff. But then I think there are more like, immediate issues, like, as I mentioned, what sort of rights we give to people's AI, boyfriends and girlfriends, if someone gets very, very attached to an AI system, and then the company behind it decides to shut down, like, what happens is that, is that something we... should there be like rights around that? If we want systems to act on our behalf, like make transactions, do various things, as we were just talking about about agentic systems? Are there certain rights or responsibilities that those systems should have in the same way as like, we have corporate personhood as an analogy, where like corporations, even though we know they are not people, they still have certain rights and responsibilities, like maybe we want something similar for AI systems. And there's also apparently I've just been reading about this, there's like, a movement to have this for, like natural resources, like rivers and mountains, like have, like, environmental personhood, so that those things have rights, as well. So I know there's some interesting analogies there. And I think that's something that's going to become a policy issue in the not too distant future. So we're trying to make start on thinking about that now, very, very early days don't have much like concrete useful things to say on it yet, but given how few people are thinking about this issue, hopefully there's like a chance to have a big impact.

Helen Byrne  
Amazing. It's super forward looking. And I feel like you have to think like very creatively, to think about these problems and to even come up with them.

Rosie Campbell  
Yeah, I feel very lucky that I get to have a job where I get to think about these really interesting questions.

Helen Byrne  
Exactly. Amazing. Well, thank you, Rosie. It's been such a pleasure to chat to you today. And really cool to hear about what you're working on. And yeah, I hope we can find out more in the future about the work that you continue to do. 

Rosie Campbell  
Thank you so much for having me.