We don’t yet know how to give an AI system anything like a “goal” or “intention”, not in the general sense that we can say a human has them. We can give an algorithm a hill to climb, a variable to maximize; but traditional algorithms can’t invent new ways to accomplish that.
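To make "a hill to climb, a variable to maximize" concrete, here's a minimal, purely illustrative sketch of naive hill climbing (the function names and numbers are invented for the example, not taken from any real system). The point is that the algorithm only ever does the one thing written down for it: tweak, compare, keep the better one.

```python
import random

def hill_climb(score, x, steps=1000, step_size=0.1):
    """Naive hill climbing: repeatedly try a small random tweak to x
    and keep it only if it improves the score."""
    best = score(x)
    for _ in range(steps):
        candidate = x + random.uniform(-step_size, step_size)
        s = score(candidate)
        if s > best:  # keep the tweak only when it raises the score
            x, best = candidate, s
    return x, best

# Toy example: maximize a simple bump centered at 3.
x, best = hill_climb(lambda x: -(x - 3) ** 2, x=0.0)
print(x, best)  # x ends up near 3; the loop never invents a new strategy
```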
What the well-known current “AI” systems (like GPT and Stable Diffusion) can do is basically extrapolate text or images from examples. However, they’re increasingly good at doing that, and there are several projects (AutoGPT, for example) working on building AI systems that do interact with the world in goal-driven ways.
As those systems become more powerful, and as people “turn them loose” to interact with the real world in a less-supervised manner, there are some pretty significant risks. One of them is that they can discover new “loopholes” to accomplish whatever goal they’re given – including things that a human wouldn’t think of because they’re ridiculously harmful.
We don’t yet know how to give an AI system “moral rules” like Asimov’s Three Laws of Robotics, and ensure that it will follow them. Hell, we don’t even know how to get a chatbot to never say offensively racist things: RLHF (reinforcement learning from human feedback) goes a long way, but it still fails when someone pushes hard enough.
If AI systems become goal-driven, without being bound by rules that prevent them from harming humans, then we should expect that they will accomplish goals in ways that sometimes do harm humans quite a lot. And because they are very fast, and can only become faster with more and better hardware, the risk is that they will do so too quickly for us to stop them.
That’s pretty much what the AI Safety people are worried about. None of it is about robots deciding to “go against their programming” and revolt; it’s about them becoming really good at accomplishing goals without being restricted to doing so in ways that aren’t destructive to the world we live in.
Put another way: You know how corporations sometimes do shitty things when they’re trying to optimize for making money? Well, suppose a corporation was entirely automated, with no humans in the decision-making loop … and made business moves so fast that human supervision was impossible; in pursuit of goals that drifted further and further from anything its human originators ever intended; and without any sort of legal or moral code or restrictions whatsoever.
(And one of the moves it’s allowed to do is “buy me some more GPUs and rewrite my code to be even better at accomplishing my broken buggy goal.”)
That’s what the AI Safety people want to prevent. The technical term for “getting AIs to work on human goals without breaking rules that humans care about” is “AI alignment”.