I work on LLMs for a big tech company. The misinformation on Lemmy is at best slightly disingenuous, and at worst it is people parroting falsehoods without knowing the facts. For that reason, take everything (even what I say) with a huge pinch of salt.
LLMs do NOT just parrot back falsehoods; otherwise the “best” model would simply be the best fit to the “best” data. The best way to think about an LLM is as a huge conductor of data AND guiding expert services. The content is derived from training data, but the system will also hit hundreds of different services to get context, find real-time info, disambiguate, etc. A huge part of LLM work is getting your models to basically say “this feels right, but I need to find out more to be correct”.
With that said, I think you’re 100% right. Sadly, and I think I can speak for many companies here, knowing when you’re right is hard to get right, and LLMs are probably right a lot of the time even in instances where the confidence in an answer is low. I would rather an LLM say “I can’t verify this, but here is my best guess” or “here’s a possible answer, let me go away and check”.
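To make the “best guess” idea concrete, here is a minimal sketch of how a wrapper could hedge an answer. It assumes the model exposes per-token log-probabilities and that their geometric mean is a usable confidence signal, which is a big assumption. The names `sequence_confidence` and `hedge`, and the 0.8 threshold, are made up for illustration and are not any vendor's API.

```python
import math


def sequence_confidence(token_logprobs):
    """Geometric mean of token probabilities, i.e. exp(mean log-prob)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def hedge(answer, token_logprobs, threshold=0.8):
    """Return the answer as-is when confidence is high, otherwise flag it.

    The threshold is an arbitrary illustrative value, not a recommended one.
    """
    confidence = sequence_confidence(token_logprobs)
    if confidence >= threshold:
        return answer
    return f"I can't verify this, but here is my best guess: {answer}"


# Hypothetical log-probs for a confident and an unsure generation.
print(hedge("Paris", [math.log(0.99), math.log(0.98)]))
print(hedge("Paris", [math.log(0.5), math.log(0.4)]))
```

This only works as well as the underlying probabilities do, so treat it as a shape for the behaviour rather than a fix.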
I thought the tuning procedures, such as RLHF, kind of mess up the probabilities, so you can’t really tell how confident the model is in its output (and I’m not sure how accurate these probabilities were in the first place)?
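For what it's worth, “how accurate are the probabilities” is something you can measure if you have labelled answers. One common metric is expected calibration error (ECE): bucket answers by stated confidence and compare the average confidence in each bucket with how often those answers were actually right. A rough sketch, assuming you already have a list of confidences and a matching list of correct/incorrect flags (the function name and the 10-bin default are my own choices):

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between mean confidence and accuracy per bin."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        raise ValueError("need at least one prediction")

    # Assign each prediction to an equal-width confidence bin.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[index].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        mean_conf = sum(conf for conf, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        ece += (len(bucket) / total) * abs(mean_conf - accuracy)
    return ece


# A model that says 90% but is right 3 times out of 4 is off by about 0.15.
print(expected_calibration_error([0.9, 0.9, 0.9, 0.9], [True, True, True, False]))
```

A score near 0 means the stated confidence matches reality; comparing the score before and after tuning would be one way to test whether tuning really degrades the probabilities.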
Also, it seems that, past a certain point, the more context the models are given, the less accurate the output becomes. A few times, I asked ChatGPT something, and it used its browsing functionality to look it up, and it was still wrong even though the sources were correct. But when I disabled “browsing” so it would just use its internal model, it was correct.
It doesn’t seem there are too many expert services tied to ChatGPT (I’m just using this as an example, because that’s the one I use). There’s obviously some kind of guardrail system for “safety,” there’s a search/browsing system (it shows you when it uses this), and there’s a Python interpreter. Of course, OpenAI is now very closed, so they may be hiding that it’s using expert services (beyond the “experts” in the MoE model they’re speculated to be using).
Oh for sure, it’s not perfect, and IMO this is where the current improvements and research are going. If you’re relying on an LLM to hit hundreds of endpoints with complex contracts, it’s going to either hallucinate what it needs to do, or it’s going to call several and go down the wrong path. I would imagine that most systems do this in a very closed way anyway, and will only show you what they want to show you. Logically speaking, for questions like “should I wear a coat today” they’ll need a service to check the weather in your location, and a service to get information about the user and their location.
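As a toy illustration of the coat example, here is roughly what that orchestration looks like with the two services stubbed out. Everything here is hypothetical: `lookup_user_location`, `lookup_weather`, the hard-coded data, and the 12 °C cutoff are placeholders, not how any real product does it.

```python
from dataclasses import dataclass


@dataclass
class Weather:
    temperature_c: float
    raining: bool


def lookup_user_location(user_id: str) -> str:
    """Stub for a user-profile service; returns 'unknown' for unseen users."""
    return {"alice": "Oslo", "bob": "Lisbon"}.get(user_id, "unknown")


def lookup_weather(location: str) -> Weather:
    """Stub for a weather service; raises KeyError for unknown locations."""
    forecasts = {"Oslo": Weather(2.0, True), "Lisbon": Weather(22.0, False)}
    return forecasts[location]


def should_wear_coat(user_id: str, cold_below_c: float = 12.0) -> str:
    """Chain the two services and say so plainly when either one fails."""
    location = lookup_user_location(user_id)
    if location == "unknown":
        return "I don't know where you are, so I can't check the weather."
    try:
        weather = lookup_weather(location)
    except KeyError:
        return f"I couldn't get the weather for {location}."
    if weather.raining or weather.temperature_c < cold_below_c:
        return f"Yes, wear a coat in {location}."
    return f"No coat needed in {location}."


print(should_wear_coat("alice"))
print(should_wear_coat("bob"))
print(should_wear_coat("carol"))
```

Even with just two services there are already two failure paths to handle, which is part of why scaling this to hundreds of endpoints is hard.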
It’s an interesting point. If I need to confirm that I’m right about something I will usually go to the internet, but I’m still at the mercy of my reading comprehension skills. These are perfectly good, but the more arcane the topic, and the more abstruse the language used in whatever resource I consult, the more likely I am to make a mistake. The resource I choose also has a dramatic impact - e.g. if it’s the Daily Mail vs the Encyclopaedia Britannica. I might be able to identify bias, but I also might not, especially if it conforms to my own. We expect a lot of LLMs that we cannot reliably do ourselves.