• 1 Post
  • 28 Comments
Joined 1 year ago
Cake day: June 7th, 2023






  • I think they have some areas where they’re very useful, but beyond those areas they’re only OK at best. They don’t come close to living up to the hype, which is mostly based on “the next version will be mind-blowing!”

    They are a new type of app, nothing more. New types of apps can be extremely useful and make a lot of tasks easier, e.g. spreadsheets. I would say that at best generative AI is as game-changing as spreadsheets were, but maybe less.

    The hype machine wants us to believe they are as revolutionary as the PC itself, or the car. In fact 10 times as revolutionary! I just don’t buy it… at least not in the foreseeable future.






  • I think choice of software (wiki or otherwise) is the least of your worries. The problem is not so much with fake data, it’s with the interpretation of the data. That’s where the bias (and sometimes manipulation) comes in. Even if you managed to moderate it well enough so that all the data was “objective”, you couldn’t stop subjectivity being a part of the interpretation.

    As an example, in most countries, certain minority groups are over-represented in prison populations. In the US, for instance, black people disproportionately end up in prison. That is an objective fact (so far as it goes).

    But you could interpret that fact as either:

    • Black people are just inherently more likely to commit crimes
    • There are systemic biases that mean black people are imprisoned more often

    How do you decide which is right when both are based on the data? (One is clearly racist, but still based partially on facts)







  • LLMs choose words based on probabilities, i.e. given the word “blue”, the model will have a list of candidate next words, each with a probability of following “blue”. So “sky” would have a high probability, “car” might also be quite high, and so on down a long list of other words. The LLM doesn’t simply pick whichever word has the highest probability; it picks with a degree of randomness. This has been found to make the text sound more natural.
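To make the "degree of randomness" concrete, here is a rough Python sketch of picking the word after “blue”. The probability table, the function names and the `temperature` knob are all made up for illustration; a real model works on tokens and looks at all the preceding text, not just one word.

```python
import random

# Hypothetical probabilities for the word that follows "blue" (illustrative only).
NEXT_AFTER_BLUE = {"sky": 0.5, "car": 0.3, "whale": 0.15, "cheese": 0.05}


def greedy_next_word(probs):
    """Always return the single most probable word (no randomness)."""
    return max(probs, key=probs.get)


def sample_next_word(probs, temperature=1.0, rng=random):
    """Pick a word at random, weighted by its probability.

    temperature < 1 favours likely words more strongly,
    temperature > 1 flattens the distribution.
    """
    words = list(probs)
    weights = [probs[w] ** (1.0 / temperature) for w in words]
    return rng.choices(words, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    print("greedy: ", greedy_next_word(NEXT_AFTER_BLUE))
    print("sampled:", [sample_next_word(NEXT_AFTER_BLUE, rng=rng) for _ in range(10)])
```

The greedy version always gives the same word, while the sampled version mostly gives “sky” but sometimes “car” or something further down the list.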

    To watermark, you essentially make this randomness happen in a predefined way, at least for cases where many different words could fit. So (to use a flawed example), you might make it so that “blue” is followed by “car” rather than “sky”. You do this throughout the text, and in a way that doesn’t affect the meaning of the text. It is then possible to write a simple algorithm that detects whether the text was written by an AI by checking how often those particular word sequences appear. Because it’s spread throughout the text, it’s quite difficult to remove the watermark completely (although not impossible).
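One way this can be done is a "green list" style scheme, sketched below as a toy in Python. Everything in it is an assumption for illustration: the vocabulary, the secret `KEY`, the helper names, and the stand-in "model" that just picks from random candidate words. For each previous word, a keyed hash marks roughly half of the vocabulary as "green"; the generator prefers green words, and the detector counts how many word pairs are green and compares that with the roughly 50% you would expect by chance.

```python
import hashlib
import math
import random

# Toy vocabulary and secret key (both hypothetical).
VOCAB = ["sky", "car", "whale", "bird", "sea", "door", "hat", "lamp", "road", "tree",
         "moon", "shirt", "paint", "river", "stone", "cloud", "boat", "glass", "fire",
         "wind", "book", "chair", "light", "water"]
KEY = "secret-key"


def is_green(prev_word, word, key=KEY):
    """Keyed hash: about half of all words are 'green' after any given word."""
    digest = hashlib.sha256(f"{key}|{prev_word}|{word}".encode()).digest()
    return digest[0] % 2 == 0


def generate(n_words, watermark, rng):
    """Stand-in for an LLM: at each step, four random words are 'plausible'."""
    words = ["start"]
    for _ in range(n_words):
        candidates = rng.sample(VOCAB, 4)
        if watermark:
            green = [w for w in candidates if is_green(words[-1], w)]
            candidates = green or candidates  # fall back if no green word fits
        words.append(rng.choice(candidates))
    return words


def green_z_score(words, key=KEY):
    """How far the green-pair count sits above the ~50% expected by chance."""
    pairs = list(zip(words, words[1:]))
    hits = sum(is_green(a, b, key) for a, b in pairs)
    n = len(pairs)
    return (hits - 0.5 * n) / math.sqrt(0.25 * n)


if __name__ == "__main__":
    rng = random.Random(0)
    print("watermarked z-score:", round(green_z_score(generate(200, True, rng)), 1))
    print("plain z-score:      ", round(green_z_score(generate(200, False, rng)), 1))
```

The watermarked text should score far above the plain text. Swapping a few words only removes a few green pairs, which is why the signal is hard (but not impossible) to scrub out.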

    Here’s an article that explains it better than I can: https://www.kdnuggets.com/2023/03/watermarking-help-mitigate-potential-risks-llms.html