• 0 Posts
  • 17 Comments
Joined 1 year ago
Cake day: June 17th, 2023


  • This was my core point. I don’t consider it “enshittification” when a business raises prices or gates features because those features directly increase its costs. Making stickers or custom emojis paid is enshittification, since those cost Discord nothing to provide; but when a feature costs the business actual money to provide, does everyone just expect them to eat that cost forever, in many cases for no revenue at all from those users?

    Calling out businesses for not giving away, for free, things that cost them money just doesn’t make sense to me. Why is it expected that Discord pay to store all your large files? A lot of “freemium” services like Gmail recoup some of that money by mining your email for data to sell to advertisers, or by eating the cost to lock you into an ecosystem where you’ll spend money. Storing files on Discord is neither of those things.

    Don’t get me wrong, a lot of services are enshittifying, making their products worse so you spend more money with them, but adjusting your quotas and pricing to reflect your real-world cost of business is not that. Framing it as though you’re entitled to free compute and resources from companies that don’t owe you anything comes off as just that: entitled. The cloud isn’t free. If you want to use a service, you should pay for it if you can.

  • I’m aware the model doesn’t literally contain the training data, but for many models and applications the training set is by nature small enough, and the application restrictive enough, that it’s trivial to get snippets of near-verbatim training data back out.

    One of the primary models I work on involves code generation, and in that application we’ve actually observed the model outputting verbatim code from its training data, even though it was trained on a fair amount of data. This has spurred concerns about license violations for the open source code it was trained on.
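    The kind of verbatim-output observation described here can be approximated with a simple n-gram overlap test between model output and the training corpus. A minimal, brute-force sketch with invented stand-in strings (real tooling would use suffix arrays or hashing at scale):

```python
# Flag character n-grams that a model's output shares verbatim with the
# training corpus. Long shared runs suggest memorized, possibly
# license-encumbered code. The corpus and output below are made up.

def shared_ngrams(generated: str, corpus: str, n: int = 20) -> set:
    """Return the character n-grams of `generated` that occur verbatim in `corpus`."""
    grams = {generated[i:i + n] for i in range(len(generated) - n + 1)}
    return {g for g in grams if g in corpus}

# Invented stand-ins for training data and a model completion:
corpus = "def crc16(data: bytes) -> int:\n    reg = 0xFFFF\n    for byte in data:"
suspect = "sure, here you go:\ndef crc16(data: bytes) -> int:\n    reg = 0xFFFF"

print(bool(shared_ngrams(suspect, corpus)))  # True: a 20-char run matches verbatim
```

    A non-empty result doesn’t prove copying on its own (short idioms repeat everywhere), which is why the minimum run length `n` matters.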

    There’s also the concept of less-verbatim but still “copied” style. Making a movie in the style of Wes Anderson is legitimate artistic expression, but what about a graphic designer making a logo in the “style of McDonald’s”? The law is intentionally murky in this department; in the US, even certain colors are trademarked for specific product categories. There’s no clear line here, and LLMs are well positioned to challenge what’s already on the books. IMO this is not an AI problem, it’s a legal one that AI just happens to exacerbate.


  • That’s not what’s happening, though: they’re using that data to train their AI models, which pretty irreparably embeds identifiable aspects of it into the model. The only way to remove that data from the model would be an incredibly costly retrain. It’s not embedded verbatim anywhere, but it’s almost as if you took a photograph of a book: the data is stored differently, but if you read it (i.e. make the right prompts, or enough of them), there’s the potential to get parts of the original data back.
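
    The “right prompts, or enough of them” idea can be illustrated with a toy model that has memorized a single sentence: repeatedly feeding it sliding prefixes stitches the original back together. Everything here is invented; a real extraction attempt would target an actual model API, not a dictionary-style lookup.

```python
# Toy illustration of extraction-by-prompting: a fake "model" that has
# memorized one training sentence. Greedily re-prompting with the tail of
# the text so far recovers the memorized string piece by piece.

MEMORIZED = "The quick brown fox jumps over the lazy dog."  # stand-in training data

def fake_model(prompt: str, k: int = 8) -> str:
    """Pretend completion: if the prompt matches memorized text, continue it."""
    idx = MEMORIZED.find(prompt)
    if idx == -1:
        return ""
    start = idx + len(prompt)
    return MEMORIZED[start:start + k]

def extract(seed: str) -> str:
    """Grow a seed by repeatedly querying the model until it stops producing text."""
    text = seed
    while True:
        nxt = fake_model(text[-12:])  # only the tail fits in the toy "context window"
        if not nxt:
            return text
        text += nxt

print(extract("The quick"))  # recovers the full memorized sentence
```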