A subject that has been on my mind lately: the ethics, or lack thereof, of AI.
Some AI vendors do attempt to introduce governance models and be more conscientious, but generally speaking, in today's world "ethics" is what's standing between billionaires and their spaceships. It's going to be up to us, the humble people building products and using services, to nudge the world in the right direction.

The Future is Here!
I think it’s important to acknowledge that this particular genie isn’t going back into the bottle. The current mass public and professional interest in Generative AI may be at the peak of the hype cycle, but these tools have been around for ages.
The potential for good use of these tools is tremendous, even if it is not fully understood or utilised at the moment. I expect there to be shifts along the way, both in how they are trained (which currently carries massive ethical issues around copyright and fair access), and in what they are used for (hint: not generating derivative or compromising work).
So let's explore what we have, what it's good for, and what we could do better.
The Ugly
Training generative AI models requires massive amounts of input. What this input is and how it was acquired is a source of contention. It is very clear that the input used had no regard for copyright or fair use, which is particularly evident in images and other multimedia generative models. The RIAA is suing Suno and Udio, the makers of music generators (see articles on NPR and The Verge), after they generated music so similar to known artists that in the case of Jason Derulo it even included his trademark sung name-tag at the beginning.
Suno basically came out and admitted they trained their generator on every song they could get their hands on over the internet. Similarly, Perplexity, which makes an almost hallucination-free answer engine on top of linked search results (similar to what Google has been doing for years, but far better as a virtual research assistant), became embroiled in a drama when it was observed that they don't respect robots.txt. (For those who don't know: that's how a site indicates content it doesn't want to appear in search engines and results.) Other providers have been conspicuously silent and evasive when asked about training data. Utter Wild West, in short, with complete disregard for content ownership.
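For the curious, a robots.txt is just a plain text file at the root of a site, politely asking crawlers to stay away from certain paths. A hypothetical example (the paths and rules here are purely illustrative):

```
# https://example.com/robots.txt
User-agent: *          # applies to every crawler
Disallow: /private/    # please don't crawl or index anything under /private/

User-agent: GPTBot     # OpenAI's crawler identifies itself with this token
Disallow: /            # don't use this site at all
```

The key word is "politely": compliance is entirely voluntary, which is exactly why ignoring it is both easy and telling.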
While some models do make a half-hearted attempt to block outright plagiarism (e.g. Suno will refuse if you ask it directly to generate a Jason Derulo song), it's clear that they have been trained on copyrighted material, and the ability to use those works as "inspiration", or to simply circumvent the interface protections, goes beyond anything that could even remotely come under fair use.
When you listen to podcasts and articles coming out of Silicon Valley and the industry, it's very clear that there is little consideration for any of these dimensions. Ethics is the thing standing between the wannabe-or-actual billionaires and their dreams of private spaceships.
Resistance is warranted — but then, pitchforks and wholesale bonfires are not the right approach.
The Misunderstood
I also find that there is general misunderstanding about how generative AI is used. For example, see this article by the ABC on how children's photos are used. While scraping photos of children (and of anyone, really, for those who remember the Clearview débâcle) is obviously wrong, it's not quite as the alarmist journo writes, that those photos could be used directly to regenerate images of the same children. A fellow author had to explain to somebody foaming at the mouth that using stock photos on book covers isn't "AI art", as those photos were taken by professional photographers of consenting human models. (Unlike Adobe, who are using those stock images to train their generative models, which they then use to generate paid content rather than pay the photographers who uploaded the originals as stock photos. Back to ugly territory.)

Edit: seems like I was wrong. That was based on my original reading of their "content analysis" clause in the T&Cs as allowing them to use images for training models, and Adobe did start to pay creators whose works were used in training their models (see here). If I put my cynical hat on, I'd say that was an after-the-fact response to public outcry, but that could just be my curmudgeonly self.
When an AI model ingests its training data, you can think of it as effectively compressing it like a JPEG image, but to such a degree that you lose too much information to display that specific image back again clearly. Instead, it learns to reconstruct similar-ish images or content. How similar is, of course, a matter of debate.
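To make the compression analogy concrete, here's a toy sketch (my own illustration, and nothing to do with how models actually store data) using the Pillow library to crush an image with an aggressively low JPEG quality setting:

```python
# Toy illustration of lossy compression: the reloaded image still "reads" as the
# same scene, but the fine detail is gone for good. "original.png" is a
# hypothetical filename - use any image you have lying around.
from PIL import Image

img = Image.open("original.png").convert("RGB")
img.save("lossy.jpg", format="JPEG", quality=5)   # quality 5 out of ~95 is brutally lossy

reloaded = Image.open("lossy.jpg")
reloaded.show()
```

The analogy is rough, of course: a model doesn't store the originals at all, but the intuition of "keeps the gist, loses the specifics" holds.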
For example, if I just go to a random image generator and ask for a picture of “Assaph Mehr”, some would reply that this would violate a person’s rights. Others would happily generate an image — which looks nothing like me… and yet shows a man surrounded by books and holding a detective’s magnifying glass. So me writing fantasy/mystery fiction has clearly made it into the “zeitgeist” of AI training.

That doesn't mean that you can't take a specific image (or other input) and feed it into a model to create very similar results. That's how "deepfakes" are made, and I really hope I don't need to explain why that's unethical. That firmly puts us back into "ugly" territory, and there's no innocent misunderstanding there.

The Bad
At the merely "bad" level, I would classify many of the frivolous uses of AI mushrooming these days, which to a degree are also based on misunderstanding. The statistical nature of AI generation means it aims for the average, and just isn't good for the extremes of highly creative or highly accurate results.
Accuracy can be improved with some techniques and consistent effort. Perplexity, despite the unethical sourcing of its data, can operate as a good research assistant, and does provide links to the original content its answers are based on (better than recommending you eat a rock a day). Others like Komo or Bing do similar things. But these are whole companies working on that; getting accurate results with generic models like ChatGPT is still hard.
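To sketch what that grounding looks like (this is my own minimal illustration, not Perplexity's actual pipeline), the idea is to retrieve relevant sources first, then ask the model to answer only from them and cite what it used:

```python
# Minimal "grounded answer" sketch: retrieve the best-matching source, then build a
# prompt that forces the model to answer from that source and cite its URL.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = {   # stand-in corpus; in practice these would be live search results
    "https://example.com/robots": "Robots.txt tells crawlers which pages to skip.",
    "https://example.com/riaa": "The RIAA sued two music-generation startups in 2024.",
}

question = "What does robots.txt do?"

vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(documents.values())
scores = cosine_similarity(vectorizer.transform([question]), doc_matrix)[0]

best_url = list(documents)[scores.argmax()]
prompt = (
    "Answer using only the source below, and cite it.\n"
    f"Source ({best_url}): {documents[best_url]}\n"
    f"Question: {question}"
)
print(prompt)  # this prompt would then go to whichever LLM you use
```

The retrieval does the factual heavy lifting; the model is reduced to summarising and citing, which is where it's actually reliable.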
Creativity is another aspect. Using models to generate content that isn't derivative is doomed to failure, precisely because the models are trained to be derivative. When they aren't (when they take a very low-probability path) we end up with a "hallucination".
This aiming at the average can lead to an interesting phenomenon called Model Collapse. By training AI models on AI-generated content we are repeatedly removing the extremes, effectively making the models stupider. (See the academic article in Nature, a simpler explainer in the Financial Times, or this lengthy-but-interesting analyst column on synthetic data.) It means losing a lot of the vibrancy, originality, diversity, and everything else that makes the breadth of human experience interesting and, well, human.
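Here's a toy simulation of that narrowing (my own illustration of the idea, not the Nature paper's experiment): fit a trivial "model", generate from it, keep mostly the typical outputs, retrain, repeat.

```python
# Toy model collapse: each generation trains on the previous generation's output,
# with the rare/extreme samples under-represented.
import numpy as np

rng = np.random.default_rng(0)
mean, std = 0.0, 1.0                        # generation 0: the original "human" data

for generation in range(1, 11):
    samples = rng.normal(mean, std, size=500)               # content the current model generates
    typical = samples[np.abs(samples - mean) < 1.5 * std]   # aiming for the average drops the tails
    mean, std = typical.mean(), typical.std()               # retrain on that filtered output
    print(f"generation {generation:2d}: spread = {std:.3f}")

# The spread shrinks every round; within a handful of generations the "model" only
# ever produces near-average output - the interesting extremes are gone.
```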
The other type of "bad" is products that are not much more than thin wrappers around Gen AI models: companies that just try to make a quick buck via a quick hack. This is hardly new in humanity's long history, and I expect them to fail just as fast as they appeared. VCs may be keen to throw money at them now, but that's a numbers game; they also expect 9 out of 10 to fail.
The basic premise of solid product management still applies: you should provide meaningful value to answer a great enough need of a sufficiently large, under-served population. Otherwise, in this hype cycle just as in previous ones, we'll see the mushrooming and collapse of many of the more frivolous startups.
More on the topic of building long-term AI-powered applications in a future post.
The Fringe
There are those who claim that Artificial General Intelligence (AGI) will effectively be an alien intelligence. It will come about with basic AI generating ever more complex training models to create more and more complex Generative AI, until we reach true AGI, which will be completely alien to humans in its way of thinking and will bring about the end of humanity (Skynet style).
For what it's worth, "artificial intelligence" has been around for decades. The ELIZA chatbot kinda passed the Turing Test back in the mid-1960s. Neural networks have been worked on for almost as long, and generative AI is just their latest application. But every time we break another barrier, we come to the realisation that this wasn't what conscious intelligence was about. We end up with just another tool, better but still quite dumb. Curiosity seems like a good current bet: all the AI models respond to prompts; none of them come up with their own questions, or directions they want to explore for their own sake. So I would still classify the doomsday view as a misunderstanding. There may be merit in enforcing more ethical boundaries on AI research, just as there are on medical research, but a torches-and-pitchforks approach isn't right.
The current stance is that since a computer can't be held responsible for its actions, it shouldn't be used to make life-changing decisions. Note that legal courts don't consider computers sentient entities, so don't try to make such silly arguments. The EU has already put out an AI risk framework, under which you and your customers can assess how potentially dangerous and impactful your AI is, and mitigate the risk accordingly. I expect there will be more coming, addressing copyright, privacy, impact, and ethics.
Anyway, I wouldn't call the proponents of the doomsday view the "lunatic" fringe, because there is a non-zero chance of total human annihilation. But that has always been the case, as anyone who watched Oppenheimer knows. I personally just don't think this chance is realistic. There will be other tensions and problems arising, for sure, but our tools are still based on human needs. I've explored this in fiction, as have other authors; you can see more in my previous post here.
The Good
So yeah, humanity stinks. There’s rampant plagiarism — almost outright theft — of copyrighted material, a drive by unchecked capitalism to squeeze every last dollar regardless of social costs, and the proliferation of both misunderstanding and frivolous content and companies.
And yet, I dare say all is not lost, humanity isn’t doomed, and some good is yet to come. This is where we and ethics come in.
Mozilla has been putting out content on ethical and trustworthy AI for years. See this whitepaper from 2020 (as in, before the hype-cycle boom), this post on the many other dimensions they consider in approaching the subject, or this practical guide, published in 2023, on picking a Gen AI model. You can lose hours down the rabbit hole of Mozilla's AI blog.
Just like the GDPR in the EU was followed by the CCPA in California and other legislation around the world aimed at bringing personal privacy under control, I expect similar constraints will be put on AI model providers. This has its limits, of course: money-laundering rules never stopped anyone truly determined, and new weapons are still being developed around the world. But if, say, failing to disclose your training data and observe copyright excludes you from doing business in certain countries and continents, it will at least stop some of the more blatant violations, and put commercial companies established in those regions under more scrutiny and pressure to do the right thing.
For what it's worth, I also think we'll see a rise of "ethical AI" vendors with transparent data. I like how Meta released the LLaMa models as open source, though I have no doubt this was purely a commercial move. I like how Anthropic has clear views on AI safety and governance, and I'd like to see even them go further with openness and transparency. I suspect we'll also see the rise of Small Language Models and their multimedia equivalents: you don't need a Large model trained on Twitter and Reddit just to construct English sentences. You can even get better results by using a smaller base model (with ethically sourced inputs) and fine-tuning it for a specific task. This will further enable taking an ethical stance without compromising efficiency.
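For a rough idea of what that looks like in practice, here's a sketch using the Hugging Face transformers and datasets libraries, with DistilBERT and the IMDB reviews dataset as illustrative stand-ins for "a small base model" and "a specific task":

```python
# Fine-tune a small base model on one narrow task instead of reaching for a giant LLM.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "distilbert-base-uncased"   # small base model; swap in an ethically sourced one
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

dataset = load_dataset("imdb")           # the "specific task": sentiment classification

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=256)

tokenized = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned-slm", num_train_epochs=1,
                           per_device_train_batch_size=16),
    train_dataset=tokenized["train"].shuffle(seed=42).select(range(2000)),  # small slice for a quick demo
    eval_dataset=tokenized["test"].select(range(500)),
)
trainer.train()
```

A model of this size can run on modest hardware, its training data is a known quantity, and for a narrow task it can hold its own against a general-purpose giant.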
The other vector is us product managers, as we think about and develop AI-powered applications. I'll write more about the approaches in a future post, but for the discussion about ethics, consider:
What am I using the AI for? Is it for the public benefit? How does it impact people’s lives?
What are my criteria to select AI vendors? (See the Mozilla criteria above)
How do I use those models, in ways that are fair and do not violate rights?
How do I ensure user privacy?
Forget about explaining the technicalities of what you do to your mum: would you be able to proudly explain to her your ethical stance on making people's lives better?
Don't be afraid. I believe that as the hype peaks and falls, those leading now with clear value delivered within ethical guardrails will be the survivors and thrivers of the Trough of Disillusionment, and the leaders in the Plateau of Productivity.
Gen AI, like any tool, has potential for both good use and misuse. It's up to us to ensure hammers are used for hammering nails. It's a particularly powerful tool, true, but I'm not the first to say that with great power comes great responsibility.
That’s it for now. Next post on the practicalities of building an AI application when you’re not an AI company — ethics included.