It’s becoming harder to distinguish between human and AI-generated content. As the systems improve, where is the boundary between what is uniquely human and what a machine can do? Is there such a limit, or are we kidding ourselves? And are those generative, reasoning AI models actually thinking? Does that include diffusion models? What is thinking anyway?
Experts in any field I’ve spoken with share a basic curve of reactions to AI content. It’s hauntingly similar to Gartner’s hype cycle curve: “It’s so cool it can do that! Actually, when you look closely it’s utter crap. But then, perhaps I could use it just for some of the boring bits…”
So the question for today is: Why do we react the way we do to the various forms of AI content?
In large part, this is an expansion of my note in my review of Co-Intelligence by Ethan Mollick. This article tries to delve deeper into the meaning of art, thinking, creating, and the limits of generative AI on the path to artificial general intelligence (AGI).
How do we react?
Let’s start with images.
Back in October, Scott Alexander on Astral Codex Ten ran an AI Art Turing Test. He then analysed and published the results in November. It isn’t quite scientific, but with fifty images and 11,000 responses it’s pretty comprehensive. I found his columns after the results were published and read the analysis first, so when I went back and took the test myself I was forewarned and forearmed. I did reasonably well (better than the average) in classifying the pictures not mentioned in the analysis, but of course failed a few.
Alexander specifically excluded ‘obvious’ AI pictures, so the test does not cover the ‘typical’ Midjourney / DALL-E output you might encounter on social media, and some of the results might be skewed. Then again, he also chose works by experienced human artists, so both sides can claim that only their best were taken, not the ‘industry average.’ Still, in the wild, people will tend to recognise AI content better than the results below suggest.
That said, I’d like to draw your attention to one particular conclusion:
The average participant scored 60%, but people who hated AI art scored 64%, professional artists scored 66%, and people who were both professional artists and hated AI art scored 68%.
Go read that whole article, and in particular that last point where an artist explains everything subtly wrong in a particular image. As Alexander concludes:
So maybe some people hate AI because they have an artist's eye for small inadequacies and it drives them crazy.
It may not be conscious, but I would interpret it as people being more sensitive to the uncanny valley. Beyond that, when you have expertise in a field, the small nuances and subtle mistakes are more prominent. In ‘factual’ fields like logic and math they are easier to highlight; in creative and artistic ones they are harder to define. That doesn’t make them any less visible or important to people, both novices and experts.
Of course, ‘creative effort’ is a spectrum — coding, for example, is not pure logic; it requires creative problem solving and is prone to subtle errors that aren’t immediately observable (all those bugs that leak through any testing). Many creative tasks sit somewhere between logic and art.
The point is that expertise brings the ability to discern subtleties, and experience brings discernment and taste in selecting the right approach. Teaching someone a skill is one of the best ways to improve yourself and achieve expertise, as it forces you to analyse works for faults and answer questions clearly.
So detecting faults (in human or AI content) can be conscious or subconscious; when we look at AI output and think it’s off, we may or may not be able to put our finger on what’s wrong with it, but the more attuned we are to the field the better we get.
What about writing?
I have uncovered the secret to effective use of AI in writing fiction! Let me share with you this nugget of wisdom. Ready?…
Every so often, I try one of the bots and get it to create and/or edit a text. I use a variety of prompts and techniques, and refine further and further, until I max out the capacity to produce the best possible text. And the results are, invariably, … shit.
Utter bollocks. Derivative drivel. Grammatically correct garbage. Soulless slop.
OK, enough exaggeration by alliteration. There are two good uses for the output of LLMs in purely creative writing. One is as an exercise in editing, where I force myself to articulate all the problems in the AI’s output 😜 Cheekiness aside, it can help me build up my editing and critiquing skills, though I certainly enjoy it more when I edit the works of fellow humans.
Even skipping outright creation and using AI only to edit, whether a plain LLM or something like Grammarly, is fraught with its own issues. While it will do its best to make sure everything is grammatically correct and the language is clear and accessible, it will do so by conforming to some version of “internet average” language and will strip any personality and voice from the text. Developing your own voice is significant for authors, just as visual style is for artists. (For example, if you’ve read more than a couple of my posts, you should get a sense of my style, of me, and should be able to recognise it in other posts.) While an improved AI could perhaps learn to imitate my voice, it won’t help me develop it. And that is the minor problem: the bigger one is that writing is thinking. One has to be very careful, consider the second-order effects (like lazy experts), and ensure that the crutch doesn’t become indispensable.
The second use I found for AI in writing fiction is pure motivation. Spending time with the bot to get such a mediocre output and knowing I could have written something much better quicker is a kick to stop procrastinating and get writing…
Getting back to a more serious discussion, there are cases where this kind of language is the right thing to use. For example, writing book blurbs is notoriously hard for authors. How do you condense a 100,000-word novel into a paragraph of marketing-speak? Using an LLM not to write your blurb but to quickly generate a few options allows you to select the better features and get over the block, building something that fulfils the need to describe the work while being generic enough to fit a particular genre’s tropes, set expectations, and appeal to the right readers.
But creative writing (fiction, strategy docs, or creative copy) is my area of expertise, which is why I find the base LLM output problematic in those cases. To become an expert myself, I must read and analyse and struggle with writing to express my thinking, not outsource that to an AI. Using an LLM is like using crutches — no amount of leaning on them will get me beyond the mediocre. There is a reason why ‘art’ is so often described as a painful struggle, and why experts in any field are required to put in the hours of practice-feedback loops.
Language is thinking
When we look at the output of LLMs, we anthropomorphise it as ‘thinking,’ as having put in mental effort to reach a conclusion. This is especially true when you play with a reasoning model and look at the stream it produces along the way.
We don’t have an explanation (yet) of how the low-level biochemistry of the brain produces abstract, linguistic thinking — let alone our sense of self (the one ‘having’ the brain and doing the thinking). We say that this is an emergent property, one that rises out of the complexity of the underlying systems. We think, or hope, or fear, that LLMs will show that same level of emergence.
Maybe it’s already showing some — we can’t always pinpoint how it learns and produces output, and there is some appearance of emergent higher-order understanding and capabilities. But is a reasoning model really thinking?
This is my current pet hypothesis:
We mainly think in words. Even visual artists who think in pictures when creating art think mostly in words at other times. We certainly mostly communicate with others using words. And we evaluate the intelligence of others using words. That’s why we find it hard to evaluate the intelligence of corvids and dolphins, and why there’s a bias against non-native speakers of our language. So when we encounter the stream of consciousness of a reasoning model, which looks coherent and sounds like a person describing how they reason through a problem, we ascribe thinking to it.
But is it?
Specifically, how can we evaluate the apparent level of ‘thinking,’ and whether it’s a real emergent quality or just really good parroting? Is that stream budding consciousness, or an illusion based on our biases? That’s where I think turning to art can help assess some of the issues.
In part, language models engage in “pure” language generation. The output may encode our biases (as they are encoded in the body of words the LLM was trained on), but thinking is more than just language. For example, when the brain suffers damage to its capacity to generate emotions, people cannot make decisions — all decisions are emotional. (Side note: when we build decision support systems and market them as impartial, it often turns out that we’ve just found a way to encode biases; often not the result we were aiming for — with the added bonus of extra opacity.)
Our ‘thinking’ depends on many physical aspects of our brains and bodies and cannot be separated from them. A model’s reasoning chain is, as anyone who has tried meditation will tell you, far too coherent. Humans don’t quite think like that. So does mimicking language, learnt from humanity at large but still performed in isolation from any cohesive physical experience, still count as thinking? Would it support the emergence of all the other properties of thinking, such as feelings, awareness of time and space, morality? Back to the point about art above: right now, AI output hits that uncanny valley pretty hard. Though the example in this video is trivial, it’s exactly the point Alexander makes about driving people crazy.
Abstracting thinking to language alone brings up the Chinese Room argument. This is a thought experiment that goes as follows. There is a room, inside of which is a person who does not know Chinese (or the language of your choice; my Greek is better than my Chinese, so I’ll stick to that). However, the person has a master codex on a fast computer that specifies, for each conceivable input of Chinese characters, a corresponding output. By passing notes under the door, a different person outside can fluently communicate with the person inside in Chinese, even though the one inside does not speak the language.
There are many interesting philosophical responses, not the least of which is the system view that perhaps the room speaks Chinese.
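To make the mechanics of the room concrete, here is a toy sketch (the phrasebook entries are invented for illustration; the thought experiment assumes a codex covering every conceivable note): the ‘codex’ is nothing more than a lookup table, and the room produces fluent-looking replies while nothing inside understands a word of Chinese.

```python
# Toy "Chinese Room": the codex is just a lookup table from notes to replies.
# The entries below are invented for illustration only.

CODEX = {
    "你好": "你好！很高兴见到你。",            # "Hello" -> "Hello! Nice to meet you."
    "你会说中文吗？": "当然，我说得很流利。",  # "Do you speak Chinese?" -> "Of course, fluently."
}

def room(note: str) -> str:
    """Pass a note under the door; a reply comes back with zero understanding inside."""
    return CODEX.get(note, "对不起，我不明白。")  # "Sorry, I don't understand."

print(room("你会说中文吗？"))  # fluent-looking output, no comprehension behind it
```

From the outside the exchange looks like conversation; on the inside it is pattern matching all the way down.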
For our discussion, though, note that the person inside is like a chatbot — they sit inside, waiting for input, to which they then provide an output. They lack agency and self-direction. They provide an output that makes us think they are thinking, but it is mimicry without discernment. It’s the diffusion model saying to itself ‘a hand has five fingers, I must remember to paint only five’ while still losing the plot on the cat merging into the curtain in the background. It’s the Deep Research tool (pick any of the multiple AI products with that name) that writes out ‘this is interesting, let me delve down this Reddit rabbit hole’ because it learned that this style of output pleases the humans who built it, without the discernment of a researcher who knows reputable sources from internet memes.
That is perhaps a bit flippant, but my experience with the Deep Research tools has been hit and miss. Even when the sources gathered were appropriate, the extraction of information accurate, and the report correct (a process we can assume will improve over time), it still leaves a few open issues.
First, just like with human interns, there will be subtle errors, necessitating a careful review of the output. That’s the above expert noticing and correcting the output, rather than accepting it.
Second, it only works if the information is already available on the internet, and it does not cover original thinking. The written artefact is a representation of the work of understanding. Writing is thinking, and without the effort to write, well, it’s like getting a report from an external consultancy on how to solve your problems. That is, not all that helpful.
Third, the value of these reports is rarely in the artefact but in the thinking and effort that went into producing it. That’s part of growing the intern’s skill level, as well as understanding the material in depth — the struggle in the creative process that brings insights, now and later.
We’ll probably solve training at inference time, making models that can learn from their own experience (and remember this learning when you close the chat). But that won’t happen under current LLM architectures; we’ll need a different way to approach the problem. It also won’t touch embodied experience, which is integral to human thinking, and it won’t help us understand any topic outsourced this way in any depth. Critically, it will never solve ‘grokking’ as defined by Robert A. Heinlein: “to understand intuitively or by empathy, to establish rapport with, and to empathize or communicate sympathetically (with); also, to experience enjoyment.” It’s that innate understanding of a topic that you gain through struggling with it.
Perhaps one day, someone at some AI lab will issue the command “go out and live,” after which a future AI model will come alive and continue to pursue its own interests ad infinitum. It might also scheme against us when we try to put it back in the box. But this will be only the start — if it wants to produce art at that point, it will have to do it like the rest of us: observing, trying, getting feedback, and repeating — not merely emulating. Without this, all we have is mimicry without understanding, techniques without the taste to discern between right and wrong.
What can we do?
Don’t get me wrong, I think LLMs can be very useful, including for creatives. But there’s still work required to build this up, guided by principles like ‘human in the loop’ and ‘trust but verify.’ And that doesn’t even touch on the ethical issues of the companies building those systems; I’m focusing here only on the technology, and how to approach implementation by considering second order effects.
When we implement AI systems, even accounting for human in the loop, even when building different tools for experts and novices, there are still many dangers.
Because this is a new field, the research is new as well and sometimes conflicting. We can see instances where AI improves capabilities, but at the same time (and occasionally due to second-order effects) we see humans with good AI performing worse, or even a decline in cognitive capabilities, where AI helped with specific skills but reduced the capacity to problem-solve. (This was my main quibble with Mollick’s book — he claims that everyone benefits and that AI is levelling the playing field, helping everyone who uses it.)
In order to build expertise, there is no replacement for human mentorship and guidance, and no shortcut around putting in the effort. Even if the exact type of effort morphs, if you go for the cheap option you get the cheap result.
‘Art’ is hard to define, but it’s my observation that art is half observation and half expression. It’s hard to separate the two, because we humans ‘train at inference time,’ to use the AI terminology. It takes sustained effort under competent guidance to build those skills (like a computer science major in their first job getting a lot of feedback from the team lead on what they naively thought was brilliant code 😉).
So while human-in-the-loop is absolutely the right way to go, there are still open problems in and around how these systems are marketed and implemented, how people interact with them, and how we account for varying levels of discernment (expertise) while allowing for the building of future generations of experts. This doesn’t mean we should avoid AI — just as Photoshop changed graphic design but did not eliminate it, we need to find new ways of supporting students, practitioners, teachers, and experts.
When designing an AI-based solution, one (of many) questions you must ask yourself is whether an “internet average” is the right approach for the use case, and how you should guide your user community to utilise it without harm.
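To make ‘human in the loop’ slightly more concrete, here is a minimal sketch (the function names and the console-based review step are hypothetical, not any particular product’s API): the model drafts, and a human with domain expertise approves, edits, or rejects before anything ships.

```python
from dataclasses import dataclass

# Hypothetical human-in-the-loop gate: the model drafts, a human expert decides.
# generate_draft() is a stand-in for whatever LLM call you actually use.

@dataclass
class Review:
    approved: bool
    text: str
    notes: str  # the reviewer's reasoning: where taste and discernment live

def generate_draft(prompt: str) -> str:
    """Placeholder for an LLM call; swap in your provider of choice."""
    return f"[draft generated for: {prompt}]"

def human_review(draft: str) -> Review:
    """A bare-bones console review step; a real system would use proper tooling."""
    print(draft)
    decision = input("approve / edit / reject? ").strip().lower()
    if decision == "edit":
        return Review(True, input("revised text: "), "edited by reviewer")
    return Review(decision == "approve", draft, f"reviewer said: {decision}")

def publish_with_human_in_the_loop(prompt: str) -> str:
    draft = generate_draft(prompt)
    review = human_review(draft)  # nothing ships without this step
    if not review.approved:
        raise ValueError(f"Rejected by reviewer: {review.notes}")
    return review.text
```

The only design choice that matters here is that the review step cannot be skipped; the provider, the UI, and the storage are all incidental.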
In Summary
When it comes to art, the struggle is real.
When it comes to LLMs, we are far away from consciousness.
When designing AI solutions, having a “human in the loop” is still a must, but may not be enough.
We need to ensure that the tools support both novices and experts, do not reduce the expert’s capacity to exercise taste, and, importantly, allow novices to learn from human experts. The AI in the rest of the loop is a crutch. A very useful crutch, but one that should not be allowed to exclude other forms of movement and growth.