
Colm O'Regan: AI scraped my novels, but why? Does it want to describe the inner workings of an imaginary Tidy Towns group?
Artificial intelligence — Lookit, you can't avoid it. Here are a few words to get you through if you are ever stopped by the AI police.
AI, machine learning, neural networks, and large language models. I've used these interchangeably like bog/marsh/mire/fen/swamp. But they're not all the same thing.
Artificial Intelligence is the overall idea of a computer system that can simulate human intelligence. This means it can learn new things.
Machine learning is a type of AI where computers learn from data. A neural network is a technique of machine learning that mimics how the brain learns.
That's the stuff that does image recognition, which is why your weird photo was rejected from Facebook. And a large language model is a type of neural network trained on squillions of words, grammar rules and data from the internet so that it can answer you like a smart human can.
ChatGPT is an example of a large language model. That's the thing you used to write your CV. LLMs have a rake of stuff in their brains that they're using to write answers for you. This is the training data. A lot of this has been scraped from the internet but also books, art, etc. Including my own books.
When I first found my novels had been scraped by AI, I couldn't figure out what possible use they would be in building up a large language model's ability to describe the inner workings of an imaginary Tidy Towns group. But now, looking at the infiltration of Tidy Towns Facebook groups by AI-generated profiles, the whole thing is starting to fit together.
Just like all of us, AIs can have bias. And just like all of us, our bias depends on what we've been told. That bias can lead us to say some stupid stuff. AI systems reflect the prejudices in their training data. If it was trained on biased data, it can act in biased ways. Shite in, shite out.
Soon you'll hear about AGI, artificial general intelligence. This is where a machine is smart enough to match or be better than humans across all types of intelligence.
We're not there yet, but already it is said that the next version of AI will be smart enough to do a PhD. The joke doing the rounds is that the version after that will be smart enough to turn down a PhD and maybe go working because its head is melted from research.
Speaking of education, there's a huge concern in college about using AI to cheat. Academics are at a loss to know how to detect it. The only answer I think is to increase the level of in-person practical assessment. Maybe take a tip from the Fianna. There was no essay to be written for joining them.
Jumping over a branch as tall as yourself, running under a knee-high stick, and plucking a thorn from your foot as you run at top speed was considered the entrance exam.
If you hear the 'Turing test' thrown around, it's a test to see if a machine can fool a human into thinking it's also human. AIs recently passed a very hard version of it. 'Very hard' is a technical term. I will not be taking follow-up questions.
You'll see talk about regulation and 'guardrails'. Guardrails are the rules built into AI to stop it from being harmful, but you'll probably need another AI to enforce it and … shur they know each other as they went to school together. It'll be the Celtic Tiger banks all over again.
Finally there's hype — fierce excitement about what's coming, given by people who have a stake in you being as concerned about this as possible. For more on that, why not sign up to my fun AI safety course called AAAAA!-I. Cash only.
Colm is not hosting an AI course. Yet. He'll be doing comedy at Coughlans on Sunday, May 25, until an AI replacement can be found. Coughlans.ie.

Related Articles


Irish Examiner
3 days ago
Computer scientist to develop 'honest' AI that will spot rogue systems and flag 'harmful behaviour'
An artificial intelligence pioneer has launched a non-profit dedicated to developing an 'honest' AI that will spot rogue systems attempting to deceive humans. Yoshua Bengio, a renowned computer scientist described as one of the 'godfathers' of AI, will be president of LawZero, an organisation committed to the safe design of the cutting-edge technology that has sparked a $1tn (€877bn) arms race.

Starting with funding of about $30m (€26.3m) and more than a dozen researchers, Bengio is developing a system called Scientist AI that will act as a guardrail against AI agents — which carry out tasks without human intervention — showing deceptive or self-preserving behaviour, such as trying to avoid being turned off.

Describing the current suite of AI agents as 'actors' seeking to imitate humans and please users, he said the Scientist AI system would be more like a 'psychologist' that can understand and predict bad behaviour. 'We want to build AIs that will be honest and not deceptive,' Bengio said. 'It is theoretically possible to imagine machines that have no self, no goal for themselves, that are just pure knowledge machines — like a scientist who knows a lot of stuff.'

However, unlike current generative AI tools, Bengio's system will not give definitive answers and will instead give probabilities for whether an answer is correct. 'It has a sense of humility that it isn't sure about the answer,' he said. Deployed alongside an AI agent, Bengio's model would flag potentially harmful behaviour by an autonomous system, having gauged the probability of its actions causing harm. Scientist AI will 'predict the probability that an agent's actions will lead to harm' and, if that probability is above a certain threshold, that agent's proposed action will then be blocked.
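The gating logic the article describes — score an agent's proposed action for harm, then block it if the probability crosses a threshold — can be sketched in a few lines. This is a toy illustration only, not LawZero's actual design; every name here (`GuardrailMonitor`, `harm_model`, `review`) is hypothetical, and the stand-in harm model is a trivial keyword check:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class GuardrailMonitor:
    """Toy sketch of a guardrail that vets an agent's proposed actions.

    `harm_model` stands in for a system like Scientist AI: it returns a
    probability of harm rather than a yes/no verdict. All names are
    illustrative, not LawZero's API.
    """
    harm_model: Callable[[str], float]  # action description -> P(harm)
    threshold: float = 0.1              # block anything above this

    def review(self, proposed_action: str) -> bool:
        """Return True if the action may proceed, False if blocked."""
        p_harm = self.harm_model(proposed_action)
        return p_harm < self.threshold

# Usage with a trivial stand-in harm model (a real one would be a
# trained model, not a keyword check).
monitor = GuardrailMonitor(
    harm_model=lambda action: 0.9 if "disable shutdown" in action else 0.01
)
print(monitor.review("summarise quarterly report"))  # True (allowed)
print(monitor.review("disable shutdown procedure"))  # False (blocked)
```

The point of the structure, as Bengio describes it, is that the monitor outputs a probability with a tunable threshold rather than a definitive answer.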
LawZero's initial backers include AI safety body the Future of Life Institute, Jaan Tallinn, a founding engineer of Skype, and Schmidt Sciences, a research body founded by former Google chief executive Eric Schmidt.

Bengio said the first step for LawZero would be demonstrating that the methodology behind the concept works — and then persuading companies or governments to support larger, more powerful versions. Open-source AI models, which are freely available to deploy and adapt, would be the starting point for training LawZero's systems, Bengio added. 'The point is to demonstrate the methodology so that then we can convince either donors or governments or AI labs to put the resources that are needed to train this at the same scale as the current frontier AIs. It is really important that the guardrail AI be at least as smart as the AI agent that it is trying to monitor and control,' he said.

Bengio, a professor at the University of Montreal, earned the 'godfather' moniker after sharing the 2018 Turing award — seen as the equivalent of a Nobel prize for computing — with Geoffrey Hinton, himself a subsequent Nobel winner, and Yann LeCun, the chief AI scientist at Mark Zuckerberg's Meta. A leading voice on AI safety, he chaired the recent International AI Safety report, which warned autonomous agents could cause 'severe' disruption if they become 'capable of completing longer sequences of tasks without human supervision'.

The Guardian

The Journal
5 days ago
What the hell happened to Google search?
LET'S SAY YOU want a list of Irish ministers. So you google it, of course. The fact that it's its own verb sums up pretty neatly Google's total dominance of online search. 'I'll Bing it,' said no-one, ever. (Sorry, Microsoft.) Google.com is the world's most used website. Ninety percent of internet searches go through the company's search engine. It's the front door to the internet, and a navigational tool on which we have become entirely dependent. Who among us has typed out a URL in the last decade? Whether you have an Android or an Apple phone, that's Google search you're using when you open your browser.

But something has gone wrong. Search for 'Irish ministers' and the top result is… Pat Breen? (The Journal checked this on several users' desktop browsers with the same result.) Breen was never a minister. He was a junior minister – and that was a while ago now. He lost his seat two elections ago, in 2020. A government website with a full list of current government ministers is quite a bit down the results page.

Pat Breen, the Platonic ideal of an Irish minister, according to Google.

Sponsored posts

The utility of the search engine has been particularly eroded when it comes to anything that could be sold to you, with top results likely to all be from companies that have paid to skip up the ranking to a position where they would not have organically surfaced. These paid-for top results, which take up more and more space on the search engine results page, are also partly based on your browsing history rather than what you are currently looking for. So a search from an Irish location for 'the best place to buy children's shoes', for instance, can contain sponsored top results for (a) shops that don't sell children's shoes or (b) British online-only retailers. (Good luck buying children's shoes without trying them on.)
There are useful results amid the debris of sponsored links and below the paid-for top table, but it feels like harder work than it once was to find them. This isn't helped by the fact that sponsored links are not very visually distinct from organic results. It's hard not to click on them. Ads on search are how Google makes most of its money.

ChatGPT's challenge to Google

And then, of course, there's the new AI Overview that, for the past year, has appeared in response to certain types of queries. Now, the integration of AI into search is about to be turbocharged as Google goes on the offensive against ChatGPT. It may not be its own verb yet, but for many people, OpenAI's chatbot is becoming as automatic and intuitive a go-to as Google. Liz Carolan, a tech consultant and author of The Briefing newsletter, says that while Google hasn't shared data on the drop-off in people using its search engine, all the signs are that the switch to ChatGPT has been 'profound'. Where once we would have googled 'what time is the Eurovision', now we are asking chatbots. So Google is becoming a chatbot too.

In May, Google began to roll out the next step up from AI Overview. AI Mode, which has been launched in the US, will deliver customised answers to users' questions, including charts and other features, rather than serving up a list of links. These answers will be personalised based on past browsing history. You will even be able to integrate it with your Gmail account to allow further personalisation. At first, AI Mode will be a distinct option in search, but its features and capabilities will gradually be integrated into the core search product, Google has said. Carolan says this will be as fundamental a change to how we interact with the internet as the original arrival of Google search.
'Instead of navigating between links, we're going to end up using a single interface: a chatbot querying the websites that exist and delivering back to you its interpretation of that, in a conversational style,' she explains.

An example of an AI Overview result in Google.

AI nonsense

The first problem is, Google's AI results can be nonsense. Kris Shrishak, a senior fellow at the Irish Council for Civil Liberties working on AI and tech, says people need to understand one fundamental point about the large language models (LLMs) on which chatbots such as ChatGPT and Google's Gemini AI are based: they are not designed to be accurate. 'When they're accurate, they are coincidentally accurate,' Shrishak says. 'They're accurate by accident, rather than by design.'

For example, Carolan recently wanted to check how many working days there are in June. Google's AI-generated top result helpfully explained that there are 21 working days and no public holidays in June. If you specify 'in Ireland', Google says there are 22 working days and no public holidays. Both answers are wrong. There are 20 working days in June, and the first Monday is always a public holiday. ChatGPT didn't know that either. It counted the bank holiday twice.

Google isn't planning to take Monday off.

'It's just blatantly inaccurate,' Carolan says. 'People are relying on it, and it's giving them inaccurate information.'

Aoife McIlraith, managing director of Luminosity Digital marketing agency, says Google had almost certainly released its AI search product sooner than it wanted to. 'There's huge pressure on them. It's the first time they actually had competition in the market for search. It will definitely get better, but it's going to take some time,' McIlraith says.

Google defended AI Overviews, telling The Journal that people prefer search with this feature.
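The June arithmetic that tripped up both chatbots is easy to verify programmatically. A quick sketch, assuming the year in question is 2025 and that Ireland's June bank holiday falls on the first Monday of the month, as the article says (`june_working_days` is a name chosen here for illustration):

```python
from datetime import date

def june_working_days(year: int) -> int:
    """Count weekdays in June, minus Ireland's June bank holiday
    (the first Monday of the month)."""
    days = [date(year, 6, d) for d in range(1, 31)]  # June has 30 days
    weekdays = [d for d in days if d.weekday() < 5]  # Mon-Fri only
    first_monday = next(d for d in days if d.weekday() == 0)
    # The bank holiday is always a Monday, so always a weekday
    return len(weekdays) - 1 if first_monday in weekdays else len(weekdays)

print(june_working_days(2025))  # prints 20
```

June 2025 has 21 weekdays, and subtracting the bank holiday on Monday, June 2 gives the 20 working days the article arrives at.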
It said AI Overview was designed to bring people 'reliable and relevant information' from 'top web results', and included links.

Enshittification

Even setting aside the incorporation of undercooked AI answers into results, Google's traditional search product does not seem to be working as well as it once did. Journalist Cory Doctorow coined the term 'enshittification' in 2022 to describe the pattern whereby the value to users of platforms – be it Amazon, TikTok, Facebook or Twitter – gradually declines over time. Doctorow argued that platforms start by offering something good to users (like an excellent search engine), then they abuse their users to serve business customers (search results buried under ads), and then they abuse both users and business customers to serve their shareholders.

Documents released in 2023 as part of a US Department of Justice antitrust case against Google gave a rare insider view of the top of the company, revealing that in 2019 there were tensions over the direction of search. The documents suggested a boardroom struggle over whether Google's search team should be more focused on the effectiveness of the product, or on growing the number of user queries (a better search engine would mean fewer queries, and therefore fewer ads viewed). In one email, the head of search complained his team was 'getting too involved with ads for the good of the product'.

Google said this weekend that this executive's testimony at trial had 'contextualised' these documents and clarified the company's focus on users. 'The changes we launch to search are designed to benefit users,' Google said. 'And to be clear: the organic results you see in search are not affected by our ads systems.'

Carolan says it's impossible to know exactly what has happened within Google's algorithm, but the quality filters that were once in place to keep low-quality results further down the ranking seem to be struggling to hold back the tide.
Visibility on Google can be gamed using certain practices known as search engine optimisation (SEO). SEO is the reason why, for example, online recipes often contain weird, boring essays above the list of ingredients. All publishers use SEO, but the quality of search results is degraded when low-quality websites are able to abuse SEO to boost their Google ranking.

'Maybe investment within search engines are going more towards AI than they are towards just sustaining the core search product,' Carolan says. 'It's very hard to say because all of this is happening in very untransparent ways. Nobody gets to see how decisions are being made.'

McIlraith says it's widely believed in her industry that recent changes to Google's algorithm – in particular an August 2022 update called, ironically, 'Helpful Content' – have corrupted results. She believes this is having a bigger impact in smaller markets such as Ireland, with more . websites appearing in Irish users' results, for example. 'A lot of people in my industry have been shouting about this, particularly in the past 18 months,' McIlraith says.

Google said it makes thousands of changes to search every year to improve it, and that it's continuously adapting to address new spam techniques. 'Our recent updates aim to connect people with content that is helpful, satisfying and original, from a diverse range of sites across the web,' it said.

For what it's worth, Shrishak doesn't use Google now, favouring DuckDuckGo, an alternative search engine that feels a lot like the Google of old. It doesn't collect user data (and is capable of correctly identifying the current government of Ireland).

What happens next?

Google says AI is getting us to stay where it wants us: on Google. CEO Sundar Pichai has suggested that AI encourages users to spend more time searching for answers online, growing the overall advertising market.
Google says AI Overviews have increased usage by 10% for the type of queries that show overview results. Soon, Irish users are likely to see advertising integrated into AI Overview. The company is telling advertisers this will be a powerful tool, putting their ads in front of us at an important, previously inaccessible moment when we are just beginning to think about something.

But AI raises existential questions for the production of content for the web as we know it, both in its ability to generate content and as it's being applied in search. In the jargon of digital marketing, the problem is known as 'zero click'. You ask Google a question and get an answer – maybe an AI-generated one – without ever having to click on a blue link. McIlraith says: 'The biggest challenge for all of my clients and the wider industry is that Google is flatly refusing to give us any data around zero click. We cannot see how much our brand is showing up in search results where no click is being attributed.'

Until now, there was an unwritten contract: websites provided Google with information for free, and benefited from Google-generated traffic. This contract is broken when Google morphs into a single interface scraping the web to feed its AI in a way that negates the need to click through links to websites to find information. 'The challenge then really becomes, why would I create content?' McIlraith says. 'Why would I create content on my website just for these AIs to come along and scrape it?'

Already there are challenges to ChatGPT's practices, with publishers led by the New York Times suing OpenAI over its use of copyrighted works. News/Media Alliance, the trade association representing all the biggest news publishers in the US, last month condemned AI Mode as 'the definition of theft'. 'Links were the last redeeming quality of search that gave publishers traffic and revenue,' the alliance said. 'Now Google just takes content by force.'
Google CEO Sundar Pichai was grilled about this by US tech news website The Verge last week. He said AI Mode would provide sources, adding that for the past year Google has been sending traffic to a broader base of websites and this will continue. He did not give a definitive answer when asked whether a 45% increase in web pages over the past two years was the result of more of the web being generated by AI, stating that 'people are producing a lot of content'.

Carolan speculates that in the single-interface, linkless future, with the business model of web publishing broken, the risk is that the internet starts to eat itself: regurgitating AI slop rather than sustaining the production of original material. The information Google's AI Mode and ChatGPT and the rest are feeding off will then degrade. Late-stage enshittification. AI search itself may improve, but these improvements will be undermined by this disintegration of the information environment.


Irish Independent
6 days ago
‘AI did my school project. It took me an hour rather than a week'
Many EU schools are introducing ChatGPT in the classroom, but is it a lazy way of fact-finding or writing essays? One student I spoke to doesn't think so

Should Ireland follow other EU countries in rolling out ChatGPT in secondary school classrooms? That may be on the cards, according to reports this week. On one level, the idea seems odd. Doesn't generative AI fly against the principles of basic, intuitive learning? Isn't it just a lazy, decadent way of outsourcing essays and fact-finding?