I am more optimistic on that one. AI provides a pretty clear way out of this, since it allows you to automatically detect the bullshit. That means either the bullshit has to rise so much in quality that it is indistinguishable from good content, in which case it would not be bullshit anymore, or it will get filtered. AI can also transform bad websites into good ones, like a super-powered ReaderMode, AdBlock and more all rolled into one, so a lot of the “let’s plaster everything with ads” approach will lose effectiveness.
The problem over the last decade was that Google completely lost interest in being a search engine. They are just an ad company, and as long as search leads you to more ads, they are quite happy. So the user experience went down the toilet.
The real problem with AI is that it will remove the incentive for the authors. Content producers want to get paid, but with AI you can just extract the information from an article without ever viewing the article or the ads around it.
They have already been trying to use AI to combat and identify AI in college and high school papers. So far it’s been severely ineffective. AI has gotten pretty good at writing out a sentence or two that looks like it’s real. If AI improves enough, I doubt there’ll be much of a way to identify it at all.
It’s not about identifying AI or even spam, but about extracting useful information. Are the claims made in a source backed by other sources? Do they contradict information from trusted sources? That’s all stuff an AI can reason about, and then either discard the source as junk or condense it down to the useful information in it.
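To make that a bit more concrete, here is a toy sketch of the cross-checking idea in Python. Everything in it is made up for illustration: the `Claim` type, the `assess_source` function and the `min_support` threshold are hypothetical, and it assumes some upstream model has already turned each page into normalized (subject, value) claims, which is the genuinely hard part.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """A normalized claim; assumes an upstream extractor produced it."""
    subject: str
    value: str


def assess_source(claims, trusted, corpus, min_support=2):
    """Return (verdict, kept_claims) for one source.

    claims:  list of Claim objects extracted from the source under review
    trusted: dict mapping subject -> value taken from trusted references
    corpus:  list of sets of Claim, one set per other source
    """
    kept = []
    for claim in claims:
        trusted_value = trusted.get(claim.subject)
        if trusted_value is not None:
            # Trusted references win, no matter how many sites disagree.
            if trusted_value == claim.value:
                kept.append(claim)
            continue
        # No trusted answer: require independent backing from other sources.
        support = sum(1 for other in corpus if claim in other)
        if support >= min_support:
            kept.append(claim)
    # Nothing survived: discard as junk. Otherwise keep the condensed claims.
    verdict = "useful" if kept else "junk"
    return verdict, kept
```

Note the design choice: a claim that contradicts a trusted reference is dropped regardless of how many other sites repeat it, so raw repetition alone doesn't count as backing. Picking the trusted set is of course where the real argument starts.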
Basically you completely skip browsing the Web yourself and just use the AI to find what you want. Think of it like some IMDb or Wikipedia, but covering everything and written and curated by AI. When the AI doesn’t already know some fact, it goes crawling the Web and finds it out for you, expanding its knowledge base in the process.
Or see the ship computer from Star Trek: you don’t see the people there browsing the Web, you see them getting data in exactly the format they need, and they can reformat and filter it as needed.
At the moment there are still some technical hurdles; the AI systems we have are all still a little too stupid for this. But that seems to be the direction we are heading. Things like summarizer bots already do a pretty good job, and ChatGPT is reasonably good at answering basic questions and reformatting the answers the way you need them. It’s only a matter of time until it gets good enough that you couldn’t do a better job yourself.
You’re looking at it in a flawed manner. AI has already been making up sources and names to state things as facts. If there are a hundred websites claiming the earth is flat and you ask an AI if the earth is flat, it may tell you it is flat and cite those websites. It’s already been happening. Then imagine things that are more a matter of opinion than hard, observable scientific facts. Imagine a government using AI to shape opinion and claim there was no form of insurrection on Jan 6th. Thousands of websites and comments could quickly be fabricated to confirm that it was all made up, burying the truth in obscurity.
You have plenty of literature that can act as ground truth. This is not a terribly hard problem to solve; it just requires actually focusing on it, which so far simply hasn’t been done. ChatGPT is just the first “look, this can generate text”. It was never meant to do anything useful by itself or stick to the truth. That all still has to be developed. ChatGPT simply demonstrates that LLMs can process natural language really well. It’s the first step in this, not the last.
Sounds like you’re arguing against yourself, now.
I think it’s just a new world for spam.
At some point, probably soon, AI content will generate so much data it becomes untenable to store all the scraped data.
We’ll also reach a point where it becomes much more costly to parse the data for AI spam+trustworthiness+topics. If you need LLMs just to filter spam, that is a large step up in costs and infrastructure vs current methods.
When that happens what happens to search? The quality will have to degrade or the margins will drop off sharply.