Nearly all of the Google images results for “baby peacock” are AI generated

This was probably always the likely outcome of an internet economy that revolves around the production and monetization of “content”.

We started by putting advertisements on existing content, then moved to social networking and social media, which was essentially an engine for crowdsourcing the production of greater amounts of content against which to show advertisements. Because money is up for grabs, producing content is now a significant business, and as such, technology is meeting the demand with a way to produce content that is cheaper than the money it can make.

The problem of moderating undesirable human-generated content was already starting to intrude into this business model, but now generative tools are also producing undesirable content faster than moderation can keep up. And at some point of saturation, people will lose interest, and tools which could previously use algorithmic heuristics to determine which content is good vs bad will begin to become useless. The only way out I can see is something along the lines of human curation of human-generated content. But I’m not sure there is a business model there at the scale the industry demands.

> We started by putting advertisements on existing content, then moved to social networking and social media, which was essentially an engine for crowdsourcing the production of greater amounts of content against which to show advertisements.

I see a lot of people talk nostalgically about blogs, but they were an early example of the internet changing from evergreen content to churning out articles on content farms. If people remember the early internet, it was more like browsing a library. You weren’t expecting most sites to get updated on a daily – or often even a monthly – basis. Articles were almost always organized by topic, not by how recent they were.

Blogging’s hyper-focus on what’s new really changed a lot of that, and many sites got noticeably worse as they switched from focusing on growing a library of evergreen content to focusing on churning out new hits. Online discussions went through a similar process when they changed from forums to Reddit/HN style upvoting. I still have discussions on old forums that are over a decade old. After a few hours on Reddit or HN, the posts drop off the page and all discussion dies.

Also, with some blogs we started to attach content to personalities, which was different from consuming content from another internet stranger.

And with personalities you have some form of relationship to, you want their most recent updates instead of sticking to topics of interest.

Reddit is at least still focused on topics instead of people. I think this is why for some it still is more interesting than platforms like Insta, Facebook or Twitter.

Blogs were great when they supported RSS: you could subscribe to a feed and get updates whether they happened every day or randomly, months or years in the future. There was no need to keep refreshing to see if there was something new.
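The pull model this comment describes is easy to sketch: an RSS feed is just XML you fetch whenever you feel like it, and a reader only needs to remember which item links it has already seen. A minimal illustration with the standard library (the feed content here is made up):

```python
import xml.etree.ElementTree as ET

# A tiny stand-in for a fetched RSS 2.0 document.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <item><title>First post</title><link>http://example.com/1</link></item>
    <item><title>Second post</title><link>http://example.com/2</link></item>
  </channel>
</rss>"""

def new_items(feed_xml, seen_links):
    """Return (title, link) pairs for items whose link is not in seen_links."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        link = item.findtext("link")
        if link not in seen_links:
            items.append((item.findtext("title"), link))
    return items

# Poll whenever you like; already-seen items are simply skipped.
seen = {"http://example.com/1"}
for title, link in new_items(SAMPLE_FEED, seen):
    print(title, link)
```

Whether the feed updates daily or once a year, the reader’s logic is identical, which is exactly why the cadence of the site stops mattering.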

That’s a fascinating perspective. I imagine a blog should do something like press releases, describing progress made on the actual website or future plans. Forums should then play with ideas, and chat is for hammering out details that are hard to communicate or overly noisy, and for talking about stuff unrelated to the project.

> I imagine a blog should do something like press releases, describing progress made on the actual website or future plans

A lot of older websites actually used to do this with a “what’s new” section or page. With blogging, “what’s new” became the entire site, with almost the entirety of the content (everything that wasn’t new) now hidden.

Ironically, after mentioning that discussion dies off incredibly quickly when HN stories fall off the front page, this discussion was moved off the front page to a day-old discussion. My guess is that almost no one will see it now.

I’ve seen it! But yeah, I typically only browse the best-stories section. Basically, I really want some human curation in my feed, which might help with the vast amount of infogarbage generated by LLMs.

There isn’t, because human trust can barely scale past 100 people, much less the entire internet. I think humans will recede into the tribes we were always built to understand and be a part of. Private group chats are far more popular and ubiquitous than we give them credit for, and I only see that accelerating in this climate.

“Echo chambers” have been the default for almost all of human civilization right up until about 10-20 years ago. You communicated with your immediate circle of friends and coworkers rather than arguing politics with LLM bots on Twitter.

On the contrary.

Good fences make for good neighbors.

It’s not a coincidence that the printing press brought devastating war to Europe in the form of the wars of reformation (1).

The internet is another real tool for knocking down fences for free, by anyone. It’s only a matter of time before there’s pushback by angry fence-owners.

We absolutely need less friction and more minding of our own business, focusing on our own back yard instead of chiming in on something happening thousands of miles away.

Don’t get me wrong, I’m 100% for the free flow of information, but what people (HN crowd?) don’t understand is that a significant subgroup of humans cannot tolerate relentless change or challenges to their worldview for too long.

(1) https://en.wikipedia.org/wiki/European_wars_of_religion#Defi…

At some level there is always a private mode – think family and friends. Do you not have any issue with everything being public? I think the parent suggests we’re not made for very large groups, and I kind of agree. I can’t name 100 people I know or have known in my life. Maybe I can (barely), but with great effort.

I’m not sure all private chat groups are really private. Maybe some are, but I can’t help thinking the industry is at least running AI on private chats to summarize what people are talking about.

> The only way out I can see is something along the lines of human curation of human-generated content.

That’s retweets.

> undesirable human-generated content was already starting to intrude into this business model, but now generative tools are also producing undesirable content faster than moderation can keep up.

> people will lose interest, and tools which could previously use algorithmic heuristics to determine which content is good vs bad will begin to become useless.

So what these parts are saying is that a tiny monoculture of bored college kids will always figure out the algorithm and dominate the platform with porn and spam, chewing up all the resources; that both improved tooling and tie-ins to monetary incentives intended to empower weaker groups to curb the kids only worsen the gap; and that this is problematic because financial influencers are paying to be validated by the masses, not to be humiliated by a few content-market influencers.

But what is the problem with that? Those “undesirable content” producers are just optimizing for what the market values most. If that’s problematic, then the existence of the market itself is the problem. What are we going to do about that? Simply destroying it might make sense, perhaps.

Human curation is possible in an open system, but when you have a few large silos, this algorithmic efficiency is put to use, and we can observe the result. But I agree, and I hope people will lose interest and stop consuming trash. The gamble on the other side is that people will get used to poorer and poorer algorithmically served content, and the industry will continue to squeeze profit out by any means necessary, indefinitely. Looking at the history of cable television, it appears there is a breaking point.

> This was probably always the likely outcome of an internet economy that revolves around the production and monetization of “content”.

Hasn’t publishing since Gutenberg been driven by the monetization of content? Looking at the history of the Catholic Church, potentially before that too.

I’ve been idly wondering about something like the Web of Trust: a social network where users vouch for one another’s actually-a-real-human-ness. There could be settings that let you adjust the size of the network you see (people you’ve actually met? one remove from that?).
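The adjustable-radius idea above could be sketched as a breadth-first walk over a vouch graph, cut off at N hops. Everything here (the names, the graph, the vouching model itself) is hypothetical:

```python
from collections import deque

# Hypothetical vouch graph: each user lists the people they vouch for.
VOUCHES = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": [],
    "dave": ["eve"],
    "eve": [],
}

def visible_network(start, max_hops):
    """Everyone reachable from `start` within `max_hops` vouch links."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        user, dist = frontier.popleft()
        if dist == max_hops:
            continue  # don't expand past the user's chosen radius
        for friend in VOUCHES.get(user, []):
            if friend not in seen:
                seen.add(friend)
                frontier.append((friend, dist + 1))
    seen.discard(start)
    return seen
```

Turning the “size of the network” dial is then just changing `max_hops`: 1 shows only people you vouched for directly, 2 adds one remove from that, and so on.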

What you’re describing is early Facebook. Your feed was only from your 1st degree connections. Content mattered because it was from people you cared about (and inherently knew, because users wouldn’t accept friend requests from people they didn’t know). It really was the pinnacle of social media.

> But I’m not sure there is a business model there at the scale the industry demands.

This is the kicker. When unfettered by regulation or leaders/workers with morals, most industries would rather avoid human curation because they want to sell you something. Amazon sellers would rather you not see or not trust the ratings because they want you to buy their stuff without knowing it’s going to fall apart. Amazon makes a profit off it, so they somewhat encourage it (although they also have the dual pressure of knowing that if people distrust Amazon enough they’ll leave and go somewhere else, so they have to keep customers somewhat happy).

No, curation has to come from individuals, grassroots organizations, and/or companies without a financial interest in the things being curated – and it has to revolve around a web of trust, because as Reddit has shown, anonymous curation doesn’t work once the borderline criminal content marketers find the forum and exploit it.

> The only way out I can see is something along the lines of human curation of human-generated content.

…however, unfortunately, curation doesn’t solve the problem of people desiring AI-generated content. That’s a much harder problem. Even verifying that something was created by a human in the first place is hard. I don’t want to think about that. I’m just going to focus on curation, because that’s easier and it’s also incredibly important given the declining quality of physical goods.

No offense, and I understand, but that use of “AI-generated content” sounds like somewhat of a euphemism. I don’t think there is a significant number of people who specifically prefer AI-generated versions; rather, it refers to a certain kind of content where the attempt to democratize and trivialize its generation by releasing AI models completely backfired.

This distinction is important, because while AI is faster than humans, it’s at best a cheap gateway drug into skilled human work.

It’s not just images. I frequently get genAI word salad in the top three to five results when I google anything that could be considered a common question. You don’t even realize at first when you start reading. Then it makes you start to question the things that aren’t obviously genAI. You can sort of tell the kinds of things that a human might be wrong about, the ways in which they’re wrong, how they sound when they’re wrong, how likely they are to be wrong, the formats and platforms wrongness exists within, how often they are wrong and how other humans respond to that. AI is a different beast. No intuition or experience can tell you when reasonable-sounding AI is wrong.

Our entire framework of unconscious heuristics for ranking the quality of communicated information being rendered useless overnight may be a recipe for insanity and misery. Virtually nothing has made me this genuinely sad about technology in all my life.

Tbh I think this is just it for the public internet. It’s not Google that’s failing, it’s the substance of the public internet that has failed. Whenever I need help or questions answered on something, I don’t google it, I don’t post on public forums, I ask on private group chats where I know everyone is a real person, no one is making money, no one is copy pasting chatgpt to collect internet points to sell their account later.

There is only one way I can see things changing and people aren’t going to like it. All content on the internet gets linked to a legal ID. Every post on facebook, every comment can be attributed to a real person.

I don’t think the heuristics are that different. SEO-spam and BS content existed before, and both Google and YT were full of them, all made by human “content creators” who optimized for clicks and focused on gaming the YT recommendation system. AI content isn’t that different. But unfortunately it’s now 100x easier to generate such content, so we see a lot more of it. The problem is fundamentally a problem of incentives and the ad-based business model, not a problem of AI. AI has made the enshittification problem a lot more visible, but it existed before.

I don’t know what the solution here is. My guess is that the “public internet” will become less and less relevant over time as it becomes overrun by low-quality content, and a lot of communication will move to smaller communities that rely heavily on verifying human identity and credentials.

Almost all of the “product X vs Y” results are AI ramblings now. This growth of the dead Internet is making me want to sign up for Kagi. We’re going to need a certification for human generated content at some point.

Kagi is not a panacea unfortunately. I pay for it and daily drive it to support a Google alternative, but I still have real trouble with my results being full of AI garbage (both image and text search).

As mentioned, product comparisons are a big one but another worrying area is anything medical related.

This week I was trying to find research about a medicine I’m taking, and the already SEO-infested results of 5 years ago have become immeasurably worse, with hundreds of pages of GPT-generated spam trying to attract your click.

I ended up ditching search altogether, found a semi-relevant paper on nih.gov, and went through the citations manually to try to find information.

That matches my experience. Kagi doesn’t surface much content beyond what Google/Bing do. What it does better out of the box is guessing which content is low-quality and displaying it so that it takes up less space, allowing you to see a few more pages’ worth of search results on the first page. And then it lets you permanently filter out sites you consider low quality so you don’t see them at all. That would have been awesome 10 years ago, when search spam was dominated by a few dozen sites per subject that had mastered SEO (say, expertsexchange), but it is less useful now that there are millions of AI content mills drowning out the real content.

For content that isn’t time sensitive, the best trick I have found is to exclude the last 10-15 years from search results. I’ve set up Firefox keyword searches (1) for this, and find myself using them for the majority of my searches, only using normal search for subjects where the information must be from the last few years. It does penalize “evergreen” pages where sites continuously make minor changes to pages to bump their SEO, which sucks for some old articles at contemporary sites, but for the most part it gives much better results.

(1) For example: https://www.google.com/search?q=%s&source=lnt&tbs=cdr%3A1%2C…
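For anyone wanting to build such a keyword search themselves, the truncated URL above appears to use Google’s `tbs=cdr` custom-date-range parameter. A hedged sketch of constructing one – the parameter format here is inferred from observed result-page URLs and could change at any time, and the 1990 lower bound is an arbitrary choice:

```python
from urllib.parse import urlencode

def dated_search_url(query, cd_max_year):
    """Build a Google search URL restricted to results dated before cd_max_year.

    Assumes Google's (undocumented) tbs=cdr:1,cd_min:...,cd_max:... format.
    """
    params = {
        "q": query,
        "source": "lnt",
        "tbs": f"cdr:1,cd_min:1/1/1990,cd_max:12/31/{cd_max_year}",
    }
    return "https://www.google.com/search?" + urlencode(params)

# In Firefox, save the resulting pattern as a keyword search bookmark
# with %s in place of the query.
print(dated_search_url("evergreen topic", 2012))
```

Pointing a Firefox keyword bookmark at this URL pattern (with `%s` as the query placeholder) reproduces the trick described above.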

> For content that isn’t time sensitive, the best trick I have found is to exclude the last 10-15 years from search results. I’ve set up Firefox keyword searches (1) for this, and find myself using them for the majority of my searches…

OMG. I’m so happy how much AI is improving our lives right now. It really is the future, and that future is bright.

Thanks guys!

I use Kagi personally every day and my results are definitely not full of AI garbage so would like to better understand your context.

Have you reported any of those issues to Kagi (support/discord/user forum)? We are pretty good at dealing with search quality issues.

> I ended up ditching search altogether, found a semi-relevant paper on nih.gov, and went through the citations manually to try to find information.

I’ve been doing this for years now. The normienet as I call it is nigh worthless, and I don’t even bother trying to find information on it.

I’m a little sad for anyone who didn’t get to experience the Internet of the twentieth century. It was a unique point in time.

I’m ready to pay for a walled garden where the incentives are aligned towards me, instead of against me. I know that puts me in a minority, but I’m tired of the advertising ‘net.

I’ve said it before and I’ll say it again: I firmly think AOL was just ahead of its time.

Bring it back. Charge me $10 or $20 a month. Give me the walled-off chatrooms, forums, IM, articles, keywords, search, etc. Revamp it, make it modern. And make a mobile app.

Everyone wanted a free and open Internet, until AI and the bots ruined it all.

> until AI and the bots ruined it all

Well… advertising as a business model ruined it all. They get paid for getting page views, so the business model optimizes for maximum page views at minimum cost of creation. This is the end result of what Google and Facebook have spent the last 20 years building.

But I’m sure the engineers who built all this have very nice yachts, so it’s all fine.

With every passing day I basically think this is going to be the future of the internet. Many disparate private/semi-private groups while the “public” internet becomes overloaded with AI slop.

It’s largely already happening in places like Discord.

I think the first company that can capture what Discord has but is not wrapped in that “gamer aesthetic” ui/ux is going to do really well.

> Many disparate private/semi-private groups while the “public” internet becomes overloaded with AI slop.

And eventually someone is going to come up with a search engine for Discord, and the cycle will start all over.

I don’t think it works today. You have walled communities in Discord and messaging apps, but the old internet also had a degree of anarchy: today we know that your everyday person can make money off the internet, but you didn’t know that then, and I think that colors a lot of the experience.

I think the problem with revamping Q-Link/AOL into the be-all end-all for everyone (that’s human) is that you’re gonna have to prime the pump with AI chatbots to give the appearance of lots of people to draw you in, kinda like how Reddit admins primed the pump by making tons of posts early on. Just a little light treason.

It still exists. Currently it looks like Patreons and their associated communities, long-running web forums, small chatrooms on platforms like Discord or Facebook or Instagram, and so on. Small communities, with relatively high barriers to entry.

That’s not the same Internet.

Patreon, forums, Discord, Facebook, Instagram and so on are all centralized.

With the Internet of the 90s, discussion happened with decentralized Usenet (owned by nobody) with more-focused discussion happening on mailing lists (literally owned by whoever was smart enough to get Majordomo compiled and running).

Email was handled by individual ISPs or other orgs instead of funneled through a handful of blessed providers like Gmail.

Real-time chat was distributed on networks of IRC servers that were individually operated by people, not corporations.

Quickly publishing a thing on the web meant putting some files in ~/public_html, not selecting a WordPress host or using imgur.

Ports 80 and 25 were not blocked by default.

Multiplayer games were self-hosted without a central corpo authority.

One could construct an argument that supports either way being better than the other, but the Internet of today is not the same thing as it was a quarter of a century ago.

(Anyone can make a “discord server,” but all that means is that they’ve placed some data in some corpo database that they can never actually own or control.)

> Multiplayer games were self-hosted without a central corpo authority.

Losing “dedicated servers” was a huge loss in my opinion. It was fun to play on the same handful of servers and get to know the same group of people. Dedicated servers were also free from profit-driven “matchmaking” schemes since you ended up playing with whoever was on at the time.

I also miss the chaotic multiplayer of the era, where the priority was having fun and not improving your rank on the leaderboard.

Edit: Dedicated servers also each had their own moderation. So if you wanted to play on a server that banned anyone nasty you could. But you could play with a bunch of folks dropping “gamer words” as well.

Then there were custom sprays – something no company would allow in their game today. It’s sad how constrained and censored online gaming experiences are today. In many games even dropping the “f-bomb” can get you banned and typing assassin in the chat window yields “***in”.

It’s crazy to me that custom sprays existed at all in any capacity for so long. They were novel and sometimes funny, but man, I really don’t miss the days of playing a game and having your roommate/family ask why your computer screen is plastered with goatse or meatspin or the like. I remember more than a handful of awkward conversations trying to explain WHY that wasn’t actually part of the game itself.

All of this still exists today though. It hasn’t disappeared, it has only been drowned.

If you want to find it again, change your search engine. Use wiby.me, or search.marginalia.nu. Subscribe to RSS feeds of sites you find interesting, and go from there. Hop on gemini. Subscribe to some activitypub accounts (you don’t even need an account for that). Communal IRC servers still exist, forums still exist, independent emails still exist.

Stop falling for the overall doomerism HN is so quick to fall for; start doing and living what you want.

That’s a pretty narrow view. My Internet in the 90s and early 2000s was on AOL Instant Messenger and MSN Network, Yahoo! Mail and phpBB web forums hosted by random people with pictures hosted on PhotoBucket. Plenty centralized.

The technology doesn’t matter, it’s changing all the time. The difference is the size of the community, and the barriers to entry. If these communities were easy to discover and join and hard to be booted out from, they wouldn’t be protected from The Slop. It used to be just getting on the Internet was the barrier. Now we need new barriers: paywalls and word-of-mouth and moderation.

I have no reason to doubt your experience.

But AOL IM and MSN were centralized walled gardens that came rather late to the game, and neither PhotoBucket nor phpBB existed at all in the 20th century that is the context here.

> I’m a little sad for anyone who didn’t get to experience the Internet of the twentieth century. It was a unique point in time.

I did, and…well, let’s be careful how we look back at it.

Punch the monkey? Ad supported ‘free’ internet that literally put an adbar at the top of your browser at all times? Dreadfully slow loads of someone’s animated construction sign GIF? Waiting for dial up to connect after 20 tries? Tracking super pixels? Java web applets? Flash? Watching your favorite ISP implode or get bought up? To say nothing of the pre-Google search results (I miss the categories though).

I have plenty of good memories from those days, but it still had plenty of problems. And it wasn’t exactly a bastion of research material either unless you really went digging or paid for access.

Would you pay a nominal amount (like 5 cents or 25 cents) to consume one piece of good, ad-free content, assuming that there was no login, no account, no friction, etc? You click, you read, and 5 cents is magically transferred from you to the writer?

I would. But I’ve asked a lot of people who say “no, I don’t want to pay when I can read it for free. I don’t mind the ads that much.”

the problem with that is not the payment, it’s that you will only be sharing it with people similarly willing to pay for a walled garden. I’m guessing most of what we’re nostalgic for was created by people who wouldn’t be up for that

It was created by people that

1. could afford a computer back then and saw the utility of owning one.
2. had access to the internet, meaning they were either in college, at a ‘tech’ company, or tied to some local collective that provided access.

When people say ‘the old internet’ they are referring to a very self-selecting, elite group.

> I’m a little sad for anyone who didn’t get to experience the Internet of the twentieth century. It was a unique point in time.

Sadly, they won’t know what they were missing. It’ll be the new normal

Some asshole tech apologist is probably getting ready to post that section from Plato where Socrates complains about writing any minute now.

Of course, that asshole is oblivious to the fact that most if not all of us probably just don’t understand what Socrates was missing, so he’s just showing his ignorance and stupidity.

> I’m ready to pay for a walled garden where the incentives are aligned towards me, instead of against me. I know that puts me in a minority, but I’m tired of the advertising ‘net.

The problem is that, even if you try to do that, the incentives are probably still aligned against you, just maybe less blatantly.

Just look at how many formerly ad-free paid services are adding ads, and how hardware that users literally own acts against their interests by pushing ads in their faces (e.g. smart TVs).

The guy who runs the walled garden will always be tempted to get some extra cash by adding ad revenue to your subscription feed, or cut costs by replacing human curated stuff with AI slop (maybe cleaned up a bit).

> I’m a little sad for anyone who didn’t get to experience the Internet of the twentieth century.

I’m a little sad for anyone who didn’t get to experience the pre-Internet era.

The Internet is the lead of our time.

“The internet” isn’t for or against anything, it’s just a vast computer network. It’s the humans with an agenda that exploit the internet (specifically the web) that are against you.

You’re correct of course leptons, but do please give credence to important shorthand. As I’ve put it before, the Internet is the battleground. Ground is not necessarily “neutral”. It favours certain forces and tactics. See the “Nine Situations” (0). We used to occupy type-5 (open ground), which was also mostly type-2 (dispersive). We are now on “serious” and “difficult” ground. Therefore the environment itself is hostile. Most of what people commonly think of as “The Internet” – that’s not your friend any more.

(0) https://suntzusaid.com/book/11

> It really is a weird feeling remembering the internet of my youth and even my 20s and knowing that it will never exist again.

User-facing ability to whitelist and blacklist websites in search results, and the ability to set weights for websites you want to see higher in the results.

Spam lists for search results, so even if you don’t have the knowledge/experience to do it yourself, you can still protect your results from spam.

It’s a recreation of the e-mail situation – not because it’s good, but because the web is getting even worse than e-mail.
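A minimal sketch of what such user-side weighting might look like: a per-site weight table applied on top of whatever scores the engine returns, with weight zero acting as a blacklist. The domains and scores here are illustrative:

```python
# Illustrative user preferences: 0 = blacklist, >1 = boost, absent = neutral.
SITE_WEIGHTS = {
    "contentfarm.example": 0.0,
    "goodblog.example": 2.0,
}

def rerank(results, weights):
    """results: list of (domain, engine_score) pairs.

    Drop blacklisted sites, multiply the rest by the user's weight,
    and return them sorted best-first.
    """
    scored = [
        (domain, score * weights.get(domain, 1.0))
        for domain, score in results
        if weights.get(domain, 1.0) > 0
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

A shared spam list is then just a weights dict someone else maintains, merged into your own – the same distribution model as e-mail blocklists.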

A mesh network on top of IP, with an enforceable license agreement that prohibits all commercial use, would suffice to get the old net back. Bonus points if no HTML/CSS/JS is involved but some sane display technology instead.

No way. What you are describing is Gemini, but even more niche – a place which is explicitly walled off from the “big net”, which only nostalgic people with the right technical skills and a desire to jump through some hoops can get to.

This is not going to work – as time progresses, there will be fewer and fewer nostalgic people who are willing to put up with that complexity. And the “non-commercial” part will ensure that there will _never_ be an option to say: “I am tired of fixing my home server once again, I am going to put my site up on (github|sourceforge|$1 hosting) and forget about it.”

Compare to the early web. The first thing that came to my mind was Bowden’s Hobby Circuits site (0). It’s designed for advanced beginners – simple projects, nice explanations. And there are no hoops to jump through – I’ve personally sent links to it to many people via forums, private emails, and so on. It apparently went down in 2023, but while it was still up, I remember regularly finding it from Google searches and via links from other pages.

(0) https://web.archive.org/web/20220429084959/http://www.bowden…

I’m not sure I like your parent’s idea, but it isn’t like the regular internet would go away… when you want useful, go there.

I wouldn’t mind a modern take on a geocities-sorta system. Where: (1) you can make a webpage that could be about whatever the bleep you wanted it to be about; (2) only a reduced subset of web technologies is allowed; (3) it is free from any advertising or commerce/sales; (4) it is only available to individuals or businesses no larger than a closely-held corporation; (5) it has clear limitations on AI use; (6) it has a complete index, categorized and tagged, of all the sites available.

But if I am being honest, that is just the nostalgia for the old internet talking.

I’m already treating the WWW and the commercial internet generally as “Babylon”. You have to use it for a lot of stuff (doing commerce, interacting with the government), but why would you willingly use it on your own time?

I only just put it together, but Peter Watts’s Rifters series is some epic grimdark hard sci-fi set on Earth, the first book playing out practically as horror, confined deep under water.

But my point is, the latter books have this amazing post-internet setting: a ravaged, chaotic wildlands filled with rabid programs and wild viruses, packets staggering half-intact across the virtualscape, hit by digital storms. Our internet isn’t quite so dramatic, but I see the relationship, more subtly, with where we have gone: so, so many generated sites happy to regurgitate information poorly at you or to quietly sell you a slant, bereft of real sites, real traffic. Watts is a master writer. Maelstrom.

First book Starfish is free. https://www.rifters.com/real/STARFISH.htm

> This growth of the dead internet

It is quite surreal to witness. It is certainly fueled by the commercialization of the internet through ads and the centralization into user-hostile platforms.

The old internet seems to be doing much better. But it lost most of its users in the last 15 years…

> The old internet seems to be doing much better. But it lost most of its users in the last 15 years…

What do you mean by this? How do you find the old internet?

You don’t find them easily. That is the point, I guess. But I am not referring to some obscure dark web here.

Many of the old niche forums still exist – e.g. FOSS sites. GNU project sites seem not to have aged a day in 20 years, i.e. they still party like it’s 2004.

Also, I think non-English sites are better off, since Reddit mainly ate English communities and sites.

Facebook is probably what killed most of the living internet: small community sites, like the local kennel club or boat marina.

A good example of the old internet would be Matthew’s Volvo site:

https://www.matthewsvolvosite.com/forums/search.php?search_i…

Reddit is already lost. I was talking to the mods in a large political subreddit and they said after Reddit started charging for API access, all the tools they used to keep on top of the trolls and bots stopped working, and the quality of the whole subreddit declined visibly and dramatically.

> Reddit is already lost. I was talking to the mods in a large political subreddit and they said after Reddit started charging for API access, all the tools they used to keep on top of the trolls and bots stopped working, and the quality of the whole subreddit declined visibly and dramatically.

The whole point of the API access change was to charge AI model-makers. It’d be ironic if the API change destroyed their product and made their data unsellable.

Yes, everyone warned Spez about that at the time. He didn’t care, he wanted that IPO.

Recently, they came around looking to recruit me. I told them fire Spez or fuck off. (17 year Reddit user here)

If you know anyone who works in marketing/PR, ask them how they use Reddit. It has been gamed as much as SEO since about 2020. I assume anything except “why is there a fire in this street?” kind of posts are just ads at this point.

Fair. Exceptions definitely exist, but unless it’s a location-based subreddit, I personally wouldn’t trust it. There are fun methods that I’ve been told about. Marketing companies maintain multiple very real-looking accounts, participating in discussions for months or years with no product affiliations, then very casually throw in plugs that eventually generate revenue. Or they give a list of 10 items and insert their product in between, since a sale is better than no sale. Or they market a competing product in a very obvious way, then reply from another account pointing out that it’s an ad, to eventually lead sales back to themselves.

I have no idea how widespread these are, as I heard most of this over bar drinks while traveling. Could’ve just been someone making stuff up as well. But my marketing friends have confirmed that they use Reddit very heavily, as it’s a great sales funnel if you play it right.

It’s also not much use to anyone who doesn’t use Google ever since Reddit started blocking all crawlers besides Googlebot. Old cached results might still show up in Bing/DDG/Kagi but they can’t index any of the newer stuff.

It’s surprising how many times you see this pattern on HN

“Google sucks!”(50 upvotes)

“That’s why I use Kagi!”(45 upvotes)

“Actually Kagi has the exact same problem and you have to pay for it.”(2 upvotes)

Unfortunately, as much as I do like Kagi overall, it goes out of its way to inject AI slop into the results with its sketchy summarization feature.

Most product reviews are simply pumping Amazon comments into AI to generate a review, with a final “pros/cons” section that is basically the same summary Amazon’s AI generates.

Whether something is human generated is (mostly) beside the point. The problem is that spam is incentivized today. Any solution must directly attack the financial incentive to spam. Therefore what’s needed for a start is for search engines to heavily downweight ads, trackers, and affiliate links (obviously search engines run by ad companies will not do this). Shilling (e.g. on reddit) should be handled as criminal fraud.

Couldn’t reproduce – in fact, the second hit is a Threads version of the same post – and I get no AI suggestions for this query. Humorous Google queries (or AI queries more generally) are definitely a trope, so I can never really tell if they actually happened or if it’s all for karma.

Google also routinely removes AI suggestions for searches that produce embarrassing results (you don’t get them for searches about keeping cheese on your pizza anymore, for example), so it’s even harder to validate once a result goes viral.

I still get the second one when I search “Difference between sauce and dressing” on Google. The Oven vs Ottoman empire one I don’t get an AI overview.

Edit: Similar to the second one, I just did Panda Bear vs Australia, which informed me “Australians value authenticity, sincerity, and modesty. Giant pandas are solitary and peaceful, but will fight back if escape is impossible.”

I’m glad that Kagi (and others) exist as an alternative for people who don’t want generative AI in their searches.

Personally, I’m excited about more generative AI being added to my search results, and I’ll probably switch to whichever search engine ends up with the best version of it.

This peacock thing was the last straw for me. I installed Kagi just moments ago.

And of course the first image for “baby peacock” is the same white chick thing… obviously because this story is making the rounds -_-

AI tools on the search page: sure, cool. I use perplexity a lot, actually. I’m in favor of this.

Search results that are full of content mills serving pre-genned content: no thanks. It’s in the same category as those fake stackoverflow scrape sites.

Not sure if you’re being sarcastic, but they’re not talking about AI features of the search engine itself (Kagi has those too), but about nonsensical AI generated content on the web that exist solely for the purpose of getting you clicking on some ads. Kagi tries to make those sites stand out less on the search results.

But I can generate something with AI and then sign it myself and say “I wrote it, pinky promise”.

Since most people don’t write anything on the internet, I can pay people $5 to use their signature and operate a “sign farm”.

Look at the effort these people go through to send their spam.

A web of trust or reputation-based system can be built on top of the signature scheme – maybe the emergence of smaller invite-only forums that share reputation. Or maybe it will just become an integrated part of the moderation of existing platforms like Reddit, Mastodon and Bluesky.
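To make the idea concrete, here is a minimal sketch of signed posts plus a shared reputation ledger. It is purely illustrative: HMAC with a per-author secret stands in for real public-key signatures, and all the names (`sign`, `record_vote`, the "alice" key) are hypothetical.

```python
import hashlib
import hmac

def sign(secret: bytes, post: str) -> str:
    # Author signs the post; HMAC is a stand-in for an asymmetric signature.
    return hmac.new(secret, post.encode(), hashlib.sha256).hexdigest()

def verify(secret: bytes, post: str, sig: str) -> bool:
    # Readers check the post has not been altered since it was signed.
    return hmac.compare_digest(sign(secret, post), sig)

# Shared reputation ledger: author fingerprint -> score.
reputation: dict[str, int] = {}

def record_vote(author: str, delta: int) -> None:
    # Flagging spam would apply a negative delta; useful posts a positive one.
    reputation[author] = reputation.get(author, 0) + delta

secret = b"alice-private-key"   # hypothetical per-author secret
post = "Hello, forum"
sig = sign(secret, post)
assert verify(secret, post, sig)
record_vote("alice", +1)
```

Whether the ledger lives in one forum or is federated across several is the hard part; the signing itself is the easy part.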

If you put your name on AI spam people will flag your post and no-one will bother to see your posts for at least a few years.

And if people don’t like what you’re talking about, they will flag your post too. This is not going to work at any serious scale – certainly not across the internet – because abuse will be rampant.

I’m not really sure if I follow your suggestion; what would be done with the camera? Authenticate that a picture is real? I’m not sure how that will be workable? For example, how would I (random internet person) verify a picture from you (also random internet person) is real?

Sure, but we’re working with a real person’s reputation. If they are willing to gamble with AI, that’s their problem.

I’m hoping for a “this human being validated this message”. I think that alone could solve multiple problems.

I’m not sure human-generated content is any better on the whole. BS-laden drivel has been pervasive for some time now, even before AI started taking over.

I’m talking about those 300-word, ad-ridden crap articles that are SEO’d right to the top, and if you’re lucky you might get the 3-word answer you were looking for: “<300 words of shit>… and in conclusion, <1-step answer>.”. Anyway, humans have been getting paid pennies to write those for a while.

AI just turns the throughput on that up to 11, where there’s just no end in sight. I think this is like the primary failure mode of AI at this point. It’s not going to kill us – we’re going to use it to kill the internet. OTOH, maybe then we just go outside and play.

In the world of content moderation, we refer to this as constructive friction. If you make it too easy to do a thing, the quality of that thing goes down. Difficulty forces people to actually think about what they are writing, and whether it is germane and accurate. So generative AI, as you point out, removes all the friction, and you end up with bland soup.

Perhaps you’re thinking of the Wikimedia Foundation.

There is plenty of space there for more volunteer editors to verify content, and likewise, WMF operates its own cloud platform where developers are automating tools that do maintenance and transformation on the human-contributed content.

Then, there is Wikidata, a machine-readable Wiki. Many other projects draw data from here, so that it can be localized and presented appropriately. Yet, its UI and SPARQL language are accessible to ordinary users, so have fun verifying the content there, too!

I don’t think you understood what I meant by “human verified” – I admit it was a vague term. I meant proving that some input or data from a user was generated by a human (however we define that) rather than by an LLM or a multi-modal image/video/audio model.

> We’re going to need a certification for human generated content at some point.

People keep saying this and I keep warning them to be careful what they wish for. The most likely outcome is that “certification of human generated content” arrives in the form of remote attestation where you can’t get on the internet unless you’re on a device with a cryptographically sealed boot chain that prevents any untrusted code from running and also has your camera on to make sure you’re a human. It won’t be required by law, but no real websites will let you sign in without it, and any sites that don’t use it will be overrun with junk.

I hate this future but it’s looking increasingly inevitable.

There are ways to do this without destroying anonymity. Ideally, you verify you’re human by signing up for some centralized service in real life, maybe at the post office or something. Then people can ask this service if you’re real by providing your super-long rotating token. So, just like an existing IdP, but big.

Before AI, product comparison sites were ramblings of interns paid by people who found out you could make money from SEO-optimized blogs.

And long before the Internet, people slapped random concoctions together and sold them as medicine, advertising them as cure-alls.

Any source of content can be controlled or manipulated in non-obvious ways. And we already have strong algorithms for manipulating human attention (resulting in the growth of non-falsifiable conspiracy theories, for one). There is no clear approach leading out of information dystopia.

Drives me nuts. The internet is dead.

I just bought a home and I have been googling the best way to tackle certain home improvement projects – like how to prepare trim for painting. Virtually every result is some kind of content farm of AI-generated bullshit with advertising between every paragraph, an autoplay video (completely unrelated) that scrolls with the page, a modal popup asking me to accept cookies, and a second rapid-fire modal popup asking me to join the newsletter to “become a friend”.

For better or worse, Reddit is really the only place to go find legitimate information anymore.

For this kind of search, YouTube and TikTok (yes, TikTok) are your best bet. Videos are not (completely) flooded by AI (yet) and you can find pretty much anything about manual work.

I prefer text content to videos by a long shot, but genuine, human text content is almost dead. Reddit might be one of the rare exceptions for now. There are also random, still active, old school forums for lots of things but they tend to become extremely hard to find.

Gaining information from a video (often just someone talking into their phone) feels like sucking a milkshake through a coffee stirrer compared to reading a forum post written by a human. Worse, you can’t see how deep that milkshake is at a glance, so you may end up with just a sip from a melted puddle vs. the big volume of content you wanted.

You are right that YouTube is better, but so much of that content is also biased towards sponsors. At least the good instructional content with high production value tends to be very heavy on sponsorships. The indie stuff can be great, but you are gonna get a 720p shaky camera with terrible lighting and lots of umms and backstories about why I am redoing my vintage farmhouse (à la the recipe meme where every recipe page has a 32-paragraph preamble before the actual recipe).

For what it’s worth, the last time I had a home improvement project I needed youtube help with, the one-and-a-half minute mumble-tronic video shot on a Nokia brick-phone was the most helpful one.

Would I have preferred a nice 1080p video, shot in good lighting on a flat white table? Yes. But those also tend to be 30 minutes long, and as you said, with a sponsorship for HurfDurfVPN in the middle.

Well, why do you expect people to teach you how to do home improvement for free? The people who know how to do it well are working in the trade, and you can pay them to improve your home.

AI-generated youtube videos are here too, although they’re fairly easy to spot for now. The general formula seems to be a bunch of stock images / AI-generated images / stock footage relevant to the video title, with a LLM-generated script read out by an elevenlabs-style voice.

Owner/builder here, of a 1939 home. I invested in a home reference library part way through my own improvements; I should have done it before even lifting a screwdriver. Renovations (https://www.bookfinder.com/search_s/?title=Renovation%205th%…), from Taunton Press, is the first source I consult when starting a home improvement project. Chapter 18 is all about painting. Many of the other titles from Taunton are excellent, but Renovations is unmatched in its coverage.

All of the flat white MDF trim you buy is primed and ready for painting, too.

I have an older car and a newer car. I can find out how to do any repair on my old car because it existed during the old internet when people did all kinds of write ups.

The information on working on my new car is non-existent other than Youtube videos where the majority is just a random dude who knows nothing filming himself doing a horrible self repair.

Also your local library probably has a bunch of home improvement books. They’re probably from the 80s, but trim painting techniques don’t change that much.

What you can do in the library is sample a wide variety of the books in the topic you want, and soon you’ll identify an author, or publisher, as your favorite, and then you can go purchase more in their series, and consider donating them back to the library when you’re done with the project!

The video sites are gonna be way better for this. Or reddit. I don’t know how much longer that will be true with AI video generation becoming cheaper over time though

Get a general purpose home maintenance book.

For example, https://archive.org/details/stanleyhomerepai0000fine/page/14… links to the chapter “Painting Trim the Right Way” from the book Stanley Home Repairs, 2014.

Could also look at used book stores. Home repair hasn’t changed much.

Edit: Could even fire up Wine and try the CD-ROM “Black & Decker Everyday Home Repairs” (published by Broderbund) at https://archive.org/details/BlackDeckerEverydayHomeRepairsBr… . https://www.goodreads.com/book/show/3424503-everyday-home-re… says:

> Like its predecessor in book format, the CD-ROM version offers easy-to-follow, step-by-step instructions on more than 100 common household problems, from how to fix a leaky faucet to repairing hardwood floors. What’s more, the CD-ROM version incorporates animation and narration to help make the repair project even easier to understand and complete. Instructions can be viewed one step at a time or all at once, and, if desired, can be printed out and taken directly to the repair site. Included with each repair project is the projected time needed to complete the work, estimated cost, and a list of materials and tools needed.

That sounds pretty nifty, actually!

I think this and validated sources are the best direction.

A trip to the bookstore to buy “x for dummies” can save dozens of hours of web searching.

The current iteration of the internet and AI is lacking depth, detail, and expertise.

You can find 1 million shallow answers on reddit, or echoed in AI, but anything more than the most cursory introduction is buried.

Not only is shallow information easier to generate, it is what most users want, and therefore most engines and services cater to it.

To find better content, you need to go to specialty outlets that don’t cater to the lowest common denominator.

Yeah, I’ve had this experience as well. I’ll have to go 4 or 5 pages deep in the results to get to a forum thread someone wrote in 2005 referencing a product that doesn’t exist anymore plus a bunch of advice that’s mostly still applicable.

> For better or worse, Reddit is really the only place to go find legitimate information anymore.

This is frightening and, I fear, true.

But I’d also add one odd little counterpoint: some of the most useful discussions and learning experiences I’ve had in the last four years have happened in private Facebook groups. As soon as the incentive to build a following using growth-hacking and AI — which private groups mitigate to a greater extent — is taken away, you get back to the helpful stuff.

The FreeCAD group on Facebook is great, for example. And there are private photography groups, 3D printing groups, music groups etc., where people have an incentive to be authentic.

Public Facebook feeds are drowning in AI slop. But people who manage their own groups are keeping the spirit alive. It’s almost at the point where I think Facebook will ultimately morph into a paid groups platform.

Fitting that this is a copy paste submission taken from another source (linked in dupe comments), likely by a bot based on post history. The computers are turning on each other.

We can only hope they consume each other in some kind of survival-of-the-fittest type scenario, and when all is said and done, we can turn the last one off and set the clock back to 2015 and try again.

I have a hard time explaining why, perhaps because I did not know what a baby peacock looks like, but this somehow really drove home the “dark side of AI” for me.

I have gotten used to trusting search results somewhat. Sure, there would be oddball results and nearly nonsensical ones, but they were scarce in a sea of relevant images. Now with this, I would be blind to the things I don’t know, and as someone who grew up with Google “just being there”, it truly scares me.

Google is going to have to solve for this somehow if they want to remain relevant, right? If searching for an image and generating the image yield the same result, what’s the point of image search any more?

> what’s the point of image search any more?

The same could be said for regular search. Pretty much anything I search for yields a page of ads followed by pages of content farmers followed by pages of “almost sounds like experts but is still just a content farmer.”

Financing the Internet with advertising has really made it difficult to find good quality content. The incentives are completely misaligned, unless you are a ‘content creator’ or Google.

I think what comes next is interest-based, influencer-moderated, semi-private chat rooms. For example a lot of hobby youtuber have moderated discord servers. My diy 3d printing communities have discord servers. I have a few invite-only discord servers for various circles of friends and family.

Well, and once this kills the web, Google no longer has a data source its AI can tap to answer questions about anything that happens afterward, so it kind of needs the web to at least limp along.

What do you mean “solve”? This is solving the problem. If people see things they consider good enough, that’s all they care about.

Source: every other piece of news or social media on the planet.

The problem is – how does Google get paid for providing this service. In a way, the better the service they offer, the less money they make. It really sucks.

Would you pay money to Google or some other company in exchange for a genuinely good search service that prioritizes well written content, and avoids AI (or human) generated crapticles?

AI slop (or convincing lies) is not distinguishable from genuine, human-generated content. Machines definitely can’t do it, and humans often can’t either. That problem will get worse.

Why is that a problem, anyway? If machines can play chess better than any human, it is reasonable to assume that they can write articles better than many humans. What’s wrong with Internet filled with good content generated by AI?

AI can’t create good (meaning truthful) content. It’s literally impossible for LLMs to be hallucination free. It’s just not how they work.

The problem is going to get worse as hallucinations are used as training data because even the AI companies can’t tell the difference between AI content and human content.

I just did it on Kagi and, except for the obvious stock.adobe.com ones, all of the AI-generated images were from Snopes and media sites repeating this story. But I have quite a few sites blocked (Pinterest is definitely nuked).

Would you care to share what you do instead? For search in particular, the g-suite, etc. are not such a big deal. I’m really hoping for something other than use duck duck go / bing / etc. because AFAIK they all serve advertisement funded trash and I’ve yet to hear a really compelling alternative, tho I’ve been too lazy/busy to try Kagi.

I know you said that you’ve been too lazy and busy to try it, but I am very happy with Kagi. If you don’t want a search engine that serves advertisement-funded trash, then I recommend supporting the search engine whose business model is to provide searches without ads via subscription.

My search lists are curated very well through my settings, and even just using the recommended block list keeps a lot of junk out of my search results. If I find a bad site, I can block it from all future results pretty quickly. I can also use regex on the URLs in the search results to redirect things like Reddit to old.reddit automatically. It’s very nice.
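The redirect rule described above amounts to a small regex rewrite on result URLs. As a sketch (the pattern and function name here are illustrative, not Kagi’s actual syntax):

```python
import re

def rewrite(url: str) -> str:
    # Send any reddit.com result to old.reddit.com instead.
    return re.sub(r"^https?://(www\.)?reddit\.com",
                  "https://old.reddit.com", url)

print(rewrite("https://www.reddit.com/r/FreeCAD/comments/abc"))
# -> https://old.reddit.com/r/FreeCAD/comments/abc
```

Non-Reddit URLs pass through unchanged, since the anchored pattern simply fails to match.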

We need a slim p2p social network swarm protocol.

If we could subscribe and suggest content along our interest graphs, we would control the algorithm and could prune slop with ease.

It’d be incredibly awesome if news, forums, and social media worked like BitTorrent.

You don’t think the AI content creators will target the next search engine if Google fails? I don’t think Google WANTS to prioritize AI slop, they just are unable to not do it.

I really don’t think that’s the case. The lesson of the 2020s internet is that the biggest players have become too big to be disrupted.

The masses are fully here now. They’re too passive to know or care what’s going on. They stick with the path of least resistance: Google, Amazon, Reddit, Twitter, etc. No matter how hostile or shitty those options become.

We have to put aside the way we’ve thought about the internet before now because it doesn’t apply anymore. There will be no more MySpace -> Facebook. The internet is no longer made up of a high enough percentage of conscientious and deliberate users to make a difference.

It’s more about Google search than it is about the internet.

There was a period in the past when human spam was a problem that was not trivial to solve.

As always, modern problems require modern solutions.

The most effective spam filtering is done not by content but by various white and black lists of providers. Essentially it is a trust score which is a very old solution to a lot of problems.

I don’t think “better spam detection technology” can help out of this even in theory. The whole point of LLMs is that, by construction, they produce content which is statistically indistinguishable from human text. Almost by definition, any statistical test to distinguish LLM text from human text could be turned around to make better LLMs.

Most egregious is the one copying the title from Snopes’ “Video Genuinely Shows White ‘Baby Peacock’?” (with the question mark cut off). A page all about how the picture isn’t a real baby peacock.

But also, if you search the more accurate term, “peachick”, you seem to get 100% real images, although half the pages call them “baby peacocks”.

And the first result is from Adobe Stock, who you might assume would have higher standards than Pinterest and TikTok, but here we are.

In the near future, a significant portion of YouTube videos and podcasts will likely be AI-generated (e.g., through tools like Notebook LM).

However, I’m uncertain whether audiences will truly enjoy this AI-generated content. Personally, I prefer content created by humans—it feels more authentic and engaging to me.

It’s crucial for AI tools to include robust detection mechanisms, such as reliable watermarks, to help other platforms identify AI-generated content. Unfortunately, current detection tools for AI-generated audio are still lacking – https://www.npr.org/2024/04/05/1241446778/deepfake-audio-det…

(Edit) We just put together a list of notebooklm generated “podcasts”: https://github.com/ListenNotes/notebooklm-generated-fake-pod…

Consider whether you’d enjoy listening to AI-generated podcasts. I believe people might be okay with shows they create themselves, but are less likely to appreciate “podcasts” AI-generated by others.

>Personally, I prefer content created by humans—it feels more authentic and engaging to me.

I’d like to think that too, but I wonder how long – if at all – this will be true. I “want” to like human generated content more, but I suspect AI may be able to optimize for human engagement more, especially for simple dopamine inducing content (like tiktok videos). After all, we’re less complicated than we like to think.

>It’s crucial for AI tools to include robust detection mechanisms, such as reliable watermarks, to help other platforms identify AI-generated content.

This will never work, unfortunately. There’s no way to exclude rogue actors, and there’s plenty of profit in AIs pretending to be human. If anything, we will have to watermark/sign human generated content.

> In the near future, a significant portion of YouTube videos and podcasts will likely be AI-generated

It’s not helpful that you’re making a binary distinction here.

As an example, as much as 10 years ago, I would find Youtube videos where the narration was entirely TTS. The creators didn’t want to use their own voice, and so they wrote the script, and fed it into a TTS system. As you can expect from the state of the art at the time, it sounded terrible. Yet people enjoyed the videos and they had high view counts.

Are we calling this AI-generated?

We now have better TTS (without generative AI). Way better. I presume those types of videos are now better for me to watch. You may still be able to tell it’s not a human because the tone doesn’t have much variance. You’d probably have to listen for a minute or longer to discern that, though.

Are we calling this AI-generated?

Now with generative AI, we have voices that perhaps you won’t be able to identify as AI. But it’s all good as long as a human wrote the script, right?

Are we calling this AI-generated?

Finally, take the same video. The creator writes the script, but feels he’s not a good writer (or English is not his native tongue, and he likely has lots of grammatical errors). So he passes his script to GPT and asks it to rewrite it – and not just fix grammatical errors but have it improve it, with some end goal in mind (“This will be the script for a popular video…”) He then reviews that the essence he was trying to capture was conveyed, and goes ahead with the voice generation.

Is this AI-generated?

To me, all of these are fine, and not in any way inferior to one with a completely human workflow. As long as the creator is a human, and he feels it is conveying what he needed to convey.

I would love to take a first draft of a blog post, send it to GPT, and have it write it for me. The reason I don’t is that so far, whatever it produces doesn’t have my “voice”. It may capture what I meant to say, but the writing style is completely different from mine. If I could get GPT/Claude to mimic my style more, I’d absolutely run with it. Almost no one likes endless editing – especially writers!

My FAANG-working spouse thinks that AIs and robocallers should be mandated to identify themselves. She thinks an audible “beep-boop” at the end of a sentence for calls and video would be appropriate.

It’s almost impossible now. NotebookLM really impressed me. I knew voice synthesis had gotten better than Stephen Hawking’s “voice”, but I really wasn’t expecting two realistic voices with emotions that even banter with each other. There is a bit of banality to them – they like to call something “a game changer” in practically every “podcast”, and the insight into the material is pretty shallow – but they are probably better than the average podcaster already.

It’s impressive at first until you realise they’re practically ad libbing a script. They’re filled with all the same annoying American clichés (“you know me, I like x”, your aforementioned “a game changer”, plenty of “wow”). It would be impossible to listen to two in a row without realising how repetitive it is.

At Listen Notes, we recently removed over 500 fake podcasts generated by Notebook LM in just the past weekend.

It’s disappointing to see scammers and black-hat SEOs already leveraging Notebook LM to mass-produce fake podcasts and distribute them across various platforms.

After Google made it progressively more difficult to navigate from Image search to the actual image, I wrote an image search tool that can be hot-keyed from your OS. It searches Google’s image index via a custom Search Engine ID and quickly copies the result to the clipboard.

About a year back I found that 90% of the results I was getting were AI generated, so I added a flag “No AI” which basically acts as a quick and dirty filter by limiting results to pre-2022. It’s not perfect but it works as a stopgap measure.

https://github.com/scpedicini/truman-show
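The “No AI” stopgap described above can be sketched against Google’s Programmable Search JSON API, which accepts a date-range `sort` value. Everything here is illustrative: the key/engine-ID values are placeholders, and `build_params` is a hypothetical helper, not the tool’s actual code.

```python
def build_params(query: str, no_ai: bool) -> dict:
    """Build query parameters for a Programmable Search image query."""
    params = {
        "key": "YOUR_API_KEY",      # placeholder
        "cx": "YOUR_ENGINE_ID",     # placeholder
        "q": query,
        "searchType": "image",
    }
    if no_ai:
        # Quick-and-dirty filter: bias results to pages dated before 2022,
        # i.e. before image generators flooded the web.
        params["sort"] = "date:r:19900101:20211231"
    return params

print(build_params("baby peacock", no_ai=True)["sort"])
# -> date:r:19900101:20211231
```

The actual request would go to `https://www.googleapis.com/customsearch/v1` with these parameters; dating by page rather than by image is why the filter is only a stopgap.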

Wouldn’t surprise me if in a few years Google, for certain keywords:

– autogenerates URLs (that look legit)

– autogenerates content for such URLs (that look kinda legit)

All of this would be possible if one is using Chrome (otherwise the fake URLs wouldn’t lead to anywhere). Of course, full of ads.

Think about it: some people are not really looking for a web site that talks about “baby peacocks”. They are looking for baby peacocks: content, images, video. If Google can autogenerate good-enough content, then this kind of user would be satisfied (and may not even notice the difference).

Maybe Google ditches the URL and all: type keywords, and get content (with ads)!

> would be possible if one is using Chrome (otherwise the fake URLs wouldn’t lead to anywhere).

Didn’t they do something like that with AMP? I recall that if you were using Chrome and visited an AMP site from Google, the address bar would say site.com even though the content was being served from google.com.

This phenomenon has been such a spur of motivation to start writing again. I love it.

The only way we can make sure the internet retains any goodness is by contributing good things to it. Passive consumption will rapidly turn into sub-mediocre drudgery. I suppose it already has.

Be the change you want to see, I guess. I’m a shitty writer, but at least I can beat the dissonant, bland, formulaic rambling of ChatGPT (here’s hoping, anyway).

I’m optimistic that a lot of us can keep something good going. We’ll find ways to keep pockets of internet worth visiting, just like we did before search engines worked well.

The irony of Google’s core value proposition (search) being rendered useless by a technology that Google is investing heavily in (AI). It’s a self-licking ice cream cone of suckage.

I’m a part time maker and purchase a lot of designs off of Etsy to make into physical goods. I have to weed through so many AI images when purchasing designs off of Etsy now. I wish they required users to indicate if AI was used to produce the image so I could then filter them out.

It’s currently optional for sellers, Etsy says “This info won’t change what buyers see for now, but will be used to improve the shopping experience in the future.”

Hear me out: is that Wikipedia? I am sure people are submitting all sorts of AI-generated information, but it’s probably getting rejected? (If someone better informed than me has any data one way or the other, I’m super curious)

There was this image circulating some 20 years ago and later, depicting the Internet becoming a cable-TV-like service where you’d be a subscriber to particular big companies’ sites, plus some additional “free-range” pages.

So the pessimist in me can see the Internet being split by the free-vs-premium formula: a “basic” Internet with ads, tracking, AI filler, and limited access to 18+ content (in the worst form, it comes only with those pre-defined sites), and a “premium” tier that’s free of these limitations but in time also tries to squeeze more money from users – like “premium, but with ads”.

I sense Reddit is a lost cause. Even before the latest wave of generative AI you could tell things were heavily manipulated.

I dare say that I haven’t noticed that much of a change, and that could be either because LLMs are just that good at Reddit content, or because Reddit was already so botted and manipulated that it didn’t really change much.

Reddit started to decline dramatically once they started to charge for API access last year. Mods on a politics subreddit I was talking to said all the free tools they used to keep on top of things stopped working, so they could no longer filter out the trolls and bots.

I sure hope the money that Reddit made makes up for the readers who are fleeing.

LLMs are reinforced through adversarial training – you would essentially be playing a keep-up game with AI generated garbage that would get exponentially more difficult to pull ahead in.

There are a number of ways this might get solved, but I would speculate that it will generally be solved by adding image metadata that is signed by a certificate authority, similar to the way SSL certificates are issued for domains.

I think eventually all digital cameras and image scanners will securely hash and sign images just as forensic cameras do to certify that an image was “captured” instead of generated.

Of course this leaves a grey area for image-editing applications such as Photoshop, so there may also need to be some other level of certificate-based signing introduced there as well.
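The capture-and-sign idea above can be sketched roughly as follows. This is a toy illustration, not a real standard (the actual industry effort in this space is C2PA): a real scheme would use asymmetric signatures, with the camera signing using a private key and anyone verifying against a CA-issued certificate. HMAC with a hypothetical device key stands in here only to keep the sketch runnable with the standard library.

```python
import hashlib
import hmac

# Hypothetical per-device secret, provisioned by a certificate authority.
CAMERA_KEY = b"device-secret-provisioned-by-ca"

def sign_capture(image_bytes: bytes) -> str:
    """Hash the image at capture time and sign the digest."""
    digest = hashlib.sha256(image_bytes).digest()
    return hmac.new(CAMERA_KEY, digest, hashlib.sha256).hexdigest()

def verify_capture(image_bytes: bytes, signature: str) -> bool:
    """A verifier with the key can check the image is an untouched capture."""
    expected = sign_capture(image_bytes)
    return hmac.compare_digest(expected, signature)

photo = b"...raw sensor data..."
sig = sign_capture(photo)
print(verify_capture(photo, sig))             # an unmodified capture verifies
print(verify_capture(photo + b"edit", sig))   # any edit breaks the signature
```

The grey area mentioned above shows up immediately: any legitimate edit invalidates the signature, which is why editors would need to re-sign with their own certificates.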

So, I was drawing an eagle for a new imprint, and I needed a reference for good-looking claws. So I used my Google Images search shortcut to get pictures of eagles, and it was almost all AI. Eagle claws suffer from the same problem AI has with human hands, so it’s completely useless.

Yandex images search is flawless though.

This is why Google took down its cached results. It’s going to hoard pre-LLM internet data. Perhaps sell it, but I doubt it.

Our best bet is to have scraped all that data already, and to offer a temporal search parameter, like:

+”Sponge bob” year:2012
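That hypothetical syntax is easy to parse. A quick sketch, with the query grammar invented here rather than taken from any real search engine:

```python
import re

def parse_query(q: str) -> dict:
    """Split a query like '+"Sponge bob" year:2012' into required
    phrases and a cutoff year. Both pieces of syntax are hypothetical."""
    year = None
    m = re.search(r'\byear:(\d{4})\b', q)
    if m:
        year = int(m.group(1))
        q = q.replace(m.group(0), "").strip()
    # Required phrases are prefixed with + and quoted.
    required = re.findall(r'\+"([^"]+)"', q)
    return {"required": required, "year": year}

print(parse_query('+"Sponge bob" year:2012'))
# → {'required': ['Sponge bob'], 'year': 2012}
```

The index would then restrict matches to documents first crawled in or before the given year, which is exactly the pre-LLM snapshot the comment is after.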

Ever since Sora I’ve been thinking about the overall death of the internet “content”. It all came back stronger with Meta Movie Gen.

I know there are no girls on the internet, but this AI crap is on another level. Even if I find a trustworthy creator, I might be seeing a fake video of them. Say I like MKBHD reviews: I will need to pay attention to whether I am really watching his video on his official channel.

My guard will have to be up so much, all the time, I actually don’t think it will even be healthy to “consume content” anymore. Why live a life where almost everything I see can be a lie? Makes me not want to use any of this anymore.

> My guard will have to be up so much, all the time, I actually don’t think it will even be healthy to “consume content” anymore. Why live a life where almost everything I see can be a lie? Makes me not want to use any of this anymore.

While I generally agree with your whole comment, I feel like this part has been true for years on social media well before AI generated content hit the scene.

Yeah, that’s a good point. It’s not that it wasn’t already happening before; it’s the sheer quantity of it now. People have been doctoring images for political gain forever, but that at least took some Photoshop skills. Now anyone can pop out thousands of misinformation photos, articles, and now even videos in a few hours.

Make a search engine that doesn’t include AI results except when I specifically ask for them, or you soon won’t have a search engine business.

A really quick fix is to search with “-ai”, and it’s really strange that Google doesn’t do this implicitly for images.

A lot of comments amount to: “Internet is dead”. AI is crap for sure but far from making Internet dead or useless. Consider:
– emails,
– bills and payments,
– banking,
– searching for and buying stuff (assuming you already know what you want that is),
– calls/chats – WhatsApp, Messenger, etc.,
– YouTube (for learning),
– social stuff – however bad.

AI? This too shall pass. The Internet will find its way.

Isn’t this a self-solving conundrum, really? If Google dies from being completely useless, then no one has an incentive to keep generating clickbait and fake content anymore, do they?

Who ever searched for “baby peacock”? In this search space, is peacock distinguished from peahen? Because peaweewee is potentially not as interesting a search as peacock, and I’m referring to the tail as the Romance languages refer to it.

I noticed recently when searching for images of cities that they’re nearly all over-the-top unrealistic HDR images, beyond what you used to find in a travel agent’s catalogue.

I’m at a juncture in my career where I’m asking what could really motivate me to do anything in tech that I feel is worth doing. In my earlier years I remember using both CompuServe and Prodigy. I’m not sure if it’s just hindsight colored by nostalgia, but I yearn for the feeling I had as a young teenager when I could explore a quirky and curated world of information.

I’m starting to think that all this AI stuff has finally pushed the ads-based Internet past its tipping point.

I feel I could be motivated to work on a walled garden with moderation paid for by subscription fees. What would it be worth to you to have an entirely new online experience free of all the enshittification of the past 15 years?

Personally, I pay for Kagi just to have a small taste of what that could be like. But what if not just the search engine, but all the sites, were funded entirely by a subscription fee paid to the service provider? What if privacy could be a foremost feature of that world? What if advertising and astroturfing were strictly forbidden, and human authors had to be vetted by other humans to be allowed a place in this world? “This content is Certified Ads- and AI-Free(tm).”

I really don’t know how well something like that would turn out in 2024, but I feel I wouldn’t be alone in wanting to give it a try.

We could also have a public library but for the Internet. A list of sites and articles curated and maintained by librarians and experts and paid for by local taxes.

Facebook is flooded with this! Fake photos of poor people asking for help, and you see thousands of likes and people commenting about how they can help.

This might not be insightful, but I think we need to adapt.

Search and the Internet as we knew them are dead, or will be. There is no going back with AI. We must learn how to deal with it. You too should rethink how you approach the Internet, how you surf it.

If search is dead, are there any solutions? I use more RSS sources now, because that is human-created content. I rely more on word of mouth.

Actual baby peacocks are almost indistinguishable from guinea fowl hatchlings, and there’s a strong resemblance to baby chickens or turkeys.

Humans: “Hey this is bad”

Tech: “Gosh we better tune our algos so these images are even MORE indistinguishable from the real thing”

Evidently the road to hell is paved with novelty image generators.

The internet needs strong provenance to ensure content is created by trusted parties.

It has to be done in a decentralised way to ensure no enterprise controls who is trusted and who isn’t.

Well, on the “bright” side, all but 2 of the crossed-out ones are either explicitly AI-generated art (the 3 Adobe Stock ones, 2 from Freepik, and the 1 from Instagram) or about noting that the images aren’t real (the 2 Snopes ones and the 1 in the bottom left calling out the feet).

On the sad side, the TikTok and YouTube ones that likely led to all of this aren’t labeled and are still present. Not to mention the complete lack of an “I want the AI things automatically filtered; I’m not interested in trends, I’m searching for actual things right now” button. Without something like that, it will become harder to use Google to find new content.

I mean, people obviously like the content; it’s cute enough to get shared around so much that it made itself popular in these images and triggered the post on X about it. Nothing wrong with that… but if it’s not easily separable from what the user is actually trying to find, then Google has somewhat failed at its goal.

I’ve started thinking more and more about a short throwaway conversation in Anathem about how the internet in their world is absolutely ruined by AI and the only solution they have left is a user driven reputation system for entities and how one of the characters just earned a lot of “reputons” for recording an event.

Mostly I think about how something like that is going to be signed into law by some state and it’ll require everything you do to be linked to your government issued ID card so they can “prove” you’re not spreading AI misinformation and all the horrendous unintended side effects that will spread from there.

“Anyone can post information on any topic. The vast majority of what’s on the Reticulum is, therefore, crap. It has to be filtered. The filtering systems are ancient. My people have been improving them, and their interfaces, since the time of the Reconstitution.”

“Asynchronous, symmetrically anonymized, moderated open-cry repute auction. Don’t even bother trying to parse that. The acronym is pre-Reconstitution. There hasn’t been a true asamocra for 3600 years. Instead we do other things that serve the same purpose and we call them by the old name. In most cases, it takes a few days for a provably irreversible phase transition to occur in the reputon glass – never mind – and another day after that to make sure you aren’t just being spoofed by ephemeral stochastic nucleation.”

Fantastic book. I read it twice so far, highly recommended. So many little off-handed conceptual gems everywhere.

A bit tangential, but it’s interesting to see comments like “we should start hosting our own websites”. We were discussing this with my friends, and it seems like there has been a significant change in what is considered “cool” in terms of social validation. I understand that I’m dumbing it down right now, and it’s not just AI that contributes to it. AI is definitely accelerating the feeling, though.

In early 2010s when Instagram, Twitter, Facebook started getting big, all the websites and apps had this process of discovery that you had to go through to make it fun for yourself. It obviously turned some people off of it, and made the onboarding a bit harder, but you needed to follow some people, send some friend requests, and in the end you would mostly see things you’ve actively wanted to see. Even when the algorithms started sorting the timelines, it would still be (mostly) within the things you’ve chosen to see. Even Youtube’s recommendation algorithm was pretty simple, and it would suggest extremely similar videos.

I think it changed around 2016, when the algorithms started trying to determine what you like based on your interaction with other things, rather than your explicit action of saying “I want stuff from this person/channel/etc.”. I’m sure a significant chunk of us have worked on similar algorithms, so you get the gist of it. But this change resulted in users getting attention from the global audience (because in order for the algo to detect what you like, it has to throw in suggestions from everywhere).

I get that forums have existed for decades, and people have been earning Reddit karma since the 2000s, but it was still a more deliberate action when you wanted to see something. TikTok, YouTube, and Instagram changed the entire playing field in the last 6 years or so, where your real-life “social score” no longer had to depend on whom you knew in the real world. It translates into this: you can generate posts, content, whatever you want to call it, for everyone, rather than actively earning the attention of people you know. Like, going viral on YouTube was a big thing at some point. There’s an ongoing meme in comments saying “you would be invited to Ellen’s show in 2010”, which is kind of true, because breaking out of the “only seen by people who know you” box was extremely rare.

Well, now everyone technically has a chance, which incentivizes people to constantly push out content. It doesn’t matter whether you’re doing it just for social media clout, or for financial motives, etc. It’s just possible for something to go “big”, albeit with minuscule benefits from it. So there’s a constant churn of… content. And now AI is making it even simpler to create such content, but again, resulting in an even further decrease in the social importance of such pictures/videos/texts.

I understand there’s always a group of people that “write/create/paint for themselves”, and I’m in a similar boat. But if the majority of creators have different incentives, the platforms will cater to them. And in this case, the platform is the whole Internet, and the incentives are financial and attention-seeking on a global scale. Right now, it takes about a minute to create a video and post it on any of the websites, which was basically impossible back in the day. That barrier to entry, combined with deliberate discovery, is what, I think, made the internet more fun.

I’m not touching the subject of ad infestation in every corner, though it definitely accelerated the downward spiral of the average quality of content. But in the end, I blame ourselves for choosing this path, because we could’ve put pressure on the global algorithms of YouTube, TikTok, etc. We chose not to do so, because, well, they still give us dopamine hits.

Images were already pretty devalued because of how good phone cameras are and because every person has one. Now this will just give it a little more kick.

A “baby peacock” is not a thing, so I honestly don’t see the search quality issue here. The text “baby peacock” is associated with these fabricated images.

Have you ever encountered the extremely large contingent of HN commenters who claim to prefer that Google interpret their search literally, exactly, and at face value? Wouldn’t they be howling mad if Google silently adjusted the core concept of your search from “baby peacock” to “peafowl chick”?

In any case the web and Google’s index of it is crowdsourced. If the web associates this image and that phrase, what are they supposed to do about it?

But if I told someone that I had baby peacocks on my farm, they wouldn’t look at me bewildered and wonder what kind of animal I’m talking about. If they know what a peacock is, then they know what a baby peacock is, whether I’m using the correct word or not. The same is true if I say “baby cow” instead of calf, or “baby horse” instead of foal. You and I can picture exactly what those animals look like in our heads, and it seems AI can too.

I’ve had a mild amount of success asking some nature photographers if they would be willing to make a few of their photos freely licensed so that they can be used on Wikipedia articles.

Wikimedia is a fantastic resource.

Creative Commons offers a portal if you wish to cast a wider net (music, video, 3D models)

It also includes Google Images and Flickr.

https://search.creativecommons.org/

I found the peachicks on Commons by searching “peacock” and then following categories up the tree. If people use the wrong search engine with naïve search terms, I don’t know what to tell ya.

This is a parallel example of why reference librarians are still worth consulting, because they will guide you to the library’s resources and databases, and demonstrate how to use search queries.

It sounds a bit unfair to use content from a site without crediting the source in an obvious way. I’m sure this shameless content hijacking can’t continue, as in the end there will not be any source left to query. Robots.txt should allow meta-tags like “block all AI bots” (or these AI companies should pay their sources).

I am reminded of the decline of MySpace. It was just thousands of bots posing as users, posting ads on people’s pages for e-books, pharmaceuticals etc. The bots remained talking to each other long after the last humans left.

I was searching google images for “cat professor” recently.

Same; as far as I could tell it was all AI garbage with weird saturation and colors and uncanny valley… it looked weird / didn’t work for me.

I was hoping for a picture of a real cat. There’s a different look that real photos have. The AI photos all look like computer polished weirdness.

I think this is kind of key to the issue: the good content is there if you know how to find it. But if you don’t know the right terminology, then you are going to search for “baby peacock” and get bad results.

Well, that makes me wonder if the search isn’t flawed primarily because it is an image search. I try to stress to my children how important it is to prefer reading material over “watching material”. And while these are stills (“photographs” seems inappropriate), the fact that they are images means the search can’t possibly help you self-correct. Google has no opportunity to show you the correct terminology within the results, and you don’t learn enough to then go out and find the images you were hoping for.

I know there are exceptions. There are answers I’ve wanted that can be found within the first few minutes of the first video on Youtube, which I’ve gone days without discovering because I’m video-averse. But I suspect that the habit is, on average, more benefit than detriment.

I stopped using Google search… why even bother now? Results are just some crappy page with ads. The astroturfed Wikipedia page is also suspect. ChatGPT can answer questions in seconds. I’m just not sure if it’s correct, but most of the time it’s more than good enough.
I feel like Google is destroying its credibility day by day.
Just go to a zoo to see peacocks and take pictures. At least it will be a real experience, not some virtual manipulation.

I suppose the logic could be: if you’re going to consume AI generated content anyway, why not use a setup where you have control over the system prompt and other parameters? Not sure if ChatGPT qualifies there, though.

My solution is to use the computer as little as possible. Go see the world to know what a peacock looks like. The last time I saw peacocks was in Lisbon, in St. George’s Castle, 4 years ago. The kinds of questions I ask ChatGPT are mostly code questions, or for it to help me with planning something. I ask it questions, and it can provide some sort of logic behind the answer which I can then reason about. Sure, it can mislead me, but it’s more like an ongoing conversation I’m having with it. So it’s more like an opinion I’m getting, and I discount it accordingly.
I’m generally a skeptical person, so I’m well aware of the manipulations happening online. Google is just a weaponized player in misinformation warfare at this point. It will purposely go out of its way to build consensus for conflicts. A bunch of technocratic billionaire overlords would get you to support genocide if it would benefit them. So I just don’t trust Google for anything news-related at this point. And the rest of their content seems to be just a giant trap of spam pages.

Uh? This seems totally normal: a few clear AI images here and there, but all those seem legi…

And then I remembered that I was on duckduckgo.

I wonder: do all the HNers who are excited about their GenAI product or wrapper or startup understand, at a fundamental level, that they are an intrinsic part of this deterioration?

Or is this one of those fundamental attribution error things:

– MY product is a powerful tool for creators who wish to save time

– THEIR product is just a poorly-thought-out slop generator

Does it occur to people to instead be part of something real and visceral, and not just blame social media’s ad-driven impression model, not pretend they are only part of a trend for which they can’t be totally blamed?

You have only had google image search for what, 20-years? Why do you think it is a fundamental part of humanity’s growth story?

You talk about being a part of something “real and visceral” but you’re complaining about the demise of being able to sit at your desk and see pictures of wildlife. Maybe it’s okay that google image search dies and makes people go out and find the wildlife they want to see.

The internet, even in its best format (e.g. ad-free, free access information for all; and communication with all of humanity) has a ton of real downsides. It’s not clear to me that AI should be strangled in its infancy to save the internet (which does _not_ exist in that “best” format).

>Maybe it’s okay that google image search dies and makes people go out and find the wildlife they want to see.

I don’t think that is what will happen if google images dies.

haha, no definitely not! The internet is mostly not “real and visceral” so losing parts of it to AI-generated nonsense IS a loss, just not a loss of the actual underlying thing (in this case: baby peacocks).

Unfortunately your comment is doing the same thing, just at a different level—something like this:

I am a thoughtful technologist, building real things for real people, concerned about others and the social impact of my work;

they are greedy and ignorant, destroying society for short-term personal gain, no matter what the consequences.

It’s human nature to put badness on an abstract them, but we don’t get anywhere that way. It’s good for getting agreement (e.g. upvotes), because we all put ourselves in that sweet I bucket and participate in the down-with-them feeling. But it only leads to more of what everyone decries.

First off, no, it did absolutely not do the same thing. It was a polemic question, sure, but it was a specific criticism of a technology and its proponents.

I did not make any claims about myself at all, until I was separately accused of being something or other by someone projecting onto me whatever it was they needed to feel better about themselves.

Second, you have rate-limited me with the “posting too fast” thing so I couldn’t reply to your comment or the other ad hominem, even though I was posting at a rate no faster than in the discussions about OpenSCAD and FreeCAD I had been involved with earlier (considerably slower, I would say).

It’s IMO really classless to use your administrative privileges to silence people after you accuse them of something but before they can respond, but I am not surprised to see that.

I will repeat again: I think it is really clear to me, and really to everyone I have met outside this bubble, that there is no fine distinction to be drawn between content-generating AI projects that are “good” and those that are contributing to “slop”. It’s all slop generation; e.g. NotebookLM is no better or cleverer than Midjourney.

Every tool HNers are excited about is going to be used to make the world’s culture, and the web, worse.

I’d encourage you and those reading to consider this.

Sure, you can’t make much of a change by yourself. But you don’t have to be part of what amounts to inflicting automated cultural vandalism on an unprecedented scale.

Goodbye.

Sure but doesn’t every technological development have these tradeoffs?

You could say what you say about anyone at any time. Where do you draw the line? I guarantee you’ll be guilty of the exact same thing. I don’t want to generalize, but IMO I hear this sentiment of yours most loudly from software engineers far removed from ordinary non-technical end users: is making beautiful new LISPs and CNIs and Python package-auditing tools the only valid work with seemingly no tradeoffs?

> I hear most loudly from software engineers far removed from ordinary non-technical end users

I am absolutely not far removed from non-technical end users. They are my client base, ultimately. As a freelancer I focus on building real things that make things better for people whose faces and voices I get to know. GenAI will be useless to them, because it is antithetical to what they do.

And that focus is only getting keener; I want nothing to do with the AI-generated web.

> They are my client base, ultimately… I focus on building real things that make things better for people… faces and voices I get to know.

So what I’m hearing is, “I agree very strongly with the people who pay me.” Or to put it in your words:

“MY product is a powerful tool for creators who wish to save time.”

“THEIR product is just a poorly-thought-out slop generator”

Every technical advancement has tradeoffs. Not every technical advancement has billions of dollars sloshing around doing absolutely nothing except making the web worse and further ruining the environment. What a shockingly bad-faith way to interpret GP’s argument, wow.

The comment is an interesting but very cookie-cutter sort of vamp and drama. It trades in a bunch of generalization, much like yours, and, you know, generalization doesn’t feel good when it directly attacks you.

I don’t sincerely believe that people who are working on Kubernetes features or observability tools are bad people. Do high drama personalities who engage in a mode of discourse of “wow” and “shockingly” say valid things too? Yeah. But it’s as simple as, log in your own eye before you worry about the thorns in others. Exceptionally ironic because the poster is vamping about “Attribution errors.” Another POV is, shysters project.

There’s a sort of “technological fundamental attribution error” that comes into play a lot with new technologies. Every past technology has, whatever its benefits to humanity, become substantially tarnished by abuse and malicious use. But this one won’t be! Promise!

That said, I don’t really think this is a tide any individual market actor can reasonably stem. It’s going to require some pretty fundamental changes in the way we use the internet.

Are you saying AI isn’t useful? My product is painstakingly crafted and uses AI but in my opinion it uses it tastefully and with great utility. Also 95%+ of my development efforts are not on improving the AI even though I use a .ai TLD. I think it’s crazy for a modern company/product _not_ to use AI, and the grifters building clear wrappers for GPT and other insanely low-quality efforts are already pretty much dead.

> Are you saying AI isn’t useful? My product is painstakingly crafted and uses AI but in my opinion it uses it tastefully and with great utility.

Sure. And THEIR products are just thoughtless slop generators.

I propose a new rule.
“Please respond to the actual actions and the consequences of those actions, not to what is said in a statement to generate positive PR. Assume that putting one’s money where one’s mouth is is harder than simply blowing hot air about creating a private, ethical platform.”

Sick and tired of giving parasites benefit of the doubt they’ve long sucked dry.

It seems like the image results for “baby peacock” are returning articles talking about AI-generated photos of baby peacocks due to some recent trend involving an AI-generated baby peacock image.

Have people tried searching for other animals? Maybe this isn’t a case of Google being inundated with AI-generated photos, but just something to do with the results for this particular phrase.

In the near future, certain talking points we wish to discuss won’t be allowed by the downvote/flagging mafia, so we’ll link to Reddit instead while proclaiming how HN is so much better than those plebs over there.

All AI image and video generators must be forced to add metadata and watermarks, and all uploading technology (browsers, iPhone & Android SDKs, websites, apps, etc.) needs to publish/label content as AI-generated or not. Then search engines worth their salt can filter out the AI crap, and boom, we are back to how the Internet was; or, if you want to see the fake crap, change the filter.

> All AI image and video generators must be forced to add metadata and watermarks and all uploading technology

This is already impossible because it’s impossible to enforce. You can’t stop something running on a random laptop, and you can’t stop models running on server farms in, say, North Korea.

Those operating for profit can be forced to add watermarks/metadata, and uploading tech (Google Chrome, Apple, Google Android, Firefox – all that the public uses now) can be forced too.

If it can’t verify the source, it could label it suspect :-). Just thinking here… got any other ideas? Or are we just going to let the Internet die at the hands of AI, as Neil deGrasse Tyson predicts (https://www.youtube.com/watch?v=SAuDmBYwLq4)? Or are you just gonna downvote someone who tries to come up with solutions?
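For what it’s worth, the filtering step this subthread imagines is trivial once a trustworthy label exists; enforcement is the hard part. A sketch, with an entirely hypothetical `ai_generated` metadata field (real provenance metadata, such as C2PA manifests, is structured quite differently):

```python
def is_ai_labeled(metadata: dict) -> bool:
    """True if the (hypothetical) metadata marks the image as AI-generated."""
    return bool(metadata.get("ai_generated", False))

def filter_results(results, show_ai=False):
    """Drop AI-labeled images unless the user flips the filter."""
    return [r for r in results if show_ai or not is_ai_labeled(r["metadata"])]

results = [
    {"url": "real-peacock.jpg", "metadata": {"ai_generated": False}},
    {"url": "slop-peacock.png", "metadata": {"ai_generated": True}},
    # Unlabeled uploads pass through here; a stricter engine could
    # instead mark them "suspect", as suggested above.
    {"url": "unlabeled.jpg", "metadata": {}},
]
print([r["url"] for r in filter_results(results)])
# → ['real-peacock.jpg', 'unlabeled.jpg']
```

The unlabeled case is exactly where the scheme leaks: without mandatory labeling at upload time, the filter only removes honest actors’ output.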

Well, nearly all of the Google Images results for “woman” show a woman with makeup, and additionally the photos were altered via Photoshop.

We have been creating our own reality even before AI.

Creating a version of reality is significantly different from conjuring abject falsehoods. There is an objective reality for what (e.g.) a baby peacock should look like, and this AI slop is inherently misleading about that.

His comparison isn’t totally off. At some point our global perception of a certain subject might be totally different from what it is in reality, just because all images of X are the optimal, AI-improved, photoshopped version. This is in fact what women mean when they say that beauty standards are becoming unrealistic: quite literally, the standard image of women is being altered. Kind of similar to how the standard image of a baby peacock is being altered.

I’m actually a bit excited by this problem, believe it or not.

Like, what solutions are we going to come up with to solve it? Is the human side of the internet (however we create it) going to become more pure? Perhaps in discovering ways to avoid low-quality AI content, we’ll also find ways to escape from destructive recommender systems and monetized advertisements. Strange as it sounds, solving this problem could lead us to a much brighter future!
