AI Bots

02023-04-20 | Computer, Internet

RANK – DOMAIN – TOKENS – PERCENT OF ALL TOKENS

768,560 – ottmarliebert.com – 30k – 0.00002%

See the websites that make AI bots like ChatGPT sound so smart – Washington Post

It would be one thing if we were building something together, some kind of open-source chatbot, but instead this is all fodder for a proprietary, corporate machine that costs money to use.

The three biggest sites were patents.google.com No. 1, which contains text from patents issued around the world; wikipedia.org No. 2, the free online encyclopedia; and scribd.com No. 3, a subscription-only digital library. Also high on the list: b-ok.org No. 190, a notorious market for pirated e-books that has since been seized by the U.S. Justice Department. At least 27 other sites identified by the U.S. government as markets for piracy and counterfeits were present in the data set.

See the websites that make AI bots like ChatGPT sound so smart – Washington Post

I get that Wikipedia would rank highly, since it is a free online encyclopedia, but how did the people who assembled the data set gain access to the subscription-only digital library, ranked third? Did someone pay for an account and then use it to scrape the entire website? And a market for pirated e-books, since seized by the U.S. Justice Department?!?!

Interesting times!
