Good afternoon, and welcome to the Intelligency Digital Roundup, bringing you the latest trends and insights in digital marketing.
In this week’s news: Learn about OpenAI’s new web crawler, GPTBot. Also, see what’s changed with TikTok’s ad campaigns, and the best way to prune content on your website.
Let’s get right into it.
All About GPTBot from OpenAI
This week, OpenAI launched GPTBot, a new web crawler designed to help improve future AI models such as GPT-4 and GPT-5.
How it works
GPTBot crawls the web to gather data that can improve the accuracy and capabilities of future AI models. It identifies itself with the following user agent token and string:
User agent token: GPTBot
Full user-agent string: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.0; +https://openai.com/gptbot)
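As a quick sketch of how this works in practice, server-side code can recognise a GPTBot visit by checking for the "GPTBot" token in a request's User-Agent header. The helper function below is a minimal illustration (the function name is our own, not an OpenAI API):

```python
# Minimal sketch: identify GPTBot requests by the user agent token.
# The token "GPTBot" appears inside the full user-agent string above.
def is_gptbot_request(user_agent: str) -> bool:
    """Return True if the request's User-Agent contains the GPTBot token."""
    return "GPTBot" in user_agent

gptbot_ua = ("Mozilla/5.0 AppleWebKit/537.36 "
             "(KHTML, like Gecko; compatible; GPTBot/1.0; +https://openai.com/gptbot)")
print(is_gptbot_request(gptbot_ua))                        # True
print(is_gptbot_request("Mozilla/5.0 (Windows NT 10.0)"))  # False
```

Note that user agents can be spoofed, which is why OpenAI also publishes IP ranges (more on that below the robots.txt examples).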
OpenAI has stated that the bot will automatically filter out paywall-restricted sources, any sources with personal information, and any sources which violate OpenAI policies. However, this technology provides businesses with the opportunity to improve the AI ecosystem and future language models by giving it access to your site.
It’s important to note that you’re able to restrict access to your site if you don’t want to give GPTBot access.
Site owners can restrict access by placing the following in the site’s robots.txt file:
User-agent: GPTBot
Disallow: /
On the other hand, if you want to allow it some access, you can insert this into the robots.txt file:
User-agent: GPTBot
Allow: /directory-1/
Disallow: /directory-2/
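If you want to double-check how rules like these will be interpreted before deploying them, Python's standard-library robots.txt parser can evaluate them locally. This is just a verification sketch; the directory names are the placeholders from the example above:

```python
from urllib.robotparser import RobotFileParser

# The example rules from above, with placeholder directory names.
rules = """
User-agent: GPTBot
Allow: /directory-1/
Disallow: /directory-2/
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# Allowed: matches the Allow rule for /directory-1/
print(parser.can_fetch("GPTBot", "https://example.com/directory-1/page"))  # True
# Blocked: matches the Disallow rule for /directory-2/
print(parser.can_fetch("GPTBot", "https://example.com/directory-2/page"))  # False
```

This is a handy sanity check, since a typo in robots.txt can silently block (or admit) far more than intended.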
You can also verify whether GPTBot has crawled your website, as OpenAI has published the crawler’s IP address ranges on its site to help provide transparency.
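As a sketch of how that verification might look, the snippet below checks whether a visitor's IP address falls within a published range. The CIDR ranges here are placeholders (reserved documentation addresses), not OpenAI's real ranges; substitute the current list from OpenAI's site:

```python
import ipaddress

# Placeholder ranges (reserved TEST-NET blocks) standing in for the
# real list published by OpenAI at https://openai.com/gptbot.
GPTBOT_RANGES = ["192.0.2.0/24", "198.51.100.0/24"]

def is_gptbot_ip(ip: str) -> bool:
    """Return True if the visitor IP falls in any listed GPTBot range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in GPTBOT_RANGES)

print(is_gptbot_ip("192.0.2.45"))   # True  (inside the first placeholder range)
print(is_gptbot_ip("203.0.113.9"))  # False (outside both placeholder ranges)
```

Checking the source IP against these ranges is more reliable than trusting the user-agent string alone, since anyone can send a fake user agent.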
Legality and Ethics
This latest tech launch has sparked discussions on a forum called Hacker News. The discussions involve the legality and ethics of a company scraping web data in order to train AI systems.
Some users argue that because access can be disallowed via robots.txt, there aren’t any ethical concerns. Other users argue that there’s no benefit to allowing it access to your site because, unlike Googlebot, it doesn’t drive traffic to it.
Another concern is how it will handle copyrighted and licensed products because ChatGPT doesn’t cite sources when giving an answer to a prompt. Additionally, if licensed media such as images or music is used to train AI, it could be copyright infringement.
Other users argue that anything on the public web is fair game, comparing it to a person learning from online content.
Overall, GPTBot has opened up many discussions on fair use and ownership when it comes to AI. Controlling access via robots.txt is a good first step.
TikTok Ad Changes
In other news, TikTok has made some major changes to its ads this week in order to meet EU regulations. Targeted advertising capabilities will be reduced, and any ads which violate content guidelines will be removed.
Also, users will see expanded reporting options, and if a creator faces a content moderation decision, they’ll be notified.
Here’s the full list of changes:
TikTok’s ad and platform changes
The full list of changes on TikTok includes:
- Expanded reporting – When reporting content, EU users can now report it as illegal, for example as hate speech, a form of harassment, or a financial crime, scam, or pyramid scheme.
- Global bans – If a piece of content is found to violate TikTok’s content policies, it will be removed from the platform globally.
- Targeted ads – Brands can no longer create targeted ads for EU users aged 13 to 17. Those users will no longer see personalised ads based on their activities on and off the platform.
- Personalisation – EU users can now turn off personalisation, which means their “For You” and “Live” feeds will show popular global content rather than content popular near their location.
- More transparency – TikTok has promised to be more transparent in its decisions regarding content moderation.
Here’s what TikTok has said regarding the changes:
“The European Union has set a clear vision for platform regulation with the Digital Services Act (DSA). Following our updates in July about our Research API and Commercial Content Library, we are providing more information about the work that we are doing to meet our obligations under the Act by the August 28 deadline.” “Our mission is to inspire creativity and bring joy. We know that ensuring the safety, privacy, and security of our European community is critical to achieving that goal.”
These changes are somewhat similar to the ones Meta is reportedly considering for Facebook. Will more companies follow suit? Only time will tell.
Google’s advice on content pruning
This week, Google gave some good advice on content pruning, following an exposé from Gizmodo regarding CNET deleting thousands of articles, allegedly in order to “game Google Search”.
CNET did confirm the culling of content, and while Gizmodo said it was in the thousands, CNET did not comment on the exact number. The company “redirected, repurposed, or removed” content by analysing the following metrics:
- Backlink profiles
- Time since the content was last updated
Content deprecation, or removal, tells Google that “CNET is fresh, relevant, and worthy of high page rankings” according to a CNET internal memo.
CNET is incorrect about this. Deleting content doesn’t signal any of the above to Google. If you want to show that your site is fresh, relevant, and worthy of high page rankings, publish helpful, high-quality content rather than deleting existing content.
Showing even more of a lack of SEO awareness, Taylor Canada, CNET’s director of marketing, stated: “Unfortunately, we are penalized by the modern internet for leaving all previously published content live on our site.” This isn’t how SEO works; Google doesn’t punish a site simply for keeping old articles live.
Google’s guidance doesn’t state this; if anything, old content can still be helpful to users. Danny Sullivan, who runs Google’s Search Liaison X account, had this to say when asked about old content that has broken links or is no longer relevant:
“The page itself isn’t likely to rank well. Removing it might mean if you have a massive site that we’re better able to crawl other content on the site. But it doesn’t mean we go ‘oh, now the whole site is so much better’ because of what happens with an individual page.”
What Google said about content pruning
Google once said in 2011 that low-quality content removal could help rankings:
“In addition, it’s important for webmasters to know that low quality content on part of a site can impact a site’s ranking as a whole. For this reason, if you believe you’ve been impacted by this change you should evaluate all the content on your site and do your best to improve the overall quality of the pages on your domain. Removing low quality pages or moving them to a different domain could help your rankings for the higher quality content.”
However, Danny Sullivan argues that Google never outright said to delete content just because it’s old.
Google’s current advice from experts such as John Mueller is to repurpose content by improving it rather than removing it, where possible. Intelligency will always argue that this is better for SEO too: improving and repurposing old content, rather than deleting it, raises the overall quality of your site’s content.
As always, thanks for reading this week’s digital roundup!