mirror of https://forge.fsky.io/wl/pages.git, synced 2025-04-19 17:03:41 -05:00

Compare commits: 2 commits, a975c41bdb ... 4121023984
Author | SHA1 | Date
---|---|---
 | 4121023984 |
 | 87c49f4976 |
8 changed files with 230 additions and 5 deletions
31	_zola/content/2025-04-16-anubis-is-a-joke.md	Normal file

@@ -0,0 +1,31 @@
+++
title = "Anubis is a joke"
date = 2025-04-16
description = "an easily bypassable one, and not actually protecting your site (against anything other than really low effort scrapes)"
+++

Over the past few months, a lot of people have turned to Anubis by Xe Iaso to try to protect
their sites, primarily Git forges and alternative frontends, against AI scraping.

Anubis is a new PoW captcha "solution" that (allegedly) holds out scrapers by slowing down your
browsing and forcing you to enable JavaScript to pass a challenge before you can view the site.
Once it's wasted a few seconds of your time and made you reevaluate the worth of whatever you were
visiting, the stupid anime girl (previously AI generated) it shows you gives a smile and you're on
your way. The challenge will only work on Chromium and its Google-funded controlled opposition,
Firefox. Basilisk does seem to work, though with broken CSS. It doesn't even work on Safari
(allegedly; I don't own an iToy to test this with), and no other browser (until you read the next
section) gets through it.

There's one small problem with Anubis though. By default (and no installation I've checked changes
this), Anubis will only present a challenge to User-Agents containing "Mozilla" and some obvious
scraper agents, at the time of writing this. You can check this in /data/botPolicies.json.

This means that all one of those evil scrapers Anubis is supposed to protect against has to do to
bypass Anubis is not use one of those User-Agents. It also means that you too can completely
bypass it, since I know it's been annoying a lot of people lately. You can curl a site using the
default config (most of them), and it won't give an Anubis challenge; it'll just show you the site
in its original form. No special options, no custom User-Agent, just `curl http://domain.name` and
it'll let you through. This applies to your normal browser as well: just give it a user agent that
doesn't contain "Mozilla" or any of the other terms in the file and you won't have any problems.

I was expecting a much more involved workaround for dealing with this piece of shit, but no, all
you have to do is give it a UA not containing some keywords.
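The default-policy behaviour the post describes boils down to a substring check on the User-Agent header. A minimal sketch of that idea (this illustrates the concept only — it is not Anubis's actual matching code, and the real default rules in /data/botPolicies.json also list some scraper agents omitted here):

```python
# Sketch of the default-policy idea, assuming a plain substring match.
# The real policy file lists "Mozilla" plus some known scraper agents.
CHALLENGED_SUBSTRINGS = ["Mozilla"]

def would_be_challenged(user_agent: str) -> bool:
    """Return True if this User-Agent would be served the PoW challenge."""
    return any(needle in user_agent for needle in CHALLENGED_SUBSTRINGS)

# Every mainstream browser UA starts with "Mozilla/5.0", so real visitors pay the cost:
print(would_be_challenged("Mozilla/5.0 (X11; Linux x86_64; rv:126.0) Gecko/20100101 Firefox/126.0"))  # True

# curl's default UA contains no "Mozilla", so a plain curl request sails through:
print(would_be_challenged("curl/8.7.1"))  # False

# And a scraper that picks any unlisted UA string bypasses the filter entirely:
print(would_be_challenged("definitely-not-a-scraper/1.0"))  # False
```

In other words, the filter punishes exactly the clients that identify themselves honestly, which is the asymmetry the post complains about.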
10	_zola/templates/404.html	Normal file

@@ -0,0 +1,10 @@
{% extends "index.html" %}

{% block title %}
<title>wanderlost - 404</title>
{% endblock title %}

{% block content %}
<h1>it's OVER</h1>
<p>page not found</p>
{% endblock content %}
66	blog/2025-04-16-anubis-is-a-joke/index.xhtml	Normal file

@@ -0,0 +1,66 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
  <meta http-equiv="X-UA-Compatible" content="IE=edge" />
  <meta http-equiv="content-type" content="text/html; charset=utf-8" />
  <meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1, viewport-fit=cover" />

  <title>wanderlost - Anubis is a joke</title>

  <link rel="stylesheet" href="/assets/css/main.css" />

  <link rel="alternate"
        type="application/rss+xml"
        title="Atom"
        href="/blog/atom.xml" />
  <link rel="alternate"
        type="application/rss+xml"
        title="RSS"
        href="/blog/rss.xml" />
</head>
<body>
  <div class="navbar">
    <h1 class="title"><a href="/">wanderlost</a></h1>
    <a href="/blog/">index</a>
    <a href="/blog/atom.xml">atom</a>
    <a href="/blog/rss.xml">rss</a>
  </div>
  <div class="main">
    <hr />

    <div class="post">
      <h1 class="post-title">Anubis is a joke</h1>
      <h2 class="post-date">2025-04-16</h2>
      <!-- if Zola just generated compliant XHTML on its own that would be great, but looks like this will have to do -->
      <p>Over the past few months, a lot of people have turned to Anubis by Xe Iaso to try to protect
      their sites, primarily Git forges and alternative frontends, against AI scraping.</p>
      <p>Anubis is a new PoW captcha "solution" that (allegedly) holds out scrapers by slowing down your
      browsing and forcing you to enable JavaScript to pass a challenge before you can view the site.
      Once it's wasted a few seconds of your time and made you reevaluate the worth of whatever you were
      visiting, the stupid anime girl (previously AI generated) it shows you gives a smile and you're on
      your way. The challenge will only work on Chromium and its Google-funded controlled opposition,
      Firefox. Basilisk does seem to work, though with broken CSS. It doesn't even work on Safari
      (allegedly; I don't own an iToy to test this with), and no other browser (until you read the next
      section) gets through it.</p>
      <p>There's one small problem with Anubis though. By default (and no installation I've checked changes
      this), Anubis will only present a challenge to User-Agents containing "Mozilla" and some obvious
      scraper agents, at the time of writing this. You can check this in /data/botPolicies.json.</p>
      <p>This means that all one of those evil scrapers Anubis is supposed to protect against has to do to
      bypass Anubis is not use one of those User-Agents. It also means that you too can completely
      bypass it, since I know it's been annoying a lot of people lately. You can curl a site using the
      default config (most of them), and it won't give an Anubis challenge; it'll just show you the site
      in its original form. No special options, no custom User-Agent, just curl http://domain.name and
      it'll let you through. This applies to your normal browser as well: just give it a user agent that
      doesn't contain "Mozilla" or any of the other terms in the file and you won't have any problems.</p>
      <p>I was expecting a much more involved workaround for dealing with this piece of shit, but no, all
      you have to do is give it a UA not containing some keywords.</p>
    </div>

  </div>
</body>
</html>
@@ -1,3 +1,40 @@
<!doctype html>
<title>404 Not Found</title>
<h1>404 Not Found</h1>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
  <meta http-equiv="X-UA-Compatible" content="IE=edge" />
  <meta http-equiv="content-type" content="text/html; charset=utf-8" />
  <meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1, viewport-fit=cover" />

  <title>wanderlost - 404</title>

  <link rel="stylesheet" href="/assets/css/main.css" />

  <link rel="alternate"
        type="application/rss+xml"
        title="Atom"
        href="/blog/atom.xml" />
  <link rel="alternate"
        type="application/rss+xml"
        title="RSS"
        href="/blog/rss.xml" />
</head>
<body>
  <div class="navbar">
    <h1 class="title"><a href="/">wanderlost</a></h1>
    <a href="/blog/">index</a>
    <a href="/blog/atom.xml">atom</a>
    <a href="/blog/rss.xml">rss</a>
  </div>
  <div class="main">
    <hr />

    <h1>it's OVER</h1>
    <p>page not found</p>

  </div>
</body>
</html>
@@ -4,8 +4,48 @@
<link rel="self" type="application/atom+xml" href="/blog/atom.xml"/>
<link rel="alternate" type="text/html" href="/blog"/>
<generator uri="https://www.getzola.org/">Zola</generator>
<updated>2025-04-13T00:00:00+00:00</updated>
<updated>2025-04-16T00:00:00+00:00</updated>
<id>/blog/atom.xml</id>
<entry xml:lang="en">
    <title>Anubis is a joke</title>
    <published>2025-04-16T00:00:00+00:00</published>
    <updated>2025-04-16T00:00:00+00:00</updated>
    <author>
        <name>
            wanderlost
        </name>
    </author>
    <link rel="alternate" type="text/html" href="/blog/2025-04-16-anubis-is-a-joke/"/>
    <id>/blog/2025-04-16-anubis-is-a-joke/</id>
    <content type="html" xml:base="/blog/2025-04-16-anubis-is-a-joke/"><p>Over the past few months, a lot of people have turned to Anubis by Xe Iaso to try to protect
their sites, primarily Git forges and alternative frontends, against AI scraping.</p>
<p>Anubis is a new PoW captcha "solution" that (allegedly) holds out scrapers by slowing down your
browsing and forcing you to enable JavaScript to pass a challenge before you can view the site.
Once it's wasted a few seconds of your time and made you reevaluate the worth of whatever you were
visiting, the stupid anime girl (previously AI generated) it shows you gives a smile and you're on
your way. The challenge will only work on Chromium and its Google-funded controlled opposition,
Firefox. Basilisk does seem to work, though with broken CSS. It doesn't even work on Safari
(allegedly; I don't own an iToy to test this with), and no other browser (until you read the next
section) gets through it.</p>
<p>There's one small problem with Anubis though. By default (and no installation I've checked changes
this), Anubis will only present a challenge to User-Agents containing "Mozilla" and some obvious
scraper agents, at the time of writing this. You can check this in /data/botPolicies.json.</p>
<p>This means that all one of those evil scrapers Anubis is supposed to protect against has to do to
bypass Anubis is not use one of those User-Agents. It also means that you too can completely
bypass it, since I know it's been annoying a lot of people lately. You can curl a site using the
default config (most of them), and it won't give an Anubis challenge; it'll just show you the site
in its original form. No special options, no custom User-Agent, just curl http://domain.name and
it'll let you through. This applies to your normal browser as well: just give it a user agent that
doesn't contain "Mozilla" or any of the other terms in the file and you won't have any problems.</p>
<p>I was expecting a much more involved workaround for dealing with this piece of shit, but no, all
you have to do is give it a UA not containing some keywords.</p>
</content>
</entry>
<entry xml:lang="en">
    <title>XHTML is good, actually</title>
    <published>2025-04-13T00:00:00+00:00</published>
@@ -34,6 +34,14 @@
<div class="posts">

    <h3 class="post-title">
        2025-04-16 -
        <a href="/blog/2025-04-16-anubis-is-a-joke/">
            Anubis is a joke
        </a>
    </h3>
    <p>an easily bypassable one, and not actually protecting your site (against anything other than really low effort scrapes)</p>

    <h3 class="post-title">
        2025-04-13 -
        <a href="/blog/2025-04-13-xhtml-is-good-actually/">
31	blog/rss.xml

@@ -7,7 +7,36 @@
<generator>Zola</generator>
<language>en</language>
<atom:link href="/blog/rss.xml" rel="self" type="application/rss+xml"/>
<lastBuildDate>Sun, 13 Apr 2025 00:00:00 +0000</lastBuildDate>
<lastBuildDate>Wed, 16 Apr 2025 00:00:00 +0000</lastBuildDate>
<item>
    <title>Anubis is a joke</title>
    <pubDate>Wed, 16 Apr 2025 00:00:00 +0000</pubDate>
    <author>wanderlost</author>
    <link>/blog/2025-04-16-anubis-is-a-joke/</link>
    <guid>/blog/2025-04-16-anubis-is-a-joke/</guid>
    <description xml:base="/blog/2025-04-16-anubis-is-a-joke/"><p>Over the past few months, a lot of people have turned to Anubis by Xe Iaso to try to protect
their sites, primarily Git forges and alternative frontends, against AI scraping.</p>
<p>Anubis is a new PoW captcha "solution" that (allegedly) holds out scrapers by slowing down your
browsing and forcing you to enable JavaScript to pass a challenge before you can view the site.
Once it's wasted a few seconds of your time and made you reevaluate the worth of whatever you were
visiting, the stupid anime girl (previously AI generated) it shows you gives a smile and you're on
your way. The challenge will only work on Chromium and its Google-funded controlled opposition,
Firefox. Basilisk does seem to work, though with broken CSS. It doesn't even work on Safari
(allegedly; I don't own an iToy to test this with), and no other browser (until you read the next
section) gets through it.</p>
<p>There's one small problem with Anubis though. By default (and no installation I've checked changes
this), Anubis will only present a challenge to User-Agents containing "Mozilla" and some obvious
scraper agents, at the time of writing this. You can check this in /data/botPolicies.json.</p>
<p>This means that all one of those evil scrapers Anubis is supposed to protect against has to do to
bypass Anubis is not use one of those User-Agents. It also means that you too can completely
bypass it, since I know it's been annoying a lot of people lately. You can curl a site using the
default config (most of them), and it won't give an Anubis challenge; it'll just show you the site
in its original form. No special options, no custom User-Agent, just curl http://domain.name and
it'll let you through. This applies to your normal browser as well: just give it a user agent that
doesn't contain "Mozilla" or any of the other terms in the file and you won't have any problems.</p>
<p>I was expecting a much more involved workaround for dealing with this piece of shit, but no, all
you have to do is give it a UA not containing some keywords.</p>
    </description>
</item>
<item>
    <title>XHTML is good, actually</title>
    <pubDate>Sun, 13 Apr 2025 00:00:00 +0000</pubDate>
@@ -19,4 +19,8 @@
<loc>/blog/2025-04-13-xhtml-is-good-actually/</loc>
<lastmod>2025-04-13</lastmod>
</url>
<url>
    <loc>/blog/2025-04-16-anubis-is-a-joke/</loc>
    <lastmod>2025-04-16</lastmod>
</url>
</urlset>