mirror of https://forge.fsky.io/wl/pages.git, synced 2025-04-19 17:03:41 -05:00

Compare commits: 2 commits, a975c41bdb ... 4121023984
Author | SHA1 | Date
---|---|---
 | 4121023984 |
 | 87c49f4976 |
8 changed files with 230 additions and 5 deletions
31	_zola/content/2025-04-16-anubis-is-a-joke.md	Normal file

@@ -0,0 +1,31 @@
+++
title = "Anubis is a joke"
date = 2025-04-16
description = "an easily bypassable one, and not actually protecting your site (against anything other than really low effort scrapes)"
+++

Over the past few months, a lot of people have turned to Anubis by Xe Iaso to try to protect
their sites, primarily Git forges and alternative frontends, against AI scraping.

Anubis is a new PoW captcha "solution" that (allegedly) holds out scrapers by slowing down your
browsing and forcing you to enable JavaScript to pass a challenge before you can view the site.
Once it's wasted a few seconds of your time and made you reevaluate the worth of whatever you were
visiting, the stupid anime girl (previously AI generated) it shows you gives a smile and you're on
your way. The challenge will only work on Chromium and its Google-funded controlled opposition,
Firefox. Basilisk does seem to work, though with broken CSS. It doesn't even work on Safari
(allegedly; I don't own an iToy to test this with), and no other browser (until you read the next
section) gets through it.

There's one small problem with Anubis though. By default (and no installation I've checked changes
this), Anubis will only present a challenge to User-Agents containing "Mozilla" and some obvious
scraper agents, at the time of writing this. You can check this in /data/botPolicies.json.

This means that all one of those evil scrapers Anubis is supposed to protect against has to do to
bypass Anubis is not use one of those User-Agents. It also means that you too can completely
bypass it, since I know it's been annoying a lot of people lately. You can curl a site using the
default config (most of them), and it won't give an Anubis challenge; it'll just show you the site
in its original form. No special options, no custom User-Agent, just `curl http://domain.name` and
it'll let you through. This applies to your normal browser as well: just give it a user agent that
doesn't contain "Mozilla" or any of the other terms in the file and you won't have any problems.

I was expecting a much more involved workaround for dealing with this piece of shit, but no, all
you have to do is give it a UA not containing some keywords.
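The default-policy behaviour the post describes boils down to a substring check on the User-Agent header. A minimal sketch of that idea (this illustrates the concept only — it is not Anubis's actual matching code, and the real default rules in /data/botPolicies.json also list some scraper agents omitted here):

```python
# Sketch of the default-policy idea, assuming a plain substring match.
# The real policy file lists "Mozilla" plus some known scraper agents.
CHALLENGED_SUBSTRINGS = ["Mozilla"]

def would_be_challenged(user_agent: str) -> bool:
    """Return True if this User-Agent would be served the PoW challenge."""
    return any(needle in user_agent for needle in CHALLENGED_SUBSTRINGS)

# Every mainstream browser UA starts with "Mozilla/5.0", so real visitors pay the cost:
print(would_be_challenged("Mozilla/5.0 (X11; Linux x86_64; rv:126.0) Gecko/20100101 Firefox/126.0"))  # True

# curl's default UA contains no "Mozilla", so a plain curl request sails through:
print(would_be_challenged("curl/8.7.1"))  # False

# And a scraper that picks any unlisted UA string bypasses the filter entirely:
print(would_be_challenged("definitely-not-a-scraper/1.0"))  # False
```

In other words, the filter punishes exactly the clients that identify themselves honestly, which is the asymmetry the post complains about.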
10	_zola/templates/404.html	Normal file

@@ -0,0 +1,10 @@
{% extends "index.html" %}

{% block title %}
<title>wanderlost - 404</title>
{% endblock title %}

{% block content %}
<h1>it's OVER</h1>
<p>page not found</p>
{% endblock content %}
66	blog/2025-04-16-anubis-is-a-joke/index.xhtml	Normal file

@@ -0,0 +1,66 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
  <meta http-equiv="X-UA-Compatible" content="IE=edge" />
  <meta http-equiv="content-type" content="text/html; charset=utf-8" />
  <meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1, viewport-fit=cover" />

  <title>wanderlost - Anubis is a joke</title>

  <link rel="stylesheet" href="/assets/css/main.css" />

  <link rel="alternate"
        type="application/rss+xml"
        title="Atom"
        href="/blog/atom.xml" />
  <link rel="alternate"
        type="application/rss+xml"
        title="RSS"
        href="/blog/rss.xml" />
</head>
<body>
  <div class="navbar">
    <h1 class="title"><a href="/">wanderlost</a></h1>
    <a href="/blog/">index</a>
    <a href="/blog/atom.xml">atom</a>
    <a href="/blog/rss.xml">rss</a>
  </div>
  <div class="main">
    <hr />

    <div class="post">
      <h1 class="post-title">Anubis is a joke</h1>
      <h2 class="post-date">2025-04-16</h2>
      <!-- if Zola just generated compliant XHTML on its own that would be great, but looks like this will have to do -->
      <p>Over the past few months, a lot of people have turned to Anubis by Xe Iaso to try to protect
      their sites, primarily Git forges and alternative frontends, against AI scraping.</p>
      <p>Anubis is a new PoW captcha "solution" that (allegedly) holds out scrapers by slowing down your
      browsing and forcing you to enable JavaScript to pass a challenge before you can view the site.
      Once it's wasted a few seconds of your time and made you reevaluate the worth of whatever you were
      visiting, the stupid anime girl (previously AI generated) it shows you gives a smile and you're on
      your way. The challenge will only work on Chromium and its Google-funded controlled opposition,
      Firefox. Basilisk does seem to work, though with broken CSS. It doesn't even work on Safari
      (allegedly; I don't own an iToy to test this with), and no other browser (until you read the next
      section) gets through it.</p>
      <p>There's one small problem with Anubis though. By default (and no installation I've checked changes
      this), Anubis will only present a challenge to User-Agents containing "Mozilla" and some obvious
      scraper agents, at the time of writing this. You can check this in /data/botPolicies.json.</p>
      <p>This means that all one of those evil scrapers Anubis is supposed to protect against has to do to
      bypass Anubis is not use one of those User-Agents. It also means that you too can completely
      bypass it, since I know it's been annoying a lot of people lately. You can curl a site using the
      default config (most of them), and it won't give an Anubis challenge; it'll just show you the site
      in its original form. No special options, no custom User-Agent, just curl http://domain.name and
      it'll let you through. This applies to your normal browser as well: just give it a user agent that
      doesn't contain "Mozilla" or any of the other terms in the file and you won't have any problems.</p>
      <p>I was expecting a much more involved workaround for dealing with this piece of shit, but no, all
      you have to do is give it a UA not containing some keywords.</p>
    </div>

  </div>
</body>
</html>
@@ -1,3 +1,40 @@
<!doctype html>
<title>404 Not Found</title>
<h1>404 Not Found</h1>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
  <meta http-equiv="X-UA-Compatible" content="IE=edge" />
  <meta http-equiv="content-type" content="text/html; charset=utf-8" />
  <meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1, viewport-fit=cover" />

  <title>wanderlost - 404</title>

  <link rel="stylesheet" href="/assets/css/main.css" />

  <link rel="alternate"
        type="application/rss+xml"
        title="Atom"
        href="/blog/atom.xml" />
  <link rel="alternate"
        type="application/rss+xml"
        title="RSS"
        href="/blog/rss.xml" />
</head>
<body>
  <div class="navbar">
    <h1 class="title"><a href="/">wanderlost</a></h1>
    <a href="/blog/">index</a>
    <a href="/blog/atom.xml">atom</a>
    <a href="/blog/rss.xml">rss</a>
  </div>
  <div class="main">
    <hr />

    <h1>it's OVER</h1>
    <p>page not found</p>

  </div>
</body>
</html>
@@ -4,8 +4,48 @@
<link rel="self" type="application/atom+xml" href="/blog/atom.xml"/>
<link rel="alternate" type="text/html" href="/blog"/>
<generator uri="https://www.getzola.org/">Zola</generator>
<updated>2025-04-13T00:00:00+00:00</updated>
<updated>2025-04-16T00:00:00+00:00</updated>
<id>/blog/atom.xml</id>
<entry xml:lang="en">
    <title>Anubis is a joke</title>
    <published>2025-04-16T00:00:00+00:00</published>
    <updated>2025-04-16T00:00:00+00:00</updated>
    <author>
        <name>
            wanderlost
        </name>
    </author>
    <link rel="alternate" type="text/html" href="/blog/2025-04-16-anubis-is-a-joke/"/>
    <id>/blog/2025-04-16-anubis-is-a-joke/</id>
    <content type="html" xml:base="/blog/2025-04-16-anubis-is-a-joke/"><p>Over the past few months, a lot of people have turned to Anubis by Xe Iaso to try to protect
their sites, primarily Git forges and alternative frontends, against AI scraping.</p>
<p>Anubis is a new PoW captcha "solution" that (allegedly) holds out scrapers by slowing down your
browsing and forcing you to enable JavaScript to pass a challenge before you can view the site.
Once it's wasted a few seconds of your time and made you reevaluate the worth of whatever you were
visiting, the stupid anime girl (previously AI generated) it shows you gives a smile and you're on
your way. The challenge will only work on Chromium and its Google-funded controlled opposition,
Firefox. Basilisk does seem to work, though with broken CSS. It doesn't even work on Safari
(allegedly; I don't own an iToy to test this with), and no other browser (until you read the next
section) gets through it.</p>
<p>There's one small problem with Anubis though. By default (and no installation I've checked changes
this), Anubis will only present a challenge to User-Agents containing "Mozilla" and some obvious
scraper agents, at the time of writing this. You can check this in /data/botPolicies.json.</p>
<p>This means that all one of those evil scrapers Anubis is supposed to protect against has to do to
bypass Anubis is not use one of those User-Agents. It also means that you too can completely
bypass it, since I know it's been annoying a lot of people lately. You can curl a site using the
default config (most of them), and it won't give an Anubis challenge; it'll just show you the site
in its original form. No special options, no custom User-Agent, just curl http://domain.name and
it'll let you through. This applies to your normal browser as well: just give it a user agent that
doesn't contain "Mozilla" or any of the other terms in the file and you won't have any problems.</p>
<p>I was expecting a much more involved workaround for dealing with this piece of shit, but no, all
you have to do is give it a UA not containing some keywords.</p>
</content>
</entry>
<entry xml:lang="en">
    <title>XHTML is good, actually</title>
    <published>2025-04-13T00:00:00+00:00</published>
@@ -34,6 +34,14 @@
<div class="posts">

    <h3 class="post-title">
        2025-04-16 -
        <a href="/blog/2025-04-16-anubis-is-a-joke/">
            Anubis is a joke
        </a>
    </h3>
    <p>an easily bypassable one, and not actually protecting your site (against anything other than really low effort scrapes)</p>

    <h3 class="post-title">
        2025-04-13 -
        <a href="/blog/2025-04-13-xhtml-is-good-actually/">
31	blog/rss.xml

@@ -7,7 +7,36 @@
<generator>Zola</generator>
<language>en</language>
<atom:link href="/blog/rss.xml" rel="self" type="application/rss+xml"/>
<lastBuildDate>Sun, 13 Apr 2025 00:00:00 +0000</lastBuildDate>
<lastBuildDate>Wed, 16 Apr 2025 00:00:00 +0000</lastBuildDate>
<item>
    <title>Anubis is a joke</title>
    <pubDate>Wed, 16 Apr 2025 00:00:00 +0000</pubDate>
    <author>wanderlost</author>
    <link>/blog/2025-04-16-anubis-is-a-joke/</link>
    <guid>/blog/2025-04-16-anubis-is-a-joke/</guid>
    <description xml:base="/blog/2025-04-16-anubis-is-a-joke/"><p>Over the past few months, a lot of people have turned to Anubis by Xe Iaso to try to protect
their sites, primarily Git forges and alternative frontends, against AI scraping.</p>
<p>Anubis is a new PoW captcha "solution" that (allegedly) holds out scrapers by slowing down your
browsing and forcing you to enable JavaScript to pass a challenge before you can view the site.
Once it's wasted a few seconds of your time and made you reevaluate the worth of whatever you were
visiting, the stupid anime girl (previously AI generated) it shows you gives a smile and you're on
your way. The challenge will only work on Chromium and its Google-funded controlled opposition,
Firefox. Basilisk does seem to work, though with broken CSS. It doesn't even work on Safari
(allegedly; I don't own an iToy to test this with), and no other browser (until you read the next
section) gets through it.</p>
<p>There's one small problem with Anubis though. By default (and no installation I've checked changes
this), Anubis will only present a challenge to User-Agents containing "Mozilla" and some obvious
scraper agents, at the time of writing this. You can check this in /data/botPolicies.json.</p>
<p>This means that all one of those evil scrapers Anubis is supposed to protect against has to do to
bypass Anubis is not use one of those User-Agents. It also means that you too can completely
bypass it, since I know it's been annoying a lot of people lately. You can curl a site using the
default config (most of them), and it won't give an Anubis challenge; it'll just show you the site
in its original form. No special options, no custom User-Agent, just curl http://domain.name and
it'll let you through. This applies to your normal browser as well: just give it a user agent that
doesn't contain "Mozilla" or any of the other terms in the file and you won't have any problems.</p>
<p>I was expecting a much more involved workaround for dealing with this piece of shit, but no, all
you have to do is give it a UA not containing some keywords.</p>
    </description>
</item>
<item>
    <title>XHTML is good, actually</title>
    <pubDate>Sun, 13 Apr 2025 00:00:00 +0000</pubDate>
@@ -19,4 +19,8 @@
<loc>/blog/2025-04-13-xhtml-is-good-actually/</loc>
<lastmod>2025-04-13</lastmod>
</url>
<url>
    <loc>/blog/2025-04-16-anubis-is-a-joke/</loc>
    <lastmod>2025-04-16</lastmod>
</url>
</urlset>