From d72010bb646a263502ca101c1ccaba29d71c952e Mon Sep 17 00:00:00 2001 From: WeebDataHoarder <57538841+WeebDataHoarder@users.noreply.github.com> Date: Sun, 13 Apr 2025 16:53:52 +0200 Subject: [PATCH] Split off challenges page from README --- CHALLENGES.md | 118 ++++++++++++++++++++++++++++++++++++++++++++++++ README.md | 123 ++++---------------------------------------------- 2 files changed, 126 insertions(+), 115 deletions(-) create mode 100644 CHALLENGES.md diff --git a/CHALLENGES.md b/CHALLENGES.md new file mode 100644 index 0000000..ba7985b --- /dev/null +++ b/CHALLENGES.md @@ -0,0 +1,118 @@ +# Challenges + +Challenges can be [transparent](#transparent) (not shown to user, depends on backend or other logic), [non-JavaScript](#non-javascript) (challenges common browser properties), or [custom JavaScript](#custom-javascript) (from Proof of Work to fingerprinting or Captcha is supported) + +## Transparent + +### http + +Verify incoming requests against a specified backend to allow the user through. Cookies and some other headers are passed. + +For example, this allows verifying the user cookies against the backend to have the user skip all other challenges. + +Example on Forgejo, checks that current user is authenticated: +```yaml + http-cookie-check: + mode: http + url: http://forgejo:3000/user/stopwatches + # url: http://forgejo:3000/repo/search + # url: http://forgejo:3000/notifications/new + parameters: + http-method: GET + http-cookie: i_like_gitea + http-code: 200 +``` + +### preload-link + +Requires HTTP/2+ response parsing and logic, silent challenge (does not display a challenge page). + +Browsers that support [103 Early Hints](https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Status/103) are instructed to fetch a CSS resource via [Link](https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/Link) preload that solves the challenge. 
+ +The server waits until the challenge is solved or the defined timeout elapses, then continues with other challenges on failure. + +Example: +```yaml + self-preload-link: + condition: '"Sec-Fetch-Mode" in headers && headers["Sec-Fetch-Mode"] == "navigate"' + mode: "preload-link" + runtime: + # verifies that result = key + mode: "key" + probability: 0.1 + parameters: + preload-early-hint-deadline: 3s + key-code: 200 + key-mime: text/css + key-content: "" +``` + +## Non-JavaScript + +### cookie + +Requires HTTP parsing and a Cookie Jar, silent challenge (does not display a challenge page unless failed). + +Serves the client with a Set-Cookie that solves the challenge, and redirects it back to the same page. Browser must present the cookie to load. + +Several tools implement this, but usually not mass scrapers. + +### header-refresh + +Requires HTTP response parsing and logic, displays challenge site instantly. + +Have the browser solve the challenge by following the URL listed on HTTP [Refresh](https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/Refresh) instantly. + + +### meta-refresh + +Requires HTTP and HTML response parsing and logic, displays challenge site instantly. + +Have the browser solve the challenge by following the URL listed on HTML `<meta http-equiv=refresh>` tag instantly. Equivalent to above. + +### resource-load + +Requires HTTP and HTML response parsing and logic, displays challenge site. + +Serves a challenge page with a linked resource that is loaded by the browser, which solves the challenge. Page refreshes a few seconds later via [Refresh](https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/Refresh). + +Example: +```yaml + self-resource-load: + mode: "resource-load" + runtime: + # verifies that result = key + mode: "key" + probability: 0.1 + parameters: + key-code: 200 + key-mime: text/css + key-content: "" +``` + +## Custom JavaScript + +### js-pow-sha256 + +Requires JavaScript and workers, displays challenge site. 
+ +Has the user solve a Proof of Work using SHA256 hashes, with configurable difficulty. + +Example: +```yaml + js-pow-sha256: + # Asset must be under challenges/{name}/static/{asset} + # Other files here will be available under that path + mode: js + asset: load.mjs + parameters: + # difficulty is the number of leading bits that must be 0 + # Anubis challenge difficulty 5 becomes 5 * 4 = 20 + difficulty: 20 + runtime: + mode: wasm + # Verify must be under challenges/{name}/runtime/{asset} + asset: runtime.wasm + probability: 0.02 +``` + diff --git a/README.md b/README.md index ff2b2e2..913f60f 100644 --- a/README.md +++ b/README.md @@ -7,11 +7,11 @@ Self-hosted abuse detection and rule enforcement against low-effort mass AI scra go-away sits in between your site and the Internet / upstream proxy. -Incoming requests can be selected by [rules](#rich-rule-matching) to be [actioned](#extended-rule-actions) or [challenged](#challenges) to filter suspicious requests. +Incoming requests can be selected by [rules](#rich-rule-matching) to be [actioned](#extended-rule-actions) or [challenged](CHALLENGES.md#challenges) to filter suspicious requests. The tool is designed highly flexible so the operator can minimize impact to legit users, while surgically targeting heavy endpoints or scrapers. 
-[Challenges](#challenges) can be transparent (not shown to user, depends on backend or other logic), [non-JavaScript](#non-javascript-challenges) (challenges common browser properties), or [custom JavaScript](#custom-javascript-wasm-challenges) (from Proof of Work to fingerprinting or Captcha is supported) +[Challenges](CHALLENGES.md#challenges) can be transparent (not shown to user, depends on backend or other logic), [non-JavaScript](#non-javascript-challenges) (challenges common browser properties), or [custom JavaScript](#custom-javascript-wasm-challenges) (from Proof of Work to fingerprinting or Captcha is supported) See _[Why?](#why)_ section for the challenges and reasoning behind this tool. @@ -104,7 +104,7 @@ Several challenges that do not require JavaScript are offered, some targeting th These can be used for light checking of requests that eliminate most of the low effort scraping. -See [Challenges](#challenges) below for a list of them. +See [Challenges](CHALLENGES.md#challenges) for a list of them. ### Custom JavaScript / WASM challenges @@ -150,7 +150,11 @@ Results will be temporarily cached By default, [DroneBL](https://dronebl.org/) is used. -### Network range loading +### Network range and automated filtering + +Some specific search spiders do follow _robots.txt_ and are well behaved. However, many actors can reuse user agents, so the origin network ranges must be checked. + +The samples provide example network range fetching and rules for Googlebot / Bingbot / DuckDuckBot / Kagibot / Qwantbot / Yandexbot. Network ranges can be loaded via fetched JSON / TXT / HTML pages, or via lists. You can filter these using _jq_ or a regex. @@ -363,117 +367,6 @@ services: ``` -## Challenges - -#### http - -Verify incoming requests against a specified backend to allow the user through. Cookies and some other headers are passed. - -For example, this allows verifying the user cookies against the backend to have the user skip all other challenges. 
- -Example on Forgejo, checks that current user is authenticated: -```yaml - http-cookie-check: - mode: http - url: http://forgejo:3000/user/stopwatches - # url: http://forgejo:3000/repo/search - # url: http://forgejo:3000/notifications/new - parameters: - http-method: GET - http-cookie: i_like_gitea - http-code: 200 -``` - -#### preload-link - -Requires HTTP/2+ response parsing and logic, silent challenge (does not display a challenge page). - -Browsers that support [103 Early Hints](https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Status/103) are indicated to fetch a CSS resource via [Link](https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/Link) preload that solves the challenge. - -The server waits until solved or defined timeout, then continues on other challenges if failed. - -Example: -```yaml - self-preload-link: - condition: '"Sec-Fetch-Mode" in headers && headers["Sec-Fetch-Mode"] == "navigate"' - mode: "preload-link" - runtime: - # verifies that result = key - mode: "key" - probability: 0.1 - parameters: - preload-early-hint-deadline: 3s - key-code: 200 - key-mime: text/css - key-content: "" -``` - -#### header-refresh - -Requires HTTP response parsing and logic, displays challenge site instantly. - -Have the browser solve the challenge by following the URL listed on HTTP [Refresh](https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/Refresh) instantly. - - -#### meta-refresh - -Requires HTTP and HTML response parsing and logic, displays challenge site instantly. - -Have the browser solve the challenge by following the URL listed on HTML `<meta http-equiv=refresh>` tag instantly. Equivalent to above. - -#### resource-load - -Requires HTTP and HTML response parsing and logic, displays challenge site. - -Servers a challenge page with a linked resource that is loaded by the browser, which solves the challenge. Page refreshes a few seconds later via [Refresh](https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/Refresh). 
- -Example: -```yaml - self-resource-load: - mode: "resource-load" - runtime: - # verifies that result = key - mode: "key" - probability: 0.1 - parameters: - key-code: 200 - key-mime: text/css - key-content: "" -``` - -#### cookie - -Requires HTTP parsing and a Cookie Jar, silent challenge (does not display a challenge page unless failed). - -Serves the client with a Set-Cookie that solves the challenge, and redirects it back to the same page. Browser must present the cookie to load. - -Several tools implement this, but usually not mass scrapers. - -#### js-pow-sha256 - -Requires JavaScript and workers, displays challenge site. - -Has the user solve a Proof of Work using SHA256 hashes, with configurable difficulty. - -Example: -```yaml - js-pow-sha256: - # Asset must be under challenges/{name}/static/{asset} - # Other files here will be available under that path - mode: js - asset: load.mjs - parameters: - # difficulty is number of bits that must be set to 0 from start - # Anubis challenge difficulty 5 becomes 5 * 8 = 20 - difficulty: 20 - runtime: - mode: wasm - # Verify must be under challenges/{name}/runtime/{asset} - asset: runtime.wasm - probability: 0.02 -``` - - ## Development