Reorganize README
57 README.md
@@ -15,7 +15,7 @@ The tool is designed highly flexible so the operator can minimize impact to legi
 
 See _[Why?](#why)_ section for the challenges and reasoning behind this tool.
 
-This documentation and go-away are in active development. See [What's left?](#whats-left) section for a breakdown.
+This documentation and go-away are in active development. See [What's left?](#what-s-left) section for a breakdown.
 
 ## Support
 
@@ -205,6 +205,31 @@ In case a client connects over IPv4 first then IPv6 due to [Fast Fallback / Happ
 
 This is tracked by tagging challenges with a readable flag indicating the type of address.
 
+## Example policies
+
+### Forgejo
+
+The policy file at [examples/forgejo.yml](examples/forgejo.yml) provides a ready template to be used on your own Forgejo instance.
+
+Important notes:
+* Edit the `homesite` rule, as it's targeted to common users or orgs on the instance. A better regex might be possible in the future.
+* Edit the `http-cookie-check` challenge, as this will fetch the listed backend with the given session cookie to check for user login.
+* Adjust the desired blocked networks or others. A template list of network ranges is provided, feel free to remove these if not needed.
+* Check the conditions and base rules to change your challenges offered and other ordering.
+* By default Googlebot / Bingbot / DuckDuckBot / Kagibot / Qwantbot / Yandexbot are allowed by useragent and network ranges.
+
+### Generic
+
+The policy file at [examples/generic.yml](examples/generic.yml) provides a baseline to place on any site, that can be modified to fit your needs.
+
+Important notes:
+* Edit the `homesite` rule, as it's targeted to pages you always want to have available, like landing pages.
+* Edit the `is-static-asset` condition or the `allow-static-resources` rule to allow static file access as necessary.
+* If you have an API, add a PASS rule targeting it.
+* Check the conditions and base rules to change your challenges offered and other ordering.
+* Add or modify rules to target specific pages on your site as desired.
+* By default Googlebot / Bingbot / DuckDuckBot / Kagibot / Qwantbot / Yandexbot are allowed by useragent and network ranges.
+
 ## Why?
 In the past few years this small git instance has been hit by waves and waves of scraping.
 This was usually fought back by random useragent blocks for bots that did not follow [robots.txt](/robots.txt), until the past half year, where low-effort mass scraping was used more prominently.
@@ -241,38 +266,12 @@ go-away may not be as straight to configure as Anubis but this was chosen to red
 
 Yes, they can. At the moment their spray-and-pray approach is cheap for them.
 
 If they have to start adding an active browser in their scraping, that makes their collection expensive and slow.
 
 This would more or less eliminate the high rate low effort passive scraping and replace it with an active model.
 
 go-anubis offers a highly configurable set of challenges and rules that you can adapt to new ways.
 
-## Example policies
-
-### Forgejo
-
-The policy file at [examples/forgejo.yml](examples/forgejo.yml) provides a ready template to be used on your own Forgejo instance.
-
-Important notes:
-* Edit the `homesite` rule, as it's targeted to common users or orgs on the instance. A better regex might be possible in the future.
-* Edit the `http-cookie-check` challenge, as this will fetch the listed backend with the given session cookie to check for user login.
-* Adjust the desired blocked networks or others. A template list of network ranges is provided, feel free to remove these if not needed.
-* Check the conditions and base rules to change your challenges offered and other ordering.
-* By default Googlebot / Bingbot / DuckDuckBot / Kagibot / Qwantbot / Yandexbot are allowed by useragent and network ranges.
-
-### Generic
-
-The policy file at [examples/generic.yml](examples/generic.yml) provides a baseline to place on any site, that can be modified to fit your needs.
-
-Important notes:
-* Edit the `homesite` rule, as it's targeted to pages you always want to have available, like landing pages.
-* Edit the `is-static-asset` condition or the `allow-static-resources` rule to allow static file access as necessary.
-* If you have an API, add a PASS rule targeting it.
-* Check the conditions and base rules to change your challenges offered and other ordering.
-* Add or modify rules to target specific pages on your site as desired.
-* By default Googlebot / Bingbot / DuckDuckBot / Kagibot / Qwantbot / Yandexbot are allowed by useragent and network ranges.
-
-
 ## What's left?
 
 go-away has most of the desired features from the original checklist that was made in its development.