How to prevent bots from trying to guess links on my site

March 7, 2012

The logwatch I recently installed reports this:

--------------------- httpd Begin ------------------------
0.78 MB transferred in 5864 responses  (1xx 0, 2xx 4900, 3xx 0, 4xx 964, 5xx 0)
160 Images (0.16 MB),
857 Content pages (0.62 MB),
4847 Other (0.00 MB)

Requests with error response codes
404 Not Found
 /%E2%80%98planeat%E2%80%99-film-explores-l ... greenfudge-org/: 1 Time(s)
 /10-foods-to-add-to-the-brain-diet-to-help ... -function/feed/: 1 Time(s)
 /10-ways-to-reboot-your-body-with-healthy- ... s-and-exercise/: 1 Time(s)
 /bachmann-holds-her-ground-against-raising ... com-blogs/feed/: 1 Time(s)
 /behind-conan-the-barbarians-diet/: 1 Time(s)
 /tag/dietitian/: 1 Time(s)
 /tag/diets/page/10/: 1 Time(s)
 /tag/directory-products/feed/: 1 Time(s)
 /wp-content/uploads/2011/06/1309268736-49.jpg: 1 Time(s)
 /wp-content/uploads/2011/06/1309271430-30.jpg: 1 Time(s)
 /wp-content/uploads/2011/06/1309339847-35.jpg: 1 Time(s)

A note from me here: there are many more requests like the ones above; I pasted only a few for clarity.

 A total of 12 ROBOTS were logged
 Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) 2 Time(s)
 Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots) 5 Time(s)
 Twitterbot/1.0 1 Time(s)
 Mozilla/5.0 (compatible; AhrefsBot/2.0; +http://ahrefs.com/robot/) 4 Time(s)
 Sosospider+(+http://help.soso.com/webspider.htm) 3 Time(s)
 msnbot/2.0b (+http://search.msn.com/msnbot.htm)._ 1 Time(s)
 Mozilla/5.0 (compatible; MJ12bot/v1.4.2; http://www.majestic12.co.uk/bot.php?+) 1    Time(s)
 msnbot-media/1.1 (+http://search.msn.com/msnbot.htm) 77 Time(s)
 Mozilla/5.0 (compatible; Ezooms/1.0; ezooms.bot@gmail.com) 1 Time(s)
 Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) 17 Time(s)
 Baiduspider+(+http://www.baidu.com/search/spider.htm) 11 Time(s)
 Mozilla/5.0 (compatible; Butterfly/1.0; +http://labs.topsy.com/butterfly/)    Gecko/2009032608 Firefox/3.0.8 1 Time(s)
 ---------------------- httpd End -------------------------

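So I think this is some kind of bot (possibly one of the bots listed above). Could you tell me how to prevent them from guessing links in the hope of finding content?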

**Edit:** I have a VPS, so there are many domains on it. Can you tell me how to find out which domain a particular 404 occurred on? For example, this line: /tag/dietitian/

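You can't really prevent an ordinary user from guessing links. If you secure your content properly, this is a non-issue anyway.
Obscure links are not a secure way of hiding content.
You can make sure you have a properly configured robots.txt, which will stop most legitimate bots.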
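
As a rough illustration of what a robots.txt rule set means to a well-behaved crawler, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the blocked bot, and the disallowed path are all assumptions made up for this example, not the asker's actual configuration. Keep in mind that robots.txt is advisory: only bots that choose to honor it are affected.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (an assumption for illustration only):
# block one crawler entirely, and keep every crawler out of /wp-admin/.
ROBOTS_TXT = """\
User-agent: AhrefsBot
Disallow: /

User-agent: *
Disallow: /wp-admin/
"""


def is_allowed(user_agent: str, path: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if a compliant crawler with this name may fetch the path.

    Pass the short bot token (e.g. "Googlebot"), not the full User-Agent
    header, because the parser matches on the part before the first "/".
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    checks = [
        ("Googlebot", "/tag/dietitian/"),
        ("Googlebot", "/wp-admin/options.php"),
        ("AhrefsBot", "/tag/dietitian/"),
    ]
    for agent, path in checks:
        print(agent, path, is_allowed(agent, path))
```

A check like this only tells you how a compliant bot would interpret the file; it does nothing against clients that ignore robots.txt.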

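One way is to use fail2ban and configure it to suit your needs. In short: among its other features, fail2ban can follow your Apache access log and, after X matches of pattern Y, penalize the client by blocking its IP for Z minutes.
That is usually enough to scare bots away, but be aware that if you are not careful enough, it can easily block legitimate users as well.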
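
To make the "X matches within a time window" idea concrete, here is a minimal Python sketch of the counting logic. It is not fail2ban and does not block anything. It assumes the access log uses Apache's common or combined format; the function name `find_offenders` and the default thresholds are hypothetical, and the parameters `max_retry` and `find_time` are only loosely named after fail2ban's `maxretry` and `findtime` settings.

```python
import re
from collections import defaultdict, deque
from datetime import datetime, timedelta

# Start of an Apache common/combined log line:
# client IP, identd, user, [timestamp], "request line", status code.
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<req>[^"]*)" (?P<status>\d{3}) '
)


def find_offenders(lines, max_retry=5, find_time=timedelta(minutes=10)):
    """Return the set of client IPs with >= max_retry 404s inside find_time."""
    recent = defaultdict(deque)  # ip -> timestamps of its recent 404s
    offenders = set()
    for line in lines:
        m = LOG_RE.match(line)
        if not m or m.group("status") != "404":
            continue
        ts = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
        hits = recent[m.group("ip")]
        hits.append(ts)
        # Drop 404s that have fallen out of the sliding window.
        while ts - hits[0] > find_time:
            hits.popleft()
        if len(hits) >= max_retry:
            offenders.add(m.group("ip"))
    return offenders


if __name__ == "__main__":
    # Made-up sample lines using documentation-only IP addresses.
    sample = [
        f'203.0.113.5 - - [07/Mar/2012:10:00:{s:02d} +0000] '
        f'"GET /guess-{s}/ HTTP/1.1" 404 512 "-" "SomeBot/1.0"'
        for s in range(5)
    ]
    print(find_offenders(sample))
```

Choosing the thresholds is the delicate part: a site with many stale inbound links can produce bursts of 404s from legitimate crawlers and visitors, which is exactly how real users end up blocked.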

Source: https://serverfault.com/questions/367159
