Strange "GET /api/levels/" and "GET /play/" requests in the logs

April 11, 2013

I have set up a new Amazon EC2 instance. After a day or two, strange GET requests started arriving roughly once every 10 seconds from "Google bot-like" IPs (e.g. 66.249.76.84, 66.249.74.152). Some examples:
66.249.74.152 - - [10/Apr/2013:06:05:02 +0000] "GET /play/gp4GbjXBD4B3?sh=04f2fd19ae2dd623e7135d29a1894f03&sh=f172a32c89190e28f9c27123d7c6cf43&sh=04f2fd19ae2dd623e7135d29a1894f03 HTTP/1.1" 404 295 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"    
66.249.76.84 - - [11/Apr/2013:03:51:44 +0000] "GET /api/levels/2ry7ZAh0Y91r HTTP/1.1" 404 295 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
They are probing for hashes under folders such as
/play/'some_hash_here'
/profile/'some_hash_here'
/level/'some_hash_here'
/api/'some_hash_here'
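To see how the requests break down by client and by top-level folder, the access log can be summarized with a short script. The following is only a sketch: it assumes the Apache "combined" log format shown in the two sample lines above, and the names `LOG_PATTERN`, `summarize_404s`, and `SAMPLE_LINES` are made up for illustration.

```python
import re
from collections import Counter

# Assumed layout (Apache "combined" format):
# ip identity user [time] "METHOD path PROTOCOL" status size "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def summarize_404s(lines):
    """Count 404 responses per (client IP, first path segment)."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line.strip())
        if not match or match.group("status") != "404":
            continue  # skip unparseable lines and non-404 responses
        path = match.group("path").split("?", 1)[0]  # drop the query string
        first_segment = "/" + path.lstrip("/").split("/", 1)[0]
        counts[(match.group("ip"), first_segment)] += 1
    return counts


# The two log lines quoted in the question.
SAMPLE_LINES = [
    '66.249.74.152 - - [10/Apr/2013:06:05:02 +0000] "GET /play/gp4GbjXBD4B3?sh=04f2fd19ae2dd623e7135d29a1894f03&sh=f172a32c89190e28f9c27123d7c6cf43&sh=04f2fd19ae2dd623e7135d29a1894f03 HTTP/1.1" 404 295 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.76.84 - - [11/Apr/2013:03:51:44 +0000] "GET /api/levels/2ry7ZAh0Y91r HTTP/1.1" 404 295 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]

sample_counts = summarize_404s(SAMPLE_LINES)
for (ip, folder), hits in sorted(sample_counts.items()):
    print(ip, folder, hits)
```

On a real server the same function could be fed an open log file object instead of `SAMPLE_LINES`.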
I have never had such folders on this site. Still, to do something about it, I tried to block them in robots.txt:
User-agent: *
Disallow: 
Crawl-delay: 120
Disallow: /play
Disallow: /profile
Disallow: /level
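But it did not help at all; the bot simply does not read robots.txt. To get rid of all the clutter these requests leave in my error_log file, I created the following rules in the .htaccess file: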
Redirect 301 /play 'some_other_site'
Redirect 301 /level 'some_other_site'
Redirect 301 /profile 'some_other_site'
Redirect 301 /api 'some_other_site'
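In addition, I found traces of the real Google bot crawling my site, and it behaves perfectly normally: it only requests pages that are linked from my site's pages. How can I get rid of this fraudulent scanning?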

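These IPs are Google IPs, so they are most likely legitimate Googlebot hits.
I wouldn't worry about them. It is unlikely that they are hacking attempts. Rather, the most likely scenario is that your server's IP address previously belonged to another website that had these URLs. This is fairly common on Amazon EC2 because of the floating nature of its IP addresses.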
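The claim that an address really belongs to Googlebot can be checked with a reverse DNS lookup followed by a confirming forward lookup. The sketch below assumes that genuine crawler hostnames end in `googlebot.com` or `google.com`; the function names and the injectable `reverse_lookup` / `forward_lookup` parameters are illustrative choices, not part of any library.

```python
import socket

# Assumption: genuine Googlebot hosts resolve to names under these domains.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_google_hostname(hostname):
    """Return True if the hostname sits under one of the assumed Google domains."""
    return hostname.rstrip(".").lower().endswith(GOOGLE_SUFFIXES)


def _reverse_lookup(ip):
    # PTR lookup: IP address -> primary hostname
    return socket.gethostbyaddr(ip)[0]


def _forward_lookup(hostname):
    # A lookup: hostname -> list of IPv4 addresses
    return socket.gethostbyname_ex(hostname)[2]


def verify_googlebot(ip, reverse_lookup=_reverse_lookup, forward_lookup=_forward_lookup):
    """Reverse-resolve the IP, check the domain, then confirm the name maps back to the IP."""
    try:
        hostname = reverse_lookup(ip)
        if not is_google_hostname(hostname):
            return False
        return ip in forward_lookup(hostname)
    except OSError:
        # socket.herror and socket.gaierror are OSError subclasses:
        # treat any failed lookup as "not verified".
        return False
```

Calling `verify_googlebot("66.249.76.84")` performs live DNS queries, so the result depends on the DNS records at the time of the call. The forward lookup matters because anyone who controls the reverse zone for their own IP range can publish an arbitrary PTR name.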

OK. I don't know what it is or what it wants, but I think I have found a solution based on the fail2ban package.
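The post does not say how fail2ban was configured. As a rough illustration of the general idea only (ban a client once it produces too many failures within a time window, which fail2ban expresses through its `maxretry` and `findtime` settings), here is a minimal sketch. The class `NotFoundBanTracker` and its parameters are hypothetical and are not fail2ban's own code or API.

```python
from collections import defaultdict, deque


class NotFoundBanTracker:
    """Ban an IP once it causes `max_retry` 404s within `find_time` seconds."""

    def __init__(self, max_retry=5, find_time=600):
        self.max_retry = max_retry
        self.find_time = find_time
        self._hits = defaultdict(deque)  # ip -> timestamps of recent 404s
        self.banned = set()

    def record_404(self, ip, timestamp):
        """Record one 404 at `timestamp` (in seconds); return True if the IP is banned."""
        if ip in self.banned:
            return True
        hits = self._hits[ip]
        hits.append(timestamp)
        # Forget hits that have fallen out of the observation window.
        while hits and timestamp - hits[0] > self.find_time:
            hits.popleft()
        if len(hits) >= self.max_retry:
            self.banned.add(ip)
        return ip in self.banned
```

If the requests do come from the real Googlebot, as the answer above suggests, banning those addresses would also block legitimate crawling, so a rule like this is better limited to clients that fail verification.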

Source: https://serverfault.com/questions/498422
