Hoogle appears to be down

Hoogle appears to be slow because it is getting spammed with requests like the following. Does anybody have an idea what could be generating them? Possibly some IDE tool? We’d have to improve our logs to capture more traffic if we wanted to narrow it down further…

/?hoogle=Monad%20-is%3Amodule%20-package%3Ado-notation%20-package%3Agithub%20-package%3Adistribution-opensuse%20-package%3Abase%20-package%3Amorpheus-graphql-code-gen-utils%20-package%3ALambdaHack%20-package%3Atermonad%20-package%3Aghc%20-package%3Ario%20-package%3Abasement%20-package%3Aloc%20-package%3Aquaalude%20-package%3Ahaskell-gi-base%20-package%3Aclassy-prelude%20-package%3Abasic-prelude%20-package%3Amassiv-test
/?hoogle=Monad%20-is%3Amodule%20-package%3Ado-notation%20-package%3Agithub%20-package%3Adistribution-opensuse%20-package%3Abase%20-package%3Amorpheus-graphql-code-gen-utils%20-package%3ALambdaHack%20-package%3Atermonad%20-package%3Aghc%20-package%3Ario%20-package%3Abasement%20-package%3Aloc%20-package%3Amixed-types-num%20-package%3Ahedgehog%20-package%3Acabal-install-solver%20-package%3Aamazonka-core%20-package%3Aaudacity
/?hoogle=Monad%20-is%3Amodule%20-package%3Ado-notation%20-package%3Agithub%20-package%3Adistribution-opensuse%20-package%3Abase%20-package%3Amorpheus-graphql-code-gen-utils%20-package%3ALambdaHack%20-package%3Atermonad%20-package%3Aghc%20-package%3Ario%20-package%3Abasement%20-package%3Aloc%20-package%3Aclassy-prelude%20-package%3Ahledger-web%20-package%3Aprelude-compat%20-package%3Ayesod-paginator%20-package%3Acopilot-language
/?hoogle=Monad%20-is%3Amodule%20-package%3Ado-notation%20-package%3Agithub%20-package%3Adistribution-opensuse%20-package%3Abase%20-package%3Amorpheus-graphql-code-gen-utils%20-package%3ALambdaHack%20-package%3Atermonad%20-package%3Aghc%20-package%3Ario%20-package%3Abasement%20-package%3Aloc%20-package%3Anumeric-prelude%20-package%3Astack%20-package%3Abase-compat-batteries%20-package%3Aaudacity%20-package%3Arelude
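
For anyone trying to read those percent-encoded paths, here is a small sketch that decodes one into its search term and its negative filters. The function name `summarize_hoogle_query` and the output layout are made up for illustration; only the `hoogle` query parameter and the `-package:` / `-is:` tokens come from the logged requests.

```python
from urllib.parse import urlsplit, parse_qs


def summarize_hoogle_query(path: str) -> dict:
    """Split a logged request path into search terms and negative filters."""
    # parse_qs percent-decodes the value, so %20 becomes a space and %3A a colon.
    query = parse_qs(urlsplit(path).query).get("hoogle", [""])[0]
    tokens = query.split()
    prefix = "-package:"
    return {
        "terms": [t for t in tokens if not t.startswith("-")],
        "excluded_packages": [t[len(prefix):] for t in tokens if t.startswith(prefix)],
        "other_filters": [
            t for t in tokens if t.startswith("-") and not t.startswith(prefix)
        ],
    }


# Shortened prefix of the first logged request, to keep the example readable.
sample = "/?hoogle=Monad%20-is%3Amodule%20-package%3Ado-notation%20-package%3Agithub"
print(summarize_hoogle_query(sample))
```

Run over the full log lines, this shows every request searching for `Monad` with `-is:module` and a long list of excluded packages that starts identically and only differs in the last few entries.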

Are these coming from the same IP address? Would it be possible to rate-limit requests per client? It’s by no means foolproof, but it should prevent the simpler abuse cases like these.
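
To make the suggestion concrete, here is a minimal token-bucket sketch of per-client rate limiting. It is not how Hoogle or its hosting is set up; the class name, the `rate` and `burst` parameters, and keying clients by a plain string such as an IP address are all assumptions for illustration.

```python
import time


class TokenBucketLimiter:
    """Allow each client `burst` requests at once, refilled at `rate` per second."""

    def __init__(self, rate: float, burst: int, clock=time.monotonic):
        self.rate = rate
        self.burst = burst
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._state = {}  # client key -> (tokens remaining, time of last request)

    def allow(self, client: str) -> bool:
        now = self.clock()
        # A client we have not seen before starts with a full bucket.
        tokens, last = self._state.get(client, (float(self.burst), now))
        # Refill in proportion to elapsed time, capped at the burst size.
        tokens = min(float(self.burst), tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self._state[client] = (tokens - 1.0, now)
            return True
        self._state[client] = (tokens, now)
        return False


limiter = TokenBucketLimiter(rate=1.0, burst=5)
print(limiter.allow("203.0.113.7"))  # example address from the documentation range
```

As noted, this only helps against a single noisy client. Traffic spread across many addresses stays under any per-client limit.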


After some investigation, it appears Hoogle (like the rest of the internet) has been getting hammered by multiple AI scraper bots, run by everyone from Apple to ByteDance to ChatGPT to PetalBot and beyond (plus some malicious ones that don’t even announce that they’re scrapers).

We turned on Cloudflare protection and used its relatively new AI bot protection features, which improved things quite a bit: https://blog.cloudflare.com/declaring-your-aindependence-block-ai-bots-scrapers-and-crawlers-with-a-single-click/

We’re still getting some malicious cloaked scrapers and some homegrown stuff scanning for exploits (e.g. AWS credentials or the like), but performance should be much improved. Thanks to everyone on the IRC channel who gave suggestions and advice!


Now that Cloudflare is in place, it seems my CI can’t build with stack anymore. Could this be related?

stack error in CI
HttpExceptionRequest Request {
  host                 = "stackage-haddock.haskell.org"
  port                 = 443
  secure               = True
  requestHeaders       = [("Accept","application/json"),("User-Agent","The Haskell Stack")]
  path                 = "/snapshots.json"
  queryString          = ""
  method               = "GET"
  proxy                = Nothing
  rawBody              = False
  redirectCount        = 10
  responseTimeout      = ResponseTimeoutDefault
  requestVersion       = HTTP/1.1
  proxySecureMode      = ProxySecureWithConnect
}
 (StatusCodeException (Response {responseStatus = Status {statusCode = 403, statusMessage = "Forbidden"}, responseVersion = HTTP/1.1, <REDACTED>, responseOriginalRequest = Request {
  host                 = "stackage-haddock.haskell.org"
  port                 = 443
  secure               = True
  requestHeaders       = [("Accept","application/json"),("User-Agent","The Haskell Stack")]
  path                 = "/snapshots.json"
  queryString          = ""
  method               = "GET"
  proxy                = Nothing
  rawBody              = False
  redirectCount        = 10
  responseTimeout      = ResponseTimeoutDefault
  requestVersion       = HTTP/1.1
  proxySecureMode      = ProxySecureWithConnect
}
, responseEarlyHints = []}) "<!DOCTYPE html><html lang=\"en-US\"><head><title>Just a moment...</title><meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\"><meta http-equiv=\"X-UA-Compatible\" content=\"IE=Edge\"><meta name=\"robots\" content=\"noindex,nofollow\"><meta name=\"viewport\" content=\"width=device-width,initial-scale=1\"><style>*{box-sizing:border-box;margin:0;padding:0}html{line-height:1.15;-webkit-text-size-adjust:100%;color:#313131;font-family:system-ui,-apple-system,BlinkMacSystemFont,Segoe UI,Roboto,Helvetica Neue,Arial,Noto Sans,sans-serif,Apple Color Emoji,Segoe UI Emoji,Segoe UI Symbol,Noto Color Emoji}body{display:flex;flex-direction:column;height:100vh;min-height:100vh}.main-content{margin:8rem auto;max-width:60rem;padding-left:1.5rem}@media (width <= 720px){.main-content{margin-top:4rem}}.h2{font-size:1.5rem;font-weight:500;line-height:2.25rem}@media (width <= 720px){.h2{font-size:1.25rem;line-height:1.5rem}}#challenge-error-text{background-image:url(data:image/svg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHdpZHRoPSIzMiIgaGVpZ2h0PSIzMiIgZmlsbD0ibm9uZSI+PHBhdGggZmlsbD0iI0IyMEYwMyIgZD0iTTE2IDNhMTMgMTMgMCAxIDAgMTMgMTNBMTMuMDE1IDEzLjAxNSAwIDAgMCAxNiAzbTAgMjRhMTEgMTEgMCAxIDEgMTEtMTEgMTEuMDEgMTEuMDEgMCAwIDEtMTEgMTEiLz48cGF0aCBmaWxsPSIjQjIwRjAzIiBkPSJNMTcuMDM4IDE4LjYxNUgxNC44N0wxNC41NjMgOS41aDIuNzgzem0tMS4wODQgMS40MjdxLjY2IDAgMS4wNTcuMzg4LjQwNy4zODkuNDA3Ljk5NCAwIC41OTYtLjQwNy45ODQtLjM5Ny4zOS0xLjA1Ny4zODktLjY1IDAtMS4wNTYtLjM4OS0uMzk4LS4zODktLjM5OC0uOTg0IDAtLjU5Ny4zOTgtLjk4NS40MDYtLjM5NyAxLjA1Ni0uMzk3Ii8+PC9zdmc+);background-repeat:no-repeat;background-size:contain;padding-left:34px}@media (prefers-color-scheme:dark){body{background-color:#222;color:#d9d9d9}}</style><meta http-equiv=\"refresh\" content=\"390\"></head><body class=\"no-js\"><div class=\"main-wrapper\" role=\"main\"><div class=\"main-content\"><noscript><div class=\"h2\"><span id=\"challenge-error-text\">Enable JavaScript and cookies to 
continue</span></div></noscript></div></div><script>(function(){window._cf_chl_opt={cvId: '3',cZone: \"stackage-haddock.haskell.org\",cType: 'managed',cRay: '8dda3d3ba9d91167',cH: '_kINbS_ICU7mBmdzymj1XqlztJghP2J")
curl gives the same
curl -v 'https://stackage-haddock.haskell.org' -H 'Accept: application/json'
*   Trying 104.18.27.60:443...
* Connected to stackage-haddock.haskell.org (104.18.27.60) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
*  CAfile: /etc/ssl/certs/ca-certificates.crt
*  CApath: /etc/ssl/certs
* TLSv1.0 (OUT), TLS header, Certificate Status (22):
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS header, Certificate Status (22):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS header, Finished (20):
* TLSv1.2 (IN), TLS header, Supplemental data (23):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.2 (OUT), TLS header, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use h2
* Server certificate:
*  subject: CN=stackage-haddock.haskell.org
*  start date: Oct 28 22:29:22 2024 GMT
*  expire date: Jan 26 23:29:18 2025 GMT
*  subjectAltName: host "stackage-haddock.haskell.org" matched cert's "stackage-haddock.haskell.org"
*  issuer: C=US; O=Google Trust Services; CN=WE1
*  SSL certificate verify ok.
* Using HTTP2, server supports multiplexing
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
* Using Stream ID: 1 (easy handle 0x5af14a67ceb0)
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
> GET / HTTP/2
> Host: stackage-haddock.haskell.org
> user-agent: curl/7.81.0
> accept: application/json
> 
* TLSv1.2 (IN), TLS header, Supplemental data (23):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* old SSL session ID is stale, removing
* TLSv1.2 (IN), TLS header, Supplemental data (23):
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
* TLSv1.2 (IN), TLS header, Supplemental data (23):
* TLSv1.2 (IN), TLS header, Supplemental data (23):
< HTTP/2 403 
< date: Tue, 05 Nov 2024 04:56:40 GMT
< content-type: text/html; charset=UTF-8
< content-length: 4516
< x-frame-options: SAMEORIGIN
< referrer-policy: same-origin
< cache-control: max-age=15
< expires: Tue, 05 Nov 2024 04:56:55 GMT
< set-cookie: <REDACTED>
< server: cloudflare
< cf-ray: 8dda4237fe31d3ed-KIX
< 
* TLSv1.2 (IN), TLS header, Supplemental data (23):
<!DOCTYPE html>
<!--[if lt IE 7]> <html class="no-js ie6 oldie" lang="en-US"> <![endif]-->
<!--[if IE 7]>    <html class="no-js ie7 oldie" lang="en-US"> <![endif]-->
<!--[if IE 8]>    <html class="no-js ie8 oldie" lang="en-US"> <![endif]-->
<!--[if gt IE 8]><!--> <html class="no-js" lang="en-US"> <!--<![endif]-->
<head>
<title>Attention Required! | Cloudflare</title>
<meta charset="UTF-8" />
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta http-equiv="X-UA-Compatible" content="IE=Edge" />
<meta name="robots" content="noindex, nofollow" />
<meta name="viewport" content="width=device-width,initial-scale=1" />
<link rel="stylesheet" id="cf_styles-css" href="/cdn-cgi/styles/cf.errors.css" />
<!--[if lt IE 9]><link rel="stylesheet" id='cf_styles-ie-css' href="/cdn-cgi/styles/cf.errors.ie.css" /><![endif]-->
<style>body{margin:0;padding:0}</style>


<!--[if gte IE 10]><!-->
<script>
  if (!navigator.cookieEnabled) {
    window.addEventListener('DOMContentLoaded', function () {
      var cookieEl = document.getElementById('cookie-alert');
      cookieEl.style.display = 'block';
    })
  }
</script>
<!--<![endif]-->


</head>
<body>
  <div id="cf-wrapper">
    <div class="cf-alert cf-alert-error cf-cookie-error" id="cookie-alert" data-translate="enable_cookies">Please enable cookies.</div>
* TLSv1.2 (IN), TLS header, Supplemental data (23):
    <div id="cf-error-details" class="cf-error-details-wrapper">
      <div class="cf-wrapper cf-header cf-error-overview">
        <h1 data-translate="block_headline">Sorry, you have been blocked</h1>
        <h2 class="cf-subheadline"><span data-translate="unable_to_access">You are unable to access</span> haskell.org</h2>
      </div><!-- /.header -->

      <div class="cf-section cf-highlight">
        <div class="cf-wrapper">
          <div class="cf-screenshot-container cf-screenshot-full">
            
              <span class="cf-no-screenshot error"></span>
            
          </div>
        </div>
      </div><!-- /.captcha-container -->

      <div class="cf-section cf-wrapper">
        <div class="cf-columns two">
          <div class="cf-column">
            <h2 data-translate="blocked_why_headline">Why have I been blocked?</h2>

            <p data-translate="blocked_why_detail">This website is using a security service to protect itself from online attacks. The action you just performed triggered the security solution. There are several actions that could trigger this block including submitting a certain word or phrase, a SQL command or malformed data.</p>
          </div>

          <div class="cf-column">
            <h2 data-translate="blocked_resolve_headline">What can I do to resolve this?</h2>

* TLSv1.2 (IN), TLS header, Supplemental data (23):
            <p data-translate="blocked_resolve_detail">You can email the site owner to let them know you were blocked. Please include what you were doing when this page came up and the Cloudflare Ray ID found at the bottom of this page.</p>
          </div>
        </div>
      </div><!-- /.section -->

      <div class="cf-error-footer cf-wrapper w-240 lg:w-full py-10 sm:py-4 sm:px-8 mx-auto text-center sm:text-left border-solid border-0 border-t border-gray-300">
  <p class="text-13">
    <span class="cf-footer-item sm:block sm:mb-1">Cloudflare Ray ID: <strong class="font-semibold">8dda4237fe31d3ed</strong></span>
    <span class="cf-footer-separator sm:hidden">&bull;</span>
    <span id="cf-footer-item-ip" class="cf-footer-item hidden sm:block sm:mb-1">
      Your IP:
      <button type="button" id="cf-footer-ip-reveal" class="cf-footer-ip-reveal-btn">Click to reveal</button>
      <span class="hidden" id="cf-footer-ip">133.106.224.129</span>
      <span class="cf-footer-separator sm:hidden">&bull;</span>
    </span>
    <span class="cf-footer-item sm:block sm:mb-1"><span>Performance &amp; security by</span> <a rel="noopener noreferrer" href="https://www.cloudflare.com/5xx-error-landing" id="brand_link" target="_blank">Cloudflare</a></span>
    
  </p>
* TLSv1.2 (IN), TLS header, Supplemental data (23):
  <script>(function(){function d(){var b=a.getElementById("cf-footer-item-ip"),c=a.getElementById("cf-footer-ip-reveal");b&&"classList"in b&&(b.classList.remove("hidden"),c.addEventListener("click",function(){c.classList.add("hidden");a.getElementById("cf-footer-ip").classList.remove("hidden")}))}var a=document;document.addEventListener&&a.addEventListener("DOMContentLoaded",d)})();</script>
</div><!-- /.error-footer -->


    </div><!-- /#cf-error-details -->
  </div><!-- /#cf-wrapper -->

  <script>
  window._cf_translation = {};
  
  
</script>

</body>
</html>
* TLSv1.2 (IN), TLS header, Supplemental data (23):
* Connection #0 to host stackage-haddock.haskell.org left intact

Looks related (same IP, same time). Even the message in the HTML supports this claim. It looks like the Cloudflare rules are too strict.

sclv fixed Hoogle, not Stackage. I’ll look into the Stackage issue to see if they are related problems, though. (It seems possible, but not certain).


Thanks for the report, and sorry for the trouble. In my haste to block bots I over-tuned some settings. I have dialed those back down, and chreekat also added some exceptions specifically for Stackage. We do expect automated processes to access our sites, just not malicious AI crawlers. It’s important to distinguish the two!
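
As a rough illustration of that distinction (not the actual Cloudflare configuration, which isn't shown in this thread), a policy could check for known build tooling before applying any bot rules. Everything below is hypothetical: the `classify` function, the three verdict strings, and the choice of crawler markers. The only value taken from this thread is the `The Haskell Stack` User-Agent in the error report above.

```python
# Hypothetical allowlist: User-Agent strings sent by build tooling.
# "The Haskell Stack" is the value shown in the CI error report in this thread.
KNOWN_TOOL_AGENTS = {"The Haskell Stack"}

# Assumed substrings identifying self-declared AI crawlers; adjust to taste.
AI_CRAWLER_MARKERS = ("gptbot", "bytespider", "petalbot")


def classify(user_agent: str) -> str:
    """Return 'allow', 'block', or 'default' for a request's User-Agent."""
    # Check the tooling allowlist first so bot rules can never catch it.
    if user_agent in KNOWN_TOOL_AGENTS:
        return "allow"
    lowered = user_agent.lower()
    if any(marker in lowered for marker in AI_CRAWLER_MARKERS):
        return "block"
    # Everything else falls through to whatever the normal rules decide.
    return "default"


print(classify("The Haskell Stack"))
print(classify("Mozilla/5.0 (compatible; GPTBot/1.0)"))
```

The User-Agent header is trivial to spoof, so a check like this only separates well-behaved clients from crawlers that identify themselves. Cloaked scrapers need other signals.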


Any updates? My CI pipeline is down with this error:

No information from Hackage index, updating
Selected mirror https://hackage.haskell.org/
Downloading root
HttpExceptionRequest Request {
  host                 = "hackage.haskell.org"
  port                 = 443
  secure               = True
  requestHeaders       = [("Accept-Encoding",""),("User-Agent","Haskell pantry package")]
  path                 = "/root.json"
  queryString          = ""
  method               = "GET"
  proxy                = Nothing
  rawBody              = False
  redirectCount        = 10
  responseTimeout      = ResponseTimeoutDefault
  requestVersion       = HTTP/1.1
  proxySecureMode      = ProxySecureWithConnect
}

I think this is yet a third outage, this time of Hackage. (The first two were Hoogle and then Stackage).
