
The 30 URLs Bots Hit First on Every Web Server (And How to Find Yours)

I spun up a clean t3.micro on a new AWS account last Tuesday, attached an Elastic IP, and pointed nginx at port 80 with nothing but a default landing page. Then I tailed the access log. The first scan hit at 00:00:47 from a Censys range. By the end of hour one, 1,847 distinct paths had been requested across 312 source IPs. Thirty of those paths accounted for 71% of the traffic.

This article is that top-30 list, what each path leaks when it's actually exposed, and the nginx rules to harden against the entire class. If you run anything on the public internet, this is the surface area the scanners care about. Most of it is harmless on a properly configured server. Most servers are not properly configured.

Why Bots Find You This Fast

The speed used to surprise me. It shouldn't. The internet has been thoroughly mapped, and the discovery pipeline is industrialized.

Censys and Shodan rescan the entire IPv4 space every few days. When a new IP starts serving HTTP, it shows up in their public dataset within hours, sometimes minutes. Anyone with a free API key can subscribe to "new HTTP servers" alerts and start scanning the moment a fresh target appears.

ZoomEye, FOFA, Hunter.how, BinaryEdge, and Quake are regional or commercial equivalents. The aggregate effect is that no IP on the internet stays undiscovered for more than a few hours.

Cheap or free compute means the scanners themselves are everywhere. A single $5/month VPS can scan tens of millions of paths per day. Botnets of compromised IoT and residential routers add another order of magnitude. Tor exit nodes round out the long tail.

The thing scanners are looking for is not your application. They're looking for a misconfiguration, a left-behind file, a default credential, an unpatched CMS. Those are common enough that the math works in their favor.

The Top 30 Paths, Ranked

This is the actual list from one week of logs on a fresh box, deduplicated and ranked by hit count. Your distribution will differ slightly — older domains get more WordPress traffic, IPs in cloud ranges get more cloud-credential paths — but the top of the list is remarkably stable across deployments.

 Rank  Path                                  Approximate hits/day
 ----  ------------------------------------  --------------------
   1.  /.env                                            420
   2.  /wp-login.php                                    310
   3.  /xmlrpc.php                                      290
   4.  /.git/config                                     245
   5.  /wp-admin/                                       210
   6.  /robots.txt                                      190
   7.  /phpmyadmin/                                     175
   8.  /admin/                                          160
   9.  /.git/HEAD                                       145
  10.  /server-status                                   120
  11.  /phpinfo.php                                     115
  12.  /administrator/                                  100
  13.  /config.json                                      95
  14.  /sitemap.xml                                      90
  15.  /backup.sql                                       85
  16.  /.well-known/security.txt                         80
  17.  /api/swagger.json                                 75
  18.  /wp-content/uploads/                              70
  19.  /.DS_Store                                        65
  20.  /actuator/                                        60
  21.  /.aws/credentials                                 55
  22.  /web.config                                       50
  23.  /.htaccess                                        50
  24.  /api/v1/users                                     45
  25.  /manager/html                                     40
  26.  /cgi-bin/                                         35
  27.  /jenkins/                                         30
  28.  /owa/                                             30
  29.  /.npmrc                                           25
  30.  /.ssh/id_rsa                                      20

The shape is informative. Configuration files (.env, .git/config, config.json, .npmrc) dominate. WordPress paths are the next-biggest cluster, even on servers that have never run WordPress. Admin panels (/admin/, /administrator/, /manager/html) round out the high-value targets. The long tail (Spring Boot actuator, Tomcat manager, Exchange OWA, Jenkins) hits less often but pays off bigger when it lands.
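To build the same ranking from your own logs, something like the following should work. It is a minimal sketch that assumes nginx's default "combined" log format, where the request line is the first double-quoted field. The function name top_paths and the log location in the usage line are my own choices, not anything nginx ships.

```shell
# Rank requested paths by hit count from an nginx "combined" access log.
# Assumption: the request line ("GET /path HTTP/1.1") is the first
# double-quoted field on each line, as in the default combined format.
# Usage: top_paths [limit] < access.log
top_paths() {
  local limit="${1:-30}"
  awk -F'"' '{
        split($2, req, " ")          # req[1]=method, req[2]=URI, req[3]=protocol
        uri = req[2]
        sub(/\?.*/, "", uri)         # drop the query string
        if (uri != "") hits[uri]++
      }
      END { for (u in hits) printf "%d %s\n", hits[u], u }' |
    sort -k1,1nr -k2,2 |             # highest count first, ties by path
    head -n "$limit"
}

# Example: top_paths 30 < /var/log/nginx/access.log
```

Query strings are stripped so that /.env and /.env?x=1 count as the same path. Drop that line if you want to see the variants separately.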

What Each Class Actually Leaks

The hit doesn't matter if your server returns 404. What matters is what happens when one of these returns 200.

/.env returning 200

You served your application's environment file. Inside is almost certainly: database credentials, AWS keys, third-party API keys (Stripe, SendGrid, Twilio, OpenAI), JWT signing secrets, OAuth client secrets. Every one is exploitable in minutes. Our hour-by-hour timeline of an .env leak walks through what the bots do next.

/.git/config and /.git/HEAD returning 200

Your entire git history is exposed. The attacker can clone it with git-dumper or wget the objects directly. That gives them your full source code, every secret that was ever committed (even if you later removed it), every email of every contributor, and your deployment configuration. Worse than .env, because the history goes back years.

/phpinfo.php returning 200

You exposed the PHP runtime configuration. The output includes installed extensions, file paths, environment variables (including SCRIPT_FILENAME, often with the document root), database credentials if they're in php.ini, and the exact PHP version with patch level for targeted CVE selection.

/actuator/ or /actuator/env returning 200

Spring Boot in production with management endpoints exposed. The env endpoint exposes the application's Spring properties; depending on the Spring Boot version and its sanitization settings, that can include database passwords, encryption keys, and OAuth secrets. The heapdump endpoint downloads a full JVM memory snapshot containing in-flight tokens and session data. Several real breaches in the last few years started here.

/.aws/credentials returning 200

You shipped your developer machine's AWS profile to production. The profile contains a long-lived access key and secret. Validated and abused inside an hour.

/wp-admin returning a login form

This is the noisy class — brute-force attempts on default credentials, exploit attempts against unpatched plugins, XML-RPC pingback abuse. The protection is keeping WordPress patched, locking down xmlrpc.php, and putting the admin behind an IP allowlist or auth proxy.

Test Your Own Server in 60 Seconds

This short script checks all 30 paths and reports any that return something other than 404 or 403. Run it against your own infrastructure with permission. Substitute your domain:

DOMAIN="https://yourdomain.example"
PATHS=(
  "/.env" "/wp-login.php" "/xmlrpc.php" "/.git/config" "/wp-admin/"
  "/robots.txt" "/phpmyadmin/" "/admin/" "/.git/HEAD" "/server-status"
  "/phpinfo.php" "/administrator/" "/config.json" "/sitemap.xml"
  "/backup.sql" "/.well-known/security.txt" "/api/swagger.json"
  "/wp-content/uploads/" "/.DS_Store" "/actuator/" "/.aws/credentials"
  "/web.config" "/.htaccess" "/api/v1/users" "/manager/html"
  "/cgi-bin/" "/jenkins/" "/owa/" "/.npmrc" "/.ssh/id_rsa"
)
for p in "${PATHS[@]}"; do
  # No -L: a redirect is reported as 301/302 rather than silently followed
  code=$(curl -s -o /dev/null -w "%{http_code}" --max-time 5 "${DOMAIN}${p}")
  if [[ "$code" != "404" && "$code" != "403" ]]; then
    echo "[!]  ${code}  ${DOMAIN}${p}"
  fi
done

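Anything that returns 200, 301, 302, or any 5xx is worth investigating, with three exceptions: /robots.txt, /sitemap.xml, and /.well-known/security.txt are meant to be public, so a 200 there is expected. A code of 000 means curl got no HTTP response at all (timeout or connection failure). SecureBin's exposure checker runs a more thorough version of this without you having to script it.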
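A 200 on its own can be a false positive, since some servers answer every unknown path with a catch-all page. One way to confirm a hit is to look at the body. The sketch below is a heuristic under my own assumptions: the function name looks_real and the three signatures are illustrative, and they only cover the paths where the file format is predictable.

```shell
# Decide whether a response body looks like the real file rather than a
# catch-all page served with status 200. The signatures are heuristics.
# Usage: looks_real <path> <file-containing-response-body>
# Returns 0 = looks real, 1 = does not, 2 = no signature for this path.
looks_real() {
  case "$1" in
    /.env)        grep -Eq '^[A-Za-z_][A-Za-z0-9_]*=' "$2" ;;      # KEY=value lines
    /.git/HEAD)   grep -Eq '^(ref: refs/|[0-9a-f]{40}$)' "$2" ;;   # branch ref or commit hash
    /.git/config) grep -Fq '[core]' "$2" ;;                        # git config section header
    *)            return 2 ;;
  esac
}

# Example against a live host (DOMAIN as in the script above):
#   body=$(mktemp)
#   curl -s --max-time 5 -o "$body" "${DOMAIN}/.git/HEAD"
#   looks_real /.git/HEAD "$body" && echo "confirmed: /.git/HEAD is exposed"
#   rm -f "$body"
```

Treat a match as confirmation and a non-match as "probably a soft 404", not as proof the path is safe.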

Find What Your Server Is Leaking Right Now

The SecureBin Exposure Checker scans your domain for the most common misconfigurations in seconds. Free, no signup. Get a report you can hand to your team.


The Right Way to Fix Each Class

You don't need 30 different fixes. The classes collapse into four hardening patterns.

1. Block dotfiles and VCS metadata at the web server

This single nginx block eliminates the largest chunk of leakage:

# /.well-known/ must stay reachable (ACME challenges, security.txt).
# ^~ stops nginx from evaluating the regex location below for this prefix.
location ^~ /.well-known/ {
    allow all;
}
# Any other path segment starting with a dot (.env, .git, .aws, .ssh,
# .DS_Store, .htaccess, .npmrc) returns 404. One regex covers them all;
# per-name regex blocks placed after it would never be reached, and a
# "deny all" next to "return 404" has no effect because return runs first.
location ~ /\. {
    return 404;
}

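Return 404, not 403. A 403 confirms the file exists; a 404 doesn't. For Apache, the equivalent is <FilesMatch> directives in httpd.conf or per-directory .htaccess; because <FilesMatch> only sees file names, directories such as .git need a <DirectoryMatch> rule in the server config as well.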

2. Don't ship .git, .env, or backup dumps to production

The web server block is defense-in-depth. The actual fix is not having those files in the document root at all.

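# In your CI/CD or Dockerfile, ensure these never make it to /var/www
# Example .dockerignore (patterns are relative to the build context root;
# the **/ prefix makes them match at any depth)
**/.git
**/.env
**/.env.*
**/.aws/
**/*.sql
**/*.dump
**/node_modules/.cache/
**/.DS_Store
**/.vscode/
**/.idea/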
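An ignore file only helps if nothing bypasses it, so it can be worth checking the directory that actually gets served as a last step before deploy. This is a minimal sketch: check_docroot is a hypothetical helper name, and the list of file names is based on the paths in this article, so extend it for your own stack.

```shell
# Fail a deploy if sensitive files are present under the directory that
# will be served. The name list is an assumption drawn from this article.
# Usage: check_docroot <dir>   (returns 1 if anything is found)
check_docroot() {
  local found
  # -prune stops find from descending into a match such as .git
  found=$(find "$1" \
    \( -name .git -o -name .env -o -name '.env.*' -o -name .aws \
       -o -name .ssh -o -name .npmrc -o -name .DS_Store \
       -o -name '*.sql' -o -name '*.dump' \) -print -prune)
  if [ -n "$found" ]; then
    printf 'sensitive files in %s:\n%s\n' "$1" "$found" >&2
    return 1
  fi
}

# Example in a pipeline step: check_docroot ./public || exit 1
```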

3. Put admin panels behind an auth proxy or IP allowlist

WordPress admin, phpMyAdmin, Jenkins, Spring Boot Actuator, Tomcat Manager, Exchange OWA — none of these should be on the public internet without an authentication layer in front. Options: Cloudflare Access (free for up to 50 users), Tailscale, an IP allowlist on the load balancer, or a self-hosted oauth2-proxy in front. Cost is low, payoff is enormous.

4. Patch the CMS, disable XML-RPC, rate-limit /login

For WordPress specifically:

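# limit_req needs a matching zone declared once in the http {} block, e.g.
#   limit_req_zone $binary_remote_addr zone=login:10m rate=10r/m;
# (the size and rate here are example values, not a recommendation)
location = /xmlrpc.php {
    return 444;   # nginx-specific: close the connection without a response
}
location = /wp-login.php {
    limit_req zone=login burst=3 nodelay;
    auth_basic "Restricted";
    auth_basic_user_file /etc/nginx/.htpasswd-admin;
    # ...
}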

Brute-force traffic against /wp-login.php is one of the most common routes to a WordPress compromise, alongside unpatched plugins.

What Attackers Do Once They Find an Open Path

Detection of an open path is the start, not the end, of the incident. The standard chain:

  1. Path returns 200, attacker downloads the file.
  2. Credentials inside the file get extracted and validated (see the .env leak timeline for the validation pipeline).
  3. Validated credentials get used to pivot into cloud accounts, databases, or source code repositories.
  4. Lateral movement continues until either the credential's blast radius is exhausted or the attacker has set up persistence.

If you discover a leaked path, the response is not just "fix the server config." You also have to rotate everything the file contained, audit access logs for the period the path was open, and verify nothing was used downstream. Our leaked credentials playbook has the rotation runbook.
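For the log audit, the first question is who actually got the file. The sketch below pulls every successful fetch of one path out of an access log. It assumes nginx's default "combined" format, and leak_hits is a hypothetical helper name.

```shell
# List successful fetches of a leaked path: timestamp, client IP, status,
# and bytes sent for every 2xx response. Assumes nginx "combined" format.
# Usage: leak_hits <path> < access.log
leak_hits() {
  awk -v target="$1" -F'"' '{
      split($2, req, " ")            # req[2] is the request URI
      uri = req[2]
      sub(/\?.*/, "", uri)           # ignore any query string
      split($1, pre, " ")            # pre[1]=client IP, pre[4]="[timestamp"
      split($3, post, " ")           # post[1]=status, post[2]=bytes sent
      if (uri == target && post[1] ~ /^2/)
        print substr(pre[4], 2), pre[1], post[1], post[2]
  }'
}

# Example: leak_hits /.env < /var/log/nginx/access.log
```

The bytes-sent column helps separate a full download from a truncated or empty response. Run it across rotated logs too, so the whole exposure window is covered.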

When the response involves sharing credentials between your team — the on-call who found it, the engineer who owns the service, the security lead — do not paste them into Slack. Use an encrypted, expiring share. SecureBin exists for this exact moment.

Monitoring for Scanner Traffic

You won't stop scanners from probing. You can detect when a probe lands on something real.

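fail2ban on the server with a jail for repeated 404s from the same IP. With that jail in place, enabling the bundled recidive jail, which re-bans IPs that other jails keep catching, gives the most aggressive scanners a much longer ban.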

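CloudWatch Logs Insights query for nginx access logs (this assumes JSON-formatted logs whose fields are named c_ip, path, and status; adjust to your log format):

fields @timestamp, c_ip, path, status
| filter path like /(\.env|\.git|wp-admin|phpmyadmin|actuator|admin)/
| filter status >= 200 and status < 400
| stats count(*) as hits by path
| sort hits desc

Any row this query returns is a probed path that answered with a 2xx or 3xx, and is worth investigating immediately.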

Cloudflare WAF managed ruleset. The free tier blocks the highest-confidence scanner patterns. The paid tier's WAF rules cover most of the long tail.

Share Leak Details With Your Team Without Re-Leaking Them

When you find an exposed credential, the next problem is getting it to whoever rotates it. SecureBin's encrypted expiring paste keeps the leaked value out of Slack history. AES-256, burn-after-read, TTL control.


Frequently Asked Questions

Should I return 404 or 403 for blocked paths?

404. A 403 confirms the file exists, which is useful intel for the attacker. A 404 doesn't distinguish "blocked" from "never existed" and gives them no signal.

Will renaming or hiding the admin path stop the scanner?

Partially. Scanners check /admin/, /administrator/, /wp-admin/, /login/, plus a few dozen common variants. If you move to /secret-admin-9472/, you'll stop the dumb scanners. The smart ones enumerate via your sitemap, robots.txt, or by fingerprinting the application. Security through obscurity reduces noise, doesn't eliminate risk. Combine with auth.

Are these scans illegal?

Mostly no, in most jurisdictions. Sending HTTP requests to a public IP is generally not unauthorized access until you actually attempt to access protected content. Once a scanner finds an open path and downloads sensitive data, you're into computer-fraud territory. But the probing itself is broadly tolerated.

What about bots that pretend to be Googlebot or Bingbot?

Common. The protection is to verify the bot via reverse DNS, then confirm with a forward lookup that the returned hostname resolves back to the same IP. Google publishes verification instructions. Anything claiming to be Googlebot whose PTR record fails that check is fake. Cloudflare Bot Management does this automatically.
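The check can be scripted as a forward-confirmed reverse DNS lookup. This is a sketch under assumptions: the function names are my own, the domain suffixes should be checked against Google's current documentation, and it relies on the host utility being installed.

```shell
# Does a PTR hostname belong to one of Google's crawler domains?
# Assumption: googlebot.com and google.com are the suffixes to accept.
is_google_host() {
  case "${1%.}" in                  # strip the trailing dot from DNS output
    *.googlebot.com|*.google.com) return 0 ;;
    *) return 1 ;;
  esac
}

# Forward-confirmed reverse DNS: IP -> PTR name -> A record -> same IP.
# Requires the `host` utility and network access; IPv4 only in this sketch.
verify_googlebot() {
  local ip="$1" name
  name=$(host "$ip" | awk '/domain name pointer/ {print $NF; exit}')
  [ -n "$name" ] && is_google_host "$name" || return 1
  host -t A "$name" | awk -v ip="$ip" '$NF == ip {ok = 1} END {exit !ok}'
}

# Example: verify_googlebot 66.249.66.1 && echo "genuine Googlebot"
```

The forward lookup is the part that matters. Anyone who controls reverse DNS for their own IP range can publish a PTR record ending in googlebot.com, but they cannot make Google's zone resolve that name back to their IP.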

Does putting my admin panel on a random port help?

Marginally. Internet-wide scanners do cover non-standard ports (Shodan and Censys scan well beyond 80/443, including common alternates such as 8080, 8443, 8000, 8888, 9000, 9090, 7777). Less probed than 80/443, but not invisible. Auth is still the answer.

How often should I rerun the path-check script?

On every deploy that could affect web server config. At minimum, monthly. A new container layer or a misapplied nginx config can re-expose paths you previously closed.

Key Takeaways

  • 30 paths accounted for 71% of scanner traffic on my test server, and the top of that list is stable across deployments. /.env and /.git/config lead it because they pay off the most.
  • The fix isn't hardening 30 paths; it's four hardening patterns — deny dotfiles, don't ship sensitive files, put admin behind auth, patch the CMS.
  • Return 404, not 403, for blocked paths. Don't confirm existence.
  • Run the path-check script against your own infrastructure after every deploy. Catching a regression manually beats discovering one in a breach report.
  • When you find a leak, the response involves rotation plus secure sharing inside your team. Don't move the leaked values into Slack while fixing them.

The scanners aren't going away. Your job is to make them find nothing.

Related reading: The Danger of Exposed .env Files, What Happens When .env Files Leak, Detect Secrets in GitHub Repositories, Leaked AWS Credentials Playbook, Kubernetes Security Best Practices. Related tools: Exposure Checker, SSL Checker, Whois Lookup, DNS Lookup, Threat Map, 70+ more free tools.

Written by Usman Khan
DevOps Engineer | MSc Cybersecurity | CEH | AWS Solutions Architect

Usman has 10+ years of experience securing enterprise infrastructure, managing high-traffic servers, and building zero-knowledge security tools. Read more about the author.