Zero Day Diaries

Breaking Down Security, Bit by Bit

[Project] Autonomous Recon Agent with LLMs for Hack The Box

Introduction

Reconnaissance is the backbone of any successful penetration test or red team engagement. Yet, it’s often a tedious and repetitive process: run a bunch of tools, parse messy output, figure out what’s important, and decide the next steps. What if you could automate all of that, and make it smart?

That’s what this project is about: a self-triaging recon agent that uses Large Language Models (LLMs) to analyze tool output, summarize findings, recommend follow-ups, and even suggest possible CVEs — all in a fully automated workflow.

Why This Matters

There are plenty of recon scripts and tools out there, but very few do intelligent triage. This project:

  • Automates noisy, repetitive recon workflows
  • Adds logic and insight using LLMs (Groq / OpenAI / Ollama)
  • Structures and stores output cleanly for review
  • Suggests relevant next steps and possible exploits

This isn’t just a tool — it’s an assistant.

High-Level Architecture

[Host System]
└── start.sh
    ├── Validates OVPN + target IP
    ├── Builds Docker image
    └── Runs Docker container

[Inside Container]
└── agent.py
    ├── Establishes VPN
    ├── Runs nmap (-sC -sV -p-)
    ├── Captures and summarizes output
    ├── Calls LLM for triage suggestions
    ├── Runs follow-up tools (e.g., gobuster, ffuf)
    ├── Maps services to CVEs using searchsploit
    └── Generates markdown executive summary
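
To make the first boxes in the diagram concrete, here is a minimal sketch of the input checks and the nmap invocation. It is written in Python for consistency with agent.py, even though the checks themselves live in start.sh. The function names `validate_inputs` and `build_nmap_command`, and the use of nmap's `-oN` flag to save the scan to a file, are illustrative assumptions rather than the project's actual code.

```python
import ipaddress
from pathlib import Path
from typing import List


def validate_inputs(target_ip: str, ovpn_path: str) -> None:
    """Raise ValueError if the target IP or the .ovpn path looks wrong."""
    try:
        ipaddress.ip_address(target_ip)
    except ValueError as exc:
        raise ValueError(f"invalid target IP: {target_ip!r}") from exc

    path = Path(ovpn_path)
    if path.suffix != ".ovpn" or not path.is_file():
        raise ValueError(f"missing or non-.ovpn VPN config: {ovpn_path!r}")


def build_nmap_command(target_ip: str, out_file: str) -> List[str]:
    """Build the full-port scan from the diagram (-sC -sV -p-).

    Returning an argument list (not a shell string) keeps the target
    from being interpreted by a shell when passed to subprocess.run.
    """
    return ["nmap", "-sC", "-sV", "-p-", "-oN", out_file, target_ip]
```

The returned list can be handed straight to `subprocess.run(cmd, check=True)` once the VPN is up.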

LLM-Driven Intelligence

The real magic comes from tight LLM integration:

  • Input: Raw output from nmap, gobuster, nikto, etc.
  • Prompt Engineering: The prompt constrains the model to return JSON with a summary, recommended commands, and discovered services.
  • Repair Logic: Malformed responses are automatically fixed via a secondary LLM call.
  • CVE Mapping: Services found are piped into searchsploit, then wrapped into a final executive summary.
  • Example output from a post-nmap step:
{
  "summary": "- Apache 2.4.41 found on port 80.\n- Potential directory listing enabled.",
  "recommended_steps": ["gobuster dir -u http://10.10.10.10 -w wordlist.txt"],
  "services_found": ["apache 2.4.41"]
}
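
The parse-then-repair loop described above might look roughly like the sketch below. The field names come from the example output; everything else (`extract_json`, `validate_triage`, `parse_triage`, and the `repair` callback that stands in for the secondary LLM call) is a hypothetical shape, not the agent's real implementation.

```python
import json
from typing import Callable, Optional

# Expected top-level fields and their types, taken from the example output.
REQUIRED_KEYS = {"summary": str, "recommended_steps": list, "services_found": list}


def extract_json(text: str) -> Optional[dict]:
    """Parse the outermost {...} span, ignoring any chatter around it."""
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        return json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return None


def validate_triage(data: object) -> bool:
    """True only if every required field is present with the right type."""
    if not isinstance(data, dict):
        return False
    return all(isinstance(data.get(k), t) for k, t in REQUIRED_KEYS.items())


def parse_triage(raw: str, repair: Callable[[str], str], max_repairs: int = 1) -> dict:
    """Parse an LLM reply; on failure, ask `repair` for a fixed version.

    `repair` is assumed to wrap a second LLM call that receives the bad
    reply and returns a corrected one.
    """
    candidate = raw
    for attempt in range(max_repairs + 1):
        data = extract_json(candidate)
        if validate_triage(data):
            return data
        if attempt < max_repairs:
            candidate = repair(candidate)
    raise ValueError("LLM response did not match the expected JSON schema")
```

Capping the number of repair attempts keeps a stubborn model from looping forever and burning API quota.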

Getting Started

Prerequisites
  • Docker
  • Python 3.8+
  • A Hack The Box VPN (.ovpn file)
  • API key for Groq OR a running Ollama instance

Setup Steps

git clone https://github.com/jackhax/htb_recon_agent
cd htb_recon_agent

# Create and fill in your .env file
cp .env.example .env

# Example .env (Groq)
LLM_API_KEY=XXXXXXXXXXXXXXXXXXXXXXXXXXX
LLM_PROVIDER=groq
MODEL=meta-llama/llama-4-scout-17b-16e-instruct
OLLAMA_HOST=http://host.docker.internal:11434 # only needed if using Ollama

# Run recon against a box
./start.sh --force-build 10.10.11.123 path/to/htb.ovpn machinename
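
For a sense of how those .env values could be consumed inside the container, here is a small sketch that reads them from the environment and fails early when something is missing. The variable names match the example .env; the `LLMConfig` class, the `load_llm_config` function, and the assumption that `LLM_PROVIDER=ollama` selects the Ollama path are illustrative guesses, not the agent's confirmed behavior.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class LLMConfig:
    provider: str
    model: str
    api_key: Optional[str]
    ollama_host: Optional[str]


def load_llm_config(env: Mapping[str, str] = os.environ) -> LLMConfig:
    """Read LLM settings from the environment and validate them up front."""
    provider = env.get("LLM_PROVIDER", "").strip().lower()
    model = env.get("MODEL", "").strip()
    api_key = env.get("LLM_API_KEY", "").strip() or None
    ollama_host = env.get("OLLAMA_HOST", "").strip() or None

    if not provider or not model:
        raise ValueError("LLM_PROVIDER and MODEL must both be set")
    if provider == "ollama":
        # Assumption: a local Ollama instance needs a host URL, not an API key.
        if not ollama_host:
            raise ValueError("OLLAMA_HOST is required when LLM_PROVIDER=ollama")
    elif not api_key:
        raise ValueError(f"LLM_API_KEY is required for provider {provider!r}")

    return LLMConfig(provider, model, api_key, ollama_host)
```

Failing at startup is cheaper than discovering a missing key after a full-port nmap scan has already finished.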

Output Directory

triage/10.10.11.123/
├── nmap.txt
├── gobuster.txt
├── summary.md
├── exploits.txt
└── summary_exec.md
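
A layout like this is easy to produce with a single helper that creates the per-target folder and writes each artifact into it. The sketch below assumes a hypothetical `write_triage` function and a plain dict of filename to content; the real agent may well write its files incrementally as each tool finishes.

```python
from pathlib import Path
from typing import Dict


def write_triage(base_dir: str, target_ip: str, artifacts: Dict[str, str]) -> Path:
    """Write each artifact to <base_dir>/<target_ip>/<name> and return the folder."""
    out_dir = Path(base_dir) / target_ip
    out_dir.mkdir(parents=True, exist_ok=True)

    for name, content in artifacts.items():
        # Reject names with path components so nothing escapes the target folder.
        if Path(name).name != name:
            raise ValueError(f"artifact name must be a bare filename: {name!r}")
        (out_dir / name).write_text(content, encoding="utf-8")

    return out_dir
```

For example, `write_triage("triage", "10.10.11.123", {"nmap.txt": scan_output})` would create `triage/10.10.11.123/nmap.txt`, where `scan_output` is whatever text the scan produced.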

Challenges Faced

  • Docker disk space limitations (solved via phased apt installs)
  • Handling unstructured tool output (e.g., gobuster flooding)
  • Forcing LLMs to behave predictably (prompt design is key!)
  • Connection edge cases for Ollama in Docker (solved with host.docker.internal)
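
On the gobuster flooding point, one simple way to tame noisy output before it reaches the LLM is to drop blank and duplicate lines and cap what remains. This is only a sketch of that idea under my own assumptions: `condense_output` and its `max_lines` default are hypothetical, and the project may filter output differently.

```python
from typing import Iterable, List


def condense_output(lines: Iterable[str], max_lines: int = 200) -> List[str]:
    """Drop blank and duplicate lines, then cap the result at max_lines.

    If unique lines were cut, a final marker line says how many, so the
    LLM knows the output was truncated rather than complete.
    """
    seen = set()
    kept: List[str] = []
    omitted = 0

    for line in lines:
        text = line.rstrip()
        if not text or text in seen:
            continue  # skip blanks and exact repeats
        seen.add(text)
        if len(kept) < max_lines:
            kept.append(text)
        else:
            omitted += 1

    if omitted:
        kept.append(f"[... {omitted} more unique lines omitted ...]")
    return kept
```

Keeping the first lines rather than a random sample preserves the tool's banner and early hits, which tend to carry the most context for triage.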

What’s Next?

  • Add nuclei or jaeles for more automated vuln scanning
  • Automatically detect CMSes and invoke wpscan or joomscan
  • Export report to PDF or HTML with styling
  • Build a simple dashboard to review multiple boxes
  • Allow attack simulation or flag enumeration as a future module

Conclusion

This agent saves hours of manual work, reduces human error, and makes recon actually fun again. Whether you’re grinding away on Hack The Box or working through a red team engagement, this approach can drastically improve your workflow.

You can find the full code and setup instructions here: https://github.com/jackhax/htb_recon_agent