## CTI Agent

## Set up

In [1]:
%%capture --no-stderr
%pip install --quiet -U langgraph langchain-community langchain-google-genai langchain-tavily




In [2]:
import getpass
import os

def set_env_variable(var_name):
    if var_name not in os.environ:
        os.environ[var_name] = getpass.getpass(f"{var_name}=")

set_env_variable("GEMINI_API_KEY")
set_env_variable("TAVILY_API_KEY")

### CTI Agent

In [3]:
from typing import List
from typing_extensions import TypedDict

class ReWOO(TypedDict):
    task: str
    plan_string: str
    steps: List
    results: dict
    result: str
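
As a rough sketch of how these fields fill in over a run: each graph node returns a partial dictionary, and keys without a custom reducer are simply overwritten in the shared state. The snippet below imitates that merge with plain `dict.update`; every value is an illustrative placeholder, not real model or search output.

```python
# Illustrative placeholders only; a real run would hold LLM and search output.
state = {"task": "example task"}  # initial input to the graph

# After the "plan" node: raw plan text plus parsed (plan, name, tool, input) tuples
state.update({
    "plan_string": "Plan: look it up.\n#E1 = Google[example query]",
    "steps": [("look it up.", "#E1", "Google", "example query")],
})

# After the "tool" node: evidence keyed by step name
state.update({"results": {"#E1": "example evidence"}})

# After the "solve" node: the final answer
state.update({"result": "example answer"})

print(sorted(state))
```

`steps` holds one 4-tuple per plan step, and `results` grows by one entry each time the worker node runs.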

#### Planner

In [4]:
from langchain_google_genai import GoogleGenerativeAI

llm = GoogleGenerativeAI(model="gemini-2.5-flash", api_key=os.environ["GEMINI_API_KEY"])



In [5]:
prompt = """For the following task, make plans that can solve the problem step by step. For each plan, indicate \
which external tool together with tool input to retrieve evidence. You can store the evidence into a \
variable #E that can be called by later tools. (Plan, #E1, Plan, #E2, Plan, ...)

Tools can be one of the following:
(1) Google[input]: Worker that searches results from Google. Useful when you need to find short
and succinct answers about a specific topic. The input should be a search query.
(2) LLM[input]: A pretrained LLM like yourself. Useful when you need to act with general
world knowledge and common sense. Prioritize it when you are confident in solving the problem
yourself. Input can be any instruction.

For example,
Task: Thomas, Toby, and Rebecca worked a total of 157 hours in one week. Thomas worked x
hours. Toby worked 10 hours less than twice what Thomas worked, and Rebecca worked 8 hours
less than Toby. How many hours did Rebecca work?
Plan: Given Thomas worked x hours, translate the problem into algebraic expressions and solve
with Wolfram Alpha. #E1 = WolframAlpha[Solve x + (2x − 10) + ((2x − 10) − 8) = 157]
Plan: Find out the number of hours Thomas worked. #E2 = LLM[What is x, given #E1]
Plan: Calculate the number of hours Rebecca worked. #E3 = Calculator[(2 ∗ #E2 − 10) − 8]

Begin!
Describe your plans with rich details. Each Plan should be followed by only one #E.

Task: {task}"""

In [6]:
task = "What are the latest CTI reports of the ATP that uses the T1566.002: Spearphishing Links techniques?"

In [7]:
result = llm.invoke(prompt.format(task=task))

In [8]:
print(result)

Plan: Search for the latest CTI reports that specifically mention ATP groups using the T1566.002: Spearphishing Links technique. I will prioritize recent publications.
#E1 = Google[latest CTI reports ATP T1566.002 Spearphishing Links]
Plan: Review the search results from #E1 to identify relevant reports from reputable cybersecurity intelligence sources. I will look for titles or snippets that indicate a focus on ATP activities and the specified MITRE ATT&CK technique. I will then extract the most pertinent information about the ATPs and their use of T1566.002.
#E2 = LLM[Analyze the search results from #E1 to identify specific CTI reports (title, source, date) that discuss ATPs using T1566.002: Spearphishing Links. Summarize the key findings from these reports, mentioning any specific ATP groups identified.]


#### Planner Node

In [9]:
import re

from langchain_core.prompts import ChatPromptTemplate

# Regex to match plan steps of the form: Plan: <description> #E<n> = <Tool>[<input>]
regex_pattern = r"Plan:\s*(.+)\s*(#E\d+)\s*=\s*(\w+)\s*\[([^\]]+)\]"
prompt_template = ChatPromptTemplate.from_messages([("user", prompt)])
planner = prompt_template | llm


def get_plan(state: ReWOO):
    task = state["task"]
    result = planner.invoke({"task": task})
    # Extract (plan, step_name, tool, tool_input) tuples from the planner output
    matches = re.findall(regex_pattern, result)
    return {"steps": matches, "plan_string": result}
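
To see what the planner node extracts, the same regular expression can be run on a short hand-written plan. This is a minimal sketch: `sample_plan` is a made-up string in the format the prompt asks for, not actual model output.

```python
import re

# Same pattern as in the planner node
regex_pattern = r"Plan:\s*(.+)\s*(#E\d+)\s*=\s*(\w+)\s*\[([^\]]+)\]"

# Hypothetical planner output, written by hand for illustration
sample_plan = (
    "Plan: Search for recent reports on spearphishing links.\n"
    "#E1 = Google[spearphishing link reports]\n"
    "Plan: Summarize the findings.\n"
    "#E2 = LLM[Summarize the reports found in #E1]\n"
)

steps = re.findall(regex_pattern, sample_plan)
for description, step_name, tool, tool_input in steps:
    print(f"{step_name}: {tool} <- {tool_input!r} ({description})")
```

Each match is a 4-tuple of plan description, evidence variable, tool name, and tool input. If the model strays from the expected format, `re.findall` returns an empty list, so checking for that before running the worker node may be worthwhile.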

### Executor

In [10]:
from langchain_tavily import TavilySearch

search_config = {
    "api_key": os.environ["TAVILY_API_KEY"],
    "max_results": 10,
    "search_depth": "advanced",
    "include_raw_content": True
}

search = TavilySearch(**search_config)

In [11]:
def _get_current_task(state: ReWOO):
    if "results" not in state or state["results"] is None:
        return 1
    if len(state["results"]) == len(state["steps"]):
        return None
    else:
        return len(state["results"]) + 1


def tool_execution(state: ReWOO):
    """Worker node that executes the tools of a given plan."""
    _step = _get_current_task(state)
    _, step_name, tool, tool_input = state["steps"][_step - 1]
    _results = (state["results"] or {}) if "results" in state else {}
    for k, v in _results.items():
        tool_input = tool_input.replace(k, v)
    if tool == "Google":
        result = search.invoke(tool_input)
    elif tool == "LLM":
        result = llm.invoke(tool_input)
    else:
        raise ValueError(f"Unknown tool: {tool!r}")
    _results[step_name] = str(result)
    return {"results": _results}
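
One thing to keep in mind with the `str.replace` substitution above: replacing `#E1` also rewrites the first part of `#E10`, which matters once a plan has ten or more steps. The sketch below compares the plain approach with a regex-based alternative; `substitute_evidence` is a hypothetical helper, not part of the notebook, and the evidence values are placeholders.

```python
import re


def substitute_evidence(tool_input, results):
    """Replace each whole #E<n> placeholder with its stored evidence, if any."""
    def _lookup(match):
        # Unknown placeholders are left untouched
        return results.get(match.group(0), match.group(0))

    return re.sub(r"#E\d+", _lookup, tool_input)


results = {"#E1": "first", "#E10": "tenth"}

# Plain replacement, as in the worker node: "#E1" also hits the prefix of "#E10"
naive = "Compare #E1 with #E10"
for key, value in results.items():
    naive = naive.replace(key, value)

# Regex replacement matches the full variable name
robust = substitute_evidence("Compare #E1 with #E10", results)

print(naive)
print(robust)
```

For short plans like the two-step example in this notebook, both approaches give the same result.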

### Solver

In [12]:
solve_prompt = """Solve the following task or problem. To solve the problem, we have made step-by-step Plan and \
retrieved corresponding Evidence to each Plan. Use them with caution since long evidence might \
contain irrelevant information.

{plan}

Now solve the question or task according to provided Evidence above. Respond with the answer
directly with no extra words.

Task: {task}
Response:"""


def solve(state: ReWOO):
    # Evidence collected by the worker node, keyed by step name (e.g. "#E1")
    _results = (state["results"] or {}) if "results" in state else {}
    plan = ""
    for _plan, step_name, tool, tool_input in state["steps"]:
        # Substitute collected evidence for its #E placeholders
        for k, v in _results.items():
            tool_input = tool_input.replace(k, v)
            step_name = step_name.replace(k, v)
        # End each step with a newline so consecutive plans do not run together
        plan += f"Plan: {_plan}\n{step_name} = {tool}[{tool_input}]\n"
    prompt = solve_prompt.format(plan=plan, task=state["task"])
    result = llm.invoke(prompt)
    return {"result": result}

### Define Graph

In [13]:
def _route(state):
    _step = _get_current_task(state)
    if _step is None:
        # We have executed all tasks
        return "solve"
    else:
        # We are still executing tasks, loop back to the "tool" node
        return "tool"
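
The routing logic can be traced without calling any tool. The sketch below is a standalone approximation: `current_step` is a simplified stand-in for `_get_current_task`, the two steps are invented, and the "evidence" strings replace real tool calls.

```python
def current_step(state):
    """Return the 1-based index of the next step, or None when all are done."""
    results = state.get("results") or {}
    if len(results) == len(state["steps"]):
        return None
    return len(results) + 1


# Hypothetical two-step plan
state = {
    "steps": [("p1", "#E1", "Google", "query"), ("p2", "#E2", "LLM", "use #E1")],
    "results": {},
}

visited = []
while (step := current_step(state)) is not None:
    _, name, tool, _ = state["steps"][step - 1]
    visited.append((step, tool))
    state["results"][name] = f"evidence for {name}"  # stand-in for a real tool call
visited.append("solve")

print(visited)
```

The worker node runs once per step, and control passes to the solver only when every step has a stored result.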

In [14]:
from langgraph.graph import END, StateGraph, START

graph = StateGraph(ReWOO)
graph.add_node("plan", get_plan)
graph.add_node("tool", tool_execution)
graph.add_node("solve", solve)
graph.add_edge("plan", "tool")
graph.add_edge("solve", END)
graph.add_conditional_edges("tool", _route)
graph.add_edge(START, "plan")

app = graph.compile()

In [15]:
from typing import Dict, Any

def format_output(state: Dict[str, Any]) -> str:
    """Format the CTI agent output for better readability."""
    output = []

    for node_name, node_data in state.items():
        output.append(f"\n🔹 **{node_name.upper()}**")
        output.append("=" * 50)

        if node_name == "plan":
            if "plan_string" in node_data:
                output.append("📋 **Generated Plan:**")
                output.append(node_data["plan_string"])

            if "steps" in node_data and node_data["steps"]:
                output.append("\n📝 **Extracted Steps:**")
                for i, (plan, step_name, tool, tool_input) in enumerate(node_data["steps"], 1):
                    output.append(f"  {i}. {plan}")
                    output.append(f"     🔧 {step_name} = {tool}[{tool_input}]")

        elif node_name == "tool":
            if "results" in node_data:
                output.append("🔍 **Execution Results:**")
                for step_name, result in node_data["results"].items():
                    output.append(f"  {step_name}:")
                    # Truncate long results for readability
                    result_str = str(result)
                    if len(result_str) > 500:
                        result_str = result_str[:500] + "... [truncated]"
                    output.append(f"    {result_str}")

        elif node_name == "solve":
            if "result" in node_data:
                output.append("✅ **Final Answer:**")
                output.append(node_data["result"])

        output.append("")

    return "\n".join(output)


In [16]:
print("**CTI Agent Execution**")
print("=" * 60)

for s in app.stream({"task": task}):
    formatted_output = format_output(s)
    print(formatted_output)
    print("-" * 60)

**CTI Agent Execution**

🔹 **PLAN**
📋 **Generated Plan:**
Plan: Search for the latest CTI reports that specifically mention ATPs and the MITRE ATT&CK technique T1566.002 (Spearphishing Links). I will use keywords to narrow down the search to recent publications.
#E1 = Google[latest CTI reports ATP T1566.002 "Spearphishing Links" 2023 2024]
Plan: Review the search results from #E1 to identify specific CTI reports from reputable sources (e.g., major cybersecurity vendors, government agencies) that discuss ATPs utilizing spearphishing links. Synthesize the key findings, including the names of ATPs and the context of their T1566.002 usage.
#E2 = LLM[Based on the search results in #E1, identify and summarize the latest CTI reports that detail ATPs using T1566.002: Spearphishing Links. Include the names of the ATPs and a brief description of their activities related to this technique.]

📝 **Extracted Steps:**
  1. Search for the latest CTI reports that specifically mention ATPs and 