Threat Hunting with YARA: Writing Rules That Catch Malware

Threat hunting is the proactive practice of searching your environment for threats that automated defenses have missed. Antivirus relies largely on signatures of known malware; threat hunting looks for behaviors, patterns, and anomalies that indicate compromise even when no signature exists. YARA is one of the most widely used languages for writing malware detection rules, used by VirusTotal, by malware analysts worldwide, and by many security vendors. This guide shows you how to write rules from scratch.

What is YARA and Why It Matters

YARA is a pattern-matching tool designed for malware researchers. A YARA rule describes a file (or memory region) using a combination of strings, byte patterns, and logical conditions. If a file matches the rule, YARA reports a hit. This sounds simple — and it is — but well-crafted YARA rules can detect an entire malware family across thousands of samples, including new variants that antivirus has never seen.

Your First YARA Rule

# Install YARA (Debian/Ubuntu package)
sudo apt-get install yara
# Optional: install the Python bindings (yara-python) for use from scripts
pip install yara-python

# Basic rule structure:
rule MyFirstRule {
    meta:
        author = "PlainlySec"
        description = "Detects files containing a suspicious string"
        date = "2026-05-01"
    
    strings:
        $suspicious = "cmd.exe /c whoami"
        $also_bad = "net user hacker P@ssw0rd /add"
    
    condition:
        any of them
}

# Run against a file
yara my_rule.yar suspicious_file.exe

# Run against an entire directory
yara -r my_rule.yar /path/to/scan/

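# Run against the memory of a running process: pass the PID as the target
# (requires elevated privileges; note that -p sets the thread count, not a PID)
sudo yara my_rule.yar <PID>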
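
To make the logic of MyFirstRule concrete, here is a minimal Python sketch that mirrors it: two plain-text strings and an "any of them" condition. It is not how YARA works internally, and the names RULE_STRINGS, matched_strings, and rule_matches are made up for this illustration. It only shows what the rule is asking for.

```python
# Minimal emulation of MyFirstRule: plain-text strings plus "any of them".
# This only mirrors the rule's logic; it is not YARA's implementation.
RULE_STRINGS = {
    "$suspicious": b"cmd.exe /c whoami",
    "$also_bad": b"net user hacker P@ssw0rd /add",
}


def matched_strings(data: bytes) -> list[str]:
    """Return the identifiers of every rule string found in data."""
    return [name for name, pattern in RULE_STRINGS.items() if pattern in data]


def rule_matches(data: bytes) -> bool:
    """'any of them' is true as soon as at least one string is present."""
    return len(matched_strings(data)) > 0


sample = b"\x00\x01prefix cmd.exe /c whoami suffix\xff"
print(rule_matches(sample), matched_strings(sample))
```

Like YARA text strings without the nocase modifier, this comparison is case-sensitive, so an upper-cased copy of the command would not match.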

Writing Effective YARA Rules from a Malware Sample

The workflow for creating a good YARA rule starts with malware analysis — you need to identify characteristics that are unique to this malware family but unlikely to appear in legitimate software.
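
One way to approach "unique to this family, unlikely in legitimate software" is to compare the strings in a sample against strings pulled from files you already trust. The sketch below assumes you have both available as bytes. The helper names extract_strings and candidate_strings are hypothetical, and the extraction only approximates what the strings utility does (printable ASCII runs).

```python
import re


def extract_strings(data: bytes, min_len: int = 8) -> set[str]:
    """Printable ASCII runs of at least min_len bytes (roughly `strings -n 8`)."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return {m.decode("ascii") for m in re.findall(pattern, data)}


def candidate_strings(malware: bytes, goodware: list[bytes], min_len: int = 8) -> list[str]:
    """Strings present in the sample but absent from every known-good file."""
    known_good: set[str] = set()
    for blob in goodware:
        known_good |= extract_strings(blob, min_len)
    return sorted(extract_strings(malware, min_len) - known_good)


# Synthetic stand-ins for real files, for illustration only.
malware = b"\x00Global\\em_loader_2026\x00kernel32.dll\x00\x01GetProcAddress\x00"
goodware = [b"\x00kernel32.dll\x00GetProcAddress\x00"]
print(candidate_strings(malware, goodware))
```

Whatever survives the comparison is only a candidate. It still needs a human to judge whether it is stable across variants of the family.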

# Step 1: Extract interesting strings from the malware
strings -n 8 malware.exe | sort -u > malware_strings.txt

# Step 2: Identify unique/suspicious strings
# Good candidates: unusual registry paths, specific mutex names,
# hardcoded C2 addresses, unique error messages, PDB paths
# (single quotes keep the shell from altering the pattern; -i ignores case)
grep -Ei 'HKCU|mutex|C2|beacon|implant' malware_strings.txt

# Step 3: Extract byte sequences (hex patterns)
# Use CyberChef or Python to get hex of key byte sequences:
python3 -c "with open('malware.exe','rb') as f: data=f.read(); print(data[0x100:0x120].hex())"

# Step 4: Build the rule
rule Emotet_Loader_2026 {
    meta:
        malware_family = "Emotet"
        threat_level = "High"
    
    strings:
        // Mutex name used by this Emotet variant
        // (backslashes must be escaped inside YARA text strings)
        $mutex = "Global\\em_loader_2026"
        
        // Registry persistence key
        $reg_key = "Software\\Microsoft\\Windows\\CurrentVersion\\Run" wide
        
        // C2 communication pattern (hex bytes)
        $c2_pattern = { 48 83 EC 28 48 8B 05 ?? ?? ?? ?? 48 85 C0 74 }
        
        // PowerShell download cradle
        $ps_download = "IEX((New-Object Net.WebClient).DownloadString" nocase
    
    condition:
        uint16(0) == 0x5A4D  // PE file (MZ header)
        and filesize < 5MB
        and ($mutex or $reg_key)
        and ($c2_pattern or $ps_download)
}
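
The two less obvious pieces of that rule are the ?? wildcards in the hex string and the wide modifier. The Python sketch below shows what each one matches. It is an illustration, not YARA's matching engine: hex_pattern_to_regex and wide are hypothetical helper names, and only whole-byte tokens and ?? are handled (no jumps, alternatives, or nibble wildcards).

```python
import re


def hex_pattern_to_regex(pattern: str):
    """Translate a YARA-style hex string with ?? wildcards into a bytes regex."""
    parts = []
    for token in pattern.split():
        if token == "??":
            parts.append(b".")  # ?? matches any single byte
        else:
            parts.append(re.escape(bytes([int(token, 16)])))
    # DOTALL so that "." also matches the newline byte 0x0A
    return re.compile(b"".join(parts), re.DOTALL)


def wide(text: str) -> bytes:
    """What `wide` looks for with ASCII text: each character followed by 0x00."""
    return text.encode("utf-16-le")


C2_PATTERN = hex_pattern_to_regex("48 83 EC 28 48 8B 05 ?? ?? ?? ?? 48 85 C0 74")

# Synthetic bytes: the four wildcard positions hold an arbitrary offset.
code = bytes.fromhex("90 48 83 EC 28 48 8B 05 11 22 33 44 48 85 C0 74 05")
print(C2_PATTERN.search(code) is not None)
print(wide("Run"))
```

The wildcards are what let one pattern survive recompilation: the four skipped bytes are an address offset that is likely to change between builds, while the surrounding instructions stay the same.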

Advanced YARA Conditions

// PE module: analyze PE structure directly
import "pe"

rule Suspicious_PE {
    condition:
        pe.is_pe and
        pe.number_of_sections > 8 and  // Unusually many sections
        pe.imphash() == "fcab201c2e41893bd1b2bf1f47fb5d1f" and  // Import hash
        not pe.is_signed  // Unsigned PE (is_signed is a field, not a function; YARA 4.2+)
}

// Math module: detect encryption/packing by entropy
// (this rule also uses the pe module, so import "pe" must be in the same file)
import "math"

rule High_Entropy_Section {
    condition:
        for any i in (0..pe.number_of_sections-1):
            (math.entropy(pe.sections[i].raw_data_offset, 
                         pe.sections[i].raw_data_size) > 7.0)
}

// Time-based hunting: recently compiled malware (requires import "pe")
rule Recently_Compiled {
    condition:
        pe.timestamp > 1700000000 and  // After 2023-11-14 (UTC)
        pe.timestamp < 1800000000      // Before 2027-01-15 (UTC)
}
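
Two numbers in these rules deserve a sanity check: the 7.0 entropy threshold and the epoch values in the timestamp rule. The sketch below computes Shannon entropy in bits per byte, which is the kind of measure the entropy condition relies on, and converts the epoch values to dates. The function names are made up for this example.

```python
import math
from collections import Counter
from datetime import datetime, timezone


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte: 0.0 (constant data) to 8.0 (uniform)."""
    if not data:
        return 0.0
    total = len(data)
    return sum((n / total) * math.log2(total / n) for n in Counter(data).values())


def epoch_to_utc(ts: int) -> str:
    """Format a Unix timestamp as a UTC calendar date."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d")


print(shannon_entropy(b"A" * 1024))            # constant data, lowest possible
print(shannon_entropy(bytes(range(256)) * 4))  # every byte value equally often
print(epoch_to_utc(1700000000), epoch_to_utc(1800000000))
```

Plain text and ordinary machine code usually sit well below the maximum, while compressed or encrypted data approaches 8.0. That is why a threshold around 7.0 is a common packing heuristic, though legitimate compressed resources can exceed it too.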

YARA for Threat Hunting in Practice

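# Scan all running processes for malware in memory
# (yara takes one PID per invocation, so loop over them; -p is the thread count)
for pid in $(pgrep .); do sudo yara rules.yar "$pid"; done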

# Scan with multiple rule files
yara -r /opt/yara_rules/*.yar /path/to/suspicious/files/

# Use YARA-X (the Rust rewrite, often faster); its command-line binary is named yr
# Note: pip install yara-x provides the Python bindings, not the CLI
yr scan rules.yar target_directory/

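# Integrate with Velociraptor for hunting across all endpoints
# VQL query to run YARA against every process
# (YaraRules is assumed to be a parameter holding the rule text):
SELECT * FROM foreach(
    row={ SELECT Pid FROM pslist() },
    query={ SELECT * FROM proc_yara(rules=YaraRules, pid=Pid, context=10) })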

Free YARA Rule Collections

  • YARA-Rules (GitHub) — Community rules for hundreds of malware families
  • Neo23x0/signature-base — Florian Roth’s extensive rule collection, widely trusted
  • Elastic Security YARA rules — High-quality rules from Elastic’s threat research team
  • MalwareBytes YARA rules — Production rules used by MalwareBytes products
  • VirusTotal Retrohunt: a service rather than a rule collection, and part of the paid VirusTotal Intelligence offering; it runs your rules against VirusTotal’s database of 2+ billion files to find related samples

Threat Hunting Hypothesis Examples

Good threat hunting starts with a hypothesis. Examples of actionable hunting hypotheses with YARA:

  • “Find PE files in temp directories that have high entropy and no digital signature — possible packed malware dropped by loader”
  • “Find Office documents with embedded VBA macros that contain PowerShell download strings — common phishing payload delivery”
  • “Find PE files where the compile timestamp is in the future or before Windows XP release — timestamp tampering by malware authors”
  • “Find files claiming to be PDF but with PE magic bytes — dual-extension tricks used in phishing attacks”
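
Before committing a hypothesis to a rule, it can help to prototype the check on a handful of files. As a sketch of the timestamp hypothesis above, the Python below reads the compile timestamp straight from the PE headers and flags values in the future or before the Windows XP release (taken here as 2001-10-25). The function names and the fake_pe test helper are hypothetical.

```python
import struct
from datetime import datetime, timezone

# Windows XP general availability, used as the "too old" cutoff (assumption).
XP_RELEASE = int(datetime(2001, 10, 25, tzinfo=timezone.utc).timestamp())


def pe_timestamp(data: bytes):
    """Return the COFF TimeDateStamp, or None if data is not a PE file."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return None
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if e_lfanew + 12 > len(data) or data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        return None
    # Signature (4) + Machine (2) + NumberOfSections (2), then TimeDateStamp.
    (stamp,) = struct.unpack_from("<I", data, e_lfanew + 8)
    return stamp


def timestamp_is_suspicious(data: bytes, now: int) -> bool:
    """True for PE files stamped in the future or before Windows XP."""
    stamp = pe_timestamp(data)
    return stamp is not None and (stamp > now or stamp < XP_RELEASE)


def fake_pe(stamp: int) -> bytes:
    """Hypothetical helper: the smallest header layout this parser accepts."""
    dos = b"MZ" + b"\x00" * 0x3A + struct.pack("<I", 0x40)
    coff = b"PE\x00\x00" + struct.pack("<HHI", 0x8664, 1, stamp)
    return dos + coff


now = int(datetime.now(timezone.utc).timestamp())
print(timestamp_is_suspicious(fake_pe(now + 86400), now))
```

Treat hits as leads rather than verdicts: some legitimate toolchains write fixed or non-date values into this field, so expect false positives.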

YARA is one of those tools that rewards investment. Writing your first rule takes an hour. Writing your hundredth rule takes fifteen minutes. The skills compound, and a library of well-tested YARA rules becomes an increasingly powerful detection capability that complements commercial antivirus and can outperform it on novel threats.