OSINT Methodology

#writeup #infosec #osint #methodology #reference #published

Introduction

If you've read my Rhysida writeup, you know I ended up inside a ransomware gang's command-and-control server. But what I glossed over was just how much OSINT work happened between "I have an IP address" and "I'm on a call with the FBI." The truth is, most threat research isn't elegant zero-day exploitation or Hollywood-style hacking—it's staring at WHOIS records at 2AM and wondering if you're chasing ghosts.

I've been doing some flavor of OSINT for as long as I can remember. Before it had a fancy acronym, we just called it "being nosy on the internet." These days I focus that nosiness on threat actors, infrastructure, and the occasional mystery server beaconing suspicious traffic across our client networks. This guide is everything I wish someone had handed me years ago.

Fair warning: this is about hunting threat actors, not people. If you're looking for a guide on stalking your ex, close this tab and seek therapy.

The OSINT Mindset

Before we get into the tools, let's talk philosophy. The best OSINT practitioners I've met share a few traits that have nothing to do with technical skill:

Start with what you know, expand outward. Every investigation begins with a single data point—an IP, a username, an email address. Your job is to pivot from that into connected data points until you've built a picture. It's less "finding a needle in a haystack" and more "following a thread through a labyrinth." Sometimes that thread leads somewhere interesting. Sometimes it ends in a wall. Both are valid outcomes.

Document everything. I cannot stress this enough. You will thank yourself later when you need to explain to a federal agent how you ended up where you ended up. I keep running, timestamped notes in Obsidian during every investigation, with screenshots. It feels tedious until the third time you need to reference something from two weeks ago and it's right there waiting for you.

Verify, verify, verify. False positives are the enemy. Finding a username that matches your target on some obscure forum feels great until you realize it's a completely different person in a different country. Attribution is hard. Cross-reference everything. When I identified victims in the Rhysida investigation, I didn't just trust the IP geolocation—I ran WHOIS, checked the ASN, found the organization's public information, and then made contact.

OPSEC isn't optional. When you're poking around threat actor infrastructure, you need to assume they might notice. Use VMs, use VPNs (reputable ones), and consider sock puppet accounts for any social engineering. I do all my threat research from isolated VMs that get torched regularly. Yes, this is paranoid. Yes, I've seen researchers get burned by not being paranoid enough.

The Pivot Chain

The concept of pivoting is the backbone of OSINT. You take one piece of information and use it to discover related information, building a chain of connections. Let me walk through a few common patterns:

Email → Everything

suspicious@gmail.com
    ↓
[Holehe] → Accounts registered: Twitter, Instagram, GitHub, Discord
    ↓
[GHunt] → Google account details, YouTube channel, Drive activity
    ↓
[osint.industries] → Full profile with timeline, linked phone numbers
    ↓
Username "sk3ptic_h4ck" found across platforms
    ↓
[Blackbird/Maigret] → Same username on hacking forums, paste sites
    ↓
Forum posts reference "my server at home"
    ↓
Correlated with infrastructure from original investigation

Username → Identity

"darkn3t_0ps" seen in C2 panel
    ↓
[Maigret] → Username found on 47 sites
    ↓
Steam profile with linked Discord
    ↓
Discord server with other members using similar handles
    ↓
Cross-reference with breach data
    ↓
Email address recovered from 2019 breach
    ↓
[Epieos] → Phone number associated with email
    ↓
Continue pivoting...

The key insight here is that threat actors are human, and humans are lazy. They reuse handles, they forget to compartmentalize, they have personal accounts that touch operational accounts. Every OPSEC failure is an opportunity.
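If it helps to see the chain as data, here is a minimal Python sketch of recording pivots with provenance, so any finding can be traced back to the seed it came from. The `Pivot` and `PivotChain` names and the free-text `method` label are illustrative assumptions, not part of any tool mentioned here.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Pivot:
    """One hop in the chain: how a new data point was derived from an old one."""
    source: str    # data point we started from
    target: str    # data point we discovered
    method: str    # tool or technique used (free-text label)
    found_at: str  # UTC timestamp, ISO 8601


@dataclass
class PivotChain:
    pivots: list[Pivot] = field(default_factory=list)

    def add(self, source: str, target: str, method: str) -> None:
        stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
        self.pivots.append(Pivot(source, target, method, stamp))

    def path_to(self, target: str) -> list[Pivot]:
        """Walk backwards from a finding to the seed it came from."""
        by_target = {p.target: p for p in self.pivots}
        path: list[Pivot] = []
        seen: set[str] = set()  # guards against accidental cycles
        current = target
        while current in by_target and current not in seen:
            seen.add(current)
            hop = by_target[current]
            path.append(hop)
            current = hop.source
        return list(reversed(path))


chain = PivotChain()
chain.add("suspicious@gmail.com", "sk3ptic_h4ck", "Holehe + manual review")
chain.add("sk3ptic_h4ck", "forum post: 'my server at home'", "Maigret")
for hop in chain.path_to("forum post: 'my server at home'"):
    print(f"{hop.source} -> {hop.target} [{hop.method}]")
```

Keeping the method and timestamp on every hop means the chain doubles as the "how did you get here" record you will want later.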

Tool Arsenal

Now for the fun part. These are the tools I actually use, not just ones that look good on a GitHub awesome-list. I'll give you the honest rundown on each.

Username & Email Investigation

Tool	Purpose	Notes
Blackbird	Username search across 600+ sites	WhatsMyName integration, async requests, exports to PDF/JSON. My go-to for initial username enumeration.
Maigret	Username search across 3000+ sites	Fork of Sherlock on steroids. Has a web interface and Telegram bot if you're into that. More comprehensive than Blackbird but slower.
Holehe	Email account checker	Uses password reset functions to identify registered accounts. Covers 120+ sites. Stealthy—doesn't trigger login alerts.
Epieos	Reverse email/phone lookup	Checks 140+ services, has Maltego integration. Good for connecting emails to phone numbers and vice versa.
osint.industries	Comprehensive lookup platform	1500+ sources with timeline visualization. Trusted by law enforcement. Not free, but worth it for serious work.

Blackbird in Action

$ python blackbird.py -u "threat_actor_handle"

[*] Searching 634 sites for username: threat_actor_handle

[+] GitHub: https://github.com/threat_actor_handle
[+] Twitter: https://twitter.com/threat_actor_handle
[+] Keybase: https://keybase.io/threat_actor_handle
[+] HackTheBox: https://app.hackthebox.com/users/threat_actor_handle
[+] Telegram: https://t.me/threat_actor_handle

[*] Search complete. Found 5 accounts.
[*] Report saved to: threat_actor_handle_report.pdf

Blackbird is fast and hits the sites that matter. I run it first on any username I'm investigating. The PDF reports are particularly useful when you need to share findings with others.

Maigret in Action

$ maigret threat_actor_handle --all-sites --html --folderoutput ./investigation/

[*] Checking 3127 sites...

[+] VK: https://vk.com/threat_actor_handle (confidence: high)
[+] GitHub: https://github.com/threat_actor_handle (confidence: high)
[+] RuTracker: https://rutracker.org/forum/profile.php?mode=viewprofile&u=threat_actor_handle (confidence: medium)
[+] XSS.is: https://xss.is/members/?username=threat_actor_handle (confidence: medium)
[+] Exploit.in: https://exploit.in/members/?username=threat_actor_handle (confidence: medium)

[*] Found 47 accounts across sites.
[*] HTML report generated: ./investigation/report.html

Maigret casts a wider net and often catches things Blackbird misses—especially on Russian and Eastern European platforms, which is relevant when you're hunting certain threat actor populations. The confidence scores help filter noise.

Holehe for Email Enumeration

$ holehe suspicious.email@gmail.com

[+] Twitter: Account exists
[+] Instagram: Account exists
[+] Discord: Account exists
[+] Spotify: Account exists
[+] GitHub: Account exists
[-] Facebook: No account
[-] LinkedIn: No account
[+] Adobe: Account exists
[+] Duolingo: Account exists

[*] 7 accounts found for suspicious.email@gmail.com

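The beautiful thing about Holehe is that it's quiet. It uses password reset functionality to check if accounts exist without actually triggering any "someone tried to log in" notifications. Strictly speaking it isn't passive, since it sends requests to every service it checks, but none of that traffic touches infrastructure the target controls. Stealthy enumeration is best enumeration.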

Google-Specific Tools

Tool	Purpose	Notes
GHunt	Google account investigation	Gmail → YouTube, Drive activity, Google Reviews, device info. Incredible depth.

GHunt in Action

If your target is using Gmail, GHunt is absurdly powerful:

$ ghunt email suspicious.threat.actor@gmail.com

[+] Email: suspicious.threat.actor@gmail.com
[+] Google ID: 1234567890123456789
[+] Last Profile Edit: 2024-11-15

[YouTube]
[+] Channel: https://youtube.com/channel/UC...
[+] Subscriptions: Public (47 channels)
[+] Liked Videos: Public

[Google Maps]
[+] Reviews: 23 reviews found
[+] Locations reviewed: Phoenix AZ, Tempe AZ, Moscow RU

[Google Calendar]
[+] Calendar found: "Work Schedule" (public)

[Device Info]
[+] Pixel 7 Pro - Last active 2 hours ago
[+] Windows 11 - Last active 3 days ago

The location data from Google Reviews alone has helped me narrow down threat actor geography more than once. People review their local coffee shops, gyms, and restaurants without thinking about what that reveals.

Specialized & Niche Tools

Tool	Purpose	Notes
Cupidcr4wl	Dating/adult site searches	Username and phone searches on adult platforms. Niche, but occasionally invaluable.

I'm not going to pretend this one isn't awkward to explain, but threat actors are people with personal lives. I've seen cases where OPSEC-conscious criminals compartmentalized everything... except their dating profiles. The username they'd never use anywhere else? Also their Tinder handle. Humans gonna human.

Automation & Scale

Tool	Purpose	Notes
SpiderFoot	OSINT automation	200+ modules, passive/active scanning, correlation rules. Essential for bulk work.

SpiderFoot for Infrastructure Mapping

When you're investigating threat actor infrastructure rather than individuals, SpiderFoot is your best friend:

$ spiderfoot -s 66.85.173.11 -u all -o csv

[*] Starting scan of 66.85.173.11
[*] Loading 200+ modules...

[DNS_HOST] → patterson.pureskin.cloud resolves to 66.85.173.11
[NETBLOCK_OWNER] → Cloudie Limited (AS398101)
[GEOIP] → Tempe, Arizona, United States
[SSL_CERT] → CN=patterson.pureskin.cloud, issued 2024-12-15
[SHODAN] → Open ports: 22, 80, 4001, 4321
[VIRUSTOTAL] → 3/90 engines flagged as malicious
[LINKED_DOMAINS] → 12 other domains on same IP
[HISTORICAL_WHOIS] → Previously registered to different entity in 2023

[*] Scan complete. 847 data points collected.
[*] Results exported to: investigation_66.85.173.11.csv

During the Rhysida investigation, I used SpiderFoot to map related infrastructure after identifying the initial C2. It found related domains, historical DNS data, and connections I would have missed manually.
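A scan that returns hundreds of data points is easier to triage once it is grouped by type. Here is a rough Python sketch of that step. The `Type` and `Data` column names are an assumption about the export header, so check the first line of your own CSV and pass different names if they don't match.

```python
import csv
import io
from collections import defaultdict


def group_by_type(csv_text: str, type_col: str = "Type", data_col: str = "Data") -> dict[str, list[str]]:
    """Group exported rows by event type, dropping exact duplicates."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        kind = (row.get(type_col) or "").strip()
        value = (row.get(data_col) or "").strip()
        if kind and value and value not in grouped[kind]:
            grouped[kind].append(value)
    return dict(grouped)


# Hand-written sample rows; the header names are an assumption, not a documented format.
sample = """Source,Type,Data
66.85.173.11,DNS_HOST,patterson.pureskin.cloud
66.85.173.11,SHODAN,22
66.85.173.11,SHODAN,4001
66.85.173.11,SHODAN,22
"""
for kind, values in group_by_type(sample).items():
    print(kind, values)
```

For a real export, read the file with `Path(...).read_text()` and pass the contents in.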

Threat Actor Research Workflows

Let me walk through some generalized workflows based on common starting points. These aren't hypothetical; they're patterns I've actually used.

Starting from an IOC (IP/Domain)

When you've got infrastructure—maybe from a SIEM alert, firewall logs, or malware analysis—the workflow looks like this:

1. Initial Enumeration
   - WHOIS lookup for registration details
   - Passive DNS for historical resolutions
   - Shodan/Censys for exposed services
   - SSL certificate transparency logs
2. Infrastructure Mapping
   - SpiderFoot scan for related hosts
   - Reverse DNS for shared infrastructure
   - ASN analysis for patterns
3. Attribution Pivot
   - WHOIS registrant email → Holehe/Epieos
   - DNS admin email → Cross-reference with breach data
   - Certificate subjects → Related domains

This is exactly how the Rhysida investigation progressed. A single IP from firewall logs led to a domain, which led to exposed directories, which led to a C2 panel, which led to victim identification.
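The certificate transparency step is easy to script. Here is a minimal Python sketch that asks crt.sh for certificates covering a domain and collects the hostnames they list. It assumes crt.sh's JSON output (`output=json`) returns records with a newline-separated `name_value` field, which is how I understand the service to behave. Treat that as an assumption and inspect a raw response before relying on it.

```python
import json
import urllib.parse
import urllib.request


def extract_names(entries: list[dict]) -> list[str]:
    """Collect unique hostnames from certificate-transparency records."""
    names: set[str] = set()
    for entry in entries:
        # A single record can list several names separated by newlines.
        for name in str(entry.get("name_value", "")).splitlines():
            name = name.strip().lower()
            if name:
                names.add(name)
    return sorted(names)


def fetch_ct_names(domain: str, timeout: float = 30.0) -> list[str]:
    """Query crt.sh for certificates covering a domain (makes a network request)."""
    query = urllib.parse.urlencode({"q": domain, "output": "json"})
    with urllib.request.urlopen(f"https://crt.sh/?{query}", timeout=timeout) as resp:
        return extract_names(json.load(resp))


# Offline demonstration on hand-written records shaped like the assumed response.
sample = [
    {"name_value": "example.com\nwww.example.com"},
    {"name_value": "WWW.example.com"},
    {"name_value": "*.example.com"},
]
print(extract_names(sample))
```

The lookup never touches the target's servers, but it is visible to crt.sh, so run it from the same isolated environment as everything else.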

Starting from a Username/Handle

When you've spotted a handle in a C2 panel, forum post, or malware sample:

1. Account Enumeration
   - Blackbird for quick hits
   - Maigret for deep enumeration
   - Manual checking of relevant hacking forums (XSS.is, Exploit.in, etc.)
2. Profile Mining
   - Historical posts for technical details
   - Personal information leaks (timezone references, language, location mentions)
   - Linked accounts or alternate handles mentioned
3. Email Recovery
   - Password reset flows on identified accounts
   - Breach data correlation
   - Forum registration email (sometimes leaked)
4. Infrastructure Correlation
   - Domains/IPs mentioned in posts
   - Code repositories with commits
   - Personal projects that reveal more
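Alternate handles are often the same word with different leetspeak or separators. A small Python sketch of a comparison key for spotting those follows. The substitution table is a deliberately short, hypothetical starting point, and a match is only a lead to verify, never attribution on its own.

```python
import re

# Common leetspeak substitutions; deliberately small and easy to extend.
LEET_MAP = str.maketrans({"0": "o", "1": "l", "3": "e", "4": "a", "5": "s", "7": "t", "@": "a", "$": "s"})


def normalize_handle(handle: str) -> str:
    """Reduce a handle to a comparison key: lowercase, de-leeted, separators removed."""
    key = handle.lower().translate(LEET_MAP)
    # Anything that is still not a letter (separators, unmapped digits) is dropped.
    return re.sub(r"[^a-z]", "", key)


def same_base_handle(a: str, b: str) -> bool:
    return normalize_handle(a) == normalize_handle(b)


print(normalize_handle("darkn3t_0ps"))
print(same_base_handle("darkn3t_0ps", "DarkNet-Ops"))
print(same_base_handle("darkn3t_0ps", "sk3ptic_h4ck"))
```

Normalizing this aggressively will over-match short or common words, which is exactly the false-positive risk to keep in mind when a "hit" looks too convenient.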

Starting from an Email Address

This is often the most productive starting point:

suspicious.actor@gmail.com
         ↓
    [Holehe] ─────────────────────────→ Account list
         ↓
    [GHunt] ──────────────────────────→ Google ecosystem data
         ↓
    [Epieos] ─────────────────────────→ Linked phone number
         ↓
    [osint.industries] ───────────────→ Full profile timeline
         ↓
    [Breach databases] ───────────────→ Passwords, additional emails
         ↓
    Password analysis ────────────────→ Patterns reveal identity

Password patterns from breach data are underrated. People who use fluffy2019! as their password often have a cat named Fluffy and created the account in 2019. That's two more data points to pivot from.
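As a rough illustration of that kind of pattern analysis, here is a Python sketch that pulls alphabetic words and plausible four-digit years out of a password string. The `PasswordHints` name and the year range are assumptions chosen for the example, and the output is only a list of candidate pivots to check, not a conclusion.

```python
import re
from dataclasses import dataclass


@dataclass
class PasswordHints:
    words: list[str]
    years: list[int]


def password_hints(password: str, min_year: int = 1950, max_year: int = 2039) -> PasswordHints:
    """Pull alphabetic words and plausible four-digit years out of a password."""
    words = [w.lower() for w in re.findall(r"[A-Za-z]{3,}", password)]
    years = [
        int(digits)
        # Match exactly four digits that are not part of a longer number.
        for digits in re.findall(r"(?<!\d)\d{4}(?!\d)", password)
        if min_year <= int(digits) <= max_year
    ]
    return PasswordHints(words=words, years=years)


print(password_hints("fluffy2019!"))
```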

Threat Actor Specific Considerations

Hunting threat actors is different from general OSINT work. A few things I've learned:

Infrastructure Patterns

Threat actors need infrastructure: C2 servers, phishing domains, credential harvesting pages. This infrastructure has patterns:

- Hosting preferences: Many groups favor specific ASNs or hosting providers known for loose abuse policies
- Domain naming: Random generation vs. typosquatting vs. compromised legitimate domains
- SSL certificates: Let's Encrypt certificates issued in bulk, self-signed certs, or stolen/purchased certs
- Port usage: Non-standard ports for C2 (like port 4001 in the Rhysida case)

When you identify one piece of infrastructure, search for these patterns to find related hosts.
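One way to make that pattern search concrete is to compare candidate hosts against a known-bad seed and list what they share. The Python sketch below does that for ASN, certificate issuer, and non-standard ports. The `HostProfile` fields and the "common ports" set are assumptions to tune per case, and the sample hosts use documentation-reserved addresses and ASNs rather than real infrastructure.

```python
from dataclasses import dataclass, field

# Ports treated as unremarkable here; this set is an assumption to tune per case.
COMMON_PORTS = frozenset({22, 25, 53, 80, 443})


@dataclass
class HostProfile:
    ip: str
    asn: str
    cert_issuer: str
    open_ports: set[int] = field(default_factory=set)

    def unusual_ports(self) -> set[int]:
        return self.open_ports - COMMON_PORTS


def shared_traits(seed: HostProfile, candidate: HostProfile) -> list[str]:
    """List the infrastructure traits a candidate has in common with a known-bad seed."""
    traits = []
    if candidate.asn == seed.asn:
        traits.append(f"same ASN ({seed.asn})")
    if candidate.cert_issuer == seed.cert_issuer:
        traits.append(f"same certificate issuer ({seed.cert_issuer})")
    overlap = seed.unusual_ports() & candidate.unusual_ports()
    if overlap:
        traits.append(f"shared non-standard ports {sorted(overlap)}")
    return traits


# Hypothetical hosts using documentation-reserved IP ranges and ASNs.
seed = HostProfile("192.0.2.10", "AS64500", "Let's Encrypt", {22, 80, 4001})
candidates = [
    HostProfile("192.0.2.77", "AS64500", "Let's Encrypt", {22, 4001}),
    HostProfile("198.51.100.5", "AS64511", "DigiCert", {80, 443}),
]
for host in candidates:
    print(host.ip, shared_traits(seed, host))
```

No single trait means much on its own (plenty of legitimate hosts share an ASN or a Let's Encrypt certificate), so it is the overlap of several that is worth a closer look.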

OPSEC Failures They Make

Even sophisticated threat actors slip up:

- Handle reuse: The same username on a hacking forum and a gaming site
- Timezone leaks: Forum post timestamps, last-active indicators, scheduled task timing in malware
- Language artifacts: Cyrillic comments in code, Russian-language error messages, non-English keyboard layouts
- Exposed directories: Unsecured admin panels, directory listings, backup files left in webroot (this is literally how I got into Rhysida's C2)
- Version control: Git repos with full commit history including author emails
- Personal touches: Custom tools with embedded strings, metadata in documents, watermarks
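Timezone leaks are one of the few items on this list you can estimate with arithmetic. Here is a minimal Python sketch, assuming you have collected timezone-aware post timestamps: it finds the quietest eight-hour stretch in UTC and guesses an offset by assuming that stretch starts around 23:00 local time. Both the window length and the assumed bedtime are arbitrary defaults, so read the result as a rough hint and never as a location.

```python
from collections import Counter
from datetime import datetime, timezone


def quietest_window_start(utc_hours: list[int], window: int = 8) -> int:
    """Return the UTC hour that begins the least-active `window`-hour stretch."""
    counts = Counter(h % 24 for h in utc_hours)

    def activity(start: int) -> int:
        return sum(counts[(start + i) % 24] for i in range(window))

    # Ties resolve to the earliest hour, so sparse data gives unstable answers.
    return min(range(24), key=activity)


def estimate_utc_offset(timestamps: list[datetime], assumed_sleep_start: int = 23) -> int:
    """Guess a UTC offset by assuming the quiet stretch begins at `assumed_sleep_start` local time."""
    if not timestamps:
        raise ValueError("need at least one timestamp")
    # Timestamps must be timezone-aware; naive ones would be read as local time.
    hours = [ts.astimezone(timezone.utc).hour for ts in timestamps]
    start = quietest_window_start(hours)
    offset = (assumed_sleep_start - start) % 24
    return offset - 24 if offset > 12 else offset


# Synthetic activity: one post per hour from 04:00 to 19:00 UTC.
posts = [datetime(2024, 1, 1, hour, 30, tzinfo=timezone.utc) for hour in range(4, 20)]
print(estimate_utc_offset(posts))
```

Night-shift workers, scheduled posts, and deliberately misleading activity all break the assumption, so corroborate with language artifacts or other signals before leaning on it.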

Engage vs. Observe

This is a judgment call that depends heavily on context. During the Rhysida investigation, I faced this decision multiple times:

Observe when:

- You're gathering intelligence for law enforcement
- The threat actor doesn't know they're compromised
- You're trying to identify victims who can still be saved
- Attribution is more valuable than disruption

Engage (or hand off) when:

- Active attacks are in progress against identifiable victims
- You have enough information for law enforcement action
- Continued observation risks burning your access
- The ethical calculus favors immediate action

I chose to monitor Rhysida's C2 while notifying victims, then handed everything to the FBI. If I'd reported the C2 to the hosting provider immediately, it would have gone dark and the attackers would have just spun up new infrastructure. The delay allowed us to identify and warn 14+ potential victims.

Working with Law Enforcement

If your investigation yields actionable intelligence:

- Document everything with timestamps
- Preserve evidence in forensically sound ways (hashes, chain of custody)
- Know who to contact: FBI's IC3, your local field office, or relevant CERT teams
- Be prepared to explain how you obtained access (legally, I hope)
- Don't expect immediate action: These things take time

The FBI agents I worked with were professional and grateful for the intel. Your mileage may vary, but in my experience, coming to them with well-documented, legally obtained information is always appreciated.

Red Flags & False Positives

Attribution is hard. Here's where people get it wrong:

Username Collision

"darkn3t_0ps" is not a unique string. I've seen researchers confidently attribute activity to the wrong person because they found a matching username without verifying it was actually the same individual. Always look for:

- Account age consistency
- Overlapping activity patterns
- Multiple corroborating data points
- Writing style and technical knowledge level

The Confirmation Bias Trap

When you're deep in an investigation, you want it to lead somewhere. This can make you see connections that aren't there. I combat this by:

- Playing devil's advocate against my own conclusions
- Asking colleagues to poke holes in my attribution
- Documenting alternative explanations
- Sleeping on it before finalizing conclusions

Verification Techniques

Before you attribute, verify:

- Cross-reference across multiple tools (if Blackbird and Maigret both show the account, it's more likely real)
- Check account age and post history
- Look for independent confirmation (same email in multiple breach databases)
- Consider honeypots and sockpuppets (threat actors read OSINT guides too)
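The cross-tool check is simple to mechanize once you have each tool's hits as a list of profile URLs. A Python sketch follows, assuming you have already pulled those URLs out of the reports yourself; the dict-of-lists input is a made-up layout for the example, not a format either tool produces.

```python
from urllib.parse import urlparse


def site_key(url: str) -> str:
    """Reduce a profile URL to a bare hostname so results from different tools line up."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def corroborate(results: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each site to the sorted list of tools that reported a hit there."""
    seen: dict[str, set[str]] = {}
    for tool, urls in results.items():
        for url in urls:
            key = site_key(url)
            if key:
                seen.setdefault(key, set()).add(tool)
    return {site: sorted(tools) for site, tools in sorted(seen.items())}


# URLs copied by hand from each tool's report; the dict layout is an assumption.
hits = {
    "blackbird": ["https://github.com/threat_actor_handle", "https://t.me/threat_actor_handle"],
    "maigret": ["https://github.com/threat_actor_handle", "https://vk.com/threat_actor_handle"],
}
for site, tools in corroborate(hits).items():
    label = "corroborated" if len(tools) > 1 else "single source"
    print(f"{site}: {label} ({', '.join(tools)})")
```

Two tools agreeing tells you the account probably exists, not that it belongs to your target. The tools may also draw on overlapping site lists, so their agreement is weaker evidence than it looks.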

Legal & Ethical Considerations

I am not a lawyer. This is not legal advice. But here's what I've learned:

Public vs. Private Data

OSINT tools rely on publicly accessible information. "Publicly accessible" is doing a lot of work in that sentence:

- Clearly public: Information posted openly on social media, forum posts, WHOIS records
- Technically public: Data indexed by search engines, exposed directories, unsecured APIs
- Questionably public: Breach data, leaked databases, scraped private profiles

I generally stick to the first two categories. Breach data is useful for correlation but legally murky—I treat it as "confirming information I've found through other means" rather than a starting point.

Active vs. Passive Reconnaissance

There's a spectrum:

- Passive: Reading public posts, querying search engines, checking DNS records
- Active: Port scanning, directory enumeration, attempting authentication
- Unauthorized access: Actually logging into systems you shouldn't (even if the password was sitting in an open directory)

That last one is where I had an interesting ethical moment during the Rhysida investigation. The password to their C2 panel was literally exposed in an unsecured directory. I used it. Was that unauthorized access? Technically, probably. Was it justified given the circumstances? I thought so. Would I do it again? Absolutely. Your ethics may differ.

Documentation for Handoff

If you're doing legitimate threat research with the intent to share with law enforcement:

- Keep detailed, timestamped notes
- Screenshot everything with timestamps
- Hash files to prove they haven't been modified
- Document your methodology (how you found what you found)
- Keep it factual and avoid speculation in your notes
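For the hashing step, a short Python sketch using only the standard library: it walks an evidence folder and records a SHA-256 digest and size for every file, with a UTC timestamp for when the manifest was built. The function names and the JSON layout are illustrative assumptions, so adapt them to whatever format the receiving party prefers.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large captures do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(evidence_dir: Path) -> dict:
    """Record a SHA-256 and size for every file under the evidence directory."""
    files = [p for p in sorted(evidence_dir.rglob("*")) if p.is_file()]
    return {
        "generated_utc": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "files": [
            {
                "path": p.relative_to(evidence_dir).as_posix(),
                "sha256": sha256_file(p),
                "bytes": p.stat().st_size,
            }
            for p in files
        ],
    }


def write_manifest(evidence_dir: Path, out_path: Path) -> None:
    """Save the manifest as JSON; keep out_path outside evidence_dir."""
    out_path.write_text(json.dumps(build_manifest(evidence_dir), indent=2), encoding="utf-8")
```

Write the manifest somewhere outside the evidence folder so it does not end up hashing an earlier copy of itself, and rebuild it later to show that nothing changed in between.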

The packet I sent to the FBI included my full notes, screenshots, network captures, malware samples, and a timeline of my investigation. Overkill? Maybe. But when you're explaining to a federal agent how you ended up inside a ransomware gang's server, "overkill" starts to feel like "appropriate."

Responsible Disclosure

When you find something—a vulnerable target, exposed infrastructure, potential victim—you have decisions to make:

- Notify the victim directly if possible (this is what I did with Rhysida victims)
- Report to relevant authorities if criminal activity is involved
- Consider the timeline: Immediate action vs. gathering more intelligence
- Document everything regardless of which path you choose

There's no universal right answer. Context matters. But doing nothing when you could prevent harm is, in my opinion, also a choice.

Wrapping Up

OSINT for threat research is part art, part science, and part stubborn refusal to let go of a thread. The tools I've covered here are my current toolkit, but the landscape changes constantly. New platforms emerge, old tools break, threat actors adapt.

What doesn't change is the methodology: start with what you know, pivot to what's connected, verify everything twice, and document obsessively. The best OSINT practitioners I know aren't necessarily the most technical—they're the most patient, the most curious, and the most willing to follow a lead even when it seems like a dead end.

If you're just getting started, pick a tool, pick a target (a CTF challenge, a public investigation, your own digital footprint), and start pulling threads. The only way to get good at this is to do it.

And if you ever find yourself staring at a ransomware gang's C2 panel at 2AM wondering how your life led to this moment—well, welcome to threat research. It only gets weirder from here.

Hunting the Giant Centipede - Rhysida - OSINT in action during threat research
Search Tools - Comprehensive tool reference
Ransomware Leak Sites - Threat actor intelligence gathering