OSINT Methodology
#writeup #infosec #osint #methodology #reference #published
Introduction
If you've read my Rhysida writeup, you know I ended up inside a ransomware gang's command-and-control server. But what I glossed over was just how much OSINT work happened between "I have an IP address" and "I'm on a call with the FBI." The truth is, most threat research isn't elegant zero-day exploitation or Hollywood-style hacking—it's staring at WHOIS records at 2AM and wondering if you're chasing ghosts.
I've been doing some flavor of OSINT for as long as I can remember. Before it had a fancy acronym, we just called it "being nosy on the internet." These days I focus that nosiness on threat actors, infrastructure, and the occasional mystery server beaconing suspicious traffic across our client networks. This guide is everything I wish someone had handed me years ago.
Fair warning: this is about hunting threat actors, not people. If you're looking for a guide on stalking your ex, close this tab and seek therapy.
The OSINT Mindset
Before we get into the tools, let's talk philosophy. The best OSINT practitioners I've met share a few traits that have nothing to do with technical skill:
Start with what you know, expand outward. Every investigation begins with a single data point—an IP, a username, an email address. Your job is to pivot from that into connected data points until you've built a picture. It's less "finding a needle in a haystack" and more "following a thread through a labyrinth." Sometimes that thread leads somewhere interesting. Sometimes it ends in a wall. Both are valid outcomes.
Document everything. I cannot stress this enough. You will absolutely thank yourself later when you need to explain to a federal agent how you ended up where you ended up. I keep running notes in Obsidian during every investigation, timestamped, with screenshots. It feels tedious until the third time you need to reference something from two weeks ago and it's right there waiting for you.
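One way to keep the habit cheap is to script it. The sketch below appends a UTC-timestamped line to a Markdown file that Obsidian (or any editor) can open. The helper name `log_note` and the file layout are assumptions for illustration, not part of any tool mentioned here.

```python
from datetime import datetime, timezone
from pathlib import Path


def log_note(notes_file: Path, text: str, source: str = "") -> str:
    """Append one UTC-timestamped entry to a Markdown notes file and return it."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    suffix = f" (source: {source})" if source else ""
    entry = f"- {stamp} {text}{suffix}"
    # Create the case folder on first use so logging never fails mid-investigation.
    notes_file.parent.mkdir(parents=True, exist_ok=True)
    with notes_file.open("a", encoding="utf-8") as handle:
        handle.write(entry + "\n")
    return entry
```

Calling something like `log_note(Path("cases/example/notes.md"), "Domain resolves to the flagged IP", source="passive DNS")` after each pivot gives you an append-only timeline. Using UTC avoids arguments later about which timezone a note was written in.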
Verify, verify, verify. False positives are the enemy. Finding a username that matches your target on some obscure forum feels great until you realize it's a completely different person in a different country. Attribution is hard. Cross-reference everything. When I identified victims in the Rhysida investigation, I didn't just trust the IP geolocation—I ran WHOIS, checked the ASN, found the organization's public information, and then made contact.
OPSEC isn't optional. When you're poking around threat actor infrastructure, you need to assume they might notice. Use VMs, use VPNs (reputable ones), and consider sock puppet accounts for any social engineering. I do all my threat research from isolated VMs that get torched regularly. Yes, this is paranoid. Yes, I've seen researchers get burned by not being paranoid enough.
The Pivot Chain
The concept of pivoting is the backbone of OSINT. You take one piece of information and use it to discover related information, building a chain of connections. Let me walk through a few common patterns:
Email → Everything
suspicious@gmail.com
↓
[Holehe] → Accounts registered: Twitter, Instagram, GitHub, Discord
↓
[GHunt] → Google account details, YouTube channel, Drive activity
↓
[osint.industries] → Full profile with timeline, linked phone numbers
↓
Username "sk3ptic_h4ck" found across platforms
↓
[Blackbird/Maigret] → Same username on hacking forums, paste sites
↓
Forum posts reference "my server at home"
↓
Correlated with infrastructure from original investigation
Username → Identity
"darkn3t_0ps" seen in C2 panel
↓
[Maigret] → Username found on 47 sites
↓
Steam profile with linked Discord
↓
Discord server with other members using similar handles
↓
Cross-reference with breach data
↓
Email address recovered from 2019 breach
↓
[Epieos] → Phone number associated with email
↓
Continue pivoting...
The key insight here is that threat actors are human, and humans are lazy. They reuse handles, they forget to compartmentalize, they have personal accounts that touch operational accounts. Every OPSEC failure is an opportunity.
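To make the idea concrete, here is a minimal sketch of a pivot chain as a small directed graph, where every edge remembers which tool produced it. `PivotGraph`, `add_pivot`, and `chain` are hypothetical names for illustration; the handles are the made-up ones from the examples above.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class PivotGraph:
    """Directed graph of findings; each edge records the tool that produced it."""
    edges: dict[str, list[tuple[str, str]]] = field(default_factory=dict)

    def add_pivot(self, source: str, target: str, tool: str) -> None:
        self.edges.setdefault(source, []).append((target, tool))
        self.edges.setdefault(target, [])

    def chain(self, seed: str, finding: str) -> list[str]:
        """Return the shortest pivot chain from seed to finding, or [] if none."""
        queue = deque([(seed, [seed])])
        seen = {seed}
        while queue:  # breadth-first search, so the first hit is the shortest chain
            node, path = queue.popleft()
            if node == finding:
                return path
            for target, tool in self.edges.get(node, []):
                if target not in seen:
                    seen.add(target)
                    queue.append((target, path + [f"[{tool}]", target]))
        return []


graph = PivotGraph()
graph.add_pivot("suspicious@gmail.com", "sk3ptic_h4ck", "Holehe")
graph.add_pivot("sk3ptic_h4ck", "forum post", "Maigret")
print(" -> ".join(graph.chain("suspicious@gmail.com", "forum post")))
```

Recording the tool on each edge means the chain doubles as documentation of how you got from the seed to the finding.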
Tool Arsenal
Now for the fun part. These are the tools I actually use, not just ones that look good on a GitHub awesome-list. I'll give you the honest rundown on each.
Username & Email Investigation
| Tool | Purpose | Notes |
|---|---|---|
| Blackbird | Username search across 600+ sites | WhatsMyName integration, async requests, exports to PDF/JSON. My go-to for initial username enumeration. |
| Maigret | Username search across 3000+ sites | Fork of Sherlock on steroids. Has a web interface and Telegram bot if you're into that. More comprehensive than Blackbird but slower. |
| Holehe | Email account checker | Uses password reset functions to identify registered accounts. Covers 120+ sites. Stealthy—doesn't trigger login alerts. |
| Epieos | Reverse email/phone lookup | Checks 140+ services, has Maltego integration. Good for connecting emails to phone numbers and vice versa. |
| osint.industries | Comprehensive lookup platform | 1500+ sources with timeline visualization. Trusted by law enforcement. Not free, but worth it for serious work. |
Blackbird in Action
$ python blackbird.py -u "threat_actor_handle" --pdf
[*] Searching 634 sites for username: threat_actor_handle
[+] GitHub: https://github.com/threat_actor_handle
[+] Twitter: https://twitter.com/threat_actor_handle
[+] Keybase: https://keybase.io/threat_actor_handle
[+] HackTheBox: https://app.hackthebox.com/users/threat_actor_handle
[+] Telegram: https://t.me/threat_actor_handle
[*] Search complete. Found 5 accounts.
[*] Report saved to: threat_actor_handle_report.pdf
Blackbird is fast and hits the sites that matter. I run it first on any username I'm investigating. The PDF reports are particularly useful when you need to share findings with others.
Maigret in Action
$ maigret threat_actor_handle --all-sites --html --folderoutput ./investigation/
[*] Checking 3127 sites...
[+] VK: https://vk.com/threat_actor_handle (confidence: high)
[+] GitHub: https://github.com/threat_actor_handle (confidence: high)
[+] RuTracker: https://rutracker.org/forum/profile.php?mode=viewprofile&u=threat_actor_handle (confidence: medium)
[+] XSS.is: https://xss.is/members/?username=threat_actor_handle (confidence: medium)
[+] Exploit.in: https://exploit.in/members/?username=threat_actor_handle (confidence: medium)
[*] Found 47 accounts across sites.
[*] HTML report generated: ./investigation/report.html
Maigret casts a wider net and often catches things Blackbird misses—especially on Russian and Eastern European platforms, which is relevant when you're hunting certain threat actor populations. The confidence scores help filter noise.
Holehe for Email Enumeration
$ holehe suspicious.email@gmail.com
[+] Twitter: Account exists
[+] Instagram: Account exists
[+] Discord: Account exists
[+] Spotify: Account exists
[+] GitHub: Account exists
[-] Facebook: No account
[-] LinkedIn: No account
[+] Adobe: Account exists
[+] Duolingo: Account exists
[*] 7 accounts found for suspicious.email@gmail.com
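The beautiful thing about Holehe is that it's quiet from the target's side. It uses password reset functionality to check whether accounts exist without triggering any "someone tried to log in" notifications. It still sends requests to every service it checks, so it isn't passive in the strict sense; run it from your research environment. Stealthy enumeration is best enumeration.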
Google-Specific Tools
| Tool | Purpose | Notes |
|---|---|---|
| GHunt | Google account investigation | Gmail → YouTube, Drive activity, Google Reviews, device info. Incredible depth. |
GHunt in Action
If your target is using Gmail, GHunt is absurdly powerful:
$ ghunt email suspicious.threat.actor@gmail.com
[+] Email: suspicious.threat.actor@gmail.com
[+] Google ID: 1234567890123456789
[+] Last Profile Edit: 2024-11-15
[YouTube]
[+] Channel: https://youtube.com/channel/UC...
[+] Subscriptions: Public (47 channels)
[+] Liked Videos: Public
[Google Maps]
[+] Reviews: 23 reviews found
[+] Locations reviewed: Phoenix AZ, Tempe AZ, Moscow RU
[Google Calendar]
[+] Calendar found: "Work Schedule" (public)
[Device Info]
[+] Pixel 7 Pro - Last active 2 hours ago
[+] Windows 11 - Last active 3 days ago
The location data from Google Reviews alone has helped me narrow down threat actor geography more than once. People review their local coffee shops, gyms, and restaurants without thinking about what that reveals.
Specialized & Niche Tools
| Tool | Purpose | Notes |
|---|---|---|
| Cupidcr4wl | Dating/adult site searches | Username and phone searches on adult platforms. Niche, but occasionally invaluable. |
I'm not going to pretend this one isn't awkward to explain, but threat actors are people with personal lives. I've seen cases where OPSEC-conscious criminals compartmentalized everything... except their dating profiles. The username they'd never use anywhere else? Also their Tinder handle. Humans gonna human.
Automation & Scale
| Tool | Purpose | Notes |
|---|---|---|
| SpiderFoot | OSINT automation | 200+ modules, passive/active scanning, correlation rules. Essential for bulk work. |
SpiderFoot for Infrastructure Mapping
When you're investigating threat actor infrastructure rather than individuals, SpiderFoot is your best friend:
$ spiderfoot -s 66.85.173.11 -u all -o csv
[*] Starting scan of 66.85.173.11
[*] Loading 200+ modules...
[DNS_HOST] → patterson.pureskin.cloud resolves to 66.85.173.11
[NETBLOCK_OWNER] → Cloudie Limited (AS398101)
[GEOIP] → Tempe, Arizona, United States
[SSL_CERT] → CN=patterson.pureskin.cloud, issued 2024-12-15
[SHODAN] → Open ports: 22, 80, 4001, 4321
[VIRUSTOTAL] → 3/90 engines flagged as malicious
[LINKED_DOMAINS] → 12 other domains on same IP
[HISTORICAL_WHOIS] → Previously registered to different entity in 2023
[*] Scan complete. 847 data points collected.
[*] Results exported to: investigation_66.85.173.11.csv
During the Rhysida investigation, I used SpiderFoot to map related infrastructure after identifying the initial C2. It found related domains, historical DNS data, and connections I would have missed manually.
Threat Actor Research Workflows
Let me walk through some generalized workflows based on common starting points. These aren't hypothetical; they're patterns I've actually used.
Starting from an IOC (IP/Domain)
When you've got infrastructure—maybe from a SIEM alert, firewall logs, or malware analysis—the workflow looks like this:
- Initial Enumeration
  - WHOIS lookup for registration details
  - Passive DNS for historical resolutions
  - Shodan/Censys for exposed services
  - SSL certificate transparency logs
- Infrastructure Mapping
  - SpiderFoot scan for related hosts
  - Reverse DNS for shared infrastructure
  - ASN analysis for patterns
- Attribution Pivot
  - WHOIS registrant email → Holehe/Epieos
  - DNS admin email → Cross-reference with breach data
  - Certificate subjects → Related domains
This is exactly how the Rhysida investigation progressed. A single IP from firewall logs led to a domain, which led to exposed directories, which led to a C2 panel, which led to victim identification.
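The certificate transparency step is easy to script. Below is a rough sketch that turns certificate transparency search results into a deduplicated hostname list. It assumes the JSON output of crt.sh, where each record carries a newline-separated `name_value` field. That field name and the query URL are my assumptions about the service, so check them against a live response before relying on this. `fetch_ct_records` and `hostnames_from_ct` are hypothetical helper names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch_ct_records(domain: str, timeout: float = 30.0) -> list[dict]:
    """Query crt.sh for certificates matching the domain (makes a network request)."""
    url = "https://crt.sh/?" + urlencode({"q": domain, "output": "json"})
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def hostnames_from_ct(records: list[dict]) -> list[str]:
    """Collect unique hostnames from the name_value field of each record."""
    names = set()
    for record in records:
        for name in record.get("name_value", "").splitlines():
            name = name.strip().lower()
            if name.startswith("*."):
                name = name[2:]  # treat wildcard entries as their base domain
            if name:
                names.add(name)
    return sorted(names)


# Offline example with hand-written records shaped like the assumed response.
sample = [{"name_value": "patterson.pureskin.cloud\n*.pureskin.cloud"}]
print(hostnames_from_ct(sample))
```

Each hostname that comes back is a new candidate for the passive DNS and Shodan/Censys steps. Keeping the parser separate from the fetch lets you rerun it against saved responses in your case notes.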
Starting from a Username/Handle
When you've spotted a handle in a C2 panel, forum post, or malware sample:
- Account Enumeration
  - Blackbird for quick hits
  - Maigret for deep enumeration
  - Manual checking of relevant hacking forums (XSS.is, Exploit.in, etc.)
- Profile Mining
  - Historical posts for technical details
  - Personal information leaks (timezone references, language, location mentions)
  - Linked accounts or alternate handles mentioned
- Email Recovery
  - Password reset flows on identified accounts
  - Breach data correlation
  - Forum registration email (sometimes leaked)
- Infrastructure Correlation
  - Domains/IPs mentioned in posts
  - Code repositories with commits
  - Personal projects that reveal more
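For the code repository step, commit metadata is often the quickest win. The sketch below counts author identities in a cloned repository. It relies on the `%an` and `%ae` placeholders of `git log --format`; the helper names `read_git_authors` and `count_author_identities` are hypothetical, and the sample log is invented.

```python
import re
import subprocess
from collections import Counter

AUTHOR_LINE = re.compile(r"^(?P<name>.*?)\s*<(?P<email>[^<>]+)>$")


def read_git_authors(repo_path: str) -> str:
    """Return one 'Name <email>' line per commit (runs git on a local clone)."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "--all", "--format=%an <%ae>"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def count_author_identities(log_output: str) -> Counter:
    """Count commits per (name, lowercased email) pair."""
    counts = Counter()
    for line in log_output.splitlines():
        match = AUTHOR_LINE.match(line.strip())
        if match:
            counts[(match["name"], match["email"].lower())] += 1
    return counts


# Invented sample output; with a real clone, pass read_git_authors(path) instead.
sample_log = "dev <dev@example.com>\ndev <Dev@Example.com>\nops <ops@example.org>\n"
for (name, email), commits in count_author_identities(sample_log).most_common():
    print(f"{commits:3d}  {name}  {email}")
```

A personal address that appears on only one or two early commits is the kind of slip worth noting, but treat it as a lead to verify rather than an attribution.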
Starting from an Email Address
This is often the most productive starting point:
suspicious.actor@gmail.com
↓
[Holehe] ─────────────────────────→ Account list
↓
[GHunt] ──────────────────────────→ Google ecosystem data
↓
[Epieos] ─────────────────────────→ Linked phone number
↓
[osint.industries] ───────────────→ Full profile timeline
↓
[Breach databases] ───────────────→ Passwords, additional emails
↓
Password analysis ────────────────→ Patterns reveal identity
Password patterns from breach data are underrated. People who use fluffy2019! as their password often have a cat named Fluffy and created the account in 2019. That's two more data points to pivot from.
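As a small illustration of that kind of password analysis, the sketch below splits a password into word-like and year-like tokens. `password_tokens` is a hypothetical helper, and the three-letter minimum and the 1900-2099 year range are arbitrary choices on my part.

```python
import re


def password_tokens(password: str) -> dict[str, list]:
    """Split a password into alphabetic words and plausible four-digit years."""
    # Runs of three or more letters are treated as candidate words or names.
    words = [w.lower() for w in re.findall(r"[A-Za-z]{3,}", password)]
    # Standalone four-digit numbers starting with 19 or 20 are treated as years.
    years = [int(y) for y in re.findall(r"(?<!\d)(?:19|20)\d{2}(?!\d)", password)]
    return {"words": words, "years": years}


print(password_tokens("fluffy2019!"))  # {'words': ['fluffy'], 'years': [2019]}
```

The tokens are hints, not facts. A year might be a birth year, an account creation date, or nothing at all, so they only become useful once something else corroborates them.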
Threat Actor Specific Considerations
Hunting threat actors is different from general OSINT work. A few things I've learned:
Infrastructure Patterns
Threat actors need infrastructure: C2 servers, phishing domains, credential harvesting pages. This infrastructure has patterns:
- Hosting preferences: Many groups favor specific ASNs or hosting providers known for loose abuse policies
- Domain naming: Random generation vs. typosquatting vs. compromised legitimate domains
- SSL certificates: Let's Encrypt certificates issued in bulk, self-signed certs, or stolen/purchased certs
- Port usage: Non-standard ports for C2 (like port 4001 in the Rhysida case)
When you identify one piece of infrastructure, search for these patterns to find related hosts.
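One simple way to search for these patterns is to score candidate hosts against the one you already know. The sketch below uses Jaccard similarity over a few features (ASN, open ports, certificate issuer). The `HostProfile` fields and the feature choice are assumptions for illustration. The known host reuses the IP, ASN, and ports from the SpiderFoot example, while the issuer value and both candidates are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostProfile:
    ip: str
    asn: str
    ports: frozenset[int]
    cert_issuer: str

    def features(self) -> set[str]:
        """Flatten the profile into comparable feature strings."""
        feats = {f"asn:{self.asn}", f"issuer:{self.cert_issuer}"}
        feats.update(f"port:{p}" for p in self.ports)
        return feats


def similarity(a: HostProfile, b: HostProfile) -> float:
    """Jaccard similarity: shared features divided by all features seen."""
    fa, fb = a.features(), b.features()
    return len(fa & fb) / len(fa | fb)


# "Example CA" is a placeholder issuer; the candidates use documentation ranges.
known = HostProfile("66.85.173.11", "AS398101", frozenset({22, 80, 4001, 4321}), "Example CA")
candidates = [
    HostProfile("203.0.113.10", "AS398101", frozenset({22, 4001}), "Example CA"),
    HostProfile("198.51.100.7", "AS64500", frozenset({80, 443}), "Other CA"),
]
for host in sorted(candidates, key=lambda h: similarity(known, h), reverse=True):
    print(f"{host.ip}: {similarity(known, host):.2f}")
```

A high score only tells you where to look next. Shared hosting and common ports produce plenty of coincidental overlap, so every match still needs manual verification.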
OPSEC Failures They Make
Even sophisticated threat actors slip up:
- Handle reuse: The same username on a hacking forum and a gaming site
- Timezone leaks: Forum post timestamps, last-active indicators, scheduled task timing in malware
- Language artifacts: Cyrillic comments in code, Russian-language error messages, non-English keyboard layouts
- Exposed directories: Unsecured admin panels, directory listings, backup files left in webroot (this is literally how I got into Rhysida's C2)
- Version control: Git repos with full commit history including author emails
- Personal touches: Custom tools with embedded strings, metadata in documents, watermarks
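Timezone leaks in particular lend themselves to a quick script. The sketch below builds an hour-of-day histogram from post timestamps and finds the quietest stretch, which may correspond to when the account owner sleeps. The function names and the six-hour window are assumptions, and the timestamps are synthetic.

```python
from collections import Counter
from datetime import datetime, timezone
from typing import Iterable


def hour_histogram(timestamps: Iterable[datetime]) -> list[int]:
    """Count posts per UTC hour; timestamps must be timezone-aware."""
    counts = Counter(ts.astimezone(timezone.utc).hour for ts in timestamps)
    return [counts.get(hour, 0) for hour in range(24)]


def quietest_window(histogram: list[int], width: int = 6) -> int:
    """Return the UTC start hour of the contiguous window with the fewest posts."""
    best_start, best_total = 0, None
    for start in range(24):
        # Wrap around midnight so windows like 23:00-05:00 are considered.
        total = sum(histogram[(start + i) % 24] for i in range(width))
        if best_total is None or total < best_total:
            best_start, best_total = start, total
    return best_start


# Synthetic data: two days of posts at every hour from 05:00 through 22:00 UTC.
posts = [datetime(2024, 11, day, hour, 15, tzinfo=timezone.utc)
         for day in (1, 2) for hour in range(5, 23)]
start = quietest_window(hour_histogram(posts))
print(f"Quietest six hours start at {start:02d}:00 UTC")
```

This needs a decent number of posts to mean anything, and night-shift workers, scheduled posts, and shared accounts will all skew it. Treat the result as one weak signal to combine with language and location clues.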
Engage vs. Observe
This is a judgment call that depends heavily on context. During the Rhysida investigation, I faced this decision multiple times:
Observe when:
- You're gathering intelligence for law enforcement
- The threat actor doesn't know they're compromised
- You're trying to identify victims who can still be saved
- Attribution is more valuable than disruption
Engage (or hand off) when:
- Active attacks are in progress against identifiable victims
- You have enough information for law enforcement action
- Continued observation risks burning your access
- The ethical calculus favors immediate action
I chose to monitor Rhysida's C2 while notifying victims, then handed everything to the FBI. If I'd reported the C2 to the hosting provider immediately, it would have gone dark and the attackers would have just spun up new infrastructure. The delay allowed us to identify and warn 14+ potential victims.
Working with Law Enforcement
If your investigation yields actionable intelligence:
- Document everything with timestamps
- Preserve evidence in forensically sound ways (hashes, chain of custody)
- Know who to contact: FBI's IC3, your local field office, or relevant CERT teams
- Be prepared to explain how you obtained access (legally, I hope)
- Don't expect immediate action: These things take time
The FBI agents I worked with were professional and grateful for the intel. Your mileage may vary, but in my experience, coming to them with well-documented, legally obtained information is always appreciated.
Red Flags & False Positives
Attribution is hard. Here's where people get it wrong:
Username Collision
"darkn3t_0ps" is not a unique string. I've seen researchers confidently attribute activity to the wrong person because they found a matching username without verifying it was actually the same individual. Always look for:
- Account age consistency
- Overlapping activity patterns
- Multiple corroborating data points
- Writing style and technical knowledge level
The Confirmation Bias Trap
When you're deep in an investigation, you want it to lead somewhere. This can make you see connections that aren't there. I combat this by:
- Playing devil's advocate against my own conclusions
- Asking colleagues to poke holes in my attribution
- Documenting alternative explanations
- Sleeping on it before finalizing conclusions
Verification Techniques
Before you attribute, verify:
- Cross-reference across multiple tools (if Blackbird and Maigret both show the account, it's more likely real)
- Check account age and post history
- Look for independent confirmation (same email in multiple breach databases)
- Consider honeypots and sockpuppets (threat actors read OSINT guides too)
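The cross-referencing step can be sketched in a few lines. The example below takes the sites each tool reported and keeps only those seen by at least two tools. The input format (tool name mapped to a set of site names) is an assumption; in practice you would build it from each tool's exported report. The site lists are invented.

```python
def cross_reference(results: dict[str, set[str]]) -> dict[str, list[str]]:
    """Map each site (lowercased) to the sorted list of tools that reported it."""
    by_site: dict[str, set[str]] = {}
    for tool, sites in results.items():
        for site in sites:
            by_site.setdefault(site.strip().lower(), set()).add(tool)
    return {site: sorted(tools) for site, tools in sorted(by_site.items())}


def corroborated(results: dict[str, set[str]], minimum_tools: int = 2) -> list[str]:
    """Return sites reported by at least minimum_tools different tools."""
    return [site for site, tools in cross_reference(results).items()
            if len(tools) >= minimum_tools]


hits = {
    "Blackbird": {"GitHub", "Twitter", "Keybase"},
    "Maigret": {"github", "VK", "twitter"},
}
print(corroborated(hits))  # ['github', 'twitter']
```

Agreement between tools only confirms that an account with that name exists. It says nothing about whether the account belongs to your target, so the other checks in the list still apply.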
Legal & Ethical Considerations
I am not a lawyer. This is not legal advice. But here's what I've learned:
Public vs. Private Data
OSINT tools rely on publicly accessible information. "Publicly accessible" is doing a lot of work in that sentence:
- Clearly public: Information posted openly on social media, forum posts, WHOIS records
- Technically public: Data indexed by search engines, exposed directories, unsecured APIs
- Questionably public: Breach data, leaked databases, scraped private profiles
I generally stick to the first two categories. Breach data is useful for correlation but legally murky—I treat it as "confirming information I've found through other means" rather than a starting point.
Active vs. Passive Reconnaissance
There's a spectrum:
- Passive: Reading public posts, querying search engines, checking DNS records
- Active: Port scanning, directory enumeration, attempting authentication
- Unauthorized access: Actually logging into systems you shouldn't (even if the password was sitting in an open directory)
That last one is where I had an interesting ethical moment during the Rhysida investigation. The password to their C2 panel was literally exposed in an unsecured directory. I used it. Was that unauthorized access? Technically, probably. Was it justified given the circumstances? I thought so. Would I do it again? Absolutely. Your ethics may differ.
Documentation for Handoff
If you're doing legitimate threat research with the intent to share with law enforcement:
- Keep detailed, timestamped notes
- Screenshot everything with timestamps
- Hash files to prove they haven't been modified
- Document your methodology (how you found what you found)
- Keep it factual and avoid speculation in your notes
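The hashing step from the list above is worth automating. Here is a minimal sketch that builds a SHA-256 manifest for an evidence folder and later reports anything that changed. The function names and the folder layout are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large captures don't need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(evidence_dir: Path) -> dict[str, str]:
    """Map each file's relative path to its SHA-256 digest."""
    return {
        path.relative_to(evidence_dir).as_posix(): sha256_file(path)
        for path in sorted(evidence_dir.rglob("*"))
        if path.is_file()
    }


def changed_files(evidence_dir: Path, manifest: dict[str, str]) -> list[str]:
    """List files that were added, removed, or modified since the manifest."""
    current = build_manifest(evidence_dir)
    return sorted(
        name for name in manifest.keys() | current.keys()
        if manifest.get(name) != current.get(name)
    )
```

Build the manifest as soon as you collect the files and store a copy outside the evidence folder, ideally alongside your timestamped notes. An empty result from `changed_files` later on is a simple way to show nothing was altered in between.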
The packet I sent to the FBI included my full notes, screenshots, network captures, malware samples, and a timeline of my investigation. Overkill? Maybe. But when you're explaining to a federal agent how you ended up inside a ransomware gang's server, "overkill" starts to feel like "appropriate."
Responsible Disclosure
When you find something—a vulnerable target, exposed infrastructure, potential victim—you have decisions to make:
- Notify the victim directly if possible (this is what I did with Rhysida victims)
- Report to relevant authorities if criminal activity is involved
- Consider the timeline: Immediate action vs. gathering more intelligence
- Document everything regardless of which path you choose
There's no universal right answer. Context matters. But doing nothing when you could prevent harm is, in my opinion, also a choice.
Wrapping Up
OSINT for threat research is part art, part science, and part stubborn refusal to let go of a thread. The tools I've covered here are my current toolkit, but the landscape changes constantly. New platforms emerge, old tools break, threat actors adapt.
What doesn't change is the methodology: start with what you know, pivot to what's connected, verify everything twice, and document obsessively. The best OSINT practitioners I know aren't necessarily the most technical—they're the most patient, the most curious, and the most willing to follow a lead even when it seems like a dead end.
If you're just getting started, pick a tool, pick a target (a CTF challenge, a public investigation, your own digital footprint), and start pulling threads. The only way to get good at this is to do it.
And if you ever find yourself staring at a ransomware gang's C2 panel at 2AM wondering how your life led to this moment—well, welcome to threat research. It only gets weirder from here.
Related
- Hunting the Giant Centipede - Rhysida - OSINT in action during threat research
- Search Tools - Comprehensive tool reference
- Ransomware Leak Sites - Threat actor intelligence gathering