Understanding VirusTotal: A Comprehensive Guide

Introduction

In today’s hyper‑connected world, the sheer volume of files, URLs, and executables that cross network perimeters makes it impossible for any single organization to maintain an exhaustive, up‑to‑date signature database. Threat actors constantly mutate their payloads, and new malicious artifacts appear every few minutes. VirusTotal (VT) has emerged as a de‑facto community‑driven hub for aggregating the results of dozens of antivirus engines, URL scanners, and sandboxes into a single, searchable platform.

This guide dives deep into everything you need to know about VirusTotal:

What the service does and how it fits into a modern security stack
The public web interface versus the paid API tiers
Practical, real‑world use cases for analysts, SOC engineers, and developers
Step‑by‑step code examples (Python, Bash, PowerShell) for automating submissions and parsing results
Limitations, privacy considerations, and best‑practice recommendations

By the end of this article you’ll be equipped to integrate VirusTotal into your daily workflow, design automated pipelines for bulk analysis, and understand the nuances that separate a superficial scan from a thorough threat‑intelligence operation.

1. What Is VirusTotal?

VirusTotal is a free online service owned by Google (via Chronicle) that accepts files, URLs, IP addresses, and domain names for analysis. Once submitted, the platform forwards the artifact to a large pool of anti‑malware engines, URL scanners, sandbox environments, and reputation services. The results are then displayed in a unified dashboard, complete with:

Result Type	Typical Providers
Antivirus signatures	Kaspersky, Bitdefender, McAfee, Sophos, etc.
Static hash look‑ups	NSRL, VirusShare
Dynamic sandbox execution	Cuckoo, FireEye, Any.Run
URL reputation	Google Safe Browsing, PhishTank
Reputation scores	Web of Trust (WoT), Cisco Talos
Community comments	User‑generated tags and observations

Because VirusTotal aggregates multiple independent viewpoints, a single detection can be corroborated (or refuted) across many engines, reducing false positives and increasing confidence.

1.1 Public vs. Private (API) Access

Feature	Public Web UI	Paid API (Professional/Enterprise)
Daily submission limit	4 MB per file, 32 MB total per day per IP	Up to 200 MB per file, higher quota per API key
Automation	Manual uploads only	Programmatic access via REST endpoints
Advanced reports	Basic JSON view	Full JSON, PCAP, and sandbox video
Private scanning	No (all submissions are public)	Option for private, non‑shared analysis
Integration	Browser extensions, CLI wrappers	SDKs, SIEM connectors, custom scripts

The public interface is ideal for ad‑hoc analysis, learning, or low‑volume research. For organizations that need high throughput, private sharing, or historical data, the paid API tiers provide the necessary elasticity.

2. Core Concepts and Terminology

Before we jump into hands‑on examples, let’s clarify the key terms you’ll encounter when working with VirusTotal.

Term	Definition
Hash	Cryptographic fingerprint (MD5, SHA‑1, SHA‑256) used to uniquely identify a file.
Scan ID	A unique identifier returned after submitting an artifact; used to retrieve results later.
Analysis ID	The newer nomenclature for the same concept in v3 of the API.
Verdict	The overall classification (malicious, suspicious, clean, undetected) derived from engine votes.
Community score	A rating contributed by VirusTotal users based on their own investigations.
Behaviors	Observed actions during sandbox execution (e.g., file writes, network connections).
YARA rules	Pattern‑matching rules that can be applied to VT’s detections via the “YARA” endpoint (Enterprise).

Understanding these concepts is essential for interpreting the data returned by the platform, especially when building automated pipelines that need to make decisions based on the verdict.

3. Real‑World Use Cases

3.1 Incident Response (IR)

When a suspicious file lands on an endpoint, IR teams often:

Submit the file to VirusTotal.
Correlate the VT verdict with internal IOC (Indicator of Compromise) feeds.
Extract sandbox behavior (e.g., registry changes) to inform containment actions.

Note: Because public submissions are visible to everyone, many IR teams use the private (Enterprise) API to keep malicious samples confidential.

3.2 Threat Hunting

Threat hunters can leverage VT’s massive historical dataset to:

Search for known malicious hashes that have appeared in past attacks.
Identify domains that have been flagged across multiple engines.
Use VT’s “Live Hunting” feature (Enterprise) to run YARA rules against the entire corpus in near‑real time.

3.3 Malware Research & Reverse Engineering

Researchers often:

Pull sandbox execution videos to see how a payload behaves without setting up a local environment.
Retrieve PE metadata (imports, exports) that VT extracts automatically.
Compare static signatures across multiple antivirus vendors to understand detection gaps.

3.4 DevSecOps & CI/CD Pipelines

In modern DevSecOps, it’s common to embed a VT scan into a CI job:

# Example GitHub Actions step
- name: Scan artifact with VirusTotal
  uses: actions/checkout@v3
- run: |
    curl -s --request POST \
      --url https://www.virustotal.com/api/v3/files \
      --header "x-apikey: ${{ secrets.VT_API_KEY }}" \
      --form file=@path/to/artifact.exe \
      -o vt_response.json
  env:
    VT_API_KEY: ${{ secrets.VT_API_KEY }}

If the scan returns a malicious verdict, the pipeline can be configured to fail the build, preventing the artifact from being promoted to production.

4. Getting Started with the VirusTotal API (v3)

VirusTotal’s current API version (v3) is REST‑ful and returns JSON responses. Below we walk through the basic workflow:

Obtain an API key – Sign up at virustotal.com, then navigate to “My API key”.
Upload a file – POST to /files.
Poll for results – GET /analyses/{analysis_id} until the status is completed.
Parse the verdict – Examine malicious, suspicious, harmless counts.

4.1 Prerequisites

Python 3.9+
requests library (pip install requests)
A valid VirusTotal API key (free tier: 4 requests/minute)

4.2 Python Example – Submitting and Retrieving a File Scan

import time
import hashlib
import requests
import json
from pathlib import Path

API_KEY = "YOUR_VT_API_KEY"
BASE_URL = "https://www.virustotal.com/api/v3"

def get_sha256(file_path: Path) -> str:
    """Calculate SHA‑256 hash for a local file."""
    h = hashlib.sha256()
    with file_path.open("rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def upload_file(file_path: Path) -> str:
    """Upload a file to VirusTotal and return the analysis ID."""
    url = f"{BASE_URL}/files"
    headers = {"x-apikey": API_KEY}
    files = {"file": (file_path.name, file_path.open("rb"))}
    response = requests.post(url, headers=headers, files=files)
    response.raise_for_status()
    analysis_id = response.json()["data"]["id"]
    print(f"Uploaded {file_path.name}, analysis ID: {analysis_id}")
    return analysis_id

def poll_analysis(analysis_id: str, interval: int = 15) -> dict:
    """Poll the analysis endpoint until the scan is finished."""
    url = f"{BASE_URL}/analyses/{analysis_id}"
    headers = {"x-apikey": API_KEY}
    while True:
        resp = requests.get(url, headers=headers)
        resp.raise_for_status()
        result = resp.json()
        status = result["data"]["attributes"]["status"]
        if status == "completed":
            return result
        print(f"Analysis {analysis_id} still in progress... waiting {interval}s")
        time.sleep(interval)

def main():
    file_path = Path("sample.exe")
    sha256 = get_sha256(file_path)
    print(f"SHA‑256: {sha256}")

    # First, check if the hash is already known (avoids re‑upload)
    lookup_url = f"{BASE_URL}/files/{sha256}"
    lookup_resp = requests.get(lookup_url, headers={"x-apikey": API_KEY})
    if lookup_resp.status_code == 200:
        print("Hash already present in VT, fetching existing report...")
        report = lookup_resp.json()
    else:
        analysis_id = upload_file(file_path)
        report = poll_analysis(analysis_id)

    # Pretty‑print a summary
    attrs = report["data"]["attributes"]
    stats = attrs["last_analysis_stats"]
    print("\n=== Verdict Summary ===")
    print(json.dumps(stats, indent=2))

    # List engines that flagged the file as malicious
    malicious_engines = [
        engine for engine, result in attrs["last_analysis_results"].items()
        if result["category"] == "malicious"
    ]
    print(f"\nEngines reporting malicious: {len(malicious_engines)}")
    for eng in malicious_engines:
        print(f"- {eng}")

if __name__ == "__main__":
    main()

Explanation of key steps:

Hash pre‑check – Saves API quota by avoiding duplicate uploads.
Polling – VT processes files asynchronously; the script loops until status == "completed".
Result parsing – last_analysis_stats gives a quick tally; last_analysis_results provides per‑engine details.

4.3 Bash / cURL Example – Quick URL Scan

#!/usr/bin/env bash
API_KEY="YOUR_VT_API_KEY"
URL="http://malicious.example.com"

# Submit URL for scanning
response=$(curl -s -X POST "https://www.virustotal.com/api/v3/urls" \
  -H "x-apikey: $API_KEY" \
  --data "url=$URL")

analysis_id=$(echo "$response" | jq -r '.data.id')
echo "Submitted URL, analysis ID: $analysis_id"

# Poll until ready
while true; do
  result=$(curl -s -X GET "https://www.virustotal.com/api/v3/analyses/$analysis_id" \
    -H "x-apikey: $API_KEY")
  status=$(echo "$result" | jq -r '.data.attributes.status')
  if [[ "$status" == "completed" ]]; then
    echo "Analysis complete."
    echo "$result" | jq '.data.attributes.last_analysis_stats'
    break
  else
    echo "Waiting for analysis... (status=$status)"
    sleep 10
  fi
done

Tip: The jq utility is invaluable for extracting fields from JSON responses in shell scripts.

4.4 PowerShell Example – Scanning a File in Windows Environments

# Requires PowerShell 7+ and Invoke-RestMethod
$ApiKey = "YOUR_VT_API_KEY"
$FilePath = "C:\Temp\sample.exe"
$BaseUri = "https://www.virustotal.com/api/v3"

# Compute SHA256
$Hash = Get-FileHash -Path $FilePath -Algorithm SHA256
$Sha256 = $Hash.Hash.ToLower()

# Check if hash already exists
$Lookup = Invoke-RestMethod -Method Get `
    -Uri "$BaseUri/files/$Sha256" `
    -Headers @{ "x-apikey" = $ApiKey } `
    -ErrorAction SilentlyContinue

if ($Lookup) {
    Write-Host "File already known. Displaying existing report..."
    $Report = $Lookup
} else {
    # Upload file
    $Upload = Invoke-RestMethod -Method Post `
        -Uri "$BaseUri/files" `
        -Headers @{ "x-apikey" = $ApiKey } `
        -Form @{ file = Get-Item $FilePath }
    $AnalysisId = $Upload.data.id

    # Poll for result
    do {
        Start-Sleep -Seconds 15
        $Status = Invoke-RestMethod -Method Get `
            -Uri "$BaseUri/analyses/$AnalysisId" `
            -Headers @{ "x-apikey" = $ApiKey }
        $State = $Status.data.attributes.status
        Write-Host "Current status: $State"
    } while ($State -ne "completed")

    $Report = $Status
}

# Summarize
$Stats = $Report.data.attributes.last_analysis_stats
Write-Host "`n=== Verdict Summary ==="
$Stats | ConvertTo-Json -Depth 3

5. Interpreting the Results

A raw VT JSON payload contains a wealth of data. Below we highlight the most actionable sections.

5.1 `last_analysis_stats`

{
  "harmless": 0,
  "malicious": 12,
  "suspicious": 3,
  "undetected": 50,
  "timeout": 0
}

Malicious > 0 – At least one engine flagged the sample.
Suspicious – Engines are unsure; often a heuristic detection.
Undetected – Majority of engines saw nothing; still not a guarantee of safety.

5.2 `last_analysis_results`

Each engine’s detailed verdict is stored here:

{
  "Kaspersky": {
    "category": "malicious",
    "engine_name": "Kaspersky",
    "engine_version": "21.3.0.1085",
    "result": "Trojan.Win32.Agent"
  },
  "Microsoft": {
    "category": "harmless",
    "engine_name": "Microsoft",
    "engine_version": "1.1.18500.11",
    "result": null
  }
}

Result field – Engine‑specific family name (useful for mapping to ATT&CK).
Category – One of malicious, suspicious, harmless, undetected, timeout.

5.3 Sandbox Behaviors (Enterprise)

Enterprise users have access to behavioral graphs and PCAP files that illustrate:

Files created or modified
Registry keys accessed
Network connections (domains, IPs, ports)

These can be exported via the /files/{id}/behaviour endpoint and fed into a YARA or Sigma rule engine for deeper correlation.

5.4 Community Tags and Comments

The public UI shows user‑submitted tags such as #ransomware, #banking-trojan. While not authoritative, they provide context on how the community perceives the sample.

Important: Always cross‑reference community tags with vendor detections; crowdsourced data can be noisy.

6. Best Practices for Using VirusTotal

Never rely on a single engine – Use the aggregate verdict and look for consensus.
Throttle API calls – Respect the rate limits (free tier: 4 req/min). Exceeding limits may result in temporary bans.
Hash‑first workflow – Compute SHA‑256 locally before uploading; this saves quota and speeds up look‑ups.
Sanitize sensitive data – Public submissions are visible to everyone; use the private API for PII or zero‑day samples.
Combine with internal intel – Correlate VT results with your own threat‑intel feeds for richer context.
Log every request – Store request IDs, timestamps, and raw responses for auditability.
Automate remediation – Tie VT verdicts to SOAR playbooks (e.g., automatically isolate a host if a malicious file is detected).

7. Limitations and Caveats

Limitation	Impact	Mitigation
Public submissions are shared	Sensitive samples could be exposed	Use Enterprise private scans or self‑hosted sandbox for zero‑day analysis
Engine coverage varies	Some niche malware families may be missed	Supplement VT with specialized sandboxes (e.g., Cuckoo, FireEye)
Rate limits	Free tier may be insufficient for high‑volume SOCs	Purchase Professional/Enterprise API key
False positives/negatives	Engines disagree; some may flag benign software	Perform manual triage on borderline cases
File size cap (public)	4 MB limit may prevent scanning large installers	Split large binaries, or use the paid tier which supports up to 200 MB

Being aware of these constraints helps you design a defense‑in‑depth approach where VirusTotal is a critical component, not the sole source of truth.

8. Advanced Topics

8.1 Bulk Hunting with the “Live Hunting” Feature (Enterprise)

Live Hunting lets you run YARA or Sigma queries against the entire VirusTotal dataset in near‑real time. Example YARA rule to find any PE executable that imports WinInet.dll:

rule WinInet_Import {
    meta:
        description = "Detects binaries importing WinInet"
        author = "YourName"
    condition:
        pe.imports("WinInet.dll")
}

You submit the rule via the /intelligence/hunting endpoint; VT returns a list of matching hash IDs, which you can then download for deeper analysis.

8.2 Integrating with SIEMs (Splunk, Elastic, QRadar)

Most SIEM platforms have pre‑built VirusTotal connectors. The typical flow:

Forward file hashes from endpoint logs to a lookup script.
The script queries VT (cached locally for speed).
Enrich the event with malicious count and engine list.
Alert if malicious >= 3 and the file originated from an untrusted source.

8.3 Using VT as a Threat‑Intel Source for ATT&CK Mapping

When a VT report flags a sample as Trojan.Win32.Agent, you can map it to ATT&CK technique T1059 – Command and Scripting Interpreter (if the sandbox shows PowerShell execution). Automating this mapping creates richer detection rules for your EDR.

# Pseudocode for ATT&CK mapping
if "PowerShell" in behaviors["processes"]:
    technique = "T1059.001"
elif "cmd.exe" in behaviors["processes"]:
    technique = "T1059.003"

9. Frequently Asked Questions (FAQ)

Question	Answer
Can I scan a URL without uploading a file?	Yes. Use the `/urls` endpoint (public) or the `/intelligence/search` endpoint (Enterprise) to submit a URL directly.
What happens to my uploaded file?	VirusTotal stores the file for 90 days (public) or 365 days (Enterprise) for re‑analysis and community sharing.
Is there a limit on the number of analyses per day?	Free tier: 4 requests/minute, 500 requests per day. Paid tiers increase limits dramatically (up to 10 k/min for Enterprise).
Can I delete a file I uploaded?	Public uploads cannot be deleted. Enterprise customers can request removal via the support portal.
Do antivirus vendors rely on VirusTotal?	Many vendors use VT as a source of community intel, but they also run their own proprietary detection pipelines.

Conclusion

VirusTotal has evolved from a simple “upload‑and‑see” web tool into a cornerstone of modern cyber‑defense. Its ability to aggregate dozens of detection engines, provide sandbox behavior, and expose a robust API makes it indispensable for:

Incident responders seeking rapid triage
Threat hunters looking for historical context
DevSecOps engineers automating security gates in CI/CD pipelines
Researchers needing quick access to a massive sample repository

However, the platform is not a silver bullet. Effective use requires understanding its limits, respecting privacy considerations, and combining VT data with internal telemetry. By following the best practices outlined in this guide—hash‑first checks, proper rate‑limiting, automated enrichment, and thoughtful remediation—you can turn VirusTotal from a passive lookup service into an active, programmable intelligence engine that strengthens your organization’s security posture.

Stay curious, stay vigilant, and let VirusTotal be a trusted ally in the ever‑changing battle against malware.

Resources

VirusTotal Official Site – Main portal for file/URL scanning and documentation.
VirusTotal API v3 Documentation – Detailed reference for all endpoints, request limits, and response schemas.
MITRE ATT&CK Framework – Mapping VT behavioral data to adversary tactics and techniques.
Google Chronicle Blog – Threat Intelligence at Scale – Insights on how Google integrates VirusTotal into broader threat‑intel pipelines.
FireEye Blog – Sandboxing vs. Static Scanning – Comparative analysis useful when supplementing VT with dedicated sandboxes.

Introduction#

1. What Is VirusTotal?#

1.1 Public vs. Private (API) Access#

2. Core Concepts and Terminology#

3. Real‑World Use Cases#

3.1 Incident Response (IR)#

3.2 Threat Hunting#

3.3 Malware Research & Reverse Engineering#

3.4 DevSecOps & CI/CD Pipelines#

4. Getting Started with the VirusTotal API (v3)#

4.1 Prerequisites#

4.2 Python Example – Submitting and Retrieving a File Scan#

4.3 Bash / cURL Example – Quick URL Scan#

4.4 PowerShell Example – Scanning a File in Windows Environments#

5. Interpreting the Results#

5.1 last_analysis_stats#

5.2 last_analysis_results#

5.3 Sandbox Behaviors (Enterprise)#

5.4 Community Tags and Comments#

6. Best Practices for Using VirusTotal#

7. Limitations and Caveats#

8. Advanced Topics#

8.1 Bulk Hunting with the “Live Hunting” Feature (Enterprise)#

8.2 Integrating with SIEMs (Splunk, Elastic, QRadar)#

8.3 Using VT as a Threat‑Intel Source for ATT&CK Mapping#

9. Frequently Asked Questions (FAQ)#

Conclusion#

Resources#