Table of Contents
- Introduction
- Installing wget
- Basic Usage
- Advanced Options
- Scripting with wget
- Common Pitfalls & Troubleshooting
- wget vs. curl: When to Use Which?
- Real‑World Use Cases
- Security Considerations
10 Conclusion
11 Resources
Introduction
wget—short for World Wide Web GET—is a powerful, non‑interactive command‑line utility designed to retrieve files from the Internet using HTTP, HTTPS, and FTP protocols. Since its first release in 1996 as part of the GNU Project, wget has become a staple in the toolbox of system administrators, developers, DevOps engineers, and hobbyist power users alike.
Why does wget remain relevant in an era dominated by graphical download managers and sophisticated APIs? The answer lies in its simplicity, robustness, and automation‑friendly design:
- Non‑interactive: Once launched,
wgetcan run unattended, making it ideal for cron jobs, CI pipelines, and remote servers without a graphical interface. - Recursive capabilities: It can mirror entire websites, follow links, and rebuild directory structures automatically.
- Fault tolerance: Built‑in retry mechanisms, resume support, and intelligent handling of network hiccups keep large downloads reliable.
- Portability: Available on virtually every Unix‑like system, Windows (via Cygwin, WSL, or native ports), and even embedded devices.
This guide dives deep into wget’s feature set, from installation through advanced scripting, and equips you with practical examples you can copy‑paste into your own workflows.
Installing wget
Most Linux distributions ship wget by default, but if you need to install or upgrade it, follow the instructions for your platform.
Debian / Ubuntu
sudo apt-get update
sudo apt-get install wget
Fedora / CentOS / RHEL
# Fedora
sudo dnf install wget
# CentOS / RHEL 7
sudo yum install wget
macOS (Homebrew)
brew install wget
Windows
- WSL (Windows Subsystem for Linux) – Install any Linux distribution and then follow the Linux steps.
- Chocolatey –
choco install wget - Cygwin – Include the
wgetpackage during setup.
Verify the installation:
wget --version
You should see output similar to:
GNU Wget 1.21.3 built on linux-gnu.
...
Basic Usage
At its core, wget takes a URL and writes the retrieved content to a file in the current directory.
wget https://example.com/file.zip
Common Flags
| Flag | Description |
|---|---|
-O <file> | Write output to file instead of the default filename. |
-q | Quiet mode (no output). |
-nv | Non‑verbose output (status messages only). |
-c | Continue a partially downloaded file. |
-t <n> | Set number of retries (default 20). |
-T <seconds> | Set network timeout. |
-U <agent> | Specify a custom User‑Agent string. |
Example – Download with a custom filename and silent mode:
wget -q -O latest-release.tar.gz https://example.org/releases/v2.5.0.tar.gz
Advanced Options
Recursive Downloads & Mirroring
One of wget’s hallmark features is its ability to recursively traverse links and download entire directory trees or whole websites.
wget --recursive --no-parent https://example.com/docs/
--recursive(-r) tellswgetto follow links.--no-parentprevents climbing to parent directories.--level=<n>limits recursion depth (default: infinite).
Mirroring a Site
The --mirror shortcut combines several flags to produce a faithful local copy:
wget --mirror \
--convert-links \
--adjust-extension \
--page-requisites \
--no-parent \
https://example.com/blog/
--convert-linksrewrites links to point to local files.--adjust-extensionadds appropriate extensions (e.g.,.html).--page-requisitesgrabs CSS, images, scripts needed for proper rendering.
Timestamping & Conditional Requests
Avoid re‑downloading unchanged files with -N (or --timestamping):
wget -N https://example.com/data/dataset.csv
wget adds an If-Modified-Since header; the server replies with 304 Not Modified if the file hasn’t changed, saving bandwidth.
Bandwidth Limiting
When sharing a network connection with other services, throttle wget using --limit-rate:
wget --limit-rate=500k https://largefile.com/backup.iso
Units can be k, m, or g (kilobytes, megabytes, gigabytes per second).
Authentication & Cookies
HTTP Basic/Digest Authentication
wget --user=alice --password=secret https://secure.example.com/report.pdf
For security, avoid plain‑text passwords on the command line; instead, use a .wgetrc file with appropriate permissions:
user = alice
password = secret
Cookies
If a site requires login via a cookie, you can export the cookie from a browser (e.g., using the “Export Cookies” extension) and supply it:
wget --load-cookies cookies.txt https://example.com/private/data.json
Proxy Support
Set environment variables or use wget options:
export http_proxy="http://proxy.example.com:8080"
export https_proxy="http://proxy.example.com:8080"
wget https://internal.example.com/resource
Or directly:
wget -e use_proxy=yes -e http_proxy=http://proxy.example.com:8080 \
https://internal.example.com/resource
HTTPS, FTP, and Other Protocols
wget supports:
HTTPS – Encrypted downloads; use
--secure-protocol=auto(default) or--secure-protocol=TLSv1_2for strict TLS versions.FTP – Classic file transfer; example:
wget ftp://ftp.example.org/pub/software.tar.gzSFTP – Not native to
wget(requirescurlorssh), but you can invokewgetvia a wrapper script if needed.
Resuming Interrupted Downloads
Large downloads are prone to interruption. Use -c (continue) to resume:
wget -c https://mirror.example.com/large.iso
If the remote server does not support byte ranges, wget will restart from the beginning.
Robots.txt and Ethical Scraping
By default, wget obeys the robots.txt policy of a site. Override with:
wget --no-robots https://example.com/
Caution: Ignoring robots.txt may violate a site’s terms of service and can lead to IP bans. Always respect crawling policies, and consider using a delay:
wget --wait=2 --random-wait --limit-rate=200k \
--recursive https://example.com/data/
Output Control & Logging
Redirect wget’s log output to a file for later analysis:
wget -o download.log -a append.log https://example.com/file.zip
-owrites a new log file.-aappends to an existing log (useful for batch jobs).
You can also suppress the progress bar while keeping status messages:
wget --quiet --show-progress URL
Scripting with wget
wget shines in automation. Below are common patterns for inclusion in shell scripts, cron jobs, or CI pipelines.
Example 1: Daily Backup of Remote Assets
#!/usr/bin/env bash
# backup.sh – pull remote assets nightly
BASE_URL="https://assets.example.com/releases"
DEST="/var/backups/assets"
TODAY=$(date +%F)
mkdir -p "$DEST/$TODAY"
wget --quiet \
--recursive \
--no-parent \
--timestamping \
--directory-prefix="$DEST/$TODAY" \
"$BASE_URL"
# Log rotation
find "$DEST" -maxdepth 1 -type d -mtime +30 -exec rm -rf {} \;
Add to cron (crontab -e):
0 2 * * * /usr/local/bin/backup.sh >> /var/log/backup.log 2>&1
Example 2: Parallel Downloads with GNU Parallel
wget itself is single‑threaded per URL, but you can parallelize multiple downloads:
cat urls.txt | parallel -j 4 wget -c -nv {}
Example 3: Conditional Download Based on HTTP Status
#!/usr/bin/env bash
URL="https://example.com/report.pdf"
TMPFILE=$(mktemp)
# Perform a HEAD request first
if wget --spider -S "$URL" 2>&1 | grep -q "200 OK"; then
wget -O report.pdf "$URL"
else
echo "File not available (status != 200)" >&2
fi
rm -f "$TMPFILE"
Common Pitfalls & Troubleshooting
| Symptom | Likely Cause | Fix |
|---|---|---|
wget: unable to resolve host address | DNS misconfiguration or no internet connectivity. | Verify /etc/resolv.conf, test with dig or ping. |
ERROR 403: Forbidden | Server blocks non‑browser User‑Agents. | Use --user-agent="Mozilla/5.0" or proper authentication. |
ERROR 404: Not Found after a recursive crawl | Link was relative and wget rewrote it incorrectly. | Use --adjust-extension and --convert-links; check --no-parent. |
Downloads restart from the beginning despite -c | Remote server doesn’t support byte‑range requests. | Use --continue only on servers that advertise Accept-Ranges: bytes. |
| Excessive bandwidth usage | Missing --limit-rate or --wait. | Add throttling options, or schedule during off‑peak hours. |
| SSL/TLS handshake failures | Out‑dated wget version or missing CA certificates. | Update wget or install ca-certificates package. |
Debugging tip: Increase verbosity with -d (debug) to see the exact HTTP exchange.
wget -d https://example.com/file.tar.gz
wget vs. curl: When to Use Which?
Both tools are ubiquitous, but they excel in different scenarios.
| Feature | wget | curl |
|---|---|---|
| Recursive downloading / mirroring | ✅ (native) | ❌ (requires external scripts) |
| Resume support | ✅ | ✅ |
| POST/PUT with complex payloads | ❌ (limited) | ✅ (full HTTP method support) |
| HTTPS with client certificates | ✅ (limited) | ✅ (robust) |
| Downloading to stdout | ❌ (requires -O -) | ✅ (default) |
| Progress bar | Simple text bar | Advanced, configurable bar |
| Scripting convenience | Good for simple GET/recursive tasks | Better for API interactions |
Rule of thumb: Use wget for straightforward file retrieval, bulk downloads, or site mirroring. Reach for curl when you need fine‑grained HTTP control, custom headers, or to interact with REST APIs.
Real‑World Use Cases
Automated Dataset Collection
Researchers often need to pull large CSV or image collections from public repositories.wget’s--timestampingensures only new files are downloaded each night.Continuous Integration Artifact Retrieval
CI pipelines can fetch pre‑built binaries from an internal Nexus repository:wget --auth-no-challenge --user=ci --password=$CI_PASS \ https://nexus.example.com/repository/releases/app-1.2.3.tar.gzOffline Documentation Mirroring
Companies ship internal documentation to air‑gapped environments by mirroring a Confluence space:wget --mirror --convert-links --adjust-extension \ --page-requisites --no-parent \ https://docs.internal.company.com/Backup of Remote Log Files
System administrators can rotate remote log archives via a cron job that pulls.log.gzfiles from a central logging server.IoT Firmware Updates
Embedded Linux devices with limited UI can invokewgetto download new firmware images over HTTPS, then flash them automatically.
Security Considerations
While wget is reliable, misuse can expose systems to risk:
- Untrusted URLs – Downloading executables from unknown sources may lead to malware. Always verify checksums (
sha256sum) after download. - Plain‑text credentials – Avoid passing passwords on the command line; use
.netrcor environment variables with restricted permissions. - TLS verification – By default,
wgetvalidates server certificates. Disabling verification (--no-check-certificate) defeats this protection and should be reserved for testing. - Directory traversal – When mirroring, ensure
--no-parentand proper--rejectpatterns to avoid unintentionally downloading sensitive files. - Rate limiting – Excessive crawling may trigger DDoS protection on target sites. Respect
robots.txtand use--wait/--random-wait.
Conclusion
wget remains a cornerstone of command‑line networking, offering a blend of simplicity, power, and reliability that few modern tools can match. Whether you’re backing up a critical dataset, mirroring an entire website for offline access, or automating nightly builds, mastering wget unlocks a world of efficient, scriptable file retrieval.
Key takeaways:
- Install the latest version for TLS and security improvements.
- Leverage recursive and mirroring options for large‑scale downloads.
- Use
-c,-N, and--limit-rateto make downloads resilient and network‑friendly. - Combine
wgetwith shell scripting, cron, or GNU Parallel for robust automation pipelines. - Always respect ethical scraping practices and secure your credentials.
Armed with the concepts and examples in this guide, you’re ready to harness wget in both everyday tasks and complex production workflows.
Resources
GNU Wget Manual – Comprehensive official documentation.
https://www.gnu.org/software/wget/manual/Linux man page for wget – Quick reference for all command‑line options.
https://man7.org/linux/man-pages/man1/wget.1.htmlWget Tips & Tricks (DigitalOcean) – Practical recipes for common scenarios.
https://www.digitalocean.com/community/tutorials/how-to-use-wget-to-download-files-from-the-internetUnderstanding robots.txt (Google Developers) – Guidance on ethical crawling.
https://developers.google.com/search/docs/advanced/robots/introComparing wget and curl (Stack Overflow) – Community insights on when to use each tool.
https://stackoverflow.com/questions/10261931/difference-between-wget-and-curl