Linux Commands Comparison: curl vs wget
1. Overview
We may wish to send HTTP requests without using a web browser or other interactive app. For this, Linux provides us with two commands: curl and wget.
Both commands are quite helpful as they provide a mechanism for non-interactive download and upload of data. We can use them for web crawling, script automation, API testing, and more.
In this tutorial, we will be looking at the differences between these two utilities.
2. Protocols
Both curl and wget support HTTP, HTTPS, and FTP protocols. So if we want to get a page from a website, say baeldung.com, then we can run them with the web address as the parameter:
wget https://www.baeldung.com/
--2019-10-02 22:00:34-- https://www.baeldung.com/
Resolving www.baeldung.com (www.baeldung.com)... 2606:4700:30::6812:3e4e, 2606:4700:30::6812:3f4e, 104.18.63.78, ...
Connecting to www.baeldung.com (www.baeldung.com)|2606:4700:30::6812:3e4e|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘index.html’
index.html [ <=> ] 122.29K --.-KB/s in 0.08s
2019-10-02 22:00:35 (1.47 MB/s) - ‘index.html’ saved [125223]
The main difference between them is that curl prints the response to the console, while wget saves it to a file.
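We can see this contrast without touching the network by pointing both tools at a tiny local web server. This is just a sketch: it assumes python3 is available, and the port 8666 and the paths under /tmp are arbitrary choices for the demo.

```shell
# Serve a one-file site locally (hypothetical port and paths)
mkdir -p /tmp/www
printf 'hello from the server\n' > /tmp/www/index.html
python3 -m http.server 8666 --directory /tmp/www >/dev/null 2>&1 &
SERVER_PID=$!
sleep 1

# curl prints the response body straight to the console
curl -s http://localhost:8666/

# wget saves the same response to a file (index.html) in the current directory
cd /tmp && wget -q http://localhost:8666/

kill $SERVER_PID
```

Here curl echoes the page body, while wget silently leaves an index.html behind in /tmp.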
2.1. Saving Output to a File
We can save the data in a file with curl by using the -o parameter:
curl https://www.baeldung.com/ -o baeldung.txt
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  122k    0  122k    0     0    99k      0 --:--:--  0:00:01 --:--:--   99k
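curl also has a capital -O parameter, which saves the download under the remote file name, much like wget does by default. As a minimal local sketch, we can use the file:// scheme so no network is needed (the paths below are made up for the demo):

```shell
# -O keeps the file name from the URL (here: report.txt)
mkdir -p /tmp/src /tmp/dst
printf 'sample content\n' > /tmp/src/report.txt
cd /tmp/dst
curl -s -O file:///tmp/src/report.txt
cat report.txt
```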
2.2. Download and Upload Using FTP
Both commands can authenticate against an FTP server. wget takes the credentials as separate parameters, while curl uses the -u flag:
wget --user=abhi --password='myPassword' ftp://abc.com/hello.pdf
curl -u abhi:myPassword 'ftp://abc.com/hello.pdf' -o hello.pdf
We can also upload files to an FTP server with curl. For this, we can use the -T parameter:
curl -T "img.png" ftp://ftp.example.com/upload/
We should note that when uploading to a directory, we must provide the trailing /, otherwise curl will think that the path represents a file.
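Since curl also speaks the file:// scheme, we can observe the trailing-slash rule locally without an FTP server. This is a sketch with made-up paths under /tmp:

```shell
mkdir -p /tmp/ftpdemo
printf 'image bytes\n' > /tmp/img.png

# Trailing slash: curl appends the local file name to the directory
curl -s -T /tmp/img.png file:///tmp/ftpdemo/

# No trailing slash: the last path segment becomes the target file name
curl -s -T /tmp/img.png file:///tmp/ftpdemo/renamed.png

ls /tmp/ftpdemo
```

After running this, the directory holds both img.png (name taken from the local file) and renamed.png (name taken from the URL).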
2.3. Differences
Another difference between the two is that curl supports a plethora of other protocols, including DICT, FILE, FTPS, GOPHER, IMAP, IMAPS, LDAP, LDAPS, POP3, POP3S, RTMP, RTSP, SCP, SFTP, SMB, SMBS, SMTP, SMTPS, TELNET, and TFTP.
We can treat curl as a general-purpose tool for transferring data to or from a server.
On the other hand, wget is basically a network downloader.
3. Recursive Download
When we wish to make a local copy of a website, wget is the tool to use. curl does not offer recursive download, since the feature cannot be implemented uniformly across all the protocols it supports.
We can download a website with wget in a single command:
wget --recursive https://www.baeldung.com/
This will download the homepage and any resources linked from it. As we can see, www.baeldung.com links to various other resources like:
- Start here
- REST with Spring course
- Learn Spring Security course
- Learn Spring course
wget will follow each of these resources and download them individually:
--2019-10-02 22:09:17-- https://www.baeldung.com/start-here
...
Saving to: ‘www.baeldung.com/start-here’
www.baeldung.com/start-here [ <=> ] 134.85K 321KB/s in 0.4s
2019-10-02 22:09:18 (321 KB/s) - ‘www.baeldung.com/start-here’ saved [138087]
--2019-10-02 22:09:18-- https://www.baeldung.com/rest-with-spring-course
...
Saving to: ‘www.baeldung.com/rest-with-spring-course’
www.baeldung.com/rest-with-spring-cou [ <=> ] 244.77K 395KB/s in 0.6s
2019-10-02 22:09:19 (395 KB/s) - ‘www.baeldung.com/rest-with-spring-course’ saved [250646]
... more output omitted
3.1. Recursive Download with HTTP
Recursive download is one of the most powerful features of wget: it can follow links in HTML, XHTML, and CSS pages to create local versions of remote websites, fully recreating the directory structure of the original site.
Recursive downloading in wget is breadth-first. In other words, it first downloads the requested document, then the documents linked from that document, then the documents linked by those documents, and so on. The default maximum depth is set to five, but it can be overridden using the -l parameter:
wget -l 1 --recursive --no-parent http://example.com
In the case of HTTP or HTTPS URLs, wget scans and parses the HTML or CSS, then retrieves the files the document references through attributes like href or src.
By default, wget honors the Robot Exclusion Standard and skips paths disallowed by a site's robots.txt. To switch this off, we can use the -e parameter:
wget -e robots=off http://example.com
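We can watch link-following in action on a tiny local site. Again, this is only a sketch: it assumes python3 is available, and the port 8777 and /tmp paths are arbitrary; we pass -e robots=off because our toy server has no robots.txt.

```shell
# Build a two-page site where index.html links to other.html
mkdir -p /tmp/site
printf '<a href="other.html">other</a>\n' > /tmp/site/index.html
printf 'second page\n' > /tmp/site/other.html
python3 -m http.server 8777 --directory /tmp/site >/dev/null 2>&1 &
SERVER_PID=$!
sleep 1

# Recurse one level deep: wget fetches index.html, parses the href,
# and downloads other.html as well
cd /tmp && wget -q -e robots=off --recursive --level=1 http://localhost:8777/

kill $SERVER_PID

# wget recreated the site under a host-named directory
ls /tmp/localhost:8777
```

Both pages end up under /tmp/localhost:8777/, mirroring the structure of the served site.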
4. Conclusion
wget is a simpler solution and only supports a small number of protocols. It is very good for downloading files and can download directory structures recursively.
We also saw how curl supports a much larger range of protocols, making it a more general-purpose tool.