curl and proxies

Aditya Kamath
Jul 2, 2021

Introduction

In simple terms, curl is a command line utility to make HTTP requests and see the response. I like to think of curl as a debugging tool for HTTP, just as ping and traceroute are for IP connectivity.

There are already many blogs and tutorials on the internet that explain how to use curl, so I will not repeat much of that here, only enough to build context.

What I would like to cover in this blog instead are HTTP/HTTPS proxies: how you can use curl to make requests through a proxy, understand what is going on, and pinpoint whether an issue lies with the proxy or with something else. The reason to cover this topic is that enterprise companies and their networks often have egress proxies at the edge that act like L7 firewalls and also become a point for logging and auditing all traffic leaving the network. While they are useful from a network security perspective, I have noticed that many people don't really understand how these proxies work, how to use them correctly, and how to narrow down issues to either the proxy or the application using it.

Simple HTTP/HTTPS requests

Let’s start by running some simple HTTP and HTTPS requests with curl to get familiar with the output.

First a simple HTTP request:

[aditya@localhost ~]$ curl http://google.com
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="http://www.google.com/">here</A>.
</BODY></HTML>
[aditya@localhost ~]$

We can get more information by using verbose mode with the -v flag:

[aditya@localhost ~]$ curl -v http://google.com
* Trying 216.58.194.206:80...
* Connected to google.com (216.58.194.206) port 80 (#0)
> GET / HTTP/1.1
> Host: google.com
> User-Agent: curl/7.76.1
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 301 Moved Permanently
< Location: http://www.google.com/
< Content-Type: text/html; charset=UTF-8
< Date: Mon, 14 Jun 2021 01:55:08 GMT
< Expires: Wed, 14 Jul 2021 01:55:08 GMT
< Cache-Control: public, max-age=2592000
< Server: gws
< Content-Length: 219
< X-XSS-Protection: 0
< X-Frame-Options: SAMEORIGIN
<
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="http://www.google.com/">here</A>.
</BODY></HTML>
* Connection #0 to host google.com left intact
[aditya@localhost ~]$

Verbose mode gives us a lot more information, for example that the TCP connection was made to 216.58.194.206 on port 80. We also see the plaintext HTTP request and response along with the headers, which are indicated by the > and < prefixes.

Moving on to an HTTPS request:

[aditya@localhost ~]$ curl https://google.com
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved <A HREF="https://www.google.com/">here</A>.
</BODY></HTML>
[aditya@localhost ~]$

And the same in verbose mode

[aditya@localhost ~]$ curl -v https://google.com
* Trying 172.217.6.46:443...
* Connected to google.com (172.217.6.46) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: /etc/pki/tls/certs/ca-bundle.crt
* CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use h2
* Server certificate:
* subject: C=US; ST=California; L=Mountain View; O=Google LLC; CN=*.google.com
* start date: May 17 01:36:50 2021 GMT
* expire date: Aug 9 01:36:49 2021 GMT
* subjectAltName: host "google.com" matched cert's "google.com"
* issuer: C=US; O=Google Trust Services; CN=GTS CA 1O1
* SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x562b1efc1bf0)
> GET / HTTP/2
> Host: google.com
> user-agent: curl/7.76.1
> accept: */*
>
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* old SSL session ID is stale, removing
< HTTP/2 301
< location: https://www.google.com/
< content-type: text/html; charset=UTF-8
< date: Mon, 14 Jun 2021 02:00:49 GMT
< expires: Wed, 14 Jul 2021 02:00:49 GMT
< cache-control: public, max-age=2592000
< server: gws
< content-length: 220
< x-xss-protection: 0
< x-frame-options: SAMEORIGIN
< alt-svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000,h3-T051=":443"; ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43"
<
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="https://www.google.com/">here</A>.
</BODY></HTML>
* Connection #0 to host google.com left intact
[aditya@localhost ~]$

Now this has a lot more information on what happened under the hood in establishing the TLS session, such as which trusted Certificate Authority bundle is being used to verify the server certificate against:

*  CAfile: /etc/pki/tls/certs/ca-bundle.crt
* CApath: none

The ALPN (Application Layer Protocol Negotiation) exchange determines which HTTP version will be used, HTTP/2 in this case:

* ALPN, server accepted to use h2
. . .
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
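
If you want to see the negotiation land on HTTP/1.1 instead, you can restrict what curl offers during ALPN with the --http1.1 flag. This is just an optional experiment, not part of the run above:

# curl now offers only http/1.1 during ALPN, so the verbose output
# should show the request and response using HTTP/1.1 instead of HTTP/2
curl -v --http1.1 https://google.com -o /dev/null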

And also information about the server certificate:

* Server certificate:
* subject: C=US; ST=California; L=Mountain View; O=Google LLC; CN=*.google.com
* start date: May 17 01:36:50 2021 GMT
* expire date: Aug 9 01:36:49 2021 GMT
* subjectAltName: host "google.com" matched cert's "google.com"
* issuer: C=US; O=Google Trust Services; CN=GTS CA 1O1
* SSL certificate verify ok.

curl will use the default port based on the scheme for the TCP connection, 80 for HTTP and 443 for HTTPS. However, you can explicitly specify the port in the URL in case the app is running on a different port, for example to connect on port 12345:

curl -v https://google.com:12345

curl with proxies

Now onto working with HTTP and HTTPS proxies.

Setting up the proxy

For this demonstration, I'm going to use a local Docker container running the popular proxy squid. I used a simple Dockerfile as follows for the image:

FROM fedora:34
RUN dnf -y update && dnf install -y vim squid
ENTRYPOINT ["/usr/sbin/squid", "--foreground"]

I will be using the default configuration that came with the Fedora package with some slight modifications which can be viewed here.

Once you have that config file on disk somewhere, and the Dockerfile above built as squid:latest, you can run the proxy in a terminal using a command like

docker run -p 3128:3128 -it --rm -v `pwd`/squid.conf:/etc/squid/squid.conf squid:latest

Once the container is running, the proxy will be accessible on localhost:3128.
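
Before pointing curl at it, you can do a quick sanity check that something is listening on that port. One way, assuming a netcat variant with the -z flag is installed (any TCP client will do):

# -z just checks that the TCP connection can be opened, -v prints the result
nc -vz localhost 3128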

HTTP request through the proxy

To use the proxy with curl you can use the --proxy option or its shorthand -x. So a simple HTTP request through the proxy can be executed like

[aditya@localhost curl_blog]$ curl -v -x localhost:3128 http://google.com
* Trying ::1:3128...
* Connected to localhost (::1) port 3128 (#0)
> GET http://google.com/ HTTP/1.1
> Host: google.com
> User-Agent: curl/7.76.1
> Accept: */*
> Proxy-Connection: Keep-Alive
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 301 Moved Permanently
< Location: http://www.google.com/
< Content-Type: text/html; charset=UTF-8
< Date: Tue, 15 Jun 2021 21:42:40 GMT
< Expires: Thu, 15 Jul 2021 21:42:40 GMT
< Cache-Control: public, max-age=2592000
< Server: gws
< Content-Length: 219
< X-XSS-Protection: 0
< X-Frame-Options: SAMEORIGIN
< X-Cache: MISS from 0f3085d1eb34
< X-Cache-Lookup: MISS from 0f3085d1eb34:3128
< Via: 1.1 0f3085d1eb34 (squid/5.0.6)
< Connection: keep-alive
<
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="http://www.google.com/">here</A>.
</BODY></HTML>
* Connection #0 to host localhost left intact
[aditya@localhost curl_blog]$

What we see here is that the TCP connection was first established with the proxy

*   Trying ::1:3128...
* Connected to localhost (::1) port 3128 (#0)

And the GET request was sent over that connection. Squid acts like a regular HTTP proxy in this case: it processes the GET request, forms another GET request of its own, and sends that to Google. The response from Google is again processed by squid, some extra headers are added, and it is then sent back to curl.

We can also confirm this by looking at a packet capture that I ran while making the above request. The first screenshot here shows the GET request that curl made to the proxy. (A big shoutout to termshark.io for the in-terminal packet capture viewer that I am using in these screenshots.)

And the second screenshot here shows the GET request that squid made to Google. You can also notice that some extra headers like X-Forwarded-For were added by squid in the request to Google.

The important thing to note here is that a plaintext HTTP request was sent to the proxy. Since the request is plaintext, squid can process it and mangle it in any way it sees fit. The same is true for the response in the reverse direction.
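
If you don't want to run a packet capture, another way to see what squid adds on the outbound request is to point it at an echo service that reflects the request headers it received. This assumes httpbin.org is reachable from your network and is only a convenience check:

# the reflected headers should include ones inserted by squid,
# such as Via and X-Forwarded-For, which curl itself never sent
curl -s -x localhost:3128 http://httpbin.org/headers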

HTTP proxies can also cache certain requests like GET, depending on the cache control headers. This way squid can cache the 301 response from Google and then reply with the cached response to a later GET request without ever talking to Google for it. Such proxies are usually known as caching proxies. The mode I am running squid in here is also what is known as a forward proxy. I will defer further explanation of these terms to the many better explanations on the internet.
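
You can observe the caching from the client side by repeating the same request and watching the X-Cache header that squid adds; the first request is typically a MISS and the second a HIT, subject to the cache control headers of the response:

# -D - dumps the response headers to stdout, -o /dev/null discards the body
curl -s -D - -o /dev/null -x localhost:3128 http://google.com | grep -i x-cache
curl -s -D - -o /dev/null -x localhost:3128 http://google.com | grep -i x-cache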

HTTPS request through the proxy

Coming to an HTTPS request, there are now a few options to deal with the TLS layer:

1 - Proxy acts like a tunnel without interfering with the TLS handshake

a. Client will see the certificate of the actual destination server.

b. Proxy will not see the actual HTTP requests/responses over the encrypted channel.

2 - Proxy terminates TLS and decrypts requests, processing them before re-encrypting them towards the destination server.

a. Client will see a certificate presented by the proxy and not the actual destination server.

b. Trust of the certificate presented by the destination server will be determined by the proxy.

c. Proxy will see the plaintext HTTP request/response exchanged.

In this blog I will mainly be covering (1) and the underlying HTTP CONNECT protocol. Proxies in (2) are usually set up in what is known as transparent mode, where the client is unaware of the presence of the proxy. (2) gives the administrator of the network greater visibility and greater audit and enforcement capability than (1); however, the added TLS termination is more resource intensive on the proxy, and all the clients in the organization need to trust the certificate presented by the proxy, regardless of the destination, in order for TLS to work. Option (2) is also not something that would work with curl's -x option.
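
As a quick aside on (2): since the client sees a certificate minted by the proxy, TLS only succeeds if the proxy's signing CA is in the client's trust store. With curl you could point at such a CA bundle explicitly; the file name here is purely hypothetical:

# trust the intercepting proxy's root CA for this one request
curl -v --cacert corporate-proxy-ca.pem https://example.com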

Coming to option (1), we simply change the curl request from our previous section to use an HTTPS destination, like so

[aditya@localhost curl_blog]$ curl -v -x localhost:3128 https://google.com
* Trying ::1:3128...
* Connected to localhost (::1) port 3128 (#0)
* allocate connect buffer!
* Establish HTTP proxy tunnel to google.com:443
> CONNECT google.com:443 HTTP/1.1
> Host: google.com:443
> User-Agent: curl/7.76.1
> Proxy-Connection: Keep-Alive
>
< HTTP/1.1 200 Connection established
<
* Proxy replied 200 to CONNECT request
* CONNECT phase completed!
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: /etc/pki/tls/certs/ca-bundle.crt
* CApath: none
} [5 bytes data]
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
} [512 bytes data]
* CONNECT phase completed!
* CONNECT phase completed!
{ [5 bytes data]
* TLSv1.3 (IN), TLS handshake, Server hello (2):
{ [122 bytes data]
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
{ [15 bytes data]
* TLSv1.3 (IN), TLS handshake, Certificate (11):
{ [4502 bytes data]
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
{ [79 bytes data]
* TLSv1.3 (IN), TLS handshake, Finished (20):
{ [52 bytes data]
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
} [1 bytes data]
* TLSv1.3 (OUT), TLS handshake, Finished (20):
} [52 bytes data]
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use h2
* Server certificate:
* subject: C=US; ST=California; L=Mountain View; O=Google LLC; CN=*.google.com
* start date: May 17 01:36:50 2021 GMT
* expire date: Aug 9 01:36:49 2021 GMT
* subjectAltName: host "google.com" matched cert's "google.com"
* issuer: C=US; O=Google Trust Services; CN=GTS CA 1O1
* SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
} [5 bytes data]
* Using Stream ID: 1 (easy handle 0x55ce16c7ebf0)
} [5 bytes data]
> GET / HTTP/2
> Host: google.com
> user-agent: curl/7.76.1
> accept: */*
>
{ [5 bytes data]
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
{ [264 bytes data]
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
{ [264 bytes data]
* old SSL session ID is stale, removing
{ [5 bytes data]
< HTTP/2 301
< location: https://www.google.com/
< content-type: text/html; charset=UTF-8
< date: Tue, 15 Jun 2021 22:49:48 GMT
< expires: Thu, 15 Jul 2021 22:49:48 GMT
< cache-control: public, max-age=2592000
< server: gws
< content-length: 220
< x-xss-protection: 0
< x-frame-options: SAMEORIGIN
< alt-svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000,h3-T051=":443"; ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43"
<
{ [5 bytes data]
* Connection #0 to host localhost left intact
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="https://www.google.com/">here</A>.
</BODY></HTML>

Now there is a lot going on here. Since we are dealing with option (1) and the proxy is not terminating TLS, there needs to be a way for the client and destination to talk to each other through the proxy in order to establish trust and exchange encrypted messages that the proxy, a man in the middle in this case, cannot decrypt or mangle in any fashion. This is achieved by establishing a tunnel through the proxy: once the tunnel is established, the proxy transfers bytes in both directions without trying to interpret them. The proxy can however prevent the tunnel from being established as a matter of policy, which we will see later.

The tunnel is established through a special HTTP request known as CONNECT. At a high level this works as follows:

The client first sends a CONNECT request to the proxy mentioning the intended destination server and port. We can see this in the first few lines of the curl output too:

[aditya@localhost curl_blog]$ curl -v -x localhost:3128 https://google.com
* Trying ::1:3128...
* Connected to localhost (::1) port 3128 (#0)
* allocate connect buffer!
* Establish HTTP proxy tunnel to google.com:443
> CONNECT google.com:443 HTTP/1.1
> Host: google.com:443
> User-Agent: curl/7.76.1
> Proxy-Connection: Keep-Alive
>
< HTTP/1.1 200 Connection established
<
* Proxy replied 200 to CONNECT request

The proxy can at this point deny the request based on its policy, but that isn't the case here. If the policy permits, the proxy will try to establish a TCP connection to the destination server and port given in the CONNECT request. If that is successful, the proxy sets up pipes within itself to transfer bytes from the client to the server and vice versa, fulfilling the tunnel, and sends a 200 response back.

Once the client receives the 200 Connection established response, it can assume that the TCP connection it established with the proxy now magically extends to the destination server, and it can overlay whatever it wants to say to the destination server on top of that connection. In our case, since we wanted to send an HTTPS request, we first needed to establish TLS, and that is what curl does. Once TLS is established, we can then send and receive regular HTTP requests through the same connection.
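
You can reproduce this two-phase sequence without curl by letting openssl issue the CONNECT and then run the TLS handshake through the tunnel. This is just a sketch and needs a reasonably recent OpenSSL that supports the -proxy option:

# openssl first sends "CONNECT google.com:443" to the proxy, then starts the
# TLS handshake over the established tunnel and prints the server certificate details
openssl s_client -proxy localhost:3128 -connect google.com:443 -brief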

Again to show this in more detail, let’s look at some packet captures:

In the image above, we see the flow I just described playing out. The client first sends the CONNECT request to the proxy (first red box). The proxy then establishes the TCP connection with google.com:443, as seen in the second red box. And finally we see the TLS handshake message Client Hello being sent from the client to the proxy and then on to Google in the third red box.

Use in enterprise environments

Option (1), which I just described, is often the means to set up an egress filtering point in enterprise networks. Setting up option (2) with TLS termination incurs the extra overhead of managing TLS trust, security, and the handling of sensitive information that may be exchanged over TLS connections. Option (1) on the other hand does not incur that cost, while still providing a mechanism to enforce policy and produce audit logs of which clients have reached out to which destinations. The only downside is that the egress filter cannot inspect whether sensitive information is leaving the network over an encrypted connection, for example if a bad actor who has infiltrated the network is trying to ship data out.

Environment variables to set the proxy

Now that we have seen and used the proxy through curl and its --proxy|-x option, note that we often run applications that do not expose an explicit configuration option for the proxy to be used. In such cases, the HTTP client embedded deep within the application will usually honor two environment variables, http_proxy and https_proxy. And so does curl. I'll use curl here to demonstrate how these variables get used and what their impact is on the behavior of HTTP and HTTPS requests.

Let's start with the http_proxy variable. You can set it in your running shell with a command like

export http_proxy=localhost:3128

Let’s try a simple HTTP request now

[aditya@localhost curl_blog]$ curl -v http://google.com
* Uses proxy env variable http_proxy == 'localhost:3128'
* Trying ::1:3128...
* Connected to localhost (::1) port 3128 (#0)
> GET http://google.com/ HTTP/1.1
> Host: google.com
> User-Agent: curl/7.76.1
> Accept: */*
> Proxy-Connection: Keep-Alive
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 301 Moved Permanently
< Location: http://www.google.com/
< Content-Type: text/html; charset=UTF-8
< Date: Tue, 15 Jun 2021 21:42:40 GMT
< Expires: Thu, 15 Jul 2021 21:42:40 GMT
< Cache-Control: public, max-age=2592000
< Server: gws
< Content-Length: 219
< X-XSS-Protection: 0
< X-Frame-Options: SAMEORIGIN
< Age: 344475
< Warning: 113 squid/5.0.6 "This cache hit is still fresh and more than 1 day old"
< X-Cache: HIT from 0f3085d1eb34
< X-Cache-Lookup: HIT from 0f3085d1eb34:3128
< Via: 1.1 0f3085d1eb34 (squid/5.0.6)
< Connection: keep-alive
<
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="http://www.google.com/">here</A>.
</BODY></HTML>
* Connection #0 to host localhost left intact
[aditya@localhost curl_blog]$

We can see that this request did go through the proxy, evident from the additional headers such as X-Cache which squid adds. In this case it was even a cache hit due to the same request being made earlier.

Now an HTTPS request with just the http_proxy variable

[aditya@localhost curl_blog]$ curl -v https://google.com
* Trying 216.58.194.206:443...
* Connected to google.com (216.58.194.206) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: /etc/pki/tls/certs/ca-bundle.crt
* CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use h2
* Server certificate:
* subject: CN=*.google.com
* start date: May 31 01:36:33 2021 GMT
* expire date: Aug 23 01:36:32 2021 GMT
* subjectAltName: host "google.com" matched cert's "google.com"
* issuer: C=US; O=Google Trust Services LLC; CN=GTS CA 1C3
* SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x560fdd853bf0)
> GET / HTTP/2
> Host: google.com
> user-agent: curl/7.76.1
> accept: */*
>
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* old SSL session ID is stale, removing
< HTTP/2 301
< location: https://www.google.com/
< content-type: text/html; charset=UTF-8
< date: Sat, 19 Jun 2021 21:32:44 GMT
< expires: Mon, 19 Jul 2021 21:32:44 GMT
< cache-control: public, max-age=2592000
< server: gws
< content-length: 220
< x-xss-protection: 0
< x-frame-options: SAMEORIGIN
< alt-svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000,h3-T051=":443"; ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000
<
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="https://www.google.com/">here</A>.
</BODY></HTML>
* Connection #0 to host google.com left intact

This request didn’t go through the proxy and there was no CONNECT request made for this.

Now let's try exporting just the https_proxy variable. We first remove the previously set http_proxy variable using the unset command. So overall, something like

unset http_proxy
export https_proxy=localhost:3128

Let's try an HTTPS request first

[aditya@localhost curl_blog]$ curl -v https://google.com
* Uses proxy env variable https_proxy == 'localhost:3128'
* Trying ::1:3128...
* Connected to localhost (::1) port 3128 (#0)
* allocate connect buffer!
* Establish HTTP proxy tunnel to google.com:443
> CONNECT google.com:443 HTTP/1.1
> Host: google.com:443
> User-Agent: curl/7.76.1
> Proxy-Connection: Keep-Alive
>
< HTTP/1.1 200 Connection established
<
* Proxy replied 200 to CONNECT request
* CONNECT phase completed!
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: /etc/pki/tls/certs/ca-bundle.crt
* CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* CONNECT phase completed!
* CONNECT phase completed!
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use h2
* Server certificate:
* subject: C=US; ST=California; L=Mountain View; O=Google LLC; CN=*.google.com
* start date: May 24 01:36:00 2021 GMT
* expire date: Aug 16 01:35:59 2021 GMT
* subjectAltName: host "google.com" matched cert's "google.com"
* issuer: C=US; O=Google Trust Services; CN=GTS CA 1O1
* SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x5594bf80dbf0)
> GET / HTTP/2
> Host: google.com
> user-agent: curl/7.76.1
> accept: */*
>
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* old SSL session ID is stale, removing
< HTTP/2 301
< location: https://www.google.com/
< content-type: text/html; charset=UTF-8
< date: Sun, 20 Jun 2021 01:42:45 GMT
< expires: Tue, 20 Jul 2021 01:42:45 GMT
< cache-control: public, max-age=2592000
< server: gws
< content-length: 220
< x-xss-protection: 0
< x-frame-options: SAMEORIGIN
< alt-svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000,h3-T051=":443"; ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43"
<
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="https://www.google.com/">here</A>.
</BODY></HTML>
* Connection #0 to host localhost left intact

So this clearly went through the proxy, as is evident from the CONNECT request being made. And an HTTP request doesn't go through the proxy in this scenario

[aditya@localhost curl_blog]$ curl -v http://google.com
* Trying 172.217.6.78:80...
* Connected to google.com (172.217.6.78) port 80 (#0)
> GET / HTTP/1.1
> Host: google.com
> User-Agent: curl/7.76.1
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 301 Moved Permanently
< Location: http://www.google.com/
< Content-Type: text/html; charset=UTF-8
< Date: Sun, 20 Jun 2021 01:45:55 GMT
< Expires: Tue, 20 Jul 2021 01:45:55 GMT
< Cache-Control: public, max-age=2592000
< Server: gws
< Content-Length: 219
< X-XSS-Protection: 0
< X-Frame-Options: SAMEORIGIN
<
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="http://www.google.com/">here</A>.
</BODY></HTML>
* Connection #0 to host google.com left intact

Depending on what you need, you can set one or both of these environment variables. Most people just set both, since they often don't know whether the app switches between the protocols at some point, which would otherwise cause weird failures.
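
Most tools that honor these variables, curl included, also honor a no_proxy variable listing destinations that should bypass the proxy, which is handy for local or internal endpoints. The exact matching rules vary a little between tools, but a typical setup looks something like this:

export http_proxy=localhost:3128
export https_proxy=localhost:3128
# comma-separated hosts/domain suffixes that should not go through the proxy
export no_proxy=localhost,127.0.0.1,.internal.example.com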

Error scenarios

Everything we discussed above covers the good scenarios. Next I want to discuss some errors one can encounter while using a proxy and what the client sees in those cases.

Policy reasons

As I alluded to earlier, the proxy can have policies that apply ACLs to requests going through it. Let's add a deny rule in the squid configuration to block requests going to the google.com domain. We can do this by adding lines like the following to the config file and reloading squid. I should add that this is one of the simplest deny rules; in practice the policies can be very rich.

There is more to how squid configuration works, and the relative placement of these lines matters a lot, but that is a more in-depth topic that isn't in scope for this blog.

acl deny_google dstdomain google.com
http_access deny deny_google
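
If you are following along with the Docker setup from earlier, you can check the config for syntax errors and reload it in the running container without restarting it. The container name is whatever docker ps shows for your squid container:

# validate the configuration, then ask the running squid to re-read it
docker exec <container-name> squid -k parse
docker exec <container-name> squid -k reconfigure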

Let’s run the same HTTP and HTTPS requests and see what happens

[aditya@localhost curl_blog]$ curl -v -x localhost:3128 http://google.com
* Trying ::1:3128...
* Connected to localhost (::1) port 3128 (#0)
> GET http://google.com/ HTTP/1.1
> Host: google.com
> User-Agent: curl/7.76.1
> Accept: */*
> Proxy-Connection: Keep-Alive
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 403 Forbidden
< Server: squid/5.0.6
< Mime-Version: 1.0
< Date: Sun, 20 Jun 2021 23:34:38 GMT
< Content-Type: text/html;charset=utf-8
< Content-Length: 3495
< X-Squid-Error: ERR_ACCESS_DENIED 0
< Vary: Accept-Language
< Content-Language: en
< X-Cache: MISS from 0f3085d1eb34
< X-Cache-Lookup: NONE from 0f3085d1eb34:3128
< Via: 1.1 0f3085d1eb34 (squid/5.0.6)
< Connection: keep-alive
<
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html><head>
… squid error page continues …

So we see a 403 Forbidden response. The more interesting bit here, though, is that the X-Squid-Error header has the value ERR_ACCESS_DENIED, which indicates that squid returned the error and that it was due to an ACL denying the request.

Here's the same with an HTTPS request

[aditya@localhost curl_blog]$ curl -v -x localhost:3128 https://google.com
* Trying ::1:3128...
* Connected to localhost (::1) port 3128 (#0)
* allocate connect buffer!
* Establish HTTP proxy tunnel to google.com:443
> CONNECT google.com:443 HTTP/1.1
> Host: google.com:443
> User-Agent: curl/7.76.1
> Proxy-Connection: Keep-Alive
>
< HTTP/1.1 403 Forbidden
< Server: squid/5.0.6
< Mime-Version: 1.0
< Date: Sun, 20 Jun 2021 23:41:37 GMT
< Content-Type: text/html;charset=utf-8
< Content-Length: 3483
< X-Squid-Error: ERR_ACCESS_DENIED 0
< Vary: Accept-Language
< Content-Language: en
< X-Cache: MISS from 0f3085d1eb34
< X-Cache-Lookup: NONE from 0f3085d1eb34:3128
< Via: 1.1 0f3085d1eb34 (squid/5.0.6)
< Connection: keep-alive
<
* Received HTTP code 403 from proxy after CONNECT
* CONNECT phase completed!
* Closing connection 0
curl: (56) Received HTTP code 403 from proxy after CONNECT

In this case, we get the 403 in response to the CONNECT call and we see the same ERR_ACCESS_DENIED value in the header.

Network errors

Even if there are no policies blocking a particular request, there are still network errors that the proxy can encounter. While serving any request, HTTP or HTTPS, the proxy has to perform two actions:

  1. Look up the IP address of the destination domain
  2. Establish a TCP connection to one of those IP addresses on the port specified in the request

We can encounter errors at either of these steps.
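
When debugging, it helps to reproduce these two steps by hand from the proxy host (or from inside the container), since that is where the lookup and the connection actually happen. Something along these lines, assuming dig and nc are available there:

# step 1: can the proxy host resolve the destination?
dig +short google.com
# step 2: can it open a TCP connection to the destination port?
nc -vz -w 3 google.com 443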

Let's start with a DNS failure. As an example, I am going to use a domain that currently does not have a valid DNS record; there are other approaches I could have used to simulate the same behavior, such as blocking DNS queries from the proxy host.

So after some trial and error, I found that unknown-dns.net does not exist. You can try it by running something like

dig unknown-dns.net

Which will show you the DNS query result.

Next we will try a curl for both HTTP and HTTPS to this domain

[aditya@localhost ~]$ curl -v -x localhost:3128 http://unknown-dns.net
* Trying ::1:3128...
* Connected to localhost (::1) port 3128 (#0)
> GET http://unknown-dns.net/ HTTP/1.1
> Host: unknown-dns.net
> User-Agent: curl/7.76.1
> Accept: */*
> Proxy-Connection: Keep-Alive
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 503 Service Unavailable
< Server: squid/4.14
< Mime-Version: 1.0
< Date: Fri, 02 Jul 2021 22:40:49 GMT
< Content-Type: text/html;charset=utf-8
< Content-Length: 3718
< X-Squid-Error: ERR_DNS_FAIL 0
< Vary: Accept-Language
< Content-Language: en
< X-Cache: MISS from bf9f77b130a4
< X-Cache-Lookup: MISS from bf9f77b130a4:3128
< Via: 1.1 bf9f77b130a4 (squid/4.14)
< Connection: keep-alive
<
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html><head>
- squid error page -

So we get a 503 status code back and notice the X-Squid-Error header says ERR_DNS_FAIL which indicates that squid was not able to resolve the particular domain. With an HTTPS request we get a similar response back for the CONNECT request:

[aditya@localhost ~]$ curl -v -x localhost:3128 https://unknown-dns.net
* Trying ::1:3128...
* Connected to localhost (::1) port 3128 (#0)
* allocate connect buffer!
* Establish HTTP proxy tunnel to unknown-dns.net:443
> CONNECT unknown-dns.net:443 HTTP/1.1
> Host: unknown-dns.net:443
> User-Agent: curl/7.76.1
> Proxy-Connection: Keep-Alive
>
< HTTP/1.1 503 Service Unavailable
< Server: squid/4.14
< Mime-Version: 1.0
< Date: Fri, 02 Jul 2021 22:38:08 GMT
< Content-Type: text/html;charset=utf-8
< Content-Length: 3706
< X-Squid-Error: ERR_DNS_FAIL 0
< Vary: Accept-Language
< Content-Language: en
<
* Received HTTP code 503 from proxy after CONNECT
* CONNECT phase completed!
* Closing connection 0
curl: (56) Received HTTP code 503 from proxy after CONNECT

Next is the case where DNS succeeds but squid is unable to establish a TCP connection to the particular IP address and port. For this case I will add some iptables rules in the squid container to drop all packets to the default HTTP and HTTPS ports, using the following commands:

iptables -A OUTPUT --protocol tcp --destination-port 80 -j DROP
iptables -A OUTPUT --protocol tcp --destination-port 443 -j DROP
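
To undo the blackholing after the experiment, the same rules can be deleted with -D (or you can simply restart the container, since the rules are not persisted):

iptables -D OUTPUT --protocol tcp --destination-port 80 -j DROP
iptables -D OUTPUT --protocol tcp --destination-port 443 -j DROP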

Let’s try our google.com requests again

[aditya@localhost ~]$ curl -v -x localhost:3128 http://google.com
* Trying ::1:3128...
* Connected to localhost (::1) port 3128 (#0)
> GET http://google.com/ HTTP/1.1
> Host: google.com
> User-Agent: curl/7.76.1
> Accept: */*
> Proxy-Connection: Keep-Alive
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 503 Service Unavailable
< Server: squid/4.14
< Mime-Version: 1.0
< Date: Fri, 02 Jul 2021 22:48:41 GMT
< Content-Type: text/html;charset=utf-8
< Content-Length: 3558
< X-Squid-Error: ERR_CONNECT_FAIL 110
< Vary: Accept-Language
< Content-Language: en
< X-Cache: MISS from bf9f77b130a4
< X-Cache-Lookup: MISS from bf9f77b130a4:3128
< Via: 1.1 bf9f77b130a4 (squid/4.14)
< Connection: keep-alive
<
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html><head>
- squid error page -

Notice this time that X-Squid-Error is ERR_CONNECT_FAIL, which indicates that the TCP connection to the destination server could not be established. The same happens for HTTPS requests as well:

[aditya@localhost ~]$ curl -v -x localhost:3128 https://google.com
* Trying ::1:3128...
* Connected to localhost (::1) port 3128 (#0)
* allocate connect buffer!
* Establish HTTP proxy tunnel to google.com:443
> CONNECT google.com:443 HTTP/1.1
> Host: google.com:443
> User-Agent: curl/7.76.1
> Proxy-Connection: Keep-Alive
>
< HTTP/1.1 503 Service Unavailable
< Server: squid/4.14
< Mime-Version: 1.0
< Date: Fri, 02 Jul 2021 22:50:38 GMT
< Content-Type: text/html;charset=utf-8
< Content-Length: 3512
< X-Squid-Error: ERR_CONNECT_FAIL 110
< Vary: Accept-Language
< Content-Language: en
<
* Received HTTP code 503 from proxy after CONNECT
* CONNECT phase completed!
* Closing connection 0
curl: (56) Received HTTP code 503 from proxy after CONNECT

Squid 5 differences

While running the experiments for this blog I noticed that Squid version 5 was behaving a little differently, specifically with the error codes and error messages being returned. Looking at the release notes for Squid 5 it seems that a change was made to return the HTTP code 500 in some error scenarios and X-Squid-Error now has the value ERR_CANNOT_FORWARD, for example with DNS failures:

[aditya@localhost ~]$ curl -v -x localhost:3128 http://unknown-dns.net
* Trying ::1:3128...
* Connected to localhost (::1) port 3128 (#0)
> GET http://unknown-dns.net/ HTTP/1.1
> Host: unknown-dns.net
> User-Agent: curl/7.76.1
> Accept: */*
> Proxy-Connection: Keep-Alive
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 500 Internal Server Error
< Server: squid/5.0.6
< Mime-Version: 1.0
< Date: Fri, 02 Jul 2021 22:59:21 GMT
< Content-Type: text/html;charset=utf-8
< Content-Length: 3833
< X-Squid-Error: ERR_CANNOT_FORWARD 0
< Vary: Accept-Language
< Content-Language: en
< X-Cache: MISS from bf9f77b130a4
< X-Cache-Lookup: MISS from bf9f77b130a4:3128
< Via: 1.1 bf9f77b130a4 (squid/5.0.6)
< Connection: keep-alive
<
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html><head>
- squid error page -

I haven't yet found a way to extract the fact that it was a DNS failure, the way Squid before version 5 would indicate it.

TLS with the proxy

This section is a quick note on a recent feature that was added to curl but has been supported on the server side for quite a while in proxies like Squid. In all of the examples above, whenever an HTTPS request was made, the initial HTTP CONNECT request was still sent in plaintext to the proxy, so a man in the middle could sniff it or mangle it. Since there is a TLS handshake after the tunnel is established, the client can still verify that the destination the tunnel was established to is indeed the intended destination.

The other issue with the plaintext exchange was with AuthN and AuthZ. Proxy deployments in enterprise networks often have ACLs and policies tuned based on who is accessing the proxy. The proxy understands the “who” through some means of identity, either a username+password or nowadays through client certificates that the client presents when establishing a TLS connection.
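
For the username and password flavor, curl can send proxy credentials with the --proxy-user option. Squid would need an auth_param scheme configured for this to actually be enforced, and the credentials below are of course placeholders:

# adds a Proxy-Authorization header to the requests sent to the proxy
curl -v -x localhost:3128 --proxy-user alice:secret http://google.com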

All this calls for the proxy server to accept TLS connections on the port it listens on, and for the client to have some way to establish TLS with the proxy before interacting with it further. This is where the recent feature in curl comes in; it is nicely described in this blog.
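
To give a flavor of what that looks like from the client side: the proxy is specified with an https:// scheme and curl is told which CA signed the proxy's own certificate. This is only a sketch; it assumes squid was built with TLS support and configured with an https_port on 3129, and proxy-ca.pem is a hypothetical file holding that CA:

# TLS to the proxy itself on port 3129, then the usual CONNECT and a second
# TLS session end-to-end with google.com inside the tunnel
curl -v --proxy https://localhost:3129 --proxy-cacert proxy-ca.pem https://google.com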

Setting up certificates and TLS for demo purposes is a lot of work and will hopefully be covered in a follow-up post.

Conclusion

My aim in this blog was to show how one can use the curl command line tool in the presence of a proxy server, what its behavior is like for HTTP and HTTPS requests, and how to interpret the output from the command. I hope I have attained that goal.
