Diffstat (limited to 'public/http-notes.html')
-rw-r--r-- public/http-notes.html | 110
1 files changed, 87 insertions, 23 deletions
diff --git a/public/http-notes.html b/public/http-notes.html
index b99b643..99b13c4 100644
--- a/public/http-notes.html
+++ b/public/http-notes.html
@@ -2,25 +2,45 @@
<html lang="en">
<head>
<meta charset="utf-8">
- <title>Notes on subtleties of HTTP implementation — Luke Shumaker</title>
+ <title>Notes on subtleties of HTTP implementation — Luke T. Shumaker</title>
+ <meta name="viewport" content="width=device-width, initial-scale=1">
<link rel="stylesheet" href="assets/style.css">
<link rel="alternate" type="application/atom+xml" href="./index.atom" name="web log entries"/>
</head>
<body>
-<header><a href="/">Luke Shumaker</a> » <a href=/blog>blog</a> » http-notes</header>
+<header><a href="/">Luke T. Shumaker</a> » <a href=/blog>blog</a> » http-notes</header>
<article>
-<h1 id="notes-on-subtleties-of-http-implementation">Notes on subtleties of HTTP implementation</h1>
-<p>I may add to this as time goes on, but I’ve written up some notes on subtleties of HTTP/1.1 message syntax as specified in RFC 7230.</p>
-<h2 id="why-the-absolute-form-is-used-for-proxy-requests">Why the absolute-form is used for proxy requests</h2>
-<p><a href="https://tools.ietf.org/html/rfc7230#section-5.3.2">RFC7230§5.3.2</a> says that a (non-CONNECT) request to an HTTP proxy should look like</p>
+<h1 id="notes-on-subtleties-of-http-implementation">Notes on subtleties
+of HTTP implementation</h1>
+<p>I may add to this as time goes on, but I’ve written up some notes on
+subtleties of HTTP/1.1 message syntax as specified in RFC 7230.</p>
+<h2 id="why-the-absolute-form-is-used-for-proxy-requests">Why the
+absolute-form is used for proxy requests</h2>
+<p><a
+href="https://tools.ietf.org/html/rfc7230#section-5.3.2">RFC7230§5.3.2</a>
+says that a (non-CONNECT) request to an HTTP proxy should look like</p>
<pre><code>GET http://authority/path HTTP/1.1</code></pre>
<p>rather than the usual</p>
<pre><code>GET /path HTTP/1.1
Host: authority</code></pre>
-<p>And doesn’t give a hint as to why the message syntax is different here.</p>
-<p><a href="https://parsiya.net/blog/2016-07-28-thick-client-proxying---part-6-how-https-proxies-work/#3-1-1-why-not-use-the-host-header">A blog post by Parsia Hakimian</a> claims that the reason is that it’s a legacy behavior inherited from HTTP/1.0, which had proxies, but not the Host header field. Which is mostly true. But we can also realize that the usual syntax does not allow specifying a URI scheme, which means that we cannot specify a transport. Sure, the only two HTTP transports we might expect to use today are TCP (scheme: http) and TLS (scheme: https), and TLS requires we use a CONNECT request to the proxy, meaning that the only option left is a TCP transport; but that is no reason to avoid building generality into the protocol.</p>
-<h2 id="on-taking-short-cuts-based-on-early-header-field-values">On taking short-cuts based on early header field values</h2>
-<p><a href="https://tools.ietf.org/html/rfc7230#section-3.2.2">RFC7230§3.2.2</a> says:</p>
+<p>And doesn’t give a hint as to why the message syntax is different
+here.</p>
+<p><a
+href="https://parsiya.net/blog/2016-07-28-thick-client-proxying---part-6-how-https-proxies-work/#3-1-1-why-not-use-the-host-header">A
+blog post by Parsia Hakimian</a> claims that the reason is that it’s a
+legacy behavior inherited from HTTP/1.0, which had proxies, but not the
+Host header field. Which is mostly true. But we can also realize that
+the usual syntax does not allow specifying a URI scheme, which means
+that we cannot specify a transport. Sure, the only two HTTP transports
+we might expect to use today are TCP (scheme: http) and TLS (scheme:
+https), and TLS requires we use a CONNECT request to the proxy, meaning
+that the only option left is a TCP transport; but that is no reason to
+avoid building generality into the protocol.</p>
+<h2 id="on-taking-short-cuts-based-on-early-header-field-values">On
+taking short-cuts based on early header field values</h2>
+<p><a
+href="https://tools.ietf.org/html/rfc7230#section-3.2.2">RFC7230§3.2.2</a>
+says:</p>
<blockquote>
<pre><code>The order in which header fields with differing field names are
received is not significant. However, it is good practice to send
@@ -29,40 +49,84 @@ requests and Date on responses, so that implementations can decide
when not to handle a message as early as possible.</code></pre>
</blockquote>
<p>Which is great! We can make an optimization!</p>
-<p>This is only a valid optimization for deciding to <em>not handle</em> a message. You cannot use it to decide to route to a backend early based on this. Part of the reason is that <a href="https://tools.ietf.org/html/rfc7230#section-5.4">§5.4</a> tells us we must inspect the entire header field set to know if we need to respond with a 400 status code:</p>
+<p>This is only a valid optimization for deciding to <em>not handle</em>
+a message. You cannot use it to decide to route to a backend early based
+on this. Part of the reason is that <a
+href="https://tools.ietf.org/html/rfc7230#section-5.4">§5.4</a> tells us
+we must inspect the entire header field set to know if we need to
+respond with a 400 status code:</p>
<blockquote>
<pre><code>A server MUST respond with a 400 (Bad Request) status code to any
HTTP/1.1 request message that lacks a Host header field and to any
request message that contains more than one Host header field or a
Host header field with an invalid field-value.</code></pre>
</blockquote>
-<p>However, if I decide not to handle a request based on the Host header field, the correct thing to do is to send a 404 status code. Which implies that I have parsed the remainder of the header field set to validate the message syntax. We need to parse the entire field-set to know if we need to send a 400 or a 404. Did this just kill the possibility of using the optimization?</p>
-<p>Well, there are a number of “A server MUST respond with a XXX code if” rules that can all be triggered on the same request. So we get to choose which to use. And fortunately for optimizing implementations, <a href="https://tools.ietf.org/html/rfc7230#section-3.2.5">§3.2.5</a> gave us:</p>
+<p>However, if I decide not to handle a request based on the Host header
+field, the correct thing to do is to send a 404 status code. Which
+implies that I have parsed the remainder of the header field set to
+validate the message syntax. We need to parse the entire field-set to
+know if we need to send a 400 or a 404. Did this just kill the
+possibility of using the optimization?</p>
+<p>Well, there are a number of “A server MUST respond with a XXX code
+if” rules that can all be triggered on the same request. So we get to
+choose which to use. And fortunately for optimizing implementations, <a
+href="https://tools.ietf.org/html/rfc7230#section-3.2.5">§3.2.5</a> gave
+us:</p>
<blockquote>
<pre><code>A server that receives a ... set of fields,
larger than it wishes to process MUST respond with an appropriate 4xx
(Client Error) status code.</code></pre>
</blockquote>
-<p>Since the header field set is longer than we want to process (since we want to short-cut processing), we are free to respond with whichever 4XX status code we like!</p>
+<p>Since the header field set is longer than we want to process (since
+we want to short-cut processing), we are free to respond with whichever
+4XX status code we like!</p>
<h2 id="on-normalizing-target-uris">On normalizing target URIs</h2>
-<p>An implementer is tempted to normalize URIs all over the place, just for safety and sanitation. After all, <a href="https://tools.ietf.org/html/rfc3986#section-6.1">RFC3986§6.1</a> says it’s safe!</p>
-<p>Unfortunately, most URI normalization implementations will normalize an empty path to “/”. Which is not always safe; <a href="https://tools.ietf.org/html/rfc7230#section-2.7.3">RFC7230§2.7.3</a>, which defines this “equivalence”, actually says:</p>
+<p>An implementer is tempted to normalize URIs all over the place, just
+for safety and sanitation. After all, <a
+href="https://tools.ietf.org/html/rfc3986#section-6.1">RFC3986§6.1</a>
+says it’s safe!</p>
+<p>Unfortunately, most URI normalization implementations will normalize
+an empty path to “/”. Which is not always safe; <a
+href="https://tools.ietf.org/html/rfc7230#section-2.7.3">RFC7230§2.7.3</a>,
+which defines this “equivalence”, actually says:</p>
<blockquote>
<pre><code> When not being used in
absolute form as the request target of an OPTIONS request, an empty
path component is equivalent to an absolute path of &quot;/&quot;, so the
normal form is to provide a path of &quot;/&quot; instead.</code></pre>
</blockquote>
-<p>Which means we can’t use the usual normalization implementation if we are making an OPTIONS request!</p>
-<p>Why is that? Well, if we turn to <a href="https://tools.ietf.org/html/rfc7230#section-5.3.4">§5.3.4</a>, we find the answer. One of the special cases for when the request target is not a URI, is that we may use “*” as the target for an OPTIONS request to request information about the origin server itself, rather than a resource on that server.</p>
-<p>However, as discussed above, the target in a request to a proxy must be an absolute URI (and <a href="https://tools.ietf.org/html/rfc7230#section-5.3.2">§5.3.2</a> says that the origin server must also understand this syntax). So, we must define a way to map “*” to an absolute URI.</p>
-<p>Naively, one might be tempted to use “/*” as the path. But that would make it impossible to have a resource actually named “/*”. So, we must define a special case in the URI syntax that doesn’t obstruct a real path.</p>
-<p>If we didn’t have this special case in the URI normalization rules, and we handled the “/” path as the same as empty in the OPTIONS handler of the last proxy server, then it would be impossible to request OPTIONS for the “/” resources, as it would get translated into “*” and treated as OPTIONS for the entire server.</p>
+<p>Which means we can’t use the usual normalization implementation if we
+are making an OPTIONS request!</p>
+<p>Why is that? Well, if we turn to <a
+href="https://tools.ietf.org/html/rfc7230#section-5.3.4">§5.3.4</a>, we
+find the answer. One of the special cases for when the request target is
+not a URI, is that we may use “*” as the target for an OPTIONS request
+to request information about the origin server itself, rather than a
+resource on that server.</p>
+<p>However, as discussed above, the target in a request to a proxy must
+be an absolute URI (and <a
+href="https://tools.ietf.org/html/rfc7230#section-5.3.2">§5.3.2</a> says
+that the origin server must also understand this syntax). So, we must
+define a way to map “*” to an absolute URI.</p>
+<p>Naively, one might be tempted to use “/*” as the path. But that would
+make it impossible to have a resource actually named “/*”. So, we must
+define a special case in the URI syntax that doesn’t obstruct a real
+path.</p>
+<p>If we didn’t have this special case in the URI normalization rules,
+and we handled the “/” path as the same as empty in the OPTIONS handler
+of the last proxy server, then it would be impossible to request OPTIONS
+for the “/” resource, as it would get translated into “*” and treated
+as OPTIONS for the entire server.</p>
</article>
<footer>
-<p>The content of this page is Copyright © 2016 <a href="mailto:lukeshu@sbcglobal.net">Luke Shumaker</a>.</p>
-<p>This page is licensed under the <a href="https://creativecommons.org/licenses/by-sa/3.0/">CC BY-SA-3.0</a> license.</p>
+ <aside class="sponsor"><p>I'd love it if you <a class="em"
+ href="/sponsor/">sponsored me</a>. It will allow me to continue
+ <a class="em" href="/imworkingon/">my work</a> on the GNU/Linux
+ ecosystem. Thanks!</p></aside>
+
+<p>The content of this page is Copyright © 2016 <a href="mailto:lukeshu@lukeshu.com">Luke T. Shumaker</a>.</p>
+<p>This page is licensed under the <a href="https://creativecommons.org/licenses/by-sa/4.0/">CC BY-SA 4.0</a> license.</p>
</footer>
</body>
</html>
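
As an illustration of the OPTIONS special case described in the "On normalizing target URIs" section of the page above, here is a minimal Python sketch. The helper names `normalize_target` and `origin_form_target` are hypothetical and are not part of this patch or of any library; the sketch only assumes the standard-library `urllib.parse` module and the RFC 7230 §2.7.3 and §5.3.4 rules that the page quotes.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_target(method: str, target: str) -> str:
    """Normalize an absolute-form request target (hypothetical helper).

    An empty path is rewritten to "/" for every method except OPTIONS,
    where an empty path stands for the server-wide "*" target and so
    must not be conflated with the "/" resource.
    """
    parts = urlsplit(target)
    if parts.path == "" and method != "OPTIONS":
        parts = parts._replace(path="/")
    return urlunsplit(parts)


def origin_form_target(method: str, target: str) -> str:
    """Request target the last proxy would forward (hypothetical helper).

    Per RFC 7230 section 5.3.4, an OPTIONS request whose absolute-form
    target has an empty path and no query is forwarded as "*".
    """
    parts = urlsplit(target)
    if method == "OPTIONS" and parts.path == "" and not parts.query:
        return "*"
    path = parts.path or "/"
    return path + ("?" + parts.query if parts.query else "")
```

With these helpers, `OPTIONS http://example.com` and `OPTIONS http://example.com/` stay distinct: the first maps to the server-wide `*` target, the second to the `/` resource. A general-purpose normalizer that always rewrites the empty path would erase that distinction.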