From 82b66f23e0da5ebe98638fd4bb320b45ab31a927 Mon Sep 17 00:00:00 2001
From: Luke Shumaker
Date: Fri, 30 Sep 2016 19:16:34 -0400
Subject: http-notes: proof read

---
 public/http-notes.md | 73 ++++++++++++++++++++++++----------------------------
 1 file changed, 34 insertions(+), 39 deletions(-)

diff --git a/public/http-notes.md b/public/http-notes.md
index 5f8021b..520639d 100644
--- a/public/http-notes.md
+++ b/public/http-notes.md
@@ -43,42 +43,38 @@ protocol.
 > requests and Date on responses, so that implementations can decide
 > when not to handle a message as early as possible.
 
-I took that as a notice that I can use the first Host or similar
-header to quickly route along to my sub-component before I've parsed
-the entire header field set.
+Which is great! We can make an optimization!
 
-However, it later states in [§5.4][RFC7230§5.4]:
+This is only a valid optimization for deciding to *not handle* a
+message. You cannot use it to decide to route to a backend early
+based on this. Part of the reason is that [§5.4][RFC7230§5.4] tells
+us we must inspect the entire header field set to know if we need to
+respond with a 400 status code:
 
 > A server MUST respond with a 400 (Bad Request) status code to any
 > HTTP/1.1 request message that lacks a Host header field and to any
 > request message that contains more than one Host header field or a
 > Host header field with an invalid field-value.
 
-Which means that I must parse the entire header field set.
-
-However, if I look a bit closer at §3.2.2, I see that this short-cut
-is only valid for deciding to *not handle* a message; if I am handling
-it, I cannot use this short-cut.
-
-Except that if I decide not to handle a request based on the Host
-header field, the correct thing to do is to send a 404 status code.
-Which implies that I have parsed the remainder of the header field set
-to validate the message syntax. Oh no, what do I do?
+However, if I decide not to handle a request based on the Host header
+field, the correct thing to do is to send a 404 status code. Which
+implies that I have parsed the remainder of the header field set to
+validate the message syntax. We need to parse the entire field-set to
+know if we need to send a 400 or a 404. Did this just kill the
+possibility of using the optimization?
 
 Well, there are a number of "A server MUST respond with a XXX code if"
 rules that can all be triggered on the same request. So we get to
-choose which to use.
-
-And fortunately for optimizing implementations,
+choose which to use. And fortunately for optimizing implementations,
 [§3.2.5][RFC7230§3.2.5] gave us:
 
 > A server that receives a ... set of fields,
 > larger than it wishes to process MUST respond with an appropriate 4xx
 > (Client Error) status code.
 
-And since the header field set is longer than we want to process
-(since we want to short-cut processing), we are free to respond with
-whichever 4XX status code we like!
+Since the header field set is longer than we want to process (since we
+want to short-cut processing), we are free to respond with whichever
+4XX status code we like!
 
 # On normalizing target URIs
 
@@ -86,40 +82,39 @@ An implementer is tempted to normalize URIs all over the place, just
 for safety and sanitation. After all, [RFC3986§6.1][] says it's
 safe!
 
-Unfortunately, most URI normalizers implementations will normalize an
-empty path to "/". Which is not always save;
-[RFC7230§2.7.3][], which defines this
-"equivalence", actually says:
+Unfortunately, most URI normalization implementations will normalize an
+empty path to "/". Which is not always safe; [RFC7230§2.7.3][], which
+defines this "equivalence", actually says:
 
 > When not being used in
 > absolute form as the request target of an OPTIONS request, an empty
 > path component is equivalent to an absolute path of "/", so the
 > normal form is to provide a path of "/" instead.
 
-Which means we can't use the usual normalizer implementation if we are
-making an OPTIONS request!
+Which means we can't use the usual normalization implementation if we
+are making an OPTIONS request!
 
-Why is that? Well, if we turn to [§5.3.4][RFC7230§5.3.4], we
-find the answer. One of the special cases for when the request target
-is not a URI, is that we may use "\*" as the target for an OPTIONS
-request to request information about the origin server itself, rather
-than a resource on that server.
+Why is that? Well, if we turn to [§5.3.4][RFC7230§5.3.4], we find the
+answer. One of the special cases for when the request target is not a
+URI, is that we may use "\*" as the target for an OPTIONS request to
+request information about the origin server itself, rather than a
+resource on that server.
 
 However, as discussed above, the target in a request to a proxy must
-be an absolute URI (and [§5.3.2][RFC7230§5.3.2] says that the
-origin server must also understand this syntax). So, we must define a
-way to map "\*" to an absolute URI.
+be an absolute URI (and [§5.3.2][RFC7230§5.3.2] says that the origin
+server must also understand this syntax). So, we must define a way to
+map "\*" to an absolute URI.
 
 Naively, one might be tempted to use "/\*" as the path. But that
 would make it impossible to have a resource actually named "/\*". So,
 we must define a special case in the URI syntax that doesn't obstruct
 a real path.
 
-If we didn't have this special case in the URI normalizer, and we
-handled the "/" path as the same as empty in the OPTIONS handler of
-the last proxy server, then it would be impossible to request OPTIONS
-for the "/" resources, as it would get translated into "\*" and
-treated as OPTIONS for the entire server.
+If we didn't have this special case in the URI normalization rules,
+and we handled the "/" path as the same as empty in the OPTIONS
+handler of the last proxy server, then it would be impossible to
+request OPTIONS for the "/" resources, as it would get translated into
+"\*" and treated as OPTIONS for the entire server.
 
 [RFC3986§6.1]: https://tools.ietf.org/html/rfc3986#section-6.1
 [RFC7230§2.7.3]: https://tools.ietf.org/html/rfc7230#section-2.7.3
-- 
cgit v1.2.3
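
The OPTIONS special case that the second hunk of this patch describes can be sketched in Python. This is only an illustration under stated assumptions, not code from the patched notes: `normalize_target` and `origin_form_target` are hypothetical helper names, the targets are assumed to be absolute-form URIs, and the mapping to "\*" follows my reading of RFC 7230 §5.3.4 (empty path and no query on an OPTIONS request).

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_target(method: str, target: str) -> str:
    """Normalize an absolute-form request target (hypothetical helper).

    An empty path becomes "/" for every method except OPTIONS, where the
    empty path has to survive so it can later be mapped to "*".
    """
    parts = urlsplit(target)
    if parts.path == "" and method != "OPTIONS":
        parts = parts._replace(path="/")
    return urlunsplit(parts)


def origin_form_target(method: str, target: str) -> str:
    """Target the last proxy might send to the origin server (assumed mapping)."""
    parts = urlsplit(target)
    # OPTIONS with an empty path and no query addresses the server itself.
    if method == "OPTIONS" and parts.path == "" and not parts.query:
        return "*"
    # Everything else uses origin-form: the path (at least "/") plus any query.
    path = parts.path or "/"
    return path + ("?" + parts.query if parts.query else "")
```

With these helpers, `OPTIONS http://example.com` and `OPTIONS http://example.com/` stay distinguishable all the way to the origin server: the first maps to "\*", the second to "/". A normalizer that rewrote the empty path unconditionally would collapse the two.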