Youth Training Camp | "HTTP Practical Guide"

Introduction to HTTP

What happens after you enter a URL: the browser process handles the input -> the browser kernel sends a request to the server -> the browser kernel reads the response -> the browser kernel renders the page -> the browser process finishes loading the page

image.png

  • Hypertext Transfer Protocol (HTTP)

  • An application-layer protocol built on top of the transport-layer TCP protocol

  • Request/response model

  • Simple and extensible (custom headers can be defined as long as both client and server understand them)

  • Stateless

    image.png

Protocol Analysis

Development History

image.png

Message Structure

HTTP/1.1

image.png

As shown in the figure, you can see the request and response headers, the returned status code, etc.
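To make the message structure concrete, here is a minimal Node.js sketch that splits a raw HTTP/1.1 request into its start line, header fields, and body. The `parseRequest` helper and the sample login request are illustrative assumptions; a real parser must also handle repeated headers, chunked bodies, and malformed input.

```javascript
// Sketch: split a raw HTTP/1.1 request into start line, headers, and body.
// A blank line (CRLF CRLF) separates the header section from the body.
function parseRequest(raw) {
  const headerEnd = raw.indexOf("\r\n\r\n");
  if (headerEnd === -1) throw new Error("incomplete header section");
  const [startLine, ...headerLines] = raw.slice(0, headerEnd).split("\r\n");
  const [method, target, version] = startLine.split(" ");

  const headers = {};
  for (const line of headerLines) {
    const colon = line.indexOf(":");
    // Header names are case-insensitive, so normalize them to lower case.
    headers[line.slice(0, colon).trim().toLowerCase()] = line.slice(colon + 1).trim();
  }
  return { method, target, version, headers, body: raw.slice(headerEnd + 4) };
}

// Hypothetical login request used only for illustration.
const raw =
  "POST /api/login HTTP/1.1\r\n" +
  "Host: www.example.com\r\n" +
  "Content-Type: application/x-www-form-urlencoded\r\n" +
  "Content-Length: 27\r\n" +
  "\r\n" +
  "username=alice&password=123";

const req = parseRequest(raw);
console.log(req.method, req.headers["content-type"]); // POST application/x-www-form-urlencoded
```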

Method | Description
--- | ---
GET | Requests a representation of the specified resource; GET requests should only be used to retrieve data
POST | Submits an entity to the specified resource, usually causing a change in state or side effects on the server
PUT | Replaces all current representations of the target resource with the request payload
DELETE | Deletes the specified resource
HEAD | Requests a response identical to that of a GET request, but without the response body (less commonly used)
CONNECT | Establishes a tunnel to the server identified by the target resource (less commonly used)
OPTIONS | Describes the communication options for the target resource
TRACE | Performs a message loop-back test along the path to the target resource (less commonly used)
PATCH | Applies partial modifications to a resource

  • Safe: methods that do not modify data on the server, e.g. the read-only methods GET, HEAD, and OPTIONS.

  • Idempotent: executing the same request once has the same effect on the server's state as executing it several times. All safe methods are idempotent; PUT and DELETE are idempotent too, even though they are not safe. POST and PATCH are not idempotent in general.
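The two properties can be written down as lookup sets. This sketch follows the definitions in RFC 9110; `methodProperties` and the retry rule in `canAutoRetry` are hypothetical helpers showing why idempotency matters in practice: repeating an idempotent request after a network failure cannot change server state further.

```javascript
// Safe and idempotent methods as defined in RFC 9110, section 9.2.
const SAFE = new Set(["GET", "HEAD", "OPTIONS", "TRACE"]);
const IDEMPOTENT = new Set([...SAFE, "PUT", "DELETE"]);

function methodProperties(method) {
  const m = method.toUpperCase();
  return { safe: SAFE.has(m), idempotent: IDEMPOTENT.has(m) };
}

// Hypothetical client policy: only retry automatically when repeating is harmless.
function canAutoRetry(method) {
  return methodProperties(method).idempotent;
}

for (const m of ["GET", "POST", "PUT", "DELETE", "PATCH"]) {
  console.log(m, methodProperties(m), "auto-retry:", canAutoRetry(m));
}
```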

Status Codes

image.png

  • 200 OK - the client's request succeeded
  • 301 Moved Permanently - the resource (web page, etc.) has been permanently moved to another URL
  • 302 Found - temporary redirect
  • 401 Unauthorized - the request lacks valid authentication credentials
  • 404 Not Found - the requested resource does not exist, possibly because the URL is wrong
  • 500 Internal Server Error - the server encountered an unexpected error
  • 504 Gateway Timeout - the server, acting as a gateway or proxy, did not get a timely response from the upstream server
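The first digit of a status code gives its class, which is usually all a client needs to decide what to do next. A small sketch; the class names follow RFC 9110, and `statusClass` is an illustrative helper:

```javascript
// Map a status code to its class (first digit), using the RFC 9110 class names.
function statusClass(code) {
  if (!Number.isInteger(code) || code < 100 || code > 599) {
    throw new RangeError(`not an HTTP status code: ${code}`);
  }
  const classes = ["Informational", "Successful", "Redirection", "Client Error", "Server Error"];
  return classes[Math.floor(code / 100) - 1];
}

for (const code of [200, 301, 302, 401, 404, 500, 504]) {
  console.log(code, statusClass(code));
}
```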

RESTful API

REST (Representational State Transfer) is an API design style:

  • Each URI represents a resource
  • A representation of that resource is transferred between client and server
  • The client operates on server-side resources through HTTP methods, achieving "representational state transfer".

Request | Status Code | Meaning
--- | --- | ---
GET /zoos | 200 OK | Lists all zoos; the server returned them successfully
POST /zoos | 201 Created | Creates a new zoo; the server created it successfully
PUT /zoos/ID | 400 Bad Request | Updates the specified zoo (the client supplies all of its information); here the request was invalid, so the server neither created nor modified any data
DELETE /zoos/ID | 204 No Content | Deletes the specified zoo; deletion succeeded
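The zoo table can be mirrored by a small in-memory handler. `handle`, the `zoos` store, the `name` field, and the extra 404/405 responses are hypothetical additions for illustration; a real service would sit behind an HTTP server and a router.

```javascript
// Hypothetical in-memory store for the /zoos resource.
const zoos = new Map();
let nextId = 1;

// handle() maps an HTTP method + path to a status code and body, REST style.
function handle(method, path, body) {
  const match = path.match(/^\/zoos(?:\/(\d+))?$/);
  if (!match) return { status: 404 };
  const id = match[1] === undefined ? undefined : Number(match[1]);

  if (method === "GET" && id === undefined) {
    return { status: 200, body: [...zoos.values()] };
  }
  if (method === "POST" && id === undefined) {
    const zoo = { ...body, id: nextId++ };
    zoos.set(zoo.id, zoo);
    return { status: 201, body: zoo };
  }
  if (method === "PUT" && id !== undefined) {
    // PUT replaces the whole representation, so require a complete record.
    if (!body || typeof body.name !== "string") return { status: 400 };
    if (!zoos.has(id)) return { status: 404 };
    const zoo = { ...body, id };
    zoos.set(id, zoo);
    return { status: 200, body: zoo };
  }
  if (method === "DELETE" && id !== undefined) {
    return zoos.delete(id) ? { status: 204 } : { status: 404 };
  }
  return { status: 405 }; // Method Not Allowed
}
```

Returning 200 with the updated record on a successful PUT is one common choice; 204 without a body is also used.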

Common Request Headers

Request Header | Description
--- | ---
Accept | The MIME types the client can accept (matched against the Content-Type the server returns)
Content-Type | The media type of the entity body sent by the client
Cache-Control | Caching directives that requests and responses must follow, e.g. no-cache
If-Modified-Since | Carries the server's earlier Last-Modified value so the server can check whether the file has changed; precise only to the second
Expires | Strictly a response header: an absolute expiry time based on the server's clock; until then the browser uses its cache without sending a request
max-age | Not a standalone header but a Cache-Control directive: how many seconds the resource may be cached locally; within that time the cache is used instead of a request
If-None-Match | Carries the server's earlier ETag so the server can check whether the file content has changed (very precise)
Cookie | Cookies stored for a domain are sent automatically with requests to that domain
Referer | The URL of the page that initiated the request (for all types of requests, down to the full page path); commonly checked to block CSRF
Origin | Where the request was initiated, only down to scheme, host, and port; Origin reveals less than Referer and so better protects privacy
User-Agent | Information identifying the client, such as the browser and operating system
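The difference in precision between Origin and Referer can be shown with the WHATWG `URL` class built into Node.js and browsers. This sketch assumes the browser sends the full page URL as Referer; real referrer policies may trim it further. `requestSourceHeaders` is a hypothetical helper.

```javascript
// Sketch: derive Origin and Referer values for a request issued from pageUrl.
function requestSourceHeaders(pageUrl) {
  const url = new URL(pageUrl);
  // Referer never includes the fragment or user credentials.
  url.hash = "";
  url.username = "";
  url.password = "";
  return {
    Origin: url.origin, // scheme + host + port only
    Referer: url.href,  // full path and query string
  };
}

console.log(requestSourceHeaders("https://www.example.com:8443/news/1?id=2#top"));
```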

Common Response Headers

Response Header | Description
--- | ---
Content-Type | The media type of the entity body returned by the server
Cache-Control | Caching directives that requests and responses must follow, e.g. no-cache
Last-Modified | The last modification time of the requested resource
Expires | The time after which the response is considered stale and should no longer be served from cache
max-age | A Cache-Control directive (e.g. Cache-Control: max-age=3600): how many seconds the client may cache the resource locally
ETag | An identifier for a specific version of the resource, similar to a fingerprint
Set-Cookie | Sets cookies for the page; the server sends cookies to the client through this header
Server | Information about the server software
Access-Control-Allow-Origin | The Origin(s) allowed to read the response in cross-origin requests (e.g. * for any)

Caching

Strong Caching

If a fresh copy exists locally, use it directly without contacting the server.

  • Expires: an absolute expiration timestamp
  • Cache-Control
    • Cacheability
      • no-cache: the response may be cached, but must be revalidated with the server before each use (negotiated caching)
      • no-store: do not use any cache
      • public, private, etc.
    • Expiration
      • max-age: in seconds, how long the stored response stays fresh, counted from when the response was generated
    • Revalidation and reloading
      • must-revalidate: once the resource is stale, it must not be used until it has been successfully revalidated with the origin server.
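The strong-cache decision can be sketched as a pure function of the stored response headers and the current time. This is a simplified model, not a browser's actual algorithm: it ignores the Age header, s-maxage, and heuristic freshness, and `canUseWithoutRequest` is a hypothetical name. As in the HTTP caching spec, max-age takes precedence over Expires.

```javascript
// Simplified strong-cache check. headers: lower-cased response header names -> values.
function canUseWithoutRequest(headers, now) {
  const directives = new Map(
    (headers["cache-control"] || "")
      .toLowerCase()
      .split(",")
      .map((d) => d.trim())
      .filter(Boolean)
      .map((d) => d.split("="))
  );
  // no-store: never cached; no-cache: must go to the server to revalidate first.
  if (directives.has("no-store") || directives.has("no-cache")) return false;

  if (directives.has("max-age")) {
    const ageSeconds = (now - new Date(headers["date"])) / 1000;
    return ageSeconds < Number(directives.get("max-age")); // max-age wins over Expires
  }
  if (headers["expires"]) {
    return now < new Date(headers["expires"]);
  }
  return false;
}
```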

Negotiated Caching

The browser asks the server whether its cached copy is still valid; if it is, the server answers 304 Not Modified without a body.

  • ETag/If-None-Match: an identifier for a specific version of the resource, similar to a fingerprint
  • Last-Modified/If-Modified-Since: the last modification time (an absolute time, precise to the second)
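On the server side, negotiated caching comes down to comparing the validators the browser sends back. A simplified sketch of that comparison: `conditionalGet` and the `resource` shape are hypothetical, and, following RFC 9110, If-None-Match is checked first and If-Modified-Since is only consulted when it is absent.

```javascript
// resource: { etag, lastModified: Date, body }; reqHeaders: lower-cased request headers.
function conditionalGet(resource, reqHeaders) {
  const inm = reqHeaders["if-none-match"];
  if (inm !== undefined) {
    // Weak comparison: ignore a leading W/ on either side.
    const strip = (tag) => tag.trim().replace(/^W\//, "");
    const matches =
      inm.trim() === "*" || inm.split(",").some((tag) => strip(tag) === strip(resource.etag));
    if (matches) return { status: 304, headers: { etag: resource.etag } };
  } else if (reqHeaders["if-modified-since"]) {
    const since = Date.parse(reqHeaders["if-modified-since"]);
    // HTTP dates have one-second resolution, so compare whole seconds.
    const modified = Math.floor(resource.lastModified.getTime() / 1000) * 1000;
    if (!Number.isNaN(since) && modified <= since) {
      return { status: 304, headers: { etag: resource.etag } };
    }
  }
  // Validators did not match: send the full response with fresh validators.
  return {
    status: 200,
    headers: { etag: resource.etag, "last-modified": resource.lastModified.toUTCString() },
    body: resource.body,
  };
}
```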

image.png

Cookies

Set-Cookie (response header) attributes:

Attribute | Description
--- | ---
Name=value | The cookie's name and value
Expires=Date | The cookie's expiry date; without it (or Max-Age), the cookie is only valid until the browser is closed
Path=Path | Limits the URL path (directory) for which the cookie is sent; defaults to the directory of the current request path
Domain=domain | Limits the domain for which the cookie is valid; defaults to the host that set the cookie
Secure | The cookie is only sent over HTTPS connections
HttpOnly | JavaScript cannot access the cookie (e.g. via document.cookie)
SameSite=None/Strict/Lax | None: sent with both same-site and cross-site requests; Strict: sent only with same-site requests; Lax: also sent with top-level navigations using GET initiated from third-party sites
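A server writes these attributes into a single Set-Cookie header value separated by "; ". The `serializeSetCookie` helper and its option names are hypothetical (libraries such as the `cookie` package offer similar functions with their own APIs).

```javascript
// Hypothetical helper that builds a Set-Cookie header value from the attributes above.
function serializeSetCookie(name, value, opts = {}) {
  const parts = [`${name}=${encodeURIComponent(value)}`];
  if (opts.expires) parts.push(`Expires=${opts.expires.toUTCString()}`);
  if (opts.path) parts.push(`Path=${opts.path}`);
  if (opts.domain) parts.push(`Domain=${opts.domain}`);
  if (opts.secure) parts.push("Secure");
  if (opts.httpOnly) parts.push("HttpOnly");
  if (opts.sameSite) parts.push(`SameSite=${opts.sameSite}`);
  return parts.join("; ");
}

console.log(serializeSetCookie("session", "abc123", { path: "/", secure: true, httpOnly: true, sameSite: "Lax" }));
// session=abc123; Path=/; Secure; HttpOnly; SameSite=Lax
```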

Development

Overview of HTTP/2: faster, more stable, simpler

  • Frame

    • The smallest unit of communication in HTTP/2. Each frame has a frame header that, at a minimum, identifies the stream the frame belongs to.

    • HTTP/1.x transmits text, whereas HTTP/2 transmits binary frames, which are more efficient to parse. HTTP/2 also introduces a new header compression algorithm (HPACK).

    • image.png

  • Message: a complete sequence of frames that maps to a logical request or response message.

  • Stream: a bidirectional flow of bytes within an established connection that can carry one or more messages.

    • Frames from different streams are sent interleaved and reassembled by the receiver.

      image.png

  • HTTP/2 connections are persistent, and only one connection per origin is needed.

  • Flow control: a mechanism that prevents the sender from overwhelming the receiver with more data than it can handle.

  • Server push

    • image.png
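Every HTTP/2 frame starts with the same fixed 9-byte header (RFC 9113, section 4.1): a 24-bit payload length, an 8-bit type, 8 bits of flags, and a reserved bit followed by a 31-bit stream identifier. That stream identifier is what lets the receiver reassemble interleaved frames. A sketch using Node's `Buffer`; `parseFrameHeader` is an illustrative name.

```javascript
// Parse the fixed 9-byte HTTP/2 frame header.
function parseFrameHeader(buf) {
  if (buf.length < 9) throw new Error("need at least 9 bytes");
  return {
    length: buf.readUIntBE(0, 3),                // 24-bit payload length
    type: buf.readUInt8(3),                      // e.g. 0x0 DATA, 0x1 HEADERS
    flags: buf.readUInt8(4),                     // e.g. 0x4 END_HEADERS on HEADERS frames
    streamId: buf.readUInt32BE(5) & 0x7fffffff,  // drop the reserved high bit
  };
}

// A HEADERS frame with a 13-byte payload, END_HEADERS set, on stream 1.
console.log(parseFrameHeader(Buffer.from([0x00, 0x00, 0x0d, 0x01, 0x04, 0x00, 0x00, 0x00, 0x01])));
```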

Overview of HTTPS

  • HTTPS: Hypertext Transfer Protocol Secure

  • Encrypted using TLS/SSL

  • Symmetric encryption: the same key is used for both encryption and decryption

  • Asymmetric encryption: encryption and decryption use two different keys, a public key and a private key
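Node's built-in `crypto` module can demonstrate both kinds of encryption and the usual way they are combined: an asymmetric key pair protects a random session key, and that session key then encrypts the bulk data symmetrically. This is only an illustration of the idea; real TLS 1.3 derives session keys through an (EC)DHE key exchange rather than RSA-encrypting them, and the helper names here are hypothetical.

```javascript
const crypto = require("node:crypto");

// Symmetric: the same key encrypts and decrypts (AES-256-GCM here).
function symmetricEncrypt(key, plaintext) {
  const iv = crypto.randomBytes(12);
  const cipher = crypto.createCipheriv("aes-256-gcm", key, iv);
  const data = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return { iv, data, tag: cipher.getAuthTag() };
}

function symmetricDecrypt(key, { iv, data, tag }) {
  const decipher = crypto.createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag); // GCM also detects tampering
  return Buffer.concat([decipher.update(data), decipher.final()]).toString("utf8");
}

// Asymmetric: anyone may encrypt with the public key; only the private key decrypts.
const { publicKey, privateKey } = crypto.generateKeyPairSync("rsa", { modulusLength: 2048 });

// Hybrid idea: protect a random session key asymmetrically, then use it symmetrically.
const sessionKey = crypto.randomBytes(32);
const wrappedKey = crypto.publicEncrypt(publicKey, sessionKey);   // sent to the key owner
const unwrappedKey = crypto.privateDecrypt(privateKey, wrappedKey);

const message = symmetricEncrypt(sessionKey, "GET / HTTP/1.1");
console.log(symmetricDecrypt(unwrappedKey, message)); // GET / HTTP/1.1
```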

image.png

Common Scenario Analysis

Static Resources

Take Toutiao as an example: open the Network panel in DevTools, look at its requests, and find the request for a CSS file.

image.png

The returned status code is 200, but was a request actually sent? The note in parentheses next to it says "from disk cache": the file was served from the local cache without contacting the server.

image.png

From the response headers in the figure above, we can see:

  • Cache strategy
    • Strong caching (max-age=xxxxx)
      • Cache-Control: the max-age works out to 1 year
  • Other information
    • Access is allowed from all origins (access-control-allow-origin)
    • Resource type: CSS (content-type)

Static resource solution: cache + CDN + file name hash

  • CDN: Content Delivery Network
  • By taking user proximity and server load into account, a CDN serves content to users efficiently.

image.png

With such a long cache period, how can we ensure that the content users receive is up-to-date?

File name hash: when the file content changes, the file name changes too (a content hash or version number is embedded in it), so the cached file no longer matches and the browser must request the new one.
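Build tools implement this by hashing the file content and embedding a short prefix of the digest in the output name. A minimal sketch of that step with Node's `crypto` and `path` modules; `hashedFileName`, SHA-256, and the 8-character length are assumptions for illustration, not the exact scheme any particular bundler uses.

```javascript
const crypto = require("node:crypto");
const path = require("node:path");

// Hypothetical build step: "main.css" -> "main.<8 hex chars>.css".
// Same content -> same name (cache hit); changed content -> new name (cache miss).
function hashedFileName(fileName, content, length = 8) {
  const hash = crypto.createHash("sha256").update(content).digest("hex").slice(0, length);
  const { name, ext } = path.parse(fileName);
  return `${name}.${hash}${ext}`;
}

console.log(hashedFileName("main.css", "body{color:red}"));
console.log(hashedFileName("main.css", "body{color:blue}"));
```

Because the HTML references the hashed name, a year-long max-age on the CSS is safe: a new deployment simply points at a new URL.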

Login - Cross-Domain

image.png

image.png

Because the request is cross-origin, the browser first sends a preflight request using the OPTIONS method.

image.png

If any one of the protocol, hostname, or port differs, the request is cross-origin (the default port is 80 for HTTP and 443 for HTTPS).
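The same-origin rule can be checked with the standard `URL` class: its `origin` property normalizes scheme, host, and port, and omits default ports (80 for http, 443 for https), so two URLs are same-origin exactly when their origins are equal. `isSameOrigin` is an illustrative helper.

```javascript
// Two URLs are same-origin only if protocol, hostname, and port all match.
function isSameOrigin(a, b) {
  return new URL(a).origin === new URL(b).origin;
}

console.log(isSameOrigin("http://example.com/a", "http://example.com:80/b"));  // true
console.log(isSameOrigin("http://example.com/", "https://example.com/"));      // false
console.log(isSameOrigin("https://a.example.com/", "https://b.example.com/")); // false
```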

image.png

Solving Cross-Domain Issues

  • Cross-Origin Resource Sharing (CORS)

    • Cross-Origin Resource Sharing (CORS) is an HTTP-header-based mechanism that allows a server to indicate that its resources may be requested from origins (domain, protocol, and port) other than its own. CORS also relies on a mechanism by which the browser sends a "preflight" request to the server hosting the cross-origin resource, to check whether the server will permit the actual request. In the preflight, the browser sends headers that indicate the HTTP method and headers that will be used in the actual request.

      For security reasons, browsers restrict cross-origin HTTP requests initiated from scripts. For example, XMLHttpRequest and the Fetch API follow the same-origin policy. This means that a web application using these APIs can only request HTTP resources from the same origin the application was loaded from, unless the response from the other origin includes the correct CORS headers.

    • Preflight request: sent before a non-simple ("complex") request to find out whether the server allows the cross-origin request

    • Related protocol headers

      • access-control-....
  • Proxy server

    • The same-origin policy is a security policy of the browser, not of HTTP, so a same-origin proxy server can forward the request to the other server on the page's behalf.
  • Iframe: workable, but with many inconveniences
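Whether the browser sends a preflight depends on whether the request is "simple". The sketch below approximates the rule from the Fetch standard (it ignores the Range header and the limits on header values), and pairs it with a hypothetical server-side answer to the OPTIONS preflight. `needsPreflight`, `preflightResponse`, the allow-list, and the 600-second max-age are assumptions for illustration; echoing the requested headers back is permissive and a real server should check them.

```javascript
// Simplified "simple request" rule. Header names are expected in lower case.
const SIMPLE_METHODS = new Set(["GET", "HEAD", "POST"]);
const SAFELISTED_HEADERS = new Set(["accept", "accept-language", "content-language", "content-type"]);
const SIMPLE_CONTENT_TYPES = new Set([
  "application/x-www-form-urlencoded",
  "multipart/form-data",
  "text/plain",
]);

function needsPreflight(method, headers) {
  if (!SIMPLE_METHODS.has(method.toUpperCase())) return true;
  for (const [name, value] of Object.entries(headers)) {
    if (!SAFELISTED_HEADERS.has(name)) return true; // any custom header triggers a preflight
    if (name === "content-type") {
      const mime = value.split(";")[0].trim().toLowerCase();
      if (!SIMPLE_CONTENT_TYPES.has(mime)) return true; // e.g. application/json
    }
  }
  return false;
}

// Hypothetical server-side handler for the OPTIONS preflight.
function preflightResponse(allowedOrigins, reqHeaders) {
  const origin = reqHeaders["origin"];
  if (!allowedOrigins.includes(origin)) return { status: 403, headers: {} };
  return {
    status: 204,
    headers: {
      "access-control-allow-origin": origin,
      "access-control-allow-methods": "GET, POST, PUT, DELETE",
      "access-control-allow-headers": reqHeaders["access-control-request-headers"] || "",
      "access-control-max-age": "600", // browser may cache this preflight result for 10 minutes
    },
  };
}
```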

image.png

image.png

As shown in the figures, what happens during login?

What information is sent, and what is returned?

  • Sent
    • A POST body in form format
    • The desired response format: JSON
    • Existing cookies
  • Returned
    • Data in JSON format
    • Cookie information

So why is the login state remembered the next time the page is visited?

Authentication

  • Session + cookie (used by most portal websites)
    • The user submits a request containing the username, password, etc. to the server
    • The server verifies the credentials; if they are correct, it creates a session and sets its ID in a cookie (Set-Cookie: session = ......)
    • On the next request the browser sends it back: GET with Cookie: session=....
    • The server verifies the session and returns the login information.
  • JWT (JSON Web Token)
    • The server does not need to store session state; the signed token itself carries the login information.
    • The returned token is unique, typically with a short validity period, etc.
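The reason a JWT needs no server-side storage is that the server can recheck its own signature on every request. A minimal HS256 sketch with Node's `crypto` only (recent Node.js with `base64url` support); `SECRET`, `sign`, and `verify` are hypothetical names, and a production system should use a maintained JWT library instead.

```javascript
const crypto = require("node:crypto");

// Placeholder secret: in practice a long random value kept only on the server.
const SECRET = "replace-with-a-long-random-secret";

const b64url = (obj) => Buffer.from(JSON.stringify(obj)).toString("base64url");

// Token = base64url(header).base64url(payload).base64url(HMAC-SHA256 signature)
function sign(payload, ttlSeconds, now = Math.floor(Date.now() / 1000)) {
  const header = b64url({ alg: "HS256", typ: "JWT" });
  const body = b64url({ ...payload, iat: now, exp: now + ttlSeconds });
  const sig = crypto.createHmac("sha256", SECRET).update(`${header}.${body}`).digest("base64url");
  return `${header}.${body}.${sig}`;
}

// Returns the payload if the signature is valid and the token has not expired, else null.
function verify(token, now = Math.floor(Date.now() / 1000)) {
  const [header, body, sig] = token.split(".");
  if (!header || !body || !sig) return null;
  const expected = crypto.createHmac("sha256", SECRET).update(`${header}.${body}`).digest();
  const actual = Buffer.from(sig, "base64url");
  // Constant-time comparison avoids leaking how many bytes matched.
  if (actual.length !== expected.length || !crypto.timingSafeEqual(actual, expected)) return null;
  const payload = JSON.parse(Buffer.from(body, "base64url").toString("utf8"));
  return payload.exp > now ? payload : null;
}
```

The flip side of keeping no state is that a token cannot easily be revoked before `exp`, which is one reason JWTs usually have short lifetimes.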

image.png

image.png

  • SSO: Single Sign-On

image.png

The figure explains the flow very clearly.

Practical Applications

XMLHttpRequest - Web API Interface Reference | MDN (mozilla.org)

AJAX and XHR

  • XHR: XMLHttpRequest
  • readyState

Value | State | Description
--- | --- | ---
0 | UNSENT | The client has been created, but open() has not been called yet.
1 | OPENED | open() has been called.
2 | HEADERS_RECEIVED | send() has been called, and the headers and status are available.
3 | LOADING | Downloading; responseText holds partial data.
4 | DONE | The operation is complete.

AJAX and Fetch

  • A modern replacement for XMLHttpRequest
  • Promise-based
  • Modular design: Response, Request, and Headers objects
  • Supports reading the body in chunks through stream objects
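The chunked-reading point can be seen without any network access by constructing a `Response` locally and reading its `body` stream. This assumes Node.js 18+ or a browser, where `fetch`, `Response`, `Headers`, and `TextDecoder` are globals; `readInChunks` is a hypothetical helper, and in real code the `Response` would come from `await fetch(url)`.

```javascript
// Read a Response body chunk by chunk through its ReadableStream.
async function readInChunks(response) {
  if (!response.ok) throw new Error(`HTTP ${response.status}`);
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let text = "";
  let chunks = 0;
  while (true) {
    const { done, value } = await reader.read(); // value is a Uint8Array
    if (done) break;
    chunks += 1;
    text += decoder.decode(value, { stream: true });
  }
  return { chunks, text: text + decoder.decode() };
}

// A locally constructed Response stands in for `await fetch(url)`.
const headers = new Headers({ "Content-Type": "text/plain; charset=utf-8" });
const demo = new Response("hello, fetch", { status: 200, headers });

readInChunks(demo).then(({ chunks, text }) => {
  console.log(demo.headers.get("content-type"), chunks, text);
});
```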

Standard Library in Node: HTTP/HTTPS

  • Built-in modules; no other dependencies need to be installed
  • Limited functionality and a less user-friendly API
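Both sides of that trade-off show up in a short example: the built-in `http` module can run a server and a client with no dependencies, but the client is callback- and event-based, so even a simple GET needs manual body collection and Promise wrapping. `createServer` (local wrapper), `getJson`, and the `/zoos` path are illustrative assumptions.

```javascript
const http = require("node:http");

// A tiny JSON server built only on the standard library.
function createServer() {
  return http.createServer((req, res) => {
    res.writeHead(200, { "Content-Type": "application/json" });
    res.end(JSON.stringify({ method: req.method, path: req.url }));
  });
}

// The built-in client streams the body in "data" events; wrapping it in a
// Promise is the kind of boilerplate libraries such as axios hide.
function getJson(url) {
  return new Promise((resolve, reject) => {
    http
      .get(url, (res) => {
        let raw = "";
        res.setEncoding("utf8");
        res.on("data", (chunk) => (raw += chunk));
        res.on("end", () => {
          try {
            resolve({ status: res.statusCode, data: JSON.parse(raw) });
          } catch (err) {
            reject(err);
          }
        });
      })
      .on("error", reject);
  });
}

// Usage: listen on a random free port, make one request, then shut down.
const server = createServer().listen(0, "127.0.0.1", async () => {
  const { port } = server.address();
  try {
    const { status, data } = await getJson(`http://127.0.0.1:${port}/zoos`);
    console.log(status, data); // 200 { method: 'GET', path: '/zoos' }
  } finally {
    server.close();
  }
});
```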

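Common Request Library: axios

```javascript
const axios = require("axios");
const fs = require("fs");

// Global configuration: relative URLs are resolved against baseURL
axios.defaults.baseURL = "https://api.example.com";

// Add a request interceptor
axios.interceptors.request.use(function (config) {
	// Do something before the request is sent
	return config;
}, function (error) {
	// Do something with the request error
	return Promise.reject(error);
});

// Send a request (an absolute URL is used as-is instead of baseURL);
// in Node, responseType 'stream' makes response.data a readable stream
axios({
    method: 'get',
    url: 'http://test.com',
    responseType: 'stream'
}).then(function (response) {
    response.data.pipe(fs.createWriteStream('ada_lovelace.jpg'));
}).catch(function (error) {
    console.error(error.message);
});
```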

Network Optimization

Learn More

More than One Choice for HTTP Protocol

Extension - Communication Methods

WebSocket

  • A network technology for full-duplex communication between the browser and the server
  • Typical scenarios: high real-time requirements, such as chat rooms
  • URLs start with ws:// or wss://
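A WebSocket connection begins as an ordinary HTTP/1.1 request carrying `Upgrade: websocket` and a random `Sec-WebSocket-Key`. The server answers `101 Switching Protocols` with `Sec-WebSocket-Accept`, computed as base64(SHA-1(key + fixed GUID)) per RFC 6455, section 4.2.2. The sketch below computes that value and checks it against the RFC's own example; `acceptKey` and `handshakeResponse` are illustrative names, and the frame exchange that follows is not shown.

```javascript
const crypto = require("node:crypto");

// Fixed GUID defined by RFC 6455.
const WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

function acceptKey(secWebSocketKey) {
  return crypto.createHash("sha1").update(secWebSocketKey + WS_GUID).digest("base64");
}

// The raw HTTP response that switches the connection to the WebSocket protocol.
function handshakeResponse(secWebSocketKey) {
  return [
    "HTTP/1.1 101 Switching Protocols",
    "Upgrade: websocket",
    "Connection: Upgrade",
    `Sec-WebSocket-Accept: ${acceptKey(secWebSocketKey)}`,
    "",
    "",
  ].join("\r\n");
}

console.log(acceptKey("dGhlIHNhbXBsZSBub25jZQ==")); // s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```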

image.png

UDP

QUIC (Quick UDP Internet Connections), built on UDP

  • 0-RTT connection establishment (except for the first connection).
  • Reliable transmission similar to TCP.
  • Encrypted transmission similar to TLS, with perfect forward secrecy.
  • Congestion control implemented in user space, e.g. the newer BBR algorithm.
  • Stream multiplexing similar to HTTP/2, but without TCP's head-of-line (HOL) blocking.
  • Forward error correction (FEC).
  • Connection migration similar to MPTCP.
  • Not many applications yet.

image.png

Summary and Thoughts

Today, in a gentle voice, the instructor introduced HTTP, analyzed the protocol, its message structure, and its caching strategies, and explained how it is used in specific business scenarios.

Most of the content cited in this article comes from Teacher Yang Chaonan's class "HTTP Practical Guide".
