Youth Training Camp | "Introduction to Web Multimedia" Notes

History of Web Multimedia#

  • PC Era: Playback plugins like Flash, rich clients.
  • Mobile Internet Era: Flash gradually phased out, HTML5 emerged, but its support for video formats is limited.
  • Today: Media Source Extensions (MSE) extend the HTML5 media elements, allowing the browser to support a wider range of video formats and streaming scenarios.

Basic Knowledge#

Encoding Formats#

Basic Concepts of Images#

  • Image Resolution: Used to determine the pixel data that makes up an image, referring to the number of pixels in the horizontal and vertical directions of the image.
  • Image Depth: Refers to the number of bits required to store each pixel. Image depth determines the possible number of colors or possible grayscale levels for each pixel.
    • For example, a color image represents each pixel with R, G, and B components; each component uses 8 bits, so the pixel depth is 24 bits, which can represent 2^24 = 16,777,216 colors;
    • A monochrome (grayscale) image stores each pixel in 8 bits, so its pixel depth is 8 bits, and it can represent at most 2^8 = 256 gray levels.
  • Image resolution and image depth together determine the size of the image.

Basic Concepts of Video#

  • Resolution: The resolution of each frame of the image.
  • Frame Rate: The number of video frames contained in a unit time.
  • Bit Rate: Refers to the amount of data transmitted per unit time in the video, usually expressed in kbps, which means kilobits per second.
  • Resolution, frame rate, and bit rate together determine the size of the video.
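As a rough illustrative estimate (these numbers are not from the original notes): for compressed video, size ≈ bit rate × duration ÷ 8. A 5 Mbps stream lasting 60 seconds is therefore about 5,000,000 × 60 ÷ 8 ≈ 37.5 MB.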

Classification of Video Frames#

I-frames, P-frames, B-frames

I-frame (Intra-coded frame): An independent frame that contains all information, can be decoded independently without relying on other frames.

P-frame (Predictive-coded frame): Stores only the difference from the preceding I-frame or P-frame, so it can be decoded only with reference to that earlier frame.

B-frame (Bidirectionally predictive-coded frame): Depends on both previous and subsequent frames, representing the difference between this frame and the frames before and after it.

DTS (Decode Time Stamp): Determines when the bitstream starts being sent to the decoder for decoding.

PTS (Presentation Time Stamp): Determines when the decoded video frame is displayed.

In the absence of B-frames, the order of DTS and PTS should be the same.
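A small illustration (the GOP pattern here is assumed, not taken from the notes): for the display sequence I1 B2 B3 P4, the two B-frames reference both I1 and P4, so P4 has to be decoded before them:

Display (PTS) order: I1  B2  B3  P4
Decode (DTS) order:  I1  P4  B2  B3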

GOP (Group of Pictures)#

The interval between two I-frames, usually 2 to 4 seconds.

The more I-frames a video contains, the larger it will be, because I-frames compress the least.

Why Encode?#

Video resolution: 1920 × 1080

So the size of one uncompressed frame: 1920 × 1080 × 24 / 8 = 6,220,800 bytes (≈ 5.9 MB)

Thus, a 90-minute video at 30 FPS would occupy roughly 940 GB of raw data, which is far too large!

Not to mention the higher 60 FPS...
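The same back-of-the-envelope calculation as a minimal JavaScript sketch (the numbers are just the ones used above):

// Raw (uncompressed) video size: width × height × depth/8 bytes per frame,
// multiplied by the frame rate and the duration in seconds.
function rawVideoBytes(width, height, bitDepth, fps, durationSeconds) {
  const bytesPerFrame = (width * height * bitDepth) / 8;
  return bytesPerFrame * fps * durationSeconds;
}

const bytes = rawVideoBytes(1920, 1080, 24, 30, 90 * 60);
console.log((bytes / 1024 ** 3).toFixed(1) + ' GiB'); // ≈ 938.6 GiB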

What does encoding compress?

  • Spatial redundancy: pixels within a single frame are highly correlated; neighboring pixels usually have similar values.

  • Temporal redundancy: consecutive frames are very similar; in the lecture's example, only the position of the ball changes from one frame to the next, and everything else stays the same.

  • Encoding redundancy: some symbols occur much more often than others; for an image that contains only blue and white, blue can be coded as 1 and white as 0 (a Huffman-style variable-length code).

  • Visual redundancy: detail that the human visual system can barely perceive can be discarded.

Encoding Data Processing Flow#

  • Prediction (intra- and inter-frame) removes spatial and temporal redundancy.
  • Transform coding further removes spatial redundancy.
  • Quantization removes visual redundancy: it discards detail the visual system can hardly perceive.
  • Entropy encoding removes encoding redundancy: symbols that appear frequently get shorter codes.
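To make the entropy-coding idea concrete, here is a minimal Huffman-coding sketch in JavaScript (an illustration of variable-length coding only, not the entropy coder used by any real video standard):

// Build a Huffman code for a string: frequent symbols get shorter bit strings.
function huffmanCodes(text) {
  // Count how often each symbol occurs.
  const freq = new Map();
  for (const ch of text) freq.set(ch, (freq.get(ch) || 0) + 1);

  // Start with one leaf node per symbol.
  const nodes = [...freq].map(([symbol, weight]) => ({ symbol, weight }));

  // Repeatedly merge the two lightest nodes until a single tree remains.
  while (nodes.length > 1) {
    nodes.sort((a, b) => a.weight - b.weight);
    const [left, right] = nodes.splice(0, 2);
    nodes.push({ weight: left.weight + right.weight, left, right });
  }

  // Walk the tree and assign bit strings: left edge = 0, right edge = 1.
  const codes = {};
  (function walk(node, prefix) {
    if (node.symbol !== undefined) {
      codes[node.symbol] = prefix || '0'; // handle the single-symbol case
      return;
    }
    walk(node.left, prefix + '0');
    walk(node.right, prefix + '1');
  })(nodes[0], '');

  return codes;
}

console.log(huffmanCodes('aaaaabbbc'));
// { c: '00', b: '01', a: '1' }: 'a' occurs most often and gets the shortest code.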

Encapsulation Formats#

The video encoding described above produces only a pure stream of compressed video data.

Encapsulation (container) format: a container that packages the encoded video together with audio, images, or subtitle information (for example MP4, FLV, or TS).

Multimedia Elements and Extended APIs#

video & audio#

The <video> tag embeds a media player in an HTML or XHTML document to support video playback within the page.

<!DOCTYPE html>
<html>
<body>
    <!-- Either set src directly or list one or more <source> children -->
    <video src="./video.mp4" muted autoplay controls width="600" height="300"></video>
    <video muted autoplay controls width="600" height="300">
        <source src="./video.mp4" type="video/mp4" />
    </video>
</body>
</html>

The <audio> element is used to embed audio content in the document.

<!DOCTYPE html>
<html>
<body>
    <!-- width/height do not apply to <audio>; the controls bar sizes itself -->
    <audio src="./audio.mp3" muted autoplay controls></audio>
    <audio muted autoplay controls>
        <source src="./audio.mp3" type="audio/mpeg" />
    </audio>
</body>
</html>

HTMLMediaElement

| Method | Description |
| --- | --- |
| play() | Starts playing the audio/video (asynchronously) |
| pause() | Pauses the currently playing audio/video |
| load() | Reloads the audio/video element |
| canPlayType() | Checks whether the browser can play the specified audio/video type |
| addTextTrack() | Adds a new text track to the audio/video |

| Property | Description |
| --- | --- |
| autoplay | Sets or returns whether the audio/video should start playing automatically after loading |
| controls | Sets or returns whether the audio/video controls (play/pause, etc.) are displayed |
| currentTime | Sets or returns the current playback position in the audio/video (in seconds) |
| duration | Returns the length of the current audio/video (in seconds) |
| src | Sets or returns the current source of the audio/video element |
| volume | Sets or returns the volume of the audio/video |
| buffered | Returns a TimeRanges object representing the buffered parts of the audio/video |
| playbackRate | Sets or returns the playback speed of the audio/video |
| error | Returns a MediaError object representing the error state of the audio/video |
| readyState | Returns the current ready state of the audio/video |
| ... | ... |

| Event | Description |
| --- | --- |
| loadedmetadata | Fired when the browser has loaded the metadata of the audio/video |
| canplay | Fired when the browser can start playing the audio/video |
| play | Fired when the audio/video has started or is no longer paused |
| playing | Fired when the audio/video is playing again after having been paused or stopped for buffering |
| pause | Fired when the audio/video has been paused |
| timeupdate | Fired when the current playback position has changed |
| seeking | Fired when the user starts moving/jumping to a new position in the audio/video |
| seeked | Fired when the user has finished moving/jumping to a new position in the audio/video |
| waiting | Fired when playback stops because the next frame needs to buffer |
| ended | Fired when the current playlist has ended |
| ... | ... |
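A small usage sketch tying a few of these methods, properties, and events together (the selector and file path are placeholders):

const video = document.querySelector('video');

// canPlayType() returns '', 'maybe', or 'probably'; only assign a source the browser may handle.
if (video.canPlayType('video/mp4; codecs="avc1.42E01E"')) {
  video.src = './video.mp4';
}

video.addEventListener('loadedmetadata', () => console.log('duration:', video.duration));
video.addEventListener('timeupdate', () => console.log('currentTime:', video.currentTime));
video.addEventListener('ended', () => console.log('playback finished'));

// play() returns a Promise; unmuted autoplay may be rejected by the browser.
video.muted = true;
video.play().catch((err) => console.warn('playback was blocked:', err));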

Limitations#

  • audio and video cannot directly play streaming formats such as HLS or FLV.
  • Requests and loading of video resources cannot be controlled by code, thus the following functionalities cannot be achieved:
    • Segment loading (to save bandwidth)
    • Seamless switching of quality
    • Precise preloading

MSE (Extended API)#

Media Source Extensions (MSE)

  • Play streaming media on the web without plugins

  • Supports playback of video formats such as hls, flv, mp4, etc.

  • Can achieve segmented loading of video, seamless switching of quality, adaptive bitrate, precise preloading, etc.

  • Supported by major browsers, except for Safari on iOS.

image.png

  1. Create a MediaSource instance.
  2. Create an object URL that points to the MediaSource and assign it to the media element's src.
  3. Listen for the sourceopen event on the MediaSource.
  4. Create a SourceBuffer for the stream's MIME type.
  5. Append media data to the SourceBuffer.
  6. Listen for the updateend event to know when more data can be appended.
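A minimal sketch of these six steps (the segment URL and the MIME/codec string are assumptions; a real player appends many segments and manages the buffer):

const video = document.querySelector('video');
const mediaSource = new MediaSource();

// 1-2. Create a MediaSource and point the video element at it via an object URL.
video.src = URL.createObjectURL(mediaSource);

// 3. Wait for the MediaSource to become ready.
mediaSource.addEventListener('sourceopen', async () => {
  // 4. Create a SourceBuffer for the stream's MIME type and codecs.
  const mime = 'video/mp4; codecs="avc1.42E01E, mp4a.40.2"';
  const sourceBuffer = mediaSource.addSourceBuffer(mime);

  // 5. Fetch a (fragmented MP4) segment and append it.
  const segment = await (await fetch('./segments/init-and-first.m4s')).arrayBuffer();
  sourceBuffer.appendBuffer(segment);

  // 6. When the append finishes, either append the next segment or finish the stream.
  sourceBuffer.addEventListener('updateend', () => {
    if (!sourceBuffer.updating && mediaSource.readyState === 'open') {
      mediaSource.endOfStream(); // no more segments in this tiny example
    }
  });
});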

  • Player playback flow

Streaming Media Protocols#

HLS stands for HTTP Live Streaming, a media streaming protocol based on HTTP proposed by Apple for real-time audio and video streaming. Currently, the HLS protocol is widely used in video on demand and live broadcasting.
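For reference, an HLS media playlist is a plain-text .m3u8 file that simply lists short segments; a hypothetical example (segment names and durations are made up):

#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:10
#EXT-X-MEDIA-SEQUENCE:0
#EXTINF:10.0,
segment0.ts
#EXTINF:10.0,
segment1.ts
#EXTINF:10.0,
segment2.ts
#EXT-X-ENDLIST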

Application Scenarios#

  • Video on demand / live streaming: video is uploaded, transcoded, and then distributed for playback.
  • Images: support for newer image formats.
  • Cloud gaming: no bulky client to download; the game runs on remote servers and the rendered picture is streamed back and forth as video (with very strict latency requirements).

Summary and Reflections#

In this lesson, the instructor introduced the basic concepts of Web multimedia technology, such as encoding formats, encapsulation formats, multimedia elements, and streaming protocols, and walked through the main application scenarios of Web multimedia.

Most of the content in these notes comes from Liu Liguo's lecture and MDN.
