banner
cos

cos

愿热情永存,愿热爱不灭,愿生活无憾
github
tg_channel
bilibili

Data Structure Study Notes <6> Heap, Huffman Tree, and Disjoint Set

Chapter 1: Heap#

1. What is a Heap#

A heap is an array object that can be seen as a complete binary tree. It has the following properties:

  • The value of any node is the maximum/minimum value among all nodes in its subtree (orderliness).
  • A heap is always a complete binary tree represented by an array.
    Insert image description here

2. Operations of Max Heap#

Definition

typedef struct HeapStruct *MaxHeap;
struct HeapStruct {
	ElementType *Elements; // Array for storing heap elements
	int Size; // Current number of elements
	int Capacity; // Maximum capacity
};

(1) Creating an Empty Max Heap (Create function)#

MaxHeap Create(int MaxSize) {
	MaxHeap H = malloc(sizeof(struct HeapStruct));
	H->Elements = malloc((MaxSize+1) * sizeof(ElementType)); // +1 because we start storing from index 1
	H->Size = 0;
	H->Capacity = MaxSize;
	H->Elements[0] = MaxData; // Index 0 is set as a "sentinel" with a value greater than any possible element in the heap, for later operations
	return H;
}

(2) Inserting into a Max Heap (Insert function)#

When inserting an element, compare it with its parent node. If the inserted element is larger, swap them and continue comparing with the parent node until the inserted element is smaller than the parent node.
Insert image description here

void Insert(MaxHeap H, ElementType item) {
	int i;
	if(IsFull(H)) {
		printf("The max heap is full");
		return;
	}
	i = ++H->Size; // i points to the position of the last element after insertion.
	for(; H->Elements[i/2] < item; i /= 2) { // Continue the loop only when the value of the parent node of item is smaller than item
		H->Elements[i] = H->Elements[i/2]; // Filter down the node
	}
	H->Elements[i] = item; // Insert item
}

(3) Deleting from a Max Heap (Delete function)#

Remove the root node (maximum value) element and delete the root node from the heap, ensuring that the new root node is still the maximum value in the heap.

  • Use the last element in the max heap as the new root node and delete the original last element.
  • Check if the left and right children of the root node are larger than the root node, and continue filtering down.
ElementType DeleteMax(MaxHeap H) {
	int Parent, Child; // Parent node, child node
	ElementType MaxItem, temp;
	if(IsEmpty(H)) {
		printf("The max heap is empty");
		return;
	}
	MaxItem = H->Elements[1]; // Take out the maximum value of the root node and store it in MaxItem
	temp = H->Elements[H->Size--]; // Store the last element, then decrease size
	for (Parent = 1; Parent*2 <= H->Size; Parent = Child) {
		Child = Parent * 2; // Child is the left child of Parent at this time
		if (Child != H->Size && H->Elements[Child] < H->Elements[Child+1]) {
			Child++; // Child becomes the index of the right child only when the right child exists and its value is larger than the left child
		} 
		if (temp >= H->Elements[Child]) break; // temp has found the appropriate position
		else // Replace the parent node with the value of the child node
			H->Elements[Parent] = H->Elements[Child];
	}
	H->Elements[Parent] = temp;
	return MaxItem; // Return the maximum value before deletion
}

(3) Building a Max Heap from Existing Elements#

Place the existing N elements in accordance with the requirements of the max heap in a one-dimensional array.

  • Method 1: Insert N elements one by one into an empty max heap, with a time complexity of up to O(NlogN).
  • Method 2: Build a max heap in linear time complexity.
  • (1) Place N elements in the order of input to satisfy the structural characteristics of a complete binary tree.
  • (2) Adjust the positions of each node to satisfy the orderliness of a max heap.

The time complexity of building a heap is O(n), which is the sum of the heights of each node in the book.
Start from the last node with children, which must have a left child.
Change the Elements array in the definition to Data array to store existing elements.

void PercDown(MaxHeap H, int p) { // Adjust the subtree rooted at H->Data[p] to satisfy the orderliness of a max heap, similar to the delete operation
	int Parent, Child;
	ElementType X;
	X = H->Data[p]; // Take out the value of the root node
	for(Parent = p; Parent*2 <= H->Size; Parent = Child) {
		Child = Parent * 2;
		if( Child != H->Size && H->Data[Child] < H->Data[Child+1]) {
			Child++;
		}
		if(X >= H->Data[Child]) break; // Found the appropriate position
		else 
			H->Data[Parent] = H->Data[Child];
	}
	H->Data[Parent] = X;
}
void BuildHeap(MaxHeap H) { // Adjust the elements in H->Data[] to satisfy the orderliness of a max heap, assuming that all H->Size elements already exist in H->Data[]
	int i;
	// Start from the parent node of the last node, to the root node 1
	for(i = H->Size/2; i > 0; i--)
		PercDown(H,i);
}

Chapter 2: Huffman Tree#

1. What is a Huffman Tree#

Weighted Path Length (WPL): Suppose a binary tree has n leaf nodes, each leaf node has a weight w~k~, and the length from the root node to each leaf node is l~k~. The sum of the weighted path lengths of each leaf node is the WPL: WPL = $\sum_{k=1}^n$w~k~ l~k~.
Optimal Binary Tree, also known as Huffman Tree: The binary tree with the minimum WPL, characterized by:

  • No nodes with a degree of 1
  • A Huffman tree with n leaf nodes has a total of 2n-1 nodes
  • After swapping the left and right subtrees of any non-leaf node of the Huffman tree, it is still a Huffman tree
  • For the same group of weights, there may be different structurally different Huffman trees
    Insert image description here

2. Operations of Huffman Tree#

To construct a Huffman tree, merge the two smallest weighted binary trees each time.
Main problem: How to select the two smallest ones? Using a min heap!

typedef struct TreeNode *HuffmanTree;
struct TreeNode {
	int Weight;
	HuffmanTree Left,Right;
};
HuffmanTree Huffman(MinHeap H) {
	// Assume H->Size weights are already stored in H->Elements[]->Weight
	int i;	HuffmanTree T;
	BuildMinHeap(H); // Adjust H->Elements[] according to weights to form a min heap
	for(i = 1; i < H->Size; i++) { // Merge H->Size-1 times
		T = malloc(sizeof(struct TreeNode)); // Create a new node
		T->Left = DeleteMin(H); // Delete a node from the min heap as the left child of the new T
		T->Right = DeleteMin(H); // Delete a node from the min heap as the right child of the new T
		T->Weight = T->Left->Weight + T->Right->Weight;
		Insert(H,T); // Insert the new T into the min heap
	}
	T = DeleteMin(H);
	return T;
}

3. Application of Huffman Tree - Huffman Coding#

How to encode to minimize the total encoding space?
Characters with higher frequencies should have shorter codes, while characters with lower frequencies can have longer codes, while avoiding ambiguity.
Prefix code: No code for any character is a prefix of another character's code, which avoids ambiguity.
A binary tree can be constructed for encoding, with 0 and 1 as the left and right branches, respectively, and all characters on the leaf nodes.
Insert image description here
Huffman tree can be used!

Chapter 3: Set#

Regarding sets, the main topic is disjoint sets, which has been learned before. This blog post is great: A Lovely Disjoint Set (Original blog is down, reposted)
So I won't go into detail here~
Insert image description here
Insert image description here

~ Disjoint Set Template ~#

#include <iostream>
#include <set>
using namespace std;
const int maxn = 1000000;
int fa[maxn];
int ans[maxn];
void init(int n) {
    for (int i = 1; i <= n; i++) {
        fa[i] = i;
    }
}
int find(int x) { // Find + Path Compression: Find the root node and set the parent node of each node along the way to the root node
    return x == fa[x]? x : (fa[x] = find(fa[x]));
}
inline void merge(int a, int b) {
    fa[find(a)] = find(b);
}
int main() {
    int m, n, k, x;
    cin >> m >> n >> k;
    x = n*m;
    init(x);
    for (int i = 0; i < k; i++) {
        int a,b;
        cin >> a >> b;
        merge(a, b);
    }
    for(int i = 1; i <= x; i++) {
        ans[find(i)] = 1;
    }
    int cnt = 0;
    for (int i = 1; i <= x; i++) {
        if(ans[i] == 1) cnt++;
    }
    cout << cnt << endl;
    return 0;
}
Loading...
Ownership of this post data is guaranteed by blockchain and smart contracts to the creator alone.