Data Structure Study Notes <8> Sorting

I. Topological Sorting#

1. Concept Definition#

AOV Network#

For example, suppose a computer science student must complete all the courses listed in Figure 3-4. The prerequisites and subsequent relationships between the courses can be clearly seen in the figure. For instance, the prerequisite for course C5 is C2, and the subsequent courses are C4 and C6. Generally, we refer to a directed graph where vertices represent activities and edges represent the precedence relationships between activities as an Activity On Vertex network, abbreviated as AOV network.
[Figure 3-4: prerequisite relationships among the computer science courses]

Topological Order, DAG#

  • If there is a directed path from V to W in the graph, then V must precede W. A vertex sequence that satisfies this condition is called a topological order.
  • The process of obtaining a topological order is called topological sorting.
  • If an AOV network has a valid topological order, it must be a Directed Acyclic Graph (DAG).

2. Topological Sorting Idea#

The idea of topological sorting is to repeatedly pick a vertex with in-degree 0, output it, and decrease the in-degree of every vertex adjacent to it by 1.
Clearly, finding a vertex with in-degree 0 is the key step; scanning all vertices every time would cost a lot of time. A smarter approach is to put a vertex into a container (e.g., a queue or a stack) as soon as its in-degree drops to 0, so the next candidate can be taken out in O(1).
The pseudocode is described as follows:

void TopSort() {
    int cnt = 0;
    for (each vertex V in the graph)
        if (Indegree[V] == 0)
            Enqueue(V, Q);
    while (!isEmpty(Q)) {
        V = Dequeue(Q);
        output V, or record the output sequence number of V; cnt++;
        for (each adjacent vertex W of V)
            if (--Indegree[W] == 0)
                Enqueue(W, Q);
    }
    if (cnt != |V|)
        Error("The graph has a cycle");
}

Template code:

#include <cstring>
#include <queue>
using namespace std;

const int maxn = 1005;
int N, M;               //number of vertices, number of edges (activities)
int edge[maxn][maxn];   //edge[i][j]: weight of edge i->j, or -1 if there is no edge
int mint[maxn];         //earliest completion time of each vertex (event)
int In[maxn];           //in-degree of each vertex
void init() {
    memset(edge, -1, sizeof(edge));
    memset(mint, 0, sizeof(mint));
    memset(In, 0, sizeof(In));
}
bool Topsort() {//topological sorting
    queue<int> q;
    for(int i = 0; i < N; ++i) {
        if(In[i] == 0) 
            q.push(i);
    }
    int cnt = 0;
    while(!q.empty()) {
        int v = q.front();
        q.pop();
        cnt++;
        for(int i = 0; i < N; ++i) {
            if(v == i || edge[v][i] == -1) continue;//check all edges starting from v
            In[i]--;
            //other operations
            if(In[i] == 0) q.push(i);
        }
    }
    if(cnt != N) return false;
    else return true;
}

Example Problem#

08-Graph 8 How Long Does It Take (25 points)
is a variation of topological sorting; the program is not complicated, and it is recommended to try it;
Problem statement, code, and ideas can be found in the blog:

3. Solving Practical Problems#

Critical Path Problem#

AOE Network (Activity On Edge)#

  • Generally used to schedule project processes
  • In the AOE network, activities are represented on the edges, and vertices are divided into three parts: vertex number, earliest completion time, and latest completion time.
    https://www.icourse163.org/learn/ZJU-93001?tid=1459700443#/learn/content?type=detail&id=1235254066&cid=1254945243&replay=true
First, derive the earliest completion time: mint[j] = max(mint[j], mint[i] + edge[i][j])#

[Figure: forward pass computing the earliest completion times]

Then derive the latest completion time from back to front: maxt[i] = min(maxt[i], maxt[j] - edge[i][j])#
The slack time of each activity follows: D[i][j] = maxt[j] - mint[i] - edge[i][j]#

[Figure: backward pass computing the latest completion times and the slack of each activity]
The critical path consists of the activities that must not be delayed, i.e., it is a path on which every activity has zero slack.
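Putting the two passes together, here is a minimal sketch (my own, not from the original post) of computing the earliest/latest times and identifying critical activities. It reuses maxn, N, edge[][], In[] and mint[] from the topological-sort template above; topo[], maxt[] and CriticalPath are names introduced only for illustration.

#include <queue>
#include <algorithm>
using namespace std;

int topo[maxn], cntv;   //vertices in topological order
int maxt[maxn];         //latest completion time of each vertex (event)

bool CriticalPath() {
    queue<int> q;
    for (int i = 0; i < N; ++i)
        if (In[i] == 0) q.push(i);          //sources start with mint[i] = 0
    cntv = 0;
    while (!q.empty()) {                    //forward pass: earliest completion times
        int v = q.front(); q.pop();
        topo[cntv++] = v;
        for (int w = 0; w < N; ++w) {
            if (edge[v][w] == -1) continue;
            mint[w] = max(mint[w], mint[v] + edge[v][w]);
            if (--In[w] == 0) q.push(w);
        }
    }
    if (cntv != N) return false;            //cycle: the schedule is infeasible
    int finish = 0;                         //project completion time
    for (int i = 0; i < N; ++i) finish = max(finish, mint[i]);
    for (int i = 0; i < N; ++i) maxt[i] = finish;
    for (int k = cntv - 1; k >= 0; --k) {   //backward pass: latest completion times
        int v = topo[k];
        for (int w = 0; w < N; ++w)
            if (edge[v][w] != -1)
                maxt[v] = min(maxt[v], maxt[w] - edge[v][w]);
    }
    //an activity <v,w> is critical exactly when its slack maxt[w] - mint[v] - edge[v][w] == 0
    return true;
}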

Example Problem#

08-Graph 8 How Long Does It Take (25 points)
is a variation of topological sorting that asks for the earliest completion time.
08-Graph 9 Critical Activities (30 points)
asks for the critical path.
Code and ideas can be found in the blog: PTA Data Structure Problem Set Week 8 - Graph (Part 2)

II. Simple Sorting#

1. Premise#

void X_Sort(ElementType A[], int N);
  • For simplicity, we discuss sorting integers in ascending order.
  • N is a positive integer.
  • Only comparison-based sorting is discussed (i.e., >, =, < are defined on the elements).
  • Only internal sorting is discussed.
  • Stability: The relative positions of any two equal data elements do not change after sorting.
  • No sorting algorithm performs the best in all cases!

2. Sorting Algorithms#

Test problem: 09-Sorting 1 Sorting (25 points)
In the code below, ll stands for long long (typedef long long ll;).
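Before the individual algorithms, here is a minimal driver sketch (my own, not from the original post) for testing them on the problem, assuming the usual input format of N followed by N integers; MAXN is an assumed capacity.

#include <cstdio>
#include <algorithm>
typedef long long ll;          //ll as used in the snippets below

const int MAXN = 100005;       //assumed capacity; adjust to the problem limits
ll a[MAXN];

int main() {
    int N;
    if (scanf("%d", &N) != 1) return 0;
    for (int i = 0; i < N; ++i) scanf("%lld", &a[i]);
    std::sort(a, a + N);       //stand-in: replace with Bubble_Sort(a, N), Heap_Sort(a, N), ... from below
    for (int i = 0; i < N; ++i)
        printf("%lld%c", a[i], i + 1 == N ? '\n' : ' ');
    return 0;
}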

Bubble Sort#

It repeatedly visits the list of elements to be sorted, comparing two adjacent elements at a time, and if they are in the wrong order (e.g., from largest to smallest, or from Z to A), it swaps them. The work of visiting the elements is repeated until no adjacent elements need to be swapped, meaning the list is sorted.
The name of this algorithm comes from the fact that smaller elements "bubble" up to the top of the list (in ascending or descending order) through swapping, just like carbon dioxide bubbles in carbonated drinks eventually rise to the top, hence the name "bubble sort."

——Excerpt from Baidu Encyclopedia

void Bubble_Sort(ll a[], int N) {
    for(int P = N-1; P >= 0; P--) {
        bool flag = false;
        //One pass of bubble sort: compare adjacent elements from left to right and swap them if the left one is larger.
        for(int i = 0; i < P; ++i) {    
            if(a[i] > a[i+1]) {
                swap(a[i], a[i+1]);
                flag = true;
            }
        }
        if(!flag) break; //No swaps in this pass means the array is already sorted.
    }
}

Time Complexity#

Best case: sorted order, time complexity T = O(N)
Worst case: completely reversed order, time complexity T = O(N^2)

Advantages and Disadvantages#

Advantages: Simple to write, only requires swapping adjacent elements, can sort directly even with a singly linked list, stable (the positions of equal elements remain unchanged).
Disadvantages: High time complexity, slow!

Test Results#

The test results are as follows, with 3 samples failing.
[Figure: bubble sort test results]

Insertion Sort#

Insertion sort, also known as direct insertion sort, is an effective algorithm for sorting a small number of elements. It is one of the simplest sorting methods, with the basic idea of inserting a record into an already sorted list, thus creating a new sorted list with one more record. The implementation uses a double loop, where the outer loop iterates over all elements except the first one, and the inner loop searches for the insertion position in the sorted list in front of the current element and moves the elements accordingly.

——Excerpt from Baidu Encyclopedia

void Insertion_Sort(ll a[], int N) {
    for(int P = 1; P < N; P++) {
        ll t = a[P];//pick the next card
        int i;
        for(i = P; i > 0 && a[i-1] > t; --i) 
            a[i] = a[i-1];  //shift elements right to make room, until the element before the gap is no greater than t
        a[i] = t;   //place the new card
    }
}

Time Complexity#

Best case: sorted order, time complexity T = O(N)
Worst case: completely reversed order, time complexity T = O(N^2)
More generally, the running time can be expressed in terms of inversions:
Swapping two adjacent elements removes exactly 1 inversion pair (for example, {3, 1, 2} contains the 2 inversions (3,1) and (3,2)).
If the sequence contains I inversion pairs,
then T(N, I) = O(N + I).

Advantages and Disadvantages#

Advantages: Stable.
Disadvantages: The number of comparisons and moves depends on the initial order; the earlier the insertion position, the more elements have to be shifted, which becomes expensive when the data set is large.

Test Results#

The test results are as follows, quite impressive.
[Figure: insertion sort test results]

How to Improve Efficiency#

There is a theorem as follows:

  • Any sequence of N different elements on average has N(N-1)/4 inversion pairs.
  • Any algorithm that sorts only by swapping adjacent elements has an average time complexity of Ω(N^2).

Therefore, to improve algorithm efficiency, we must

  • Eliminate more than 1 inversion pair each time!
  • Swap two elements that are farther apart!

Shell Sort#

Utilizes the simplicity of insertion sort while overcoming the limitation that insertion sort can only swap adjacent elements.

Shell sort groups the records by a certain increment (gap) based on their indices and sorts each group with direct insertion sort; as the increment gradually decreases, each group contains more and more keys, and when the increment is reduced to 1, the whole list forms a single group and the sort is complete.

——Excerpt from Baidu Encyclopedia

Define an increment sequence D_M > D_(M-1) > … > D_1 = 1.
For each D_k (k = M, M-1, …, 1), perform a "D_k-gap" insertion sort.
Note: a sequence that is "D_k-gap" sorted remains "D_k-gap" sorted after a "D_(k-1)-gap" sort!

Selection of Shell Increment Sequence#

  • The original Shell increment sequence: D_M = ⌊N/2⌋, D_k = ⌊D_(k+1)/2⌋
    • If the increments are not coprime, the smaller increments may do no useful work at all!
      [Figure: a bad case for Shell's original increments, where only the final 1-gap pass makes progress]
void Shell_Sort(ll a[], int N) {
    for(int D = N/2; D > 0; D /= 2) { //Shell increment sequence
        for(int P = D; P < N; ++P) { //insertion sort
            ll t = a[P];
            int i;
            for(i = P; i >= D && a[i-D] > t; i -= D) 
                a[i] = a[i-D];
            a[i] = t;
        }
    }
}
  • Hibbard increment sequence
    • D_k = 2^k - 1, so adjacent increments are coprime
  • Sedgewick increment sequence, etc. (a sketch using Hibbard increments follows below)
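As a concrete illustration, here is a minimal sketch (not from the original post) of Shell sort driven by the Hibbard increment sequence; a Sedgewick sequence could be plugged in the same way by changing how gaps is filled.

#include <vector>
typedef long long ll;   //ll = long long, as in the other snippets

void Shell_Sort_Hibbard(ll a[], int N) {
    std::vector<int> gaps;                            //Hibbard increments 1, 3, 7, 15, ... below N
    for (long long d = 1; d < N; d = d * 2 + 1)
        gaps.push_back((int)d);
    for (int k = (int)gaps.size() - 1; k >= 0; --k) { //largest gap first
        int D = gaps[k];
        for (int P = D; P < N; ++P) {                 //gap-D insertion sort, as above
            ll t = a[P];
            int i;
            for (i = P; i >= D && a[i-D] > t; i -= D)
                a[i] = a[i-D];
            a[i] = t;
        }
    }
}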

Advantages and Disadvantages#

Advantages: Fast, with little data movement; suitable for large data sets.
Disadvantages: The complexity depends on the choice of increment sequence, which is largely a matter of experience; not stable.

Test Results#

It can be seen that the time taken did not exceed 100ms, and the speed is still quite ideal in these test cases.
[Figure: Shell sort test results]

Selection Sort#

Before introducing heap sort, let's first introduce selection sort, an old friend.

void Selection_Sort(ll a[], int N) {
    for(int i = 0; i < N; ++i) {
        int mini = i;   //index of the smallest element found so far
        ll ans = a[i];  //its value (avoids relying on an external inf constant)
        //Find the minimum element from position i onward and record its index in mini
        for(int j = i+1; j <= N-1; ++j) {
            if(a[j] < ans) {
                ans = a[j];
                mini = j;
            }
        }
        //Swap the minimum element of the unsorted part to the last position of the sorted part
        swap(a[i], a[mini]);
    }
}

Time Complexity#

Regardless of the situation, the complexity is O(N^2).

Test Results#

The test results are as follows; although they all passed, the last few samples took a long time.
[Figure: selection sort test results]

Heap Sort#

Here, taking ascending order as an example: first adjust the original array (rooted at index 0) into a max-heap, then repeatedly swap the heap top (the current maximum) with the current last element, which effectively deletes it from the heap, and sift down to restore the heap.

void swap(ll& x, ll& y) {
    ll t = x;
    x = y;
    y = t;
}
void PercDown(ll a[], int N, int rt) {
    //Adjust the subtree with a[rt] as the root into a maximum heap within an array of N elements.
    int father, son;
    ll tmp = a[rt];
    for(father = rt; (father*2+1) < N; father = son) {
        son = father * 2 + 1;//left child
        if(son != N-1 && a[son] < a[son+1]) //right child exists and is greater than the left child
            son++;
        if(tmp >= a[son]) break;//found the place to put
        else a[father] = a[son];//sift down
    }
    a[father] = tmp;
}
inline void BuildHeap(ll a[], int N) {
    for(int i = N/2-1; i >= 0; --i) {
        PercDown(a, N, i);
    }
}
void Heap_Sort(ll a[], int N) {
    BuildHeap(a, N);
    for(int i = N-1; i > 0; --i) {
        swap(a[0], a[i]);//swap the maximum heap top a[0] with a[i]
        PercDown(a, i, 0);//adjust after deletion
    }
}

Time Complexity#

Heap sort offers excellent average performance and the same O(N log N) bound in every case.
Best case O(nlogn)
Worst case O(nlogn)
Average time complexity O(nlogn)

Advantages and Disadvantages#

Advantages: Fast! Even in the worst case, performance is excellent, and it uses little auxiliary space.
Disadvantages: Unstable; since swaps move whole records across long distances, it is less suitable when the elements are large objects that are expensive to copy.

Test Results#

The test results are as follows, seemingly more powerful than Shell sort~
[Figure: heap sort test results]

Merge Sort#

The core is merging ordered subsequences; here is the recursive implementation. For the non-recursive (iterative) version, see Merge Sort Iterative Implementation.

void Merge(ll a[], int s, int m, int e, ll tmp[]) {
    //Merge the sorted ranges a[s..m] and a[m+1..e] into tmp[s..e],
    //then copy tmp[s..e] back to a[s..e]. Time complexity O(e-s+1), i.e., O(n).
    int pb = s;//pb is the index of the tmp array
    int p1 = s, p2 = m+1;//p1 points to the first half, p2 points to the second half
    while (p1 <= m && p2 <= e) {
        if (a[p1] < a[p2])
            tmp[pb++] = a[p1++];
        else 
            tmp[pb++] = a[p2++];
    }
    while(p1 <= m) 
        tmp[pb++] = a[p1++];
    while(p2 <= e) 
        tmp[pb++] = a[p2++];
    for (int i = 0; i < e-s+1; ++i)
        a[s+i] = tmp[s+i];   //note: tmp was filled starting at index s
}
void MergeSort(ll a[], int s, int e, ll tmp[]) {
    if (s < e) {//if s>=e do nothing
        int m = s + (e-s)/2;
        MergeSort(a, s, m, tmp);//sort the first half
        MergeSort(a, m+1, e, tmp);//sort the second half
        Merge(a, s, m, e, tmp);//merge the two arrays a from s to m and m+1 to e in order
    }
}

Time Complexity#

Best case O(nlogn)
Worst case O(nlogn)
Average time complexity O(nlogn)

Advantages and Disadvantages#

Advantages: Stable, fast.
Disadvantages: Requires more space.

Test Results#

The test results are as follows; savor it.
[Figure: merge sort test results]

Quick Sort#

  1. Set k = a[0], move k to the appropriate position so that all elements smaller than k are on the left and all elements larger than k are on the right (completed in O(n) time).
  2. Quickly sort the left part of k.
  3. Quickly sort the right part of k.
    k is the pivot.
void QuickSort(ll a[], int s, int e){//sort a[s,e] quickly
    if (s >= e)
        return;
    ll k = a[s];    //pivot (ll, since the array elements are ll)
    int i = s,j = e;
    while (i != j) {
        while (j > i && a[j] >= k)  --j;
        swap(a[i],a[j]);
        while (i < j && a[i] <= k)  ++i;
        swap(a[i],a[j]);
    }//After processing, a[i] = k;
    QuickSort(a, s, i-1);//quick sort the left part
    QuickSort(a, i+1, e);//quick sort the right part
}

Time Complexity#

Best case: perfectly balanced O(nlogn)
Worst case O(N^2)
Average time complexity O(nlogn)

Advantages and Disadvantages#

Advantages: In practice, usually the fastest of the internal comparison-based sorts on average.
Disadvantages: Unstable, and degrades to O(N^2) in the worst case!

Test Results#

The test results are as follows:
[Figure: quick sort test results]

Table Sort#

When the data volume is large, the elements to be sorted are big objects, and moving them is particularly expensive, we can sort indirectly.
Define an array of pointers (or indices) as a "table" that refers to the elements to be sorted, and sort the table instead of moving the records.
[Figure: table sort, rearranging the table of pointers while the records stay in place]
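A minimal sketch of the idea (my own illustration; Record and Table_Sort are made-up names): only the index table is sorted, and the large records are never copied. std::sort keeps the sketch short, but any of the sorts above could be applied to the table instead.

#include <algorithm>
#include <string>
#include <vector>

struct Record {              //imagine a large record that is expensive to copy
    int key;
    std::string payload;
};

void Table_Sort(const std::vector<Record>& A, std::vector<int>& table) {
    table.resize(A.size());
    for (int i = 0; i < (int)A.size(); ++i)
        table[i] = i;        //initially table[i] = i
    //sort the indices by the key of the record they point to
    std::sort(table.begin(), table.end(),
              [&A](int x, int y) { return A[x].key < A[y].key; });
    //A[table[0]], A[table[1]], ... is now the sorted order; if the records must be
    //physically rearranged afterwards, at most about 3N/2 record moves are needed
}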

Time Complexity#

Best case: the elements are already in order and no records need to be moved.
Worst case: the permutation consists of N/2 cycles, each containing 2 elements; moving the two records in a cycle takes 3 copies, so about 3N/2 record moves are needed.
T = O(m N), where m is the cost of copying one element of A.

Radix Sort (a generalization of bucket sort)#

All the algorithms discussed so far are based on comparisons, and in the worst case they all need Ω(N log N) comparisons; can we do better?
Suppose we have N students whose scores are integers between 0 and 100 (so there are M = 101 distinct score values); how can we sort the students by score in linear time? (See the sketch below.)
Radix sort comes in two flavors: LSD (Least Significant Digit first) and MSD (Most Significant Digit first).
[Figure: LSD and MSD radix sort examples]
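For the score example above, one bucket per score value is enough; here is a minimal counting/bucket sort sketch (my own; SortScores is an illustrative name). In practice each bucket would hold the student records, e.g. a list per score, rather than just a count.

#include <vector>
using namespace std;

vector<int> SortScores(const vector<int>& score) { //sort N scores in 0..100 in O(N + M) time, M = 101
    const int M = 101;                   //possible score values 0..100
    vector<int> bucket(M, 0);
    for (int s : score) bucket[s]++;     //count how many students got each score
    vector<int> sorted;
    sorted.reserve(score.size());
    for (int v = 0; v < M; ++v)          //walk the buckets in increasing score order
        for (int c = 0; c < bucket[v]; ++c)
            sorted.push_back(v);
    return sorted;
}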

The code for radix sort is referenced from this blog, which uses a very clever method: Radix Sort

ll getMax(ll a[], int n) {//find the largest absolute value in the n-element array a (it determines the number of passes, so negative numbers also get enough passes)
    ll maxx = 0;
    for(int i = 0; i < n; ++i) {
        ll v = a[i] < 0 ? -a[i] : a[i];
        if(v > maxx) maxx = v;
    }
    return maxx;
}
void radixsort(ll a[], int n, int exp) { //sort the array a of n elements according to "a certain digit" (bucket sort), base 10
    ll tmp[maxn];
    ll T[20] = {0}; //20 buckets: digit values of negative numbers (offset by +10) land below those of non-negative numbers
    for(int i = 0; i < n; ++i) //T stores how many numbers are in each bucket
        T[(a[i]/exp)%10 + 10]++;
    for(int i = 1; i < 20; ++i) //prefix sums: T[i] becomes the (exclusive) end position of bucket i in tmp
        T[i] += T[i-1];
    for(int i = n - 1; i >= 0; --i) {
        int now = T[(a[i]/exp)%10 + 10];//the position this number should be in
        tmp[now-1] = a[i];
        T[(a[i]/exp)%10 + 10]--;
    }
    for(int i = 0; i < n; ++i)
        a[i] = tmp[i]; //assign the sorted tmp back to a
}
void Radix_Sort(ll a[], int n) {
    ll maxnum = getMax(a, n);
    for(int exp = 1; maxnum/exp > 0; exp *= 10) 
        radixsort(a, n, exp);
}

Time Complexity#

Let N be the number of elements to be sorted, and B be the number of buckets.
O(P(N+B)) where P is the number of passes; one pass of distribution takes O(N) time, and one pass of collection takes O(B) time, for a total of P passes of distribution and collection.

Advantages and Disadvantages#

Advantages: Fast when the keys have few digits, i.e., when the maximum key value is not too large.
Disadvantages: Trades space for time (extra buckets and a temporary array).

Test Results#

The test results are as follows, super fast~
[Figure: radix sort test results]

III. Comparison of Sorting Algorithms#

A summary of the algorithms discussed above:

Algorithm          Average time      Worst-case time        Extra space    Stable
Bubble sort        O(N^2)            O(N^2)                 O(1)           Yes
Insertion sort     O(N^2)            O(N^2)                 O(1)           Yes
Shell sort         depends on D_k    O(N^2) (Shell's D_k)   O(1)           No
Selection sort     O(N^2)            O(N^2)                 O(1)           No
Heap sort          O(N log N)        O(N log N)             O(1)           No
Merge sort         O(N log N)        O(N log N)             O(N)           Yes
Quick sort         O(N log N)        O(N^2)                 O(log N)       No
Radix sort (LSD)   O(P(N+B))         O(P(N+B))              O(N+B)         Yes
