七、排序(4)——qsort()
生活随笔
收集整理的這篇文章主要介紹了
七、排序(4)——qsort()
小編覺得挺不錯的,現在分享給大家,幫大家做個參考.
一、回顧
| 冒泡排序 | O(n2) | √ | √ |
| 插入排序 | O(n2) | √ | √ |
| 選擇排序 | O(n2) | × | √ |
| 快速排序 | O(nlogn) | × | √ |
| 歸并排序 | O(nlogn) | √ | × |
| 計數排序 | O(n+k),k是數據范圍 | √ | × |
| 桶排序 | O(n) | √ | × |
| 基數排序 | O(dn) d是維度 | √ | × |
- 線性排序算法的時間復雜度比較低,適用場景比較特殊。
- 小規模數據進行排序 ==》時間復雜度是 O(n2) 的算法;
- 大規模數據進行排序 ==》時間復雜度是O(nlogn) 的算法 ==》一般也是首選
二、快速排序的優化
關鍵點:合理選擇分區點,來避免時間復雜度退化為O(n2)
理想的分區點:被區分點分開的兩個分區中,數據的數量差不多。
==》常用、簡單的分區算法
1、三數取中法
從區間的首、尾、中間,分別取出一個數,然后進行比較,取三個數的中位數作為分區點。
==》擴展:數據大時,“五數取中”、“十數取中”
2、隨機法
每次從要排序的區間中,隨機選擇一個元素作為分區點。
特點:雖不能保證每次分區點都選的比較好,但是從概率角度分析,不太可能出現每次分區點都很差的情況。
3、警惕堆棧溢出
快速排序使用堆棧實現的。==》 遞歸要警惕堆棧溢出
解決方法:
(1)限制遞歸深度。一旦遞歸過深(超過事先設定的閾值),就停止遞歸。
(2)通過在堆上模擬實現一個函數調用棧,手動模擬遞歸壓棧、出棧的過程,使得系統棧大小沒有限制。
三、qsort()函數的分析
1、源碼
/* Copyright (C) 1991,1992,1996,1997,1999,2004 Free Software Foundation, Inc.This file is part of the GNU C Library.Written by Douglas C. Schmidt (schmidt@ics.uci.edu).The GNU C Library is free software; you can redistribute it and/ormodify it under the terms of the GNU Lesser General PublicLicense as published by the Free Software Foundation; eitherversion 2.1 of the License, or (at your option) any later version.The GNU C Library is distributed in the hope that it will be useful,but WITHOUT ANY WARRANTY; without even the implied warranty ofMERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNULesser General Public License for more details.You should have received a copy of the GNU Lesser General PublicLicense along with the GNU C Library; if not, write to the FreeSoftware Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA02111-1307 USA. *//* If you consider tuning this algorithm, you should consult first:Engineering a sort function; Jon Bentley and M. Douglas McIlroy;Software - Practice and Experience; Vol. 23 (11), 1249-1265, 1993. */#include <alloca.h> #include <limits.h> #include <stdlib.h> #include <string.h>/* Byte-wise swap two items of size SIZE. */ /* 以字節為單位交換兩個SIZE長度數組的SWAP宏 */ #define SWAP(a, b,size) do { register size_t __size = (size);register char *__a = (a), *__b = (b); do{har __tmp = *__a;*__a++ = *__b;*__b++ = __tmp;} while (--__size > 0); } while (0)/* Discontinue quicksort algorithm when partition gets below this size.This particular magic number was chosen to work best on a Sun 4/260. */ /* 當一個數據段的長度小于這個值的時候,將不用快排對其進行排序。因為這個特別的的魔術數在Sun 4/260下的性能最好*/ #define MAX_THRESH 4 /* Stack node declarations used to store unfulfilled partition obligations. */ /* 用來存儲還沒處理的數據段索引*/ typedef struct {char *lo;char *hi; } stack_node;/* The next 4 #defines implement a very fast in-line stack abstraction. */ /* 下面四個宏實現了一個非常快速的棧(的確很快,但是現實中真要這樣做,請確保1. 這是核心代碼。2. 這些代碼百分比正確。要不然出錯了,調試調死你,維護的人罵死你。)*/ /* The stack needs log (total_elements) entries (we could even subtract log(MAX_THRESH)).Since total_elements has type size_t, we get as upper bound for log (total_elements):bits per byte (CHAR_BIT) * sizeof(size_t). */ /* 棧需要log(total_elements)個元素(當然我們也可以在這個基礎上減去log(MAX_THRESH)。)(PS:log(x/y) = log(x) - log(y))因為傳入數據個數total_elements的類型是size_t,所以我們可以把棧元素上限設為:size_t的位數(CHAR_BIT) * sizeof(size_t)。PS:棧用來記錄的是還沒處理的數據段索引,最壞的情況是分開的兩個數據段其中一個已經不用再分,這個時候棧不許要任何記錄。最好的情況是進行log(total_elements)次劃分,此時棧需要記錄log(total_elements)個索引,但是這個算法在一個分片的元素個數小于MAX_THRESH便不再劃分,所以實際只需log(total_elements / MAX_THRESH)個空間。CHAR_BIT在limits.h有定義,意思是一個字節有多少位,因為sizeof是算出一種類型占幾個字節,所以(CHAR_BIT) * sizeof(size_t)是當total_elements去最大值的值,也就是這里棧元素個數的上限。*/ #define STACK_SIZE (CHAR_BIT * sizeof(size_t)) #define PUSH(low, high) ((void) ((top->lo = (low)), (top->hi = (high)), ++top)) #define POP(low, high) ((void) (--top, (low = top->lo), (high = top->hi))) #define STACK_NOT_EMPTY (stack < top)/* Order size using quicksort. This implementation incorporatesfour optimizations discussed in Sedgewick:這個快排的程序實現了Sedgewick書中討論的四個優化,下面從大到小說明:(大概這意思...)1. Non-recursive, using an explicit stack of pointer that store thenext array partition to sort. To save time, this maximum amountof space required to store an array of SIZE_MAX is allocated on thestack. Assuming a 32-bit (64 bit) integer for size_t, this needsonly 32 * sizeof(stack_node) == 256 bytes (for 64 bit: 1024 bytes).Pretty cheap, actually.1. 不用遞歸,用了顯示的指針棧存儲下一段要排序的數據。為了節省時間,為棧申請了最大的存儲空間。假設size_t是一個32位(64位)的整數,這里僅需要 32 * sizeof(stack_node) = 256 bytes(對于64位:1024bytes)。事實上很小。(一個棧節點兩指針,32位就是2 * 4 個字節,64位是8 * 2位字節)2. Chose the pivot element using a median-of-three decision tree. This reduces the probability of selecting a bad pivot value andeliminates certain extraneous comparisons.2. 用中值決策樹選擇關鍵值。這減小了選擇一個差關鍵值的可能性和排除特定的無關的比較。3. Only quicksorts TOTAL_ELEMS / MAX_THRESH partitions, leavinginsertion sort to order the MAX_THRESH items within each partition.This is a big win, since insertion sort is faster for small, mostlysorted array segments.3. 只用快排對TOTAL_ELEMS / MAX_THRESH個數據段進行了排序,用插入排序對每個數據段的MAX_THRESH個數據進行排序。這是一個很好的改進,因為插入排序在處理小的、基本有序的數據段時比快排更快。 4. The larger of the two sub-partitions is always pushed onto thestack first, with the algorithm then concentrating on thesmaller partition. This *guarantees* no more than log (total_elems)stack size is needed (actually O(1) in this case)!4. 大的數據分段通常先壓入棧內,算法優先處理小的數據分段。這就保證棧的元素不會超過log(total_elems)(事實上這里只用了常數個空間)。 */ void _quicksort (void *const pbase, size_t total_elems, size_t size, __compar_d_fn_t cmp, void *arg) {/* 寄存器指針,最快的指針,當然系統不一定會把它放到寄存器。register只是一種建議。*/register char *base_ptr = (char *) pbase;const size_t max_thresh = MAX_THRESH * size;if (total_elems == 0)/* Avoid lossage with unsigned arithmetic below. */return;if (total_elems > MAX_THRESH){char *lo = base_ptr;char *hi = &lo[size * (total_elems - 1)];/* 因為用了上面棧的宏,所以下面兩個變量的名字一定不能改...*/stack_node stack[STACK_SIZE];stack_node *top = stack;PUSH (NULL, NULL);while (STACK_NOT_EMPTY){char *left_ptr;char *right_ptr;/* Select median value from among LO, MID, and HI. RearrangeLO and HI so the three values are sorted. This lowers theprobability of picking a pathological pivot value and skips a comparison for both the LEFT_PTR and RIGHT_PTR in the while loops. *//* 在數組的第一位、中間一位、最后一位中選出一個中值。同時也會對第一位和最后一位進行排序以達到這三個值都是有序的目的。這降低了選擇一個很爛的關鍵值的可能性,同時也跳過了左指針和右指針在while循環里面的一次比較。*/char *mid = lo + size * ((hi - lo) / size >> 1);if ((*cmp) ((void *) mid, (void *) lo, arg) < 0)SWAP (mid, lo, size);if ((*cmp) ((void *) hi, (void *) mid, arg) < 0)SWAP (mid, hi, size);elsegoto jump_over;if ((*cmp) ((void *) mid, (void *) lo, arg) < 0)SWAP (mid, lo, size); jump_over:;left_ptr = lo + size;right_ptr = hi - size;/* Here's the famous ``collapse the walls'' section of quicksort.Gotta like those tight inner loops! They are the main reasonthat this algorithm runs much faster than others. *//* 這里就是快排中著名的“collapse the walls(推墻??)”。和那些緊湊的內層循環非常像!它們是這個算法比其他算法快的主要原因。PS:了解過快排的應該對下面這一段都比較熟悉,就是把比關鍵值小的移到數組左邊,比關鍵值大的移到數組右邊,把數據分成大小兩段的過程*/do{while ((*cmp) ((void *) left_ptr, (void *) mid, arg) < 0)left_ptr += size;while ((*cmp) ((void *) mid, (void *) right_ptr, arg) < 0)right_ptr -= size;if (left_ptr < right_ptr){SWAP (left_ptr, right_ptr, size);if (mid == left_ptr)mid = right_ptr;else if (mid == right_ptr)mid = left_ptr;left_ptr += size;right_ptr -= size;}else if (left_ptr == right_ptr){left_ptr += size;right_ptr -= size;break;}}while (left_ptr <= right_ptr);/* Set up pointers for next iteration. First determine whetherleft and right partitions are below the threshold size. If so,ignore one or both. Otherwise, push the larger partition'sbounds on the stack and continue sorting the smaller one. *//* 給下次迭代的指針賦值。首先判斷左右兩段數據的元素個數是否小于閾值,如果是,跳過這一個或兩個分段。如若不是,把大的數據段的開始和結束指針入棧,繼續劃分小的數據段。*/if ((size_t) (right_ptr - lo) <= max_thresh){/* 左右兩個數據段的元素都小于閾值,取出棧中數據段進行劃分*/if ((size_t) (hi - left_ptr) <= max_thresh)/* Ignore both small partitions. */POP (lo, hi);else/* Ignore small left partition. (只有左邊大于閾值)*/lo = left_ptr;}else if ((size_t) (hi - left_ptr) <= max_thresh)/* Ignore small right partition. (只有右邊大于閾值)*/hi = right_ptr;else if ((right_ptr - lo) > (hi - left_ptr)){/* Push larger left partition indices. *//* 兩個數據段的元素個數都大于閾值,大的入棧,小的繼續劃分。 */PUSH (lo, right_ptr);lo = left_ptr;}else{/* Push larger right partition indices. *//* 兩個數據段的元素個數都大于閾值,大的入棧,小的繼續劃分。 */PUSH (left_ptr, hi);hi = right_ptr;}}}/* Once the BASE_PTR array is partially sorted by quicksort the restis completely sorted using insertion sort, since this is efficientfor partitions below MAX_THRESH size. BASE_PTR points to the beginningof the array to sort, and END_PTR points at the very last element inthe array (*not* one beyond it!). *//* 當數組經過快排的排序后,已經是整體有序了。剩下的排序由插入排序完成,因為數據段小于MAX_THRESH時,插入排序效率更高。此時排序的首指針是數組的首指針,尾指針是數組的尾指針(不是倒數第二個)*/#define min(x, y) ((x) < (y) ? (x) : (y)){char *const end_ptr = &base_ptr[size * (total_elems - 1)];char *tmp_ptr = base_ptr;char *thresh = min(end_ptr, base_ptr + max_thresh);register char *run_ptr;/* Find smallest element in first threshold and place it at thearray's beginning. This is the smallest array element,and the operation speeds up insertion sort's inner loop. *//* 找出第一段的最小一個值并把它放在數組的第一個位置。這是數組的最小元素(用快排排過,應該比較容易理解),這一步可以加入插入排序的內層循環*/for (run_ptr = tmp_ptr + size; run_ptr <= thresh; run_ptr += size)if ((*cmp) ((void *) run_ptr, (void *) tmp_ptr, arg) < 0)tmp_ptr = run_ptr;if (tmp_ptr != base_ptr)SWAP (tmp_ptr, base_ptr, size);/* Insertion sort, running from left-hand-side up to right-hand-side. *//* 從左到右執行插入排序。*/run_ptr = base_ptr + size;while ((run_ptr += size) <= end_ptr){/*上面說的加速內層循環,就是不用在這里判斷*/tmp_ptr = run_ptr - size;/*一直往回找直到找到大于或等于當前元素的元素*/while ((*cmp) ((void *) run_ptr, (void *) tmp_ptr, arg) < 0)tmp_ptr -= size;/*把當前元素移到找出元素的后面去*/tmp_ptr += size;if (tmp_ptr != run_ptr){char *trav;trav = run_ptr + size;while (--trav >= run_ptr){char c = *trav;char *hi, *lo;/*這個內層循環只是把每個元素的最后一位移到后面一個元素去*/for (hi = lo = trav; (lo -= size) >= tmp_ptr; hi = lo)* hi = *lo;*hi = c;}}}} }2、分析
通過多種排序算法實現的:
- 當數據量較小是,qsort() 會優先使用歸并算法來排序輸入數據(原因:空間復雜度為O(n),空間換時間)
- 當數據量太大,qsort() 使用快速排序,分區點選擇方法:“三數取中法”
- 遞歸太深會導致堆棧溢出問題的解決方法:qsort()實現了一個堆上的棧,手動模擬遞歸來解決該問題。
- 當排序的區間中的元素小于等于4時,qsort()就退化為插入排序,不再使用遞歸來做快速排序。原因:在小規模數據面前,O(n2時間復雜度并不一定比O(nlogn)的算法執行時間長)
- 時間復雜度代表增長趨勢,但是在表示的過程中,我們會省略低階、系數和常數。例:O(knlogn + c)時間復雜度中的 k 和 c 可能還是一個比較大的數。
總結
以上是生活随笔為你收集整理的七、排序(4)——qsort()的全部內容,希望文章能夠幫你解決所遇到的問題。
- 上一篇: 七、排序(3)——线性排序
- 下一篇: 八、二分查找(Binary Search