Heap 堆積

  1. 構造為 tree
  2. 每一個親節點對任何一個子節點都保持著某一特定關係
    例:parent </≤/>/≥ children 
  3. 所有的子樹的親子節點也都是相同特定關係的 heap

※ 通常說 heap 指的即是 binary heap

Binary Heap

  1. 符合所有 heap 的條件
  2. 為一 complete binary tree

 

[Python] 使用 heapq 實作 binary heap / priority queue

Python 中的 Heap 定義

  • 為一 list
  • all(heap_x[k] <= heap_x[2*k+1] and heap_x[k] <= heap_x[2*k+2] for all k in range(len(heap_x)))True
    • heap_x[0]min(heap_x) (如果 len(heap_x) > 0)
  • 和 binary tree 的映射
    • heap_x[0] 為 binary tree 的 root (第 0 層)
    • 令 k = (2 ** p – 1) + r, 0 <= r < 2 ** p – 1
      ⇒ heap_x[k] 相當於 binary tree 中第 p 層的第 r 個 node (0-indexed)

Methods

Syntax Description Average Time Complexity

1 heap_x = list_x
2 heapq.heapify(list_x)

1 heap_x = \
heapq.heapify(list_x)

在線性時間內將 list_x 轉為 heap,
且過程不會申請額外記憶體

heapq 預設的 key 是:lambda e: (e[0], e[1], ...))
雖 type 為 list 但表達的是一個 binary heap
故不等同 list_x.sort(key=lambda e: (e[0], e[1], ...))
所以須用 heapq 的 method 取用
而不能直接用 for 迴圈(heapify() 後的 list_x 並非是個遞增序列)

O(len(list_x))

\begin{align}
& \text{Time Complexity of }heappush(heap_k, a) \\
& = O(log_2(len(heap_k))) = O(\text{height of } heap_k) \\
& \text{※ Because } heap_k \text{ is a binary heap}\\
\\
& len_x = \text{length of } list\_x \\
& height_i := \text{height of } heap\_x \text{ when push } i^{th} \text{ element} \\
& H := \text{height of } heap_x = log_2{(len_x)} \\
& \text{※ The height of root of } heap\_x \text{ is } H \text{, the height of leaves of } heap\_x \text{ is 0.} \\
& \text{A heap of size } n \text{ has at most } \frac{n}{2 ^ h} \text{ nodes with height } h. \\
\\
& \text{Time Complexity} \\
& = \sum_{i = 1}^{len_x}{O(height_i)} \\
& < \sum_{h = 0}^{H}{O(h) \cdot \frac{len_x}{2 ^ h}} \\
& = O \left( len_x \cdot \sum_{h = 0}^{H}{\frac{h}{2 ^ h}} \right) = O \left( len_x \cdot \sum_{h = 0}^{H}{h \left( \frac{1}{2} \right) ^ h} \right) \\
\\
\\
& \text{Sum of Infinite Geometric Progression: } \sum_{n = 0}^{\infty}{x ^ n} = \frac{1}{1 – x}, \quad 0 < x < 1 \\
& \text{Let } S = \sum_{n = 0}^{\infty}{n \cdot x ^ n} = x + 2x ^ 2 + 3x ^ 3 + 4x ^ 4 + … \\
& \Rightarrow xS = x ^ 2 + 2 x ^ 3 + 3x ^ 4 + …\\
& \Rightarrow S \ – \ xS = (1 – x)S = x + x ^ 2 + x ^ 3 + x ^ 4 + … = \sum_{n = 1}^{\infty}{x ^ n} = \left( \sum_{n = 0}^{\infty}{x ^ n} \right) – 1 \\
& \quad = \frac{1}{1 – x} \ – \ 1 = \frac{1 – (1 – x)}{1 – x} = \frac{x}{1 – x} \\
\\
\\
& (1 – x)S = \frac{x}{1 – x} \Rightarrow S.= \frac{x}{(1 – x) ^ 2} \\
& \sum_{h = 0}^{H}{h \left( \frac{1}{2} \right) ^ h} < \sum_{h = 0}^{\infty}{h \left( \frac{1}{2} \right) ^ h} = \frac{\frac{1}{2}}{(1 – \frac{1}{2}) ^ 2} = 2 \\
& \Rightarrow \text{Time Complexity} = O \left( len_x \cdot \sum_{h = 0}^{H}{h \left( \frac{1}{2} \right) ^ h} \right) = O(2 \cdot len_x) = O(len_x)
\end{align}

heapq.heappush(
  heap_x, item_a
)
把 item_a 放進 heap_x,
並保持 heap_x 的 heap 性質不變
O(log(len(heap_x)))
≡ O(height of heap_x)
  1. 將 item_a 依序放入下一個 node
  2. 如果新 node 比其 parent node 小
    則將兩者交換
    直到新 node 大於其 parent node
        13      
      25       66    
    32   50          
                 
        13      
      25       66    
    32   50   7      
                 
        13      
      25       7    
    32   50   66      
                   
        7      
      25       13    
    32   50   66      
heapq.heappop(heap_x)
heap_x.heappop()
  • 從 heap_x 取出並回傳最小的元素,
    同時保持其 heap 性質不變
  • 如果 heap 是空的會產生 IndexError 錯誤
  • 只存取最小元素但不取出可以使用 heap[0]
O(log(len(heap_x)))
≡ O(height of heap_x)
heapq.heappushpop(
  heap_x, item_a
)
  • 將 item_a 放入 heap_x ,
    接著從 heap_x 取出並回傳最小的元素
  • 這個組合函式比呼叫 heappush() 之後
    再呼叫 heappop() 更有效率
 

heapq.nlargest(
  n, iterable, key=None
)

heapq.nsmallest(
  n, iterable, key=None
)

  • 從 iterable 所定義的數據集中
    返回前 n 個最大元素組成的列表
  • 等價於 sorted(iterable, key=key, reverse=True/False)[:n]
  • 找出[前 n 大/小]的 elements:
    • n == 1: 使用 min()/max()
    • 較小的 n 值: 使用 heapq.nlargest()/nsmallest()
    • 較大的 n 值: 使用 sorted()
 
heapq._siftdown(
  heap_x, startpos, pos
)

使 heap_x[pos] 一路往上

  1. 若親節點較大則交換
  2. pos <= startpos 時停止 
O(len(heap_x))
heapq._siftup(heap_x, pos)
  1. 將 heap_x[pos] 一路往較小的子節點方向下沉 (交換)
    直至其沉至 leaf ,其位置 := leafpos
  2. 再呼叫 _siftdown(heap_x, pos, leafpos) 
O(len(heap_x))
heap_x[i] = heap_x[-1]qwa
# 把 heap_x[-1] 的值保留在 heap_x[i]
heap_x.pop()
# 再用 list 的 pop() 移除 heap_x[-1]
heapq.heapify(heap_x)
# 重新將 heap_x 給 heap 化
Delete element from heap O(len(heap_x))

heap_x[i] = heap_x[-1]
heap_x.pop()
if i < len(heap_x):
# 只有 i 在 pop() 前等於 (len(heap_x) – 2)
# 即 heap_x[i] 在 pop() 前為倒數第 2 個元素
# 在 pop() 後原本的 heap_x[-1]
# 若仍在同一左/右子樹則仍是 heap_x[-1]
# 但倘若不同則有誤
# ex: [1, 10, 2, 11, 12, 3]

  heapq._siftup(heap_x, i)
  heapq._siftdown(heap_x, 0, i)
# 原本的 heap_x[-1] 就是 leaf
# 先用 _siftup() 讓其下沉為leaf
# 再用 _siftdown() 把它調整到正確位置

O(log(len(heap_x)))

heapq._siftdown(heap, startpos, pos) 和 heapq._siftup(heap, pos)Source Code

# 'heap' is a heap at all indices >= startpos, except possibly for pos.  pos
# is the index of a leaf with a possibly out-of-order value.  Restore the
# heap invariant.
def _siftdown(heap, startpos, pos):
    newitem = heap[pos]
    # Follow the path to the root, moving parents down until finding a place
    # newitem fits.
    while pos > startpos:
        parentpos = (pos - 1) >> 1
        parent = heap[parentpos]
        if newitem < parent:
            heap[pos] = parent
            pos = parentpos
            continue
        break
    heap[pos] = newitem

# The child indices of heap index pos are already heaps, and we want to make
# a heap at index pos too.  We do this by bubbling the smaller child of
# pos up (and so on with that child's children, etc) until hitting a leaf,
# then using _siftdown to move the oddball originally at index pos into place.
#
# We *could* break out of the loop as soon as we find a pos where newitem <=
# both its children, but turns out that's not a good idea, and despite that
# many books write the algorithm that way.  During a heap pop, the last array
# element is sifted in, and that tends to be large, so that comparing it
# against values starting from the root usually doesn't pay (= usually doesn't
# get us out of the loop early).  See Knuth, Volume 3, where this is
# explained and quantified in an exercise.
#
# Cutting the # of comparisons is important, since these routines have no
# way to extract "the priority" from an array element, so that intelligence
# is likely to be hiding in custom comparison methods, or in array elements
# storing (priority, record) tuples.  Comparisons are thus potentially
# expensive.
#
# On random arrays of length 1000, making this change cut the number of
# comparisons made by heapify() a little, and those made by exhaustive
# heappop() a lot, in accord with theory.  Here are typical results from 3
# runs (3 just to demonstrate how small the variance is):
#
# Compares needed by heapify     Compares needed by 1000 heappops
# --------------------------     --------------------------------
# 1837 cut to 1663               14996 cut to 8680
# 1855 cut to 1659               14966 cut to 8678
# 1847 cut to 1660               15024 cut to 8703
#
# Building the heap by using heappush() 1000 times instead required
# 2198, 2148, and 2219 compares:  heapify() is more efficient, when
# you can use it.
#
# The total compares needed by list.sort() on the same lists were 8627,
# 8627, and 8632 (this should be compared to the sum of heapify() and
# heappop() compares):  list.sort() is (unsurprisingly!) more efficient
# for sorting.

def _siftup(heap, pos):
    endpos = len(heap)
    startpos = pos
    newitem = heap[pos]
    # Bubble up the smaller child until hitting a leaf.
    childpos = 2*pos + 1    # leftmost child position
    while childpos < endpos:
        # Set childpos to index of smaller child.
        rightpos = childpos + 1
        if rightpos < endpos and not heap[childpos] < heap[rightpos]:
            childpos = rightpos
        # Move the smaller child up.
        heap[pos] = heap[childpos]
        pos = childpos
        childpos = 2*pos + 1
    # The leaf at pos is empty now.  Put newitem there, and bubble it up
    # to its final resting place (by sifting its parents down).
    heap[pos] = newitem
    _siftdown(heap, startpos, pos)

 

Ref

Last Updated on 2024/12/14 by A1go

目錄
Bitnami