Python中的堆和优先队列

    xiaoxiao2023-11-20  178

    分别是heapq和queue.PriorityQueue这两个模块

    import heapq from queue import PriorityQueue as PQ
    PriorityQueue模块定义如下所示:
    class PriorityQueue(Queue): '''Variant of Queue that retrieves open entries in priority order (lowest first). Entries are typically tuples of the form: (priority number, data). ''' def _init(self, maxsize): self.queue = [] def _qsize(self): return len(self.queue) def _put(self, item): heappush(self.queue, item) def _get(self): return heappop(self.queue)
    他含有一个属性queue,输出队列中每个元素,三个方法,分别是qsize(),代表优先级队列中元素个数,put(),用heappush()方法将元素放入队列,get(),用heappop()方法将元素从队列取出。注意这个库在解答leetcode相关问题时(比如top-K问题,leetcode-215,347等)不能用。
    例子如下
    from queue import PriorityQueue as PQ pq = PQ() pq.put((-2, 'c')) pq.put((-3, 'b')) pq.put((-4,'a')) pq.queue # output: [(-4, 'a'), (-2, 'c'), (-3, 'b')] #只能是最小堆,所以按照元素从小到大(如果元素是list或者tuple,那么以第一个元素的顺序)的顺序出队 pq.get() # output: (-4, 'a') #每当堆首元素出队后,剩下的元素又会重新排成堆 pq.get() #output: [(-3, 'b'), (-2, 'c')] pq.get() # output: (-3,'b') pq.get() # output: (-2,'c') #最后结果为空 pq.queue # output:[]
    heappush模块代码如下,功能:将item添加到当前最小堆最后,并再恢复成最小堆(只是把元素插到堆尾再调整,之前已经是堆了,现在只有最后一个元素可能不满足堆得要求,调整也只会沿着一条路径自底向上,时间复杂度O(lgn))
    def heappush(heap, item): """Push item onto heap, maintaining the heap invariant.""" heap.append(item) _siftdown(heap, 0, len(heap)-1) def _siftdown(heap, startpos, pos): newitem = heap[pos] # Follow the path to the root, moving parents down until finding a place # newitem fits. while pos > startpos: parentpos = (pos - 1) >> 1 parent = heap[parentpos] if newitem < parent: heap[pos] = parent pos = parentpos continue break heap[pos] = newitem
    heappop模块代码如下,功能:当前最小堆中首元素,再将当前堆尾元素放在堆首,之后将这个列表再回复成堆。(只是把元素插到堆首再调整,之前已经是堆了,现在只有第一个元素可能不满足堆得要求,调整也只会沿着一条路径从根节点向下,时间复杂度O(lgn))
    def heappop(heap): """Pop the smallest item off the heap, maintaining the heap invariant.""" lastelt = heap.pop() # raises appropriate IndexError if heap is empty if heap: returnitem = heap[0] heap[0] = lastelt _siftup(heap, 0) return returnitem return lastelt def _siftup(heap, pos): endpos = len(heap) startpos = pos newitem = heap[pos] # Bubble up the smaller child until hitting a leaf. childpos = 2*pos + 1 # leftmost child position while childpos < endpos: # Set childpos to index of smaller child. rightpos = childpos + 1 if rightpos < endpos and not heap[childpos] < heap[rightpos]: childpos = rightpos # Move the smaller child up. heap[pos] = heap[childpos] pos = childpos childpos = 2*pos + 1 # The leaf at pos is empty now. Put newitem there, and bubble it up # to its final resting place (by sifting its parents down). heap[pos] = newitem _siftdown(heap, startpos, pos) def _siftdown(heap, startpos, pos): newitem = heap[pos] # Follow the path to the root, moving parents down until finding a place # newitem fits. while pos > startpos: parentpos = (pos - 1) >> 1 parent = heap[parentpos] if newitem < parent: heap[pos] = parent pos = parentpos continue break heap[pos] = newitem
    heapfiy模块代码如下,功能:在O(len(x))时间复杂度将列表x原地转换成最小堆
    def heapify(x): """Transform list into a heap, in-place, in O(len(x)) time.""" n = len(x) # Transform bottom-up. The largest index there's any point to looking at # is the largest with a child index in-range, so must have 2*i + 1 < n, # or i < (n-1)/2. If n is even = 2*j, this is (2*j-1)/2 = j-1/2 so # j-1 is the largest, which is n//2 - 1. If n is odd = 2*j+1, this is # (2*j+1-1)/2 = j so j-1 is the largest, and that's again n//2-1. for i in reversed(range(n//2)): _siftup(x, i)
    这个某块可以直接使用使用,通过import heapq,所以当在leetcode中遇到top-K或类似问题需要用到优先队列时(Python),不需要用优先队列了,直接用堆即可,不过注意python中的heapq只是最小堆,如果需要最大堆,那么在把元素入堆的时候需要加负号。并且最后输出结果的时候也要加负号,然后在逆序输出。

    下面以leetcode-215和leetcode-347题目的用优先队列和堆的解法来进一步了解两者的关系。

    leetcode-215

    class Solution: def findKthLargest(self, nums, k): """ :type nums: List[int] :type k: int :rtype: int """ import heapq #heapq中直接有这个函数 return heapq.nlargest(k, nums)[-1] # 用堆,时间复杂度O(N + klog(N)) class Solution: def findKthLargest(self, nums, k): """ :type nums: List[int] :type k: int :rtype: int """ import heapq # heapify只能创建一个最小堆,所以加负号,但根据题目意思应该用最大堆 nums = [-num for num in nums] heapq.heapify(nums) res = float('inf') for _ in range(k): res = heapq.heappop(nums) return -res
    leetcode-347

    # top-K问题常用解题思路,用优先队列,时间复杂度O(nlgk) # 维护一个含有k个元素的优先队列。如果遍历到的元素比队列中的最小频率元素的频率高,则取出队列中最小频率 # 的元素,将新元素入队。最终,队列中剩下的,就是前k个出现频率最高的元素。 class Solution(object): def topKFrequent(self, nums, k): """ :type nums: List[int] :type k: int :rtype: List[int] """ freq = {} for num in nums: if num not in freq: freq[num] = 1 else: freq[num] += 1 # 扫描数组,维护当前出现频率最高的k个元素 # 在优先队列中,按照频率排序,所以数据对是(频率,元素)的形式 # python的堆只有最小堆,优先队列就是用堆实现的,所以优先队列也只有最小优先队列 from queue import PriorityQueue as PQ pq = PQ() for v, f in freq.items(): if pq.qsize() == k: (fr, va) = pq.get() # pq只是最小优先队列,为了实现最大优先队列,所以需要取负数 if -f > -fr: pq.put((-f, v)) else: pq.put((fr, va)) else: pq.put((-f, v)) res = [] while pq.queue: res.append((pq.get()[1])) return res #上面用优先队列只是为了了解这个模块中优先队列的用法,实际上用起来也不方便,可以看到上面写的很繁琐, 不易懂,而且在leetcode上面提交还会显示ImportError: No module named queue class Solution(object): def topKFrequent(self, nums, k): """ :type nums: List[int] :type k: int :rtype: List[int] """ # 上面的优先队列就是用heapq实现的,所以这里直接使用heapq堆 import heapq freq = {} res = [] for num in nums: if num not in freq: freq[num] = 1 else: freq[num] += 1 # 1.我们需要按照数字出现频率进行排序,所以val在前,key在后 # 2.我们需要最大堆,但是heapq只实现了最小堆,所以加个负号,模拟最大堆 max_heap = [(-val, key) for key, val in freq.items()] heapq.heapify(max_heap) for i in range(k): res.append(heapq.heappop(max_heap)[1]) return res nums = [1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 6, 7] k = 4 print(Solution().topKFrequent(nums, k))

    参考

    最小堆 构建、插入、删除的过程图解 排序算法总结-堆排序

    最新回复(0)