05-Tree 9 Huffman Codes (30 points)

    xiaoxiao  2022-07-14


    In 1953, David A. Huffman published his paper “A Method for the Construction of Minimum-Redundancy Codes”, and hence printed his name in the history of computer science. As a professor who gives the final exam problem on Huffman codes, I am encountering a big problem: the Huffman codes are NOT unique. For example, given a string “aaaxuaxz”, we can observe that the frequencies of the characters ‘a’, ‘x’, ‘u’ and ‘z’ are 4, 2, 1 and 1, respectively. We may either encode the symbols as {‘a’=0, ‘x’=10, ‘u’=110, ‘z’=111}, or in another way as {‘a’=1, ‘x’=01, ‘u’=001, ‘z’=000}; both compress the string into 14 bits. Another set of codes can be given as {‘a’=0, ‘x’=11, ‘u’=100, ‘z’=101}, but {‘a’=0, ‘x’=01, ‘u’=011, ‘z’=001} is NOT correct since “aaaxuaxz” and “aazuaxax” can both be decoded from the code 00001011001001. The students are submitting all kinds of codes, and I need a computer program to help me determine which ones are correct and which ones are not.

    Input Specification:

    Each input file contains one test case. For each case, the first line gives an integer N (2≤N≤63), then followed by a line that contains all the N distinct characters and their frequencies in the following format:

    c[1] f[1] c[2] f[2] … c[N] f[N]

    where c[i] is a character chosen from {‘0’ - ‘9’, ‘a’ - ‘z’, ‘A’ - ‘Z’, ‘_’}, and f[i] is the frequency of c[i] and is an integer no more than 1000. The next line gives a positive integer M (≤1000), followed by M student submissions. Each student submission consists of N lines, each in the format:

    c[i] code[i]

    where c[i] is the i-th character and code[i] is a non-empty string of no more than 63 ‘0’s and ‘1’s.

    Output Specification:

    For each test case, print in each line either “Yes” if the student’s submission is correct, or “No” if not.

    Note: The optimal solution is not necessarily generated by the Huffman algorithm. Any prefix code with optimal code length is considered correct.

    Sample Input:

    7
    A 1 B 1 C 1 D 3 E 3 F 6 G 6
    4
    A 00000
    B 00001
    C 0001
    D 001
    E 01
    F 10
    G 11
    A 01010
    B 01011
    C 0100
    D 011
    E 10
    F 11
    G 00
    A 000
    B 001
    C 010
    D 011
    E 100
    F 101
    G 110
    A 00000
    B 00001
    C 0001
    D 001
    E 00
    F 10
    G 11

    Sample Output:

    Yes
    Yes
    No
    No

    Note

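    Overall approach: use a min-heap to repeatedly remove the two smallest weights while building the Huffman tree, which gives the optimal WPL. For each submission, compute its WPL; if it equals the optimal WPL and the submitted codes form a prefix code, print Yes.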
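A minimal sketch of the WPL half of this check, assuming C++11 and swapping in `std::priority_queue` (with `std::greater`, so it behaves as a min-heap) for the hand-written heap. The names `optimalWpl` and `submissionWpl` are illustrative, not from the original solution; `weights[i]` and `codes[i]` are assumed to describe the same character.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <string>
#include <vector>

// Optimal WPL: repeatedly merge the two smallest weights; the sum of all
// merged weights equals the weighted path length of a Huffman tree.
long long optimalWpl(const std::vector<int>& weights) {
    assert(!weights.empty());
    // std::greater turns the default max-heap into a min-heap
    std::priority_queue<long long, std::vector<long long>, std::greater<long long>>
        pq(weights.begin(), weights.end());
    long long wpl = 0;
    while (pq.size() > 1) {
        long long a = pq.top(); pq.pop();
        long long b = pq.top(); pq.pop();
        wpl += a + b;       // each merge adds one level above all leaves below it
        pq.push(a + b);
    }
    return wpl;
}

// WPL of one submission: sum of frequency * code length.
long long submissionWpl(const std::vector<int>& weights,
                        const std::vector<std::string>& codes) {
    assert(weights.size() == codes.size());
    long long wpl = 0;
    for (std::size_t i = 0; i < weights.size(); ++i)
        wpl += static_cast<long long>(weights[i]) *
               static_cast<long long>(codes[i].size());
    return wpl;
}
```

For the sample frequencies, `optimalWpl({1, 1, 1, 3, 3, 6, 6})` is 53, which matches the first submission; the third submission (all codes of length 3) has WPL 63 and is rejected on this test alone. The fourth submission also has WPL 53, so it can only be rejected by the prefix check.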

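    Computing the WPL: the weighted path length of a tree equals the sum of the weighted path lengths of its leaves, which equals the sum of the weights of all internal (non-leaf) nodes, which in turn equals the sum of the weights of all nodes except the root. The first step holds because a leaf's weight is counted once in every internal node on its path to the root, that is, depth times. The second step is easier to see: the leaf weights add up to the root's weight. An example of the first step, for the Huffman tree built from the leaf weights 1, 2, 4, 5, 6 and 12 (leaf 12 at depth 1, leaves 4, 5 and 6 at depth 3, leaves 1 and 2 at depth 4):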

    WPL = 12 * 1 + 1 * 4 + 2 * 4 + 4 * 3 + 5 * 3 + 6 * 3 = 69
    WPL = 30 + 18 + 7 + 11 + 3 = (12+1+2+4+5+6) + (1+2+4+5+6) + (1+2+4) + (5+6) + (1+2) = 69
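To make the identity concrete, this sketch hard-codes the example tree (internal nodes 30 = 12 + 18, 18 = 7 + 11, 7 = 3 + 4, 3 = 1 + 2, 11 = 5 + 6) and computes the three sums from the derivation. The `Node` type and the helper names are hypothetical, written for this illustration only, and assume C++14 (`std::make_unique`) and a full binary tree, as Huffman trees are.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical node type for this illustration only.
struct Node {
    int weight = 0;
    std::unique_ptr<Node> left, right;
};

std::unique_ptr<Node> leaf(int w) {
    auto n = std::make_unique<Node>();
    n->weight = w;
    return n;
}

// Internal node: its weight is the sum of its two children's weights.
std::unique_ptr<Node> join(std::unique_ptr<Node> l, std::unique_ptr<Node> r) {
    auto n = std::make_unique<Node>();
    n->weight = l->weight + r->weight;
    n->left = std::move(l);
    n->right = std::move(r);
    return n;
}

bool isLeaf(const Node* n) { return !n->left && !n->right; }

// Sum over the leaves of weight * depth (the root has depth 0).
long long leafWpl(const Node* n, int depth) {
    if (isLeaf(n)) return static_cast<long long>(n->weight) * depth;
    assert(n->left && n->right);  // full binary tree assumed
    return leafWpl(n->left.get(), depth + 1) + leafWpl(n->right.get(), depth + 1);
}

// Sum of the weights of all internal (non-leaf) nodes.
long long internalSum(const Node* n) {
    if (isLeaf(n)) return 0;
    return n->weight + internalSum(n->left.get()) + internalSum(n->right.get());
}

// Sum of the weights of every node, root included.
long long allSum(const Node* n) {
    if (!n) return 0;
    return n->weight + allSum(n->left.get()) + allSum(n->right.get());
}

// The example tree from the derivation above.
std::unique_ptr<Node> exampleTree() {
    auto n3 = join(leaf(1), leaf(2));
    auto n7 = join(std::move(n3), leaf(4));
    auto n11 = join(leaf(5), leaf(6));
    auto n18 = join(std::move(n7), std::move(n11));
    return join(leaf(12), std::move(n18));
}
```

All three quantities come out to 69: `leafWpl(root, 0)`, `internalSum(root)`, and `allSum(root) - root->weight` (99 - 30). The last form is why the solution can simply add up `temp1 + temp2` at every merge.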

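    To check the prefix property, advance two pointers through a pair of codes in step until the characters they point to differ. If at that point one pointer has reached the end of its string, one code is a prefix of the other, so the submission is not a prefix code. (I borrowed this from someone else and feel it could be optimized.)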
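One common way to avoid comparing every pair (a swapped-in technique, not the method used in this solution) is to insert each code into a binary trie: a conflict shows up either when a new code walks through a node where an earlier code ended, or when a new code ends on a node that already exists. This is linear in the total code length. A minimal sketch; `isPrefixFree` is a hypothetical name.

```cpp
#include <array>
#include <cassert>
#include <string>
#include <vector>

// Returns true if no code is a prefix of another (identical codes also fail).
bool isPrefixFree(const std::vector<std::string>& codes) {
    // Node 0 is the root. child[v][b] == 0 means "no child", which is safe
    // because the root is never anyone's child.
    std::vector<std::array<int, 2>> child;
    std::vector<bool> isEnd;
    child.push_back({0, 0});
    isEnd.push_back(false);
    for (const std::string& code : codes) {
        int cur = 0;
        for (char c : code) {
            assert(c == '0' || c == '1');
            if (isEnd[cur]) return false;  // an earlier code is a prefix of this one
            int b = c - '0';
            if (child[cur][b] == 0) {
                child[cur][b] = static_cast<int>(child.size());
                child.push_back({0, 0});
                isEnd.push_back(false);
            }
            cur = child[cur][b];
        }
        // This code equals an earlier one, or is a prefix of an earlier one.
        if (isEnd[cur] || child[cur][0] != 0 || child[cur][1] != 0) return false;
        isEnd[cur] = true;
    }
    return true;
}
```

With this, the fourth sample submission is rejected because "00" (E) ends on a node that "00000" and "00001" already pass through.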

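    Mistakes I made: (1) I forgot the else in the Delete function, which kept the program wrong for a long time; (2) I did not make sure that i + 1 <= size, which again took a long time to track down.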
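Both mistakes are easy to catch early by testing the heap against `std::sort` on random input. The sketch below assumes a vector-based variant of the same 1-based heap with a sentinel at index 0; `MinHeap` and `heapMatchesSort` are hypothetical names, and `std::rand` is used only because randomness quality does not matter here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <vector>

// Hypothetical 1-based min-heap; a[0] is a sentinel smaller than any weight
// (weights are assumed to be >= 0).
struct MinHeap {
    std::vector<int> a{-1};

    int size() const { return static_cast<int>(a.size()) - 1; }

    void push(int x) {
        a.push_back(x);
        int i = size();
        for (; x < a[i / 2]; i /= 2)  // move larger parents down
            a[i] = a[i / 2];
        a[i] = x;
    }

    int pop() {
        assert(size() > 0);
        int top = a[1];
        int last = a.back();
        a.pop_back();
        int n = size();
        if (n == 0) return top;  // the heap held a single element
        int i;
        for (i = 2; i <= n; i *= 2) {
            // mistake (2): check i + 1 <= n before reading a[i + 1]
            if (i + 1 <= n && a[i + 1] < a[i]) i++;
            if (a[i] < last)
                a[i / 2] = a[i];  // the smaller child moves up
            else
                break;  // mistake (1): without "else", break runs on every pass
        }
        a[i / 2] = last;
        return top;
    }
};

// Push `count` random weights, pop them all, and compare with std::sort.
bool heapMatchesSort(unsigned seed, int count) {
    std::srand(seed);
    std::vector<int> values;
    MinHeap h;
    for (int k = 0; k < count; ++k) {
        int v = std::rand() % 1001;  // frequencies in this problem are at most 1000
        values.push_back(v);
        h.push(v);
    }
    std::sort(values.begin(), values.end());
    for (int v : values)
        if (h.pop() != v) return false;
    return h.size() == 0;
}
```

Removing the `else` (so that `break` always runs) makes `heapMatchesSort` fail for most seeds once `count` is more than a handful, which points straight at the sift-down loop.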

    Code

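    #include <cstdio>
    #include <cstring>
    #include <iostream>
    #include <map>
    using namespace std;

    const int MAX = 1e5 + 10;
    int heapSize = 0;   // renamed from "size": it clashes with std::size under C++17
    int heap[MAX];      // 1-based min-heap; heap[0] holds a sentinel smaller than any weight

    // Percolate the new element up until its parent is not larger.
    void insert(int data) {
        int i;
        for (i = ++heapSize; data < heap[i / 2]; i /= 2)
            heap[i] = heap[i / 2];
        heap[i] = data;
    }

    // Remove and return the minimum, moving the last element down into place.
    int Delete(void) {
        int temp = heap[1], i;
        int last = heap[heapSize];
        heap[heapSize--] = 0;
        for (i = 2; i <= heapSize; i *= 2) {
            // check the bound first, then pick the smaller child
            if (i + 1 <= heapSize && heap[i + 1] < heap[i])
                i++;
            if (heap[i] < last)
                heap[i / 2] = heap[i];
            else
                break;
        }
        heap[i / 2] = last;
        return temp;
    }

    // Return 1 if one code is a prefix of the other (identical codes included).
    // The loop must test the characters, not the pointers; testing the pointers
    // lets identical codes run past their terminating '\0'.
    int isPrefix(const char *s1, const char *s2) {
        while (*s1 && *s2 && *s1 == *s2)
            ++s1, ++s2;
        return (*s1 == '\0' || *s2 == '\0');
    }

    // Return 1 if any pair of codes violates the prefix property.
    int hasPrefixCode(char s[][200], int n) {
        for (int i = 0; i < n; ++i)
            for (int j = i + 1; j < n; ++j)
                if (isPrefix(s[i], s[j]))   // s[i] and s[j] are both C strings
                    return 1;
        return 0;
    }

    int main() {
        int num, Wpl = 0;
        heap[0] = -1;                  // sentinel: every frequency is >= 0
        cin >> num;
        map<char, int> m;              // character -> frequency
        for (int i = 0; i < num; i++) {
            int temp;
            char name;
            cin >> name >> temp;
            m[name] = temp;
            insert(temp);
        }
        // N - 1 merges; the sum of the merged weights is the optimal WPL
        int leafCount = heapSize;
        for (int i = 1; i < leafCount; i++) {
            int temp1 = Delete();
            int temp2 = Delete();
            insert(temp1 + temp2);
            Wpl += temp1 + temp2;
        }
        int checknum;
        cin >> checknum;
        while (checknum--) {
            char ch[256];
            char s[256][200];
            int thisWPL = 0;
            for (int i = 0; i < num; ++i) {
                scanf("\n%c %s", &ch[i], s[i]);
                thisWPL += m[ch[i]] * (int)strlen(s[i]);
            }
            if (thisWPL == Wpl && !hasPrefixCode(s, num))
                printf("Yes\n");
            else
                printf("No\n");
        }
        return 0;
    }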