Catalog
  1. 1. Huffman Tree简介
  2. 2. Huffman Tree的构建
  3. 3. Huffman编码
  4. 4. Huffman编码/解码的c++实现
数据结构课设四之赫夫曼编/译码器

Huffman Tree简介

​ 赫夫曼树(Huffman Tree),又称最优二叉树,是一类带权路径长度最短的树。假设有n个权值{w1,w2,…,wn},如果构造一棵有n个叶子节点的二叉树,而这n个叶子节点的权值是{w1,w2,…,wn},则所构造出的带权路径长度最小的二叉树就被称为赫夫曼树。

​ 这里补充下树的带权路径长度的概念。树的带权路径长度指树中所有叶子节点到根节点的路径长度与该叶子节点权值的乘积之和,如果在一棵二叉树中共有n个叶子节点,用Wi表示第i个叶子节点的权值,Li表示第i个也叶子节点到根节点的路径长度,则该二叉树的带权路径长度 WPL=W1L1 + W2L2 + … Wn*Ln。

根据节点的个数以及权值的不同,赫夫曼树的形状也各不相同,赫夫曼树具有如下特性:

  • 对于同一组权值,所能得到的赫夫曼树不一定是唯一的。

  • 赫夫曼树的左右子树可以互换,因为这并不影响树的带权路径长度。

  • 带权值的节点都是叶子节点,不带权值的节点都是某棵子二叉树的根节点。

  • 权值越大的节点越靠近赫夫曼树的根节点,权值越小的节点越远离赫夫曼树的根节点。

  • 赫夫曼树中只有叶子节点和度为2的节点,没有度为1的节点。

  • 一棵有n个叶子节点的赫夫曼树共有2n-1个节点。

    Huffman Tree的构建

    赫夫曼树的构建步骤如下:

    1、将给定的n个权值看做n棵只有根节点(无左右孩子)的二叉树,组成一个集合HT,每棵树的权值为该节点的权值。
    2、从集合HT中选出2棵权值最小的二叉树,组成一棵新的二叉树,其权值为这2棵二叉树的权值之和。
    3、将步骤2中选出的2棵二叉树从集合HT中删去,同时将步骤2中新得到的二叉树加入到集合HT中。
    4、重复步骤2和步骤3,直到集合HT中只含一棵树,这棵树便是赫夫曼树。

    假如给定如下5个权值:

    则按照以上步骤,可以构造出如下面左图所示的赫夫曼树,当然也可能构造出如下面右图所示的赫夫曼树,这并不是唯一的。

    Huffman编码

    ​ 赫夫曼树的应用十分广泛,比如众所周知的在通信电文中的应用。在等传送电文时,我们希望电文的总长尽可能短,因此可以对每个字符设计长度不等的编码,让电文中出现较多的字符采用尽可能短的编码。为了保证在译码时不出现歧义,我们可以采取如下图所示的编码方式:

即左分支编码为字符0,右分支编码为字符1,将从根节点到叶子节点的路径上分支字符组成的字符串作为叶子节点字符的编码,这便是赫夫曼编码。我们根据上面左图可以得到各叶子节点的赫夫曼编码如下:
权值为5的节点的赫夫曼编码为:11
权值为4的节点的赫夫曼编码为:10
权值为3的节点的赫夫曼编码为:00
权值为2的节点的赫夫曼编码为:011
权值为1的节点的赫夫曼编码为:010

而对于上面右图,则可以得到各叶子节点的赫夫曼编码如下:
权值为5的节点的赫夫曼编码为:00
权值为4的节点的赫夫曼编码为:01
权值为3的节点的赫夫曼编码为:10
权值为2的节点的赫夫曼编码为:110
权值为1的节点的赫夫曼编码为:111

最小带权路径长度(左图):WPL=23+31+32+24+25=33
最小带权路径长度(右图):WPL=2
5+24+23+32+31=33

由此可见,不管是左子树权值大于右子树权值还是小于右子树权值的哈夫曼树的最小带权路径长度不变。

Huffman编码/解码的c++实现

head.h

#include<iostream>
#include <string>
#pragma warning (disable:4996)
using namespace std;
#define MAXSIZE 100

typedef struct Node
{
int weight;
int parent;
char value;
int lchild, rchild;
}HTNode, *HuffmanTree;
typedef char **HuffmanCode;

HuffmanTree create_HuffmanTree(int n);
void select_minium(HuffmanTree HT, int k, int &min1, int &min2);
int min(HuffmanTree HT, int k);
void HuffmanCoding(HuffmanTree HT, HuffmanCode &HC, int n);
int countWPL(HuffmanTree HT, int n);
void HuffmanDecoding(char ch[], HuffmanTree HT,int n);

main.cpp

#include "head.h"
int main() {
cout << " ========================================================================" << endl;
cout << "% 欢迎使用Huffman翻译器 %" << endl;
cout << "$ 1.初始化 $" << endl;
cout << "% 2.编码 %" << endl;
cout << "$ 3.解码 $" << endl;
cout << "% 4.最小带权路径 %" << endl;
cout << "$ 5.退出 $" << endl;
cout << "% %" << endl;
cout << "$ $" << endl;
cout << "% %" << endl;
cout << "$ $" << endl;
cout << "% %" << endl;
cout << "$ $" << endl;
cout << "% %" << endl;
cout << "$ $" << endl;
cout << " =========================================================================" << endl;
cout << "请输入选项(输入范围位数字1-5):";
char input;
cin >> input;
HuffmanTree htree = NULL;
HuffmanCode HC = NULL;
int n;
while (input != '5') {
while (input > 53 or input < 49) {
cout << "输入有误,请重新输入!!!" << endl;
cin >> input;
}
switch (input) {
case '1':
{
cout << "请输入节点个数:";
cin >> n;
htree = create_HuffmanTree(n);
cout << "赫夫曼树初始化完毕!" << endl;
cout << "--------------------------------------------------" << endl;

break;
}
case '2':
HuffmanCoding(htree, HC, n);
break;
case '3':
{
char ch[1000] = { 0 };
cout << "请输入编码序列:" ;
getchar();
cin.getline(ch, 1000, '\n');
HuffmanDecoding(ch, htree,n);
break;
}
case '4': {
int wpl = countWPL(htree, n);
printf("最小带权路径长度WPL=%d\n", wpl);
cout << "-----------------------------------------------------------"<<endl;
break;
}

}
cout << "请输入选项(输入范围位数字1-5):";
cin >> input;
}
cout << "see u next!";
return 0;

}

Huffmancode.cpp

#include "head.h"
HuffmanTree create_HuffmanTree(int n)
{
int *wet = (int*)malloc(sizeof(int) *MAXSIZE);
char *value = (char*)malloc(sizeof(char)*MAXSIZE);
int weight;
char v;
for (int i = 0; i < n; i++) {
cout << "请输入节点" << i << "的值和权重:";
cin >> v; cin >> weight;
*value = v; *wet = weight;
value++; wet++;

}
int total = 2 * n - 1;
HuffmanTree HT = (HuffmanTree)malloc(total * sizeof(HTNode));
if (!HT)
{
printf("HuffmanTree malloc faild!");
exit(-1);
}
int i;
wet = wet - n;
value = value - n;
for (i = 0; i < n; i++)
{
HT[i].parent = -1;
HT[i].lchild = -1;
HT[i].rchild = -1;
HT[i].weight = *wet;
HT[i].value = *value;
wet++; value++;
}
for (; i < total; i++)
{
HT[i].parent = -1;
HT[i].lchild = -1;
HT[i].rchild = -1;
HT[i].weight = 0;
}
int min1, min2;
for (i = n; i < total; i++)
{
select_minium(HT, i, min1, min2);
HT[min1].parent = i;
HT[min2].parent = i;

HT[i].lchild = min1;
HT[i].rchild = min2;
HT[i].weight = HT[min1].weight + HT[min2].weight;
}
return HT;
}

int countWPL(HuffmanTree HT, int n)
{
int i, countRoads, WPL = 0;
for (i = 0; i < n; i++)
{
int father = HT[i].parent;
countRoads = 0;

while (father != -1)
{
countRoads++;
father = HT[father].parent;
}
WPL += countRoads * HT[i].weight;
}
return WPL;
}

void select_minium(HuffmanTree HT, int k, int &min1, int &min2)
{
min1 = min(HT, k);
min2 = min(HT, k);
}

int min(HuffmanTree HT, int k)
{
int i = 0;
int min;
int min_weight;
while (HT[i].parent != -1)
i++;
min_weight = HT[i].weight;
min = i;
for (; i < k; i++)
{
if (HT[i].weight < min_weight && HT[i].parent == -1)
{
min_weight = HT[i].weight;
min = i;
}
}
HT[min].parent = 1;

return min;
}

void HuffmanCoding(HuffmanTree HT, HuffmanCode &HC, int n)
{
HC = (HuffmanCode)malloc(n * sizeof(char *));
if (!HC)
{
printf("HuffmanCode request memory failed!");
exit(-1);
}
char *code = (char *)malloc(n * sizeof(char));
if (!code)
{
printf("code malloc faild!");
exit(-1);
}
code[n - 1] = '\0';
int i;
for (i = 0; i < n; i++)
{
int current = i;
int father = HT[i].parent;
int start = n - 1;

while (father != -1)
{
if (HT[father].lchild == current)
code[--start] = '0';
else
code[--start] = '1';
current = father;
father = HT[father].parent;
}
HC[i] = (char *)malloc((n - start) * sizeof(char));
if (!HC[i])
{
printf("HC[i] malloc faild!");
exit(-1);
}
strcpy(HC[i], code + start);
}
FILE *file = fopen("codefile.txt", "w");
if (NULL == file) {
cout << "open error!\n";
}
cout << "----------------------编码如下--------------------------" << endl;
for (int i = 0; i < n; ++i) {
printf("%c %s\n",HT[i].value,HC[i]);
fprintf(file, "%c %s\n", HT[i].value, HC[i]);
}
fclose(file);
free(code);
cout << "已编码并写入文件codefile.txt" << endl;
cout << "-----------------------------------------------------------" << endl;
}


void HuffmanDecoding(char ch[], HuffmanTree HT,int n) {
int cnt = 0;
int num[100];
HuffmanTree temp;

for (int i = 0; i < strlen(ch); i++) {
if (ch[i] == '0') {
num[i] = 0;
}
else {
num[i] = 1;
}
}
FILE *file = fopen("textfile.txt", "w");
if (NULL == file) {
cout << "open error!\n";
}
cout << "译码结果为:";
fprintf(file, "%s", "译码结果:");
if (HT) {
int cursor = 2*n- 2;
while (cnt < strlen(ch)) {
int temp = cursor;
while ((HT[temp].lchild != -1) && (HT[temp].rchild != -1)) {
if (num[cnt] == 0) {
temp = HT[temp].lchild;
}
else {
temp = HT[temp].rchild;
}
cnt++;
}

printf("%c", HT[temp].value);
fprintf(file, "%c", HT[temp].value);
}
}
cout << endl <<"已打印出译码结果并将结果存入textfile.txt中"<< endl;
cout << "-------------------------------------------------------------" << endl;
}

/*101101011110101010010011100010110101011100*/

运行结果如下:

Author: superzhaoyang
Link: http://yoursite.com/2020/03/17/数据结构课设实验四之赫夫曼编译码器/
Copyright Notice: All articles in this blog are licensed under CC BY-NC-SA 4.0 unless stating additionally.
Donate
  • 微信
  • 支付宝

Comment