一哈夫曼樹以及文件壓縮原理:
成都創(chuàng)新互聯(lián)公司是專業(yè)的射陽網(wǎng)站建設(shè)公司,射陽接單;提供成都網(wǎng)站設(shè)計、成都網(wǎng)站制作,網(wǎng)頁設(shè)計,網(wǎng)站設(shè)計,建網(wǎng)站,PHP網(wǎng)站建設(shè)等專業(yè)做網(wǎng)站服務(wù);采用PHP框架,可快速的進(jìn)行射陽網(wǎng)站開發(fā)網(wǎng)頁制作和功能擴(kuò)展;專業(yè)做搜索引擎喜愛的網(wǎng)站,專業(yè)的做網(wǎng)站團(tuán)隊(duì),希望更多企業(yè)前來合作!
1.哈夫曼樹 :
給定N個權(quán)值作為N個葉子結(jié)點(diǎn),構(gòu)造一棵二叉樹,若該樹的帶權(quán)路徑長度達(dá)到最小,稱這樣的二叉樹為最優(yōu)二叉樹,也稱為哈夫曼樹。哈夫曼樹是帶權(quán)路徑長度最短的樹,權(quán)值較大的結(jié)點(diǎn)離根較近(頻率越高的結(jié)點(diǎn)離根越進(jìn)
)。
以 下數(shù)組為例,構(gòu)建哈夫曼樹
int a[] = {0,1,2,3,4,5,6,7,8}
我們可以發(fā)現(xiàn)以下規(guī)律
1:9個數(shù)構(gòu)成的哈夫曼樹一共有17個結(jié)點(diǎn),也就是可以n個數(shù)可以生產(chǎn)2*n-1個結(jié)點(diǎn)
2:數(shù)字越大的數(shù)離根節(jié)點(diǎn)越近,越小的數(shù)離根節(jié)點(diǎn)越近。
2.如何利用haffman編碼實(shí)現(xiàn)文件壓縮:
比如abc.txt文件中有以下字符:aaaabbbccde,
1.進(jìn)行字符統(tǒng)計
aaaabbbccde a : 4次 b : 3次 c : 2次 d : 1次 e : 1次
2.用統(tǒng)計結(jié)果構(gòu)建哈夫曼樹
3.用哈夫曼樹生成哈夫曼編碼(從根結(jié)點(diǎn)開始,路徑左邊記為0,右邊記為1):
a的編碼:1 b的編碼:01 c的編碼:000 d的編碼:0011 e的編碼:0010
4.哈夫曼編碼代替字符,進(jìn)行壓縮。
源文件內(nèi)容為:aaaabbbccde
將源文件用對應(yīng)的哈夫曼編碼(haffman code)替換,則有:11110101 01000000 00110010 (總共3個字節(jié))
由此可見,源文件一共有11個字符,占11字節(jié)的內(nèi)存,但是經(jīng)過用haffman code替換之后,只占3個字節(jié),這樣就能達(dá)到壓縮的目的
二主要技術(shù)點(diǎn):
1.哈夫曼樹算法(哈夫曼壓縮的基本算法)
2.哈希算法(字符統(tǒng)計時候會用到,也可以直接用HashMap統(tǒng)計)
3.位運(yùn)算(涉及到將指定位,置0或置1)
4.java文件操作,以及緩沖操作。
5.存儲模式(大端存儲,小端存儲,能看懂文件16進(jìn)制的形式)
7.設(shè)置壓縮密碼,解壓輸入密碼解壓(小編自己加的內(nèi)容)
三實(shí)現(xiàn)過程:
以上述aaaabbbccde為例
1.字符統(tǒng)計:
public class FreqHuf { public static int BUFFER_SIZE = 1 << 18; int freq[] = new int[256]; File file; int count; Listlist; FreqHuf(String pathname) throws Exception { list = new ArrayList<>(); this.file = new File(pathname); if(!file.exists()){ throw new Exception("文件不存在"); } System.out.println("進(jìn)行字符統(tǒng)計中"); CensusChar(); System.out.println("字符統(tǒng)計完畢"); } public void CensusChar() throws IOException{ int intchar; FileInputStream fis = new FileInputStream(file); System.out.println("統(tǒng)計中"); //這種統(tǒng)計處理方案,速度極慢,不建議使用,以下采用緩存讀數(shù)據(jù)。 // while((intchar = fis.read()) != -1){ // freq[intchar]++; // } //這里采用緩存機(jī)制,一次讀1 << 18個字節(jié),大大提高效率。 byte[] bytes = new byte[BUFFER_SIZE]; while((intchar = fis.read(bytes))!= -1){ for(int i = 0; i < intchar;i++){ int temp = bytes[i]& 0xff; freq[temp]++; } } fis.close(); for(int i = 0; i < 256; i++){ if(freq[i] != 0){ this.count++; } } int index = 0; for(int i = 0; i < 256; i++){ if(freq[i] != 0){ HuffmanFreq huffman = new HuffmanFreq(); huffman.character = (char)i; huffman.freq = freq[i]; list.add(index, huffman); } } } }
//統(tǒng)計每個字符和其頻率的類 public class HuffmanFreq { char character; int freq; HuffmanFreq() { } HuffmanFreq(int character,int freq) { this.character = (char)character; this.freq = freq; } char getCharacter() { return character; } void setCharacter(int character) { this.character = (char)character; } int getFreq() { return freq; } void setFreq(int freq) { this.freq = freq; } byte[] infoToByte(){ byte[] bt = new byte[6]; byte[] b1 = ByteAnd8Types.charToByte(character); for(int i= 0; i < b1.length;i++){ bt[i] = b1[i]; } byte[] b2 = ByteAnd8Types.intToBytes2(freq); int index = 2; for(int i= 0; i < b2.length;i++){ bt[index++] = b2[i]; } return bt; } @Override public String toString() { return "Huffman [character=" + character + ", freq=" + freq + "]"; } }
2.用統(tǒng)計結(jié)果構(gòu)建哈夫曼樹:
//treeSize為總節(jié)點(diǎn)數(shù) private void creatTree(int treeSize){ int temp; treeList = new ArrayList(); for(int i = 0; i < treeSize; i++){ HuffTreeNode node = new HuffTreeNode(); treeList.add(i, node); } for(int i = 0; i < charCount; i++){ HuffTreeNode node = treeList.get(i); node.freq.freq = charList.get(i).getFreq(); node.freq.character = charList.get(i).getCharacter(); node.left = -1; node.right = -1; node.use = 0; } for(int i = charCount; i < treeSize; i++){ int index = i; HuffTreeNode node = treeList.get(i); node.use = 0; node.freq.character = '#'; node.right = searchmin(index); node.left = searchmin(index); node.freq.freq = treeList.get(node.right).freq.freq + treeList.get(node.left).freq.freq; temp = searchmin(++index); if(temp == -1){ break; } treeList.get(temp).use = 0; } } private int searchmin(int count){ int minindex = -1; for(int i = 0; i < count; i++){ if(treeList.get(i).use == 0){ minindex = i; break; } } if(minindex == -1){ return -1; } for(int i = 0; i < count; i++){ if((treeList.get(i).freq.freq <= treeList.get(minindex).freq.freq) && treeList.get(i).use == 0){ minindex = i; } } treeList.get(minindex).use = 1; return minindex; }
3.用哈夫曼樹生成哈夫曼編碼(從根結(jié)點(diǎn)開始,路徑左邊記為0,右邊記為1):
private void bulidhuftreecode(int root, String str){ if(treeList.get(root).getLeft() != -1 && treeList.get(root).getRight() != -1){ bulidhuftreecode(treeList.get(root).getLeft(), str+"0"); bulidhuftreecode(treeList.get(root).getRight(), str + "1"); } else{ treeList.get(root).code = str; } }
4.哈夫曼編碼代替字符,進(jìn)行壓縮,壓縮前首先要將文件頭(文件標(biāo)志,字符數(shù)量,最后一個字節(jié)有效位,密碼)字符和其頻率的那張表格寫入文件,以便于解壓縮
public void creatCodeFile(String path) throws Exception{ byte value = 0; int index = 0; int arr[] = new int[256]; int intchar; for(int i = 0; i < charCount; i++){ arr[treeList.get(i).freq.character] = i; } File file = new File(path); if(!file.exists()){ if(!file.createNewFile()){ throw new Exception("創(chuàng)建文件失敗"); } } int count = charList.size(); HuffmanHead head = new HuffmanHead(count, howlongchar(count), password); //將文件頭信息寫入文件 this.write = new RandomAccessFile(file, "rw"); write.write(head.InfoToByte()); //將字符及其頻率的表寫入文件 for(HuffmanFreq freq : charList){ byte[] bt = freq.infoToByte(); write.write(bt); } //將字符用哈夫曼編碼進(jìn)行壓縮,這里讀寫都是采用緩存機(jī)制 byte[] readBuffer = new byte[BUFFER_SIZE]; while((intchar = read.read(readBuffer))!= -1){ ProgressBar.SetCurrent(read.getFilePointer()); for(int i = 0; i < intchar;i++){ int temp = readBuffer[i]& 0xff; String code = treeList.get(arr[temp]).code; char[] chars = code.toCharArray(); for(int j = 0; j < chars.length; j++){ if(chars[j] == '0'){ value = CLR_BYTE(value, index); } if(chars[j] == '1'){ value = SET_BYTE(value, index); } if(++index >= 8){ index = 0; writeInBuffer(value); } } } } //此方法速度較慢 // while((intchar = is.read()) != -1){ // String code = treeList.get(arr[intchar]).code; // char[] chars = code.toCharArray(); // // for(int i = 0; i < chars.length; i++){ // if(chars[i] == '0'){ // value = CLR_BYTE(value, index); // } // if(chars[i] == '1'){ // value = SET_BYTE(value, index); // } // if(++index >= 8){ // index = 0; // oos.write(value); // } // } // } if(index != 0){ writeInBuffer(value); } byte[] Data = Arrays.copyOfRange(writeBuffer, 0, writeBufferSize); write.write(Data); write.close(); read.close(); } //指定位,置1 byte SET_BYTE(byte value, int index){ return (value) |= (1 << ((index) ^ 7)); } //指定位,置0 byte CLR_BYTE(byte value, int index){ return (value) &= (~(1 << ((index) ^ 7))); } //判斷指定位是否為0,0為false,1為true boolean GET_BYTE(byte value, int index){ return ((value) & (1 << ((index) ^ 7))) != 0; }
如果一個字節(jié)一個字節(jié)往文件里寫,速度會極慢,為了提高效率,寫也采用緩存,先寫到緩存區(qū),緩存區(qū)滿了后寫入文件,
private void writeInBuffer(byte value) throws Exception { if(writeBufferSize < BUFFER_SIZE){ writeBuffer[writeBufferSize] = value; if(++writeBufferSize >= BUFFER_SIZE){ write.write(writeBuffer); writeBufferSize = 0; } } else{ throw new Exception("寫入文件出錯"); } }
到這里壓縮就完成了,以下為解壓縮方法
1.從寫入文件中的字符統(tǒng)計的表讀出放入list里
public void init() throws Exception{ char isHUf = read.readChar(); //驗(yàn)證文件頭信息 if(isHUf != '哈'){ throw new Exception("該文件不是HUFFMAN壓縮文件"); } this.charCount = read.readChar(); this.treeSize = 2*charCount -1; this.lastIndex = read.readChar(); int password = read.readInt(); if(password != this.password.hashCode()){ System.out.println("密碼錯誤"); } else{ System.out.println("密碼正確,正在解壓"); } //從文件中將字符統(tǒng)計的表讀出 byte[] buffer = new byte[charCount * 6]; read.seek(10); read.read(buffer, 0, charCount * 6); ProgressBar.SetCurrent(read.getFilePointer()); for(int i = 0; i < buffer.length; i+=6){ byte[] buff = Arrays.copyOfRange(buffer, i, i+2); ByteBuffer bb = ByteBuffer.allocate (buff.length); bb.put (buff); bb.flip (); CharBuffer cb = cs.decode (bb); byte[] buff1 = Arrays.copyOfRange(buffer, i+2, i+6); int size = ByteAnd8Types.bytesToInt2(buff1, 0); HuffmanFreq freq = new HuffmanFreq(cb.array()[0], size); charList.add(freq); } }
2.用統(tǒng)計結(jié)果構(gòu)建哈夫曼樹(和以上代碼一樣)
3.用哈夫曼樹生成哈夫曼編碼(從根結(jié)點(diǎn)開始,路徑左邊記為0,右邊記為1)(和以上代碼一樣)
4.遍歷文件每個字節(jié),根據(jù)哈夫曼編碼找到對應(yīng)的字符,將字符寫入新文件
public void creatsourcefile(String pathname) throws Exception{ int root = treeList.size() - 1; int fininsh = 1; long len; File file = new File(pathname); if(!file.exists()){ if(!file.createNewFile()){ throw new Exception("創(chuàng)建文件失敗"); } } write = new RandomAccessFile(file, "rw"); int intchar; byte[] bytes = new byte[1<<18]; int index = 0; while((intchar = read.read(bytes))!= -1){ len = read.getFilePointer(); ProgressBar.SetCurrent(len); for(int i = 0; i < intchar;i++){ for(;index < 8 && fininsh != 0;){ if(GET_BYTE(bytes[i], index)){ root = treeList.get(root).right; } else{ root = treeList.get(root).left; } if(treeList.get(root).right== -1 && treeList.get(root).left == -1){ byte temp = (byte)treeList.get(root).freq.character; writeInBuffer(temp); root = treeList.size() - 1; } index++; if(len == this.goalfilelenth && i == intchar-1){ if(index >= this.lastIndex){ fininsh = 0; } } } index = 0; } } byte[] Data = Arrays.copyOfRange(writeBuffer, 0, writeBufferSize); write.write(Data); write.close(); write.close(); read.close(); }
四運(yùn)行展示:
以上就是本文的全部內(nèi)容,希望對大家的學(xué)習(xí)有所幫助,也希望大家多多支持創(chuàng)新互聯(lián)。