StreamAnalyse-BloomFilter

liudongdong1, filed under Categories: Spatiotemporal Data

2022-04-26

https://lddpicture.oss-cn-beijing.aliyuncs.com/picture/images.png

A Bloom filter is a space-efficient probabilistic data structure, conceived by Burton Howard Bloom in 1970, that is used to test whether an element is a member of a set. A query returns either “possibly in set” or “definitely not in set”. The drawback of this structure is that the more elements are added to the set, the higher the probability of false positives. Moreover, Bloom filters do not store the data items at all, so a separate solution must be provided for the actual storage.
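A minimal Python sketch of the structure described above follows. The k bit positions per item are derived from a single SHA-256 digest by double hashing (h1 + i·h2), which is an implementation choice assumed here rather than something the text prescribes; the sizes m and k are arbitrary illustration values.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: an m-bit array probed at k positions per item."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m                 # number of bits in the array
        self.k = k                 # number of hash positions per item
        self.bits = bytearray(m)   # one byte per bit, for readability

    def _positions(self, item: str) -> list:
        # Simulate k hash functions with double hashing on one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        # Only bits are set; the item itself is never stored.
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        # False means "definitely not in set"; True means "possibly in set".
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter(m=1024, k=3)
for flow in ["10.0.0.1", "10.0.0.2", "10.0.0.3"]:
    bf.add(flow)
print(bf.might_contain("10.0.0.1"))     # True: inserted items are never missed
print(bf.might_contain("192.168.1.1"))  # usually False, but may be a false positive
```

With n inserted items, the false-positive probability is approximately (1 - e^(-kn/m))^k, which grows with n; this is the shortcoming mentioned above.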

level: IEEE International Conference on Computer Communications (INFOCOM), CCF A
author: Tong Yang
date: 2019-4
keyword: Size measurement, Hash function

Paper: SA Counter

Summary

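Uses a probabilistic method that, while tolerating some error, adapts to different counting ranges.

How is the probability chosen in the static model? In the paper it is predefined.
The paper states that the error introduced by this probabilistic counting is negligible for large flows; what is the theoretical bound?
Initially, the static model defines the lengths of the sign part and the counting part.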

Research Objective

Application Area: network measurement provides indispensable information for congestion control, DDoS attack detection, heavy hitter identification, and heavy change detection.
Purpose: to balance the number of counters against the size of each counter, achieving both memory efficiency and a high, constant processing speed that keeps up with line rate, the paper proposes self-adaptive counters.

Problem & Challenge Statement

Flow sizes follow a skewed distribution (a few elephant flows, many mouse flows).
Flows arrive at high speed (line rate).
The approximate size of elephant flows is not known beforehand.

Previous work:

Sampling: low accuracy.
Sketches, a family of compact data structures (Count, CM, CU, C sketch, and more sophisticated sketches).
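The CM and CU sketches listed above share one layout, a d × w array of counters with one hash per row, and differ only in how they update. The following Python sketch shows both; the row-salted SHA-256 hashing and the array sizes are illustrative assumptions, and the other sketches in the list are not shown.

```python
import hashlib


class CountMinSketch:
    """d rows of w counters; every key maps to one counter per row."""

    def __init__(self, d: int, w: int, conservative: bool = False) -> None:
        self.d, self.w = d, w
        self.conservative = conservative  # True selects CU (conservative update)
        self.rows = [[0] * w for _ in range(d)]

    def _index(self, row: int, key: str) -> int:
        # Prefix the key with the row number to get d different hash functions.
        digest = hashlib.sha256(f"{row}:{key}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.w

    def update(self, key: str, count: int = 1) -> None:
        idx = [self._index(r, key) for r in range(self.d)]
        if self.conservative:
            # CU: raise each mapped counter only up to (current minimum + count).
            target = min(self.rows[r][i] for r, i in enumerate(idx)) + count
            for r, i in enumerate(idx):
                self.rows[r][i] = max(self.rows[r][i], target)
        else:
            # CM: add count to every mapped counter.
            for r, i in enumerate(idx):
                self.rows[r][i] += count

    def query(self, key: str) -> int:
        # Both variants never underestimate, so the row minimum is the estimate.
        return min(self.rows[r][self._index(r, key)] for r in range(self.d))


packets = ["A"] * 50 + ["B"] * 5 + ["C"] * 2
cm = CountMinSketch(d=3, w=16)
cu = CountMinSketch(d=3, w=16, conservative=True)
for flow in packets:
    cm.update(flow)
    cu.update(flow)
print(cm.query("A"), cu.query("A"))  # both >= 50; the CU estimate is never above CM
```

Python integers never overflow, whereas real sketches use fixed-width counters; choosing that width for both mouse and elephant flows is the problem self-adaptive counters target.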

Methods

[Problem 1] Each small counter must be able to represent the size of both mouse and elephant flows.

Two versions are proposed: a Static Sign Bits version and a Dynamic Sign Bits version. Each counter is split into two parts, sign bits and counting bits. When the sign bits are all 0, the counting bits are incremented normally; otherwise, the counting bits are incremented with a probability calculated from the value of the sign bits.
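The paragraph above does not give the exact probability rule or the overflow handling, so the following Python class is a hypothetical sketch of a sign-bits/counting-bits counter, not the paper's algorithm. It assumes that a nonzero sign value s means "increment with probability 2^-s", that the estimate is c · 2^s for counting value c, and that an overflowing counting part is halved while s increases by one, which keeps the expected estimate equal to the true count until both fields saturate. The class and method names are illustrative.

```python
import random


class SignBitsCounter:
    """Hypothetical fixed-width counter: a scale field s (sign bits) and a
    count field c (counting bits); the estimated flow size is c * 2**s.

    ASSUMPTION (illustration only): increment c with probability 2**-s, and
    on overflow halve c and raise s by one.
    """

    def __init__(self, sign_bits: int, count_bits: int, rng=None) -> None:
        self.max_sign = (1 << sign_bits) - 1    # largest value of the sign field
        self.max_count = (1 << count_bits) - 1  # largest value of the count field
        self.sign = 0
        self.count = 0
        self.rng = rng if rng is not None else random.Random()

    def increment(self) -> None:
        # Sign bits all 0: count every packet. Otherwise: count with prob 2**-s.
        if self.sign > 0 and self.rng.random() >= 2.0 ** -self.sign:
            return
        if self.count < self.max_count:
            self.count += 1
        elif self.sign < self.max_sign:
            # Overflow: (max_count + 1) * 2**s == ((max_count + 1) // 2) * 2**(s + 1)
            self.count = (self.max_count + 1) // 2
            self.sign += 1
        # else: both fields are saturated and the counter stops growing

    def estimate(self) -> int:
        return self.count << self.sign


rng = random.Random(0)
mouse = SignBitsCounter(sign_bits=2, count_bits=6, rng=rng)
elephant = SignBitsCounter(sign_bits=2, count_bits=6, rng=rng)
for _ in range(40):
    mouse.increment()
for _ in range(400):
    elephant.increment()
print(mouse.estimate())     # 40 (exact: the sign bits are still 0)
print(elephant.estimate())  # close to 400, although 6 plain bits stop at 63
```

Mouse flows are counted exactly because their sign bits stay 0, while the elephant flow still fits in the same 8-bit counter at the cost of some counting noise.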

For the static version, the bucket structure is as follows:

Hence the total count of a bucket is:

Insertion algorithm:

Query algorithm:

[Sub-problem 2] How many bits should be assigned to the sign bits? This is addressed by the dynamic sign bits version.
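To make this question concrete, the Python snippet below tabulates, for a 16-bit counter, how the largest representable value changes as bits move from the counting part to the sign part. It assumes a hypothetical encoding, not the paper's static or dynamic layout: the sign bits hold a scale s, the counting bits hold c, the estimate is c · 2^s, and an overflowing c is halved while s is raised by one. The function name max_estimate is illustrative.

```python
def max_estimate(total_bits: int, sign_bits: int) -> int:
    """Largest value c * 2**s representable when `sign_bits` of `total_bits`
    hold the scale s and the rest hold the count c (hypothetical encoding)."""
    count_bits = total_bits - sign_bits
    max_sign = (1 << sign_bits) - 1
    max_count = (1 << count_bits) - 1
    return max_count << max_sign


for sign_bits in range(5):
    count_bits = 16 - sign_bits
    exact_up_to = (1 << count_bits) - 1  # sizes counted exactly while s == 0
    print(f"sign={sign_bits} count={count_bits} "
          f"exact<={exact_up_to:,} max={max_estimate(16, sign_bits):,}")
```

Under this assumption, one sign bit buys no range at all (65,534 versus 65,535), while 2, 3 and 4 sign bits raise the maximum to 131,064, 1,048,448 and 134,184,960, each time halving the range of mouse-flow sizes counted exactly. This is why a split fixed in advance is hard to choose.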

For the dynamic counter structure:

Evaluation

Environment setup:
Datasets: IP trace datasets and synthetic datasets; implemented in C++.

Conclusion

Self-adaptive counters use small counters to accurately record the sizes of both elephant and mouse flows, achieving memory efficiency and a constant, fast processing speed. They are applied to the CM, CU, and C sketches.
According to experiments on two real datasets and one synthetic dataset, self-adaptive counters achieve superior performance.

Notes

Flow identifiers are selected from the header fields of packets (the 5-tuple: source IP address, source port, destination IP address, destination port, protocol).
Flow size is defined as the number of packets in a flow
Flow volume is defined as the number of bytes in the flow
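The three definitions above can be illustrated with an exact (non-sketch) computation in Python: packets are grouped by their 5-tuple flow identifier, flow size counts packets, and flow volume sums bytes. The Packet type and the sample trace are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    """Hypothetical parsed packet header plus its length in bytes."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str
    length: int


def flow_stats(packets):
    """Return {flow_id: (flow_size_in_packets, flow_volume_in_bytes)}."""
    size = defaultdict(int)
    volume = defaultdict(int)
    for p in packets:
        flow_id = (p.src_ip, p.src_port, p.dst_ip, p.dst_port, p.protocol)
        size[flow_id] += 1           # flow size: number of packets
        volume[flow_id] += p.length  # flow volume: number of bytes
    return {fid: (size[fid], volume[fid]) for fid in size}


trace = [
    Packet("10.0.0.1", 5000, "10.0.0.2", 80, "TCP", 1500),
    Packet("10.0.0.1", 5000, "10.0.0.2", 80, "TCP", 40),
    Packet("10.0.0.3", 6000, "10.0.0.2", 53, "UDP", 120),
]
stats = flow_stats(trace)
print(stats[("10.0.0.1", 5000, "10.0.0.2", 80, "TCP")])  # (2, 1540)
```

An exact table like this needs one entry per flow; sketches replace it with a fixed amount of memory at the cost of approximate answers.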

level: IEEE/ACM Transactions on Networking (computer networking), CCF A
author: Tong Yang
date: 2019
keyword: sketch, network measurements, elephant flow