Classification and comparison of flow classification algorithms

Flow classification algorithms can be grouped according to different principles. In this paper, the existing flow classification algorithms are divided into three categories according to how the multiple header fields (regions) of a rule are related during the search. The complexity expressions below use the following notation:

1) N represents the number of rules,

2) d represents the dimension,

3) w represents the width of each dimension.

Algorithms that convert multi-dimensional search to one-dimensional search

Algorithms of this kind concatenate the fields used in the flow classification search into a single search key, which turns the multi-dimensional search problem into a one-dimensional one. The method is common in flow classifiers that perform lookups in hash tables and in flow classifiers built on TCAM. The disadvantage is that the resulting search key is very wide and, because the dimensions are merged into one, the design cannot be optimized by exploiting what the rules have in common within each dimension. The representative algorithm of this type is Tuple Space Search [1]. Its basic idea is to decompose the flow classification search into multiple exact-match problems.

First, each d-dimensional rule is mapped to a tuple with d components. The i-th component of the tuple is the prefix length of the i-th dimension of the rule, so rules with the same prefix length in every dimension map to the same tuple and are stored in the same hash table. During a search, every hash table is probed for an exact match. The space complexity of this algorithm is O(N). The time complexity is determined by the number of hash tables that must be accessed, so the search time is not fixed in advance. In the worst case the number of tuples can reach O(w^d), which makes the search time unacceptable. Updates are very fast and require only one hash access. In practice, many well-known hardware manufacturers have adopted hashing in their own flow classification implementations.
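The mapping from rules to tuples can be illustrated with a minimal Python sketch. It is not taken from [1]; the field width `W = 8`, the class name `TupleSpace`, and the convention that a smaller priority number wins are all assumptions made for the example.

```python
from collections import defaultdict

W = 8  # assumed field width in bits (illustrative only)

def prefix_key(value, length):
    """Keep only the top `length` bits of a W-bit value."""
    return value >> (W - length) if length else 0

class TupleSpace:
    def __init__(self):
        # One hash table per tuple of prefix lengths.
        self.tables = defaultdict(dict)

    def insert(self, prefixes, priority, action):
        # prefixes: one (value, prefix_length) pair per dimension.
        lengths = tuple(length for _, length in prefixes)
        key = tuple(prefix_key(v, l) for v, l in prefixes)
        current = self.tables[lengths].get(key)
        # Assumption: a smaller number means a higher priority.
        if current is None or priority < current[0]:
            self.tables[lengths][key] = (priority, action)

    def lookup(self, header):
        # Probe every hash table with an exact-match key built from the header.
        best = None
        for lengths, table in self.tables.items():
            key = tuple(prefix_key(v, l) for v, l in zip(header, lengths))
            hit = table.get(key)
            if hit is not None and (best is None or hit[0] < best[0]):
                best = hit
        return best

ts = TupleSpace()
ts.insert([(0b10100000, 3), (0b11000000, 2)], 1, "A")  # 101*, 11*
ts.insert([(0b10000000, 1), (0, 0)], 2, "B")           # 1*, *
print(ts.lookup((0b10111111, 0b11000001)))
```

The loop in `lookup` makes the cost visible: one hash probe per distinct tuple of prefix lengths, which is why the search time depends on the rule set rather than on N alone.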

Related field search algorithms

The characteristic of this kind of algorithm is that the search result for one field determines the search path for the fields searched after it. The main advantage is that a relatively simple tree structure can be used. The disadvantage is that achieving a faster search requires either replicating parts of the tree or adding link information to the data, which increases memory consumption and slows down updates. In addition, many of the memory accesses depend on one another, which leads to unpredictable delays. Representative algorithms are Hierarchical Tries, Set-Pruning Tries [2], Grid-of-Tries and Hierarchical Intelligent Cuttings (HiCuts) [3].

The hierarchical trie takes any one of the d dimensions to build a first-level binary trie, then takes any one of the remaining d−1 dimensions as the second dimension. For each node of the first-level trie that corresponds to the first-dimension prefix of some rule, a second-level binary trie is built from the second dimension of those rules, and the process is repeated until every dimension has been processed. The space complexity of the hierarchical trie is O(Ndw), and the time complexity is O(w^d). It is simple and easy to implement, but the search is slow and updates are not fast.

The set-pruning trie is an improvement on the hierarchical trie. The rules in the subtrie of a node with a shorter prefix are copied into the subtries of all nodes with longer prefixes that extend it; if a node ends up holding duplicate rules, the rule with the higher priority is kept. This process is applied recursively to all subtries. A search then only needs to find the longest match in each trie in turn to reach the corresponding rule. The time complexity is O(dw) and the space complexity is O(N^d dw). Search time is thus reduced at the cost of storage space, and scalability is poor.
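The backtracking that makes the plain hierarchical trie slow can be seen in a small Python sketch. It is an illustration only, assuming prefixes are given as bit strings and that a smaller priority number wins; the names `Node`, `insert` and `search` are hypothetical.

```python
class Node:
    def __init__(self):
        self.child = {}        # '0' or '1' -> Node
        self.next_trie = None  # root of the trie built on the next dimension
        self.rules = []        # (priority, name); used in last-dimension tries

def insert(root, prefixes, priority, name):
    node = root
    for dim, prefix in enumerate(prefixes):
        for bit in prefix:
            node = node.child.setdefault(bit, Node())
        if dim < len(prefixes) - 1:
            if node.next_trie is None:
                node.next_trie = Node()
            node = node.next_trie
    node.rules.append((priority, name))

def search(root, header, dim=0):
    """header: one bit string per dimension. Returns (priority, name) or None."""
    best = None
    node, bits, i = root, header[dim], 0
    while node is not None:
        # Every node on the path is a matching prefix in this dimension.
        if dim == len(header) - 1:
            candidates = node.rules
        elif node.next_trie is not None:
            # Backtracking: the next-level trie of every matching prefix is searched.
            sub = search(node.next_trie, header, dim + 1)
            candidates = [sub] if sub else []
        else:
            candidates = []
        for c in candidates:
            if best is None or c[0] < best[0]:
                best = c
        node = node.child.get(bits[i]) if i < len(bits) else None
        i += 1
    return best

root = Node()
insert(root, ["0", "10"], 2, "R1")
insert(root, ["00", "1"], 1, "R2")
insert(root, ["", "0"], 3, "R3")
print(search(root, ["001", "101"]))
```

Each of the up to w+1 matching prefixes in one dimension triggers a full search in the next, which is where the O(w^d) search time comes from. Set-pruning removes the recursive calls by copying rules downward, so only the longest match needs to be followed.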

The grid-of-tries builds on the hierarchical trie by adding a switch pointer labeled b (0 or 1) to some nodes; the pointer leads to a node in another subtrie.

A switch pointer labeled b leads from node y of subtrie Ty to node x of subtrie Tx when all of the following conditions hold:

1) Tx and Ty are different subtries on the same level, and their root nodes are reached through the next-trie pointers of two different nodes (r and s) of the same trie T.

2) The bit string from the root node of Ty to y, concatenated with the pointer label b, equals the bit string from the root node of Tx to x.

3) y has no child on branch b.

4) s is the closest ancestor of r in T that satisfies the above conditions.

The grid-of-tries avoids both the storage blow-up caused by rule replication and the backtracking of hierarchical tries. For the two-dimensional flow classification problem, the time complexity is O(w) and the space complexity is O(Nw), so it is a good algorithm for the two-dimensional case. For multi-dimensional problems, it can also be used to optimize the last two levels of a hierarchical trie.

HiCuts builds a decision tree as its data structure. Each leaf node stores a small number of rules. During a search, the decision tree is traversed to a leaf node, and the rules in that leaf are then searched linearly to find a matching rule. The root node of the decision tree covers the entire d-dimensional space, and the shape of the tree is controlled by parameters.

The parameter binth specifies the maximum number of rules that a leaf node may hold. When a node holds more than binth rules, one of the d dimensions is cut into NP(C) parts, producing NP(C) child nodes. When a node holds no more than binth rules, it becomes a leaf node. The time complexity of HiCuts is O(d), and the space complexity is O(N^d). The parameters can be tuned to the characteristics of the rule set to optimize the data structure, reduce the required storage, increase search speed, and keep rule updates easy. The disadvantage is that preprocessing takes longer, so the algorithm is best suited to rule sets that are not too large.
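The role of binth can be illustrated with a simplified Python sketch of such a decision tree. It is not the HiCuts heuristic from [3]: the number of cuts is fixed (`NP = 4`) instead of being chosen per node, the cut dimension simply cycles with depth, and a node whose rules all cover its whole box is made a leaf because cutting cannot separate them. All names and values are assumptions for the example.

```python
BINTH = 2  # maximum number of rules per leaf (the binth parameter)
NP = 4     # cuts per internal node; fixed here, chosen heuristically in HiCuts

def overlaps(rule, box):
    return all(lo <= bhi and blo <= hi
               for (lo, hi), (blo, bhi) in zip(rule["ranges"], box))

def covers(rule, box):
    return all(lo <= blo and bhi <= hi
               for (lo, hi), (blo, bhi) in zip(rule["ranges"], box))

def build(rules, box, depth=0):
    rules = [r for r in rules if overlaps(r, box)]
    dim = depth % len(box)  # assumed heuristic: cycle through the dimensions
    lo, hi = box[dim]
    width = hi - lo + 1
    if (len(rules) <= BINTH or width < NP
            or all(covers(r, box) for r in rules)):
        return {"leaf": sorted(rules, key=lambda r: r["prio"])}
    step = width // NP
    children = []
    for i in range(NP):
        clo = lo + i * step
        chi = hi if i == NP - 1 else clo + step - 1
        child_box = box[:dim] + [(clo, chi)] + box[dim + 1:]
        children.append(build(rules, child_box, depth + 1))
    return {"dim": dim, "lo": lo, "step": step, "children": children}

def classify(node, packet):
    # Walk the decision tree, then search the leaf linearly.
    while "leaf" not in node:
        idx = min((packet[node["dim"]] - node["lo"]) // node["step"], NP - 1)
        node = node["children"][idx]
    for r in node["leaf"]:
        if all(lo <= p <= hi for p, (lo, hi) in zip(packet, r["ranges"])):
            return r["name"]
    return None

rules = [
    {"name": "R1", "prio": 1, "ranges": [(0, 63), (0, 255)]},
    {"name": "R2", "prio": 2, "ranges": [(64, 127), (128, 255)]},
    {"name": "R3", "prio": 3, "ranges": [(0, 255), (0, 127)]},
    {"name": "R4", "prio": 4, "ranges": [(0, 255), (0, 255)]},
]
tree = build(rules, [(0, 255), (0, 255)])
print(classify(tree, (100, 200)))
```

A smaller `BINTH` gives shorter linear searches at the leaves but a deeper, larger tree, which is the storage and speed trade-off that the parameters control.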

Independent field search algorithms

Algorithms of this type first search each field separately with a one-dimensional search algorithm to produce intermediate results, and then determine the final multi-dimensional result from those intermediate results. In this way, the characteristics of each field can be exploited to make the search more effective. In addition, the memory accesses are independent of one another, so they can be performed concurrently. Performance depends largely on how the intermediate search results are encoded.

Representative algorithms are Crossproducting, Bitmap-Intersection and Recursive Flow Classification (RFC) [4]. Crossproducting first enumerates the combinations of the distinct field values that occur in each dimension (each combination corresponds to a rule) and stores them in a crossproduct table. A search is performed separately in each dimension, and the per-dimension results are then combined into an index into the crossproduct table to find the corresponding rule. The time complexity is O(dw), and the space complexity is O(N^d). Its disadvantages are that the storage requirement is large and the number of stages varies. Updates are also inconvenient: each time a rule is added, the crossproduct table has to be recomputed.
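The crossproduct table and the two-step lookup can be sketched in Python as follows. This is a minimal illustration, assuming prefixes are given as bit strings, a smaller priority number wins, and a linear scan stands in for the one-dimensional longest-prefix search; the function names are hypothetical.

```python
from itertools import product

def build(rules):
    """rules: list of (priority, name, (prefix_1, ..., prefix_d))."""
    d = len(rules[0][2])
    # Distinct prefixes that occur in each dimension.
    fields = [sorted({r[2][i] for r in rules}) for i in range(d)]
    table = {}
    for combo in product(*fields):
        # A rule matches a combination if each of its prefixes is a prefix
        # of the corresponding component of the combination.
        matches = [r for r in rules
                   if all(c.startswith(p) for c, p in zip(combo, r[2]))]
        table[combo] = min(matches)[1] if matches else None
    return fields, table

def longest_match(prefixes, bits):
    # Stand-in for a one-dimensional longest-prefix search.
    hits = [p for p in prefixes if bits.startswith(p)]
    return max(hits, key=len) if hits else None

def classify(fields, table, header):
    combo = tuple(longest_match(f, h) for f, h in zip(fields, header))
    return table.get(combo)

rules = [
    (1, "R1", ("0", "10")),
    (2, "R2", ("00", "1")),
    (3, "R3", ("", "0")),
]
fields, table = build(rules)
print(len(table), classify(fields, table, ("001", "101")))
```

With three distinct prefixes in each of two dimensions, the table already holds nine entries for three rules, which illustrates why the storage grows toward N^d and why adding a rule forces the table to be rebuilt.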

The bitmap-intersection algorithm projects the same dimension of all rules onto one number axis, where each rule becomes a range or a point. The range endpoints divide the axis into intervals, and an intermediate bit vector of width N is defined for each interval, with one bit per rule and the rules ordered by priority. A bit is set to 1 when the corresponding rule matches the interval and to 0 otherwise. During a search, the matching intermediate vector is first found in each dimension, the vectors are then ANDed together, and the highest-priority 1 bit in the result identifies the matching rule. The time complexity is O(dw + N/memwidth), and the space complexity is O(dN^2). The algorithm tries to reduce the required storage by increasing the number of memory accesses, but the gain is small, and it does not solve the problem of difficult updates.

When the RFC algorithm processes a packet, classification can be viewed as mapping the S bits of the packet header to a T-bit class identifier.

Where T = lbN, T <
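The bitmap-intersection lookup described above can be sketched in Python. This is an illustration under stated assumptions: rules are given as inclusive ranges ordered from highest to lowest priority, Python integers serve as the N-bit vectors, and `bisect` stands in for the one-dimensional interval search. The function names are hypothetical.

```python
from bisect import bisect_right

def build(rules, d):
    """rules: per-dimension inclusive ranges, listed by decreasing priority."""
    dims = []
    for i in range(d):
        # Start points of the elementary intervals induced by the rule ranges.
        starts = sorted({0} | {r[i][0] for r in rules}
                            | {r[i][1] + 1 for r in rules})
        vectors = []
        for s in starts:
            bits = 0
            for j, r in enumerate(rules):
                if r[i][0] <= s <= r[i][1]:
                    bits |= 1 << j  # bit j set: rule j matches this interval
            vectors.append(bits)
        dims.append((starts, vectors))
    return dims

def classify(dims, packet):
    result = -1  # all ones before the first AND
    for (starts, vectors), value in zip(dims, packet):
        result &= vectors[bisect_right(starts, value) - 1]
    if result == 0:
        return None
    # Lowest set bit corresponds to the highest-priority matching rule.
    return (result & -result).bit_length() - 1

rules = [
    [(0, 63), (0, 255)],
    [(64, 127), (128, 255)],
    [(0, 255), (0, 127)],
    [(0, 255), (0, 255)],
]
dims = build(rules, 2)
print(classify(dims, (100, 200)))
```

Each dimension stores up to 2N+1 vectors of N bits, which matches the O(dN^2) storage, and inserting one rule can change a bit in every vector of every dimension, which is why updates are costly.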
