DIFFERENT

capacity-approaching codes, such as Turbo codes and low-density parity-check

(LDPC) codes , have been designed and used for some time in applications, such

as wireless communication and data storage, to achieve very high data rates.

Recently, the first provable capacity-achieving codes, called polar codes, were

invented by Arikan. Being the first family of codes known to achieve the channel

capacity with explicit construction, polar codes have attracted a lot of

attention since their invention. Polar codes have been proved to achieve the

capacity for binary-input symmetric memoryless channels as well as discrete and

continuous memoryless channels. Moreover, an explicit construction method for

polar codes has been provided, and it is shown that they can be efficiently

encoded and decoded with complexity O(n log n), where n is the code length.

A number

of decoding methods have been proposed for polar codes, and among these,

successive cancellation decoding (SCD) and belief propagation decoding (BPD)

are the two most popular methods. Due to the serial nature of the algorithm,

SCD suffers from long latency, although it requires less computation as

compared with BPD. Several methods have been proposed to reduce the latency of

SC decoders to achieve a high throughput. Moreover, list decoding and stack

decoding, which are based on SCD, have been proposed to improve the

error-correcting performance for polar codes with short code lengths.

On the

other hand, polar BP decoders have the intrinsic advantage of parallel

processing. Therefore, compared with their SC counterparts, polar BP decoders

are more attractive for low-latency applications. However, due to their

iterative nature, the required latency and energy dissipation of BP decoders

increase linearly with the number of iterations. The requirement of a large

number of iterations results in high computation complexity, and hence makes

BPD less attractive than its SC counterpart.

In this

paper, a modified BP decoder is proposed to speed up the decoding process. In

the proposed BP decoder, the adjacent two stages of basic computation elements

are combined together to generate new decoding factor graph with bigger basic

decoding element which has four nodes on either left side or right side. Then

based on the new computation element, a corresponding path-based belief

computation method is presented. Simulation results show that the proposed

algorithm performs as well as the traditional BP decoding.

The new

stage-combined BP decoder has advantages in several aspects. The number of

stages is reduced. Therefore, the decoding latency is reduced to log2N clock

cycles per iteration instead of 2log(2N). In addition, the message storage and

read/write operations will be reduced to Nlog(2N) as well, which is also about

half of the conventional BP decoder.

METHODOLOGY

a )

Polar codes

Polar

codes are linear block codes based on the phenomenon of channel polarization,

in which individual channels are recursively combined and split, such that

their mutual information tends toward either 1 or 0. In other words, some of

these channels become completely noise-free, while the others become completely

noisy. Furthermore, the fraction of noiseless channels tends toward the

capacity of the underlying binary symmetric channels.

Fig 3.1.

Encoding signal flow graph of (8, 4) polar code.

Polar codes are specified by a generator matrix Gn = F?m,

where n = 2m is the code length and F?m is the mth Kronecker power of F =1 01 1 .An (n, k) polar

code can be generated in two steps. First, an n-bit message u is constructed by

assigning the k reliable and (n?k) unreliable positions as information bits and

frozen bits, respectively. The (n ? k) frozen bits are forced to 0 and form the

frozen set AC. Then, the n-bit u is multiplied with the generator matrix G = F?m

to generate an n-bit transmitted code-word x. Fig. 1 shows the encoding signal

flow graph for n = 8 polar codes, where the “?” sign

represents the XOR operation.

b ) Belief propagation decoding

The BP decoding for polar codes is

based on the factor graph representation of the codes. In the decoder, each

node is associated with two types of messages: left-to-right messages L and

right-to-left messages R. Fig.2 shows the four nodes involved in one basic

computation element BE2. Firstly, the belief (indicated as log likelihood

ratio, LLR) of each node is initiated. The left most source vector nodes in Fig.3.1

are initiated with zero or positive infinity as Eq.(4) and the right most code

word nodes are initiated with channel output LLRs as Eq.(5). Other nodes are

initiated with zero.

Fig 3.2. Factor graph of the original BP polar coding.

Fig 3.3. Factor graph of basic computation element BE2 in BP

polar decoders.

In BP decoding, LLRs are passed

iteratively from left to right and then from right to left through basic

computation elements BE2s to compute the likelihood of information bits. The

computations in each BE2 are expressed as equation (3)

Li,j = g( Li+1,2j-1, Li+1,2j+

Ri,j+N/2)

Li,j+N/2 = g( Ri,j, Li+1,2j-1)+

Li+1,2j

Ri+1,2j-1 = g( Ri,j, Li+1,2j+

Ri,j+N/2)

Ri+1,2j = g( Ri,j, Li+1,2j-1)+

Ri,j+N/2

Where

g(x,y) = log(cosh((x+y)/2))-log(cosh((x-y)/2))

it can be simplified as :

g(x,y) ? 0.9.sign(x)sign(y).min(|x|,|y|)

After t iterations, the decision can

be made based on the final LLR results of source vector nodes as

c) Scheduling schemes for bpd

For a BP

decoder, scheduling refers to the method of information dissemination and

updating from one end of the factor graph to the other. It has a significant

effect on the convergence speed of the BP decoding on the factor graph. In this

section, we introduce the scheduling schemes used in the existing BP decoders.

In Early stopping criteria for energy-efficient low-latency belief-propagation

polar code decoders, a one-way scheduling scheme is used, where stages are

activated sequentially from left to right and one iteration is completed when

the activation reaches the rightmost stage, as shown in Fig. 3.2. When a stage

is activated, all the PEs in that stage are activated and both left-hand and

right-hand propagation messages for that stage are calculated and updated. In

this one-way scheduling, one iteration takes m clock cycles, where m

is the number of stages in the factor graph. Careful analysis of the one-way

scheduling reveals that, in one iteration, information at theleft-hand side of the factor graph has traversed

the whole factor graph to reach the right-hand side, while the information at

the right-hand side has only traversed one stage. This uneven updating of the

information will lead to a slow convergence of decoding on the factor graph, as

will be shown in the later sections. In A 4.68Gb/s belief propagation polar

decoder with bit-splitting register file,”, another scheduling scheme, named

round-trip scheduling, is employed. Round-trip scheduling guarantees even

dissemination of information. In on iteration, right-hand side information

traverses all the way to the left-hand side, followed by left-hand side

information traversing all the way to the right-hand side, as shown in Fig.

3(b). One iteration in round-trip scheduling takes 2m?1 cycles. However, since the dissemination and updating of the

message are more even, it requires less iteration for the convergence, and

hence, the overall latency is smaller, as will be shown in the later sections.

d) Connected subfactor graph

In this

section, the notion of the CSFG is introduced. The CSFG will be used to

illustrate our proposed decoding scheme for low-complexity BPD. Fig. 3.4 shows

the scheduling tree of the SCD of an n

= 8 polar code , and Fig. 3.5 shows the equivalent BPD factor graph of the same

n = 8 polar code. At each stage, the

SCD scheduling tree is split into a number of subtrees, each of which is

responsible for decoding a corresponding constituent code. The sizes of the

subtrees vary at each level and the size is reduced by half when moving down from

one stage to another stage. A CSFG is defined as a subfactor-graph that has the

same number of inputs and outputs, where the output nodes are at the stage m + 1 and each input is connected to

each output through some PEs in the subfactor-graph. Fig. 4(b) shows two

examples of a CSFG. It can be seen that each CSFG has a corresponding subtree

in the decoding scheduling tree of SCD. Fig. 3.4 and 3.5 shows the examples of

the decoding subtrees and the corresponding CSFGs of the n = 8 polar code. The number of CSFGs at each stage is given by 2 j, where j is the stage number. For the n

= 8 polar code, the numbers of CSFGs at stages 1 and 2 are two and four,

respectively.

Fig 3.4.Subtrees in SCD Scheduling tree

Fig 3.5. CSFG’s in BPD Factor graph

Similar

to the subtree of the SCD scheduling tree, each CSFG of the BPD factor graph is

responsible for the decoding of its corresponding constituent code using its

respective input messages. At each iteration t, the nodes at the stage j

in the BPD factor graph output left-to-right LLR-based propagating messages Rt1 :n,

j+1, and these are the inputs to the 2 j

CSFGs at the stage j .

e) Proposed

system

In this

section, a modified BP decoder with stage combination for polar codes is

introduced. Combining adjacent stages in the factor graph can be applied to

reduce the number of stages.

FINDINGS

Xilinx

software is used to verify the functionality of my design. The VLSI

architecture is developed using Verilog HDL. Fig 5.1 shows simulation result of

8 bit polar decoder. Fig 5.2 shows the RTL Schematics of belief propagation

polar code decoder.

Fig 5.1 Simulation result of Polar decoder

Fig 5.2

RTL Schematic of belief propagation polar code decoder

CONCLUSIONS

In

this paper, a novel early stopping method based on combined belief propagation

polar code decoder is proposed for BP polar code decoders. This early stopping

scheme not only reduces the average complexity of the BPD, hence reducing the

energy consumption during decoding, but also helps to make the decoding

converge faster. This reduces the average number of iterations and further

reduces the energy consumption and average latency. A new quarter-way

scheduling scheme is proposed, and the corresponding VLSI hardware architecture

is developed for the proposed BPD.