#### Polar Codes for Terabit/s Data Rates

#### Erdal Arıkan, Altuğ Süral, E. Göksu Sezer

Bilkent University and Polaran Ltd. Ankara, Turkey

Dec. 4th, 2018 International Symposium on Turbo Codes and Iterative Information Processing 2018 (ISTC 2018) Hong Kong

## Goals

- Provide some motivation and background for the problem
- Discuss challenges for Tb/s Forward Error Correction (FEC) with current VLSI technology
- Present a solution based on polar codes developed jointly in an H2020 project (EPIC)
- State some remaining challenges

- Exploit the vast spectrum potential above 90 GHz
- Bring wireless systems to optical speeds for use in new applications such as
  - fronthauling/backhauling
  - data centers
  - on/intra chip communications
  - data kiosks
  - ...
- Relevant standardization is already underway (IEEE 802.15.3d, LiFi)
- Other technologies are also needed for Tb/s communications, but FEC is one of the most complex parts of the transmission chain

- Channel coding requirements are not extraordinary
  - Reliability: frame error rate of $10^{-6}$ to $10^{-12}$ without retransmissions
- Implementation challenges are beyond the state of the art
- We will use the targets set by the EPIC Project experts

| Metric       | EPIC target                    |
|--------------|--------------------------------|
| Technology   | 7 nm                           |
| Throughput   | 1 Tb/s                         |
| Clock freq.  | $\leq 1$ GHz                   |
| Silicon area | $\leq 10\ \mathrm{mm^2}$       |
| Pow. Den.    | $\leq 0.1\ \mathrm{W/mm^2}$    |
| Area Eff.    | $\geq 100\ \mathrm{Gb/s/mm^2}$ |
| Energy Eff.  | $\leq 1$ pJ/bit                |

- Goal: Obtain the best coding gain per code family subject to these constraints

- Does uncoded transmission meet the targets?
- Uncoded transmission (design at 40 nm, scaled to 7 nm on paper)

|                         | EPIC target | Uncoded               |
|-------------------------|-------------|-----------------------|
| Technology (nm)         | 7           | 7                     |
| Throughput (Tb/s)       | 1           | 1                     |
| Clock freq. (GHz)       | $\leq 1$    | 1                     |
| Silicon area (mm$^2$)   | $\leq 10$   | 10                    |
| Pow. Den. (W/mm$^2$)    | $\leq 0.1$  | $2.3 \times 10^{-4}$  |
| Area Eff. (Gb/s/mm$^2$) | $\geq 100$  | 100                   |
| Energy Eff. (pJ/bit)    | $\leq 1$    | $2.3 \times 10^{-3}$  |

- There is room for decoding operations about $1000/2.3 \approx 435$ times more complex than uncoded transmission
- The design space is narrowed significantly but is still interesting
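The headroom figure can be re-derived from the table. The sketch below is only a sanity check under the assumption that energy per bit equals total power (power density times area) divided by throughput; the variable names are illustrative and not from the slides.

```python
# Figures taken from the uncoded column of the table (7 nm, scaled on paper).
power_density_w_per_mm2 = 2.3e-4
area_mm2 = 10.0
throughput_bps = 1e12
energy_target_pj_per_bit = 1.0  # EPIC target

# Energy per bit = total power / throughput, converted from J to pJ.
total_power_w = power_density_w_per_mm2 * area_mm2
energy_pj_per_bit = total_power_w / throughput_bps * 1e12

# Headroom: how many times more energy per bit a decoder may spend
# before the 1 pJ/bit target is exceeded.
headroom = energy_target_pj_per_bit / energy_pj_per_bit

print(f"uncoded energy: {energy_pj_per_bit:.1e} pJ/bit")
print(f"headroom: about {headroom:.0f}x")
```

The same ratio results from the power-density row ($0.1 / 2.3 \times 10^{-4}$), since area and throughput are identical in both columns.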

Decoder input bus width

$$W = \frac{\gamma \times Q}{R \times f_c}$$

where

- $\gamma$ is the throughput in b/s,
- $Q$ is the quantizer precision in bits,
- $R$ is the code rate, and
- $f_c$ is the clock frequency.

For $\gamma = 1$ Tb/s, $Q = 5$ bits, $R = 15/16$, and $f_c = 1$ GHz, the bus width is $W \approx 5333$ bits.

- Better to use small $Q$ and large $R$
- Compress the LLR input as much as possible (syndrome techniques)
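As a quick check of the bus-width formula, the following sketch evaluates $W = \gamma Q / (R f_c)$ for the quoted operating point and for one alternative quantizer precision. The function name `bus_width_bits` and the 3-bit comparison point are illustrative assumptions, not values from the slides.

```python
def bus_width_bits(throughput_bps, q_bits, rate, clock_hz):
    """Decoder input bus width W = gamma * Q / (R * f_c), in bits."""
    return throughput_bps * q_bits / (rate * clock_hz)

# Operating point quoted in the text: 1 Tb/s, Q = 5, R = 15/16, f_c = 1 GHz.
w_quoted = bus_width_bits(1e12, 5, 15 / 16, 1e9)

# Hypothetical comparison: same rate and clock, coarser 3-bit quantizer.
w_coarse = bus_width_bits(1e12, 3, 15 / 16, 1e9)

print(f"Q = 5: W = {w_quoted:.0f} bits")
print(f"Q = 3: W = {w_coarse:.0f} bits")
```

The width scales linearly in $Q$ and inversely in $R$, which is why a small quantizer precision and a high code rate both narrow the input bus.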

#### Memory bottleneck

Decoder storage requirement

$$M = N \times P \times \overline{Q}$$

where

- $N$ is the code block length,
- $P$ is the number of codewords being decoded in parallel, and
- $\overline{Q}$ is the average LLR precision.

To reduce the storage requirement:

- Use advanced quantization methods to reduce $\overline{Q}$
- Keep $NP$ small
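To make the storage formula concrete, the sketch below evaluates $M = N \times P \times \overline{Q}$ for a fixed precision and for a precision that shrinks from stage to stage. All numbers are hypothetical (block length, number of codewords in flight, and the per-stage precisions), and the unweighted mean used for $\overline{Q}$ assumes every stage stores the same number of LLRs, which need not hold in a real design.

```python
def storage_bits(block_length, codewords_in_flight, avg_llr_bits):
    """Decoder storage M = N * P * Q_bar, in bits."""
    return block_length * codewords_in_flight * avg_llr_bits

# Hypothetical parameters, chosen only for illustration.
N = 1024  # code block length
P = 100   # codewords decoded in parallel

# Fixed 5-bit LLRs everywhere.
m_fixed = storage_bits(N, P, 5.0)

# Hypothetical schedule in which the LLR precision drops stage by stage.
stage_bits = [5, 4, 3, 2, 1]
q_bar = sum(stage_bits) / len(stage_bits)  # unweighted average precision
m_progressive = storage_bits(N, P, q_bar)

print(f"fixed 5-bit: {m_fixed:.0f} bits")
print(f"progressive (Q_bar = {q_bar}): {m_progressive:.0f} bits")
```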

## Polar code decoding algorithms

| Algorithm   | Computational complexity            | Space complexity | Time complexity    |
|-------------|-------------------------------------|------------------|--------------------|
| SC          | $N \log N$                          | $N$              | $N$                |
| BP          | $I N \log N$                        | $N \log N$       | $I \log N$         |
| SC-list     | $L N \log N$                        | $L N$            | $N + K \log^2 L$   |
| SC-stack    | $D N \log N$                        | $D N$            | -                  |
| SC-soft-out | $I N \log N$                        | $N \log N$       | $I N$              |
| SC-flip     | $N \log N\,(1 + P_e(\mathrm{SNR}))$ | $N$              | $I N$              |
| MJL         | $K N^{\log 3}$                      | $N$              | $\log^2 N$         |
| Sphere      | Cubic                               | -                | -                  |

$I$: number of iterations, $L$: list size, $D$: stack depth

- Successive Cancellation (SC) decoder has the least computational and space complexity
- Majority Logic (MJL) decoder has the least time complexity
- We present a solution that combines the two approaches

$\mathcal{O}(N \log N)$ computational complexity; $\mathcal{O}(N)$ space and time complexity



- Use length-8 MJL decoders for the constituent polar codes of length at most 8, reducing the latency of pure SC decoding from $2N - 2$ to $\frac{9N}{8} - 2$
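The two latency expressions can be reproduced with a simple recurrence. The sketch below assumes the usual SC schedule, in which a length-$n$ node costs two steps plus the latency of its two length-$n/2$ children. Charging each length-8 leaf 7 steps is an inference made here because it reproduces the quoted $\frac{9N}{8} - 2$; the slides do not state the leaf cost.

```python
def sc_latency(n, leaf_size=1, leaf_latency=0):
    """Latency from T(n) = 2*T(n/2) + 2, stopping at leaves of length leaf_size."""
    if n <= leaf_size:
        return leaf_latency
    return 2 * sc_latency(n // 2, leaf_size, leaf_latency) + 2

N = 1024  # illustrative block length, a power of two

# Pure SC: recursion runs down to single bits, giving 2N - 2.
pure_sc = sc_latency(N)

# SC with length-8 leaves decoded in an assumed 7 steps, giving 9N/8 - 2.
sc_mjl = sc_latency(N, leaf_size=8, leaf_latency=7)

print(pure_sc, 2 * N - 2)
print(sc_mjl, 9 * N // 8 - 2)
```

Under this model the saving comes entirely from cutting the recursion off three levels early; a faster leaf decoder would lower the constant further.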




- Use the SC decoder to decompose a length-$N$ polar code into two length-$N/2$ polar codes, repeating until length $N = 8$ is reached
- At length $N = 8$
  - use the Wagner decoder for single-parity-check codes
  - decode rate-0 and rate-1 codes trivially
  - use ML decoder for repetition codes
  - use MJL decoder for all remaining length-8 codes
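The special-case leaf decoders listed above have well-known textbook forms, sketched below. This is a minimal illustration rather than the authors' hardware implementation: it assumes the convention that a non-negative LLR favours bit 0, it returns codeword estimates rather than information bits, and the function names are made up for this example. The MJL decoder for the remaining codes is not shown.

```python
def hard_decision(llrs):
    """Bitwise decision: LLR >= 0 maps to 0, otherwise 1."""
    return [0 if llr >= 0 else 1 for llr in llrs]

def decode_rate0(llrs):
    """Rate-0 code: the only codeword is all zeros."""
    return [0] * len(llrs)

def decode_rate1(llrs):
    """Rate-1 code: every word is a codeword, so hard decisions are ML."""
    return hard_decision(llrs)

def decode_repetition(llrs):
    """ML for a repetition code: decide on the sign of the LLR sum."""
    bit = 0 if sum(llrs) >= 0 else 1
    return [bit] * len(llrs)

def decode_spc_wagner(llrs):
    """Wagner decoding of an even-parity single-parity-check code:
    take hard decisions and, if parity fails, flip the least reliable bit."""
    bits = hard_decision(llrs)
    if sum(bits) % 2 == 1:
        weakest = min(range(len(llrs)), key=lambda i: abs(llrs[i]))
        bits[weakest] ^= 1
    return bits

llrs = [2.1, -0.3, 1.7, 0.9, 3.0, 1.2, 2.4, 0.8]
print(decode_spc_wagner(llrs))  # parity fails; the weak bit at index 1 is flipped
print(decode_repetition(llrs))  # LLR sum is positive, so all zeros
```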

- The decoder can accept a new codeword (CW) in every time interval, where intervals are counted in clock cycles
- Pipelining increases both hardware efficiency and power density

[Figure: pipeline snapshot at $t = 4$. Stages $W^{--}$, $W^{-+}$, $W^{+-}$, and $W^{++}$ hold the 4th, 3rd, 2nd, and 1st CW, respectively.]
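The pipeline occupancy can be sketched in a few lines. The model below assumes that codeword $k$ enters the first stage at $t = k$ and advances one stage per clock cycle; the stage order is inferred from the $t = 4$ snapshot, and the names are illustrative.

```python
# Stage order inferred from the t = 4 snapshot (first stage listed first).
STAGES = ["W--", "W-+", "W+-", "W++"]

def pipeline_snapshot(t, stages=STAGES):
    """Return {stage: codeword index held at clock t}, or None if the stage is empty."""
    snapshot = {}
    for depth, name in enumerate(stages):
        cw = t - depth  # codeword that entered `depth` cycles ago
        snapshot[name] = cw if cw >= 1 else None
    return snapshot

for t in range(1, 5):
    print(t, pipeline_snapshot(t))
```

From $t = 4$ onward every stage is busy in every cycle, which is the sense in which pipelining raises both hardware utilization and power density.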

#### Progressive quantization of LLRs inside the decoder




## SC Performance




#### SC-MJL Performance




## Effect of progressive quantization




## Post-synthesis results at 7nm

| Decoding algorithm      | SC    | SC-MJL | SC-MJL | SC-MJL |
|-------------------------|-------|--------|--------|--------|
| Quantization (bits)     | 6     | 6      | 5-to-1 | 5-to-1 |
| Combining used?         | ×     | ×      | ×      | ✓      |
| Throughput (Gb/s)       | 1000  |        |        |        |
| Area (mm$^2$)           | 10    |        |        |        |
| Area Eff. (Gb/s/mm$^2$) | 100   |        |        |        |
| Pow. Den. (W/mm$^2$)    | 0.19  | 0.13   | 0.10   | 0.04   |
| Energy Eff. (pJ/bit)    | 1.90  | 1.28   | 0.96   | 0.42   |
| Latency (clock cycles)  | 157   | 127    | 127    | 40     |
| Freq. (MHz)             | 585.5 |        |        |        |

Rows with a single entry (throughput, area, area efficiency, frequency) appear to list one value common to all four designs; the power and energy rows are consistent with this reading, since $0.19\ \mathrm{W/mm^2} \times 10\ \mathrm{mm^2} / 1\ \mathrm{Tb/s} = 1.9$ pJ/bit.

- Two independent pipelined decoders are used, each operating at 500 Gb/s
- Memory dominates the area and energy efficiency
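The reported figures can be cross-checked against each other. The sketch below assumes that the 10 mm$^2$ area, the 1000 Gb/s throughput, and the 585.5 MHz clock apply to every column of the table; that reading is an assumption, and the dictionary keys are labels invented for this example. It also derives how many bits each 500 Gb/s decoder must deliver per clock cycle.

```python
AREA_MM2 = 10.0
THROUGHPUT_BPS = 1000e9
CLOCK_HZ = 585.5e6

# (power density in W/mm^2, reported energy efficiency in pJ/bit) per design.
designs = {
    "SC, 6-bit": (0.19, 1.90),
    "SC-MJL, 6-bit": (0.13, 1.28),
    "SC-MJL, 5-to-1": (0.10, 0.96),
    "SC-MJL, 5-to-1, combining": (0.04, 0.42),
}

def energy_pj_per_bit(power_density, area_mm2, throughput_bps):
    """Energy per bit = total power / throughput, in pJ."""
    return power_density * area_mm2 / throughput_bps * 1e12

derived = {name: energy_pj_per_bit(pd, AREA_MM2, THROUGHPUT_BPS)
           for name, (pd, _) in designs.items()}

for name, (_, reported) in designs.items():
    print(f"{name}: derived {derived[name]:.2f}, reported {reported:.2f} pJ/bit")

# Each of the two decoders carries half the throughput.
bits_per_cycle = (THROUGHPUT_BPS / 2) / CLOCK_HZ
print(f"bits per clock cycle per decoder: {bits_per_cycle:.0f}")
```

The derived and reported energies agree to within the rounding of the two-decimal power densities.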



- 1 Tb/s FEC appears feasible in 7 nm technology with a 6 dB coding gain
- The proposed solution brings together several existing techniques
  - SC decoder in the initial stages of decoding, where parallelism is high
  - MJL decoding for speeding up decisions
  - Adaptive quantization to reduce memory usage
  - Reducing pipeline depth by combining simple steps
- Storage complexity dominates the design

- With VLSI technology reaching its limits, FEC designers will have to learn more about VLSI implementation constraints
- The situation is reminiscent of the first three decades of coding when hardware complexity was a binding constraint but hardware at that time was very simple
- The discrepancy between the desired data rates and available clock frequency may never have been as high as today

- We may hope that new types of codes will emerge as we better understand the trade-off between VLSI complexity and FEC performance

#### Acknowledgments

This work was carried out in part with support from the EPIC project, which received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No. 760150.


We thank Y. Ertuğrul for help with simulations and figures.

# Thank you!