7.3. IIR Filter

  • Consider an \(N\)th-order IIR filter specified by (7.2) in the time domain or (7.4) in the \(z\)-domain. To keep the SFGs simple, we assume below that the feedback order \(N\) is at least as large as the feedforward order \(M\), i.e., \(N \geq M\).

  • The transfer function \(H(z)\) of the IIR filter is given by

    \[\begin{align} H(z) &= \frac{\sum_{k=0}^M b_k z^{-k}}{\sum_{k=0}^N a_k z^{-k}} & ~~~(a_0=1). \tag{7.12} \end{align} \]

    Thus, the IIR filter can be interpreted as a cascade of a feedforward component, i.e., an \(M\)th-order FIR filter with transfer function \(\sum_{k=0}^M b_k z^{-k}\), and a feedback component, i.e., an \(N\)th-order IIR filter with transfer function \(\frac{1}{\sum_{k=0}^N a_k z^{-k}}\) and a trivial feedforward tap.
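
  • The cascade interpretation of (7.12) can be sketched in plain C++ as follows. This is only an illustrative floating-point model, assuming zero initial conditions and \(a_0=1\); the function names fir_part, feedback_part, and iir_cascade are hypothetical and do not appear elsewhere in this chapter:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Feedforward (FIR) component: v[n] = sum_{k=0}^{M} b[k] * x[n-k].
std::vector<double> fir_part(const std::vector<double>& b,
                             const std::vector<double>& x) {
  std::vector<double> v(x.size(), 0.0);
  for (std::size_t n = 0; n < x.size(); ++n)
    for (std::size_t k = 0; k < b.size() && k <= n; ++k)
      v[n] += b[k] * x[n - k];
  return v;
}

// Feedback component with a[0] = 1:
// y[n] = v[n] - sum_{k=1}^{N} a[k] * y[n-k].
std::vector<double> feedback_part(const std::vector<double>& a,
                                  const std::vector<double>& v) {
  assert(!a.empty() && std::fabs(a[0] - 1.0) < 1e-12);
  std::vector<double> y(v.size(), 0.0);
  for (std::size_t n = 0; n < v.size(); ++n) {
    y[n] = v[n];
    for (std::size_t k = 1; k < a.size() && k <= n; ++k)
      y[n] -= a[k] * y[n - k];
  }
  return y;
}

// Cascade of the two components: the FIR output feeds the feedback part.
std::vector<double> iir_cascade(const std::vector<double>& b,
                                const std::vector<double>& a,
                                const std::vector<double>& x) {
  return feedback_part(a, fir_part(b, x));
}
```

    For instance, with b = {1, 0.5} and a = {1, -0.5}, i.e., \(H(z) = \frac{1+0.5z^{-1}}{1-0.5z^{-1}}\), a unit impulse produces the first four output samples 1, 1, 0.5, 0.25.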

7.3.1. Direct-form Implementation

  • The feedback component, i.e.,

    \[\begin{equation*} Y(z) = V(z) - \sum_{k=1}^N a_k z^{-k} Y(z), \end{equation*}\]

    where \(V(z)\) denotes the input to the feedback component, can be implemented in the direct form similar to the development in Section 7.2.1.

  • When the feedforward FIR component is also implemented in the direct form, we obtain the following SFG of direct form I for the IIR filter:

    \[\begin{align*} \!\bigcirc\kern-6.5pt\vcenter{\tiny x} \longrightarrow & \!\bigcirc\!\!\xrightarrow{\hspace{9pt}{\scriptsize b_0}\hspace{9pt}}\!\!\bigcirc\!\!\xrightarrow{\hspace{26pt}} \!\!\bigcirc\!\!\xrightarrow{\hspace{26pt}}\!\!\bigcirc \!\!\longrightarrow\!\!\bigcirc\kern-6.5pt\vcenter{\tiny y} \\[-0pt] {\scriptsize z^{-1}} & \Big\downarrow \hspace{25pt} \Big\uparrow \hspace{27pt} \Big\uparrow \hspace{26pt} \Big\downarrow {\scriptsize z^{-1}} \\[-10pt] &\!\bigcirc\!\!\xrightarrow{\hspace{9pt}{\scriptsize b_1}\hspace{9pt}}\!\!\bigcirc \hspace{25.5pt} \!\bigcirc\!\!\xleftarrow{\hspace{7pt}{\scriptsize -a_1}\hspace{6pt}}\!\!\bigcirc \\[-0pt] {\scriptsize z^{-1}} & \Big\downarrow \hspace{25pt} \Big\uparrow \hspace{27pt} \Big\uparrow \hspace{26pt} \Big\downarrow {\scriptsize z^{-1}} \\[-0pt] & \,\vdots \hspace{29pt} \vdots \hspace{29pt} \,\vdots \hspace{30pt} \vdots \\[-0pt] {\scriptsize z^{-1}} & \Big\downarrow \hspace{25pt} \Big\uparrow \hspace{27pt} \Big\uparrow \hspace{26pt} \Big\downarrow {\scriptsize z^{-1}} \\[-10pt] &\!\bigcirc\!\!\xrightarrow{\hspace{8pt}{\scriptsize b_M}\hspace{8pt}}\!\!\bigcirc \hspace{25pt} \!\bigcirc\!\!\xleftarrow{\hspace{6pt}{\scriptsize -a_N}\hspace{5pt}}\!\!\bigcirc \end{align*}\]
  • Swapping the order of the feedforward and feedback components in the cascade above, we obtain the following SFG of direct form II for the IIR filter:

    \[\begin{align*} \!\bigcirc\kern-6.5pt\vcenter{\tiny x} \longrightarrow & \!\bigcirc\!\!\xrightarrow{\hspace{25pt}}\!\!\bigcirc \!\!\xrightarrow{\hspace{9pt}{\scriptsize b_0}\hspace{9pt}}\!\!\bigcirc \!\!\longrightarrow\!\!\bigcirc\kern-6.5pt\vcenter{\tiny y} \\[-0pt] & \Big\uparrow \hspace{26pt} \Big\downarrow {\scriptsize z^{-1}} \hspace{15pt} \Big\uparrow \\[-10pt] & \!\bigcirc\!\!\xleftarrow{\hspace{6.5pt}{\scriptsize -a_1}\hspace{7pt}} \!\!\bigcirc\!\!\xrightarrow{\hspace{9pt}{\scriptsize b_1}\hspace{9pt}}\!\!\bigcirc \\[-0pt] & \Big\uparrow \hspace{26pt} \Big\downarrow {\scriptsize z^{-1}} \hspace{15pt} \Big\uparrow \\[-0pt] & \,\vdots \hspace{30pt} \vdots \hspace{27.5pt} \,\vdots \\[-0pt] & \Big\uparrow \hspace{26pt} \Big\downarrow {\scriptsize z^{-1}} \hspace{15pt} \Big\uparrow \\[-10pt] &\!\bigcirc\!\!\xleftarrow{\hspace{5.5pt}{\scriptsize -a_M}\hspace{5pt}} \!\!\bigcirc\!\!\xrightarrow{\hspace{8pt}{\scriptsize b_M}\hspace{8pt}}\!\!\bigcirc \\[-0pt] & \Big\uparrow \hspace{26pt} \Big\downarrow {\scriptsize z^{-1}} \\[-0pt] & \,\vdots \hspace{30pt} \vdots \\[-0pt] & \Big\uparrow \hspace{26pt} \Big\downarrow {\scriptsize z^{-1}} \\[-10pt] &\!\bigcirc\!\!\xleftarrow{\hspace{5.5pt}{\scriptsize -a_N}\hspace{5.5pt}}\!\!\bigcirc \end{align*}\]
  • Below is an example HLS function that implements the direct-form II SFG above:

    // This is a Chebyshev type-II lowpass filter with M=N=6 (L=7) 
    const din_t b[ffL] = {
      0.168814352044553, 0.792090023955804, 1.723851142494651, 2.197013405545998,
      1.723851142494651, 0.792090023955804, 0.16881435204455
    };
    
    const din_t a[fbL] = {
      1.0, 1.776425956077127, 2.196211818205448, 1.583727976930950, 
      0.765387107901598, 0.216231507219810, 0.028540076201079
    };
    
    void iir(hls::stream<din_t> &in, hls::stream<dout_t> &out, int N) {
    
      static din_t w[fbL] = {};
    #pragma HLS array_partition variable=w type=complete
    
      sample_loop: for (int n=0; n<N; n++) {
    #pragma HLS loop_tripcount max=MAX_N
        delay_loop: for (int k=fbL-1; k>0; k--) {
          w[k] = w[k-1];
        }
        // Read in new sample from in
        w[0] = in.read();
        
        dout_t y = 0.0;
        acc_loop: for (int k=1; k<fbL; k++) {
          dout_t aw = a[k]*w[k];
          w[0] -= aw;
          if (k<ffL)
            y += b[k]*w[k];
        }
        y += b[0]*w[0];
    
        // Write to out
        out.write(y);
      }
    }
    
    • Vitis HLS gives an RTL implementation of sample_loop with II=2.

      Caution

      In acc_loop above, the accumulation operation w[0] -= a[k]*w[k] is re-factored into:

      dout_t aw = a[k]*w[k];
      w[0] -= aw;
      

      This re-factoring seems to be needed for Vitis HLS 2024.2; otherwise, a balanced adder tree will not be synthesized and II=2 cannot be achieved.

    • The bottleneck that prevents achieving II=1 is a loop-carried dependence: the updated w[0] of one iteration of sample_loop, which is needed as w[1] in the next iteration, is available only after the accumulation of a[k]*w[k] in acc_loop completes. Although acc_loop is automatically unrolled by Vitis HLS, the accumulation still takes at least two clock cycles for the example IIR filter of order \(6\). As a result, II=1 cannot be achieved for this filter.

    • This bottleneck is inherent to the direct-form structure of the IIR implementation and becomes more severe as the order of the IIR filter increases. Thus, the direct-form implementation may not be suitable for a high-throughput implementation of a higher-order IIR filter.
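
  • For C simulation, the direct-form II recursion of the listing above can be mirrored by a small software model. The following is only a sketch in plain C++ with double arithmetic; the name DirectForm2Model and its members are hypothetical, and no HLS types or pragmas are involved:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical double-precision model of the direct-form II structure.
struct DirectForm2Model {
  std::vector<double> b;  // feedforward taps b[0..M]
  std::vector<double> a;  // feedback taps a[0..N], with a[0] = 1
  std::vector<double> w;  // delay line w[0..N], initially zero

  DirectForm2Model(std::vector<double> b_in, std::vector<double> a_in)
      : b(std::move(b_in)), a(std::move(a_in)), w(a.size(), 0.0) {
    assert(!a.empty() && std::fabs(a[0] - 1.0) < 1e-12);
    assert(!b.empty() && b.size() <= a.size());  // M <= N as assumed in the text
  }

  // Process one input sample and return one output sample.
  double step(double x) {
    // Shift the delay line by one position.
    for (std::size_t k = w.size() - 1; k > 0; --k) w[k] = w[k - 1];
    // Feedback part: w[0] = x - sum_{k=1}^{N} a[k] * w[k].
    w[0] = x;
    for (std::size_t k = 1; k < a.size(); ++k) w[0] -= a[k] * w[k];
    // Feedforward part: y = sum_{k=0}^{M} b[k] * w[k].
    double y = 0.0;
    for (std::size_t k = 0; k < b.size(); ++k) y += b[k] * w[k];
    return y;
  }
};
```

    The feedback loop in step() makes the dependence discussed above visible: the new w[0] is known only after all \(N\) products a[k]*w[k] have been accumulated, and it is this value that the next call shifts into w[1].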

7.3.2. Transposed-form Implementation

  • When both the feedforward and feedback components in the cascade are implemented in the transposed form similar to the development in Section 7.2.2, we obtain the following SFG of transposed form I:

    \[\begin{align*} \!\bigcirc\kern-6.5pt\vcenter{\tiny x} \longrightarrow & \!\bigcirc\!\!\xrightarrow{\hspace{25pt}}\!\!\bigcirc \!\!\xrightarrow{\hspace{26pt}} \!\!\bigcirc\!\!\xrightarrow{\hspace{9pt}{\scriptsize b_0}\hspace{9pt}}\!\!\bigcirc \!\!\longrightarrow\!\!\bigcirc\kern-6.5pt\vcenter{\tiny y} \\[-0pt] {\scriptsize z^{-1}} & \Big\uparrow \hspace{25pt} \Big\downarrow \hspace{27pt} \Big\downarrow \hspace{26pt} \Big\uparrow {\scriptsize z^{-1}} \\[-10pt] &\!\bigcirc\!\!\xleftarrow{\hspace{6pt}{\scriptsize -a_1}\hspace{6pt}}\!\!\bigcirc \hspace{25.5pt} \!\bigcirc\!\!\xrightarrow{\hspace{9pt}{\scriptsize b_1}\hspace{9pt}}\!\!\bigcirc \\[-0pt] {\scriptsize z^{-1}} & \Big\uparrow \hspace{25pt} \Big\downarrow \hspace{27pt} \Big\downarrow \hspace{26pt} \Big\uparrow {\scriptsize z^{-1}} \\[-0pt] & \,\vdots \hspace{29pt} \vdots \hspace{29pt} \,\vdots \hspace{30pt} \vdots \\[-0pt] {\scriptsize z^{-1}} & \Big\uparrow \hspace{25pt} \Big\downarrow \hspace{27pt} \Big\downarrow \hspace{26pt} \Big\uparrow {\scriptsize z^{-1}} \\[-10pt] &\!\bigcirc\!\!\xleftarrow{\hspace{5.5pt}{\scriptsize -a_N}\hspace{4.5pt}}\!\!\bigcirc \hspace{25pt} \!\bigcirc\!\!\xrightarrow{\hspace{8pt}{\scriptsize b_M}\hspace{9pt}}\!\!\bigcirc \end{align*}\]
  • Swapping the order of the feedforward and feedback components in the cascade above, we obtain the following SFG of transposed form II for the IIR filter:

    \[\begin{align*} \!\bigcirc\kern-6.5pt\vcenter{\tiny x} \longrightarrow & \!\bigcirc\!\!\xrightarrow{\hspace{10pt}{\scriptsize b_0}\hspace{9pt}}\!\!\bigcirc \!\!\xrightarrow{\hspace{24pt}}\!\!\bigcirc \!\!\longrightarrow\!\!\bigcirc\kern-6.5pt\vcenter{\tiny y} \\[-0pt] & \Big\downarrow \hspace{26pt} \Big\uparrow {\scriptsize z^{-1}} \hspace{15pt} \Big\downarrow \\[-10pt] & \!\bigcirc\!\!\xrightarrow{\hspace{10pt}{\scriptsize b_1}\hspace{9pt}} \!\!\bigcirc\!\!\xleftarrow{\hspace{6.5pt}{\scriptsize -a_1}\hspace{6pt}}\!\!\bigcirc \\[-0pt] & \Big\downarrow \hspace{26pt} \Big\uparrow {\scriptsize z^{-1}} \hspace{15pt} \Big\downarrow \\[-0pt] & \,\vdots \hspace{30pt} \vdots \hspace{27.5pt} \,\vdots \\[-0pt] & \Big\downarrow \hspace{26pt} \Big\uparrow {\scriptsize z^{-1}} \hspace{15pt} \Big\downarrow \\[-10pt] &\!\bigcirc\!\!\xrightarrow{\hspace{8.5pt}{\scriptsize b_M}\hspace{8pt}} \!\!\bigcirc\!\!\xleftarrow{\hspace{5pt}{\scriptsize -a_M}\hspace{5pt}}\!\!\bigcirc \\[-0pt] & \hspace{33pt} \Big\uparrow {\scriptsize z^{-1}} \hspace{15pt} \Big\downarrow \\[-0pt] & \hspace{33pt} \,\vdots \hspace{29.5pt} \vdots \\[-0pt] & \hspace{33pt} \Big\uparrow {\scriptsize z^{-1}} \hspace{15pt} \Big\downarrow \\[-10pt] & \hspace{33pt} \!\bigcirc\!\!\xleftarrow{\hspace{5.5pt}{\scriptsize -a_N}\hspace{5.5pt}}\!\!\bigcirc \end{align*}\]
  • Below is an example HLS function that implements the transposed-form II SFG above:

    #include <assert.h>
    #include <hls_stream.h>
    #include "iir.h"
    
    // This is a Chebyshev type II filter with M=N=6 (L=7) 
    const din_t b[ffL] = {0.168814352044553, 0.792090023955804,
      1.723851142494651, 2.197013405545998, 1.723851142494651, 
      0.792090023955804, 0.16881435204455};
    
    const din_t a[fbL] = {1.0, 1.776425956077127,
      2.196211818205448, 1.583727976930950, 0.765387107901598, 
      0.216231507219810, 0.028540076201079};
    
    void iir(hls::stream<din_t> &in, hls::stream<dout_t> &out, int N) {
    
      static dout_t u[fbL-1] = {};
    #pragma HLS array_partition variable=u type=complete
    
      sample_loop: for (int n=0; n<N; n++) {
    #pragma HLS loop_tripcount max=MAX_N
        din_t x = in.read();
        dout_t y = b[0]*x+u[0];
    #pragma HLS bind_op variable=y op=mul impl=dsp
        // Write to out
        out.write(y);
    
        dout_t bx;
        dout_t ay;
    #pragma HLS bind_op variable=bx op=mul impl=dsp
    #pragma HLS bind_op variable=ay op=mul impl=dsp
        delay_add_loop: for (int k=1; k<fbL; k++) {
          ay = a[k]*y;
          if (k<ffL) {
            bx = b[k]*x;
            if (k<fbL-1)
              u[k-1] = u[k]+bx-ay;
            else
              u[k-1] = bx-ay;
          } else {
            if (k<fbL-1)
              u[k-1] = u[k]-ay;
            else
              u[k-1] = -ay;
          }
        }
      }
    }
    
    void load(float *in, hls::stream<din_t> &buf, int N) { 
      assert(N%2==0);
      Read_Loop: for (int n=0; n<N; n++) {
    #pragma HLS loop_tripcount max=MAX_N
        buf.write(in[n]);
      }
    }
    
    void store(hls::stream<dout_t> &buf, float *out, int N) {
      assert(N%2==0);
      Write_Loop: for (int n=0; n<N; n++) {
    #pragma HLS loop_tripcount max=MAX_N
        out[n] = buf.read().to_float();
      }
    }
    
    void top(float *in, float *out, int N) {
    #pragma HLS interface mode=m_axi port=in depth=MAX_N
    #pragma HLS interface mode=m_axi port=out depth=MAX_N
    
      hls::stream<din_t> buf_in;
      hls::stream<dout_t> buf_out;
    
    #pragma HLS DATAFLOW
      load(in, buf_in, N);
      iir(buf_in, buf_out, N);
      store(buf_out, out, N);
    }
    
    • Vitis HLS gives an RTL implementation of sample_loop with II=1.

    • The structure of the transposed-form implementation “simplifies” the carried dependence in sample_loop: the update y=b[0]*x+u[0] must complete before a[1]*y can be calculated to update u[0] for the next iteration of the loop. Similar to the first-order IIR filter example in Section 4.2.1, II=1 can be achieved as long as the update y=b[0]*x+u[0] and the subsequent update of u[0] can each be completed in a single, separate clock cycle. This condition is easier to satisfy.

      Caution

      For our case, we need to use DSP slices to implement the multipliers and set the timing uncertainty (margin) in the synthesis process to 18% in order to satisfy the condition mentioned above and achieve II=1. Timing can still be met with this setting after implementation.

    • In general, the transposed-form structure is more suitable for higher-speed implementations of IIR filters, since the carried dependence induced by the transposed-form structure involves only a few multiplications, independent of the filter length, rather than the accumulation of many products.
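
  • The transposed-form II recursion can likewise be sketched as a plain C++ model, e.g., for checking outputs in C simulation. This is a minimal sketch assuming double arithmetic, \(a_0=1\), and \(1 \leq M \leq N\) handled through vector sizes; the name TransposedForm2Model is hypothetical:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical double-precision model of the transposed-form II structure.
struct TransposedForm2Model {
  std::vector<double> b;  // feedforward taps b[0..M]
  std::vector<double> a;  // feedback taps a[0..N], with a[0] = 1
  std::vector<double> u;  // state registers u[0..N-1], initially zero

  TransposedForm2Model(std::vector<double> b_in, std::vector<double> a_in)
      : b(std::move(b_in)), a(std::move(a_in)),
        u(a.empty() ? 0 : a.size() - 1, 0.0) {
    assert(a.size() >= 2 && std::fabs(a[0] - 1.0) < 1e-12);
    assert(!b.empty() && b.size() <= a.size());  // M <= N as assumed in the text
  }

  // Process one input sample and return one output sample.
  double step(double x) {
    // The output depends only on the current input and the register u[0].
    const double y = b[0] * x + u[0];
    const std::size_t N = u.size();
    // u[k-1] <- u[k] + b[k]*x - a[k]*y, with missing terms treated as zero.
    // Ascending k reads each u[k] before it is overwritten.
    for (std::size_t k = 1; k <= N; ++k) {
      double next = -a[k] * y;
      if (k < b.size()) next += b[k] * x;
      if (k < N) next += u[k];
      u[k - 1] = next;
    }
    return y;
  }
};
```

    In this model, the only value carried from one call of step() to the next through y is u[0], which needs just the single product a[1]*y, regardless of the filter order.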

7.3.3. Cascade-form Implementation

  • As in Section 7.2.3, if the IIR filter taps in (7.12) are all real-valued, then the transfer function in (7.12) of the IIR filter can be factored into a product of the transfer functions of SoSs with real-valued taps in the form below:

    \[\begin{align} H(z) &= b_0 \prod_{k=1}^K \frac{1+b_{k,1} z^{-1} + b_{k,2} z^{-2}}{1+ a_{k,1} z^{-1} + a_{k,2} z^{-2}} \tag{7.13} \end{align} \]

    Hence, the IIR filter may be implemented as a cascade of all the SoSs.

  • Each SoS may be implemented in the direct form II or in the transposed form II. For example:

    • Cascade-form SFG with direct-form II SoSs:

    \[\begin{align*} \!\bigcirc\kern-6.5pt\vcenter{\tiny x} \xrightarrow{\hspace{8pt}{\scriptsize b_0}\hspace{7pt}} & \!\bigcirc\!\!\xrightarrow{\hspace{25pt}}\!\! \bigcirc\!\!\xrightarrow{\hspace{24.5pt}}\!\!\bigcirc \!\!\longrightarrow\!\!\bigcirc \!\!\xrightarrow{\hspace{25pt}}\!\!\bigcirc \!\!\xrightarrow{\hspace{24.5pt}}\!\!\bigcirc \!\!\longrightarrow \cdots \longrightarrow \!\bigcirc\!\!\xrightarrow{\hspace{25pt}}\!\! \bigcirc\!\!\xrightarrow{\hspace{25pt}}\!\!\bigcirc \!\!\longrightarrow\!\!\bigcirc\kern-6.5pt\vcenter{\tiny y} \\[-0pt] & \Big\uparrow \hspace{26pt} \Big\downarrow {\scriptsize z^{-1}} \hspace{15pt} \Big\uparrow \hspace{19pt} \Big\uparrow \hspace{26pt} \Big\downarrow {\scriptsize z^{-1}} \hspace{15pt} \Big\uparrow \hspace{54pt} \Big\uparrow \hspace{26pt} \Big\downarrow {\scriptsize z^{-1}} \hspace{15pt} \Big\uparrow \\[-10pt] & \!\bigcirc\!\!\xleftarrow{\hspace{5.5pt}{\scriptsize -a_{1,1}}\hspace{4pt}} \!\!\bigcirc\!\!\xrightarrow{\hspace{8pt}{\scriptsize b_{1,1}}\hspace{7pt}}\!\!\bigcirc \hspace{15pt} \bigcirc\!\!\xleftarrow{\hspace{5.5pt}{\scriptsize -a_{2,1}}\hspace{4pt}} \!\!\bigcirc\!\!\xrightarrow{\hspace{7pt}{\scriptsize b_{2,1}}\hspace{7pt}}\!\!\bigcirc \hspace{17pt} \cdots \hspace{19pt} \bigcirc\!\!\xleftarrow{\hspace{4.5pt}{\scriptsize -a_{K,1}}\hspace{3pt}} \!\!\bigcirc\!\!\xrightarrow{\hspace{6.5pt}{\scriptsize b_{K,1}}\hspace{6pt}}\!\!\bigcirc \\[-0pt] & \Big\uparrow \hspace{26pt} \Big\downarrow {\scriptsize z^{-1}} \hspace{15pt} \Big\uparrow \hspace{19pt} \Big\uparrow \hspace{26pt} \Big\downarrow {\scriptsize z^{-1}} \hspace{15pt} \Big\uparrow \hspace{54.5pt} \Big\uparrow \hspace{26pt} \Big\downarrow {\scriptsize z^{-1}} \hspace{15pt} \Big\uparrow \\[-10pt] & \!\bigcirc\!\!\xleftarrow{\hspace{5.5pt}{\scriptsize -a_{1,2}}\hspace{4pt}} \!\!\bigcirc\!\!\xrightarrow{\hspace{8pt}{\scriptsize b_{1,2}}\hspace{7pt}}\!\!\bigcirc \hspace{15.5pt} \bigcirc\!\!\xleftarrow{\hspace{5.5pt}{\scriptsize -a_{2,2}}\hspace{4pt}} 
\!\!\bigcirc\!\!\xrightarrow{\hspace{8pt}{\scriptsize b_{2,2}}\hspace{7pt}}\!\!\bigcirc \hspace{17pt} \cdots \hspace{18pt} \bigcirc\!\!\xleftarrow{\hspace{4.5pt}{\scriptsize -a_{K,2}}\hspace{3pt}} \!\!\bigcirc\!\!\xrightarrow{\hspace{6.5pt}{\scriptsize b_{K,2}}\hspace{6pt}}\!\!\bigcirc \end{align*}\]
    • Cascade-form SFG with transposed-form II SoSs:

    \[\begin{align*} \!\bigcirc\kern-6.5pt\vcenter{\tiny x} \xrightarrow{\hspace{8pt}{\scriptsize b_0}\hspace{7pt}} & \!\bigcirc\!\!\xrightarrow{\hspace{25pt}}\!\! \bigcirc\!\!\xrightarrow{\hspace{24.5pt}}\!\!\bigcirc \!\!\longrightarrow\!\!\bigcirc \!\!\xrightarrow{\hspace{25pt}}\!\!\bigcirc \!\!\xrightarrow{\hspace{24.5pt}}\!\!\bigcirc \!\!\longrightarrow \cdots \longrightarrow \!\bigcirc\!\!\xrightarrow{\hspace{25pt}}\!\! \bigcirc\!\!\xrightarrow{\hspace{25pt}}\!\!\bigcirc \!\!\longrightarrow\!\!\bigcirc\kern-6.5pt\vcenter{\tiny y} \\[-0pt] & \Big\downarrow \hspace{26pt} \Big\uparrow {\scriptsize z^{-1}} \hspace{15pt} \Big\downarrow \hspace{19pt} \Big\downarrow \hspace{26pt} \Big\uparrow {\scriptsize z^{-1}} \hspace{15pt} \Big\downarrow \hspace{54pt} \Big\downarrow \hspace{26pt} \Big\uparrow {\scriptsize z^{-1}} \hspace{15pt} \Big\downarrow \\[-10pt] & \!\bigcirc \!\!\xrightarrow{\hspace{8pt}{\scriptsize b_{1,1}}\hspace{7.5pt}}\!\!\bigcirc \!\!\xleftarrow{\hspace{4pt}{\scriptsize -a_{1,1}}\hspace{4pt}}\!\!\bigcirc \hspace{16pt} \bigcirc \!\!\xrightarrow{\hspace{8pt}{\scriptsize b_{2,1}}\hspace{7pt}}\!\!\bigcirc \!\!\xleftarrow{\hspace{4.5pt}{\scriptsize -a_{2,1}}\hspace{4pt}}\!\!\bigcirc \hspace{17pt} \cdots \hspace{19pt} \bigcirc \!\!\xrightarrow{\hspace{7pt}{\scriptsize b_{K,1}}\hspace{6pt}}\!\!\bigcirc \!\!\xleftarrow{\hspace{3.5pt}{\scriptsize -a_{K,1}}\hspace{3pt}}\!\!\bigcirc \\[-0pt] & \Big\downarrow \hspace{26pt} \Big\uparrow {\scriptsize z^{-1}} \hspace{15pt} \Big\downarrow \hspace{19pt} \Big\downarrow \hspace{26pt} \Big\uparrow {\scriptsize z^{-1}} \hspace{15pt} \Big\downarrow \hspace{54.5pt} \Big\downarrow \hspace{26pt} \Big\uparrow {\scriptsize z^{-1}} \hspace{15pt} \Big\downarrow \\[-10pt] & \!\bigcirc \!\!\xrightarrow{\hspace{8pt}{\scriptsize b_{1,2}}\hspace{7.5pt}}\!\!\bigcirc \!\!\xleftarrow{\hspace{4.5pt}{\scriptsize -a_{1,2}}\hspace{4pt}}\!\!\bigcirc \hspace{15.5pt}\bigcirc \!\!\xrightarrow{\hspace{8pt}{\scriptsize 
b_{2,2}}\hspace{7pt}}\!\!\bigcirc \!\!\xleftarrow{\hspace{4.5pt}{\scriptsize -a_{2,2}}\hspace{4pt}}\!\!\bigcirc \hspace{17pt} \cdots \hspace{19.5pt} \bigcirc \!\!\xrightarrow{\hspace{7pt}{\scriptsize b_{K,2}}\hspace{6.5pt}}\!\!\bigcirc \!\!\xleftarrow{\hspace{3.5pt}{\scriptsize -a_{K,2}}\hspace{2.5pt}}\!\!\bigcirc \end{align*}\]
  • Below is an example HLS function that implements the cascade-form SFG with transposed-form II SoSs:

    // This is a Chebyshev type-II lowpass filter with M=N=6 (L=7)
    // expressed as SoSs. 
    #define K 3
    const din_t b[K][2]={
      {1.931625159186722, 0.999999999999988},
      {1.540424279915726, 1.000000000000037},
      {1.220028067120242, 0.999999999999978}
    };
    const din_t a[K][2]={
      {0.637344246881063, 0.128737933973358},
      {0.576772441445408, 0.316322235552652},
      {0.562309267750655, 0.700839985387954},
    };
    const din_t b0=0.168814352044553;
    
    void iir2nd(dout_t &in, dout_t &out, dout_t &w1, dout_t &w2, 
                const din_t b[2], const din_t a[2]) {
    #pragma HLS inline
    
      // Write to out
      out = in+w1;
      // Update register
      dout_t bx = b[0]*in;
      dout_t ay = a[0]*out;
    #pragma HLS bind_op variable=bx op=mul impl=dsp
    #pragma HLS bind_op variable=ay op=mul impl=dsp
      w1 = w2+bx-ay;
      bx = b[1]*in;
      ay = a[1]*out;
      w2 = bx-ay;
    }
    
    void iir(hls::stream<din_t> &in, hls::stream<dout_t> &out, int N) {
    
      static dout_t u[K+1];
      static dout_t w[K][2] = {};
    #pragma HLS array_partition variable=u type=complete
    #pragma HLS array_partition variable=w type=complete
    
      sample_loop: for (int n=0; n<N; n++) {
    #pragma HLS loop_tripcount max=MAX_N
        u[0] = b0*in.read();
        cascade_loop: for (int k=0; k<K; k++) {
          iir2nd(u[k], u[k+1], w[k][0], w[k][1], &b[k][0], &a[k][0]);
        }
        out.write(u[K]);
      }
    }
    

    Vitis HLS gives an RTL implementation of sample_loop with II=1 (with clock uncertainty set to 18%). The latency of sample_loop is higher than that achieved using the transposed-form implementation.
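
  • A set of SoS coefficients can be sanity-checked against (7.12) by multiplying the second-order factors of (7.13) back out. The sketch below does this for one polynomial (numerator or denominator) at a time in double precision; the names Sos, poly_mul, and expand_sos are hypothetical helpers, not part of the HLS design:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One second-order factor 1 + c1 z^{-1} + c2 z^{-2}.
struct Sos {
  double c1;
  double c2;
};

// Multiply two polynomials in z^{-1} by convolving their coefficients.
std::vector<double> poly_mul(const std::vector<double>& p,
                             const std::vector<double>& q) {
  assert(!p.empty() && !q.empty());
  std::vector<double> r(p.size() + q.size() - 1, 0.0);
  for (std::size_t i = 0; i < p.size(); ++i)
    for (std::size_t j = 0; j < q.size(); ++j)
      r[i + j] += p[i] * q[j];
  return r;
}

// Expand gain * prod_k (1 + c1_k z^{-1} + c2_k z^{-2}) into the
// coefficients of z^0, z^{-1}, ..., z^{-2K}.
std::vector<double> expand_sos(double gain, const std::vector<Sos>& sections) {
  std::vector<double> poly = {gain};
  for (const Sos& s : sections)
    poly = poly_mul(poly, {1.0, s.c1, s.c2});
  return poly;
}
```

    Applied with gain 1 to the three denominator pairs a[K][2] of the listing above, the expansion should reproduce the direct-form feedback taps of Section 7.3.1 up to rounding; for example, the \(z^{-1}\) coefficient is the sum of the three \(a_{k,1}\), about 1.776425956.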

7.3.4. Parallel-form Implementation

  • For an IIR filter (\(N \geq M\)) with real-valued taps and simple (first-order) poles, the transfer function \(H(z)\) in (7.12) can be expanded into a sum of partial fractions:

    \[\begin{align} H(z) &= B_0 + \sum_{k=1}^K \frac{B_{k,0} + B_{k,1} z^{-1}}{1 + A_{k,1} z^{-1} + A_{k,2} z^{-2}} \tag{7.14} \end{align} \]

    where all the \(A\) and \(B\) coefficients are real-valued.

  • Based on (7.14), we may implement the IIR filter as parallel second-order IIR components in the following SFG:

    \[\begin{align*} & \!\bigcirc\!\!\xrightarrow{\hspace{32pt}{\scriptsize B_{0}}\hspace{31.5pt}}\!\!\bigcirc \\[-10pt] \nearrow \hspace{3pt} & \hspace{72pt} \searrow \\[-10pt] \!\bigcirc\kern-6.5pt\vcenter{\tiny x}\longrightarrow \!\!\bigcirc\!\!\longrightarrow & \!\bigcirc\!\!\xrightarrow{\hspace{7pt}{\scriptsize B_{1,0}}\hspace{6pt}}\!\! \bigcirc\!\!\xrightarrow{\hspace{25.5pt}}\!\!\bigcirc \!\!\longrightarrow\!\!\bigcirc \!\!\longrightarrow\!\!\bigcirc\kern-6.5pt\vcenter{\tiny y} \\[-0pt] \Big| \hspace{19pt} & \Big\downarrow \hspace{26pt} \Big\uparrow {\scriptsize z^{-1}} \hspace{16pt} \Big\downarrow \hspace{18.5pt} \Big\uparrow \\[-10pt] \big| \hspace{19.0pt} & \!\bigcirc \!\!\xrightarrow{\hspace{7pt}{\scriptsize B_{1,1}}\hspace{6.5pt}}\!\!\bigcirc \!\!\xleftarrow{\hspace{4pt}{\scriptsize -A_{1,1}}\hspace{4pt}}\!\!\bigcirc \hspace{18.2pt} \big| \\[-0pt] \Big| \hspace{19pt} & \hspace{33pt} \Big\uparrow {\scriptsize z^{-1}} \hspace{16pt} \Big\downarrow \hspace{19.8pt} \Big| \\[-10pt] \Big\downarrow \hspace{17.5pt} & \hspace{31.5pt}\bigcirc \!\!\xleftarrow{\hspace{4.5pt}{\scriptsize -A_{1,2}}\hspace{4pt}}\!\!\bigcirc \hspace{17.8pt} \Big| \\[-0pt] \!\!\bigcirc\!\!\longrightarrow & \!\bigcirc\!\!\xrightarrow{\hspace{7pt}{\scriptsize B_{2,0}}\hspace{6pt}}\!\! 
\bigcirc\!\!\xrightarrow{\hspace{25.5pt}}\!\!\bigcirc \!\!\longrightarrow\!\!\bigcirc \\[-0pt] \Big| \hspace{19pt} & \Big\downarrow \hspace{26pt} \Big\uparrow {\scriptsize z^{-1}} \hspace{16pt} \Big\downarrow \hspace{18.5pt} \Big\uparrow \\[-10pt] \big| \hspace{19.0pt} & \!\bigcirc \!\!\xrightarrow{\hspace{7pt}{\scriptsize B_{2,1}}\hspace{6.5pt}}\!\!\bigcirc \!\!\xleftarrow{\hspace{4pt}{\scriptsize -A_{2,1}}\hspace{4pt}}\!\!\bigcirc \hspace{18.2pt} \big| \\[-0pt] \Big| \hspace{19pt} & \hspace{33pt} \Big\uparrow {\scriptsize z^{-1}} \hspace{16pt} \Big\downarrow \hspace{19.8pt} \Big| \\[-10pt] \Big\downarrow \hspace{17.5pt} & \hspace{31.5pt}\bigcirc \!\!\xleftarrow{\hspace{4.5pt}{\scriptsize -A_{2,2}}\hspace{4pt}}\!\!\bigcirc \hspace{17.8pt} \Big| \\[-0pt] \vdots \hspace{19.5pt} & \hspace{35pt} \vdots \hspace{54.8pt} \vdots \\[-0pt] \Big\downarrow \hspace{17.5pt} & \hspace{90.5pt} \Big\uparrow \\[-10pt] \!\!\bigcirc\!\!\longrightarrow & \!\bigcirc\!\!\xrightarrow{\hspace{5pt}{\scriptsize B_{K,0}}\hspace{6pt}}\!\! \bigcirc\!\!\xrightarrow{\hspace{25.5pt}}\!\!\bigcirc \!\!\longrightarrow\!\!\bigcirc \\[-0pt] & \Big\downarrow \hspace{26pt} \Big\uparrow {\scriptsize z^{-1}} \hspace{16pt} \Big\downarrow \\[-10pt] & \!\bigcirc \!\!\xrightarrow{\hspace{5pt}{\scriptsize B_{K,1}}\hspace{6.5pt}}\!\!\bigcirc \!\!\xleftarrow{\hspace{4pt}{\scriptsize -A_{K,1}}\hspace{2pt}}\!\!\bigcirc \\[-0pt] & \hspace{33pt} \Big\uparrow {\scriptsize z^{-1}} \hspace{16pt} \Big\downarrow \\[-10pt] & \hspace{31.5pt}\bigcirc \!\!\xleftarrow{\hspace{4pt}{\scriptsize -A_{K,2}}\hspace{2.5pt}}\!\!\bigcirc \end{align*}\]

    where each second-order component IIR filter is implemented in the transposed form II.

  • Below is an example HLS function that implements the parallel-form SFG with transposed-form II second-order components:

    // This is a Chebyshev type-II lowpass filter with M=N=6 (L=7)
    // expressed as a parallel sum of second-order components. 
    #define K 3
    const din_t B[K][2]={
      { 0.182834372101273, -0.306910458916128},
      {-1.151247645072097, -1.125034655836861},
      {-4.777765413376408, -1.682130810244864}
    };
    const din_t A[K][2]={
      {0.562309267750655, 0.700839985387954},
      {0.576772441445408, 0.316322235552652},
      {0.637344246881063, 0.128737933973358},
    };
    const din_t B0=5.914993038391787;
    
    void piir2nd(din_t &in, dout_t &out, dout_t &w1, dout_t &w2,
                 const din_t B[2], const din_t A[2]) {
    #pragma HLS inline
    
      dout_t bx = B[0]*in;
    #pragma HLS bind_op variable=bx op=mul impl=dsp
      // Write to out
      out = bx+w1;
    
      // Update register
      dout_t ay = A[0]*out;
    #pragma HLS bind_op variable=ay op=mul impl=dsp
      bx = B[1]*in;
      w1 = w2+bx-ay;
      ay = A[1]*out;
      w2 = -ay;
    }
    
    void iir(hls::stream<din_t> &in, hls::stream<dout_t> &out, int N) {
      static dout_t w[K][2] = {};
    #pragma HLS array_partition variable=w type=complete
    
      sample_loop: for (int n=0; n<N; n++) {
    #pragma HLS loop_tripcount max=MAX_N
        din_t x = in.read();
        dout_t y = B0*x;
        parallel_loop: for (int k=0; k<K; k++) {
          dout_t yy;
          piir2nd(x, yy, w[k][0], w[k][1], &B[k][0], &A[k][0]);
          y += yy;
        }
        out.write(y);
      }
    }
    

    Vitis HLS gives an RTL implementation of sample_loop with II=1 (with clock uncertainty set to 18%).
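
  • The parallel structure of (7.14) can be cross-checked against the direct-form taps with a small double-precision model. The sketch below mirrors the update order of piir2nd above; the names ParallelSection and parallel_step are hypothetical, and the model is meant only as an illustration for C simulation:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One second-order component of (7.14) in transposed form II.
struct ParallelSection {
  double B0, B1;  // numerator taps B_{k,0}, B_{k,1}
  double A1, A2;  // denominator taps A_{k,1}, A_{k,2}
  double w1, w2;  // state registers, initially zero

  ParallelSection(double b0, double b1, double a1, double a2)
      : B0(b0), B1(b1), A1(a1), A2(a2), w1(0.0), w2(0.0) {}

  // Process one input sample and return this component's output.
  double step(double x) {
    const double y = B0 * x + w1;
    w1 = w2 + B1 * x - A1 * y;
    w2 = -A2 * y;
    return y;
  }
};

// One sample of the whole filter: direct term plus the sum of all components.
double parallel_step(double direct_gain,
                     std::vector<ParallelSection>& sections, double x) {
  double y = direct_gain * x;
  for (std::size_t k = 0; k < sections.size(); ++k)
    y += sections[k].step(x);
  return y;
}
```

    Feeding a unit impulse into this model with the B, A, and B0 values of the listing above, the first output sample is \(B_0 + \sum_k B_{k,0}\), which should agree with the direct-form tap \(b_0 \approx 0.1688\), and the second sample should agree with \(b_1 - a_1 b_0\).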