# Frequency-Modulated Continuous-Wave Radar

Figure 1: Real-time range detection performed by radar. Distance (y-axis) is in meters and the time (x-axis) is in seconds. Magnitude is in dBFS relative to the max ADC input.

## 1. Introduction

This blog post presents the key details of an FPGA-based radar I developed and built, including a summary of the relevant physics followed by more in-depth descriptions of the various hardware components and the related FPGA and PC (Linux) programming. It focuses on design and implementation details rather than on how to use the radar; for usage information, see the README. My formal education is in physics, not electrical engineering or software programming. But after a few years trading financial derivatives I decided that I wanted to do something more fun and challenging, and I chose this project to develop several areas of technical knowledge — including electrical engineering and FPGA programming — and apply them in a practical way.

I was originally motivated by an excellent radar design by Henrik Forsten, whose write-up provides a valuable description of many key components; I owe a great deal to his work. Building on what he did, I created a design that uses an FPGA in a much more significant way — to perform all major real-time processing of the radar reception data — and wrote the interfacing host PC code from scratch. This met my twin goals of configuring hardware for large-volume data processing and developing a diverse set of electrical engineering and programming skills.

The biggest and most rewarding challenge in this project involved the FPGA gateware design and programming (described here). Other aspects of the project — characterizing the radar physics, designing and configuring elements of the printed circuit board (PCB) to implement numerous sub-processes and to avoid or eliminate signal problems, simulating and designing the distributed microwave structures, and programming the PC to interface with the FPGA and other hardware elements — were also uniquely valuable. In particular:

- I've completely rewritten the FPGA code so that all the signal processing is now performed in real time on the FPGA. I've done this without changing the actual FPGA device used. I've also added tests and some formal verification to accompany the new code.
- I've written a high-level Python interface to the OpenEMS electromagnetic wave solver and simulated and designed the microwave parts of the radar using this tool. These simulations are described in the RF Simulation section of this post.
- I've rewritten the host PC software from scratch to accommodate the new FPGA code.
- I've redone the PCB layout from scratch in order to replace a deprecated part.
- I designed and 3D printed horn antennas and a mount for the radar.
- I've added a few analog simulations of relevant parts of the PCB using Ngspice.

The full project code is open source and can be viewed on GitHub. I hope you gain some benefit from what follows. Please do not hesitate to contact me if you have any questions.

## 2. Miscellaneous Notes

Several preliminary comments about this post are in order.

Although I've put a lot of time into this project, it is still a work in progress. In particular, there have been two major iterations of the hardware design, referred to as rev1 and rev2 (both revisions are maintained in the source code repository). All results presented in this post are from rev1 because I have not yet built rev2. However, these results pointed to a few shortcomings that I designed rev2 to fix. Because this post is meant as a description of the right way to do things, it presents the design considerations for rev2. In several instances, however, I have made references to aspects of rev1 in order to explain shortcomings in certain results.

Finally, a number of sections go into a considerable amount of detail. While I believe these sections are very valuable, I don't expect everyone to read them in full, so I've placed a link at the beginning of them to allow the reader to jump to the next high-level section. Still, the detail's there and I encourage you to read it.

## 3. Overview

At a very high level, a radar (depending on the technology used) can detect the distance to a remote object, the incident angle to that object, the speed of that object, or some combination of these. This radar currently just detects distance. The hardware supports angle detection too (and Henrik's version does this), but at this stage I have not yet written the code for that.^{1}

Figures 3 and 4 show what the radar looks like with the 3D-printed antennas and mount.

Figure 1 shows the output produced when observing cars on a highway from an overhead bridge.

Broadly, the radar works by emitting a frequency ramp signal that reflects off a remote object and is detected by a receiver. Because the transmitted signal travels through air at the speed of light and its frequency ramps at a known rate, the frequency difference between the transmitted and received signals can be used to determine the distance to the remote object.

## 4. Physics

### 4.1. Operating Principle

The distance to a remote object is

$d=\frac{c\Delta t}{2},$ | (4.1.1) |

where $\Delta t$ is the time delay between the transmitted and received signals and $c = 299\,792\,458\ \mathrm{m/s}$ is the speed of light in air. The factor of $2$ accounts for the fact that the signal must travel the round-trip distance. The transmitted and received signals at the input to the mixer (see the PCB block diagram) are shown in the figure (as pictured, these are sawtooth ramp signals with a time delay between ramps).

The ramp slope and duration are explicitly programmed. So, by finding the frequency difference between the transmitted and received signals, we can find the time difference and thus the distance:

$\Delta t = \frac{\Delta f\,t_{\mathrm{ramp}}}{f_{\mathrm{ramp}}},$ | (4.1.2a) |

$d = \frac{c\,\Delta f\,t_{\mathrm{ramp}}}{2f_{\mathrm{ramp}}}.$ | (4.1.2b) |
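As a quick sanity check on Eqs. (4.1.2), here is the arithmetic in Python. The ramp parameters are the software defaults quoted later in this post; the 100 kHz beat frequency is just an assumed measurement for illustration.

```python
# Illustrative only: ramp parameters from this post's software defaults
# and an assumed measured beat frequency of 100 kHz.
c = 299_792_458.0  # speed of light (m/s)
f_ramp = 300e6     # ramp bandwidth (Hz)
t_ramp = 1e-3      # ramp duration (s)
delta_f = 100e3    # assumed measured frequency difference (Hz)

delta_t = delta_f * t_ramp / f_ramp      # Eq. (4.1.2a): time delay (s)
d = c * delta_f * t_ramp / (2 * f_ramp)  # Eq. (4.1.2b): distance (m)
print(f"delay = {delta_t * 1e9:.0f} ns, distance = {d:.1f} m")
```

A 100 kHz beat with these defaults corresponds to roughly 50 m.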

To find the frequency difference we note that the transmitted signal has the following form (I've restricted $t$ and $t-\Delta t$ to be within the bounds of a single ramp period, which is similarly restricted by the implementation):

$s_{t}(t) = T\sin\left(\omega(t)\,t\right),$ | (4.1.3a) |

$s_{t}(t) = T\sin\left(2\pi f(t)\,t\right),$ | (4.1.3b) |

$s_{t}(t) = T\sin\left(2\pi t\left(f_{0}+t\,\frac{f_{\mathrm{ramp}}}{t_{\mathrm{ramp}}}\right)\right),$ | (4.1.3c) |

where $T$ is the transmission signal amplitude.

The received signal is just a time-delayed copy of the transmitted signal with a different amplitude. The amplitude changes because energy is lost as the wave propagates through air and the signal is amplified by the receiver. Energy is proportional to the amplitude squared. The expression for the received signal is

$s_{r}(t) = \frac{R}{T}\,s_{t}(t-\Delta t),$ | (4.1.4a) |

$s_{r}(t) = R\sin\left(\omega(t-\Delta t)\left(t-\Delta t\right)\right),$ | (4.1.4b) |

$s_{r}(t) = R\sin\left(2\pi f(t-\Delta t)\left(t-\Delta t\right)\right).$ | (4.1.4c) |

We can also write this expression as

$s_{r}=R\sin{\left(at-a\Delta t\right)},$ | (4.1.5) |

where $a\equiv 2\pi f(t-\Delta t)$. This is approximately equal to $R\sin\left(at\right)$. Look back at Eq. (4.1.1): if we take the max distance to be $1\ \mathrm{km}$ (see the range section), the maximum time delay is less than $6.7\ \mathrm{\mu s}$. The time delay is usually much less, since we typically restrict ourselves to $250\ \mathrm{m}$. In contrast, the value of $t$ during a sweep ranges from $0$ to $1\ \mathrm{ms}$. So, for almost the entire ramp period, it's fair to assume $t \gg \Delta t$. Using this approximation, the equation for our received signal turns into

$s_{r}(t) \approx R\sin\left(at\right),$ | (4.1.6a) |

$s_{r}(t) \approx R\sin\left(2\pi t\left(f_{0}+(t-\Delta t)\,\frac{f_{\mathrm{ramp}}}{t_{\mathrm{ramp}}}\right)\right).$ | (4.1.6b) |

The product of two sine functions can be equivalently represented as

$\sin{\theta}\sin{\phi}=\frac{1}{2}\left[\cos{\left(\theta-\phi\right)}-\cos{\left(\theta+\phi\right)}\right].$ | (4.1.7) |

In other words, as a combination of a sinusoid of the frequency sum and a sinusoid of the frequency difference. It's precisely the frequency difference that we want. Moreover, the frequency sum will be much higher than the frequency difference (about 4 orders of magnitude in our case). As a result, the circuitry that we use to pick up the difference frequency won't detect the sum frequency. Using this knowledge, we see that if we multiply the transmitted and received signals we get

$m \approx TR\sin\left(2\pi t\left(f_{0}+t\,\frac{f_{\mathrm{ramp}}}{t_{\mathrm{ramp}}}\right)\right)\sin\left(2\pi t\left(f_{0}+(t-\Delta t)\,\frac{f_{\mathrm{ramp}}}{t_{\mathrm{ramp}}}\right)\right),$ | (4.1.8a) |

$m \approx \frac{TR}{2}\cos{\left(2\pi t\,\Delta t\,\frac{f_{\mathrm{ramp}}}{t_{\mathrm{ramp}}}\right)},$ | (4.1.8b) |

$m \approx \frac{TR}{2}\cos{\left(2\pi t\,\Delta f\right)}.$ | (4.1.8c) |

So, using a Fourier transform on the mixer output will tell us the frequency difference and thus the distance.
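The whole operating principle can be checked numerically. The sketch below is a simulation, not the FPGA implementation: it generates a chirp and a delayed copy using the standard linear-chirp phase $2\pi(f_{0}t + f_{\mathrm{ramp}}t^{2}/2t_{\mathrm{ramp}})$, multiplies them as the mixer does, and recovers the distance from the FFT peak of the product. All parameter values come from elsewhere in this post except the target distance, which is assumed.

```python
import numpy as np

c = 299_792_458.0
f0, f_ramp, t_ramp = 5.6e9, 300e6, 1e-3
fs = 40e6                  # ADC sample rate used in this design
d_true = 100.0             # assumed target distance (m)
delay = 2 * d_true / c     # round-trip time (s)

N = 40000                  # one full ramp at 40 MS/s
t = np.arange(N) / fs
# Linear-chirp phase: instantaneous frequency f0 + (f_ramp/t_ramp) * t.
phase = lambda u: 2 * np.pi * (f0 * u + 0.5 * (f_ramp / t_ramp) * u**2)
m = np.sin(phase(t)) * np.sin(phase(t - delay))  # mixer output

spec = np.abs(np.fft.rfft(m * np.hanning(N)))
freqs = np.fft.rfftfreq(N, 1 / fs)
f_beat = freqs[np.argmax(spec)]
d_est = c * f_beat * t_ramp / (2 * f_ramp)  # Eq. (4.1.2b)
print(f"beat = {f_beat / 1e3:.0f} kHz -> distance = {d_est:.1f} m")
```

The sum-frequency term aliases when sampled like this, but its energy is smeared across the band while the beat tone stays concentrated, so the FFT peak still stands out; in the real hardware the analog lowpass filter removes the sum term before the ADC.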

### 4.2. Range

It's possible to relate the range of a remote object to the power received at a reception antenna using something known as the radar range equation. I've derived this equation in the range equation derivation section. The resulting equation is

$R=\sqrt[4]{\frac{P_{\mathrm{T}}G^{2}\lambda^{2}\sigma}{P_{\mathrm{R}}\left(4\pi\right)^{3}}}.$ | (4.2.1) |

All of these variables except reception power are constant and reasonably easy to calculate. They are presented in Table 6.

(See the power amplifier section for how the transmitted power was determined, and the horn antenna section for the antenna gain.)

Table 6: Values in the range equation.

This leaves us with a function relating range to received signal power. For a reason that will be clear momentarily, we've used received signal power as the dependent variable in Figure 7.

In the maximum and minimum received power sections, we derive the receiver power range as a function of the measured difference frequency. This frequency can be directly related to the distance using Eq. (4.1.2b). Although we can change the ramp duration and frequency bandwidth to our liking, for this calculation I'll use the software defaults of $t_{\mathrm{ramp}} = 1\ \mathrm{ms}$ and $f_{\mathrm{ramp}} = 300\ \mathrm{MHz}$. I've replotted the range plots from those sections as a function of distance in Figure 8, along with the received signal power we just found. Additionally, I've added $20\ \mathrm{dB}$ to the minimum power. The reason for this is that the signal has to be somewhat stronger than the background noise to be detectable. The choice of $20\ \mathrm{dB}$ is somewhat arbitrary, but is reasonable and frequently used.

The visible range corresponds to the range where the actual received power is between the minimum and maximum detectable powers. Additionally, we're restricted to distances corresponding to frequencies less than the Nyquist frequency ($1\ \mathrm{MHz}$, or $500\ \mathrm{m}$). From Figure 8 we see that this corresponds to distances between about $6\ \mathrm{m}$ and $450\ \mathrm{m}$.

It is important to note that this visible range (and the calculations performed in the minimum and maximum received power sections) correspond to just one possible radar configuration. For instance, it is possible to reconfigure the radar to work for longer ranges without making any hardware or FPGA modifications.

From the Operating Principle section (specifically, Eq. (4.1.2b)), we see that we can perform some combination of decreasing $f_{\mathrm{ramp}}$ and increasing $t_{\mathrm{ramp}}$ to increase the range. Let's set $f_{\mathrm{ramp}} = 150\ \mathrm{MHz}$ and $t_{\mathrm{ramp}} = 2\ \mathrm{ms}$. Figure 9 shows the range in this configuration.

This plot shows that we should be able to detect remote objects out to $1\ \mathrm{km}$. Moreover, we can probably achieve even longer ranges by adjusting $f_{\mathrm{ramp}}$ and $t_{\mathrm{ramp}}$ further.

Finally, it's worth mentioning that in the old version of the PCB that I'm currently using for testing, the noise floor is higher than it should be (the new version is designed to fix this issue, but has not yet been ordered and assembled). I'm measuring a noise floor of $-90\ \mathrm{dBFS}$ rather than the theoretically predicted value of $-102\ \mathrm{dBFS}$. This difference effectively raises the minimum power line in each of the above plots by the same amount, which somewhat decreases the maximum range of the old version.

#### 4.2.1. Range Equation Derivation

Next high-level section: PCB Design

Most of the steps in this derivation are identical to the derivation for the well-known Friis transmission formula.

Start by imagining that our transmission antenna is isotropic. The power density at some remote distance $R$ is given by

$p_{\mathrm{i}}=\frac{P_{\mathrm{T}}}{4\pi R^{2}},$ | (4.2.1.1) |

where $P_{\mathrm{T}}$ is the transmitted power. The $4\pi R^{2}$ comes from the fact that the isotropic antenna by definition radiates power equally in all directions and therefore the power density at some distance is inversely related to the surface area of the sphere centered at the transmission antenna. Now we replace the isotropic antenna with an antenna with gain $G_{\mathrm{T}}$ in the direction of $R$ and the equation becomes

$p=\frac{P_{\mathrm{T}}G_{\mathrm{T}}}{4\pi R^{2}}.$ | (4.2.1.2) |

At this distance, $R$, the transmitted signal reflects off a remote object. The total reflected power is

$P_{\mathrm{R}} = p\sigma,$ | (4.2.1.3a) |

$P_{\mathrm{R}} = \frac{P_{\mathrm{T}}G_{\mathrm{T}}\sigma}{4\pi R^{2}},$ | (4.2.1.3b) |

where $\sigma$ is the cross-sectional area of the remote object. This is a bit of a simplification, since the total reflected power will be dependent on several other factors such as the material properties of the remote object and its shape. We lump all these additional factors into the cross-sectional area, which therefore generally is not exactly equal to the object's actual cross-sectional area.

We now imagine the remote object simply as a remote isotropic antenna with transmission power given by the reflected power shown above. And we use the same method as the first step to find the power density available at the reception antenna:

$p_{\mathrm{R}} = \left(\frac{P_{\mathrm{T}}G_{\mathrm{T}}\sigma}{4\pi R^{2}}\right)\left(\frac{1}{4\pi R^{2}}\right),$ | (4.2.1.4a) |

$p_{\mathrm{R}} = \frac{P_{\mathrm{T}}G_{\mathrm{T}}\sigma}{\left(4\pi R^{2}\right)^{2}}.$ | (4.2.1.4b) |

To find the actual received power, we multiply the power density by the reception antenna's effective aperture and efficiency:

$P_{\mathrm{R}} = \frac{P_{\mathrm{T}}G_{\mathrm{T}}\sigma}{\left(4\pi R^{2}\right)^{2}}A_{\mathrm{e}}\eta_{\mathrm{rad}},$ | (4.2.1.5a) |

$A_{\mathrm{e}}\eta_{\mathrm{rad}} = \frac{G_{\mathrm{R}}\lambda^{2}}{4\pi},$ | (4.2.1.5b) |

$P_{\mathrm{R}} = \frac{P_{\mathrm{T}}G_{\mathrm{T}}G_{\mathrm{R}}\lambda^{2}\sigma}{\left(4\pi\right)^{3}R^{4}}.$ | (4.2.1.5c) |

This is the final result. We could equivalently rearrange our equation in terms of the distance,

$R=\sqrt[4]{\frac{P_{\mathrm{T}}G_{\mathrm{T}}G_{\mathrm{R}}\lambda^{2}\sigma}{P_{\mathrm{R}}\left(4\pi\right)^{3}}}.$ | (4.2.1.6) |

In our case the transmission and reception antennas are identical, so the equation simplifies to

$R=\sqrt[4]{\frac{P_{\mathrm{T}}G^{2}\lambda^{2}\sigma}{P_{\mathrm{R}}\left(4\pi\right)^{3}}}.$ | (4.2.1.7) |
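To illustrate how Eq. (4.2.1.7) gets used, the sketch below evaluates it for some plausible numbers. Only the 26 dBm transmit power comes from this post (the power amplifier section); the antenna gain, cross-section, and detection threshold are assumptions for the example, so the resulting range differs from the figures in the plots above.

```python
import math

c = 299_792_458.0
P_t = 10 ** ((26 - 30) / 10)  # 26 dBm -> ~0.4 W (from the PA section)
G = 10 ** (15 / 10)           # assumed 15 dBi antenna gain
lam = c / 5.75e9              # mid-band wavelength (m)
sigma = 1.0                   # assumed radar cross-section (m^2)

def received_power(R):
    """Eq. (4.2.1.5c) with identical antennas: received power at range R."""
    return P_t * G**2 * lam**2 * sigma / ((4 * math.pi) ** 3 * R**4)

def max_range(P_r_min):
    """Eq. (4.2.1.7): range at which P_R drops to P_r_min."""
    return (P_t * G**2 * lam**2 * sigma / (P_r_min * (4 * math.pi) ** 3)) ** 0.25

# Assumed detection threshold: -101 dBm thermal floor + 20 dB margin.
P_min = 10 ** ((-81 - 30) / 10)
print(f"max range = {max_range(P_min):.0f} m")
```

Note how weakly the range depends on everything under the fourth root: doubling the transmit power buys only about 19% more range.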

## 5. PCB Design

Figure 10 shows a simplified block diagram of how the radar PCB works. The data processing (shown as part of the FPGA section) is more flexible in the actual implementation. Notably, the user can decide what processing to perform and whether this processing should be performed on the FPGA or on the host PC (see the FPGA and software sections for more detail).

A frequency synthesizer generates a sinusoidal signal that ramps in frequency from $5.6$ to $5.9\ \mathrm{GHz}$ over a duration of $1\ \mathrm{ms}$ (the ramp start, stop and duration are all user-configurable at runtime in software). The signal is then amplified and most of the power is sent to the transmission antenna. The remaining power is redirected to a mixer for multiplication with the received signal. The echo of the transmitted signal is picked up by a reception antenna and amplified with a low-noise amplifier followed by a high-frequency amplifier. This amplified signal is mixed with the coupled portion of the transmitted signal and output as a differential signal. The mixer output is passed through a lowpass filter, which, together with an intermediate frequency amplifier, passes and amplifies signals between about $10\ \mathrm{kHz}$ and $1\ \mathrm{MHz}$. The signal is then digitized at a sampling rate of $40\ \mathrm{MHz}$ and passed to the FPGA. The FPGA first uses a polyphase FIR filter to simultaneously filter out frequencies above $1\ \mathrm{MHz}$ and downsample the signal by a factor of 20, which reduces the computational load for subsequent processing and transmission stages. The signal is then multiplied by a Kaiser window and finally transformed into its frequency composition with a 1024-point fast Fourier transform (FFT). The frequency bins are then sent to a host PC for further processing and real-time plotting.
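The FPGA stages described above can be prototyped in a few lines of NumPy/SciPy. This is a host-side sketch, not the gateware: the filter length and Kaiser beta are illustrative choices, and the input is a synthetic 200 kHz tone standing in for the ADC samples.

```python
import numpy as np
from scipy import signal

fs = 40e6  # ADC sample rate (Hz)
n = 20480  # leaves exactly 1024 samples after decimation by 20
t = np.arange(n) / fs
rng = np.random.default_rng(0)
adc = np.sin(2 * np.pi * 200e3 * t) + 0.1 * rng.standard_normal(n)

# Polyphase FIR: lowpass below the decimated Nyquist, then keep 1 in 20.
taps = signal.firwin(121, cutoff=0.9e6, fs=fs)
x = signal.resample_poly(adc, up=1, down=20, window=taps)

# Kaiser window and 1024-point FFT, as in the FPGA pipeline.
spec = np.abs(np.fft.rfft(x * np.kaiser(x.size, beta=6.0)))
freqs = np.fft.rfftfreq(x.size, d=20 / fs)
peak = freqs[np.argmax(spec)]
print(f"peak bin near {peak / 1e3:.0f} kHz")
```

The real pipeline does the same operations in fixed-point arithmetic on the FPGA; the point here is just the structure (filter, decimate, window, FFT) and the resulting bin spacing of about 2 kHz.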

### 5.1. Transmitter

The transmitter is responsible for generating a frequency ramp that is transmitted through space by an antenna and coupled back into the receiver. An ADF4158 frequency synthesizer and voltage-controlled oscillator (VCO) together generate the ramp signal. A resistive splitter then divides the signal, sending a quarter of the input power to the power amplifier for transmission and another quarter back to the frequency synthesizer (see the frequency synthesizer section). The power amplifier amplifies the signal, most of which is sent to the antenna for transmission. The remainder is coupled back to the receiver. The transmitter is shown in Figure 11.

Figure 11: Transmitter block diagram. I included a transmission line after the VCO to indicate that a transmission line with characteristic impedance of $50\text{\,}\Omega$ is required for each connection after the VCO. I omitted most of them in the diagram to save space, but they are present on the physical PCB.

#### 5.1.1. Frequency Synthesizer

Next high-level section: Receiver

The ADF4158 frequency synthesizer is based on a fractional-N phase-locked loop (PLL) design. The device is highly configurable; the full set of options is described in its datasheet. Chapter 13 of The Art of Electronics (3rd ed.) (AoE) provides an excellent description of how a PLL works. I explain the relevant points here. The frequency synthesizer consists of a phase detector and a VCO (our VCO is an external component). Figure 12 shows a block diagram of the synthesizer (note that this diagram has been adapted from AoE).

The phase detector, as the name suggests, outputs a voltage signal corresponding to the difference in phase between two input frequencies. The VCO generates a frequency that is proportional to an input voltage. (Disregard the frequency divider blocks for a minute; we'll get back to them.) $f_{\mathrm{REF}}$ is a reference frequency; in our case this is a $40\ \mathrm{MHz}$ clock. Imagine that the rising edge of $f_{\mathrm{REF}}$ occurs before the rising edge of $f_{\mathrm{VCO}}$. In response, the phase detector increases its output voltage in proportion to the duration of time that $f_{\mathrm{REF}}$ leads $f_{\mathrm{VCO}}$. This causes the VCO output frequency to increase accordingly, so the next edge of the VCO output occurs sooner than it otherwise would and the phase gap diminishes or reverses. The opposite occurs when $f_{\mathrm{REF}}$ lags behind $f_{\mathrm{VCO}}$. The phase detector detects differences in phase, not frequency. However, any difference in frequency leads to a growing phase difference, so the phase detector causes the frequency and phase of $f_{\mathrm{VCO}}$ to converge to those of $f_{\mathrm{REF}}$.

All we've done so far is take $f_{\mathrm{REF}}$ and generate another signal $f_{\mathrm{VCO}}$ with identical frequency and phase, which isn't particularly useful. This is where the frequency dividers come in. The R divider takes $f_{\mathrm{REF}}$ and outputs a frequency $f_{\mathrm{REF}}/r$. The N divider does the same thing for $f_{\mathrm{VCO}}$. So, what we're doing is setting

$\frac{f_{\mathrm{VCO}}}{n} = \frac{f_{\mathrm{REF}}}{r},$ | (5.1.1.1a) |

$f_{\mathrm{VCO}} = f_{\mathrm{REF}}\left(\frac{n}{r}\right).$ | (5.1.1.1b) |

By setting $n$ much larger than $r$, we can generate an output frequency that is much higher than the reference frequency. Figure 12 isn't completely accurate: it shows an integer-N frequency synthesizer, whereas the device we're using is a fractional-N frequency synthesizer. The practical difference is that our value of $n$ can be non-integral, which allows finer-grained control of the output frequency.
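As a concrete example of Eq. (5.1.1.1b), here is the divider ratio needed for a frequency inside this design's ramp. This is a simplified calculation: the ADF4158 actually splits $n$ into integer and fractional register fields, and any feedback-path details are ignored here.

```python
f_ref = 40e6       # reference clock used in this design (Hz)
r = 1              # assumed reference divider setting
f_target = 5.65e9  # a frequency inside the 5.6-5.9 GHz ramp

n = f_target * r / f_ref  # Eq. (5.1.1.1b) solved for n
print(f"n = {n}")         # non-integral, hence the need for fractional-N
```

An integer-N synthesizer with this reference could only step the output in 40 MHz increments; the fractional part of $n$ is what allows a smooth ramp.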

#### 5.1.2. Resistive Power Splitter

Next high-level section: Receiver

A resistive splitter is a circuit that splits the power at its input between two outputs. The splitter used in this design splits the power evenly between both outputs, though there are splitters designed for uneven splits. Because the splitter is resistive, it loses half of the input power to heat, which leaves each output $6\ \mathrm{dB}$ below the input power. A common alternative to a resistive splitter is a Wilkinson power divider. The Wilkinson divider has the benefit of being lossless: the power at each output is only $3\ \mathrm{dB}$ below the input. However, the price paid for this improved efficiency is bandwidth. Each branch of a Wilkinson divider is a quarter-wavelength transmission line, and because wavelength is inversely related to frequency, the divider is only suitable for a limited frequency band. The resistive splitter, by contrast, requires just three resistors, so its bandwidth is limited only by the parasitics of those resistors.

Our requirements for the splitter or divider are not very demanding. Our bandwidth is narrow, the power available at the splitter input is sufficient for each output even with the loss, and the power is low enough that heat dissipation doesn't pose a problem. Either device would therefore have worked fine. I've opted for the splitter since this allows us to use one less attenuator and therefore costs marginally less.

#### 5.1.3. Attenuator

Next high-level section: Receiver

An attenuator is a relatively simple circuit that produces an output power which is a desired fraction of the input power. Although I've used an IC for tighter tolerance and better frequency characteristics, the equivalent schematic is very simple and shown in Figure 13. There are several different resistor configurations to create an attenuator. The Pi attenuator shown below is just one of them.

We can get an even more complete picture if we explicitly show the source and load resistance. This is shown in Figure 14.

Typically, our source and load impedance will be matched to the characteristic impedance: $R_{\mathrm{S}}=R_{\mathrm{L}}=Z_{0}$. Now we need to choose $R_{1}$ and $R_{2}$ for the desired attenuation, subject to the constraint that the attenuator does not present an impedance mismatch and cause signal reflections. To satisfy this, we constrain the Thévenin resistance of the network to the right of $R_{\mathrm{S}}$ to equal the characteristic impedance, $Z_{0}$. It is then a simple matter of algebra to show that this sets the relation between $R_{1}$, $R_{2}$ and $Z_{0}$ to be

$R_{2}=\frac{2\,R_{1}Z_{0}^{2}}{R_{1}^{2}-Z_{0}^{2}}.$ | (5.1.3.1) |

To find the attenuation, we need to find the voltage transfer function for the attenuator, $V_{\mathrm{out}_{\mathrm{a}}}/V_{\mathrm{in}_{\mathrm{a}}}$. The attenuation will be the decibel value of the inverse of this ratio:

$I_{2} \equiv \text{current through $R_{2}$},$ | (5.1.3.2a) |

$I_{2} = \frac{V_{\mathrm{in}_{\mathrm{a}}}}{R_{2}+R_{1}\mathbin{\|}R_{\mathrm{L}}},$ | (5.1.3.2b) |

$V_{\mathrm{out}_{\mathrm{a}}} = V_{\mathrm{in}_{\mathrm{a}}}-I_{2}R_{2},$ | (5.1.3.2c) |

$x \equiv \frac{V_{\mathrm{out}_{\mathrm{a}}}}{V_{\mathrm{in}_{\mathrm{a}}}} = \frac{R_{1}\mathbin{\|}R_{\mathrm{L}}}{R_{2}+R_{1}\mathbin{\|}R_{\mathrm{L}}},$ | (5.1.3.2d) |

$A = 20\log_{10}{\left(\frac{1}{x}\right)}.$ | (5.1.3.2e) |

Finally, we can rearrange our equations to represent the attenuator resistors in terms of the voltage transfer function, $x$, and decibel attenuation, $A$, which are more useful:

$R_{1} = \frac{1+x}{1-x}Z_{0},$ | (5.1.3.3a) |

$R_{1} = \frac{1+10^{-A/20}}{1-10^{-A/20}}Z_{0},$ | (5.1.3.3b) |

$R_{2} = \frac{1-x^{2}}{2x}Z_{0},$ | (5.1.3.3c) |

$R_{2} = \frac{1-10^{-A/10}}{2\left(10^{-A/20}\right)}Z_{0}.$ | (5.1.3.3d) |
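Eqs. (5.1.3.3) are easy to spot-check against published pi-pad tables. A minimal sketch for a 6 dB pad in a 50 Ω system:

```python
def pi_attenuator(A_db, Z0=50.0):
    """Pi-attenuator resistor values from Eqs. (5.1.3.3)."""
    x = 10 ** (-A_db / 20)          # voltage transfer ratio
    R1 = (1 + x) / (1 - x) * Z0     # each shunt resistor
    R2 = (1 - x**2) / (2 * x) * Z0  # series resistor
    return R1, R2

R1, R2 = pi_attenuator(6.0)
print(f"R1 = {R1:.1f} ohm, R2 = {R2:.1f} ohm")  # ~150.5 and ~37.4 ohm
```

These match the standard table values for a 6 dB, 50 Ω pi pad, which is a good sign the algebra above is right.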

#### 5.1.4. Directional Coupler

Next high-level section: Receiver

A coupler is a device that is able to share the power of a signal at an input port between two output ports. Its schematic depiction is shown in Figure 15.

The signal enters at the input port and most of the energy passes through to the transmitted port. Some input energy, however, exits through the coupled port, where the amount is given by the coupling factor. See the relevant RF simulation section for more information on how this works. The insertion loss describes how much of the input power makes it to the transmitted port.

The equations for coupling factor and insertion loss are

$C = 10\log{\left(\frac{P_{3}}{P_{1}}\right)},$ | (5.1.4.1a) |

$L = -10\log{\left(\frac{P_{2}}{P_{1}}\right)}.$ | (5.1.4.1b) |

The coupler we use has a coupling factor of $-20\ \mathrm{dB}$ and the power amplifier has an output power of $26\ \mathrm{dBm}$, which gives a coupled power of $6\ \mathrm{dBm}$. The coupler's insertion loss is $0.17\ \mathrm{dB}$, which means the transmitted power is nearly the same as the input power, so we'll take it to be $26\ \mathrm{dBm}$.
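In dB terms, the bookkeeping above is just addition and subtraction; a quick check of the quoted numbers:

```python
P_in = 26.0            # dBm at the coupler input (PA output)
coupling = -20.0       # dB coupling factor
insertion_loss = 0.17  # dB

P_coupled = P_in + coupling        # power to the mixer LO path (dBm)
P_through = P_in - insertion_loss  # power toward the antenna (dBm)
print(f"{P_coupled:.2f} dBm coupled, {P_through:.2f} dBm through")
```

The 0.17 dB through-path loss is small enough to round away, which is why the transmitted power is quoted as 26 dBm.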

Another important quantity for a coupler (it is often one of the most important criteria for assessing the quality of a coupler) is the coupler's directivity. It is given mathematically as

$D=-10\log{\left(\frac{P_{4}}{P_{3}}\right)}.$ | (5.1.4.2) |

The directivity represents the ratio of the power available at the isolated port to the power available at the coupled port. Ideally, no power makes it to the isolated port, which would make the directivity infinite. Directivity permits the coupler to differentiate between forward and reverse signals. Having a high directivity is crucial for many devices (e.g. VNAs) but is less critical in our application. Indeed, the directional coupler used here was designed for simplicity of fabrication and to consume minimal board area, but exhibits poor directivity.

#### 5.1.5. Power Amplifier

Next high-level section: Receiver

The power amplifier does exactly what its name suggests: it increases the signal power for transmission. This design uses a SE5004L power amplifier. The most important metrics to consider when choosing a power amplifier are the gain and the $1\ \mathrm{dB}$ compression point. We also, of course, need to ensure the operating frequency range accommodates our signal frequency. This power amplifier has a small-signal gain of $32\ \mathrm{dB}$ and an output compression point of $34\ \mathrm{dBm}$. Its input power is $-6\ \mathrm{dBm}$, which puts the output power at $26\ \mathrm{dBm}$, well below the compression point. We could increase the max range somewhat by increasing the power amplification (see the range section). However, this would also increase the minimum range, which isn't a great tradeoff. Moreover, increasing the power amplification would increase the board's power consumption and heat dissipation, which we would need to mitigate. Finally, there are easier ways of increasing the max range, for instance by changing the intermediate frequency (IF) amplifier frequency response.

### 5.2. Receiver

A block diagram of the receiver is shown in Figure 16. Its function is to capture the transmitted signal's echo, amplify it without introducing too much noise, mix it with the transmitted signal down to an intermediate frequency, filter out non-signal frequencies and then amplify it again for digitization by the analog-to-digital converter (ADC). Note that Figure 16 shows just one of two identical receivers. The mixer and differential amplifier support two separate differential signal pairs, so that circuitry is shared between both receivers.

#### 5.2.1. Dynamic Range

The radar's dynamic range is the ratio between the maximum signal the receiver can process and its noise floor. See the maximum and minimum power sections for how the dynamic range was computed. We find that the theoretical dynamic range is about $102\ \mathrm{dB}$ and roughly constant across the signal bandwidth of interest. This is shown in Figure 17.

As discussed in the range section, the actual noise floor is higher than the theoretical value. A plot of the noise floor up to the decimated Nyquist frequency is shown in Figure 18.

Figure 18: Revision 1 PCB noise floor ($50\ \mathrm{kHz}$ to $1\ \mathrm{MHz}$). The average value is approximately $-88\ \mathrm{dBFS}$, compared with the theoretical value of $-102\ \mathrm{dBFS}$. The two spikes in the middle of the plot are caused by a noisy voltage converter switching at $500\ \mathrm{kHz}$. The noise floor should be significantly improved in the revision 2 PCB.

##### 5.2.1.1. Maximum Power

To find the maximum detectable power that avoids distortion, we can work backward from the ADC input voltage range of $\pm 1\ \mathrm{V}$. The IF amplifier amplifies this signal with a frequency-dependent gain, so the maximum power will also depend on frequency. The full-scale sine wave at the ADC input and the IF amplifier gain together give us the maximum mixer voltage output, which we can convert to a power level by first converting it to a root-mean-square (RMS) voltage and then to an average power using the mixer differential output resistance of $200\ \Omega$. The equation for this is

$$\begin{aligned}
P_{\mathrm{avg}} &= V_{\mathrm{RMS}}^{2}/R, &\text{(5.2.1.1.1a)}\\
&= \frac{\left(V_{\mathrm{A}}/\sqrt{2}\right)^{2}}{R}, &\text{(5.2.1.1.1b)}
\end{aligned}$$

where $V_{\mathrm{A}}$ is the voltage amplitude. Every component between the mixer input and the receiver antenna is matched to $50\ \Omega$, so we can convert our mixer output power to a $\mathrm{dBm}$ value and subtract the $\mathrm{dB}$ gain of the mixer, radio frequency (RF) amplifier and low-noise amplifier (LNA) to arrive at the maximum antenna power. The collective gain of the LNA, RF amplifier and mixer (see the minimum power section) is $22\ \mathrm{dB}$. I've performed this calculation over the full range of the IF amplifier gain. The results are plotted in Figure 19.
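To make the procedure concrete, here's a sketch of the calculation in Python. The $23.5\ \mathrm{dB}$ IF gain is a stand-in mid-band value (the true gain varies with frequency); the other numbers come from the text.

```python
import math

def max_antenna_power_dbm(adc_fullscale_v=1.0, if_gain_db=23.5,
                          mixer_r_ohm=200.0, frontend_gain_db=22.0):
    """Work backward from the ADC input range to the maximum antenna power.

    if_gain_db is a representative mid-band value only; the other
    defaults are the values given in the text.
    """
    # Largest undistorted sine amplitude at the mixer output
    v_amp = adc_fullscale_v / 10 ** (if_gain_db / 20)
    v_rms = v_amp / math.sqrt(2)
    p_avg_w = v_rms ** 2 / mixer_r_ohm           # Eq. (5.2.1.1.1)
    p_dbm = 10 * math.log10(p_avg_w / 1e-3)
    # Remove the LNA + RF amp + mixer gain to refer it to the antenna
    return p_dbm - frontend_gain_db
```

At this particular gain setting the maximum antenna power comes out to roughly $-41.5\ \mathrm{dBm}$; sweeping `if_gain_db` over the amplifier's gain range reproduces the kind of curve plotted in Figure 19.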

##### 5.2.1.2. Minimum Power

Computing the minimum detectable power is more complicated than computing the maximum power. It requires first finding the voltage noise at the ADC input and then using the same procedure as in the maximum power section to relate this to an equivalent input power level.

We start by computing the thermal noise power at the reception antenna. This can be found with the equation

$$P = k_{\mathrm{B}}TB, \qquad (5.2.1.2.1)$$

where $k_{\mathrm{B}} = 1.38\times 10^{-23}\ \mathrm{m^{2}\,kg\,s^{-2}\,K^{-1}}$ is the Boltzmann constant, $T = 300\ \mathrm{K}$ is room temperature and $B$ is the bandwidth. I'll start by using the Nyquist frequency for the bandwidth, $B = 20\ \mathrm{MHz}$, though this will decrease when we incorporate the process gain of the FFT. This evaluates to

$$\begin{aligned}
P &= 8.28\times 10^{-14}\ \mathrm{W}, &\text{(5.2.1.2.2a)}\\
&= -101\ \mathrm{dBm}. &\text{(5.2.1.2.2b)}
\end{aligned}$$
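This evaluation is easy to check numerically:

```python
import math

k_B = 1.38e-23   # Boltzmann constant, J/K
T = 300.0        # room temperature, K
B = 20e6         # Nyquist bandwidth, Hz

p_watts = k_B * T * B                       # Eq. (5.2.1.2.1)
p_dbm = 10 * math.log10(p_watts / 1e-3)     # referenced to 1 mW

print(f"{p_watts:.3g} W = {p_dbm:.0f} dBm")  # 8.28e-14 W = -101 dBm
```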

The LNA and RF amplifier have their inputs and outputs matched to $50\text{\,}\Omega$, and the mixer has its input matched to $50\text{\,}\Omega$. As a result, we can simply add the collective noise figure of these components to find the noise power at the mixer output. The equivalent noise figure of multiple impedance-matched components chained together can be computed using the Friis formula for noise:

$$F_{\mathrm{NET}} = F_{1} + \frac{F_{2}-1}{G_{1}}, \qquad (5.2.1.2.3)$$

where $F$ denotes a noise factor (noise figure is the decibel value of noise factor) and $G$ denotes the non-dB gain. The subscripts refer to the stage. The result is presented in Figure 20.
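The Friis formula generalizes to any number of stages by dividing each stage's excess noise by the total gain preceding it. Here is a small sketch; the two-stage example values at the bottom are hypothetical, not the radar's actual parts.

```python
import math

def friis_noise_figure(stages):
    """Cascaded noise figure via the Friis formula.

    stages: list of (noise_figure_db, gain_db) tuples, first stage first.
    Returns the cascade noise figure in dB.
    """
    f_total = 0.0
    g_running = 1.0   # linear gain of all stages seen so far
    for i, (nf_db, gain_db) in enumerate(stages):
        f = 10 ** (nf_db / 10)          # noise factor (linear)
        if i == 0:
            f_total = f
        else:
            f_total += (f - 1) / g_running
        g_running *= 10 ** (gain_db / 10)
    return 10 * math.log10(f_total)

# Hypothetical example: a 1.0 dB NF / 20 dB gain LNA followed by a
# 6 dB NF mixer. The LNA's gain suppresses the mixer's noise contribution.
print(friis_noise_figure([(1.0, 20.0), (6.0, 0.0)]))
```

Note how the first stage dominates: the cascade noise figure in the example stays close to the LNA's own 1.0 dB, which is why the LNA sits first in the chain.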

Figure 20: Noise figure and gain of each receiver stage, along with the cumulative values.

So, adding $1.7\text{\,}\mathrm{d}\mathrm{B}$ to the antenna noise power leaves us with $-99\text{\,}\mathrm{d}\mathrm{B}\mathrm{m}$ noise power at the mixer output. As in the maximum power section, we need to convert this to a voltage noise level since our components no longer have matched impedance values. The rest of the components use impedance bridging, so there is negligible voltage loading between stages. This voltage noise level will be multiplied by the IF amplifier gain and then added to its noise contribution, which we found to be $90.8\text{\,}\mathrm{n}\mathrm{V}\mathrm{/}\sqrt{Hz}$ in the IF amplifier noise section ($0.41\text{\,}\mathrm{m}\mathrm{V}$ over the full $20\text{\,}\mathrm{M}\mathrm{H}\mathrm{z}$ bandwidth). Finally, we add the $0.19\text{\,}\mathrm{m}\mathrm{V}$ found to be the ADC noise contribution and the quantization noise. The quantization noise power is assumed to be spectrally flat over the full Nyquist bandwidth and equal to

$$V_{\mathrm{q}} = \frac{q}{\sqrt{12}}, \qquad (5.2.1.2.4)$$

where $q = 1\ \mathrm{V_{LSB}} \approx 488\ \mathrm{\mu V}$ is the LSB voltage. Next, we have to adjust this value for the processing gain we get by filtering, downsampling and performing an FFT on the data. The easiest method to compute this is by noticing that we computed the voltage noise for the full $20\ \mathrm{MHz}$ Nyquist rate, but our FFT bin bandwidth is $1.953\ \mathrm{kHz}$. So, the voltage noise we see after processing is the old voltage noise divided by the square root of the ratio of these bandwidths. Then, we can convert this back to an input power level using the same procedure we used before. The result is plotted in Figure 21.
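The LSB voltage, quantization noise and processing gain can be sketched numerically from the values above:

```python
import math

lsb = 2.0 / 2**12                 # +/-1 V range, 12-bit ADC -> ~488 uV
v_q = lsb / math.sqrt(12)         # Eq. (5.2.1.2.4), ~141 uV RMS

nyquist_bw = 20e6                 # Hz
bin_bw = 1.953e3                  # FFT bin bandwidth, Hz

# Voltage noise shrinks by the square root of the bandwidth ratio
gain = math.sqrt(nyquist_bw / bin_bw)   # ~101x, i.e. ~40 dB of power

print(f"LSB={lsb*1e6:.0f} uV, Vq={v_q*1e6:.0f} uV, gain={gain:.1f}x")
```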

#### 5.2.2. Mixer


A multiplicative mixer is a device that multiplies two input signals. Many mixers (including the one used in this design) are based on a Gilbert cell, whose schematic is shown in Figure 22.

The lower differential pair sets the gain of each of the top two differential pairs. For instance, if $V_{\mathit{LO}}$ is made positive there will be more total emitter current for the left differential pair than for the right differential pair. The top input, $V_{\mathrm{in}}$, is cross connected between the top transistors. That is, $V_{\mathrm{in}+}$ is connected to the base of two transistors whose collectors are tied to different polarities of $V_{\mathrm{out}}$.

To understand the operation, start by setting $V_{\mathit{LO}}=0$. No matter what we set $V_{\mathrm{in}}$ to, $V_{\mathrm{out}}$ will always be 0. For instance, if $V_{\mathrm{in}}>0$, $Q_{1}$'s collector current will be greater than that of $Q_{2}$, but $Q_{4}$'s collector current will exceed that of $Q_{3}$ by the exact same amount (remember $V_{\mathit{LO}}=0$, so the differential gain of each amplifier is identical). Since the voltage drop across the leftmost $R_{\mathrm{C}}$ is determined by the combined currents into $Q_{1}$'s and $Q_{3}$'s collectors, and the drop across the rightmost $R_{\mathrm{C}}$ is determined by the combined currents into $Q_{2}$'s and $Q_{4}$'s collectors, the voltage drops will be identical. When $V_{\mathrm{in}}=0$, the collector current of each transistor within a differential pair will be the same. So, even if we change $V_{\mathit{LO}}$ to change the total emitter current for each upper differential pair, the output voltage will still be 0.

If we set $V_{\mathit{LO}}>0$, this increases the total emitter current for the left differential pair compared to the right differential pair. Then, $V_{\mathrm{in}}>0$ will increase $Q_{1}$'s and $Q_{4}$'s collector currents and decrease those for $Q_{2}$ and $Q_{3}$, but it will change $Q_{1}$'s and $Q_{2}$'s collector currents more than it changes the collector currents for $Q_{3}$ and $Q_{4}$. So the net effect is that $V_{\mathrm{out}}>0$. We can continue to analyze the circuit using the same procedure to find that the circuit's voltage transfer function is given by

$$V_{\mathrm{out}} = gV_{\mathrm{in}}V_{\mathit{LO}}, \qquad (5.2.2.1)$$

where $g$ is some gain factor.
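To connect this transfer function to frequency mixing, consider unit-amplitude sinusoidal inputs (a simplifying assumption). The standard product-to-sum identity shows that the output contains the sum and difference frequencies; in an FMCW radar the difference term is the low-frequency beat signal that the IF chain keeps, while filtering removes the sum term:

```latex
V_{\mathrm{out}} = g\cos(\omega_{\mathrm{RF}}t)\cos(\omega_{\mathit{LO}}t)
= \frac{g}{2}\left[\cos\!\big((\omega_{\mathrm{RF}}-\omega_{\mathit{LO}})t\big)
 + \cos\!\big((\omega_{\mathrm{RF}}+\omega_{\mathit{LO}})t\big)\right].
```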

#### 5.2.3. Differential Amplifier

The differential amplifier's input circuitry is worth discussing because it affects the amplifier's frequency response. There are three logical parts to this, separated by dashed lines in Figure 23. The pullup resistors at the mixer output set the DC bias point of the output to $5\text{\,}\mathrm{V}$. This is followed by an LC lowpass filter. And then the differential amplifier acts as a second-order active bandpass filter, with roughly $23.5\text{\,}\mathrm{d}\mathrm{B}$ of gain (though it varies with frequency). The DC blocking capacitors in the active filter serve an additional purpose, which is to allow the differential amplifier's input voltages to float down to the acceptable input range. This is critical because the $5\text{\,}\mathrm{V}$ bias point would be too high for the amplifier, which has a supply voltage of $3.3\text{\,}\mathrm{V}$.

The frequency response of this circuit is shown in Figure 24.

##### 5.2.3.1. Noise


The ADA4940 differential amplifier datasheet provides a method for estimating the output noise voltage given the feedback circuitry. Unlike the other receiver components, the differential amplifier is a relatively low-frequency component and does not have its ports matched to $50\text{\,}\Omega$. As a result, we characterize its noise in units of $\mathrm{V}\mathrm{/}\sqrt{Hz}$, instead of as a noise figure.

To calculate the root-mean-square (RMS) output differential voltage noise density, the datasheet provides the equation

$$v_{\mathrm{nOD}} = \sqrt{\sum_{i=1}^{8}v_{\mathrm{nO}i}^{2}}, \qquad (5.2.3.1.1)$$

where $v_{\mathrm{nO}i}$ denotes 8 different output noise sources. These sources can be calculated using the equations below. I've simplified them based on the fact that our resistors are the same at each input of the amplifier. The equations for the individual noise sources are

$$\begin{aligned}
R_{\mathrm{F}} &= 8.2\ \mathrm{k\Omega}, &\text{(5.2.3.1.2a)}\\
R_{\mathrm{G}} &= 549\ \Omega, &\text{(5.2.3.1.2b)}\\
G_{\mathrm{N}} &= 1+\frac{R_{\mathrm{F}}}{R_{\mathrm{G}}}, &\text{(5.2.3.1.2c)}\\
v_{\mathrm{nIN}} &= 3.9\ \mathrm{nV}/\sqrt{\mathrm{Hz}}, &\text{(5.2.3.1.2d)}\\
i_{\mathrm{nIN}} = i_{\mathrm{nIN+}} = i_{\mathrm{nIN-}} &= 0.84\ \mathrm{pA}/\sqrt{\mathrm{Hz}}, &\text{(5.2.3.1.2e)}\\
v_{\mathrm{nCM}} &= 92\ \mathrm{nV}/\sqrt{\mathrm{Hz}}, &\text{(5.2.3.1.2f)}\\
\beta &= \frac{R_{\mathrm{G}}}{R_{\mathrm{F}}}, &\text{(5.2.3.1.2g)}\\
T &= 300\ \mathrm{K}, &\text{(5.2.3.1.2h)}\\
k_{\mathrm{B}} &= 1.380\,648\,52\times 10^{-23}\ \mathrm{m^{2}\,kg\,s^{-2}\,K^{-1}}, &\text{(5.2.3.1.2i)}\\
v_{\mathrm{nO1}} &= G_{\mathrm{N}}v_{\mathrm{nIN}}, &\text{(5.2.3.1.2j)}\\
v_{\mathrm{nO2}} &= G_{\mathrm{N}}\left(i_{\mathrm{nIN}}\times\left(1/R_{\mathrm{G}}+1/R_{\mathrm{F}}\right)^{-1}\right), &\text{(5.2.3.1.2k)}\\
v_{\mathrm{nO3}} &= v_{\mathrm{nO2}}, &\text{(5.2.3.1.2l)}\\
v_{\mathrm{nO4}} &= 0, &\text{(5.2.3.1.2m)}\\
v_{\mathrm{nO5}} &= G_{\mathrm{N}}(1-\beta)(4k_{\mathrm{B}}TR_{\mathrm{G}})^{1/2}, &\text{(5.2.3.1.2n)}\\
v_{\mathrm{nO6}} &= v_{\mathrm{nO5}}, &\text{(5.2.3.1.2o)}\\
v_{\mathrm{nO7}} &= (4k_{\mathrm{B}}TR_{\mathrm{F}})^{1/2}, &\text{(5.2.3.1.2p)}\\
v_{\mathrm{nO8}} &= v_{\mathrm{nO7}}. &\text{(5.2.3.1.2q)}
\end{aligned}$$

Plugging all this in, we get $v_{\mathrm{nOD}} = 90.8\ \mathrm{nV}/\sqrt{\mathrm{Hz}}$.
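The sum in Eq. (5.2.3.1.1) can be reproduced numerically from the values above:

```python
import math

# Component and noise values from Eqs. (5.2.3.1.2a)-(5.2.3.1.2i)
R_F, R_G = 8.2e3, 549.0
G_N = 1 + R_F / R_G
v_nIN = 3.9e-9          # V/sqrt(Hz)
i_nIN = 0.84e-12        # A/sqrt(Hz)
beta = R_G / R_F
k_B, T = 1.38e-23, 300.0

v1 = G_N * v_nIN
v2 = v3 = G_N * i_nIN / (1 / R_G + 1 / R_F)    # current noise into R_G || R_F
v4 = 0.0            # common-mode term vanishes with matched resistors
v5 = v6 = G_N * (1 - beta) * math.sqrt(4 * k_B * T * R_G)
v7 = v8 = math.sqrt(4 * k_B * T * R_F)

v_nOD = math.sqrt(sum(v**2 for v in (v1, v2, v3, v4, v5, v6, v7, v8)))
print(f"{v_nOD*1e9:.1f} nV/sqrt(Hz)")   # 90.8 nV/sqrt(Hz)
```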

### 5.3. Digitization, Data Processing and USB Interface

An ADC converts differential analog signals from the receiver into 12-bit digital signals that are then sent to the FPGA for processing. The FPGA supports a two-way communication interface with the FT2232H USB interface chip that allows it to send processed data to the host PC for further processing and receive directions and a bitstream from the PC. A block diagram of this is shown in Figure 25. I've omitted the ADC's second input channel since it's identical to the first. Both channels are time-multiplexed across the same 12-bit data bus to the FPGA.

#### 5.3.1. ADC

This design uses an LTC2292 ADC.

##### 5.3.1.1. Input Filter

The ADC employs common-mode and differential-mode lowpass filters at its analog inputs (see Figure 25). The series resistors and outer capacitors form a lowpass filter for common-mode signals with a cutoff frequency of $32\ \mathrm{MHz}$ ($f_{\mathrm{c}} = 1/(2\pi RC)$). The series resistors and the inner capacitor form a differential-mode lowpass filter with a cutoff frequency of $16\ \mathrm{MHz}$ ($f_{\mathrm{c}} = 1/(2\pi\cdot 2RC)$). As described in the datasheet, the filters prevent the ADC's sample-and-hold charging glitches from affecting the IF amplifier (remember the IF amplifier outputs are used for feedback). Additionally, they limit wideband noise, which could alias into our signal. It's also important to keep the resistor values low to limit thermal noise.
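A quick numeric sketch shows why the differential cutoff is half the common-mode cutoff: the differential path sees twice the resistance. The component values here are hypothetical, chosen only to reproduce the stated cutoffs; the actual PCB values may differ.

```python
import math

# Hypothetical component values (not taken from the schematic)
R = 100.0      # each series resistor, ohms
C = 50e-12     # each filter capacitor, farads

f_cm = 1 / (2 * math.pi * R * C)        # common-mode cutoff, ~32 MHz
f_dm = 1 / (2 * math.pi * 2 * R * C)    # differential cutoff, ~16 MHz

print(f"common-mode: {f_cm/1e6:.1f} MHz, differential: {f_dm/1e6:.1f} MHz")
```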

##### 5.3.1.2. Noise

The signal-to-noise ratio (SNR) of an ADC is the ratio of the RMS value of a full-scale sinusoidal input (amplitude equal to the maximum input of the ADC) to the RMS voltage noise. Mathematically,

$$\mathrm{SNR} = 20\log_{10}{\left(\frac{V_{\mathrm{FSW}_{\mathrm{RMS}}}}{V_{\mathrm{n}_{\mathrm{RMS}}}}\right)}. \qquad (5.3.1.2.1)$$

The datasheet reports an SNR of $71.4\text{\,}\mathrm{d}\mathrm{B}$. We can use this to compute the voltage noise over the $20\text{\,}\mathrm{M}\mathrm{H}\mathrm{z}$ bandwidth:

$$\begin{aligned}
71.4 &= 20\log_{10}{\left(\frac{1\ \mathrm{V}/\sqrt{2}}{V_{\mathrm{n}_{\mathrm{RMS}}}}\right)}, &\text{(5.3.1.2.2a)}\\
V_{\mathrm{n}_{\mathrm{RMS}}} &= 0.19\ \mathrm{mV}. &\text{(5.3.1.2.2b)}
\end{aligned}$$
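Solving the SNR equation for the noise voltage is a one-liner:

```python
import math

snr_db = 71.4                  # from the LTC2292 datasheet
v_fs_rms = 1.0 / math.sqrt(2)  # full-scale 1 V amplitude sine, RMS

# Eq. (5.3.1.2.1) solved for the RMS noise voltage
v_n_rms = v_fs_rms / 10 ** (snr_db / 20)

print(f"{v_n_rms*1e3:.2f} mV")   # 0.19 mV
```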

#### 5.3.2. FPGA

The heart of this project is its use of an FPGA, which offers unique performance attributes and design challenges. The FPGA serves two purposes: it performs data processing, and it acts as an interface for configuring the other ICs on the PCB. An FPGA can implement any digital logic circuit, within resource limitations, which gives us tremendous flexibility in how we process the raw data and how we interact with and configure the other PCB ICs. The benefit of using the FPGA rather than software on the host PC for data processing is that it can run these algorithms faster than traditional software, and it can reduce the load on the USB data link between the radar and host PC. On the second point, the bandwidth between the radar and host PC is limited by the USB 2.0 High Speed transfer rate of $480\ \mathrm{Mbps}$, which is an upper bound (the practical speed is lower, typically about $320\ \mathrm{Mbps}$). By using the memory facilities of the FPGA as a buffer, we can send the raw data directly to the host PC and process and plot it in real time. However, if our data link were slower (e.g., if the host computer only supported a USB 1.0 interface), we could reduce the load on the link by performing the processing directly on the FPGA: the FIR decimation filter keeps only 1 sample out of every 20 inputs, shrinking the data volume by the same factor.

This design uses an XC7A15T FPGA from Xilinx, which is a low-cost FPGA but is nonetheless sufficient for our needs. One additional benefit of using this FPGA is that it is pin-compatible with a number of higher resource count variants. Consequently, if our data processing needs expand in the future, we can use one of those variants as a drop-in replacement without otherwise altering the PCB.

##### 5.3.2.1. Implementation


The basic building block of an FPGA is a lookup table (LUT). Although practical LUTs can have any power-of-2 number of entries, these larger LUTs are all composed of a collection of simple 2-entry LUTs, shown in Figure 26.

The operation is very simple: to output the bit in the upper register we deassert the select line and to output the bit in the lower register we assert the select line. The register bits, called the bitmask, can be independently set.

We can already use this simple LUT to mimic the behavior of single-input logic gates. If we wanted the behavior of a buffer, we could write a 0 to bit 0 and a 1 to bit 1. The select line acts as the buffer input. A 0 here outputs a 0 and a 1 outputs a 1. To implement a NOT gate, we'd write a 1 to bit 0 and a 0 to bit 1.

It's instructive to double the size of the LUT to see how we can replicate two-input logic gates. A schematic for this is shown in Figure 27.

The select lines of the leftmost multiplexers are tied together to form s0. The truth table for an AND gate is shown in Figure 28. I've additionally included the corresponding bitmask bit number to show how the AND gate can be implemented.

Figure 28: Truth table for an AND gate, along with the necessary bit setting for an equivalent LUT.

So, the bitmask necessary to implement an AND gate is the bit in the out column for each bit number. In other words, bits 0 to 2 should be set to 0 and bit 3 should be set to 1. We can adjust this truth table for any other 2-input gate to find the correct bitmask to implement that gate.
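The select-lines-index-into-a-bitmask behavior is easy to model in software. This is a minimal sketch of the idea, not how any vendor tool represents LUTs:

```python
def lut(bitmask, *inputs):
    """Evaluate an n-input LUT: the inputs form an index into the bitmask.

    inputs are given most-significant select line first.
    """
    index = 0
    for bit in inputs:
        index = (index << 1) | bit
    return (bitmask >> index) & 1

# AND gate: bit 3 set, bits 0-2 clear (the truth table in Figure 28)
AND_MASK = 0b1000

for a in (0, 1):
    for b in (0, 1):
        assert lut(AND_MASK, a, b) == (a & b)

# Any other 2-input gate is just a different bitmask:
OR_MASK, XOR_MASK = 0b1110, 0b0110
```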

Theoretically, we could now implement any combinational digital logic circuit with a sufficiently large interconnected array of these size-4 LUTs. Unfortunately, our processing circuit would be glacially slow. In particular, we haven't added any register elements (apart from the bit mask) so we wouldn't be able to pipeline our processing stream. Actual FPGAs incorporate LUTs and registers into larger structures often called logic blocks, which are designed to optimize pipeline performance for typical use cases.

FPGA implementations also need to address a number of other practical design challenges. Some of the most prominent of these are data storage, multiplication and clock generation and distribution. Although we can store processing data in the registers within logic blocks, we frequently need more storage than this provides. Additionally, using these registers for storage limits their use in pipelining, making the FPGA configuration slower and more difficult to route. To address this need, all but the simplest FPGAs come with dedicated RAM blocks.

Similarly, while LUTs can perform multiplication, they are needlessly inefficient in this regard. Since multiplication is highly critical in digital signal processing (DSP) applications, which is one of the major use cases for FPGAs, FPGAs have built-in hardware multipliers.

Finally, in regard to clocks, FPGAs often need to create derivative clock signals. This allows a portion of a design to be clocked at a much higher rate which correspondingly increases the throughput of the data processing stream. So, they are equipped with a number of phase-locked loops (PLLs). Additionally, it is critical that clock signals reach all registers with as little skew as possible. This makes timing closure (ensuring that all signals reach their destination register at the right time) much easier to achieve by the software tools that allocate FPGA resources. Because of this, FPGAs are built with dedicated, low-skew clock lines to all registers.

FPGAs are highly-complicated devices that require many sophisticated design considerations. This section mentioned only a few of the main ones.

#### 5.3.3. FT2232H

The FT2232H is a USB interface chip that permits 2-way communication with the FPGA and enables us to load a bitstream onto the FPGA. The attached EEPROM chip is required to use the FT2232H in its FT245 synchronous FIFO mode, which allows the full $480\text{\,}\mathrm{M}\mathrm{b}\mathrm{p}\mathrm{s}$ data transfer speed.

### 5.4. Power

The radar receives $12\ \mathrm{V}$ of power through a barrel jack connector. The power circuitry then converts and regulates this down to the supply voltage expected by each integrated circuit on the PCB. Although the power circuitry is fairly self-explanatory, a few things deserve mention. Buck converters bring the input $12\ \mathrm{V}$ down to an intermediate voltage level, and then linear regulators take that intermediate voltage and produce the final supply voltage for each component. Buck converters are more energy-efficient than linear regulators, and linear regulators are less noisy than buck converters. By chaining both together, we get the best of both worlds. To achieve the energy efficiency, each buck converter should output a voltage only slightly above the target output voltage of the downstream linear regulator. For this to work, the regulator's dropout voltage (the minimum input-to-output difference it requires) must be smaller than this headroom, which is why low-dropout (LDO) regulators are used.

Additionally, we have to pay special attention to the power supply rejection ratio (PSRR) of the linear regulators. Voltage regulators have high PSRR at low frequencies, but the PSRR typically falls off significantly as the frequency increases. Most of the buck converter noise occurs at its switching frequency, with additional noise at harmonics of this switching frequency (e.g. $2f_{0},\,3f_{0},\,\ldots$). So, we need to ensure that the linear regulator PSRR is sufficiently high at the switching frequency and preferably for the first harmonics as well.

Finally, of course, we need to make sure that the buck converters and linear regulators can supply enough output current for all downstream devices under full load.

#### 5.4.1. Voltage Regulator Implementation


A simple implementation of a voltage regulator is shown in Figure 29. This has been adapted from AoE.

The operation of this circuit is very simple. The non-inverting input of the opamp is set by a current source supplying a zener diode. The current source provides a constant current over changes in the input voltage, so the voltage across the zener remains fixed at the desired reference. The inverting input is tied to a voltage divider output from the output voltage. The resistors are chosen to provide a desired output voltage for the reference voltage. Mathematically,

$$V_{\mathrm{out}} = V_{\mathrm{REF}}\left(1+\frac{R_{1}}{R_{2}}\right). \qquad (5.4.1.1)$$
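As a quick numeric example of Eq. (5.4.1.1), with hypothetical component values (a common zener-like $1.25\ \mathrm{V}$ reference and a convenient divider, not values from the radar):

```python
# Hypothetical values: a 1.25 V reference with R1 = 1.6 kOhm, R2 = 1 kOhm
V_REF, R1, R2 = 1.25, 1600.0, 1000.0

V_out = V_REF * (1 + R1 / R2)   # Eq. (5.4.1.1)
print(V_out)                    # 3.25
```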

Finally, a Darlington pair provides current amplification. Darlington pairs often have current gains on the order of $1000$ or more, so even if our opamp could supply only about $1\ \mathrm{mA}$ of current, this regulator would still be able to supply around $1\ \mathrm{A}$ to a downstream device.

It's important to consider $Q_{2}$'s power dissipation. Specifically, it will dissipate

$$P_{Q_{2}} = \left(V_{\mathrm{in}}-V_{\mathrm{out}}\right)\left(I_{\mathrm{out}}+\frac{V_{\mathrm{out}}}{R_{1}+R_{2}}\right) \approx \left(V_{\mathrm{in}}-V_{\mathrm{out}}\right)I_{\mathrm{out}}. \qquad (5.4.1.2)$$

For large voltage drops and large currents, this can be a significant amount of power, which the transistor must be designed to handle. Moreover, this is wasted power, which is the reason for the use of LDO regulators in the radar.

It's also worth noting that this implementation has some drawbacks. For instance, it provides no current limiting and thus could be damaged if the output were shorted.

#### 5.4.2. Voltage Converter Implementation


The operation of a voltage converter is more complicated than that of a voltage regulator. In this discussion we'll restrict ourselves to buck converters (converters whose output voltage is lower than their input voltage), since that is what the radar uses. However, the operation of boost converters and inverters is similar.

Figure 30 shows a possible implementation of a buck converter. This schematic has been adapted from AoE.

Ignore everything attached to the nMOS gate for the moment; we'll come back to it later. The salient point for now is that the transistor is switched periodically, spending a fraction $D$ of each period (called the duty cycle) in the on state and the remaining fraction, $1-D$, in the off state.

We can describe this circuit with two different equations: one when the transistor is switched on and one when it is off.

In the on state,

$$\begin{aligned}
-V_{\mathrm{in}} + V_{\mathrm{L}} + V_{\mathrm{out}} &= 0, &\text{(5.4.2.1a)}\\
L\frac{\Delta I_{\mathrm{on}}}{\Delta t_{\mathrm{on}}} &= V_{\mathrm{in}} - V_{\mathrm{out}}, &\text{(5.4.2.1b)}\\
\Delta t_{\mathrm{on}} &= DT, &\text{(5.4.2.1c)}\\
\Delta I_{\mathrm{on}} &= \frac{DT}{L}\left(V_{\mathrm{in}}-V_{\mathrm{out}}\right). &\text{(5.4.2.1d)}
\end{aligned}$$

And in the off state,

$$\begin{aligned}
V_{\mathrm{D}} + V_{\mathrm{L}} + V_{\mathrm{out}} &= 0, &\text{(5.4.2.2a)}\\
V_{\mathrm{D}} &\approx 0, &\text{(5.4.2.2b)}\\
V_{\mathrm{L}} + V_{\mathrm{out}} &\approx 0, &\text{(5.4.2.2c)}\\
L\frac{\Delta I_{\mathrm{off}}}{(1-D)T} &\approx -V_{\mathrm{out}}, &\text{(5.4.2.2d)}\\
\Delta I_{\mathrm{off}} &\approx -\frac{(1-D)T}{L}V_{\mathrm{out}}. &\text{(5.4.2.2e)}
\end{aligned}$$

To simplify the analysis, Eq. (5.4.2.2b) approximates the diode as having zero voltage drop. This is a reasonable approximation since the forward voltage of a Schottky diode is fairly small. If we were designing an actual switching converter, we'd have to include it.

We can now impose the constraint

$$\Delta I_{\mathrm{on}} + \Delta I_{\mathrm{off}} = 0, \qquad (5.4.2.3)$$

since otherwise the current would increase or decrease without bound. When we substitute in the current changes during the on and off states, we are left with the following expression relating the input and output voltages:

$$V_{\mathrm{out}} = DV_{\mathrm{in}}. \qquad (5.4.2.4)$$

Additionally, there are no lossy components here (aside from the small voltage drop between the nMOS drain and source) and so this device is theoretically 100% efficient. That is, $V_{\mathrm{in}}I_{\mathrm{in}}=V_{\mathrm{out}}I_{\mathrm{out}}$. The output capacitor will smooth the output $\Delta I_{\mathrm{on}}$, $\Delta I_{\mathrm{off}}$ ramp, and the average output voltage will be given by Eq. (5.4.2.4). Of course, real switching converters exhibit some loss, but the efficiency can often be upwards of 90%.
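The on/off current balance can be checked numerically. The operating point below (12 V in, 5 V out, 500 kHz switching, 22 µH inductor) is a hypothetical example, not the radar's actual converter design:

```python
# Hypothetical operating point
V_in, V_out = 12.0, 5.0
f_sw, L = 500e3, 22e-6
T = 1 / f_sw

D = V_out / V_in                        # Eq. (5.4.2.4), ~0.417
dI_on = D * T / L * (V_in - V_out)      # Eq. (5.4.2.1d)
dI_off = -(1 - D) * T / L * V_out       # Eq. (5.4.2.2e)

# Steady state: the on-state ramp-up cancels the off-state ramp-down
assert abs(dI_on + dI_off) < 1e-9       # Eq. (5.4.2.3)
print(f"D={D:.3f}, inductor current ripple={dI_on:.3f} A")
```

The smoothing capacitor and the load then see the average of this triangular ripple, which is why the output sits at $DV_{\mathrm{in}}$.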

Before discussing the circuitry driving the transistor gate, it's worth addressing the Schottky diode which appears to play no role in the converter's function. Let's imagine what would happen if the diode were not present. When the transistor switches off, the inductor will go from seeing some current through it (call it $I_{0}$) to almost immediately seeing no current through it. The defining equation of an inductor is $V=LdI/dt$, so for very small $dt$ the inductor creates an enormous spike in the voltage across its leads trying to keep its current constant. This translates into the same huge spike at the nMOS source which is likely to damage the transistor. The presence of the Schottky provides a path for current to flow so that the voltage across the inductor peaks at one Schottky forward voltage drop.

Now we can move on to the gate driver circuitry. The oscillator block produces two signals: a clock pulse and a sawtooth ramp with the same period and phase as the clock pulse. This is shown graphically in Figure 31.

Figure 31: Switching converter duty cycle control. Clock pulse (top) and sawtooth ramp (bottom).

The error amplifier outputs a voltage proportional to the difference between the reference voltage and the divided output voltage. The flip flop in this circuit is a set-reset (SR) flip flop: the instant a high voltage arrives at its set input the output goes high, and the instant a high voltage arrives at its reset input the output goes low. The output then holds its last value until another set or reset pulse changes it. For example, even though the clock pulse drops low immediately after being asserted, the output remains high until the reset pin is asserted. So, when the clock pulse arrives, the output will go high and remain that way until the sawtooth ramps to a voltage higher than the error voltage.

To understand this better imagine what happens when the output voltage is higher than its target set by the feedback loop. In this case, the error amplifier decreases its voltage and the duty cycle of the nMOS is reduced. We determined above that $V_{\mathrm{out}}=DV_{\mathrm{in}}$, so this has the effect of decreasing the output voltage. The opposite phenomenon occurs when the output voltage is below its target.

### 5.5. PCB layout

Images of each layer of the complete PCB layout are shown in Figures 32, 33, 34, and 35.

Figure 32: PCB layer 1. The top layer is a signal layer containing the sensitive RF circuitry on the right side and the digital circuitry on the left side.

Figure 35: PCB layer 4. The bottom layer is a signal layer and contains most of the power supply circuitry.

This is a 4-layer PCB in which the top and bottom layers are signal layers and the middle two layers are unbroken ground planes. The right portion of the top layer contains the transmitter and receiver, the left portion of the top layer contains mostly digital circuitry, and the bottom layer is primarily devoted to the power supply circuitry.

This separation of noise-sensitive RF circuitry from the digital circuitry is deliberate. Digital ICs tend to be noisy, and without proper separation, fast digital signals can inject noise into the receiver.

Despite physically separating the digital and analog circuitry, I've decided to use the same unbroken ground plane for each. The rationale for this is that the physical separation should be sufficient to prevent crosstalk from reaching the analog circuits. Moreover, separating the ground planes can create major issues if certain mistakes are made. In particular, there are a few digital signals that pass from the digital section into the analog section. Like all signal traces, these need low-inductance current return paths. When the ground planes are unbroken, the current can return on the ground plane directly adjacent to the relevant signal layer. When the ground planes are broken, the return current needs to find some other path to reach the source. This problem can be avoided by connecting the ground planes with a trace where the signal trace crosses the gap between the ground planes. When this return path is omitted, the fallback return path may have high inductance, creating voltage ringing and noise in the circuit. The additional isolation provided by separated ground planes is typically insignificant and therefore it's safer to keep the ground planes connected.

Another noteworthy decision is that I've used a ground plane on each internal layer, rather than a ground plane on one layer and a power plane on the other layer. Having a dedicated power plane can be useful when most of the PCB's circuitry is served by a single power plane. When this is the case, integrated circuits can receive power through a via between the power rail pin and power plane. This frees up routing space for signal traces. However, this radar requires a substantial number of power rails, which would either require a segmented power plane or a single continuous power plane that only supplies a few power pins leaving the remaining power pins to be supplied by traces. The first solution creates potential hazards when signal traces pass over gaps between individual power planes. The problem with this is identical to the problem raised by breaking up ground planes. Specifically, it can interrupt return currents and add noise to the PCB. The second solution provides us with essentially no benefit. Therefore, it makes more sense to just use an additional ground plane.

The third major decision I've made in regard to grounding is to use a ground fill (a ground plane filling the remaining space on a signal layer) around the sensitive RF circuitry, but not around the digital section on the left side of the top layer or around the power supply on the bottom layer. A ground fill, when used correctly with via fences and via stitching, can increase isolation between sensitive RF traces. A via fence is a series of tightly-spaced ground vias placed on either side of a trace, typically a transmission line. They act as a wall to fringing fields spreading from the transmission line and interfering with other traces or components. These are readily apparent around the microstrip transmission lines in Figure 32.

I've also added stitching vias in this section. Stitching vias are ground vias placed in a regular grid pattern to connect two or more ground regions. They provide a solid electrical (and thermal) connection between the ground fill and inner ground planes. These ensure that regions of the ground fill do not act like unintentional antennas. The general rule-of-thumb for via fences and stitching vias is to place them at a separation of $\lambda_{\mathrm{min}}/10$ or $\lambda_{\mathrm{min}}/20$, where $\lambda_{\mathrm{min}}$ is the wavelength corresponding to the highest frequency signal on the PCB, which in our case is $6\text{\,}\mathrm{G}\mathrm{H}\mathrm{z}$.
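As a concrete example of this rule-of-thumb, the spacing works out to roughly a millimeter at this frequency. The sketch below assumes an FR4-like dielectric with $\varepsilon_r \approx 4.4$; the actual stackup's permittivity isn't restated here, so treat that value as an assumption:

```python
# Via-fence/stitching spacing from the lambda/20 rule-of-thumb.
# eps_r = 4.4 is an assumed FR4-like permittivity, not a value from the text.
c = 299_792_458.0         # speed of light in vacuum, m/s
f_max = 6e9               # highest signal frequency on the PCB, Hz
eps_r = 4.4               # assumed relative permittivity of the dielectric
wavelength = c / (f_max * eps_r ** 0.5)  # wavelength inside the dielectric, m
spacing = wavelength / 20                # conservative lambda/20 via spacing, m
print(f"lambda: {wavelength * 1e3:.1f} mm, max via spacing: {spacing * 1e3:.2f} mm")
```

With the looser $\lambda_{\mathrm{min}}/10$ rule the allowed spacing simply doubles.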

I've omitted the ground fill in other regions because of this unintended antenna effect. The improved isolation between traces is important in the sensitive RF section, but is less critical in the digital and power supply regions. Moreover, these regions have a higher density of traces, which would cause a ground fill here to be broken up more. This increases the chance of leaving an unconnected island of copper that could act as an antenna and increase noise in the resulting circuit.

Decoupling (or bypass) capacitor placement is another important consideration for reducing circuit noise. Decoupling capacitors are typically placed adjacent to ICs on a PCB. They serve essentially two purposes, which are thematically different but physically the same. The first is to provide a low-impedance supply of charge to the integrated circuit. This is especially important for digital ICs, which draw most of their current during a small, periodic fraction of each clock cycle. These sudden rises in current draw are called current transients, and the decoupling capacitor acts as the supply of charge for them. Power supplies alone cannot satisfy these sudden demands, since they take time to adjust their output current.

The second purpose (which is physically the same) is to prevent noise from leaving the IC and reentering the power supply. This noise is caused by the same sudden changes in current; the decoupling capacitors provide a low-impedance path for it to ground. Analog circuits also require decoupling capacitors, primarily to further isolate them from the noise of digital circuits.

Decoupling capacitors fall into two size classes. The first and most important are the small (in both size and value) decoupling capacitors placed immediately adjacent to the ICs whose power supply they decouple. Because these capacitors come in small packages, they have low inductance (inductance is the most critical impedance factor at high frequencies) and can therefore supply current for very fast transients. Their relatively low capacitance values reflect the fact that it's more expensive to fit a large capacitance into a small package; otherwise, you would want the largest value available for the size. The ideal capacitor for this purpose is typically a $100\,\mathrm{nF}$ 0402 ceramic capacitor. The poor temperature stability and frequency dependence of most ceramic capacitors does not matter for bypassing. The second class are bulk decoupling capacitors, which come in much larger packages and provide more capacitance. Typically, a single $100\,\mathrm{nF}$ 0402 capacitor is used for each power and ground pin pair, and a single bulk decoupling capacitor is used for all the pins of one power supply. The small capacitors provide charge for the fast transients, and the bulk capacitor recharges the small capacitors. As a result, the bulk capacitor's value should be at least as large as the sum of all the smaller capacitors it serves. Follow this general rule unless a component datasheet provides its own decoupling recommendations, in which case follow those instead. For instance, the FPGA in this design provides specific decoupling guidelines.

In regard to layout, the small decoupling capacitors should be placed in a way that minimizes the inductance between the power supply and ground pins. Where possible, I've placed the decoupling capacitor immediately next to the IC it serves with direct, short connections to the power and ground pins. In cases where this would block a signal trace, I've placed the bypass capacitor as close to the pins it serves as possible and placed two vias to the ground plane: one at the ground pin of the IC and another at the ground side of the capacitor. For ball grid array (BGA) packages (just the FPGA in this design), it's not possible to place bypass capacitors on the same side as the component and next to the pins they serve. In this case, I've placed the bypass capacitors on the other side of the PCB directly below the component, with vias connecting the capacitor and component pins.

There are a few other design decisions worth mentioning. The first is that I've placed ground vias adjacent to all signal vias. This allows the return current to easily jump between the inner ground planes when a signal moves between the top and bottom layers. Additionally, I've added a perimeter ground trace with a via fence. The copper planes of a PCB can cause the inner substrate layers to act as a waveguide for radiation originating at signal vias. This ground perimeter acts as a wall, preventing radiation from leaving or entering the PCB.

## 6. FPGA Gateware

As noted, the FPGA in this radar performs real-time processing of signal frequency differences to calculate the distance of detected objects without resulting in any noticeable latency. Additionally, it provides a state machine for interacting with the radar from a host PC and configures other ICs on the PCB. This section and its subsections contain a number of block diagrams and associated descriptions. These diagrams provide a functional description of the operations performed by the Verilog code I wrote.

This FPGA code supplements the functionality of Henrik's VHDL FPGA code (which handled filtering and host-PC interaction with the radar) by adding a window function and an FFT, and by allowing the transmission of data from any intermediate processing step to the host PC in real time. This means we can view the raw data, post-filter data, post-window data, or fully-processed post-FFT data. All of this can be specified in software with immediate effect (i.e., without recompiling the FPGA code). This enhanced functionality and flexibility required rewriting all of the FPGA code from scratch. It also required more sophisticated temporary data storage, implemented as synchronous and asynchronous FIFOs (first-in first-out buffers) with associated control logic. In particular, the ability to retrieve a full raw sweep of data (20480 samples of 12 bits), while also being able to configure the FPGA to send data from any other processing step, required efficient use of the FPGA's limited memory resources. Finally, Henrik's implementation uses an FIR core provided by Xilinx. I've created my own implementation, which makes it easier to simulate the design as a whole and allows using tools other than Vivado (e.g., yosys, if/when it fully supports Xilinx 7-series FPGAs).

A top-level block diagram of the FPGA logic is shown in Figure 36. The data output by the ADC (see receiver) enters the FPGA and is demultiplexed into channels A and B, corresponding to the two receivers. For a simple range detection (i.e., ignoring incident angle), one channel is dropped (like most things, channel selection is configurable at runtime in the interfacing software). The selected channel is input to a polyphase FIR decimation filter that filters out frequencies above $1\text{\,}\mathrm{M}\mathrm{H}\mathrm{z}$ and downsamples from $40\text{\,}\mathrm{M}\mathrm{H}\mathrm{z}$ to $2\text{\,}\mathrm{M}\mathrm{H}\mathrm{z}$. The FIR output is then passed through a kaiser window, followed by a FIFO, which is read by the FFT module at $40\text{\,}\mathrm{M}\mathrm{H}\mathrm{z}$. The FFT real and imaginary outputs are concatenated and then rearranged to bit-normal order with a random-access memory (RAM) module.

When the bitstream is fully loaded onto the FPGA, the FPGA control logic waits for ADF4158 configuration data from the host PC software. It also requires several other pieces of information from the host PC, such as the receiver channel to use and the desired output (raw data, post-FIR, post-window, or post-FFT). Once the FPGA receives a start signal from the host PC, it configures the frequency synthesizer and then immediately begins capturing and processing data. The data processing action is synchronized to a muxout signal from the frequency synthesizer indicating when the frequency ramp is active. The FPGA waits until all transmission data has successfully been sent to the host PC before attempting to capture and process another sweep. This ensures that the host PC always gets full sweeps. This process is repeated until the host PC sends a stop signal, at which point the FPGA is brought into an idle state, waiting to be reconfigured and restarted.

### 6.1. FIR Polyphase Decimation Filter

The FIR polyphase decimation filter has two functions: it filters out signals above $1\text{\,}\mathrm{M}\mathrm{H}\mathrm{z}$, and it downsamples the data to $2\text{\,}\mathrm{M}\mathrm{H}\mathrm{z}$. $1\text{\,}\mathrm{M}\mathrm{H}\mathrm{z}$ corresponds to the upper frequency bound of the differential amplifier filter, and the new $2\text{\,}\mathrm{M}\mathrm{H}\mathrm{z}$ sampling rate ensures we stay above the Nyquist rate to avoid aliasing.

We can't downsample before filtering because that would alias signals above $1\text{\,}\mathrm{M}\mathrm{H}\mathrm{z}$ into our filtered output. But filtering before downsampling would require us to perform filtering on the full $40\text{\,}\mathrm{M}\mathrm{H}\mathrm{z}$ input, which is inefficient because most of this data would be immediately discarded. Fortunately, we can use a polyphase decimation filter to get the best of both worlds (see the implementation section for details).

The filter's frequency response is shown in Figure 37. It was designed using the `remez` function from SciPy.

Figure 37: FIR frequency response from DC to $5\text{\,}\mathrm{M}\mathrm{H}\mathrm{z}$. The frequency response between $5\text{\,}\mathrm{M}\mathrm{H}\mathrm{z}$ and $20\text{\,}\mathrm{M}\mathrm{H}\mathrm{z}$ has a minimum attenuation of approximately $40\text{\,}\mathrm{d}\mathrm{B}$. I've restricted the domain to make the passband and transition band easier to see.

The stopband, which stretches from $1\,\mathrm{MHz}$ to the Nyquist frequency of $20\,\mathrm{MHz}$, has a minimum attenuation of roughly $40\,\mathrm{dB}$. This minimizes aliasing of signals above $1\,\mathrm{MHz}$ after downsampling. The passband extends from DC to $500\,\mathrm{kHz}$ and has a ripple of approximately $0.5\,\mathrm{dB}$. The region from $500\,\mathrm{kHz}$ to $1\,\mathrm{MHz}$ is the transition band. Ideally, our filter would transition immediately from the passband to the stopband, with complete attenuation in the stopband and zero ripple in the passband. We can approach this ideal response by increasing the number of filter taps, but doing so becomes increasingly expensive to compute. The 120-tap filter used in this design offers a good tradeoff between performance and computational cost. It's worth bearing in mind that frequencies above $500\,\mathrm{kHz}$ experience increasingly large attenuation. We could avoid this by extending the passband's upper limit to $1\,\mathrm{MHz}$ and raising the stopband's lower limit, but then objects beyond our maximum distance would appear within the visible range, which we want to avoid.
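The original coefficients were generated in software; the following sketch shows how a comparable 120-tap design can be produced with SciPy's `remez`. The exact band weighting of the real filter isn't reproduced here, so `weight=[1, 3]` is an assumption chosen to push the stopband attenuation toward the quoted $40\,\mathrm{dB}$:

```python
import numpy as np
from scipy import signal

fs = 40e6      # ADC sample rate, Hz
numtaps = 120  # tap count quoted in the text

# Passband DC-500 kHz, transition to 1 MHz, stopband to Nyquist (20 MHz).
# The [1, 3] weight (an assumption) trades passband ripple for stopband
# attenuation.
taps = signal.remez(numtaps, [0, 500e3, 1e6, fs / 2], [1, 0],
                    weight=[1, 3], fs=fs)

# Inspect the response against the figures quoted above.
w, h = signal.freqz(taps, worN=8192, fs=fs)
mag_db = 20 * np.log10(np.abs(h) + 1e-12)
ripple = np.abs(mag_db[w <= 500e3]).max()
atten = -mag_db[w >= 1e6].max()
print(f"passband ripple: {ripple:.2f} dB, stopband attenuation: {atten:.1f} dB")
```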

#### 6.1.1. Implementation


Our polyphase decimation FIR filter must produce results identical to what we'd get by first filtering the signal and then downsampling. The input arrives at $40\,\mathrm{MHz}$, and we downsample to $2\,\mathrm{MHz}$ by keeping only every 20th sample.

Start with the normal equation for an FIR filter:

$y[n]=(h*x)(n)=\sum_{k=0}^{N-1}h[k]x[n-k].$ | (6.1.1.1) |

$x[n]=x[0],x[1],\ldots$ is our input sequence where $x[0]$ is the first sample, etc. $h[n]$ is the filter's impulse response (I use a 120-tap filter) which we precompute in software (I've used scipy, see the code for details). We need 1024 post-downsampling samples for the FFT. Expanding this out, we get

$\displaystyle y[0]$ | $\displaystyle=h[0]x[0],$ | (6.1.1.2a) | ||

$\displaystyle y[1]$ | $\displaystyle=h[0]x[1]+h[1]x[0],$ | (6.1.1.2b) | ||

$\displaystyle y[2]$ | $\displaystyle=h[0]x[2]+h[1]x[1]+h[2]x[0],$ | (6.1.1.2c) | ||

$\displaystyle\cdots$ | ||||

$\displaystyle y[20]$ | $\displaystyle=h[0]x[20]+h[1]x[19]+\ldots+h[20]x[0],$ | (6.1.1.2d) | ||

$\displaystyle\cdots$ | ||||

$\displaystyle y[119]$ | $\displaystyle=h[0]x[119]+h[1]x[118]+\ldots+h[119]x[0],$ | (6.1.1.2e) | ||

$\displaystyle y[120]$ | $\displaystyle=h[0]x[120]+h[1]x[119]+\ldots+h[119]x[1].$ | (6.1.1.2f) | ||

$\displaystyle\cdots$ |

None of this differs from a normal FIR filter so far. The difference comes from noticing that the only results we want after filtering and downsampling are

$\displaystyle y[0]$ | $\displaystyle=h[0]x[0],$ | (6.1.1.3a) | ||

$\displaystyle y[20]$ | $\displaystyle=h[0]x[20]+h[1]x[19]+\ldots+h[20]x[0],$ | (6.1.1.3b) | ||

$\displaystyle y[40]$ | $\displaystyle=h[0]x[40]+h[1]x[39]+\ldots+h[40]x[0],$ | (6.1.1.3c) | ||

$\displaystyle y[60]$ | $\displaystyle=h[0]x[60]+h[1]x[59]+\ldots+h[60]x[0],$ | (6.1.1.3d) | ||

$\displaystyle\cdots$ | ||||

$\displaystyle y[120]$ | $\displaystyle=h[0]x[120]+h[1]x[119]+\ldots+h[119]x[1].$ | (6.1.1.3e) | ||

$\displaystyle\cdots$ |

And these should arrive at the output at a frequency of $2\text{\,}\mathrm{M}\mathrm{H}\mathrm{z}$.

We need to figure out how to compute this directly, i.e., without filtering and then downsampling as we did above. Figure 38 implements this.

The leftmost (vertically stacked) flip-flops make up a shift register clocked at $40\,\mathrm{MHz}$. Each horizontal row of flip-flops makes up another shift register, all clocked at $2\,\mathrm{MHz}$. The first input, $x[0]$, is registered in the leftmost flip-flop of the top $2\,\mathrm{MHz}$ shift register. It is also multiplied by $h[0]$ and passed to the output, so that the first output value is $y[0]=h[0]x[0]$, as expected. The horizontal shift registers will not register another input until the new input is $x[20]$ and the vertical shift register holds the values $x[19]$, $x[18]$, $\ldots$, $x[1]$, proceeding from the top flip-flop down. The corresponding output is the sum of $h[0]x[20]$, $h[1]x[19]$, $h[2]x[18]$, $\ldots$, $h[19]x[1]$, and $h[20]x[0]$, the last of which comes from the second multiplier in the top shift register. It's easy to follow this pattern and see that we get the intended output.

You may have noticed that if we followed this diagram precisely, we would need 120 multiplication blocks. That's significantly more than the 45 digital signal processing (DSP) slices available on our XC7A15T FPGA. Luckily, we can deviate from the diagram and use our multipliers much more efficiently. The key is that DSP slices can be clocked at a much higher frequency (typically several hundred MHz) than the frequency at which we're using them ($2\,\mathrm{MHz}$). So, by time-multiplexing DSP slices, we can use significantly fewer physical multipliers than in the naive implementation. Although it might be possible to do better, I've used 10 DSP slices (each DSP is shared by 2 filter banks), which keeps the control logic simple and leaves plenty of positive slack.

By using the multipliers more efficiently, the number of DSP elements is no longer the resource-limiting factor. Instead, we're limited by the number of flip-flops needed for the filter banks. The current implementation has 120 filter bank memory elements that must each store 12 bits, which means 1440 flip-flops. We have room to increase this, but a 120-tap filter should be good enough. We could potentially cut down on resources by replacing the filter bank flip-flops with time-multiplexed block RAM (BRAM), in much the same way as the DSP slices. However, this is not a trivial task. Each bank only needs to hold 72 bits, but the minimum size of a BRAM element is 18 Kb, so the simple implementation of one BRAM per bank would waste a lot of storage space. Other modules, such as the FFT and the various FIFOs, need this limited RAM and use it much more efficiently; we have a total of only 50 18 Kb RAM blocks. We could try to be a bit more clever and share a single BRAM across multiple banks, but this would require additional control circuitry and complicate the module somewhat. If we end up needing a better FIR module, this could be worth investigating; for the time being, our FIR is good enough.
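To make the polyphase equivalence concrete, here's a small numpy sketch. The taps and input are random stand-ins (not the real coefficients); it checks that summing the outputs of the 20 per-phase sub-filters reproduces filter-then-downsample exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
M = 20                         # decimation factor (40 MHz -> 2 MHz)
h = rng.standard_normal(120)   # stand-in for the 120 real filter taps
x = rng.standard_normal(2000)  # stand-in input samples

# Reference: filter at the full input rate, then keep every 20th output.
full_rate = np.convolve(x, h)[: len(x)]
reference = full_rate[::M]

# Polyphase: sub-filter p gets taps h[p], h[p+M], ... and input samples
# x[n*M - p]; the decimated output is the sum over all phases.
out = np.zeros(len(reference))
for p in range(M):
    hp = h[p::M]                             # phase-p taps (6 per bank)
    idx = np.arange(len(reference)) * M - p  # phase-p sample indices
    xp = np.where(idx >= 0, x[np.maximum(idx, 0)], 0.0)
    out += np.convolve(xp, hp)[: len(reference)]

print(np.allclose(out, reference))  # True
```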

### 6.2. Kaiser Window

Windowing is used to reduce spectral leakage in our input signal before performing an FFT on that signal. The following subsections describe what spectral leakage is, how windowing helps to reduce it, and how to implement a window function on an FPGA.

#### 6.2.1. Spectral Leakage


Spectral leakage occurs when an operation introduces frequencies into a signal that were not present in the original signal. In our case, spectral leakage is due to an implicit rectangular window performed on our input by taking a finite-duration subsample of the original signal. The plots in Figure 39 illustrate this effect for a single-frequency sinusoidal input.

We see that the beginning and end of the sample sequence jump from zero to some nonzero value. This corresponds to high frequency components that are not present in the original signal, but that we can detect in the FFT output. We could eliminate the spectral leakage by windowing an exact integral number of periods of the input signal. But we can't rely on being able to do this, especially for signals composed of multiple frequencies.

Some examples of sample sequences illustrate the various effects spectral leakage can have.

Start with a simple sinusoid with a frequency of $1\,\mathrm{Hz}$. We'll always ensure the sampling rate is at or above the Nyquist rate to avoid aliasing.

In Figure 40, the first plot shows the input sample sequence and the interpolated sine wave. The second shows the resulting FFT.

If the FFT correctly represented the true input signal, the entire spectral content would fall within the $0.94\,\mathrm{Hz}$ frequency bin. The fact that it also registers spectral content at other frequencies reflects the spectral leakage.

If we change the phase of the input signal, the output will change as well, which shows the dependence of our output on uncontrolled variations in the input. This effect is illustrated in Figure 41.

Figure 42 shows the effect of using a larger FFT on the spectral leakage. Namely, the leakage still exists, but is less pronounced. This is expected, because the distortion produced at the beginning and end of the sampled signal now accounts for a smaller proportion of the total measured sequence.
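These effects are easy to reproduce numerically. The sketch below uses illustrative values (64 samples, $1\,\mathrm{Hz}$ bins), not the radar's parameters, and measures how concentrated the spectrum is for an integer versus a non-integer number of periods:

```python
import numpy as np

n = 64                # number of samples
t = np.arange(n) / n  # one second of "time" -> 1 Hz bin spacing

# 8 full periods fit the window exactly; 8.5 periods do not.
exact = np.sin(2 * np.pi * 8.0 * t)
leaky = np.sin(2 * np.pi * 8.5 * t)

def bin_fraction(sig):
    """Fraction of total spectral energy captured by the strongest bin."""
    mag2 = np.abs(np.fft.rfft(sig)) ** 2
    return mag2.max() / mag2.sum()

print(f"integer periods:     {bin_fraction(exact):.3f}")  # essentially 1.0
print(f"non-integer periods: {bin_fraction(leaky):.3f}")  # energy has leaked
```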

#### 6.2.2. Effect of Windowing on Spectral Leakage

Window functions can be effective at reducing spectral leakage. Figure 43 shows the effect of applying a Kaiser window to a sampled sequence before computing its spectrum.

For the windowed function, the spectral content outside the immediate vicinity of the signal frequency is almost zero. To understand how the window works, it is informative to compare the original time-domain signal with its windowed counterpart. A plot of this is shown in Figure 44.

We see that the Kaiser window works by reducing the strength of the signal at the beginning and end of the sequence. This makes sense, since these parts of the sample create the discontinuities that generate frequencies not present in the original signal.
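A quick numpy sketch makes this quantitative, comparing the spectral energy far from the signal frequency with and without the window. The $\beta = 8$ shape parameter is an illustrative assumption, not necessarily the value used in this design:

```python
import numpy as np

n = 64
t = np.arange(n) / n
x = np.sin(2 * np.pi * 8.5 * t)  # half-bin offset: strong leakage
w = np.kaiser(n, 8.0)            # beta = 8 is an illustrative choice

def out_of_band(sig, keep=4):
    """Fraction of spectral energy outside +/- `keep` bins of the peak."""
    mag2 = np.abs(np.fft.rfft(sig)) ** 2
    peak = int(mag2.argmax())
    in_band = mag2[max(peak - keep, 0) : peak + keep + 1].sum()
    return 1.0 - in_band / mag2.sum()

print(f"rectangular: {out_of_band(x):.4f}")      # a few percent leaks far away
print(f"kaiser:      {out_of_band(x * w):.6f}")  # orders of magnitude less
```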

#### 6.2.3. Implementation

A window function works by simply multiplying the input signal by the window coefficients, so implementing one on an FPGA is very easy. We can use numpy to generate the coefficients, write these values in hexadecimal format to a file, and load them into an FPGA ROM with `readmemh`. See the window module for details.
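A minimal sketch of that flow might look like the following. The 16-bit width, $\beta = 6$, and file name are illustrative assumptions; the real values live in the window module and its generator script:

```python
import numpy as np

N = 1024    # FFT length, so 1024 window coefficients
BITS = 16   # assumed ROM word width
BETA = 6.0  # assumed kaiser shape parameter

# Scale the real-valued window to the full unsigned integer range.
window = np.kaiser(N, BETA)
coeffs = np.round(window * (2 ** BITS - 1)).astype(int)

# $readmemh-compatible output: one hex word per line.
with open("kaiser.hex", "w") as f:
    for c in coeffs:
        f.write(f"{c:0{BITS // 4}x}\n")
```

The resulting file can then be read into a Verilog ROM with `$readmemh` inside an `initial` block.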

### 6.3. FFT

An FFT is a computationally-efficient implementation of the discrete Fourier transform (DFT), which decomposes a discretized, time-domain signal into its frequency spectrum. The frequency is related to the distance of the detected object, so it's a short step from the frequency data to the final, processed distance data.

#### 6.3.1. Implementation


There are several important considerations when implementing an FFT in hardware. First, it should be pipelined in a way that allows a high clock rate and bandwidth. Additionally, it should use as few physical resources as possible, which for an FFT involves the number of multipliers, adders, and memory size. I spent some time looking through relevant research papers to find a resource-efficient algorithm that would fit on a low-cost FPGA, and settled on the R2²SDF algorithm^{2}, which requires that $N$ be a power of 4 and uses $3[\log_{4}{\left(N\right)}-1]$ multipliers (we could get it down to $\log_{4}{\left(N\right)}-1$ if needed), $4\log_{4}{\left(N\right)}$ adders and $N-1$ memory elements. I then implemented this version based on the description in the paper.

Figure 45 shows a block diagram of the hardware implementation, adapted from the original paper.

The N blocks above the BF blocks denote shift registers of the indicated size. The BF blocks denote the butterfly stages with multipliers between each successive stage. The data comes streaming out on the right side in bit-reversed order. This structure is shown in significantly more detail in Figure 46, which shows just the last 2 stages and omits the select line control logic to save space (see the Verilog file for full details).
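For reference, the reorder RAM effectively undoes a bit-reversal permutation of the output indices. A small sketch of that permutation (shown for an 8-point example; the radar's 1024-point transform works the same way):

```python
def bit_reversed(n):
    """Return the bit-reversal permutation for an n-point transform."""
    bits = n.bit_length() - 1
    return [int(format(i, f"0{bits}b")[::-1], 2) for i in range(n)]

# Output position i of the pipeline holds the result that belongs at
# natural-order index bit_reversed(n)[i].
print(bit_reversed(8))  # [0, 4, 2, 6, 1, 5, 3, 7]
```

Since the permutation is its own inverse, reading the RAM back with bit-reversed addresses restores natural order.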

A few things are worth noting here. Each shift register requires only 1 read and 1 write port (we only write the first value and read the last value each clock period) and therefore can be implemented using dual-port BRAM (see the shift_reg module for the actual implementation). Since the RAM module is described behaviorally, the synthesis tool can decide when it makes sense to use BRAM instead of discrete registers (see the fft_bfi and fft_bfii files). I've also added a register after the complex multiplier to show how this can be pipelined, but the actual module uses more pipeline stages. Lastly, the multiplier in the diagram is a complex multiplier, meaning that it requires 4 real multiplication operations. However, it's possible to use 3 DSP slices for this. Or, if we were really determined, just 1. The current implementation uses 3, although a previous implementation used 1 successfully. The challenge with using just 1 DSP is that it requires additional considerations to meet timing. To understand how we can turn 4 DSP slices into 3, we must consider a complex multiplication:

$R+iI=(a+ib)\times(c+id),$ | (6.3.1.1) |

where $R,I,a,b,c,d\in\mathbb{R}$. However, if we precompute the values

$\displaystyle e$ | $\displaystyle=a-b,$ | (6.3.1.2a) | ||

$\displaystyle f$ | $\displaystyle=c\times e,$ | (6.3.1.2b) |

the result can be achieved with

$\displaystyle R$ | $\displaystyle=b(c-d)+f,$ | (6.3.1.3a) | ||

$\displaystyle I$ | $\displaystyle=a(c+d)-f.$ | (6.3.1.3b) |

If we wanted to, we could also cut down on the number of additions used since $c$ and $d$ denote the real and imaginary parts of the twiddle factors. Since these are precomputed and stored, we could choose to store $c$, $c-d$ and $c+d$ instead of $c$ and $d$. Then, the complex multiply would require 3 additions instead of 5 at the cost of an extra element of storage per twiddle factor. This tradeoff isn't particularly appealing in my case and so I haven't used it.
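The three-multiply identity is easy to sanity-check numerically:

```python
def cmul3(a, b, c, d):
    """Compute (a + ib) * (c + id) with three real multiplications."""
    e = a - b            # precomputed adder stage
    f = c * e            # multiply 1
    r = b * (c - d) + f  # multiply 2: real part
    i = a * (c + d) - f  # multiply 3: imaginary part
    return r, i

# (3 + 4i)(5 + 6i) = -9 + 38i
print(cmul3(3, 4, 5, 6))  # (-9, 38)
```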

Lastly, I'll discuss the initial steps for how we could turn the 3 DSP slices into 1, even though I haven't used it in my implementation. I've chosen to discuss it because I've used the same strategy elsewhere in the FPGA code, and it might be necessary in the future if we add functionality that requires further DSP usage.

The key principle, as with the FIR filter, is that we can run our DSP slice much faster than the general FFT clock rate of $40\,\mathrm{MHz}$. By running the DSP slice at $120\,\mathrm{MHz}$, we can replace the 3 multipliers otherwise required with a single time-multiplexed multiplier. In the first clock period we compute $f$ above, and then use that value in the next 2 clock periods to compute $R$ and $I$. This creates an additional challenge, however: we need to know which $120\,\mathrm{MHz}$ period we're in relative to the $40\,\mathrm{MHz}$ rising edge. In other words, we'd like a counter like the one shown in Figure 47.

Figure 47: Phase counter timing diagram.

Clearly, the distinguishing factor when both clocks are at their rising edge is that the $40\text{\,}\mathrm{M}\mathrm{H}\mathrm{z}$ clock was low right before the rising edge and is high right after. Naively, we could use something like the following to generate our counter.

```verilog
reg       clk40_last;
reg [1:0] ctr;

always @(posedge clk120) begin
   clk40_last <= clk40;
   if (~clk40_last & clk40) begin
      ctr <= 2'd1;
   end else begin
      if (ctr == 2'd2) begin
         ctr <= 2'd0;
      end else begin
         ctr <= ctr + 1'b1;
      end
   end
end
```

But it's a bad idea to use a clock in combinational logic as we've done here. FPGA vendors provide dedicated routing paths for clock signals in order to minimize skew and make timing requirements easier to meet; by using clocks in combinational circuits, we risk preventing the router from using these dedicated paths. Instead, it would be great if we could use a dual-edge-triggered flip-flop that responds to both $40\,\mathrm{MHz}$ clock edges and has separate inputs for the positive and negative edges. Then we could set the flip-flop's output to the current $40\,\mathrm{MHz}$ clock state by tying the positive-edge input to 1 and the negative-edge input to 0. Very few FPGA vendors provide these, but fortunately, we can use two single-edge flip-flops to mimic one dual-edge flip-flop. The logic for this is shown below.

```verilog
module dual_ff (
   input  wire clk,
   input  wire rst_n,
   input  wire dp,
   input  wire dn,
   output wire q
);
   reg p, n;

   always @(posedge clk) begin
      if (~rst_n) p <= 1'b0;
      else        p <= dp ^ n;
   end

   always @(negedge clk) begin
      if (~rst_n) n <= 1'b0;
      else        n <= dn ^ p;
   end

   assign q = p ^ n;
endmodule
```

The `^` denotes an exclusive-or (xor) operation. This module takes advantage of the fact that a binary value xor'd twice by the same value gives back the original value. To understand how this works, imagine the positive clock edge occurs first. The output at that edge will be `(dp^n)^n`. Since xor is associative, this is equal to `dp^(n^n)`; `n^n` must be 0, and `dp^0` is just `dp`. For the negative edge, the output is `p^dn^p`. Since xor is also commutative, this is `p^p^dn`, which is of course `dn`. It's now easy to adjust our counter module to keep the clock out of combinational logic. For the full details see the pll_sync_ctr and dual_ff modules.
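A small behavioral model in Python (purely to check the xor trick, not the hardware itself) confirms that with `dp` tied to 1 and `dn` tied to 0, the output tracks the clock state:

```python
class DualEdgeFF:
    """Behavioral model of the xor-based dual-edge flip-flop."""
    def __init__(self):
        self.p = 0  # register updated on the rising edge
        self.n = 0  # register updated on the falling edge

    def posedge(self, dp):
        self.p = dp ^ self.n

    def negedge(self, dn):
        self.n = dn ^ self.p

    @property
    def q(self):
        return self.p ^ self.n

ff = DualEdgeFF()
for _ in range(3):      # dp = 1, dn = 0: q mirrors the clock
    ff.posedge(1)
    assert ff.q == 1    # high after every rising edge
    ff.negedge(0)
    assert ff.q == 0    # low after every falling edge
print("ok")
```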

### 6.4. Asynchronous FIFO

A FIFO, or first-in first-out buffer, is a type of temporary data storage. It is characterized by the fact that a read retrieves the oldest item from the buffer. Additionally, reads and writes can be performed simultaneously. There are essentially two different types of FIFOs: synchronous and asynchronous. In a synchronous FIFO, the read and write ports belong to the same clock domain (a clock domain is a group of flip-flops synchronized by clock signals with a fixed phase relationship). In an asynchronous FIFO, the read and write ports belong to different clock domains. Therefore, asynchronous FIFOs are useful when sending data between clock domains. In this radar, I've used synchronous FIFOs to temporarily store the output of data processing modules before passing that data to other modules for further processing. I've used asynchronous FIFOs to handle the bidirectional communication between the FT2232H USB interface IC and the FPGA, which are synchronized by independent clocks.

#### 6.4.1. Implementation


In contrast to synchronous FIFOs, which are simple to implement, asynchronous FIFOs present a few design challenges related to clock domain crossing. The ideas in this section come from an excellent paper on asynchronous FIFO design by Clifford Cummings.^{3}

When implementing a synchronous FIFO, we can keep track of whether the FIFO is empty or full by using a counter. The counter is incremented when a write occurs and decremented when a read occurs. The FIFO is empty when the counter value is 0 and full when it reaches a value equal to the size of the underlying storage.

An asynchronous FIFO cannot use the counter method, since the counter would need to be clocked by two asynchronous clocks. Instead, the asynchronous FIFO needs to keep track of a read address in the read clock domain and a write address in the write clock domain. There are three primary difficulties that an asynchronous FIFO implementation must address. The first is that we need to somehow use the read and write addresses to determine when the FIFO is empty or full. The second is that in order to compare these addresses, they need to cross into the other's clock domain. The third difficulty is that a reset will necessarily be asynchronous in at least one of the clock domains. Let's address these concerns one-by-one, starting with the full and empty conditions. Moreover, we'll initially ignore the fact that the read and write pointers must cross a clock domain.

Imagine the size-8 memory storage device shown in Figure 48. At initialization or reset, the read and write pointers both point to address zero. The pointers always point to the next address to be written or read. When a new word is written to the FIFO, the write is performed and then the write address is incremented to point to the next address.

Figure 48: FIFO read and write address at reset.

Clearly, at reset the FIFO is empty since we haven't yet written anything into the FIFO. So, we can signal the empty flag when the read and write addresses are the same.

To fill the FIFO, we write 8 words without reading. Now, however, the write address again points to address 0, so the full condition is also signaled by the write and read addresses being the same, as shown in Figure 49.

Figure 49: FIFO empty condition.

This is, of course, a problem. We can solve it by adding an extra bit to the read and write pointers, as shown in Figure 50. Now, at reset, the read address has the value 0000 and the write address has the same value. When the write pointer increments back to address 0, it will have the value 1000.

Figure 50: FIFO full condition.

If our read address then catches up to the write address, it will also have the value 1000. So we have a clear way to distinguish the empty and full conditions: the FIFO is empty when all read address bits equal all write address bits, and full when all bits except the MSB are the same.
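The empty/full logic can be sketched in a few lines. This models both pointers in one place, setting aside for now the clock-domain crossing that the rest of this section addresses:

```python
DEPTH = 8                 # storage size; pointers carry one extra wrap bit
PTR_MASK = 2 * DEPTH - 1  # pointers count modulo 2 * DEPTH

def is_empty(wptr, rptr):
    return wptr == rptr            # every bit equal, including the MSB

def is_full(wptr, rptr):
    return (wptr ^ rptr) == DEPTH  # only the MSB (wrap bit) differs

wptr = rptr = 0
assert is_empty(wptr, rptr)        # empty at reset
for _ in range(DEPTH):             # write 8 words without reading
    wptr = (wptr + 1) & PTR_MASK
assert is_full(wptr, rptr)         # wptr = 0b1000, rptr = 0b0000
rptr = (rptr + 1) & PTR_MASK       # a single read frees one slot
assert not is_full(wptr, rptr) and not is_empty(wptr, rptr)
print("ok")
```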

Now, let's address the difficulty that the read address must cross into the write clock domain in order to check for a full condition and the write address must cross into the read clock domain in order to check for an empty condition. To perform these clock domain crossings, we can use a 2 flip-flop synchronizer.

A flip-flop synchronizer is a simple device that passes a signal through a series of flip-flops (typically two) in the destination clock domain in order to reduce the probability that a metastable state propagates to downstream logic, which could cause state machines to enter undefined states and create unpredictable circuit behavior. When the setup or hold time of a flip-flop is violated, the flip-flop's output can go metastable, meaning it temporarily sits at a value between its high and low states. The duration of this metastable state can be short or quite long. When the metastable state lasts sufficiently long, it can be registered by the next flip-flop, creating a chain reaction. A flip-flop synchronizer works by placing a few flip-flops as close together as possible, so as to minimize the signal propagation delay between them. The proximity is important because it maximizes the amount of time a flip-flop can remain metastable and still pass a definite state to the next flip-flop. Using two flip-flops in the synchronizer instead of one decreases the likelihood that we pass a metastable state on to the downstream circuitry. We can think about this in terms of the probability, $p$, that a metastable state lasts for the full clock period in the destination clock domain. If we use two flip-flops in the synchronizer, then the probability that the metastable state makes it to the first flip-flop downstream of the synchronizer is $p^{2}$. In cases where we need even greater assurance we might add a third flip-flop. It's worth noting that the flip-flop synchronizer does not guarantee that the synchronized state is the correct one. It only reduces the probability of propagating a metastable state.

By using a flip-flop synchronizer to pass the read address into the write domain and the write address into the read domain, we are comparing a potentially old read address with the current write address and a potentially old write address with the current read address. This raises the question of whether this delay creates the possibility of underflow (performing a read when the FIFO is empty) or overflow (performing a write when the FIFO is full). The answer to this question is that there is no risk of underflow or overflow, and the reason is intuitive. When we're checking the full condition, the write address is correct and the read address is potentially a little old. This means that, in the worst case, we measure the buffer as being slightly more full than it is in reality. When we check the empty condition, we might measure the buffer as being slightly more empty than it actually is, but never more full.

There's still a problem we haven't addressed though. The current implementation has us sending binary counters across domains. Binary counters have the unfortunate attribute that incrementing a value by 1 can result in all of the value's bits changing. Since all counter bits will generally not change simultaneously, the output value in the new domain has a reasonable chance of taking on any address value. Instead of using binary counters we can use Gray codes, which avoid this problem. Successive increments (or decrements) in a Gray code differ by exactly one bit, which means that we will either register the old or new state, and not some random state. To understand the difference between binary and Gray codes, Table 51 shows the decimal numbers 0 through 7 and the corresponding binary and Gray codes.

Table 51: Decimal numbers 0 to 7 and the corresponding binary and Gray codes.

Binary counters are still useful in this context because they are used to address the underlying block RAM and make it easy to perform an increment operation. So, the read address in the read domain and the write address in the write domain can be stored and updated as binary codes, but converted into Gray codes for comparison. Fortunately, converting a binary code into a Gray code is simple. If `WIDTH` is the bit width of the binary and Gray codes, `bin` is the binary code and `gray` is the Gray code, then the following Verilog code performs the conversion.

```verilog
wire [WIDTH-1:0] gray = (bin >> 1) ^ bin;
```
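The same conversion can be modeled in Python to verify the property we care about: successive Gray codes differ in exactly one bit.

```python
def bin_to_gray(b: int) -> int:
    # Python equivalent of the Verilog one-liner: gray = (bin >> 1) ^ bin
    return (b >> 1) ^ b

# Successive Gray codes differ in exactly one bit position.
for i in range(15):
    diff = bin_to_gray(i) ^ bin_to_gray(i + 1)
    assert diff != 0 and diff & (diff - 1) == 0  # exactly one bit set
```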

The final challenge we need to address in the asynchronous FIFO implementation is resetting the FIFO. The reset is necessary to start the FIFO in a known, initialized state, as well as to perform any resets needed during runtime. The difficulty with implementing the reset comes from the fact that a synchronous reset in the write clock domain is asynchronous in the read clock domain, and vice versa. So, we're forced to use some sort of asynchronous reset here. There are a number of different ways the reset can be performed, including using a synchronous reset in the write domain that is asynchronous in the read domain. This has the advantage of allowing writes to recommence sooner after the reset is asserted, since we don't need to synchronize exiting the reset. Since this additional performance isn't critical in our application, I've chosen to make the reset asynchronous in both domains, thus allowing it to be triggered from either domain. This reset is then synchronized into both domains. A reset in the read domain causes the empty flag to be asserted and the read address to be reset; a reset in the write domain causes the full flag to be asserted and the write address to be reset. If the read reset deasserts first, the empty flag will remain asserted because the read and write addresses are identical. If the write reset deasserts first, then the full flag will deassert and we can begin writing to the FIFO.

### 6.5. Additional Considerations

#### 6.5.1. Convergent Rounding

When performing computations on an FPGA we're often faced with representing real numbers using a small number of bits. In particular, we must do this with real numbers whose fractional part is a large percentage of the total value, so we can't simply ignore the non-integral part. CPUs and numerical software use floating-point arithmetic to achieve good precision over a wide dynamic range of values. The trick is that successive intervals between numbers are non-constant: the intervals between small numbers are small and the intervals between large numbers are large. In other words, the focus is on minimizing the percentage error, not the total error, while allowing a very large range. FPGAs will sometimes implement floating point. However, compared to integer arithmetic, floating-point arithmetic is very resource intensive on an FPGA, so this is only done when the design has a strict accuracy requirement and the much higher FPGA cost is worth it.

Two's complement provides us with a good way to represent integral values but doesn't directly give us a way to represent real numbers. However, the way to extend it for this need is simple: we can sacrifice range for precision. For instance, imagine we represent a number with 10 bits. Two's complement would give us a range of -512 to +511. But imagine that our range of values is between -1 and +1. If we multiply all of our real values by $2^{9}$ (left arithmetic shift by 9 bits), our values now fit nicely into the integral two's complement range (note that +1 doesn't fit, so we either need a range whose maximum is slightly less than +1 or need to approximate +1 as slightly less). We can right shift by 9 bits to get our values back. This shows that we're representing the range $[-1,0.998046875]$ with a fixed distance of $0.001953125$ between successive values.
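The scaling described above is easy to model in Python; this sketch quantizes values in $[-1, 1)$ to the 10-bit example format.

```python
FRAC_BITS = 9
SCALE = 1 << FRAC_BITS  # 512

def to_fixed(x: float) -> int:
    """Quantize a real value in [-1, 1) to a 10-bit fixed-point integer."""
    q = int(round(x * SCALE))
    assert -SCALE <= q < SCALE, "value outside the representable range"
    return q

def from_fixed(q: int) -> float:
    """Recover the real value (conceptually, a right shift by 9 bits)."""
    return q / SCALE
```

The largest representable value is `from_fixed(511)`, i.e. 0.998046875, and the spacing between successive values is `1 / SCALE`, i.e. 0.001953125, matching the range quoted above.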

We might do this to multiply an integral value by a fractional value and get the approximately integral result. The problem is that we're implicitly rounding the result in the last step (the right bit shift), and we're using a bad method of rounding: truncation. This might be fine if the result is final. However, if we then use the result in subsequent truncations, the error introduced by each truncation accumulates into a significant bias in the final result. A good solution is to use something called convergent rounding. The zipcpu blog provides an excellent description of this as well as additional rounding methods and their merits. The diagram below is adapted from that post. I'll skip the alternative methods and just discuss convergent rounding.

The basic idea of convergent rounding is to round most numbers toward their nearest integral value, and to round numbers equidistant between two integers to the nearest even integer. This is shown diagrammatically in Figure 52. The numbers in parentheses represent the actual values, and the numbers outside the parentheses show how each value would be interpreted as a traditional two's complement value. Clearly, the mapping between numbers represents a bit shift of 3. The colors show the final mapped value (i.e. all green values get a final value of 0, etc.) and the boxes with thick boundaries show values equidistant between two final values.

We can see that the rounding behavior is symmetric about $0$. This avoids introducing an offset bias. Additionally, as we move away from 0 it alternates between rounding down and rounding up. As a result, we avoid adding a scaling factor bias.

Here's the combinational Verilog code to achieve this. `OW` represents the output bit width, `IW` represents the input bit width and `indata` is the input data. `post_trunc`, of course, is the final, rounded result.

```verilog
assign pre_trunc = indata + {{OW{1'b0}}, indata[IW-OW], {IW-OW-1{!indata[IW-OW]}}};
assign post_trunc = pre_trunc[IW-1:IW-OW];
```

It's helpful to think about actual bit widths to see what's going on here. Take an input width of 7 and output width of 3. The Verilog code above performs the addition

Figure 53: Bracketed values represent an input data bit position. Non-bracketed values denote a literal bit value. The vertical line shows the radix point position.

`post_trunc` will be the result to the left of the radix point after the addition is performed. To understand how this works, start with positive numbers (i.e. bit 6 is 0). A bit 4 value of 1 means the integral part is odd, and 0 corresponds to an even integral part. If bit 3 is 1, the fractional part has a value of at least 0.5. If the input integral part is odd and has a fractional part of exactly 0.5, we get a carry bit from bit position 3 and our value rounds up to the next higher integer. If the integral part is even, we don't get a carry bit and we round down. If the fractional part is anything other than exactly 0.5, bits 0-2 ensure we get normal nearest-integer rounding. For instance, if we have an even integral value and a fractional part greater than 0.5, the replicated inverted bit 4 values produce a carry bit and we round up. It's not hard to follow this reasoning to confirm the correct behavior in the other cases.

For negative values, a 1 (0) in bit 4 means an even (odd) integral part. This flips the rounding behavior on the boundary. For instance, an even integral part will round up (effectively truncating the integer) and an odd integral part will round down.
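As a sanity check, here's a small Python model of the same add-then-truncate operation for the worked example ($IW=7$, $OW=3$). Python's built-in `round` also rounds ties to even, so it provides an independent reference.

```python
IW, OW = 7, 3       # widths from the worked example above
SHIFT = IW - OW     # number of fractional bits dropped

def convergent_round(x: int) -> int:
    """Drop SHIFT bits from x, rounding ties to the nearest even value."""
    lsb = (x >> SHIFT) & 1          # parity of the kept integer part
    if lsb:                         # odd: add exactly 0.5 so ties carry up
        bias = 1 << (SHIFT - 1)
    else:                           # even: add 0.5 minus one LSB, ties stay
        bias = (1 << (SHIFT - 1)) - 1
    return (x + bias) >> SHIFT      # Python's >> is an arithmetic shift

# Check against Python's round(), which also rounds ties to even.
for x in range(-64, 56):            # 56..63 would overflow a 3-bit output
    assert convergent_round(x) == round(x / 2**SHIFT)
```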

#### 6.5.2. Formal Verification and Testing

A number of Verilog modules use SymbiYosys for formal verification. Currently, only a small portion of the design is formally verified; in this context, it is used primarily as an additional form of testing. See the code for details. All formal verification tests can be run from the Verilog directory Makefile with `make formal`.

I've used Cocotb for creating and running automated test benches. Again, see the code for details. All tests can be run from the Verilog directory Makefile with `make test`.

## 7. PC Software

The host PC software has several important tasks. First, it must provide the user with a shell environment to query the current FPGA configuration, write a new configuration, and set various plotting and logging options. Setting the FPGA configuration requires being able to write data to the FTDI FT2232H device.

Next, it must extract each sweep from the raw bytes sent from the FPGA. This needs to be performed efficiently, otherwise it can act as a serious bottleneck. I initially implemented this code entirely in Python but was only able to receive about 10 sweeps per second. In contrast, by rewriting this same code in C (incorporated into the Python code with Cython) I'm able to receive several hundred sweeps each second.

Depending on the user-requested configuration, the software must also perform some data processing. The FPGA gateware and host PC software have been set up so that any intermediate data (raw ADC, FIR downsampled, window, or FFT) can be sent from the FPGA to the PC. Additionally, any final output data can be requested for viewing. For instance, the user might request raw data from the FPGA and FFT data as the final output, which would require the software to perform the polyphase FIR filtering, the Kaiser window and the FFT. This is, of course, fairly straightforward with numpy and scipy.
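As a rough illustration of that PC-side processing chain (not the project's actual code; the decimation factor, tap count and Kaiser $\beta$ below are made-up values), the raw-to-spectrum path might look something like this:

```python
import numpy as np
from scipy import signal

# Illustrative parameters (not the radar's actual configuration).
DECIMATE = 20          # downsampling factor
NUM_TAPS = 120         # FIR filter length
KAISER_BETA = 6.0      # window shape parameter

def process_sweep(raw: np.ndarray) -> np.ndarray:
    """Raw ADC samples -> FIR decimation -> Kaiser window -> FFT magnitude."""
    # Low-pass FIR with cutoff at the post-decimation Nyquist frequency.
    taps = signal.firwin(NUM_TAPS, 1.0 / DECIMATE)
    filtered = signal.lfilter(taps, 1.0, raw)[::DECIMATE]
    windowed = filtered * np.kaiser(len(filtered), KAISER_BETA)
    return np.abs(np.fft.rfft(windowed))

# Example: a 2000-sample sweep of noise yields a 51-bin spectrum.
spectrum = process_sweep(np.random.default_rng(0).standard_normal(2000))
```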

The last main task performed by the PC software is plotting. I'm using PyQtGraph, which is excellent for real-time plotting. The user can choose to plot the data as a time-series (except FFT output), as a frequency spectrum, or as a 2D histogram displaying the frequency spectrum over time. If a non-FFT final output is requested for the spectrum or histogram plot, an FFT of the time-series data is taken before plotting.

All of these options (with the exception of the time-series plot) work in real time. The time-series plot could also be updated in real time, but each sweep would then be displayed too briefly to see.

The C source code that reads raw bytes from the radar and extracts sweeps from this data is multithreaded so that reading data and constructing a sweep is separated from the code that returns a sweep array to the Python program. This increases the plot rate because the data capture can occur while the Python program is processing and plotting the previous sweep.
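The producer/consumer structure described above can be sketched in Python (the real implementation is C plus Cython, and the frame contents and names here are hypothetical):

```python
# Hypothetical sketch of the two-thread pattern: one thread reads raw bytes
# and assembles sweeps, the other hands completed sweeps to the plotter.
import queue
import threading

sweeps: "queue.Queue[bytes]" = queue.Queue(maxsize=16)

def reader(raw_chunks):
    # Producer: assemble sweeps from the incoming byte stream.
    for chunk in raw_chunks:
        sweeps.put(chunk)   # a completed sweep
    sweeps.put(None)        # sentinel: no more data

def consume():
    # Consumer: return completed sweeps to the processing/plotting code.
    out = []
    while (sweep := sweeps.get()) is not None:
        out.append(sweep)
    return out

t = threading.Thread(target=reader, args=([b"sweep0", b"sweep1"],))
t.start()
result = consume()
t.join()
```

The bounded queue lets the reader keep capturing data while the plotting side is still busy with the previous sweep, which is exactly why the split raises the achievable plot rate.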

A block diagram of the software architecture is shown in Figure 54.

See the main repo's usage section for how to use the software.

## 8. RF Simulations

When dealing with low-frequency electronics (and/or sufficiently short conductors), we're able to use a number of approximations that make our analysis much simpler. These approximations are collectively known as the lumped-element model. However, at higher frequencies these assumptions no longer hold, and we have to deal with the underlying physics more directly. In these cases we use a distributed-element model (which employs calculus to reframe electrically-large circuit elements in terms of a lumped-element model) or use simulation tools to solve Maxwell's equations directly.

The radar's transmitter and receiver, which operate in the $5\text{\,}\mathrm{GHz}$ to $6\text{\,}\mathrm{GHz}$ range, require these more precise forms of analysis. I've used OpenEMS, which is an open-source electromagnetic wave simulation tool, to do this.

Additionally, I wrote a Python driver for OpenEMS that automatically generates a high-quality mesh (a 3-dimensional rectilinear grid specifying where Maxwell's equations should be evaluated at each discrete timestep) from a physical structure. It also provides a number of other useful capabilities such as the ability to optimize some arbitrary part of a structure for a desired result. All of my simulations use this tool and are available in the main radar repo.

### 8.1. Microstrip Transmission Line

The radar primarily employs microstrip transmission lines to transmit high-frequency AC signals on the PCB. Transmission lines become necessary whenever the phase of the signal conducted by the line varies appreciably over its length. This can occur either when the signal frequency is sufficiently high, or when the transmission line is sufficiently long. In our case it is due to high signal frequencies.

A microstrip is a very common form of transmission line that employs a signal trace on the surface of a PCB and an unbroken ground plane beneath it. A cross section of this is presented in Figure 55.

All of the microwave ICs have their input and output impedances internally matched to $50\text{\,}\Omega$. This makes our job straightforward as we don't need to provide impedance matching circuits. Instead, we just need to ensure that the characteristic impedance of each microstrip is $50\text{\,}\Omega$. The sole exception to this rule is the differential microstrip between the balun and mixer input, which has a *differential* impedance of $50\text{\,}\Omega$. Because this is a differential impedance, the single-ended impedance of each microstrip alone would be roughly $25\text{\,}\Omega$. However, it's complicated somewhat by the fact that there is coupling between the lines that affects the characteristic impedance. See that section for details.

The reason impedance matching is critical is expounded in the theory section. The actual simulation and analysis are presented in the simulation section.

#### 8.1.1. Theory


Matching the characteristic impedance of a transmission line to the source and load impedance limits signal reflections on the transmission line and maximizes power transfer to the load. Signal reflections due to impedance discontinuities cause signal power to return to the source, which limits the power transmitted to the load and can potentially damage the source. I'll discuss this more at the end of the section, but first it's helpful to develop some transmission line theory.

Let's analyze why impedance discontinuities cause signal reflections. It's worth mentioning that the following discussion does not directly address the original question as to why the transmission lines need to have their characteristic impedance matched to the source and load impedance. Rather, it looks at the closely-related situation in which a mismatched load causes signal reflections. Nonetheless, it's an easy step from this to understanding why impedance-matched transmission lines are important.

Transmission lines have distributed inductance, capacitance, resistance and conductance over their length. Because the voltage signal traversing the transmission line does not have the same value everywhere along the line, we can no longer lump all of this into a single inductor, capacitor and 2 resistors connected by perfect, delay-free conductors. However, we can restrict ourselves to a short enough segment of the transmission line over which all of these values can be lumped. This short segment, with length $\Delta z$, is shown in Figure 56.

$L$ is due to the self-inductance of the two conductors. $C$ is due to the conductors' close proximity. $R$ is due to the metal's finite conductivity. $G$ is due to the dielectric loss of the material between the conductors. We can directly apply Kirchhoff's voltage and current laws to this.

For sufficiently short transmission lines (as in our case) we can neglect the resistive components and our model simplifies to Figure 57.

The complete transmission line of length $l$ consists of $l/\Delta z$ of these short segments chained together. When we add in a source and load impedance, we get the circuit in Figure 58.

The characteristic impedance, which is the instantaneous ratio of voltage to current anywhere along the transmission line assuming no reflections, is given by

$$Z_{0}=\sqrt{\frac{L}{C}}.$$
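As a quick numerical check of this formula, using illustrative per-unit-length values (not measured from this board), a line with $250\text{\,}\mathrm{nH/m}$ of series inductance and $100\text{\,}\mathrm{pF/m}$ of shunt capacitance has a $50\text{\,}\Omega$ characteristic impedance:

```python
from math import sqrt

# Illustrative per-unit-length values (not taken from this PCB):
L_per_m = 250e-9   # series inductance, H/m
C_per_m = 100e-12  # shunt capacitance, F/m

Z0 = sqrt(L_per_m / C_per_m)
print(Z0)  # -> 50.0
```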