Beyond the GPU: Why Real-Time Medical Imaging Demands FPGA Determinism

Overview

When Determinism is a Clinical Requirement, Not a Performance Preference

The medical imaging industry is in the middle of a significant architectural shift. Wearable ultrasound patches are replacing cart-based scanners. Surgical guidance systems now operate in environments where sub-millisecond timing errors carry clinical consequences. AI-assisted diagnostics are pushing compute requirements higher while simultaneously demanding lower latency at the point of acquisition.

Most development teams reach for a GPU first. That is a reasonable instinct. GPUs are powerful, well-documented, and richly supported by signal processing libraries. But in a medical imaging context, raw compute throughput is only half the story. The other half is determinism: the guarantee that every acquisition event, synchronization pulse, and timestamp is processed with exactly the same timing as the one before it.

That guarantee is what an FPGA provides, and what a GPU fundamentally cannot.

The Real Problem: Jitter, Not Throughput

A GPU depends on a CPU host thread to submit work, and that thread runs under an operating system scheduler. On top of that, every GPU kernel invocation carries PCIe bus latency and command-queue overhead. The result is a pipeline where the OS, the bus, and the driver stack all sit between the acquisition event and the compute response.

Under normal conditions, that scheduler is invisible. But medical imaging pipelines do not operate under normal conditions. When timestamping a photon arrival, synchronizing a 128-channel ultrasound array, or triggering an optical pulse, scheduler-induced latency spikes (routinely 50 to 200 microseconds or more on standard, non-real-time operating systems) can misalign channel timing, corrupt spatial reconstruction, or invalidate synchronized measurements across modalities. Even mitigations like GPUDirect RDMA, which reduce data-transfer latency between peripherals and the GPU, leave the core problem untouched: OS scheduling non-determinism on the acquisition side cannot be tuned away. In a consumer application, that jitter is imperceptible. In a clinical acquisition pipeline, it is the difference between data you can trust and data you cannot.
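The scale of the problem is easy to sketch numerically. The toy model below (all numbers are illustrative, not from a specific system) converts the 50 to 200 microsecond jitter range above into carrier-cycle misalignment at a typical 5 MHz ultrasound center frequency:

```python
import numpy as np

# Illustrative sketch: how OS scheduling jitter corrupts channel alignment.
# All parameters are hypothetical, chosen to match the 50-200 us jitter
# range cited above.
rng = np.random.default_rng(0)

f_center = 5e6                    # assumed 5 MHz ultrasound center frequency
period = 1.0 / f_center           # 200 ns per carrier cycle

# Software-triggered channels: each acquisition sees a scheduler-induced
# latency spike somewhere in the 50-200 us band.
jitter = rng.uniform(50e-6, 200e-6, size=128)   # 128-channel array

# Misalignment each channel contributes, measured in carrier cycles.
cycles_off = jitter / period
print(f"min misalignment: {cycles_off.min():.0f} carrier cycles")
print(f"max misalignment: {cycles_off.max():.0f} carrier cycles")
```

Even the best-case jitter in this model is hundreds of carrier cycles of misalignment, which is why coherent summation across channels cannot survive software-side triggering.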

This is not a software optimization problem. Real-time OS configurations and thread priority tuning reduce jitter but cannot eliminate it; the non-determinism is architectural. An FPGA eliminates it at the hardware level. Every signal path is dedicated logic. There is no OS, no interrupt controller, no shared memory bus competing for cycles. Timing is deterministic because it is defined in hardware, not scheduled in software. This is not a performance preference. It is a design requirement for any system where trust in the data is non-negotiable. As researchers at Boston University concluded after benchmarking FPGA-based medical inference directly against CPU and GPU architectures: "by directly interfacing and processing sensor data with ultra-low latency, FPGAs can perform real-time analysis during procedures and provide diagnostic feedback that can be critical to achieving higher percentages of successful patient outcomes." (Sanaullah et al., BMC Bioinformatics 2018)

FPGAs and GPUs are not mutually exclusive. The emerging pattern in high-performance medical imaging platforms is an FPGA handling front-end acquisition, synchronization, and initial data reduction, then passing structured data to a GPU or host CPU for AI inference and display. FPGAs do the work that requires determinism; GPUs do the work that benefits from massive parallelism.

Where FPGAs Are Making a Measurable Difference

Ultrasound Beamforming

Modern wearable and handheld ultrasound systems require real-time digital beamforming across dozens or hundreds of transducer channels simultaneously. This means performing delay-and-sum calculations, channel weighting, and adaptive filtering on continuous streaming data with no buffering lag.

FPGAs handle this naturally through hardware parallelism: each transducer channel can be assigned a dedicated processing pipeline running in lockstep with the others. The performance difference is not incremental. In a 2023 study published in IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control, researchers quantified it directly: a standard CPU-based beamformer sustains approximately 133 million samples per second; a single-FPGA implementation of the same algorithm achieves 4.83 billion samples per second. The result is high-frame-rate imaging in form factors that would be thermally and architecturally impossible for a GPU-based system. Recent research into wearable cardiac monitoring platforms has validated FPGA-based IQ beamforming as the enabling architecture for continuous, ambulatory imaging in applications where patients cannot be tethered to a cart.
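The delay-and-sum operation itself is simple; what the FPGA contributes is one such pipeline per channel, all in lockstep. The software model below sketches the algorithm with assumed geometry (64 elements, 40 MS/s, 30 mm focus):

```python
import numpy as np

# Minimal delay-and-sum beamformer sketch: a software model of what each
# FPGA channel pipeline computes in parallel. Geometry and rates assumed.
fs = 40e6                           # sampling rate, Hz
c = 1540.0                          # speed of sound in tissue, m/s
pitch = 0.3e-3                      # element pitch, m
n_elem, n_samp = 64, 2048
focus_depth = 30e-3                 # focal depth, m

# Per-element geometric delays from the focal point back to each element.
x = (np.arange(n_elem) - n_elem / 2 + 0.5) * pitch
path = np.sqrt(focus_depth**2 + x**2)
delays = (path - focus_depth) / c               # seconds
shift = np.round(delays * fs).astype(int)       # integer sample shifts

# Synthetic channel data: one echo arriving with exactly those delays.
rf = np.zeros((n_elem, n_samp))
t0 = 500
for ch in range(n_elem):
    rf[ch, t0 + shift[ch]] = 1.0

# Delay-and-sum: align each channel, then coherently sum.
aligned = np.stack([np.roll(rf[ch], -shift[ch]) for ch in range(n_elem)])
beam = aligned.sum(axis=0)
print(beam.argmax(), beam.max())    # echo coherently focused at sample t0
```

When the per-channel delays are correct, the 64 single-sample echoes sum coherently into one peak; any timing jitter on a channel smears that peak, which is the failure mode the previous section describes.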


Fluorescence-Guided Surgery

Multi-spectral fluorescence imaging, used in oncology to delineate tumor margins during resection, requires simultaneous acquisition and processing across multiple imaging channels with precisely correlated timestamps.

The University of Illinois at Urbana-Champaign developed a bioinspired multispectral imaging system—paired with holographic display goggles—for fluorescence-guided cancer surgery. The sensor delivers six spectral measurements across the visible and NIR spectrum for real-time tumor margin visualization. An Opal Kelly XEM7310 module handles sensor readout and high-speed data transfer to the host PC, with firmware in Verilog and host software in Python and C++.


Optical Coherence Tomography (OCT)

OCT produces continuous streams of interference fringe data that must be Fourier-transformed in real time to generate usable cross-sectional images. Performing this FFT-heavy pipeline on a CPU introduces latency that breaks the live feedback loop surgeons and clinicians depend on.

FPGAs implement the transform pipeline directly in hardware, enabling true real-time 3D reconstruction at the acquisition rates that make intraoperative OCT clinically viable. For OCT systems operating at A-scan rates above 100 kHz, few practical CPU-based alternatives exist.
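The core of the per-A-scan pipeline is a windowed FFT that maps a fringe's modulation frequency to a reflector depth. The sketch below models one A-scan with illustrative parameters (2048 spectral samples, a single reflector):

```python
import numpy as np

# Sketch of the spectral-domain OCT reconstruction step: an interference
# fringe at a given modulation frequency maps to a reflector depth after
# an FFT. All parameters are illustrative, not from a specific system.
n_k = 2048                          # spectral samples per A-scan
k = np.arange(n_k)

depth_bin = 300                     # simulated reflector position (FFT bin)
fringe = np.cos(2 * np.pi * depth_bin * k / n_k)   # interference fringe

# Window then transform: this is the pipeline an FPGA streams once per
# A-scan, at line rates above 100 kHz.
a_scan = np.abs(np.fft.rfft(fringe * np.hanning(n_k)))
print(a_scan.argmax())              # peak lands at the reflector's depth bin
```

At 100 kHz A-scan rates this transform must complete every 10 microseconds with no buffering lag, which is why a dedicated hardware pipeline rather than a scheduled CPU thread is the practical implementation.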


Electrophysiology and Optogenetics

Optogenetics gives neuroscience researchers the ability to activate or inhibit specific neuron populations using light, but it demands precise, repeatable optical pulse timing. Early optogenetic instruments relied on lasers, which are expensive, mechanically unstable, and difficult to miniaturize. Plexon set out to replace them with high-powered LEDs, which offer lower cost, longer lifetime, and greater stability, but introduced a new engineering challenge: each LED channel requires a high-power programmable current driver that can be triggered with hardware-level timing precision. A missed or mistimed pulse does not just produce noise; it invalidates the experimental trial entirely. Plexon's Optogenetic Controller uses Opal Kelly FPGA modules to coordinate programmable current drivers across four high-powered LED channels, delivering the deterministic optical pulse timing that closed-loop neuroscience experiments require.
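The timing contract such an experiment imposes can be sketched as a simple acceptance test: every pulse edge must land within a fixed tolerance of its schedule, or the trial is discarded. The numbers below (100 Hz train, 1 microsecond tolerance) are illustrative, not Plexon's specifications:

```python
import numpy as np

# Sketch of a closed-loop timing contract: pulse edges must land within a
# fixed tolerance of schedule or the trial is invalid. Numbers assumed.
period = 10e-3                      # 100 Hz pulse train
tol = 1e-6                          # 1 us acceptance window (assumed)

scheduled = np.arange(100) * period

# Hardware-timed edges: deterministic, well inside the tolerance.
hw_edges = scheduled + 200e-9
# Software-timed edges: scheduler jitter routinely blows the budget.
sw_edges = scheduled + np.random.default_rng(3).uniform(0, 150e-6, 100)

def trial_valid(edges):
    return bool(np.all(np.abs(edges - scheduled) <= tol))

print(trial_valid(hw_edges), trial_valid(sw_edges))
```

The asymmetry is the point: a hardware-timed driver passes every trial by construction, while software timing fails probabilistically, and a probabilistic failure mode is exactly what invalidates experimental data.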


AI-Assisted Medical Diagnosis

As AI moves deeper into clinical workflows — from intraoperative decision support to real-time cancer screening — where inference happens matters as much as the inference itself. Running a neural network on a GPU or CPU introduces the same scheduling variability that affects acquisition pipelines; in a real-time diagnostic context, that variability delays feedback at exactly the moment it matters most.

Researchers at Boston University demonstrated this concretely. In a BMC Bioinformatics study, they implemented a Multi-Layer Perceptron inference processor on an FPGA for real-time cancer detection using mass spectrometry data, benchmarking it directly against CPU and GPU baselines. The FPGA achieved an average speedup of 144x over CPU and 21x over GPU. The advantage came not from raw compute power, but from the FPGA's ability to directly interface with sensors, eliminate memory transfer overhead, and tailor the compute pipeline specifically to the application — removing every layer of general-purpose abstraction that sits between data arrival and diagnostic output.

For medical AI applications where inference latency is clinically meaningful, that architectural difference is the deciding factor.
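The inference structure involved is modest by deep-learning standards, which is what makes it mappable to dedicated logic. The sketch below models a small MLP forward pass with arbitrary layer sizes and random weights (the BU study's actual architecture and parameters are not reproduced here); on an FPGA, each layer becomes a fixed, fully pipelined multiply-accumulate array:

```python
import numpy as np

# Toy forward pass of a small MLP, the inference structure the BU study
# mapped into FPGA logic. Layer sizes and weights here are arbitrary
# placeholders, not the study's trained network.
rng = np.random.default_rng(42)

def relu(x):
    return np.maximum(x, 0.0)

# Hypothetical 3-layer network: 100 spectral features -> 2 classes.
w1, b1 = rng.normal(size=(100, 64)), np.zeros(64)
w2, b2 = rng.normal(size=(64, 32)), np.zeros(32)
w3, b3 = rng.normal(size=(32, 2)), np.zeros(2)

def infer(features):
    h = relu(features @ w1 + b1)
    h = relu(h @ w2 + b2)
    return h @ w3 + b3              # class scores

scores = infer(rng.normal(size=100))
print(scores.shape)
```

On a CPU or GPU, each of those matrix products passes through a general-purpose memory hierarchy and scheduler; in dedicated logic, the weights are wired into the datapath and a new sample can enter the pipeline every clock cycle.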

Core Technical Advantages

In-situ data reduction

Rather than streaming raw, high-bandwidth sensor data to a host PC, an FPGA can perform initial filtering, decimation, and feature extraction at the point of acquisition. This reduces downstream bandwidth requirements and enables real-time display without requiring a high-end workstation.
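The structure of that reduction is typically a streaming anti-alias filter followed by a downsampler. The sketch below models it in software with assumed rates and a deliberately simple moving-average filter (a real design would use a designed low-pass FIR or CIC stage):

```python
import numpy as np

# Sketch of point-of-acquisition data reduction: low-pass filter then
# decimate, the structure an FPGA implements as a streaming
# FIR + downsampler. Rates and filter taps are illustrative.
fs_in = 40e6
decim = 8                           # 40 MS/s in, 5 MS/s out

# Moving-average FIR as a stand-in anti-alias stage.
taps = np.ones(decim) / decim

raw = np.random.default_rng(1).normal(size=65536)   # stand-in sensor stream
filtered = np.convolve(raw, taps, mode="valid")
reduced = filtered[::decim]

print(len(raw), "->", len(reduced))   # 8x bandwidth reduction
```

An 8x reduction at the point of acquisition is the difference between a link that needs a high-end workstation to keep up and one that streams comfortably over USB 3.0.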

Hardware-level synchronization

Multi-modal imaging systems require sub-microsecond channel correlation. That level of correlation is straightforward to define in FPGA logic and very difficult to achieve reliably in software, particularly across channels on different physical interfaces.

Power efficiency at the edge

Wearable and implantable medical devices operate under strict thermal and power budgets. A mid-range FPGA performing real-time beamforming consumes a fraction of the power that an equivalent GPU-based pipeline would require.

Rapid prototyping without custom silicon

Moving from architectural validation to clinical prototype traditionally required custom PCB design, FPGA integration expertise, and months of board bring-up time. Production-ready FPGA modules eliminate the hardware design phase entirely, letting teams focus on the signal processing pipeline itself.

Why Opal Kelly, Not Just Any FPGA

FPGA development boards are not scarce. What is scarce is a platform that is production-ready, well-supported, and purpose-built for the kind of high-speed host communication that medical imaging systems require.

FrontPanel SDK

Opal Kelly's FrontPanel SDK provides a unified C++, Python, C#, Ruby, and Java API for PC-to-FPGA communication. Rather than spending engineering cycles building and debugging a custom USB host interface, teams get a validated, production-tested communication layer on day one. This alone typically compresses prototype development time from months to weeks.

SYZYGY peripheral ecosystem

The XEM8320 and other SYZYGY-equipped modules provide a standardized high-speed peripheral expansion interface. Custom ADCs, DACs, RF front-ends, and sensor arrays connect through a single, well-defined electrical standard, eliminating the one-off hardware integration work that plagues custom designs.

Regulatory-path compatibility

Medical device development involves regulatory documentation that traces to specific hardware components. Opal Kelly modules are production-stable hardware with documented specifications, long product lifetimes, and engineering support, all characteristics that matter when assembling the traceability documentation required for a device destined for FDA clearance. Additionally, Opal Kelly is ISO 9001 certified, which provides an additional layer of quality assurance documentation relevant to medical device supply chains.

Note that FDA clearance applies to the finished medical device; Opal Kelly modules are components of that system, not independently cleared.


Prototype in Weeks, Not Quarters

Custom FPGA board design is not just a procurement delay. It requires specialized PCB layout expertise, high-speed signal integrity analysis, power delivery design, and a fabrication and bring-up cycle that typically runs 12 to 16 weeks — before a single line of signal processing code can be tested against real sensor data. For a team building a medical imaging system, that window is also time not spent on the signal processing pipeline, the clinical validation protocol, or the regulatory documentation that determines how quickly the device reaches patients.

Opal Kelly modules bypass that hardware design phase entirely for prototype and low-to-mid volume production. The signal processing work starts on day one, not after the board comes back from the fab. Teams that would otherwise spend a quarter getting hardware stable can instead spend that time validating their beamforming algorithm against real transducer data, or refining their synchronization architecture before it gets locked into silicon.

The Plexon example is illustrative. When they needed a high-power programmable current driver platform for their Optogenetic Controller, they built on an Opal Kelly module rather than designing a custom board from scratch. Critically, they were also able to carry forward firmware and design work from a previous Opal Kelly module they had already shipped — a continuity advantage that compounds across product generations and is impossible to replicate with one-off custom hardware.

“We saved significant time by using the XEM6001 as compared to rolling our own solution. Using the XEM6001 was an attractive way to get up and running quickly. … Using Opal Kelly saved us months in our development efforts.”

— Craig Patten, Plexon Inc., via Opal Kelly customer story


Selecting the Right Module

The right module depends on acquisition bandwidth, required logic density, and peripheral expansion needs. The table below maps common medical imaging scenarios to the appropriate platform. Visit the Opal Kelly Health Science page for reference designs specific to this domain.

Module | FPGA | Interfaces | Best For
XEM8310 / XEM8305 | Artix UltraScale+ (XCAU25P) | USB 3.0 (350+ MiB/s), 3x mezzanine, 2 GiB DDR4 (8310) / 1 GiB DDR4 (8305) | OEM production and compact integration. Optimized for space-constrained deployments (80x50mm, 30g).
XEM8370 | Kintex UltraScale+ (XCKU11P) | USB 3.0 (350+ MiB/s), 3x mezzanine, 4 GiB DDR4, 301 I/O, 20 GTH + 8 GTY | High logic density pipelines: complex beamforming, multi-channel FFT chains, and dense transceivers.
XEM8320 | Artix UltraScale+ (XCAU25P) | USB 3.0 (350+ MiB/s), 4x SYZYGY Std + 2x SYZYGY TXR4, 1 GiB DDR4, 2x SFP+ | Rapid prototyping with modular peripheral expansion via SYZYGY pods (ADC, DAC, Camera).
XEM8350 | Kintex UltraScale (XCKU060) | Dual USB 3.0 (600+ MiB/s combined), 4 GiB ECC DDR4, 28 GTH transceivers, 332 I/O | Maximum sustained host transfer bandwidth. Suited for high-frame-rate streaming and ECC buffering.

Getting Started

The practical starting point for any medical imaging architecture evaluation is identifying where your current system is hitting a ceiling: acquisition latency, synchronization jitter, or host bandwidth. In most cases, one of those constraints is limiting system performance, not the processing pipeline downstream of it.

Opal Kelly modules let you validate an FPGA-based architecture against real sensor data in weeks, without committing to a full custom board design. Opal Kelly’s sales team can help you identify the right starting configuration for your application.


More Info:

Visit opalkelly.com or contact sales@opalkelly.com

Explore the Health Science applications page for customer examples and application notes.

Browse all FPGA integration modules or learn about the FrontPanel SDK
