SpectraView: Canvas-First Spectral Visualization for the Browser
Most spectrum viewers on the web use Plotly, Chart.js, or an SVG-based charting library. These work fine for 50-point bar charts. They do not work for vibrational spectra — 2,048 to 8,192 data points, sub-wavenumber precision, users who zoom in to inspect individual peak shoulders. The DOM chokes. Interactions lag. Scientists go back to Origin.
SpectraView exists because I needed a spectrum viewer for Spektron that could render IR and Raman spectra in the browser without compromise. It's a React component — npm install spectraview — that renders spectral data on Canvas and everything else on SVG.
The Rendering Pipeline
Every frame follows the same path: raw data comes in, gets downsampled to the visible resolution, draws onto Canvas as a single polyline, then SVG handles the interactive overlay. This split is what makes 60fps possible at 10,000 points.
The Rendering Split
A spectrum is a dense polyline: thousands of (x, y) pairs connected by segments. Rendered naively in SVG, each segment becomes a DOM node. At 4,000 points, that's 4,000 <line> or <path> elements that the browser must lay out, paint, and composite. Zooming reflows the entire subtree.
Canvas has none of this overhead. ctx.lineTo() appends to a path that is rasterized straight into a pixel buffer, with no per-point DOM node, style, or layout. The cost still grows with the point count, but it is small enough that 10,000 points fit comfortably inside a frame. The trade-off: Canvas has no event model, no hover detection, no accessible text.
SpectraView uses both. The spectral data renders on Canvas for performance. Axes, grid lines, peak markers, annotations, and the crosshair render as SVG for interactivity and accessibility. The layers are composited by absolute positioning within a single container.
Performance: Why the Split Matters
The difference between SVG-only and Canvas+SVG isn't academic. It's the difference between a usable tool and a frustrating one. Here are the actual numbers from benchmarking with a typical IR spectrum at various point counts:
SVG-only rendering at 10,000 points
SVG creates 10,000 DOM elements — one <line> or path segment per data point. Browser tab memory spikes to 200MB. Frame rate drops to 8fps during pan operations. Zooming triggers a full reflow of the SVG subtree, which takes 120ms — visible as a stutter every time you scroll the mouse wheel. Opening DevTools shows the Recalculate Style phase dominating the frame budget. Panning feels like dragging through mud.
Canvas + SVG hybrid at 10,000 points
Canvas draws the entire spectrum path as a single bitmap operation — beginPath(), a loop of lineTo() calls, one stroke(). Memory stays at 30MB. Frame rate locks at 60fps because the Canvas draw call takes less than 2ms, well within the 16ms frame budget. The SVG overlay handles only interactive elements — the crosshair, peak labels, axis ticks, and selection region — typically fewer than 50 DOM nodes regardless of how many data points are in the spectrum.
The memory difference is the most dramatic. At 50,000 points (common for high-resolution Raman spectra), SVG-only rendering causes Chrome to allocate over 800MB for the tab. The Canvas hybrid stays under 40MB because the pixel buffer is fixed at the canvas resolution — 1600x800 pixels regardless of data density.
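The fixed buffer size follows from simple arithmetic: an RGBA backing store needs four bytes per pixel, whatever the spectrum contains. A quick sketch, where canvasBufferBytes is a hypothetical helper and the browser may add overhead of its own on top:

```typescript
// Bytes needed for an RGBA backing store: 4 bytes per device pixel.
// `dpr` is the device pixel ratio (2 on most Retina displays).
function canvasBufferBytes(cssWidth: number, cssHeight: number, dpr: number = 1): number {
  const deviceWidth = Math.round(cssWidth * dpr);
  const deviceHeight = Math.round(cssHeight * dpr);
  return deviceWidth * deviceHeight * 4;
}

// 1600 x 800 at dpr 1 is 5,120,000 bytes, about 4.9 MiB,
// and that number does not change with the point count.
const bufferMiB = canvasBufferBytes(1600, 800) / (1024 * 1024);
```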
Zoom performance is where the architecture pays off most. When the user zooms into a region, SpectraView re-runs LTTB downsampling on the visible window, clears the canvas, and redraws. The entire operation takes 3-4ms. In SVG, even the cheapest equivalent, a single <path> element, requires rewriting the d attribute with 10,000 new coordinate pairs, which the browser must re-parse, lay out, and repaint.
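Finding the visible window is the first step of that redraw. One way to do it, sketched here under the assumption that x values are sorted ascending (many IR files are stored descending and would need reversing first), is a pair of binary searches. Both function names are hypothetical, not SpectraView exports:

```typescript
// Smallest index in [0, n) for which pred is true, or n if none is.
// Requires pred to be false for a prefix and true for the rest.
function firstIndexWhere(n: number, pred: (i: number) => boolean): number {
  let lo = 0;
  let hi = n;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (pred(mid)) hi = mid;
    else lo = mid + 1;
  }
  return lo;
}

// Half-open index range [start, end) covering xMin..xMax, padded by one
// point on each side so the drawn line reaches the plot edges.
function visibleRange(xs: ArrayLike<number>, xMin: number, xMax: number): [number, number] {
  const first = firstIndexWhere(xs.length, (i) => xs[i] >= xMin);
  const pastLast = firstIndexWhere(xs.length, (i) => xs[i] > xMax);
  return [Math.max(0, first - 1), Math.min(xs.length, pastLast + 1)];
}
```

Only the points inside that range need to go through downsampling, so the per-frame cost depends on the zoom window rather than the full spectrum.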
LTTB Downsampling
Even Canvas has limits. Drawing 50,000 lineTo calls per frame during a zoom animation is wasteful — most segments map to the same pixel column. SpectraView uses the Largest-Triangle-Three-Buckets (LTTB) algorithm to reduce the rendered point count to plotWidth * 2 while preserving visual fidelity.
LTTB keeps the first and last points, divides the rest into equal-sized buckets, and selects from each bucket the point that forms the largest triangle with the previously selected point and the average of the next bucket. Unlike simple min-max decimation, it preserves the visual shape of peaks and shoulders. The downsampling happens per frame on the zoomed window, so you never lose resolution: zoom in and the algorithm selects more detail.
The implementation is adapted from Sveinn Steinarsson's original master's thesis. A key optimization: SpectraView pre-sorts the data by x-coordinate and uses a sliding-window approach rather than re-partitioning the entire dataset on every zoom change. For a 50,000-point spectrum, LTTB downsampling to 1,000 display points takes 0.8ms, fast enough to run every frame during a smooth zoom animation.
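For readers who have not seen the algorithm, here is a minimal sketch of textbook LTTB. It is not SpectraView's implementation (it has none of the sliding-window optimization described above), and the Point type and lttb function name are assumptions made for the example:

```typescript
interface Point {
  x: number;
  y: number;
}

// Textbook Largest-Triangle-Three-Buckets. Returns `threshold` points,
// always keeping the first and last. Inputs that are already small enough
// (or thresholds below 3) are returned as a copy.
function lttb(data: Point[], threshold: number): Point[] {
  const n = data.length;
  if (threshold >= n || threshold < 3) return data.slice();

  const sampled: Point[] = [data[0]];
  const bucketSize = (n - 2) / (threshold - 2);
  let a = 0; // index of the most recently selected point

  for (let i = 0; i < threshold - 2; i++) {
    // Average of the next bucket (for the last bucket, this is the final point).
    const nextStart = Math.floor((i + 1) * bucketSize) + 1;
    const nextEnd = Math.min(Math.floor((i + 2) * bucketSize) + 1, n);
    let avgX = 0;
    let avgY = 0;
    for (let j = nextStart; j < nextEnd; j++) {
      avgX += data[j].x;
      avgY += data[j].y;
    }
    avgX /= nextEnd - nextStart;
    avgY /= nextEnd - nextStart;

    // Pick the point in the current bucket with the largest triangle area
    // (twice the area, to be exact; the factor does not affect the choice).
    const start = Math.floor(i * bucketSize) + 1;
    const end = Math.floor((i + 1) * bucketSize) + 1;
    let maxArea = -1;
    let chosen = start;
    for (let j = start; j < end; j++) {
      const area = Math.abs(
        (data[a].x - avgX) * (data[j].y - data[a].y) -
          (data[a].x - data[j].x) * (avgY - data[a].y)
      );
      if (area > maxArea) {
        maxArea = area;
        chosen = j;
      }
    }
    sampled.push(data[chosen]);
    a = chosen;
  }

  sampled.push(data[n - 1]);
  return sampled;
}
```

With a plot 500 pixels wide, calling lttb(visiblePoints, 500 * 2) would match the plotWidth * 2 target mentioned earlier.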
Composable Architecture
SpectraView is not a monolithic charting component. It's 15 components and 9 hooks that compose together. The main <SpectraView /> component is a convenience wrapper — every internal layer is independently exported.
The hooks handle state. useZoomPan wraps d3-zoom with memoized scale management. usePeakPicking runs prominence-based peak detection reactively whenever spectra change. useNormalization applies spectral transformations — baseline correction, normalization, smoothing, derivatives — returning new Spectrum objects without mutating the originals.
This design means you can build a custom viewer that has a minimap and tooltip but no toolbar, or a comparison view with two independent zoom states, or a processing dashboard that chains useNormalization with useHistory for undo/redo.
Building a Custom Viewer in 4 Steps
The composable architecture means you can build exactly the viewer you need, nothing more. Here's a step-by-step walkthrough of building a custom spectral viewer with navigation overview and automatic peak annotation.
Step 1: Load data with the useSpectrum hook. This hook manages the spectrum data lifecycle — loading, parsing, and providing typed access to wavenumbers and intensities. It accepts raw arrays, file objects, or URL strings and normalizes them all into the same Spectrum type.
Step 2: Add the Canvas layer for spectrum rendering. SpectrumCanvas handles all the heavy lifting — LTTB downsampling, device-pixel-ratio scaling for Retina displays, and efficient redraw on zoom/pan.
Step 3: Layer the Minimap for navigation. The Minimap shows the full spectrum in miniature with a viewport indicator showing the currently zoomed region. Dragging the viewport pans the main view. The minimap renders on its own Canvas instance, so it doesn't interfere with the main view's frame budget.
Step 4: Add automatic peak annotation with PeakMarkers. The usePeakPicking hook runs prominence-based detection (using the same algorithm as SciPy's find_peaks) whenever the spectrum or zoom level changes. PeakMarkers renders the results as SVG circles with tooltips showing wavenumber and intensity values.
Each step adds exactly one capability, and no layer depends on another. The one shared piece is useZoomPan, which every layer uses for coordinate synchronization. You can stop at step 2 for a minimal viewer, or continue adding Crosshair, Toolbar, RegionSelector, and Legend as needed.
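To make the peak-picking in step 4 concrete, here is a simplified prominence calculation in the spirit of SciPy's peak_prominences. It is a sketch, not the usePeakPicking code: it only considers strict local maxima, ignores flat-topped peaks, and the Peak type and findPeaks name are assumptions:

```typescript
interface Peak {
  index: number;
  height: number;
  prominence: number;
}

// For each strict local maximum, walk outward in both directions until a
// higher sample (or the array edge) is reached, tracking the lowest value
// seen. Prominence is the peak height minus the higher of the two minima.
function findPeaks(y: ArrayLike<number>, minProminence: number): Peak[] {
  const peaks: Peak[] = [];
  for (let i = 1; i < y.length - 1; i++) {
    if (!(y[i] > y[i - 1] && y[i] > y[i + 1])) continue;

    let leftMin = y[i];
    for (let j = i - 1; j >= 0 && y[j] <= y[i]; j--) {
      leftMin = Math.min(leftMin, y[j]);
    }
    let rightMin = y[i];
    for (let j = i + 1; j < y.length && y[j] <= y[i]; j++) {
      rightMin = Math.min(rightMin, y[j]);
    }

    const prominence = y[i] - Math.max(leftMin, rightMin);
    if (prominence >= minProminence) {
      peaks.push({ index: i, height: y[i], prominence });
    }
  }
  return peaks;
}
```

Filtering on prominence rather than raw height is what keeps a small shoulder riding on a large band from being reported as a major peak.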
Parsing Four Formats
Spectroscopists store data in JCAMP-DX, CSV, SPC, and JSON. SpectraView parses all four.
JCAMP-DX: Six Formats in a Trenchcoat
JCAMP-DX is the IUPAC "standard" for spectral data exchange. In theory, it's a simple text format with labeled data records. In practice, it's six different encoding schemes wearing a trenchcoat pretending to be one format.
The two most common variants are XYDATA and PEAK TABLE. An XYDATA block contains the full spectrum — every data point — while a PEAK TABLE contains only peak positions and intensities. SpectraView's parser handles both, but XYDATA is where the complexity lives.
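Within XYDATA, the simplest encoding is AFFN (ASCII Free Format Numeric), where values are written as plain decimal numbers. The compressed encodings, of which DIFDUP is the most common, start each line with an X value and derive the remaining X positions from the declared spacing (##DELTAX, or ##FIRSTX, ##LASTX, and ##NPOINTS). Y values use a single-character encoding in three forms. In squeezed (SQZ) form, the first digit of an absolute value is replaced by @ for 0, A-I for +1 to +9, or a-i for -1 to -9. In difference (DIF) form, a value is stored as the difference from the previous one, with % for 0, J-R for positive leading digits, and j-r for negative ones. In duplicate (DUP) form, S-Z and s give a repeat count for the preceding value or difference. A single JCAMP file can contain multiple data blocks with different encoding schemes.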
SpectraView's parser detects the encoding from the ##XYDATA= line, handles DIFDUP decompression including the affine transform (Y_real = Y_compressed * YFACTOR + YOFFSET), and validates that the reconstructed X values match the declared ##FIRSTX and ##LASTX bounds within floating-point tolerance.
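A rough sketch of decoding one compressed data line may help. It follows the character tables in the JCAMP-DX 4.24 specification (SQZ: @, A-I, a-i; DIF: %, J-R, j-r; DUP: S-Z, s) and assumes the leading X value is a plain number. It does not handle the Y-check value that repeats at the start of the next line, and the function names are hypothetical rather than SpectraView's own:

```typescript
const SQZ = "@ABCDEFGHIabcdefghi"; // index 0 = 0, 1..9 = +1..+9, 10..18 = -1..-9
const DIF = "%JKLMNOPQRjklmnopqr"; // same layout, but the value is a difference
const DUP = "STUVWXYZs"; // index + 1 = leading digit of the repeat count

// Replaces the leading code character with its digit and sign.
function codedNumber(table: string, token: string): number {
  const k = table.indexOf(token[0]);
  const digit = k <= 9 ? k : k - 9;
  const sign = k <= 9 ? 1 : -1;
  return sign * Number(String(digit) + token.slice(1));
}

// Decodes a single line: the first token is X, the rest are Y values.
function decodeAsdfLine(line: string): { x: number; ys: number[] } {
  const tokens = line.match(/[+-]?\d+(?:\.\d+)?|[@A-Ia-i%J-Rj-rS-Zs]\d*/g) ?? [];
  if (tokens.length === 0) throw new Error("empty data line");
  const x = Number(tokens[0]);
  const ys: number[] = [];
  let lastDiff: number | null = null; // non-null when the previous token was a DIF

  for (const token of tokens.slice(1)) {
    const c = token[0];
    if (DIF.includes(c)) {
      lastDiff = codedNumber(DIF, token);
      ys.push(ys[ys.length - 1] + lastDiff);
    } else if (DUP.includes(c)) {
      // The count includes the value already emitted, so repeat count - 1 times.
      const count = Number(String(DUP.indexOf(c) + 1) + token.slice(1));
      for (let i = 1; i < count; i++) {
        const prev = ys[ys.length - 1];
        ys.push(lastDiff === null ? prev : prev + lastDiff);
      }
    } else {
      lastDiff = null;
      ys.push(SQZ.includes(c) ? codedNumber(SQZ, token) : Number(token));
    }
  }
  return { x, ys };
}

// Affine transform from stored integers to real Y values, as in the text.
function scaleY(ys: number[], yFactor: number, yOffset: number = 0): number[] {
  return ys.map((v) => v * yFactor + yOffset);
}
```

For example, the line 1000A0J2K0Tj5 decodes to X = 1000 and Y = 10, 22, 42, 62, 47: an absolute 10, then differences of +12 and +20, the +20 repeated once by T, then -15.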
SPC: Reverse-Engineering a Binary Format
The SPC parser is the most interesting. SPC is a binary format from the 1990s — little-endian, variable-length headers, two different Y-data encodings (16-bit integer or 32-bit float), optional per-sub-spectrum X arrays. The parser reads the format using DataView for byte-level access, handles both single and multi-spectrum files, and automatically maps Thermo's numeric type codes to human-readable unit labels.
The file structure starts with a 512-byte main header containing magic bytes, version flags, the number of sub-spectra, and type codes for X and Y units. The type codes are a numeric enum — 1 means wavenumber (cm-1), 2 means micrometers, 3 means nanometers, 13 means Raman shift, and so on up to type 27. After the header, sub-spectra are stored sequentially, each with its own 32-byte sub-header followed by Y data.
The trickiest part: Y-data encoding depends on two bytes in the main header, the ftflgs flags and the exponent. In the published SPC layout, an exponent byte of 0x80 marks IEEE 754 32-bit floats. Any other exponent means fixed-point integers, 16-bit if bit 0 of ftflgs is set and 32-bit otherwise, scaled as YREAL = Y_INT * 2^(EXPONENT - BITS), where BITS is 16 or 32. Older Galactic instruments always use the integer format. Newer Thermo instruments default to float. SpectraView detects both and converts to Float64Array for uniform downstream handling.
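The decoding step itself is short once the encoding is known. The sketch below assumes the scaling convention from the published SPC header documentation, where integers are fixed-point fractions scaled by 2^(exponent - bits). The YEncoding type and decodeSpcY function are hypothetical, and working out the encoding and byte offset from the header is left to the caller:

```typescript
type YEncoding = "int16" | "int32" | "float32";

// Reads `count` little-endian Y values starting at `byteOffset` and
// returns them as float64 for uniform downstream handling.
function decodeSpcY(
  view: DataView,
  byteOffset: number,
  count: number,
  encoding: YEncoding,
  exponent: number
): Float64Array {
  const out = new Float64Array(count);
  for (let i = 0; i < count; i++) {
    if (encoding === "float32") {
      // Floats are stored as-is; the exponent is not used.
      out[i] = view.getFloat32(byteOffset + 4 * i, true);
    } else if (encoding === "int16") {
      out[i] = view.getInt16(byteOffset + 2 * i, true) * Math.pow(2, exponent - 16);
    } else {
      out[i] = view.getInt32(byteOffset + 4 * i, true) * Math.pow(2, exponent - 32);
    }
  }
  return out;
}
```

Passing true as the second argument to each DataView getter selects little-endian byte order, which is what makes the result independent of the host machine.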
CSV: The Non-Standard Standard
CSV is the format scientists default to when nothing else works. The problem: there is no standard. SpectraView's CSV parser handles three levels of ambiguity:
Delimiter detection. Is it comma-separated, tab-separated, or semicolon-separated? The parser scans the first 10 lines and counts occurrences of each candidate delimiter, choosing the one that produces the most consistent column count. Semicolons are common in files exported under European locales, where the comma is the decimal separator.
Header detection. Does the first row contain column names or data? The parser checks whether the first row contains any non-numeric values. If it does, it's treated as a header row and the column names are used for metadata. If all values are numeric, the first row is data.
Wavenumber identification. Which column is the X axis? The parser uses a heuristic: the first column with monotonically increasing or decreasing values and a range typical of spectroscopic data (400-4000 cm-1 for IR, 100-4000 cm-1 for Raman, 200-800 nm for UV-Vis) is treated as the X axis. All other numeric columns become Y series. This heuristic gets it right about 95% of the time; the remaining 5% is what the explicit column mapping API is for.
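The three checks above can be sketched in a few small functions. This is an illustration of the heuristics as described, not the parser's actual code: the function names are hypothetical, the tie-breaking order between delimiters is my own assumption, and the spectral range check is left out for brevity:

```typescript
// Candidates in priority order: on a tie the earlier entry wins, so a
// semicolon file with decimal commas is not mistaken for a comma file.
const DELIMITERS = [";", "\t", ","];

// Scores each delimiter by how many of the first lines split into the same
// number of columns as the first line.
function detectDelimiter(text: string, maxLines: number = 10): string {
  const lines = text
    .split(/\r?\n/)
    .filter((l) => l.trim() !== "")
    .slice(0, maxLines);
  let best = ",";
  let bestScore = 0;
  for (const d of DELIMITERS) {
    const counts = lines.map((l) => l.split(d).length);
    const columns = counts.length > 0 ? counts[0] : 1;
    if (columns < 2) continue; // delimiter absent from the first line
    const score = counts.filter((c) => c === columns).length;
    if (score > bestScore) {
      best = d;
      bestScore = score;
    }
  }
  return best;
}

// Accepts both "400.5" and the decimal-comma form "400,5".
function isNumeric(field: string): boolean {
  const t = field.trim().replace(",", ".");
  return t !== "" && Number.isFinite(Number(t));
}

// A first row with any non-numeric field is treated as a header.
function hasHeaderRow(firstRow: string[]): boolean {
  return firstRow.some((f) => !isNumeric(f));
}

// True for strictly increasing or strictly decreasing columns.
function isMonotonic(values: number[]): boolean {
  let up = true;
  let down = true;
  for (let i = 1; i < values.length; i++) {
    if (values[i] <= values[i - 1]) up = false;
    if (values[i] >= values[i - 1]) down = false;
  }
  return values.length > 1 && (up || down);
}
```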
Processing in the Browser
SpectraView includes spectral processing utilities that run entirely client-side. No server round-trip. No Python dependency.
Every processing function follows the same contract as SpectraKit: arrays in, arrays out. No mutation, no side effects. The useNormalization hook wraps these into a reactive pipeline — change the mode from "none" to "snv" and all spectra re-render with SNV normalization applied, memoized so it doesn't recompute on every frame.
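As an illustration of the arrays-in, arrays-out contract, here is what a standard normal variate (SNV) function could look like. This is a sketch under assumptions: the snv name is hypothetical, and it uses the sample standard deviation (n - 1), which may differ from the library's choice:

```typescript
// Standard normal variate: subtract the mean, divide by the standard
// deviation. Returns a new array and never touches the input.
function snv(y: ArrayLike<number>): Float64Array {
  const n = y.length;
  const out = new Float64Array(n);
  if (n === 0) return out;

  let mean = 0;
  for (let i = 0; i < n; i++) mean += y[i];
  mean /= n;

  let sumSquares = 0;
  for (let i = 0; i < n; i++) sumSquares += (y[i] - mean) * (y[i] - mean);
  const std = n > 1 ? Math.sqrt(sumSquares / (n - 1)) : 0;

  // A constant signal has zero spread; return zeros instead of dividing by 0.
  if (std === 0) return out;

  for (let i = 0; i < n; i++) out[i] = (y[i] - mean) / std;
  return out;
}
```

Because the function is pure, a hook can memoize its result on the input array and the selected mode without worrying about stale state.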
The SpectraKit Connection
SpectraView and SpectraKit are companion libraries that share the same design philosophy but target different runtimes:
Process heavy datasets with SpectraKit in Python. Visualize results with SpectraView in the browser. Both use functional APIs over typed arrays. A spectrum processed by one library can be displayed by the other with zero transformation.
The practical workflow looks like this: you preprocess 10,000 spectra using SpectraKit in a Python pipeline, export the results as HDF5 or JSON, and load them into a SpectraView-powered web dashboard for interactive exploration. The browser handles panning, zooming, peak picking, and comparison — the things scientists need to do interactively. The heavy preprocessing stays on the server where NumPy and SciPy run natively.
Testing
266 tests across 35 test files. Every component, hook, parser, and utility function is tested. The test suite covers:
- Parsers — JCAMP-DX (including DIFDUP decompression), CSV (delimiter detection, header detection), JSON, SPC binary format (both integer and float Y-data)
- Processing — Baseline correction, normalization, smoothing, derivatives, with edge cases for empty arrays and constant signals
- Comparison — Difference spectrum, correlation coefficient, spectral angle, interpolation for mismatched X grids
- Components — All 15 React components via Testing Library, including Canvas mock rendering
- Hooks — Zoom/pan state management, peak detection accuracy, region selection, undo/redo stack integrity
The Canvas testing deserves mention. Testing Canvas rendering is notoriously difficult because the output is a pixel buffer, not a DOM tree. SpectraView's test suite uses a Canvas mock that records all draw calls — beginPath, moveTo, lineTo, stroke — as a call log. Tests assert on the sequence and coordinates of draw calls rather than pixel values. This catches regressions in the drawing logic (wrong scale transform, off-by-one in LTTB output) without the fragility of snapshot-based visual regression tests.
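A call-log mock of this kind can be very small. The sketch below is not the mock from the test suite; RecordingContext and drawPolyline are hypothetical names, and only the four path methods mentioned above are recorded:

```typescript
type DrawCall =
  | { op: "beginPath" | "stroke" }
  | { op: "moveTo" | "lineTo"; x: number; y: number };

// Stands in for CanvasRenderingContext2D and logs every path call.
class RecordingContext {
  readonly calls: DrawCall[] = [];
  beginPath(): void {
    this.calls.push({ op: "beginPath" });
  }
  moveTo(x: number, y: number): void {
    this.calls.push({ op: "moveTo", x, y });
  }
  lineTo(x: number, y: number): void {
    this.calls.push({ op: "lineTo", x, y });
  }
  stroke(): void {
    this.calls.push({ op: "stroke" });
  }
}

// Function under test: draws the points as one path with a single stroke.
function drawPolyline(
  ctx: Pick<RecordingContext, "beginPath" | "moveTo" | "lineTo" | "stroke">,
  points: ReadonlyArray<[number, number]>
): void {
  if (points.length === 0) return;
  ctx.beginPath();
  ctx.moveTo(points[0][0], points[0][1]);
  for (let i = 1; i < points.length; i++) {
    ctx.lineTo(points[i][0], points[i][1]);
  }
  ctx.stroke();
}
```

A test then draws a known set of points and compares ctx.calls against the expected sequence, which pins down both the order of operations and the coordinates.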
What's Next
SpectraView is the visualization layer for Spektron. The next integration is a prediction viewer — load a spectrum, run it through the Spektron model, and overlay the predicted molecular structure alongside the input spectrum. The composable architecture makes this straightforward: add a new component, wire it to the existing zoom state, render.
Originally published at tubhyam.dev/blog/spectraview