Fixed-Point Math Actually Explained (with a Calculator I Wish I'd Had Years Ago)

Q formats, the mixed-Q mpy/div tricks that save precision, and the quantization traps nobody warns you about - with an interactive calculator embedded alongside each operation so you never have to open Excel for fixed-point math again.

By Sal Torre | Published May 1, 2026 | 23 min

Co-written with AI assistance from Claude Opus 4.7.

#iqmath #qmath #fixed-point #dsp #c2000

Math on a Microcontroller

In many applications, such as motor control and other signal processing work, most of the math produces fractional numbers. Currents in per-unit. Control errors. Gain terms. Sin/cos table lookups. Filter coefficients. Almost none of those land on whole integers. So you have a problem: the calculations you need to do produce fractions, but the hardware you have to do them on only works with integers (unless, of course, your chip has a floating-point unit [FPU]).

Enter Fixed-Point Math Libraries

Fixed-point math is how you bridge that gap. You agree, by convention, that a particular signed integer represents a real number at a fixed binary scale - say, the stored integer times 2^-24. That fixed scale is the binary point: the same idea as a decimal point, but positioned by powers of 2 instead of 10. It tells you how many of the stored bits live “after the point” and represent the fraction versus how many represent whole-number magnitude. The CPU sees integers and does integer math at full speed; your code sees real numbers once you apply the scale. Nothing about the hardware changes; it’s an understanding that this integer means this fraction.

That convention is all fixed-point really is. Every vendor library - TI’s MSP-IQmath for MSP430, the IQmath headers bundled in C2000Ware for C2000, Microchip’s QMath, and the equivalents from other silicon vendors - is a small collection of helpers that apply the scale for you: a macro to turn a C literal like 0.123 into the right integer at compile time, and runtime routines for multiplication and division that handle the shifting and rounding. Cheap, deterministic, and fast enough to run inside a 10 kHz control ISR without breaking a sweat.

Example:

A number like 1.234 can be represented in Q24 as the stored integer 1.234 x 2^24 ~ 20,703,084. Multiplying that integer by 2^-24 gives back 1.234 (to within one LSB).

The Catch

The catch is that fixed-point is unforgiving in a specific way. The math looks like integer math, but you have to constantly think about where the binary point is. Get it wrong and you silently lose precision, overflow, or both. Get it right and you get near-FPU-quality arithmetic on a potentially much cheaper part.

This post walks through how this works, the nuances of addition/subtraction and multiplication/division, what truncation does to your answers, and how mixing different formats lets you keep precision when operands have very different magnitudes. An interactive calculator is embedded alongside each operation so you can try every concept as you read.

Q Formats: Q0 Through Q30

A “Q format” is a way of storing a real number inside a signed 32-bit integer by agreeing on where the implicit binary point lives. For a value in format QN, where N is a number between 0 and 30:

The stored integer represents real_value x 2^N
To get the real value back: stored / 2^N
One sign bit consumes the top bit, so the range is [-2^(31-N), +2^(31-N) - 2^(-N)]
The precision (LSB value) is 2^(-N)
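
The four definitions above can be written out in a few lines of plain C. This is a minimal sketch rather than anything taken from IQmath: q_from_double and q_to_double are hypothetical helper names, the conversion is assumed to truncate toward zero (a library may round instead), and the caller is assumed to keep the value inside the QN range.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: encode a real value as a QN raw integer (N = 0..30).
 * Truncates toward zero; the caller must keep x inside the QN range. */
int32_t q_from_double(double x, int n)
{
    assert(n >= 0 && n <= 30);
    return (int32_t)(x * (double)((uint32_t)1 << n));   /* real * 2^N */
}

/* Hypothetical helper: decode a QN raw integer back to a real value. */
double q_to_double(int32_t raw, int n)
{
    assert(n >= 0 && n <= 30);
    return (double)raw / (double)((uint32_t)1 << n);    /* stored / 2^N */
}
```

With these, q_from_double(1.25, 24) yields 20971520 and q_to_double(20971520, 24) yields 1.25 again. The doubles are only there to make the scaling visible on a host PC; on a target without an FPU you would use the library's compile-time macros instead.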

Here’s the range/precision tradeoff at a few common Q values:

Format	Range	Precision (LSB)	Typical use
Q0	+/-2,147,483,648	1	Plain signed integer counter
Q8	+/-8,388,608	~0.0039	Temperature, coarse percent
Q15	+/-65,536	~0.0000305	Common. Voltage, speed - anything that spans hundreds and thousands in real units
Q24	+/-128	~0.0000000596	Common. Near-unity values - per-unit currents, control errors, sin/cos tables
Q30	+/-2	~0.000000000931	Very high precision needed

Rule of thumb: pick the largest Q that still leaves headroom above your value’s peak - that gives you the most precision without risking an overflow. Q15 and Q24 are the two formats you’ll see most often in shipped code. Q7 and Q8 show up less frequently, but they’re useful for packing 16-bit lookup tables that you later convert up to Q15 at runtime.
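
The rule of thumb can be turned into a small search. The sketch below is one possible reading of it, not a library routine: largest_q_for is a hypothetical name, and the headroom argument is an assumed safety multiplier applied to the peak (1.0 means no margin at all).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: largest N in 0..30 such that peak * headroom still
 * fits a signed 32-bit QN value. Returns -1 if even Q0 cannot hold it. */
int largest_q_for(double peak, double headroom)
{
    assert(headroom >= 1.0);
    double worst = (peak < 0.0 ? -peak : peak) * headroom;
    for (int n = 30; n >= 0; n--) {
        double scaled = worst * (double)((uint32_t)1 << n);
        if (scaled < 2147483648.0) {   /* strictly below 2^31 */
            return n;
        }
    }
    return -1;
}
```

A peak of 1200 with no margin comes out at Q20, and a peak of 1.0 with a 2x margin comes out at Q29. In practice you would likely round the answer down to one of the few formats your codebase standardizes on, so that fewer conversions are needed.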

The Magic Is in Bit-Shifting

Notice that multiplying an integer by 2 raised to the power of N is equivalent to bit-shifting left N times, and multiplying it by 2 raised to the power of -N is equivalent to bit-shifting right N times (the bits shifted off the bottom are discarded, which is where truncation comes from). This is the foundation of how fixed-point math works. Bit-shifting typically takes only 1 clock cycle.

Below is a calculator I made to help others understand how to work with fixed-point numbers. Try entering 1.25 in Q24, then flip it to Q10. The real value is the same but the raw integer changes. Then try 65537 in Q15 - it returns an out of range message.

Fixed-Point Math Calculator

Pick a Q format for each input. For multiply and divide, pick the shift count N - the output Q is computed from qA, qB, and N. Add and subtract require both operands in the same Q.

[Interactive calculator, shown here with A + B]
A = 1.25 in Q24: raw int 20971520 (= 20971520 / 2^24), stored as 1.250000000, quant error 0.000000%
B = 0.25 in Q24 (must match A for +/-): raw int 4194304 (= 4194304 / 2^24), stored as 0.2500000000, quant error 0.000000%
Q24: range +/-128, LSB 5.96e-8
Output Q = Q24 (+/- requires both operands in the same Q format)
C code (IQmath style): _IQ24(1.25) + _IQ24(0.25);   // result Q24
Result: 1.500000000 (ideal 1.5), raw int (Q24) 25165824, total error vs real math 0.000000%

Uses signed-32-bit storage. Values above a format’s range clamp to Int32 bounds. Multiplication uses BigInt internally for full 64-bit precision before the right-shift back to the output Q.

Addition and Subtraction

If both operands are in the same Q format, addition is just:

IQ15 result = a + b;   // both Q15, result Q15

No shifts, no scale factors - adding two value x 2^15 quantities gives (value_a + value_b) x 2^15, which is still Q15. The 2^15 factors out. Just be careful of overflow by understanding the ranges of numbers that will be used.
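
Plain integer + in C does not saturate, so "be careful of overflow" usually means either proving the range or clamping explicitly. The following is a minimal sketch of the clamping option; q_add_sat is a hypothetical helper, not an IQmath function.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: add two raw values in the same Q format, clamping
 * to the int32 limits instead of wrapping. The Q format is unchanged. */
int32_t q_add_sat(int32_t a, int32_t b)
{
    int64_t sum = (int64_t)a + (int64_t)b;   /* cannot overflow in 64 bits */
    if (sum > INT32_MAX) return INT32_MAX;
    if (sum < INT32_MIN) return INT32_MIN;
    return (int32_t)sum;
}

/* Usage check: 1.5 + 0.25 in Q15, then a sum that would otherwise wrap. */
void q_add_sat_demo(void)
{
    assert(q_add_sat(49152, 8192) == 57344);        /* 1.75 in Q15 */
    assert(q_add_sat(INT32_MAX, 1) == INT32_MAX);   /* clamped, not wrapped */
}
```

Clamping costs a few extra cycles per add, so it is a judgment call whether to use it everywhere or only on accumulators such as integrators where the range is hard to bound.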

You cannot directly add or subtract two values in different Q formats. a24 + b15 as raw ints gives you a garbage number - the two values are on different number lines, so adding the stored integers adds them at different scales.

Concretely, with a = 1.5 in Q24 and b = 0.25 in Q15:

	Real value	Stored integer
a_q24	1.5	25,165,824
b_q15	0.25	8,192
raw sum	???	25,174,016

Interpret that raw sum as Q24: 25,174,016 / 2^24 = 1.5005 - should be 1.75. Interpret it as Q15: 25,174,016 / 2^15 = 768.25 - also not 1.75.

The bits add fine. They just don’t mean anything together until you align the scales first. The only way out is to convert one operand to match the other before the +/- operation:

IQ24 a_q24 = ...;
IQ15 b_q15 = ...;
IQ15 a_q15 = _IQ24toIQ15(a_q24);   // convert, possibly losing precision
IQ15 sum = a_q15 + b_q15;          // now both are Q15

Conversion itself has potential risks (a left-shift up to Q24 overflows if b15 is outside Q24’s +/-128 range; a right-shift down to Q15 drops the low 9 bits of precision), so be conscious of the values you will be working with.
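
Both risks show up clearly in a generic conversion routine. The sketch below is an illustration under stated assumptions: q_convert is a hypothetical helper (not the library conversion macro), it scales with 64-bit multiply and divide so the behavior on negative values is well defined, and it truncates toward zero when going down.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: re-scale a raw value from Q(from_q) to Q(to_q).
 * Going up multiplies by 2^k and can overflow; going down divides by 2^k
 * and drops the low bits (truncating toward zero).
 * Returns false, leaving *out untouched, if the result does not fit. */
bool q_convert(int32_t raw, int from_q, int to_q, int32_t *out)
{
    assert(from_q >= 0 && from_q <= 30 && to_q >= 0 && to_q <= 30);
    int64_t wide = raw;
    if (to_q >= from_q) {
        wide *= (int64_t)1 << (to_q - from_q);
    } else {
        wide /= (int64_t)1 << (from_q - to_q);
    }
    if (wide > INT32_MAX || wide < INT32_MIN) {
        return false;
    }
    *out = (int32_t)wide;
    return true;
}
```

Converting a = 1.5 from Q24 (25,165,824) down to Q15 gives 49,152, and adding b = 8,192 then gives 57,344, which is 1.75 in Q15. Going the other way with a large value, such as 1200 held in Q15, is rejected because 1200 does not fit in Q24.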

The calculator below locks B’s Q to match A’s when you pick + or -. There’s no meaningful “different Qs for add” case - either you align them first, or you have a software defect.

Fixed-Point Math Calculator

[Interactive calculator, shown here with A + B]
A = 1.5 in Q20: raw int 1572864 (= 1572864 / 2^20), stored as 1.500000000, quant error 0.000000%
B = 0.25 in Q20 (must match A for +/-): raw int 262144 (= 262144 / 2^20), stored as 0.2500000000, quant error 0.000000%
Q20: range +/-2048, LSB 9.54e-7
Output Q = Q20 (+/- requires both operands in the same Q format)
C code (IQmath style): _IQ20(1.5) + _IQ20(0.25);   // result Q20
Result: 1.750000000 (ideal 1.75), raw int (Q20) 1835008, total error vs real math 0.000000%

Multiplication

Multiplying two Q-format numbers produces a result whose Q format is the sum of the two input Q formats:

(a x 2^Qa) x (b x 2^Qb) = (a x b) x 2^(Qa + Qb)

If both operands are Q15, the product has 2^30 worth of scale - the raw product overflows Int32 as soon as the real result reaches 2 in magnitude. At Q24 x Q24 the scale is 2^48, which overflows for practically any operands. You need a 64-bit intermediate, then a right-shift to land back in a sensible Q. That’s exactly why you would use the function _IQNmpy instead of the * operator:

IQ24 a = _IQ24(1.25);         // = 20971520
IQ24 b = _IQ24(2.00);         // = 33554432
IQ24 result = _IQ24mpy(a, b); // (20971520 x 33554432) >> 24 = 41943040 (= 2.5 in Q24)

For same-Q operands, _IQ24mpy shifts right by 24 and you land back in Q24. For different-Q operands, the shift still happens - the result’s Q ends up being Qa + Qb - N, where N is the number in _IQNmpy.
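
The multiply-then-shift idea, and the Qa + Qb - N bookkeeping, can be sketched in portable C. This is a stand-in for illustration, not the library's implementation: q_mpy and q_mpy_out_q are hypothetical names, and the sketch scales down with a 64-bit divide so negative products truncate toward zero, whereas a real routine may shift or round.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the _IQNmpy idea: widen to 64 bits, multiply,
 * then scale back down by 2^n. Division truncates toward zero; a real
 * library may use an arithmetic shift or round to nearest instead. */
int32_t q_mpy(int32_t a, int32_t b, int n)
{
    assert(n >= 0 && n <= 30);
    int64_t product = (int64_t)a * (int64_t)b;           /* scale 2^(qA+qB) */
    int64_t scaled = product / ((int64_t)1 << n);         /* scale 2^(qA+qB-n) */
    assert(scaled >= INT32_MIN && scaled <= INT32_MAX);   /* result must fit */
    return (int32_t)scaled;
}

/* Output Q of a mixed-Q multiply: qA + qB - N. */
int q_mpy_out_q(int qa, int qb, int n)
{
    return qa + qb - n;
}
```

q_mpy(20971520, 33554432, 24) reproduces the 41943040 from the example above. Feeding it a Q24 and a Q15 operand with n = 15 or n = 24 gives a Q24 or Q15 result respectively, which q_mpy_out_q confirms.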

The Trick: Pick N to Land the Output in One of the Input Qs

Given a in Q24 and b in Q15, there are two N values that make sense and one to avoid:

Call	Product scale	Shift	Result scale	Output Q	Matches
_IQ15mpy(a_Q24, b_Q15)	2^39	>> 15	2^24	Q24	A
_IQ24mpy(a_Q24, b_Q15)	2^39	>> 24	2^15	Q15	B
_IQ20mpy(a_Q24, b_Q15)	2^39	>> 20	2^19	Q19	neither - footgun

The N in _IQNmpy is the shift count, not the output Q. Pick N = qA or N = qB and the result lands in one of your input formats. Pick anything else and you get a third Q you probably didn’t intend.

The calculator below picks the shift N directly so you can see the effect of every choice. The two quick-pick buttons give you the two standard moves - N = qB leaves the output in A’s Q, N = qA leaves it in B’s Q - and the C code updates to the matching _IQ{N}mpy(...) either way.

Fixed-Point Math Calculator

[Interactive calculator, shown here with A x B]
A = 1.25 in Q24: raw int 20971520 (= 20971520 / 2^24), stored as 1.250000000, quant error 0.000000%. Q24: range +/-128, LSB 5.96e-8
B = 0.333333 in Q15: raw int 10923 (= 10923 / 2^15), stored as 0.3333435059, quant error 0.0032%. Q15: range +/-65536, LSB 0.0000305
Shift count N in _IQNmpy: 15
Output Q = qA + qB - N = 24 + 15 - 15 = 24 (follows A's Q)
C code (IQmath style): _IQ15mpy(_IQ24(1.25), _IQ15(0.333333));   // result Q24
Result: 0.4166793823 (ideal 0.41666625), raw int (Q24) 6990720, total error vs real math 0.0032%

Flip N between 15 (output follows A, Q24) and 24 (output follows B, Q15) and watch the result error change. One of those probably gives you meaningfully better precision for this specific pair; the other is your overflow headroom. Push N to something neither qA nor qB and the calculator flags it - valid math, but you’ve landed in a Q you probably didn’t mean to.

Q0 x QN: Native Multiply Works

There’s a free simplification hiding in the shift formula. When one operand is Q0 - a plain integer count - the _IQNmpy shift collapses: N = 0 means no shift after the multiply, which means no helper function. The plain * operator does the right thing.

The canonical example is scaling a 12-bit ADC reading into a per-unit value for the control loop. Practically every embedded project that reads a sensor does some version of this:

IQ24 inv_4095 = _IQ24(1.0 / 4095.0);     // precomputed, ~ 0.000244, Q24
// ... each sample:
uint16_t adc = read_adc();                // 12-bit count, naturally Q0
IQ24 adc_pu = (int32_t)adc * inv_4095;    // Q0 x Q24 = Q24, no library call

Output Q = qA + qB - N = 0 + 24 - 0 = Q24, just from integer arithmetic on a signed-32 variable. The int32_t cast keeps the compiler from doing anything surprising with the uint16_t promotion.

Fixed-Point Math Calculator

[Interactive calculator, shown here with A x B]
A = 2874 in Q0: raw int 2874 (= 2874 / 2^0), stored as 2874.000000, quant error 0.000000%. Q0: range +/-2147483648, LSB 1.00
B = 0.000244 in Q24: raw int 4094 (= 4094 / 2^24), stored as 0.0002440214157, quant error 0.0088%. Q24: range +/-128, LSB 5.96e-8
Shift count N in _IQNmpy: 0
Output Q = qA + qB - N = 0 + 24 - 0 = 24 (follows B's Q)
C code (IQmath style): _IQ0mpy(_IQ0(2874), _IQ24(0.000244));   // result Q24
Result: 0.7013175488 (ideal 0.701256), raw int (Q24) 11766156, total error vs real math 0.0088%

The calculator shows 2874 counts (about 70% of full scale) normalizing to ~ 0.701 per-unit. Note that the calculator entry uses the three-digit constant 0.000244, so its 0.7013 lands about 0.07% below the ideal 2874/4095 ~ 0.7018. With the full-precision _IQ24(1.0 / 4095.0) constant (raw integer 4097) the product is 2874 x 4097 = 11,774,778, or 0.70183 in Q24, which tracks the ideal to within one LSB. This pattern applies any time a raw integer measurement meets a fractional gain - ADCs, encoders, pulse counts, temperature sensor LSBs. Just check the worst-case product fits in Int32: 4095 x 4097 ~ 1.7x10^7 is well within range here. A near-unity Q24 gain (raw value up around 2^24) would overflow against a 12-bit count and you’d need to promote to 64-bit first.
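
The "check the worst-case product" step can be pushed to compile time so it cannot be forgotten when a constant changes. A minimal sketch, assuming a C11 compiler (for _Static_assert) and a 12-bit ADC; ADC_MAX_COUNT, INV_4095_Q24, and adc_to_pu_q24 are hypothetical names, and 4097 is the Q24 raw value of 1/4095 (2^24 / 4095, truncated).

```c
#include <assert.h>
#include <stdint.h>

#define ADC_MAX_COUNT   4095      /* 12-bit full scale */
#define INV_4095_Q24    4097      /* 1/4095 in Q24: 2^24 / 4095, truncated */

/* Compile-time guard: the largest Q0 x Q24 product must fit in int32. */
_Static_assert((int64_t)ADC_MAX_COUNT * INV_4095_Q24 <= INT32_MAX,
               "Q0 x Q24 product overflows int32; widen to 64-bit first");

/* Q0 count times Q24 gain gives a Q24 per-unit value with no shift. */
int32_t adc_to_pu_q24(uint16_t adc)
{
    assert(adc <= ADC_MAX_COUNT);
    return (int32_t)adc * INV_4095_Q24;
}
```

A full-scale reading of 4095 maps to 16,777,215, one LSB short of 1.0 in Q24. If someone later swaps in a near-unity gain, the build fails instead of the product silently wrapping.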

Big x Small = Precision Loss (the Hidden Trap)

When neither operand is Q0 you lose the *-operator shortcut - and that’s where mixed-Q actually earns its keep. The natural counterpart to the ADC normalization above is the reverse trip: a per-unit value coming out of your control loop multiplied by a base scale to recover engineering units for display, logging, or a downstream peripheral.

Take a 1200 A motor drive. The per-unit current from the control loop is a near-unity value; the base scale is 1200 A itself. Completely different magnitudes, different Q formats forced on each:

	i_pu (~0.001 per-unit)	base_current (1200 A)
Format	Q24	Q15
Range	+/-128 (fits trivially)	+/-65,536 (fits easily)
LSB	~6 x 10^-8	~30 uA
Why that Q	Tiny value needs resolution	1200 A blows past Q24’s +/-128 ceiling

IQ15 base_current = _IQ15(1200.0);                // 1200 A base, Q15
// ... each sample, after the control loop:
IQ24 i_pu = /* per-unit current from loop, Q24 */;
IQ15 i_amps = _IQ24mpy(i_pu, base_current);       // per-unit x amps = amps, Q15

_IQ24mpy shifts the product right by 24, landing the result at qA + qB - N = 24 + 15 - 24 = Q15 - matching the base scale’s format, which is what the display consumer expects.

Fixed-Point Math Calculator

[Interactive calculator, shown here with A x B]
A = 0.00123 in Q24: raw int 20636 (= 20636 / 2^24), stored as 0.001230001450, quant error 0.000118%. Q24: range +/-128, LSB 5.96e-8
B = 1200 in Q15: raw int 39321600 (= 39321600 / 2^15), stored as 1200.000000, quant error 0.000000%. Q15: range +/-65536, LSB 0.0000305
Shift count N in _IQNmpy: 24
Output Q = qA + qB - N = 24 + 15 - 24 = 15 (follows B's Q)
C code (IQmath style): _IQ24mpy(_IQ24(0.00123), _IQ15(1200));   // result Q15
Result: 1.475982666 (ideal 1.476), raw int (Q15) 48365, total error vs real math 0.0012%

The calculator is set up with a 0.00123 per-unit reading (Q24) times the 1200 A base (Q15), producing ~ 1.476 A for display. Now the precision story: if you’d stored the per-unit operand in Q15 instead of Q24 - the lazy “everything’s Q15” move - 0.00123 rounds to 40/32768 ~ 0.00122, roughly 0.75% error on the input before the multiply. Flip A’s Q from 24 to 15 in the calculator and the quant-error line on A jumps from about 0.0001% to about 0.75%, and the total error on the result follows it up by a factor of several hundred. Try 0.5 instead and both Q choices look equally clean - the precision hit isn’t uniform, it hurts most when the small operand is actually small.

This is the same rule the ADC case followed, just run backwards. The takeaway is consistent: pick the Q format for each operand based on its own magnitude. Small values (per-unit, errors, deviations) want high Q for resolution. Big values (base scales, rated magnitudes) want low Q for range. Pick N in _IQNmpy to land the product in whichever Q your downstream consumer expects.
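
To put numbers on the trap, here is the same 0.00123 per-unit x 1200 A multiply done twice, once with the small operand in Q24 and once with it squeezed into Q15. It is a sketch under assumptions: the function names are hypothetical, scaling is done with a truncating 64-bit divide, and the caller is assumed to keep the resulting amps inside Q15's range.

```c
#include <assert.h>
#include <stdint.h>

#define BASE_CURRENT_Q15  39321600   /* 1200 A in Q15 (1200 * 2^15) */

/* Per-unit current held in Q24; N = 24 lands the product in Q15 amps. */
int32_t amps_q15_from_pu_q24(int32_t i_pu_q24)
{
    int64_t product = (int64_t)i_pu_q24 * BASE_CURRENT_Q15;
    return (int32_t)(product / ((int64_t)1 << 24));
}

/* Same multiply with the per-unit value squeezed into Q15; N = 15. */
int32_t amps_q15_from_pu_q15(int32_t i_pu_q15)
{
    int64_t product = (int64_t)i_pu_q15 * BASE_CURRENT_Q15;
    return (int32_t)(product / ((int64_t)1 << 15));
}

/* Usage check for 0.00123 per-unit (ideal 1.476 A is about 48365.6 in Q15). */
void big_times_small_demo(void)
{
    assert(amps_q15_from_pu_q24(20636) == 48365);  /* 0.00123 stored in Q24 */
    assert(amps_q15_from_pu_q15(40) == 48000);     /* 0.00123 stored in Q15 */
}
```

The Q24 path lands within one LSB of the ideal. The Q15 path returns 48000, which is 1.4648 A, about 0.76% low, and all of that error was baked in when the input was quantized to 40/32768.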

Division

Division mirrors multiplication in its arithmetic: pre-shift the numerator left by N, then divide by the denominator. For same-Q operands the straightforward form is:

IQ24 quotient = _IQ24div(a, b);   // (a << 24) / b, both in Q24, result Q24

For mixed-Q operands the output Q works out to qA + N - qB. Pick N arbitrarily and you get a format you didn’t intend: _IQ20div(_IQ24(1.25), _IQ15(0.333)) produces a result at scale 2^(24+20-15) = 2^29 = Q29, which is almost never the Q you meant to be working in.

Two patterns cover almost every division you’ll write. Pattern 1 is the workhorse - usually run once at startup. Pattern 2 is rarer but worth recognizing when you hit it.

Pattern 1: Per-Unit Conversion

The workhorse: normalize two real-unit quantities into a high-precision near-unity format. Ratio a value against a base so the result sits in [0, 1]-ish and downstream code can use Q24 without caring about the original engineering units.

Speed as a fraction of max speed. Current as a fraction of max current. Voltage as a fraction of nominal. In all of these, both the numerator and denominator naturally live in a lower-Q format (Q15 for RPM, amps, volts) but the ratio wants the extra precision of Q24.

Crucially, this is a one-time startup calc - usually against nameplate or configuration values - not something you’d do on live samples inside a control ISR (more on why below). You divide once, store the result, and multiply by it afterward.

Say your maximum readable speed is 1500 RPM and maps to per-unit 1.0. Your rated base speed is 1000 RPM, stored in Q15. You want that rated speed expressed in per-unit Q24:

// One-time startup calc: convert rated speed to per-unit in Q24
// Both operands Q15; N = 24 places the result in Q24
IQ24 base_speed_pu = _IQ24div(_IQ15(1000), _IQ15(1500));   // ~ 0.66667

Working the shift: _IQ24div pre-shifts the Q15 numerator left by 24 (scale 2^(15+24) = 2^39), then divides by the Q15 denominator (scale 2^15). The result scale is 2^(39 - 15) = 2^24 - exactly Q24. The formula holds: output Q = qA + N - qB = 15 + 24 - 15 = 24.
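
The pre-shift-then-divide arithmetic can be sketched the same way as the multiply. As before this is an illustrative stand-in, not the library routine: q_div is a hypothetical name, the numerator is scaled by a 64-bit multiply instead of a shift so negative values stay well defined, and the quotient truncates toward zero.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the _IQNdiv idea: scale the numerator up by 2^n
 * in 64 bits, then divide. Truncates toward zero, so the last bit can differ
 * from a routine that rounds to nearest. Output Q is qA + n - qB. */
int32_t q_div(int32_t a, int32_t b, int n)
{
    assert(b != 0 && n >= 0 && n <= 30);
    int64_t scaled = (int64_t)a * ((int64_t)1 << n);        /* scale 2^(qA+n) */
    int64_t quotient = scaled / b;                          /* scale 2^(qA+n-qB) */
    assert(quotient >= INT32_MIN && quotient <= INT32_MAX); /* must fit */
    return (int32_t)quotient;
}
```

For the rated-speed example, q_div(32768000, 49152000, 24) returns 11184810, which is 0.66666663 in Q24. A rounding implementation would return 11184811, so expect the last bit to vary between tools.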

Fixed-Point Math Calculator

[Interactive calculator, shown here with A / B]
A = 1000 in Q15: raw int 32768000 (= 32768000 / 2^15), stored as 1000.000000, quant error 0.000000%. Q15: range +/-65536, LSB 0.0000305
B = 1500 in Q15: raw int 49152000 (= 49152000 / 2^15), stored as 1500.000000, quant error 0.000000%. Q15: range +/-65536, LSB 0.0000305
Shift count N in _IQNdiv: 24
Output Q = qA + N - qB = 15 + 24 - 15 = 24 (follows higher-precision Q, per-unit conversion)
Calculator note: Per-unit / scale-up division. N = 24 > qB, so the output lifts into a higher-precision format (Q24). Common for normalizing real-unit values against a base so downstream math runs in Q24 or similar.
C code (IQmath style): _IQ24div(_IQ15(1000), _IQ15(1500));   // result Q24
Result: 0.6666666865 (ideal 0.666666666667), raw int (Q24) 11184811, total error vs real math 0.000003%

Push N up from the qB = 15 quick-pick default into the 20s and watch the output Q climb with it. N = 24 lifts the answer into Q24 where it belongs for a near-unity per-unit value, and the calculator tags it as a legitimate per-unit / scale-up move. N = 15 drops the result back to Q15 - fine for intermediate math but way too coarse here. The amber “heads up” only fires when N is below qB for division, where the output Q falls below A’s own format. (The _IQ20div example above is a different footgun: there N = 20 sits above qB = 15, so the output climbs to Q29, a third format you didn’t intend.)

Pattern 2: Q-Format Switch (Ratio in A’s Q)

Less common, but worth knowing. You have a numerator A in some Q format and a denominator B in a genuinely different non-trivial Q, and you want the quotient back in A’s Q so downstream math that consumes it stays in the same frame. Pick N = qB and output Q = qA + N - qB = qA - A’s format falls through. (If the denominator is Q0, you can skip the library call and use / directly, same way you use * for a Q0 multiply.)

The universal case is Ohm’s law with a small-magnitude resistance: I = V / R. V spans tens to hundreds of volts, so it lives in Q15. R for a motor winding, a brake resistor, a shunt, or a cable run is tens of milliohms - small enough that Q15’s 3x10^-5 LSB eats meaningful precision and you stash it in Q24 instead. The current you compute is an engineering-units amp value, and you want it back in Q15 where every other current in the system lives.

// Drive init, motor stopped - safe to divide:
IQ15 v_rated   = _IQ15(24.0);                   // rated supply voltage, Q15
IQ24 r_stator  = _IQ24(0.1);                    // 100 mOhm winding resistance, Q24
IQ15 i_locked  = _IQ24div(v_rated, r_stator);   // ~ 240 A locked-rotor current, Q15
// Used downstream as a fault threshold, soft-start ceiling, protection trip, etc.

_IQ24div pre-shifts the numerator by N = qB = 24 (matching the denominator’s Q), divides, and lands the result at qA + N - qB = 15 + 24 - 24 = 15 - Q15, same format as v_rated.

Fixed-Point Math Calculator

[Interactive calculator, shown here with A / B]
A = 24 in Q15: raw int 786432 (= 786432 / 2^15), stored as 24.00000000, quant error 0.000000%. Q15: range +/-65536, LSB 0.0000305
B = 0.1 in Q24: raw int 1677722 (= 1677722 / 2^24), stored as 0.1000000238, quant error 0.000024%. Q24: range +/-128, LSB 5.96e-8
Shift count N in _IQNdiv: 24
Output Q = qA + N - qB = 15 + 24 - 24 = 15 (follows A's Q)
C code (IQmath style): _IQ24div(_IQ15(24), _IQ24(0.1));   // result Q15
Result: 239.9999390 (ideal 240), raw int (Q15) 7864318, total error vs real math 0.000025%

The same shape shows up in any startup calc that mixes a moderate-scale physical quantity (Q15) with a small coefficient or sensitivity (Q24):

Braking-resistor current from DC bus voltage and a low-value brake R.
Angular acceleration alpha = T / J for a small-inertia motor (Q15 torque, Q24 J in kg*m^2).
Thermal rise deltaT = P / G_th for a highly-conductive heat path (Q15 power, Q24 conductance).
Flow rate through a low-impedance restriction in fluid-power systems.

Pattern 2 lives outside the hot loop - at init, on parameter updates, during safe-to-stall maintenance ops - and that’s no accident. Out there, the divide’s 20-70 cycle cost is free money. Inside a running control ISR, it is decidedly not.

But Inside the Loop, Division Is Expensive

A 10 kHz control ISR on a 150 MHz C28x has 15,000 cycles of budget per pass. An IQ multiply is ~7 cycles. An IQ divide is 20-70 cycles depending on implementation and data - a handful of live divides and you’ve burned a measurable chunk of your entire loop on operations you could have replaced with a multiply.

The fix is almost always: multiply by the reciprocal constant, computed once at compile time.

// Bad: runtime divide
speed_fraction = speed / 1000;

// Good: compile-time reciprocal, runtime multiply
#define INV_1000 _IQ24(0.001)
speed_fraction = _IQ24mpy(speed, INV_1000);

The _IQ24(0.001) macro is where compile-time evaluation earns its keep. _IQ24(x) is typically defined as ((IQ24) ((x) * 16777216.0)) - the preprocessor expands the macro, and the compiler then folds 0.001 x 2^24 into a constant and substitutes the resulting integer into your code. Zero runtime cost. You write readable C (_IQ24(0.001)) and the emitted instructions are identical to what you’d get from hand-typing 16777.

This is also why _IQ24(1.0/3.0) is safe: the compiler does the floating-point division at compile time, and you get a clean integer constant at runtime.
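
Here is the reciprocal trick end to end in portable C. It is a sketch with several assumptions: Q24_CONST is a hypothetical macro written in the shape quoted above, the speed is assumed to be held in Q15 (RPM does not fit in Q24), and the multiply is scaled down with a truncating 64-bit divide instead of the library call.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed macro shape, following the definition quoted above; the cast
 * truncates toward zero. Q24_CONST is a hypothetical name. */
#define Q24_CONST(x)  ((int32_t)((x) * 16777216.0))

#define INV_1000_Q24  Q24_CONST(0.001)      /* evaluates to 16777 */
#define ONE_THIRD_Q24 Q24_CONST(1.0 / 3.0)  /* evaluates to 5592405 */

/* Q15 speed times the Q24 reciprocal, scaled down by 2^24: result is Q15. */
int32_t speed_fraction_q15(int32_t speed_q15)
{
    int64_t product = (int64_t)speed_q15 * INV_1000_Q24;
    return (int32_t)(product / ((int64_t)1 << 24));
}

void reciprocal_demo(void)
{
    assert(INV_1000_Q24 == 16777);
    assert(ONE_THIRD_Q24 == 5592405);
    assert(speed_fraction_q15(32768000) == 32767);  /* 1000 RPM -> ~1.0 */
}
```

Note the last result: 1000 RPM comes out as 32767 rather than 32768, one LSB short of 1.0 in Q15. That is the cost of truncating 16777.216 to 16777 in the reciprocal, and it is usually a fair trade for removing a runtime divide.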

A Practical Example: The Speed Loop That Wouldn’t Tune

Early in the Venturi Buckeye Bullet 3 project - the Ohio State / Venturi EV that went after the FIA electric land-speed record - I wrote the outer speed loop for the permanent-magnet traction inverter. Setpoint, feedback, and error all as plain integers in RPM. Effectively everything was in Q0. The code compiled, the motor spun, and then it refused to stop hunting around the setpoint. About +/-2-3 RPM of lazy oscillation - 4 to 6 RPM peak-to-peak - that no amount of Kp / Ki tuning could damp out. I could hear it - the motor pulsed audibly on the dyno - and it was bad enough to corrupt every attempt to tune the inner current loop underneath.

The symptom looked like a tuning problem. It was a quantization problem.

At a 1500 RPM setpoint, one LSB of Q0 is 1 full RPM - and the cast from a fractional measurement to integer truncates. A real speed of 1500.4 RPM truncates to 1500. So does 1500.9. The measured error sits stubbornly at 0 across that entire RPM-wide window above setpoint, then steps to +1 only when the true speed crosses below 1500.0, then back to 0 the instant it climbs above. During the zero-error deadband the integrator holds while fractional error the controller can’t see keeps building. When the measured error finally flips to +/-1, the integrator corrects hard against a state that’s already wrong - the motor overshoots, the loop swings to the other side, and that’s the +/-2-3 RPM limit cycle I was hearing on the dyno.

You can see the quantization collapse directly. The calculator below does the same subtraction the loop was doing: 1500.4 - 1500 in Q0. The ideal answer is 0.4. The fixed-point answer is 0.

Fixed-Point Math Calculator

[Interactive calculator, shown here with A - B]
A = 1500.4 in Q0: raw int 1500 (= 1500 / 2^0), stored as 1500.000000, quant error 0.0267%
B = 1500 in Q0 (must match A for +/-): raw int 1500 (= 1500 / 2^0), stored as 1500.000000, quant error 0.000000%
Q0: range +/-2147483648, LSB 1.00
Output Q = Q0 (+/- requires both operands in the same Q format)
C code (IQmath style): _IQ0(1500.4) - _IQ0(1500);   // result Q0
Result: 0.000000000 (ideal 0.4), raw int (Q0) 0, total error vs real math 100.0000% - significant, check Q choices

The fix was converting the entire loop to per-unit in Q24: divide RPM by a base (max) speed once, store setpoint and feedback as fractions of that base, and do all loop math in Q24. One LSB of Q24 at a 1500 RPM base works out to 1500 x 2^-24 ~ 9x10^-5 RPM - about a ten-thousandth of an RPM of resolution on the state variable.

Same real measurement, expressed per-unit: 1500.4 / 1500 ~ 1.000267. Subtract the 1.0 per-unit setpoint and you recover the actual 0.000267 per-unit error - a clean continuous signal the PI controller can actually respond to.
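
The before-and-after can be reproduced on a host PC in a few lines. This is an illustration only: the doubles stand in for the true physical speed, the function names are hypothetical, and the error is taken as setpoint minus feedback to match the sign convention described above.

```c
#include <assert.h>
#include <stdint.h>

/* Speed error with everything held as whole RPM (Q0): the cast truncates. */
int32_t speed_error_q0(double setpoint_rpm, double measured_rpm)
{
    return (int32_t)setpoint_rpm - (int32_t)measured_rpm;
}

/* Same error in per-unit Q24: divide by the base speed, then scale by 2^24. */
int32_t speed_error_pu_q24(double setpoint_rpm, double measured_rpm,
                           double base_rpm)
{
    assert(base_rpm > 0.0);
    int32_t sp = (int32_t)(setpoint_rpm / base_rpm * 16777216.0);
    int32_t fb = (int32_t)(measured_rpm / base_rpm * 16777216.0);
    return sp - fb;
}
```

At a 1500 RPM setpoint, the Q0 version reports an error of 0 for both 1500.4 and 1500.9 RPM and only steps to +1 at 1499.9. The per-unit Q24 version reports about -4473 LSBs for 1500.4 RPM, which is the -0.000267 per-unit error the controller needs to see.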

Fixed-Point Math Calculator

[Interactive calculator, shown here with A - B]
A = 1.000267 in Q24: raw int 16781696 (= 16781696 / 2^24), stored as 1.000267029, quant error 0.000003%
B = 1.0 in Q24 (must match A for +/-): raw int 16777216 (= 16777216 / 2^24), stored as 1.000000000, quant error 0.000000%
Q24: range +/-128, LSB 5.96e-8
Output Q = Q24 (+/- requires both operands in the same Q format)
C code (IQmath style): _IQ24(1.000267) - _IQ24(1.0);   // result Q24
Result: 0.0002670288086 (ideal 0.000267), raw int (Q24) 4480, total error vs real math 0.0108%

After the conversion, the controller settled smoothly - no retuning required. Same poles, same gains (scaled once for the new units), same plant. Just a state representation with enough resolution for the error signal to actually exist.

The lesson: when you pick a Q format, don’t just ask “does my biggest value fit?” Ask “is one LSB of this format smaller than the smallest difference my loop needs to see?”

What This All Means in Practice

A few rules of thumb from shipped motor control code:

Pick the Q format per variable, not per project. Small values need high Q; large values need low Q. A codebase with “everything is Q24” will always either overflow or underflow somewhere.
Avoid division. If the denominator is known at compile time, replace with a multiply by its reciprocal. If it’s only known at startup, compute the reciprocal once and cache it. If it varies per sample, reconsider your algorithm or take the cycle hit.
Use the library macros for constants. _IQ24(0.123) is always safer than ((IQ24) 2063598) even if you do the math right, because the next reader of the code will understand instantly.
Mixed-Q _IQNmpy is your precision lever. If you find yourself losing precision in a particular multiply, the fix isn’t to use floats - it’s to pick the right combination of input Qs and _IQNmpy variant.
Keep the calculator one click away. Bookmark the Fixed-Point Math Calculator and pull it up whenever you’re picking a Q format for a new variable, sizing an _IQNmpy shift, or double-checking a per-unit conversion. Checking a single Q choice in the OS calculator means tapping x^y, computing value x 2^N, rounding to an integer, confirming it fits in 32 bits, then redoing the arithmetic in reverse to get the quantization error - easily a minute per variable, longer if you mistype a power of two. The same check in this tool is a single keystroke: type the value, pick Q, read the raw integer, quantized value, and percent error live. Five seconds vs. a minute adds up fast over a few dozen variables in a real control loop.

If you’re doing fixed-point DSP on a real project and want a second set of eyes - or a full design review - let me know.
