Fixed-Point Math Actually Explained (with a Calculator I Wish I'd Had Years Ago)
Q formats, the mixed-Q mpy/div tricks that save precision, and the quantization traps nobody warns you about - with an interactive calculator embedded alongside each operation so you never have to open Excel for fixed-point math again.
Co-written with AI assistance from Claude Opus 4.7.
Math on a Microcontroller
In applications like motor control and other signal processing, most of the math produces fractional numbers. Currents in per-unit. Control errors. Gain terms. Sin/cos table lookups. Filter coefficients. Almost none of those land on whole integers. So you have a problem: the calculations you need to do produce fractions, but the hardware you have to do them on only works with integers (unless, of course, your chip has a floating-point unit, or FPU).
Enter Fixed-Point Math Libraries
Fixed-point math is how you bridge that gap. You agree, by convention, that a particular signed integer represents a real number at a fixed binary scale - say, the stored integer times 2^-24. That fixed scale is the binary point: the same idea as a decimal point, but positioned by powers of 2 instead of 10. It tells you how many of the stored bits live “after the point” and represent the fraction versus how many represent whole-number magnitude. The CPU sees integers and does integer math at full speed; your code sees real numbers once you apply the scale. Nothing about the hardware changes; it’s an understanding that this integer means this fraction.
That convention is all fixed-point really is. Every vendor library - TI’s MSP-IQmath for MSP430, the IQmath headers bundled in C2000Ware for C2000, Microchip’s QMath, and the equivalents from other silicon vendors - is a small collection of helpers that apply the scale for you: a macro to turn a C literal like 0.123 into the right integer at compile time, and runtime routines for multiplication and division that handle the shifting and rounding. Cheap, deterministic, and fast enough to run inside a 10 kHz control ISR without breaking a sweat.
Example:
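In Q24 (a scale of 2^-24), the number 1.234 is stored as the integer 1.234 x 2^24 = 20,703,084 (truncated from 20,703,084.54). Reading it back gives 20,703,084 x 2^-24 ~ 1.23399997.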
The Catch
The catch is that fixed-point is unforgiving in a specific way. The math looks like integer math, but you have to constantly think about where the binary point is. Get it wrong and you silently lose precision, overflow, or both. Get it right and you get near-FPU-quality arithmetic on a potentially much cheaper part.
This post walks through how this works, the nuances of addition/subtraction and multiplication/division, what truncation does to your answers, and how mixing different formats lets you keep precision when operands have very different magnitudes. An interactive calculator is embedded alongside each operation so you can try every concept as you read.
Q Formats: Q0 Through Q30
A “Q format” is a way of storing a real number inside a signed 32-bit integer by agreeing on where the implicit binary point lives. For a value in format QN, where N is a number between 0 and 30:
- The stored integer represents real_value x 2^N.
- To get the real value back: stored / 2^N.
- One sign bit consumes the top bit, so the range is [-2^(31-N), +2^(31-N) - 2^(-N)].
- The precision (LSB value) is 2^(-N).
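To make those definitions concrete, here's a minimal C sketch of the two conversions and the LSB. It assumes a plain int32_t container and a truncating conversion (the cast-toward-zero behavior of the _IQ24() macro definition discussed later); the function names are illustrative, not from any vendor library.

```c
#include <assert.h>
#include <stdint.h>

/* Real value -> QN raw integer: scale by 2^n, then truncate toward zero. */
int32_t q_from_real(double value, int n) {
    assert(n >= 0 && n <= 30);
    double scaled = value * (double)(1L << n);               /* value x 2^N */
    assert(scaled > -2147483649.0 && scaled < 2147483648.0); /* must fit Int32 */
    return (int32_t)scaled;                                   /* cast truncates */
}

/* QN raw integer -> real value: divide by 2^n. */
double q_to_real(int32_t raw, int n) {
    assert(n >= 0 && n <= 30);
    return (double)raw / (double)(1L << n);
}

/* Precision of QN: one LSB is worth 2^-n. */
double q_lsb(int n) {
    assert(n >= 0 && n <= 30);
    return 1.0 / (double)(1L << n);
}
```

With these, q_from_real(1.234, 24) returns 20703084, and q_to_real(20703084, 24) gives back roughly 1.23399997; the difference from 1.234 is the quantization error.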
Here’s the range/precision tradeoff at a few common Q values:
| Format | Range | Precision (LSB) | Typical use |
|---|---|---|---|
| Q0 | +/-2,147,483,648 | 1 | Plain signed integer counter |
| Q8 | +/-8,388,608 | ~0.0039 | Temperature, coarse percent |
| Q15 | +/-65,536 | ~0.0000305 | Common. Voltage, speed - anything that spans hundreds and thousands in real units |
| Q24 | +/-128 | ~0.0000000596 | Common. Near-unity values - per-unit currents, control errors, sin/cos tables |
| Q30 | +/-2 | ~0.000000000931 | Very high precision needed |
Rule of thumb: pick the largest Q that still leaves headroom above your value’s peak - that gives you the most precision without risking an overflow. Q15 and Q24 are the two formats you’ll see most often in shipped code. Q7 and Q8 show up less frequently, but they’re useful for packing 16-bit lookup tables that you later convert up to Q15 at runtime.
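The rule of thumb is mechanical enough to code up. A small sketch, assuming 32-bit signed storage and a caller-chosen headroom multiplier (both my framing, not a library feature):

```c
#include <assert.h>

/* Largest N in [0, 30] whose positive range, (2^31 - 1) / 2^N, still covers
   |peak| times a headroom factor (1.0 = no extra margin).
   Returns -1 if the value does not fit even in Q0. */
int max_q_for_peak(double peak, double headroom) {
    assert(headroom >= 1.0);
    double need = (peak < 0 ? -peak : peak) * headroom;
    for (int n = 30; n >= 0; --n) {
        double max_pos = 2147483647.0 / (double)(1L << n);
        if (need <= max_pos)
            return n; /* highest Q that still fits */
    }
    return -1;
}
```

For example, a peak of 1.0 gives Q30, a peak of 100 gives Q24, and asking for 2x headroom on that same peak of 100 drops the answer to Q23.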
The Magic Is in Bit-Shifting
Notice that multiplying an integer by 2^N is equivalent to shifting it left N bits, and multiplying by 2^-N is equivalent to shifting it right N bits and discarding whatever falls off the bottom. For negative values, an arithmetic right shift rounds toward negative infinity, not toward zero. This is the foundation of how fixed-point math works. On a core with a barrel shifter, a shift of any length typically takes a single clock cycle.
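A quick C check of the shift equivalence, including the rounding detail for negative values. Right-shifting a negative signed value is implementation-defined in C; this sketch assumes the usual arithmetic (sign-extending) shift, which GCC documents. The left shift is restricted to non-negative inputs because left-shifting a negative signed value is undefined behavior before C23. Function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* x * 2^n as a left shift. Restricted to non-negative x whose result fits:
   left-shifting a negative signed value is undefined behavior before C23. */
int32_t mul_pow2(int32_t x, int n) {
    assert(n >= 0 && n < 31);
    assert(x >= 0 && x <= (INT32_MAX >> n));
    return (int32_t)((uint32_t)x << n);
}

/* x * 2^-n as a right shift; the low n bits are discarded. For negative x
   this assumes an arithmetic (sign-extending) shift, which rounds toward
   negative infinity rather than toward zero. */
int32_t div_pow2_shift(int32_t x, int n) {
    assert(n >= 0 && n < 31);
    return x >> n;
}
```

So 7 >> 1 is 3 and -7 >> 1 is -4, while -7 / 2 in C is -3. That one-LSB disagreement between shifting and dividing only shows up for negative values.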
Below is a calculator I made to help others understand how to work with fixed-point numbers. Try entering 1.25 in Q24, then flip it to Q10. The real value is the same but the raw integer changes. Then try 65537 in Q15 - it returns an out of range message.
Fixed-Point Math Calculator
Pick a Q format for each input. For multiply and divide, pick the shift count N - the output Q is computed from qA, qB, and N. Add and subtract require both operands in the same Q.
Raw int: 20971520 = 20971520 / 2^24
Stored as: 1.250000000
Quant error: 0.000000%
Q24: range +/-128, LSB 5.96e-8
Raw int: 4194304 = 4194304 / 2^24
Stored as: 0.2500000000
Quant error: 0.000000%
Q24: range +/-128, LSB 5.96e-8
_IQ24(1.25) + _IQ24(0.25); // result Q24
Ideal (real): 1.50000000000
Raw int (Q24): 25165824
Total error vs real math: 0.000000%
Uses signed-32-bit storage. Values above a format’s range clamp to Int32 bounds. Multiplication uses BigInt internally for full 64-bit precision before the right-shift back to the output Q.
Addition and Subtraction
If both operands are in the same Q format, addition is just:
IQ15 result = a + b; // both Q15, result Q15
No shifts, no scale factors - adding two value x 2^15 quantities gives (value_a + value_b) x 2^15, which is still Q15. The 2^15 factors out. Just be careful of overflow by understanding the ranges of numbers that will be used.
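One common way to handle that overflow is a saturating add: sum in a 64-bit intermediate and clamp to the Int32 limits instead of letting the result wrap around. A generic sketch (the name q_add_sat is mine, not a library call):

```c
#include <assert.h>
#include <stdint.h>

/* Same-Q add with saturation. Both inputs and the result share one Q,
   so no shifting is needed; only the overflow needs handling. */
int32_t q_add_sat(int32_t a, int32_t b) {
    int64_t sum = (int64_t)a + b;          /* cannot overflow in 64 bits */
    if (sum > INT32_MAX) return INT32_MAX; /* clamp instead of wrapping */
    if (sum < INT32_MIN) return INT32_MIN;
    return (int32_t)sum;
}
```

In Q24, 20,971,520 + 4,194,304 gives 25,165,824 (1.25 + 0.25 = 1.5), while INT32_MAX + 1 stays pinned at INT32_MAX rather than flipping to a large negative number.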
You cannot directly add or subtract two values in different Q formats. a24 + b15 as raw ints gives you a garbage number - the two values are on different number lines, so adding the stored integers adds them at different scales.
Concretely, with a = 1.5 in Q24 and b = 0.25 in Q15:
| Real value | Stored integer | |
|---|---|---|
| a_q24 | 1.5 | 25,165,824 |
| b_q15 | 0.25 | 8,192 |
| raw sum | ??? | 25,174,016 |
Interpret that raw sum as Q24: 25,174,016 / 2^24 = 1.5005 - should be 1.75.
Interpret it as Q15: 25,174,016 / 2^15 = 768.19 - also not 1.75.
The bits add fine. They just don’t mean anything together until you align the scales first. The only way out is to convert one operand to match the other before the +/- operation:
IQ24 a_q24 = _IQ24(1.5); // 1.5 in Q24
IQ15 b_q15 = _IQ15(0.25); // 0.25 in Q15
IQ15 a_q15 = _IQ24toIQ15(a_q24); // convert, possibly losing precision
IQ15 sum = a_q15 + b_q15; // now both are Q15
Conversion itself has potential risks: a left shift overflows if the value doesn't fit the higher Q's range (converting b_q15 up to Q24 fails for anything beyond +/-128), and a right shift drops LSB precision. Be conscious of the values you will be working with.
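The alignment step can be sketched as one conversion helper that right-shifts when lowering Q and saturates when raising Q would overflow. It's a stand-in for an _IQ24toIQ15-style conversion, not any vendor's implementation, and the saturation policy is my assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a raw value from Q(from) to Q(to). Lowering Q truncates LSBs
   (arithmetic shift assumed for negatives); raising Q saturates instead
   of overflowing. */
int32_t q_convert(int32_t x, int from, int to) {
    assert(from >= 0 && from <= 30 && to >= 0 && to <= 30);
    if (to <= from)
        return x >> (from - to);                   /* drops the low bits */
    int64_t wide = (int64_t)x * ((int64_t)1 << (to - from));
    if (wide > INT32_MAX) return INT32_MAX;        /* too big for Q(to) */
    if (wide < INT32_MIN) return INT32_MIN;
    return (int32_t)wide;
}
```

The misaligned raw sum reproduces the 25,174,016 from the table above, while the aligned Q15 sum is 57,344, which is exactly 1.75 x 2^15.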
The calculator below locks B’s Q to match A’s when you pick + or -. There’s no meaningful “different Qs for add” case - either you align them first, or you have a software defect.
Fixed-Point Math Calculator
Raw int: 1572864 = 1572864 / 2^20
Stored as: 1.500000000
Quant error: 0.000000%
Q20: range +/-2048, LSB 9.54e-7
Raw int: 262144 = 262144 / 2^20
Stored as: 0.2500000000
Quant error: 0.000000%
Q20: range +/-2048, LSB 9.54e-7
_IQ20(1.5) + _IQ20(0.25); // result Q20
Ideal (real): 1.75000000000
Raw int (Q20): 1835008
Total error vs real math: 0.000000%
Multiplication
Multiplying two Q-format numbers produces a result whose Q format is the sum of the two input Q formats:
(a x 2^Qa) x (b x 2^Qb) = (a x b) x 2^(Qa + Qb)
If both operands are Q15, the raw product carries 2^30 of scale, so it overflows Int32 as soon as the real product reaches 2; with two Q24 operands the scale is 2^48, and it overflows for values far below 1. You need a 64-bit intermediate, then a right-shift to land back in a sensible Q. That's exactly why you would use the function _IQNmpy instead of the * operator:
IQ24 a = _IQ24(1.25); // = 20971520
IQ24 b = _IQ24(2.00); // = 33554432
IQ24 result = _IQ24mpy(a, b); // (20971520 x 33554432) >> 24 = 41943040 (= 2.5 in Q24)
For same-Q operands, _IQ24mpy shifts right by 24 and you land back in Q24. For different-Q operands, the shift still happens - the result’s Q ends up being Qa + Qb - N, where N is the number in _IQNmpy.
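Here's a portable sketch of the arithmetic behind an _IQNmpy-style call: widen to 64 bits, multiply, shift right by N. It truncates on the shift; vendor routines may round or saturate depending on the variant, so treat this as a model of the scaling, not of any particular library implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Model of an _IQNmpy-style multiply: raw a (Q qa) times raw b (Q qb),
   shifted right by n, gives a result in Q(qa + qb - n). The 64-bit product
   cannot overflow; the caller must make sure the shifted result fits Int32. */
int32_t q_mpy(int32_t a, int32_t b, int n) {
    assert(n >= 0 && n < 63);
    int64_t product = (int64_t)a * b; /* scale 2^(qa + qb) */
    return (int32_t)(product >> n);   /* truncates (arithmetic shift assumed) */
}
```

The 1.25 x 2.0 = 2.5 example above goes through cleanly; its raw product before the shift is about 7 x 10^14, far past anything an int32_t can hold.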
The Trick: Pick N to Land the Output in One of the Input Qs
Given a in Q24 and b in Q15, there are two N values that make sense and one to avoid:
| Call | Product scale | Shift | Result scale | Output Q | Matches |
|---|---|---|---|---|---|
| _IQ15mpy(a_Q24, b_Q15) | 2^39 | >> 15 | 2^24 | Q24 | A |
| _IQ24mpy(a_Q24, b_Q15) | 2^39 | >> 24 | 2^15 | Q15 | B |
| _IQ20mpy(a_Q24, b_Q15) | 2^39 | >> 20 | 2^19 | Q19 | neither - footgun |
The N in _IQNmpy is the shift count, not the output Q. Pick N = qA or N = qB and the result lands in one of your input formats. Pick anything else and you get a third Q you probably didn’t intend.
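One way to keep the output Q from drifting is to carry each value's Q next to its raw integer and let the multiply compute qA + qB - N for you. A sketch with a hypothetical qval struct; IQmath-style libraries keep the Q implicit in function and type names instead.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    int32_t raw; /* stored integer */
    int q;       /* binary point position: real = raw / 2^q */
} qval;

/* Multiply with shift n; the output Q follows qa + qb - n. */
qval qv_mpy(qval a, qval b, int n) {
    assert(n >= 0 && n < 63);
    int64_t product = (int64_t)a.raw * b.raw;
    qval out = { (int32_t)(product >> n), a.q + b.q - n };
    return out;
}
```

With A = 1.25 in Q24 and B = 0.333333 rounded to Q15, the three shifts reproduce the three rows of the table: N = 15 lands in Q24, N = 24 in Q15, and N = 20 in the unintended Q19.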
The calculator below picks the shift N directly so you can see the effect of every choice. The two quick-pick buttons give you the two standard moves - N = qB leaves the output in A’s Q, N = qA leaves it in B’s Q - and the C code updates to the matching _IQ{N}mpy(...) either way.
Fixed-Point Math Calculator
Raw int: 20971520 = 20971520 / 2^24
Stored as: 1.250000000
Quant error: 0.000000%
Q24: range +/-128, LSB 5.96e-8
Raw int: 10923 = 10923 / 2^15
Stored as: 0.3333435059
Quant error: 0.0032%
Q15: range +/-65536, LSB 0.0000305
_IQ15mpy(_IQ24(1.25), _IQ15(0.333333)); // result Q24
Ideal (real): 0.416666250000
Raw int (Q24): 6990720
Total error vs real math: 0.0032%
Flip N between 15 (output follows A, Q24) and 24 (output follows B, Q15) and watch the result error change. N = 15 keeps the finer Q24 LSB; N = 24 gives up resolution in exchange for Q15's far larger range (+/-65,536 instead of +/-128). Push N to something neither qA nor qB and the calculator flags it - valid math, but you've landed in a Q you probably didn't mean to.
Q0 x QN: Native Multiply Works
There’s a free simplification hiding in the shift formula. When one operand is Q0 - a plain integer count - the _IQNmpy shift collapses: N = 0 means no shift after the multiply, which means no helper function. The plain * operator does the right thing.
The canonical example is scaling a 12-bit ADC reading into a per-unit value for the control loop. Practically every embedded project that reads a sensor does some version of this:
IQ24 inv_4095 = _IQ24(1.0 / 4095.0); // precomputed, ~ 0.000244, Q24
// ... each sample:
uint16_t adc = read_adc(); // 12-bit count, naturally Q0
IQ24 adc_pu = (int32_t)adc * inv_4095; // Q0 x Q24 = Q24, no library call
Output Q = qA + qB - N = 0 + 24 - 0 = Q24, just from integer arithmetic on a signed-32 variable. The int32_t cast keeps the compiler from doing anything surprising with the uint16_t promotion.
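A self-contained version of that ADC path, with the reciprocal computed the way a truncating _IQ24() macro would. The function name and the 12-bit range check are my additions. It also shows why the worst case is safe: 4095 x 4097 = 16,777,215, which is exactly 2^24 - 1.

```c
#include <assert.h>
#include <stdint.h>

/* 1/4095 in Q24, truncated: 16777216 / 4095 = 4097.0002..., so 4097. */
const int32_t INV_4095_Q24 = (int32_t)(16777216.0 / 4095.0);

/* 12-bit ADC count (Q0) times a Q24 gain: plain multiply, result in Q24. */
int32_t adc_to_pu_q24(uint16_t adc) {
    assert(adc <= 4095);                /* 12-bit converter */
    return (int32_t)adc * INV_4095_Q24; /* Q0 x Q24 = Q24, max 16,777,215 */
}
```

A full-scale reading of 4095 lands one LSB below 1.0 per-unit, and 2874 lands on 11,774,778 (about 0.70183), within one part in 2^24 of 2874/4095.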
Fixed-Point Math Calculator
Raw int: 2874 = 2874 / 2^0
Stored as: 2874.000000
Quant error: 0.000000%
Q0: range +/-2147483648, LSB 1.00
Raw int: 4094 = 4094 / 2^24
Stored as: 0.0002440214157
Quant error: 0.0088%
Q24: range +/-128, LSB 5.96e-8
_IQ0mpy(_IQ0(2874), _IQ24(0.000244)); // result Q24
Ideal (real): 0.701256000000
Raw int (Q24): 11766156
Total error vs real math: 0.0088%
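The calculator shows 2874 counts (about 70% of full scale) normalizing to ~ 0.701 per-unit. Its B input is 0.000244, a rounded stand-in for 1/4095, so its "ideal" is 2874 x 0.000244 = 0.701256 and the 0.0088% it reports is just the Q24 quantization of 0.000244. With the exact _IQ24(1.0 / 4095.0) constant from the code (raw 4097), the result tracks the true ideal 2874/4095 ~ 0.70183 to within one part in 2^24. This pattern applies any time a raw integer measurement meets a fractional gain - ADCs, encoders, pulse counts, temperature sensor LSBs. Just check that the worst-case product fits in Int32: 4095 x 4097 = 16,777,215 (~ 1.7x10^7) is well within range here. A near-unity Q24 gain (raw value up around 2^24) would overflow against a 12-bit count, and you'd need to promote to 64-bit first.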
Big x Small = Precision Loss (the Hidden Trap)
When neither operand is Q0 you lose the *-operator shortcut - and that’s where mixed-Q actually earns its keep. The natural counterpart to the ADC normalization above is the reverse trip: a per-unit value coming out of your control loop multiplied by a base scale to recover engineering units for display, logging, or a downstream peripheral.
Take a 1200 A motor drive. The per-unit current from the control loop is a near-unity value; the base scale is 1200 A itself. Completely different magnitudes, different Q formats forced on each:
| | i_pu (~0.001 per-unit) | base_current (1200 A) |
|---|---|---|
| Format | Q24 | Q15 |
| Range | +/-128 (fits trivially) | +/-65,536 (fits easily) |
| LSB | ~6 x 10^-8 | ~30 uA |
| Why that Q | Tiny value needs resolution | 1200 A blows past Q24's +/-128 ceiling |
IQ15 base_current = _IQ15(1200.0); // 1200 A base, Q15
// ... each sample, after the control loop:
IQ24 i_pu = i_pu_from_loop; // placeholder: per-unit current from the control loop, Q24
IQ15 i_amps = _IQ24mpy(i_pu, base_current); // per-unit x amps = amps, Q15
_IQ24mpy shifts the product right by 24, landing the result at qA + qB - N = 24 + 15 - 24 = Q15 - matching the base scale’s format, which is what the display consumer expects.
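The same call modeled in portable C, using the raw values from the calculator setup that follows (0.00123 per-unit rounded to Q24 is 20,636; 1200 A in Q15 is 39,321,600). The helper name is mine, and it truncates on the final shift.

```c
#include <assert.h>
#include <stdint.h>

/* Per-unit (Q24) x base scale (Q15), shifted right by 24 -> amps in Q15. */
int32_t pu_to_amps_q15(int32_t i_pu_q24, int32_t base_q15) {
    int64_t product = (int64_t)i_pu_q24 * base_q15; /* scale 2^(24 + 15) */
    int64_t amps = product >> 24;                    /* scale 2^15 */
    assert(amps >= INT32_MIN && amps <= INT32_MAX);  /* must fit Q15 */
    return (int32_t)amps;
}
```

The first test case gives 48,365, about 1.476 A; 1.0 per-unit maps back to exactly the 1200 A base.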
Fixed-Point Math Calculator
Raw int: 20636 = 20636 / 2^24
Stored as: 0.001230001450
Quant error: 0.000118%
Q24: range +/-128, LSB 5.96e-8
Raw int: 39321600 = 39321600 / 2^15
Stored as: 1200.000000
Quant error: 0.000000%
Q15: range +/-65536, LSB 0.0000305
_IQ24mpy(_IQ24(0.00123), _IQ15(1200)); // result Q15
Ideal (real): 1.47600000000
Raw int (Q15): 48365
Total error vs real math: 0.0012%
The calculator is set up with a 0.00123 per-unit reading (Q24) times the 1200 A base (Q15), producing ~ 1.476 A for display. Now the precision story: if you'd stored the per-unit operand in Q15 instead of Q24 - the lazy "everything's Q15" move - 0.00123 rounds to 40/32768 ~ 0.00122, about 0.76% error on the input before the multiply. Flip A's Q from 24 to 15 in the calculator and the quant-error line on A jumps from about 0.0001% to about 0.76%, and the result's total error inherits it. Try 0.5 instead and both Q choices look equally clean: the precision hit isn't uniform, and it hurts most when the small operand is actually small.
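The precision comparison is easy to reproduce. This sketch quantizes with round-to-nearest, which is what the calculator readouts appear to use (a truncating macro would store 0.00123 in Q24 as 20,635 rather than 20,636), and reports the relative error of the stored value. Function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Round-to-nearest quantization of value into QN (ties away from zero). */
int32_t q_round(double value, int n) {
    assert(n >= 0 && n <= 30);
    double scaled = value * (double)(1L << n);
    return (int32_t)(scaled >= 0 ? scaled + 0.5 : scaled - 0.5);
}

/* Relative error introduced by storing value in QN. */
double q_rel_error(double value, int n) {
    assert(value != 0.0);
    double stored = (double)q_round(value, n) / (double)(1L << n);
    double err = (stored - value) / value;
    return err < 0 ? -err : err;
}
```

For 0.00123 that comes out to about 0.76% in Q15 versus about 0.0001% in Q24; for 0.5 it's zero in both.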
This is the same rule the ADC case followed, just run backwards. The takeaway is consistent: pick the Q format for each operand based on its own magnitude. Small values (per-unit, errors, deviations) want high Q for resolution. Big values (base scales, rated magnitudes) want low Q for range. Pick N in _IQNmpy to land the product in whichever Q your downstream consumer expects.
Division
Division mirrors multiplication in its arithmetic: pre-shift the numerator left by N, then divide by the denominator. For same-Q operands the straightforward form is:
IQ24 quotient = _IQ24div(a, b); // (a << 24) / b, both in Q24, result Q24
For mixed-Q operands the output Q works out to qA + N - qB. Pick N arbitrarily and you get a format you didn’t intend: _IQ20div(_IQ24(1.25), _IQ15(0.333)) produces a result at scale 2^(24+20-15) = 2^29 = Q29, which is almost never the Q you meant to be working in.
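A portable sketch of an _IQNdiv-style divide: widen the numerator to 64 bits, pre-shift by N, divide, and track the output Q as qA + N - qB. C's / truncates toward zero; vendor routines may round or saturate differently, and this version simply asserts when the denominator is zero or the quotient doesn't fit. Function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* (a << n) / b with a 64-bit intermediate. Output Q = qa + n - qb. */
int32_t q_div(int32_t a, int32_t b, int n) {
    assert(b != 0 && n >= 0 && n <= 31);
    int64_t num = (int64_t)a * ((int64_t)1 << n);   /* avoids shifting a negative */
    int64_t quot = num / b;                          /* truncates toward zero */
    assert(quot >= INT32_MIN && quot <= INT32_MAX);  /* result must fit Int32 */
    return (int32_t)quot;
}

/* Bookkeeping for the output format. */
int q_div_out_q(int qa, int qb, int n) { return qa + n - qb; }
```

A Q24 numerator over a Q15 denominator with N = 20 lands in Q29, as described above. For the 1000/1500 per-unit case further down, truncation gives 11,184,810, one LSB below the rounded 11,184,811 the calculator shows.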
Two patterns cover almost every division you’ll write. Pattern 1 is the workhorse - usually run once at startup. Pattern 2 is rarer but worth recognizing when you hit it.
Pattern 1: Per-Unit Conversion
The workhorse: normalize two real-unit quantities into a high-precision near-unity format. Ratio a value against a base so the result sits in [0, 1]-ish and downstream code can use Q24 without caring about the original engineering units.
Speed as a fraction of max speed. Current as a fraction of max current. Voltage as a fraction of nominal. In all of these, both the numerator and denominator naturally live in a lower-Q format (Q15 for RPM, amps, volts) but the ratio wants the extra precision of Q24.
Crucially, this is a one-time startup calc - usually against nameplate or configuration values - not something you’d do on live samples inside a control ISR (more on why below). You divide once, store the result, and multiply by it afterward.
Say your maximum readable speed is 1500 RPM and maps to per-unit 1.0. Your rated base speed is 1000 RPM, stored in Q15. You want that rated speed expressed in per-unit Q24:
// One-time startup calc: convert rated speed to per-unit in Q24
// Both operands Q15; N = 24 places the result in Q24
IQ24 base_speed_pu = _IQ24div(_IQ15(1000), _IQ15(1500)); // ~ 0.66667
Working the shift: _IQ24div pre-shifts the Q15 numerator left by 24 (scale 2^(15+24) = 2^39), then divides by the Q15 denominator (scale 2^15). The result scale is 2^(39 - 15) = 2^24 - exactly Q24. The formula holds: output Q = qA + N - qB = 15 + 24 - 15 = 24.
Fixed-Point Math Calculator
Raw int: 32768000 = 32768000 / 2^15
Stored as: 1000.000000
Quant error: 0.000000%
Q15: range +/-65536, LSB 0.0000305
Raw int: 49152000 = 49152000 / 2^15
Stored as: 1500.000000
Quant error: 0.000000%
Q15: range +/-65536, LSB 0.0000305
_IQ24div(_IQ15(1000), _IQ15(1500)); // result Q24
Ideal (real): 0.666666666667
Raw int (Q24): 11184811
Total error vs real math: 0.000003%
Push N up from the qB = 15 quick-pick default into the 20s and watch the output Q climb with it. N = 24 lifts the answer into Q24 where it belongs for a near-unity per-unit value, and the calculator tags it as a legitimate per-unit / scale-up move. N = 15 drops the result back to Q15 - fine for intermediate math but way too coarse here. The amber "heads up" only fires when N is below qB for division, where the output drops below the numerator's Q. It won't catch every unintended format, though: the _IQ20div example above has N above qB and still lands in Q29.
Pattern 2: Q-Format Switch (Ratio in A’s Q)
Less common, but worth knowing. You have a numerator A in some Q format and a denominator B in a genuinely different non-trivial Q, and you want the quotient back in A's Q so downstream math that consumes it stays in the same frame. Pick N = qB and the output Q = qA + N - qB = qA, so A's format falls through. (If the denominator is Q0, you can skip the library call and use / directly, the same way you use * for a Q0 multiply.)
The universal case is Ohm's law with a small-magnitude resistance: I = V / R. V spans tens to hundreds of volts, so it lives in Q15. R for a motor winding, a brake resistor, a shunt, or a cable run is typically tens to hundreds of milliohms - small enough that Q15's 3x10^-5 LSB eats meaningful precision, so you stash it in Q24 instead. The current you compute is an engineering-units amp value, and you want it back in Q15 where every other current in the system lives.
// Drive init, motor stopped - safe to divide:
IQ15 v_rated = _IQ15(24.0); // rated supply voltage, Q15
IQ24 r_stator = _IQ24(0.1); // 100 mOhm winding resistance, Q24
IQ15 i_locked = _IQ24div(v_rated, r_stator); // ~ 240 A locked-rotor current, Q15
// Used downstream as a fault threshold, soft-start ceiling, protection trip, etc.
_IQ24div pre-shifts the numerator by N = qB = 24 (matching the denominator’s Q), divides, and lands the result at qA + N - qB = 15 + 24 - 24 = 15 - Q15, same format as v_rated.
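Pattern 2 has one hazard worth guarding at init: a small enough R pushes V / R past Q15's +/-65,536 range. This sketch (hypothetical function name) saturates instead of wrapping and treats R <= 0 as a configuration error; both policies are my assumptions, not library behavior.

```c
#include <assert.h>
#include <stdint.h>

/* I = V / R with V in Q15 and R in Q24; N = qB = 24 keeps the result in Q15. */
int32_t ohms_current_q15(int32_t v_q15, int32_t r_q24) {
    assert(r_q24 > 0);                         /* zero/negative R: config error */
    int64_t num = (int64_t)v_q15 * ((int64_t)1 << 24);
    int64_t amps = num / r_q24;                /* Q(15 + 24 - 24) = Q15 */
    if (amps > INT32_MAX) return INT32_MAX;    /* saturate near 65,536 A */
    if (amps < INT32_MIN) return INT32_MIN;
    return (int32_t)amps;
}
```

The first test case reproduces the calculator's 7,864,318 (about 240 A); dropping R to roughly 0.1 mOhm (raw 1677) would ask for about 240,000 A and clamps at the Int32 limit instead.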
Fixed-Point Math Calculator
Raw int: 786432 = 786432 / 2^15
Stored as: 24.00000000
Quant error: 0.000000%
Q15: range +/-65536, LSB 0.0000305
Raw int: 1677722 = 1677722 / 2^24
Stored as: 0.1000000238
Quant error: 0.000024%
Q24: range +/-128, LSB 5.96e-8
_IQ24div(_IQ15(24), _IQ24(0.1)); // result Q15
Ideal (real): 240.000000000
Raw int (Q15): 7864318
Total error vs real math: 0.000025%
The same shape shows up in any startup calc that mixes a moderate-scale physical quantity (Q15) with a small coefficient or sensitivity (Q24):
- Braking-resistor current from DC bus voltage and a low-value brake R.
- Angular acceleration alpha = T / J for a small-inertia motor (Q15 torque, Q24 J in kg*m^2).
- Thermal rise deltaT = P / G_th for a highly conductive heat path (Q15 power, Q24 conductance).
- Flow rate through a low-impedance restriction in fluid-power systems.
Pattern 2 lives outside the hot loop - at init, on parameter updates, during safe-to-stall maintenance ops - and that’s no accident. Out there, the divide’s 20-70 cycle cost is free money. Inside a running control ISR, it is decidedly not.
But Inside the Loop, Division Is Expensive
A 10 kHz control ISR on a 150 MHz C28x has 15,000 cycles of budget per pass. An IQ multiply is ~7 cycles. An IQ divide is 20-70 cycles depending on implementation and data - a handful of live divides and you’ve burned a measurable chunk of your entire loop on operations you could have replaced with a multiply.
The fix is almost always: multiply by the reciprocal constant, computed once at compile time.
// Bad: runtime divide (speed and speed_fraction share one Q, e.g. Q15)
speed_fraction = speed / 1000;
// Good: compile-time reciprocal, runtime multiply
#define INV_1000 _IQ24(0.001)
speed_fraction = _IQ24mpy(speed, INV_1000); // Q(speed) + 24 - 24 = Q(speed)
The _IQ24(0.001) macro is where the preprocessor earns its keep. _IQ24(x) is typically defined as ((IQ24) ((x) * 16777216.0)): the preprocessor pastes in the expression, the compiler evaluates 0.001 x 2^24 at compile time, and the resulting integer literal lands in your code. Zero runtime cost, as long as x is a compile-time constant; hand the macro a runtime float variable and you get a runtime floating-point multiply instead. You write readable C (_IQ24(0.001)) and the emitted instructions are identical to what you'd get from hand-typing 16777.
This is also why _IQ24(1.0/3.0) is safe: the compiler folds the float division at compile time, and you get a clean integer constant at runtime.
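A small sketch of why the constant costs nothing at runtime: in C, a file-scope initializer must be a constant expression, so a _IQ24()-style macro used there is necessarily evaluated by the compiler. Q24_CONST below is a hypothetical stand-in with the truncating definition quoted above, and the speed is assumed to be in Q15 RPM.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an _IQ24()-style macro: scale by 2^24 and
   truncate with a cast, matching the definition quoted above. */
#define Q24_CONST(x) ((int32_t)((x) * 16777216.0))

/* File-scope initializers must be constant expressions, so the compiler
   computes these; no floating-point work is left for runtime. */
const int32_t INV_1000_Q24  = Q24_CONST(0.001);     /* 16777.216 -> 16777 */
const int32_t ONE_THIRD_Q24 = Q24_CONST(1.0 / 3.0); /* 5592405.33 -> 5592405 */

/* speed / 1000 as a multiply: Q15 x Q24 >> 24 keeps the result in Q15. */
int32_t speed_div_1000_q15(int32_t speed_q15) {
    return (int32_t)(((int64_t)speed_q15 * INV_1000_Q24) >> 24);
}
```

For 1500 RPM in Q15 (49,152,000), the reciprocal path lands one LSB low (49,151 instead of the exact divide's 49,152) because 16777.216 truncates to 16777, a bias of about 13 ppm. That's usually negligible, but it's worth knowing it's there.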
A Practical Example: The Speed Loop That Wouldn’t Tune
Early in the Venturi Buckeye Bullet 3 project - the Ohio State / Venturi EV that went after the FIA electric land-speed record - I wrote the outer speed loop for the permanent-magnet traction inverter. Setpoint, feedback, and error all as plain integers in RPM. Effectively everything was in Q0. The code compiled, the motor spun, and then it refused to stop hunting around the setpoint. About +/-2-3 RPM of lazy oscillation - 4 to 6 RPM peak-to-peak - that no amount of Kp / Ki tuning could damp out. I could hear it - the motor pulsed audibly on the dyno - and it was bad enough to corrupt every attempt to tune the inner current loop underneath.
The symptom looked like a tuning problem. It was a quantization problem.
At a 1500 RPM setpoint, one LSB of Q0 is 1 full RPM - and the cast from a fractional measurement to integer truncates. A real speed of 1500.4 RPM truncates to 1500. So does 1500.9. The measured error sits stubbornly at 0 across that entire RPM-wide window above setpoint, then steps to +1 only when the true speed crosses below 1500.0, then back to 0 the instant it climbs above. During the zero-error deadband the integrator holds while fractional error the controller can’t see keeps building. When the measured error finally flips to +/-1, the integrator corrects hard against a state that’s already wrong - the motor overshoots, the loop swings to the other side, and that’s the +/-2-3 RPM limit cycle I was hearing on the dyno.
You can see the quantization collapse directly. The calculator below does the same subtraction the loop was doing: 1500.4 - 1500 in Q0. The ideal answer is 0.4. The fixed-point answer is 0.
Fixed-Point Math Calculator
Raw int: 1500 = 1500 / 2^0
Stored as: 1500.000000
Quant error: 0.0267%
Q0: range +/-2147483648, LSB 1.00
Raw int: 1500 = 1500 / 2^0
Stored as: 1500.000000
Quant error: 0.000000%
Q0: range +/-2147483648, LSB 1.00
_IQ0(1500.4) - _IQ0(1500); // result Q0
Ideal (real): 0.400000000000
Raw int (Q0): 0
Total error vs real math: 100.0000%- significant, check Q choices
The fix was converting the entire loop to per-unit in Q24: divide RPM by a base (max) speed once, store setpoint and feedback as fractions of that base, and do all loop math in Q24. One LSB of Q24 at a 1500 RPM base works out to 1500 x 2^-24 ~ 9x10^-5 RPM - about a ten-thousandth of an RPM of resolution on the state variable.
Same real measurement, expressed per-unit: 1500.4 / 1500 ~ 1.000267. Subtract the 1.0 per-unit setpoint and you recover the actual 0.000267 per-unit error - a clean continuous signal the PI controller can actually respond to.
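The collapse and the fix side by side, as a small C simulation. Speeds arrive as doubles to stand in for a fractional measurement (an assumption for illustration; a real loop would get them from an encoder or observer), the casts truncate as described above, and error is taken as setpoint minus measured, so overspeed reads negative. Function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Original loop: RPM stored as Q0. The cast truncates the fractional part. */
int32_t speed_error_q0(double setpoint_rpm, double measured_rpm) {
    return (int32_t)setpoint_rpm - (int32_t)measured_rpm;
}

/* Fixed loop: RPM normalized to a base speed, stored as Q24 per-unit. */
int32_t rpm_to_pu_q24(double rpm, double base_rpm) {
    assert(base_rpm > 0.0);
    return (int32_t)(rpm / base_rpm * 16777216.0);
}

int32_t speed_error_pu_q24(double setpoint_rpm, double measured_rpm,
                           double base_rpm) {
    return rpm_to_pu_q24(setpoint_rpm, base_rpm) -
           rpm_to_pu_q24(measured_rpm, base_rpm);
}
```

In Q0, 1500.4 and 1500.9 RPM both produce zero error. In Q24 per-unit they produce -4,473 and -10,066 LSBs, so the controller can tell them apart.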
Fixed-Point Math Calculator
Raw int: 16781696 = 16781696 / 2^24
Stored as: 1.000267029
Quant error: 0.000003%
Q24: range +/-128, LSB 5.96e-8
Raw int: 16777216 = 16777216 / 2^24
Stored as: 1.000000000
Quant error: 0.000000%
Q24: range +/-128, LSB 5.96e-8
_IQ24(1.000267) - _IQ24(1.0); // result Q24
Ideal (real): 0.000267000000000
Raw int (Q24): 4480
Total error vs real math: 0.0108%
After the conversion, the controller settled smoothly - no retuning required. Same poles, same gains (scaled once for the new units), same plant. Just a state representation with enough resolution for the error signal to actually exist.
The lesson: when you pick a Q format, don’t just ask “does my biggest value fit?” Ask “is one LSB of this format smaller than the smallest difference my loop needs to see?”
What This All Means in Practice
A few rules of thumb from shipped motor control code:
- Pick the Q format per variable, not per project. Small values need high Q; large values need low Q. A codebase with "everything is Q24" will always either overflow or underflow somewhere.
- Avoid division. If the denominator is known at compile time, replace it with a multiply by its reciprocal. If it's only known at startup, compute the reciprocal once and cache it. If it varies per sample, reconsider your algorithm or take the cycle hit.
- Use the library macros for constants. _IQ24(0.123) is always safer than ((IQ24) 2063598), even if you do the math right, because the next reader of the code will understand it instantly.
- Mixed-Q _IQNmpy is your precision lever. If you find yourself losing precision in a particular multiply, the fix isn't to use floats; it's to pick the right combination of input Qs and _IQNmpy variant.
- Keep the calculator one click away. Bookmark the Fixed-Point Math Calculator and pull it up whenever you're picking a Q format for a new variable, sizing an _IQNmpy shift, or double-checking a per-unit conversion. Checking a single Q choice in the OS calculator means tapping x^y, computing value x 2^N, rounding to an integer, confirming it fits in 32 bits, then redoing the arithmetic in reverse to get the quantization error: easily a minute per variable, longer if you mistype a power of two. The same check in this tool takes a few keystrokes: type the value, pick Q, and read the raw integer, quantized value, and percent error live. Five seconds vs. a minute adds up fast over a few dozen variables in a real control loop.
If you’re doing fixed-point DSP on a real project and want a second set of eyes - or a full design review - let me know.