Why ABC was randomly crashing our FPGA CI (and the 20-year-old assert behind it)
April 2026
The hook
A 20-year-old assert in ABC was written for 32-bit systems, where it made some degree of sense. On 64-bit with address space layout randomization, it fires intermittently; just rarely enough to look like noise. We run our FPGA toolchain dozens to hundreds of times per day, so it got loud.
At my day job, I'm responsible for managing a custom FPGA toolchain. The toolchain is built on fully open-source tools:
- elaboration with the slang SystemVerilog parser (or GHDL for VHDL)
- wildebeest, a custom synthesis engine built on top of ABC and yosys
- VTR's VPR for place and route
- OpenSTA for timing
- and back to VTR's genfasm for bitstream generation.
As part of our day-to-day CI operation, we run our toolchain a lot. Hundreds of times per day on a slow day. This helps us catch whether any architecture or CAD development we've done has introduced bugs or quality regressions. It's stopped bugs in their tracks. But running code at that frequency has also exposed issues in our dependencies.
The catch
Starting late last year, every few weeks, our team would get a red 'X' in a CI run. This is not unusual when building an FPGA platform with a tiny team. Here's the error:
| 4.73.1. Extracting gate netlist of module `\----' to `<abc-temp-dir>/input.blif'.. |
| yosys-abc: src/opt/lpk/lpkCut.c:200: unsigned int* abc::Lpk_CutTruth(abc::Lpk_Man_t*, abc::Lpk_Cut_t*, int):Assertion `((unsigned)(ABC_PTRUINT_T)pFanin->pCopy) & 0xffff0000' failed. |
| ERROR: ABC failed with status 86 |
Error 86 doesn't mean anything to me, but this is clearly a failed assert. After retriggering the job, I got a green check. A few weeks later, we hit a different failure on a completely different design and a different FPGA architecture.
I found a report of the same error in the yosys GitHub issue tracker (here), closed seven years ago as irreproducible while remaining unsolved. *takes drag from cigarette*.
Let's take a moment to reason about the assert. It's at line 200:
| 175 | unsigned * Lpk_CutTruth( Lpk_Man_t * p, Lpk_Cut_t * pCut, int fInv ) |
| 176 | { |
| 177 | Hop_Man_t * pManHop = (Hop_Man_t *)p->pNtk->pManFunc; |
| 178 | Hop_Obj_t * pObjHop; |
| 179 | Abc_Obj_t * pObj = NULL; // Suppress "might be used uninitialized" |
| 180 | Abc_Obj_t * pFanin; |
| 181 | unsigned * pTruth = NULL; // Suppress "might be used uninitialized" |
| 182 | int i, k, iCount = 0; |
| 183 | // Lpk_NodePrintCut( p, pCut ); |
| 184 | assert( pCut->nNodes > 0 ); |
| 185 | |
| 186 | // initialize the leaves |
| 187 | Lpk_CutForEachLeaf( p->pNtk, pCut, pObj, i ) |
| 188 | pObj->pCopy = (Abc_Obj_t *)Vec_PtrEntry( p->vTtElems, fInv? pCut->nLeaves-1-i : i ); |
| 189 | |
| 190 | // construct truth table in the topological order |
| 191 | Lpk_CutForEachNodeReverse( p->pNtk, pCut, pObj, i ) |
| 192 | { |
| 193 | // get the local AIG |
| 194 | pObjHop = Hop_Regular((Hop_Obj_t *)pObj->pData); |
| 195 | // clean the data field of the nodes in the AIG subgraph |
| 196 | Hop_ObjCleanData_rec( pObjHop ); |
| 197 | // set the initial truth tables at the fanins |
| 198 | Abc_ObjForEachFanin( pObj, pFanin, k ) |
| 199 | { |
| 200 | assert( ((unsigned)(ABC_PTRUINT_T)pFanin->pCopy) & 0xffff0000 ); |
| 201 | Hop_ManPi( pManHop, k )->pData = pFanin->pCopy; |
| 202 | } |
| 203 | // compute the truth table of internal nodes |
| 204 | pTruth = Lpk_CutTruth_rec( pManHop, pObjHop, pCut->nLeaves, p->vTtNodes, &iCount ); |
| 205 | if ( Hop_IsComplement((Hop_Obj_t *)pObj->pData) ) |
| 206 | Kit_TruthNot( pTruth, pTruth, pCut->nLeaves ); |
| 207 | // set the truth table at the node |
| 208 | pObj->pCopy = (Abc_Obj_t *)pTruth; |
| 209 | } |
| 210 | |
| 211 | // make sure direct truth table is stored elsewhere (assuming the first call for direct truth!!!) |
| 212 | if ( fInv == 0 ) |
| 213 | { |
| 214 | pTruth = (unsigned *)Vec_PtrEntry( p->vTtNodes, iCount++ ); |
| 215 | Kit_TruthCopy( pTruth, (unsigned *)(ABC_PTRUINT_T)pObj->pCopy, pCut->nLeaves ); |
| 216 | } |
| 217 | assert( iCount <= Vec_PtrSize(p->vTtNodes) ); |
| 218 | return pTruth; |
| 219 | } |
This code is part of the implementation of lutpack, an ABC command described in this paper.
Let's resist the temptation to get bogged down in the algorithmic details; all we need to know is that it is a useful command for FPGA synthesis.
I see two possibilities.
- This is a valid defense against corruption of program state: pCopy should only be holding a value ≥ 0x10000.
- There's some old-school pointer math going on based on an out-of-date model of address spaces, and no one's been hit by this often enough to want to fix it.
Let's break down the assert, and analyze it from the perspective of the second possibility.
assert( ((unsigned)(ABC_PTRUINT_T)pFanin->pCopy) & 0xffff0000 );
- pFanin is an Abc_Obj_t *, which itself holds an Abc_Obj_t * named pCopy (not a void * being interpreted as the wrong type), so we're off to a good start.
- On a 64-bit system, pFanin->pCopy is an 8-byte pointer.
- (ABC_PTRUINT_T)pFanin->pCopy casts to typedef unsigned long ABC_PTRUINT_T, which is also 8 bytes.
- (unsigned)(ABC_PTRUINT_T)pFanin->pCopy then truncates down to 4 bytes.
On 32-bit Linux, the user address space is typically 0x00000000-0xBFFFFFFF. The heap, stack, and mmap regions sit well above 0x10000 by convention; the bottom of the address space is left unmapped to catch null dereferences, and the text segment historically starts at 0x08048000.
So in practice, heap and mmap pointers have at least one of bits 16-31 set. The assert was almost certainly relying on that convention.
On 64-bit, wouldn't you know it, pointers are 64 bits wide. When you cast to unsigned (32 bits) you get the lower half, and that lower half can be anything (including values below 0x10000) depending on where ASLR placed the mapping. The discrimination completely breaks down.
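To make the truncation concrete, here is a minimal C sketch of the cast chain. It uses plain 64-bit integers in place of real pointers, so the result doesn't depend on where anything happens to be mapped. The helper names (low32, old_check_passes) and the sample addresses are illustrative assumptions rather than ABC code, and the sketch assumes unsigned is 32 bits wide, as on mainstream Linux targets.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the cast chain from the assert: a pointer-sized integer
 * narrowed to unsigned. Assumes unsigned is 32 bits wide. */
static unsigned low32(uint64_t addr)
{
    assert(sizeof(unsigned) == 4);
    return (unsigned)addr;          /* keeps only the lower half */
}

/* Returns 1 when the original check would pass, 0 when the assert would fire. */
static int old_check_passes(uint64_t addr)
{
    return (low32(addr) & 0xffff0000u) != 0;
}
```

For example, old_check_passes(0x08048000) returns 1 (a classic 32-bit address), while old_check_passes(0x100001000) returns 0: a perfectly valid 64-bit address whose lower half happens to be 0x1000.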
The code was almost certainly written and tested on 32-bit, worked fine for years, then started failing intermittently when the codebase moved to 64-bit systems with ASLR (which was around the time this line was last edited). Nobody noticed for a long time because it only fires when the allocation happens to land at an address whose lower 32 bits are below 0x10000 (0x0000-0xFFFF). With ~13 bits of brk entropy on Linux in page-sized steps, that works out to roughly 1 in 512 runs.
How do I replicate the error? Well, I could run either raw ABC or our synthesis tool in a tight loop in a debugger and hope for the worst, but that seems like more of a plan B. Instead of hiring a bunch of monkeys with typewriters, what if I got a really eloquent monkey?
The bait
I should probably write my own malloc to allocate memory at 4GB boundaries, which should trigger the assert with valid input.
So far this blog post has been free of AI. Unfortunately I'm going to need some assistance from an agent here. I know malloc, but I don't know how to trigger a specific memory allocation at a specific address. I imagine this is the sort of thing an infosec expert does on the daily, but I wear enough hats without the white or black one. So let's phone a friend.
| Claude: Generate a malloc implementation that will trigger the assert at lpkCut.c:200 and provide instructions for how to trigger it. |
Look at this gist for the full output. Here's a highlight:
| 63 | static void init(void) |
| 64 | { |
| 65 | bootstrapping = 1; |
| 66 | real_malloc = dlsym(RTLD_NEXT, "malloc"); |
| 67 | real_free = dlsym(RTLD_NEXT, "free"); |
| 68 | real_realloc = dlsym(RTLD_NEXT, "realloc"); |
| 69 | bootstrapping = 0; |
| 70 | |
| 71 | /* try several 4 GB-boundary + 4 KB addresses until one maps */ |
| 72 | uint64_t candidates[] = { |
| 73 | 0x100000000ULL, |
| 74 | 0x200000000ULL, |
| 75 | 0x300000000ULL, |
| 76 | 0x400000000ULL, |
| 77 | 0x500000000ULL, |
| 78 | }; |
| 79 | for (int i = 0; i < 5; i++) { |
| 80 | void *p = mmap((void *)candidates[i], BAD_REGION_SIZE, |
| 81 | PROT_READ | PROT_WRITE, |
| 82 | MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED_NOREPLACE, |
| 83 | -1, 0); |
| 84 | if (p != MAP_FAILED) { |
| 85 | bad_region = p; |
| 86 | break; |
| 87 | } |
| 88 | } |
| 89 | |
| 90 | if (bad_region) { |
| 91 | unsigned lo32 = (unsigned)(uintptr_t)bad_region; |
| 92 | unsigned check = lo32 & 0xffff0000u; |
| 93 | fprintf(stderr, |
| 94 | "[trigger] bad_region @ %p lower32=0x%08x check=0x%08x -> %s\n", |
| 95 | bad_region, lo32, check, |
| 96 | check == 0 ? "assert will FIRE" : "BAD address choice"); |
| 97 | } else { |
| 98 | fprintf(stderr, "[trigger] WARNING: could not map bad region - bug won't trigger\n"); |
| 99 | } |
| 100 | } |
The file is a shim over the linked malloc. It can place an allocation exactly at a 4 GB boundary, which is the range of addresses that has triggered the assert in CI! Then the hard part: it intercepts exactly the right allocation, malloc(13416), for vTtElems, the leaf truth-table buffer. If that allocation were the same size as everything else, I imagine we'd have to change its size to a unique number. The size is also specific to the design I've been testing on (which happens to be named arbiter.blif). The shim lets the vTtElems allocation leak rather than free it: not a quirk of the agent, but forced by the design, since memory placed at a hand-picked address was never given out by the real allocator, so it can't be handed back to it.
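The size-gating and the deliberate leak can be sketched in a few lines of C. This is not the gist's code: shim_alloc, shim_free, and TARGET_SIZE are hypothetical names, and the sketch uses ordinary wrapper functions instead of interposing on malloc itself. It only shows the routing decision, assuming the caller has already mapped the "bad" region.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TARGET_SIZE 13416u  /* request size named in the post; design-specific */

/* Hypothetical helper: route one request size to the hand-mapped region. */
static void *shim_alloc(size_t size, void *bad_region)
{
    assert(bad_region != NULL);     /* caller must have mapped the region */
    if (size == TARGET_SIZE)
        return bad_region;          /* address with low 32 bits below 0x10000 */
    return malloc(size);            /* everything else uses the real allocator */
}

/* Hypothetical helper: never hand the mapped region to free(). */
static void shim_free(void *ptr, void *bad_region)
{
    if (ptr == bad_region)
        return;                     /* deliberate leak: free() never owned it */
    free(ptr);
}
```

Gating on an exact size is brittle by design: it works only because no other allocation in that run shares the size.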
The switch
My intuition was to just replace this with a null check. pCopy has a pointer type. Why would we still have to worry about an illegal address?
Well, because of this line in Lpk_NodeCutsCheckDsd:
| pObj->pCopy = (Abc_Obj_t *)(ABC_PTRUINT_T)i; |
I personally haven't seen the integer-stuffed-into-pointer-field case trip the assert, but this is clearly what the assert is safeguarding against. We're assigning an int (cast to a pointer) to pCopy.
While I've only hit the assert in the legal-pointer-cast-down case, it is protecting against undesired behaviour.
Here's my final offer:
| assert( (ABC_PTRUINT_T)pFanin->pCopy > 0xFFFF ); // catch small int values or NULL |
The original assert wasn't far off from this, but it had growing pains when the world moved to 64-bit. The inequality also makes the intent a bit clearer vs the bitwise '&'.
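A small C sketch comparing the two checks on the cases discussed here: NULL, a small integer stuffed into the pointer field, and a valid 64-bit address with a small lower half. The function names and sample values are assumptions for illustration, and integers stand in for pointers so the behaviour is deterministic.

```c
#include <assert.h>
#include <stdint.h>

/* Original check: looks only at bits 16-31 of the truncated value. */
static int old_check(uint64_t v)
{
    return ((unsigned)v & 0xffff0000u) != 0;
}

/* Proposed check: compares the full pointer-sized value. */
static int new_check(uint64_t v)
{
    return v > 0xFFFFu;
}

/* Hypothetical stand-in for the guarded code path. */
static uint64_t checked_copy(uint64_t v)
{
    assert(new_check(v));           /* fires on NULL or small ints only */
    return v;
}
```

Both checks reject 0 and 5, and both accept 0x10000. They differ only on values like 0x100001000, which the old check rejects and the new one accepts.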
As an added security measure, I recommend building ABC with -DNDEBUG to strip asserts from builds deployed to customers, who shouldn't ever have to look at asserts.
The release
I opened a PR for ABC, and it got merged with no notes! There was one other line in the same file that was exposed to the same behaviour.
Open-source EDA projects need all the help they can get! I've been working on an ABC regression test framework, and this is the perfect time to open-source it. It's hosted at abc-1212 and runs CI daily. It's still in development, and needs more tests! If you know anything about ABC and want to add tracking for your use case, please file an issue or pull-request some regression tests of your own.