Why ABC was randomly crashing our FPGA CI (and the 20-year-old assert behind it)

April 2026

The hook

A 20-year-old assert in ABC was written for 32-bit systems, where it made some degree of sense. On 64-bit with address space layout randomization, it fires intermittently, just rarely enough to look like noise. We run our FPGA toolchain dozens to hundreds of times per day, so it got loud.

At my day job, I'm responsible for managing a custom FPGA toolchain. The toolchain is built on fully open-source tools:

elaboration with the slang SystemVerilog parser (or GHDL for VHDL)
wildebeest, a custom synthesis engine built on top of ABC and yosys
VTR's VPR for place and route
OpenSTA for timing
and back to VTR's genfasm for bitstream generation.

As part of our day-to-day CI operation, we run our toolchain a lot. Hundreds of times per day on a slow day. This helps us catch if any architecture or CAD development we've done introduced bugs or quality regressions. It's stopped bugs in their tracks. But running code at that frequency has also exposed issues in our dependencies.

The catch

Starting late last year, every few weeks, our team would get a red 'X' in a CI run. This is not unusual when building an FPGA platform with a tiny team. Here's the error:

4.73.1. Extracting gate netlist of module `\----' to `<abc-temp-dir>/input.blif'..

yosys-abc: src/opt/lpk/lpkCut.c:200: unsigned int* abc::Lpk_CutTruth(abc::Lpk_Man_t*, abc::Lpk_Cut_t*, int):Assertion `((unsigned)(ABC_PTRUINT_T)pFanin->pCopy) & 0xffff0000' failed.

ERROR: ABC failed with status 86

Error 86 doesn't mean anything to me, but this is clearly a failed assert. After retriggering the job, I got a green check. A few weeks later, we hit a different failure on a completely different design and a different FPGA architecture.

I found a report of the same error in the yosys GitHub issue tracker (here), closed seven years ago as irreproducible while remaining unsolved. *takes drag from cigarette*.

Let's take a moment to reason about the assert. It's at line 200:

175	unsigned * Lpk_CutTruth( Lpk_Man_t * p, Lpk_Cut_t * pCut, int fInv )
176	{
177	    Hop_Man_t * pManHop = (Hop_Man_t *)p->pNtk->pManFunc;
178	    Hop_Obj_t * pObjHop;
179	    Abc_Obj_t * pObj = NULL; // Suppress "might be used uninitialized"
180	    Abc_Obj_t * pFanin;
181	    unsigned * pTruth = NULL; // Suppress "might be used uninitialized"
182	    int i, k, iCount = 0;
183	    // Lpk_NodePrintCut( p, pCut );
184	    assert( pCut->nNodes > 0 );
185
186	    // initialize the leaves
187	    Lpk_CutForEachLeaf( p->pNtk, pCut, pObj, i )
188	        pObj->pCopy = (Abc_Obj_t *)Vec_PtrEntry( p->vTtElems, fInv? pCut->nLeaves-1-i : i );
189
190	    // construct truth table in the topological order
191	    Lpk_CutForEachNodeReverse( p->pNtk, pCut, pObj, i )
192	    {
193	        // get the local AIG
194	        pObjHop = Hop_Regular((Hop_Obj_t *)pObj->pData);
195	        // clean the data field of the nodes in the AIG subgraph
196	        Hop_ObjCleanData_rec( pObjHop );
197	        // set the initial truth tables at the fanins
198	        Abc_ObjForEachFanin( pObj, pFanin, k )
199	        {
200	            assert( ((unsigned)(ABC_PTRUINT_T)pFanin->pCopy) & 0xffff0000 );
201	            Hop_ManPi( pManHop, k )->pData = pFanin->pCopy;
202	        }
203	        // compute the truth table of internal nodes
204	        pTruth = Lpk_CutTruth_rec( pManHop, pObjHop, pCut->nLeaves, p->vTtNodes, &iCount );
205	        if ( Hop_IsComplement((Hop_Obj_t *)pObj->pData) )
206	            Kit_TruthNot( pTruth, pTruth, pCut->nLeaves );
207	        // set the truth table at the node
208	        pObj->pCopy = (Abc_Obj_t *)pTruth;
209	    }
210
211	    // make sure direct truth table is stored elsewhere (assuming the first call for direct truth!!!)
212	    if ( fInv == 0 )
213	    {
214	        pTruth = (unsigned *)Vec_PtrEntry( p->vTtNodes, iCount++ );
215	        Kit_TruthCopy( pTruth, (unsigned *)(ABC_PTRUINT_T)pObj->pCopy, pCut->nLeaves );
216	    }
217	    assert( iCount <= Vec_PtrSize(p->vTtNodes) );
218	    return pTruth;
219	}

This code is part of the implementation of lutpack, an ABC command described in this paper. Let's resist the temptation to get bogged down in the algorithmic details; all we need to know is that it is a useful command for FPGA synthesis.

I see two possibilities.

This is a valid defense against corruption of program state: pCopy should only be holding a value ≥ 0x10000.
There's some old-school pointer math going on based on an out-of-date model of address spaces and no one's been hit by this often enough to want to fix it.

Let's break down the assert, and analyze it from the perspective of the second possibility.

assert( ((unsigned)(ABC_PTRUINT_T)pFanin->pCopy) & 0xffff0000 );

pFanin is an Abc_Obj_t *, which itself holds an Abc_Obj_t * named pCopy — not a void * being interpreted as the wrong type, so we're off to a good start.
On a 64-bit system, pFanin->pCopy is an 8-byte pointer.
(ABC_PTRUINT_T)pFanin->pCopy casts it to ABC_PTRUINT_T, a typedef for unsigned long, which is also 8 bytes on 64-bit Linux.
(unsigned)(ABC_PTRUINT_T)pFanin->pCopy then truncates down to 4 bytes.

On 32-bit Linux, the user address space is typically 0x00000000-0xBFFFFFFF. The heap, stack, and mmap regions sit well above 0x10000 by convention; the bottom of the address space is left unmapped to catch null dereferences, and the text segment historically starts at 0x08048000. So in practice, heap and mmap pointers have at least one of bits 16-31 set. The assert was almost certainly relying on that convention.

On 64-bit, wouldn't you know it, pointers are 64 bits wide. When you cast to unsigned (32 bits) you get the lower half, and that lower half can be anything (including values below 0x10000) depending on where ASLR placed the mapping. The discrimination completely breaks down.

The code was almost certainly written and tested on 32-bit, worked fine for years, then started failing intermittently when the codebase moved to 64-bit systems with ASLR (which was around the time this line was last edited). Nobody noticed for a long time because it only fires when the allocation happens to land at an address whose lower 32 bits are below 0x10000 (0x0000-0xFFFF). With ~13 bits of brk entropy on Linux in page-sized steps, that works out to roughly 1 in 512 runs.

How do I replicate the error? Well, I could run either raw ABC or our synthesis tool in a tight loop in a debugger and hope for the worst, but that seems like more of a plan B. Instead of hiring a bunch of monkeys with typewriters, what if I got a really eloquent monkey?

The bait

I should probably write my own malloc to allocate memory at 4GB boundaries, which should trigger the assert with valid input.

So far this blog post has been free of AI. Unfortunately I'm going to need some assistance from an agent here. I know malloc, but I don't know how to trigger a specific memory allocation at a specific address. I imagine this is the sort of thing an infosec expert does on the daily, but I wear enough hats without the white or black one. So let's phone a friend.

Claude: Generate a malloc implementation that will trigger the assert at lpkCut.c:200 and provide instructions for how to trigger it.

Look at this gist for the full output. Here's a highlight:

static void init(void)
{
    bootstrapping = 1;
    real_malloc = dlsym(RTLD_NEXT, "malloc");
    real_free = dlsym(RTLD_NEXT, "free");
    real_realloc = dlsym(RTLD_NEXT, "realloc");
    bootstrapping = 0;

    /* try several 4 GB-boundary + 4 KB addresses until one maps */
    uint64_t candidates[] = {
        0x100000000ULL,
        0x200000000ULL,
        0x300000000ULL,
        0x400000000ULL,
        0x500000000ULL,
    };
    for (int i = 0; i < 5; i++) {
        void *p = mmap((void *)candidates[i], BAD_REGION_SIZE,
                       PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED_NOREPLACE,
                       -1, 0);
        if (p != MAP_FAILED) {
            bad_region = p;
            break;
        }
    }

    if (bad_region) {
        unsigned lo32 = (unsigned)(uintptr_t)bad_region;
        unsigned check = lo32 & 0xffff0000u;
        fprintf(stderr,
                "[trigger] bad_region @ %p lower32=0x%08x check=0x%08x -> %s\n",
                bad_region, lo32, check,
                check == 0 ? "assert will FIRE" : "BAD address choice");
    } else {
        fprintf(stderr, "[trigger] WARNING: could not map bad region - bug won't trigger\n");
    }
}

The file is a shim over the linked malloc. It can place an allocation exactly at a 4 GB boundary, which is the kind of address that has triggered the assert in CI! Then the hard part: it intercepts just the right allocation, malloc(13416), for vTtElems, the leaf truth-table buffer. If that allocation were the same size as everything else, I imagine we'd have to change its size to a unique number. This allocation is also specific to the design I've been testing this on (which happens to be named arbiter.blif). The shim lets the vTtElems allocation leak rather than free it. That's not a quirk of the agent but forced by the design: memory placed at a hand-picked address was never given out by the real allocator, so it can't be handed back to it.

The switch

My intuition was to just replace this with a null check. pCopy is a pointer. Why would we still have to worry about an illegal address? Well, because of this line in Lpk_NodeCutsCheckDsd:

pObj->pCopy = (Abc_Obj_t *)(ABC_PTRUINT_T)i;

I personally haven't seen the integer-stuffed-into-pointer-field case cause an assert, but this is clearly what the assert is safeguarding against. We're assigning an int (cast to a pointer) to pCopy. While I've only hit the assert in the legal-pointer-cast-down case, it is protecting against undesired behaviour. Here's my final offer:

assert((ABC_PTRUINT_T)pFanin->pCopy > 0xFFFF ); // catch small int values or NULL

The original assert wasn't far off from this, but it had growing pains when the world moved to 64-bit. The inequality also makes the intent a bit clearer vs the bitwise '&'.

As an added security measure, I recommend building ABC with -DNDEBUG to strip asserts from builds deployed to customers, who shouldn't ever have to look at asserts.

The release

I opened a PR for ABC, and it got merged with no notes! There was one other line in the same file which was exposed to the same behaviour.

Open-source EDA projects need all the help they can get! I've been working on an ABC regression test framework, and this is the perfect time to open-source it. It's hosted at abc-1212 and runs CI daily. It's still in development, and needs more tests! If you know anything about ABC and want to add tracking for your use case, please file an issue or pull-request some regression tests of your own.