16x Faster Icospheres with AI-Assisted Optimization

A while back I wrote fast_icosphere, a C++ library for generating icospheres with fine-grained control over the number of faces. Most icosphere implementations work by recursive subdivision: start with an icosahedron and repeatedly split each triangle into four. That gives you 20, 80, 320, 1280 faces and so on, always powers of four times 20. If you want something in between, you’re out of luck.

My approach is different. Instead of halving edges recursively, I place n additional vertices along each edge of the base icosahedron, then fill each original triangle face with a grid of smaller triangles. This gives you 20 * (n+1)^2 faces for any value of n, which means you can dial in exactly the resolution you need rather than jumping between powers of four.
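As a quick sanity check on that formula (this is illustrative code, not the library's API), here is the face count for a few values of n:

```cpp
#include <cstddef>

// Face count for n extra vertices per base edge: each of the 20 base
// triangles is filled with a grid of (n+1)^2 smaller triangles.
constexpr std::size_t icosphere_faces(std::size_t n) {
    return 20 * (n + 1) * (n + 1);
}

static_assert(icosphere_faces(0) == 20,  "plain icosahedron");
static_assert(icosphere_faces(1) == 80,  "same as one recursive subdivision");
static_assert(icosphere_faces(2) == 180, "a resolution recursion can't produce");
```

Note that n = 1 reproduces one round of recursive subdivision, but n = 2 already lands on a count (180) that the powers-of-four scheme skips entirely.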

The algorithm was correct but not particularly fast. I recently sat down with Claude and asked it to optimize the implementation while keeping the same algorithmic structure. The results were better than I expected.

What changed

Six targeted optimizations, none of them algorithmic:

1. std::map to std::unordered_map with a custom hash. The original code used std::map for edge key lookups, which gives O(log n) per lookup. Switching to std::unordered_map with a Szudzik pairing function as the hash drops this to O(1) on average. Edge lookups happen constantly during subdivision, so this alone made a big difference.
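A sketch of what that can look like; the key layout and names here are my assumptions, not the library's actual code. The edge key is the unordered pair of endpoint indices, and the Szudzik pairing function collapses the sorted pair into a single integer for hashing:

```cpp
#include <algorithm>
#include <cstdint>
#include <unordered_map>
#include <utility>

using EdgeKey = std::pair<std::uint64_t, std::uint64_t>;

// Normalize so (i, j) and (j, i) name the same edge.
EdgeKey make_edge_key(std::uint64_t i, std::uint64_t j) {
    return { std::min(i, j), std::max(i, j) };
}

// Szudzik pairing as the hash: for a sorted pair (lo <= hi) it maps the
// two indices to a unique integer, so collisions come only from the
// final truncation to size_t, not from the pairing itself.
struct SzudzikHash {
    std::size_t operator()(const EdgeKey& k) const {
        const std::uint64_t lo = k.first, hi = k.second;
        return static_cast<std::size_t>(hi * hi + hi + lo);
    }
};

// Edge -> index of the first vertex inserted along that edge (hypothetical value type).
using EdgeMap = std::unordered_map<EdgeKey, std::size_t, SzudzikHash>;
```

Because the key is normalized before hashing, a lookup with the endpoints in either order finds the same entry, which is exactly what edge sharing between adjacent faces needs.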

2. Fixed-size arrays for faces. Every triangle face is exactly three indices. The original code stored these as std::vector<size_t>, which means a heap allocation per face. Replacing with std::array<size_t, 3> eliminates millions of small allocations at higher subdivision levels.
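In sketch form (names hypothetical), the change is roughly:

```cpp
#include <array>
#include <cstddef>
#include <vector>

// A face is always exactly three vertex indices, so store them inline:
// std::array keeps the indices directly inside the faces vector's buffer,
// where std::vector<std::size_t> would heap-allocate once per face.
using Face = std::array<std::size_t, 3>;

std::vector<Face> make_faces() {
    std::vector<Face> faces;
    faces.reserve(20);          // face count is known up front from the formula
    faces.push_back({0, 1, 2}); // no per-face allocation
    return faces;
}
```

A side benefit: since the total face count is known from 20 * (n+1)^2 before generation starts, the vector can be reserved once and never reallocated.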

3. Removed redundant position data from edges. The edge map used to store full vertex positions alongside indices. Now it stores only vertex indices, and positions are looked up from the vertex array when needed. Less memory, fewer copies.

4. Eliminated copies in the hot loop. FillIcoFace was the most expensive function, and it was copying vectors on every call. The refactored version uses const auto& references for edge data and std::swap on row buffers instead of reallocating.
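A minimal sketch of the row-buffer pattern, with hypothetical names and a stand-in for the real vertex indexing:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Fill a triangular grid row by row, reusing two buffers instead of
// allocating a fresh vector per row. Returns the number of grid points
// visited, just so the sketch has something checkable.
std::size_t fill_rows(std::size_t n) {
    std::vector<std::size_t> prev_row, cur_row;
    prev_row.reserve(n + 2);
    cur_row.reserve(n + 2);
    std::size_t visited = 0;
    for (std::size_t row = 0; row <= n; ++row) {
        cur_row.clear(); // keeps capacity: no reallocation after the first pass
        for (std::size_t i = 0; i <= row; ++i) {
            cur_row.push_back(row * (row + 1) / 2 + i); // hypothetical vertex index
            ++visited;
        }
        // ... triangles between prev_row and cur_row would be emitted here ...
        std::swap(prev_row, cur_row); // O(1) pointer swap instead of a copy
    }
    return visited;
}
```

The swap is the important part: the previous row's storage becomes the next row's scratch buffer, so after the first iteration no allocations happen at all.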

5. Inlined normalization. The original code had separate Norm() and Normalize() helper functions that together did three divisions per vertex. Now the reciprocal length is computed once as 1.0 / std::sqrt(...) and each component is multiplied by it, inlined at the call site. One division and three multiplications instead of three divisions, and no function call overhead.
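The shape of the change, as a sketch rather than the library's actual code:

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };

// Before (hypothetical): Normalize() divided each component by the length,
// i.e. three divisions plus a function call.
// After: compute the reciprocal once, then multiply each component by it.
inline Vec3 normalized(Vec3 v) {
    const double inv = 1.0 / std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    return { v.x * inv, v.y * inv, v.z * inv };
}
```

On typical x86 hardware a double division costs several times more than a multiplication, and this runs once per generated vertex, so the saving is not just noise.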

6. Added missing #include <array>. Not a performance fix, but the code wouldn’t compile on modern GCC without it. Good to have.

Benchmarks

Compiled with g++ -O2 -std=c++17, times averaged over multiple runs:

n     Vertices     Before (ms)   After (ms)   Speedup
10    1,212        0.22          0.02         11x
50    26,012       5.07          0.34         15x
100   102,012      19.87         1.28         16x
200   404,012      82.04         9.28         9x
500   2,510,012    511.10        80.01        6x

The sweet spot is around n=100, where the speedup peaks at 16x. At larger sizes the gains taper off, likely because memory bandwidth starts to dominate over the per-element overhead that the optimizations targeted.

What this tells me about AI-assisted optimization

None of these changes are conceptually difficult. Replacing std::map with std::unordered_map is textbook advice. Using std::array for fixed-size data is obvious in hindsight. Removing unnecessary copies is basic performance hygiene.

But I hadn’t done any of it. The code worked, it was correct, and I moved on. That’s the gap AI fills well: the difference between what you know you should do and what you actually get around to doing. Claude looked at the code, identified six concrete improvements, and applied them in a way that preserved the algorithm exactly. No clever tricks, just disciplined application of things any C++ programmer knows but doesn’t always prioritize.

The speedup is real and useful. If you’re generating icospheres at runtime, going from 20ms to 1.3ms at n=100 matters. And the code is arguably cleaner now too, with less redundant data and fewer allocations to reason about.

If you want to play with the icosphere interactively, check out the polyscope fork which has visualization set up. The library itself is a single header and source file, easy to drop into any project.