Sometimes. There are multiple types of execution units in a CPU core (and often several of the same type), and a single thread can dispatch instructions to several of them in the same cycle (superscalar execution). The core can also reorder the instruction stream to keep all the units occupied (out-of-order execution), run ahead along the most likely direction a branch will take before the branch is actually resolved (speculative execution), etc.
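To make the in-order vs. out-of-order difference concrete, here's a toy scheduler sketch. It's not how any real core works: the `simulate` function, the instruction names, and the latencies are all made up for illustration, and it assumes every unit is fully pipelined and can run any instruction.

```python
def simulate(program, units=2, out_of_order=False):
    """Return the cycle at which the last result is available on a toy core.

    program: list of (name, latency, deps); deps must name earlier instructions.
    units:   how many instructions may issue per cycle (hypothetical issue width).
    """
    done_at = {}             # name -> cycle at which its result becomes available
    pending = list(program)  # not-yet-issued instructions, in program order
    cycle = 0
    while pending:
        issued = 0
        i = 0
        while i < len(pending) and issued < units:
            name, latency, deps = pending[i]
            ready = all(d in done_at and done_at[d] <= cycle for d in deps)
            if ready:
                done_at[name] = cycle + latency
                pending.pop(i)   # the next candidate slides into position i
                issued += 1
            elif out_of_order:
                i += 1           # skip the stalled instruction and look further ahead
            else:
                break            # in-order: a stalled instruction blocks everything behind it
        cycle += 1
    return max(done_at.values(), default=0)


# Two independent load+add pairs; each add has to wait for its own load.
program = [
    ("load_a", 3, []),
    ("add_a", 1, ["load_a"]),
    ("load_b", 3, []),
    ("add_b", 1, ["load_b"]),
]

in_order_cycles = simulate(program, units=2, out_of_order=False)
ooo_cycles = simulate(program, units=2, out_of_order=True)
print(in_order_cycles, ooo_cycles)
```

With these made-up latencies the in-order run wastes cycles waiting on `load_a` while `load_b` could already be in flight; the out-of-order run starts both loads at once and finishes sooner, i.e. same instructions, fewer cycles, higher IPC.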

Basically, it's all a massive game to keep all the units of a core busy so the desired instruction stream executes as fast as possible. Over time, successive CPU architectures have gotten better at playing the game: better occupancy, more execution units, and wider vector units (for SSE, AVX, etc.). That translates into more instructions executed per clock cycle (IPC) and, with the wider vector units, more work done per instruction.

That's why a Skylake is much faster than a Pentium 4, even though the P4 might run at a higher clock rate: the Skylake gets far more instructions done per cycle (better IPC).
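The arithmetic behind that is just instructions per second = IPC × clock rate. A quick sketch, where the IPC and clock numbers are invented placeholders and not measurements of any real CPU:

```python
def instructions_per_second(ipc, clock_ghz):
    """Rough throughput estimate: instructions per cycle times cycles per second."""
    return ipc * clock_ghz * 1e9


# Hypothetical cores: these figures are illustrative assumptions only.
older_core = instructions_per_second(ipc=1.0, clock_ghz=3.8)  # higher clock, low IPC
newer_core = instructions_per_second(ipc=3.0, clock_ghz=3.5)  # lower clock, high IPC

speedup = newer_core / older_core
print(f"newer core is about {speedup:.2f}x faster despite the lower clock")
```

Real IPC depends heavily on the workload, so treat this as the shape of the argument rather than a benchmark.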

And as a side note: what Hyperthreading does is duplicate the per-thread part of the core (the register state and the bookkeeping needed to track a second instruction stream), so one physical core shows up as two logical ones. The second thread can then use any execution units that the first thread left unoccupied.
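Here's a toy model of that idea, under heavy simplifying assumptions: each thread is a pure dependency chain (every instruction waits for the previous one in its own thread), the units are interchangeable, and thread 0 always gets first pick. The `run_smt` function and the latencies are hypothetical.

```python
def run_smt(threads, units=2):
    """Return the cycle at which the last result is available.

    threads: list of programs; each program is a list of latencies forming a
             dependency chain within its own thread.
    units:   instructions that may issue per cycle across all threads.
    """
    next_idx = [0] * len(threads)  # next instruction to issue, per thread
    ready_at = [0] * len(threads)  # earliest cycle that instruction may issue
    cycle = 0
    finish = 0
    while any(next_idx[t] < len(threads[t]) for t in range(len(threads))):
        free = units
        for t, prog in enumerate(threads):  # fixed priority: thread 0 first
            if free == 0:
                break
            if next_idx[t] < len(prog) and ready_at[t] <= cycle:
                ready_at[t] = cycle + prog[next_idx[t]]
                finish = max(finish, ready_at[t])
                next_idx[t] += 1
                free -= 1
        cycle += 1
    return finish


chain = [2, 2, 2]  # three dependent instructions, 2 cycles each

one_thread = run_smt([chain], units=2)
two_threads = run_smt([chain, chain], units=2)
print(one_thread, two_threads)
```

In this sketch a single thread can only ever use one of the two units, so a second thread runs in the slots that would otherwise sit idle and both finish in the same number of cycles. Real workloads compete for units, caches, and bandwidth, so the gain is usually much smaller than 2x.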

Bulldozer works somewhat similarly: two threads share a single module, and each module has a pair of integer cores (each with its own integer execution units), but the two share a floating-point unit along with the front end and L2 cache. So kinda like a Super-Hyperthreading, where they include a duplicate of (what they hope is) the most needed execution hardware. Doesn't always work out in reality though.

https://en.wikipedia.org/wiki/Instructions_per_cycle

https://en.wikipedia.org/wiki/Superscalar_processor

https://en.wikipedia.org/wiki/Out-of-order_execution

https://en.wikipedia.org/wiki/Speculative_execution

https://en.wikipedia.org/wiki/Hyper-threading

https://en.wikipedia.org/wiki/Bulldozer_(microarchitecture)