That's not an easy thing to count globally. Your GC cost (at least for the mark phase of mark-sweep) depends on the number of objects you're not deallocating, i.e. the live ones. The rest are implicitly destroyed.
You can't decouple the cost of malloc() from the cost of free(): if malloc() has to do any more work than simply bumping a pointer (or, in the general case, grabbing a mutex and then bumping a pointer), that extra work is a concession to free() (or to dealing with fragmentation, a side effect of free()).
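To make the malloc()/free() coupling concrete, here is a rough Python sketch of the two extremes. Both classes and all their names (BumpAllocator, FreeListAllocator, steps) are made up for illustration and are not modelled on any real malloc implementation. The first never frees, so allocating is just a bounds check and an increment. The second supports free(), and as a result alloc() has to search a free list and can fail on a fragmented heap even when enough total space is free.

```python
class BumpAllocator:
    """No free(): allocation is a bounds check plus a pointer increment."""

    def __init__(self, size):
        self.size = size
        self.top = 0  # offset of the next free unit

    def alloc(self, n):
        if self.top + n > self.size:
            return None  # out of space; a collector would run at this point
        addr = self.top
        self.top += n
        return addr


class FreeListAllocator:
    """First-fit allocator: supporting free() forces alloc() to search."""

    def __init__(self, size):
        self.free_blocks = [(0, size)]  # (offset, length), kept sorted by offset
        self.steps = 0  # free blocks examined so far, to make the search visible

    def alloc(self, n):
        for i, (off, length) in enumerate(self.free_blocks):
            self.steps += 1
            if length >= n:
                if length == n:
                    del self.free_blocks[i]
                else:
                    self.free_blocks[i] = (off + n, length - n)
                return off
        return None  # no single free block is large enough

    def free(self, off, n):
        # Neighbouring blocks are deliberately not coalesced, to keep this short.
        self.free_blocks.append((off, n))
        self.free_blocks.sort()
```

If you fill a 64-unit FreeListAllocator with eight 8-unit blocks and free every other one, 32 units are free but alloc(16) still fails, because no hole is bigger than 8. The bump allocator never hits that case, but only because it never reuses anything.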
Semispace (and generational) collectors do decouple the cost of allocation from the cost of collection by only considering the set of live objects. They also compact (defragment) as a side effect of collection.
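A minimal sketch of a Cheney-style semispace copy, assuming a toy object model (the Obj class, its forward field and the collect function are all hypothetical names for this example). The point to notice is that the loop only ever touches objects reachable from the roots; garbage is never visited, and the survivors end up packed contiguously in to-space.

```python
class Obj:
    def __init__(self, name, refs=()):
        self.name = name
        self.refs = list(refs)  # outgoing references
        self.forward = None     # forwarding pointer, set once the object is copied


def collect(roots):
    """Copy everything reachable from roots; return (new_roots, to_space)."""
    to_space = []

    def copy(obj):
        if obj.forward is None:
            clone = Obj(obj.name, obj.refs)  # refs still point into from-space
            obj.forward = clone
            to_space.append(clone)           # "bump allocation" in to-space
        return obj.forward

    new_roots = [copy(r) for r in roots]
    scan = 0
    while scan < len(to_space):              # to_space doubles as the work queue
        obj = to_space[scan]
        obj.refs = [copy(child) for child in obj.refs]  # fix up to new copies
        scan += 1
    return new_roots, to_space
```

With three live objects and a thousand dead ones, to_space ends up holding exactly three objects, and the thousand dead ones cost nothing at collection time.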
In the case of a minor collection, you don't do a full mark-sweep. You only look at a small fraction of the live objects. Generally you only need to trace objects in the youngest generation which are reachable from GC roots. Since you expect most young generation objects to be eligible for collection, this is extremely fast.
It is unusual for a lot of references to exist from older generations to the youngest generation, but when this occurs there are data structures (such as remembered sets or card tables, kept up to date by a write barrier) for detecting and resolving these references very efficiently.
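Here is a rough sketch of how a minor collection can avoid scanning the old generation, using a remembered set, which is one common structure for tracking old-to-young references. Everything here (Heap, write, minor_collect, the old flag) is invented for the example, and promoting every survivor immediately is a simplification; real collectors usually age objects over several collections first.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.old = False  # True once promoted out of the young generation
        self.refs = []


class Heap:
    def __init__(self):
        self.young = []
        self.remembered = set()  # old objects that may point at young ones

    def alloc(self, name):
        obj = Obj(name)
        self.young.append(obj)
        return obj

    def write(self, src, dst):
        """Write barrier: every reference store is assumed to go through here."""
        src.refs.append(dst)
        if src.old and not dst.old:
            self.remembered.add(src)

    def minor_collect(self, roots):
        """Trace young objects only; return how many objects were visited."""
        work = [r for r in roots if not r.old]
        for src in self.remembered:  # stands in for scanning the old generation
            work.extend(d for d in src.refs if not d.old)
        live = set()
        while work:
            obj = work.pop()
            if obj in live:
                continue
            live.add(obj)
            work.extend(d for d in obj.refs if not d.old)
        for obj in live:
            obj.old = True           # promote survivors (simplification)
        self.young = []              # everything unvisited is dropped wholesale
        self.remembered.clear()      # no young objects are left to point at
        return len(live)
```

The old object is never traced, only consulted through the remembered set, so the work done is proportional to the young survivors plus the recorded old-to-young stores, not to the size of the heap.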