Avoid memory leaks (malloc w/o free). You can monitor memory usage using
htop
; press F6 and sort by VIRT
(virtual memory).
Avoid malloc
when possible. Especially, instead of allocating
structures on the heap, look for opportunities to use local variables
of struct
type to and return values of struct
type.
Avoid memory traffic.
- Instead of reading or writing through pointers, use local
variables, parameters, and results. (May require interface
changes or returning values of `struct` type.)
- Reduce memory traffic by taking advantage of pointer
arithmetic. (Example?)
Use fewer levels of indirection. Here's a representation of a 2-D array that uses two levels of indirection:
struct Array2_T {
int width, height;
char *elements;
};
To get to an array element 'i', you have to dereference two pointers and do address arithmetic twice (one address is static and is coded into the instruction; the other is dynamic):
a->elements[i] // same as *((*a).elements+i)
Instead, you can use a trick built into C99: the incomplete array field:
struct Array2_T {
int width, height;
char elements[];
};
This representation, although it uses exactly the same notation, requires only one pointer dereference and one arithmetic operation:
a->elements[i] // something like *(a + offset(elements) + i)
To allocate the first kind of array:
struct Array_T *p = malloc(sizeof(*p));
p->elements = malloc(n * sizeof(*p->elements));
To allocate the second kind of array:
struct Array_T *p = malloc(sizeof(*p) + n * sizeof(*p->elements));
The advantages:
One heap object, not two
Created by one malloc()
, not two
Element access with one load, not two
The disadvantage:
If you have very small functions, where the work of the function is
only a few instructions, avoid the function-call overhead by declaring
the function inline
. (This trick works only on function calls that
don't cross module boundaries. It's not worth violating modularity to
save just a few instructions.)
If one are more parameters are compile-time constants, it may be worth inlining a function in order to specialize it (meaning that much of the function's work can be done once, at compile time).
Know your function-call overheads:
- The overhead of calling a large function is much smaller than
the cost of executing the function body: if the function can't
be specialized, there's no point in inlining it, and inlining
will increase code size and make the instruction cache less
effective.
- The overhead of calling a very small function is comparable to
the cost of executing the function body: such a function might
be worth inlining.
- If a function can be specialized, it may be that the work of the
function body can be reduced by a significant factor:
specialization may make a function 5 or 10 times faster.
Inlining is needed to enable specialization, but to know whether
specialization looks worthwhile, you have to
analyze the situation carefully. If you want to specialize,
mark the function as `inline` and **let the compiler specialize it**.
Raise operations out of loops
- Loop-invariant computations (arithmetic, memory references, afunction
calls, and more) can be lifted out of the loop.
- Often you know that a computation is loop-invariant, but the
compiler doesn't know. In these cases, **assign the results of
loop-invariant computations to local variables**. For
example, if you write
for (int i = 0; i < a->width * a->height; i++) {
... function call that gets passed 'a' ...
}
the compiler doesn't know that `a->width` and `a->height` don't
change.
Rewrite the loop as
int w = a->width;
int h = a->height;
for (int i = 0; i < w * h; i++) { ... }
- If the compiler can identify a loop-invariant computation, **let
the compiler deal with it.** For example, in
int w = a->width;
int h = a->height;
for (int i = 0; i < w * h; i++) { ... }
there is nothing to be gained by fiddling with the loop
condition: the compiler can see *all* accesses to local
variables `w` and `h`, and it's *easy* for the compiler to hoist
the multiplication out of the loop. **Let the compiler do its job.**
- Calculations that are *partially* or *almost* loop-invariant can
often be made loop invariant by refactoring the code or
introducing a nested loop structure.