Understanding Pointers
The following is an in-depth, step by step method of understanding pointers, along with their core concepts.
Intro
For many beginners in programming and software engineering, pointers can prove a difficult idea to grasp. Memory, addresses, dereferencing, these terms on their own have tricky definitions. In part, because they are usually not taught with practical examples. Here, we will explore the fine details of pointers, memory, and their true powers in programming.
The main programming language used here will be C. It’s the most effective for learning pointers as it let’s you focus on raw pointers, not hiding anything behind the scenes. Additionally, C allows more in-depth exploration of the different memory types relevant for pointers. new
and delete
hide some details of the way pointers can access memory in C++
.
Fundamentals
In C, all variables have both a type and a value. Every type has some specific size. Every type’s size in C, C++, and similar languages is defined in terms of bytes. A byte is for nearly all cases, an unsigned 8-bit integer. Below are some example C types:
1 | int a = 5; |
These types represent different varieties of data, and also have different sizes. In C and C++, you can determine the size of a value by using the sizeof
operator. The size of any type never changes. Let’s look at sizes of common C types by printing their respective sizeof
calls.
1 | printf("sizeof int is %lu\n", sizeof(int)); |
Which will print:
1 | sizeof int is 4 |
Note: When a variable is passed to sizeof
, it’s size is inferred from it’s type. Calling sizeof
on a type or a variable is identical in behavior.
Next, let’s talk about memory.
Memory
Previously, we discussed types, and their sizes. All values, data, and variables, whatever you want to name them, are stored in memory. For the purposes of this article, let’s think of memory as a finite collection of bytes. In C, there are 3 main types of memory: static, stack, and heap memory.
stack: Memory used within a function’s call frame. It only exist while the stack frame of a function has not yet hit a return statement.
heap: Memory allocated from functions like malloc, calloc, or realloc. This memory exists until it is deallocated with free.
static: Memory used by declaring a variable static. This type of memory lives for the entire time the program is running, and usually is located in the data segment of the executable from the C program being run.
Unlike stack memory, heap memory is never freed automatically during the runtime of the program. C++ classes have destructors or smart pointers which can do this, but C does not. Abandoning the last pointer to a chunk of heap memory without freeing it results in a memory leak, a situation where allocated memory on the heap can no longer be reached in order to be freed.
Below are examples of pointers to different types of memory:
1 | static int foo = 6; |
Here, foo
is declared as static
, rendering it static memory. Although we are not inside a function body, assume that doo
is a stack allocated variable and not a global variable. The pointer static_ptr
is initialized via using the addressof operator, &
. Yet, heap_ptr
is not. That’s because the function malloc
already returns a pointer, not a value. The addressof operator is used to determine the address of a value. Specifically, these are values that are not lvalues. For now, let’s talk about addresses.
Pointers
Now that we have discussed some fundamentals of C sizes and data types, let’s dive into the main topic, pointers.
Addresses
All bytes in memory, be it stack memory, static memory, or heap memory carry some address. An address is a unique, integer location of where a value currently exists in memory. Much like real residental and business addresses, each address is unique, and can tell us how to find where someone lives, or where some business operates. Similarly, values and data can move between addresses, if a program reads and writes them accordingly. Just as people move to new residences, and so on.
Unlike real people though, values in C, and related languages can exist without an address. This distinction is known as lvalues and rvalues. An lvalue in C is a value that’s normally found on the left side of an assignment statement =
. While an rvalue is normally found on the right side. For example,
1 | int x = 4; |
This program will print several messages to stdout
like so:
1 | a is now 10 |
In this program, a stack-allocated integer, a
, is being continuously added to by a pointer to it’s address, b
. The pointer manages this by indirection. It uses the unary *
operator to create an lvalue that allows binary operations like addition, in this case. Furthermore, it is always true that, a == *b
. Thus, the statements a += 5;
and *b += 5
are equivalent.
Pointer Arithmetic
Pointers can be modified via arithmetic. Previously, we have seen pointers moved with the increment and decrement operators, ++
and --
respectively. However, pointers can also be modified via relative expressions. In one of the first sections, we explored pointers represent different sizes of data.
When pointers are expressed via arithmetic, they are expressed in respect to the size they represent. For example:
1 |
|
The program above iterates through on array of integers, printing the address of each integer and it’s value. This example also brings up another concept, the similarity between arrays and pointers. For now though, let’s break down what’s going on:
1 | while(b < bin + 4) |
In this while
condition, we are comparing the addresses of pointers, as well as expressing a pointer four addresses forward. In memory, addresses are sequential, and increase as each successive stack frame is created. The <
operator for pointers, and similar logical operators, act in respect to the size of data the pointer points to. We can see this in the output of the program:
1 | the value of b is 0, while the address is 0x7ffeefbff560 |