PROBLEMS:
Give the entire count sequence of binary patterns for 3-bit unsigned integers.
What is the decimal value represented by the 8-bit binary number 11001001.0101₂ when interpreted as:
For embedded programmers, the most important improvement to C in the C99 standard is the new stdint.h header file.
Computer programmers don't always care how wide an integer is when held by the processor. For example, when we write:
for (int i=0; i < N; i++) { ... }
we generally expect our compiler to generate the most efficient code possible, whether that makes the loop counter an 8-, 16-, 32-, or even 64-bit quantity.
So long as it's wide enough to hold the maximum value, in the above case N, we'd like the most efficient use of the processor to be made. And, so, that's precisely what the ISO C and C++ standards tell the compiler writer to do: choose the most efficient integer size that will fulfill the specific request. Because of the variable size of integers on different processors and the corresponding flexibility of the language standards, the code above may result in a 32-bit integer with one compiler but a 16-bit integer with another—possibly even if the very same processor is targeted!
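As a quick, hypothetical illustration of that variability (it is not one of the article's numbered listings), you can ask any given compiler how wide its int really is:

#include <limits.h>
#include <stdio.h>

int main(void)
{
    // Only minimum ranges are guaranteed for int; its actual width
    // is chosen by the compiler for the target processor.
    printf("sizeof(int) = %zu bytes\n", sizeof(int));
    printf("INT_MAX     = %d\n", INT_MAX);
    return 0;
}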
But in many other programming situations, integer size matters. Embedded programming, in particular, often involves considerable manipulation of integer data of fixed widths. The most obvious example is the use of memory-mapped I/O to read and write peripheral control and status registers.
In the process of using memory-mapped I/O, it's common to write code like that in Listing 1. In this example we first declare a struct that represents the registers in a timer/counter chip. We then declare a pointer to a data structure of that type, and initialize it to point at the memory address assigned (by the hardware designer) to the counter chip. After that setup, we can use the pointer to read and write the registers within the chip.
// Listing 1: Memory-mapped access to a timer/counter chip's registers.
typedef struct
{
    unsigned int count;      // Current count register;  offset 0x00.
    unsigned int max;        // Maximum count register;  offset 0x02.
    unsigned int _reserved;  // Unused 16-bit register;  offset 0x04.
    unsigned int flags;      // Control/status register; offset 0x06.
} Counter;

Counter volatile * const pCounter = (Counter volatile *) 0x10000000;  // Chip base address.

// GO and DONE are bit masks, defined elsewhere to match the chip's
// control/status bits.
// ...
pCounter->max = 5000;        // Count from 0 to 5000, then interrupt.
pCounter->flags |= GO;       // Start the timer.
// ...
if (pCounter->flags & DONE)  // Has the count completed?
{
    // ...
}
// ...
The obvious advantage of implementing memory-mapped I/O this way is that the compiler automates the calculation of the offsets of the individual registers within the chip and automatically adds each register's offset to the base address. These calculations are done at compile time, so a readable line of code like:
pCounter->max = 5000;
will be executed just as efficiently as the more cryptic:
*( (unsigned int *)0x10000002 ) = 5000;
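One way to expose the hidden assumption in Listing 1 is to check the offsets the compiler actually assigns. The sketch below is my own illustration (it repeats the Counter typedef from Listing 1); the expected values 0x00, 0x02, 0x04, and 0x06 hold only if unsigned int happens to be two bytes wide on the target:

#include <stddef.h>
#include <stdio.h>

typedef struct               // Same layout as Listing 1.
{
    unsigned int count;
    unsigned int max;
    unsigned int _reserved;
    unsigned int flags;
} Counter;

int main(void)
{
    // With a 16-bit unsigned int these match the hardware map
    // (0x00, 0x02, 0x04, 0x06); with a 32-bit int they will not.
    printf("count:     0x%02zx\n", offsetof(Counter, count));
    printf("max:       0x%02zx\n", offsetof(Counter, max));
    printf("_reserved: 0x%02zx\n", offsetof(Counter, _reserved));
    printf("flags:     0x%02zx\n", offsetof(Counter, flags));
    return 0;
}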
But what if you port this code to a new compiler or target processor? Will the new compiler agree with the old one that those unsigned int registers are two bytes wide and calculate the offsets appropriately? How can you tell the new compiler more precisely what you want? Other integer names like short and long won't help, since the sizes of those types are also left up to the compiler writers, subject only to a small set of restrictions relative to the size chosen for int.
This ambiguity is not a new problem, and solutions have been around for decades. The approach generally taken is to define a new data type for each fixed-width integer you plan to use in your programs. For example, we might define:
typedef unsigned int uint16;
in a header file and then declare each of the registers in the struct in Listing 1 as uint16. By using char, short, and long and compiler-specific knowledge, you can define both signed and unsigned 8-, 16-, and 32-bit integer data types. And if the compiler or target processor does later change, you'll need to modify only the typedefs to adjust all of the fixed-integer size requests throughout what might be a large set of source files.
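For instance, on a compiler where char, short, and long happen to be 8, 16, and 32 bits wide (an assumption you must verify for your own compiler and target), such a header might look like this sketch:

// fixed_width.h -- hand-rolled fixed-width integer names for a
// pre-C99 compiler.  Assumes char is 8 bits, short is 16 bits,
// and long is 32 bits; verify these widths for your compiler.
typedef signed char    int8;
typedef unsigned char  uint8;
typedef signed short   int16;
typedef unsigned short uint16;
typedef signed long    int32;
typedef unsigned long  uint32;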
This is a good solution and widely used. The problem, however, has always been the lack of agreement on the names for these fixed-width integer typedefs. To date, I've seen all of the following names used for signed 16-bit integers in production code and coding standards: INT16, int16, INT16S, and INT2, the latter scheme placing the emphasis on the number of bytes rather than bits. I'm sure other names are in use elsewhere.
In hindsight, it sure would've been nice if the authors of the C standard had defined some standard names and made compiler providers responsible for providing the appropriate typedef for each fixed-size integer type in a library header file. Alternatively, the C standard could have specified (as Java does) that each of the types short, int, and long has a standard width on all platforms; but that could've had an impact on performance, particularly on 8-bit processors that must implement 16- and 32-bit additions in multi-instruction sequences.
Interestingly, it turns out the 1999 update to the International Organization for Standardization's (ISO) C standard (hereafter "C99") did just that. The ISO has finally put the weight of its standard behind a preferred set of names for signed and unsigned fixed-size integer data types. The newly defined type names are:
Width | Signed | Unsigned
---|---|---
8-bit | int8_t | uint8_t
16-bit | int16_t | uint16_t
32-bit | int32_t | uint32_t
64-bit | int64_t | uint64_t
According to the updated standard, this required set of typedefs (along with some others) is to be defined by compiler vendors and included in the new header file stdint.h.
If you're already using a C99-compliant compiler, this new language feature makes declaring each required fixed-width integer variable or register definition as straightforward as using one of the new type names.
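For example, the register map from Listing 1 could be rewritten with the new names, which makes the 16-bit layout explicit regardless of the compiler's native int width (a sketch, not one of the article's numbered listings):

#include <stdint.h>

typedef struct
{
    uint16_t count;      // Current count register;  offset 0x00.
    uint16_t max;        // Maximum count register;  offset 0x02.
    uint16_t _reserved;  // Unused 16-bit register;  offset 0x04.
    uint16_t flags;      // Control/status register; offset 0x06.
} Counter;

Counter volatile * const pCounter = (Counter volatile *) 0x10000000;  // Chip base address.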
Even if you don't have an updated compiler, the inclusion of these names in the C99 standard suggests it's time to update your coding standards and practices. Love them or hate them, at least these new names are part of an accepted international standard. As a direct result, it'll be far easier in the future to port C programs that require fixed-width integers to other compilers and target platforms. In addition, modules that are reused or sold with source can be more easily understood when they conform to standard naming and typing conventions like those in C99.
Of course, if you don't have a C99-compliant compiler yet, you'll still have to write your own set of typedefs, using compiler-specific knowledge of the char, short, and long primitive widths. I recommend putting these typedefs in a header file of your own design and adding the union declaration shown in Listing 2 to a linked source module to check their sizes; that is, to gently "remind" whoever might someday have the task of porting your code.
// Listing 2: Each array below has one element if the corresponding
// typedef has the expected size, and zero elements otherwise; a
// zero-length array is illegal, so a wrong typedef fails to compile.
static union
{
    char int8_t_incorrect  [sizeof(int8_t)   == 1];
    char uint8_t_incorrect [sizeof(uint8_t)  == 1];
    char int16_t_incorrect [sizeof(int16_t)  == 2];
    char uint16_t_incorrect[sizeof(uint16_t) == 2];
    char int32_t_incorrect [sizeof(int32_t)  == 4];
    char uint32_t_incorrect[sizeof(uint32_t) == 4];
};
The new C99 standard includes a number of minor improvements to the C programming language. Among these, the most important for embedded programmers to understand are the contents of the new stdint.h header file. There's a lot more in the C99 standard than I've had room to mention in this brief article. Further research is left as an exercise for the reader.
There are a couple of ways of doing 2's complement multiplication by hand. Neither is actually used by the circuitry of a computer because there are more efficient (and more complex) algorithms for hardware implementation.
First, recall that multiplying one N-bit number by another N-bit number will create a product of 2N bits.
Method #1: First, sign-extend both operands from N to 2N bits and then perform normal binary multiplication. Use the least-significant 2N bits of the result and discard the rest.
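Method #1 is easy to mimic in C, because a cast from a narrower signed type to a wider one performs exactly that sign extension. The sketch below is my own illustration (it uses 8-bit operands rather than the 2-bit numbers from class): it widens two int8_t values to 16 bits, multiplies, and keeps only the least-significant 16 bits of the result:

#include <stdint.h>
#include <stdio.h>

// Method #1: sign-extend both N-bit operands to 2N bits, multiply,
// and keep only the least-significant 2N bits of the product.
static uint16_t mul2c(int8_t x, int8_t y)
{
    int16_t wx = x;                 // Sign extension happens here.
    int16_t wy = y;
    return (uint16_t)(wx * wy);     // Low 16 bits of the product.
}

int main(void)
{
    printf("0x%04x\n", (unsigned) mul2c(-1, -2));   // Prints 0x0002.
    return 0;
}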
Method #2: Never multiply by a negative value, and always sign-extend the partial products. Now apply this to the problem we had in class: −1₁₀ × −2₁₀.
Actually, if you didn't notice, there's a problem here! The first partial product is 10, which sign-extends to 1110 when what we need is 0010. We know that 10 is supposed to represent +2, but the computer has no way to know that; it just sees a 2-bit 2's complement number. The reality here is that +2 cannot be represented as a 2-bit 2's complement number, and so this method will not work in this case.
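The failure is the usual 2's complement asymmetry: the most negative N-bit value has no positive counterpart, so "make the operand positive" is impossible. A small sketch of my own (not from the class notes) simulates 2-bit arithmetic with masking and shows that negating −2 just wraps back to −2:

#include <stdio.h>

// Interpret the low 2 bits of v as a 2-bit 2's complement number.
static int from2bit(unsigned v)
{
    v &= 0x3u;
    return (v & 0x2u) ? (int) v - 4 : (int) v;
}

int main(void)
{
    unsigned neg2    = 0x2u;                  // Bit pattern 10, i.e. -2.
    unsigned negated = (~neg2 + 1u) & 0x3u;   // 2's complement negation.

    // Prints: pattern 10, value -2 (since +2 does not exist in 2 bits).
    printf("pattern %u%u, value %d\n",
           (negated >> 1) & 1u, negated & 1u, from2bit(negated));
    return 0;
}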
Consider the following C declaration:
char a[4];
This declares that the identifier a represents an object, which is an array. Arrays are simply collections of other objects, and in this example, those objects are chars. Thus the "attribute list" that describes the object a is:
a is an array of (4) <chars>.
The word "array" is underlined to emphasize that the object 'a' is first understood to be an array, even before we worry about the objects it contains. The text in italics and surrounded by angle brackets describes the kind of objects held within the array a
.
The dimension is enclosed in parentheses since it really doesn't tell us much about the fundamental organization of the data, but only measures its extent. However, the dimension will be important later when we need to determine the "size
" of a
.
Applying this to declarations of arrays of two or more dimensions, such as:
char b[2][4];
char c[3][2][4];
we can write the corresponding attribute lists as:
b is an array of (2) <arrays of (4) chars>.
c is an array of (3) <arrays of (2) arrays of (4) chars>.
First, note that b and c are both arrays of objects, just like a above. The only difference among a, b, and c is the type of objects that these arrays contain:
The objects in a are <chars>.
The objects in b are <arrays of (4) chars>.
The objects in c are <arrays of (2) arrays of (4) chars>.
We may employ typedefs to emphasize this "array of objects" concept shared by a, b, and c:
typedef char A_OBJECTS;
typedef char B_OBJECTS[4];
typedef char C_OBJECTS[2][4];
A_OBJECTS a[4]; // 'a' is an array of (4) <A_OBJECTS>.
B_OBJECTS b[2]; // 'b' is an array of (2) <B_OBJECTS>.
C_OBJECTS c[3]; // 'c' is an array of (3) <C_OBJECTS>.
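The dimensions noted in parentheses also fix the total size of each of these objects, which is exactly what pointer arithmetic will depend on below. A quick check (a sketch of my own, not part of the notes):

#include <stdio.h>

int main(void)
{
    char a[4];
    char b[2][4];
    char c[3][2][4];

    printf("sizeof(a) = %zu\n", sizeof(a));   // 4  = 4 chars
    printf("sizeof(b) = %zu\n", sizeof(b));   // 8  = 2 * 4 chars
    printf("sizeof(c) = %zu\n", sizeof(c));   // 24 = 3 * 2 * 4 chars
    return 0;
}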
Returning to the declaration of a, recall that the identifier a used without any subscript provides a constant address expression whose value is the base address of the array a. In effect, the words "array of" in the attribute list get replaced by the words "pointer to":
a is a pointer to (the 1st of 4) <chars>.
In a similar manner, we have:
b is a pointer to (the 1st of 2) <arrays of (4) chars>.
c is a pointer to (the 1st of 3) <arrays of (2) arrays of (4) chars>.
This understanding of the identifiers (without full subscripting) is crucial because their appearance in expressions involving pointer arithmetic demands that we understand the size of the objects (see the sketch after this list). Thus,
'a+1' is a pointer to the 2nd <char>.
'b+1' is a pointer to the 2nd <array of (4) chars>.
'c+1' is a pointer to the 2nd <array of (2) arrays of (4) chars>.
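Measured in chars, those three steps are of very different lengths. The following self-contained check (my own sketch, not from the notes) prints the stride of each pointer increment:

#include <stdio.h>

int main(void)
{
    char a[4];
    char b[2][4];
    char c[3][2][4];

    // Pointer arithmetic steps by the size of the pointed-to object.
    printf("a+1 - a = %td chars\n", (char *)(a + 1) - (char *)a);   // 1
    printf("b+1 - b = %td chars\n", (char *)(b + 1) - (char *)b);   // 4
    printf("c+1 - c = %td chars\n", (char *)(c + 1) - (char *)c);   // 8
    return 0;
}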
Now it should be clear that the expressions 'a[1]' and '*(a+1)' are equivalent, because both refer to the 2nd char in the array a. Internally, C compilers always convert a subscripted array reference into the equivalent pointer expression before compiling it[1]. We can use this "equivalence" between subscripts and asterisks to determine whether or not a more complicated expression involving an array identifier is "fully subscripted". For example, consider the expression:
*(c[2]+1)+3
Is this a char, a pointer to a char, or what?
If we go back to the declaration of c, we see that it is an array of three dimensions. Since the above expression contains one subscript and one asterisk, only two of the three dimensions have been taken care of within the expression. Since the expression is not fully subscripted, it yields a pointer, not a char.
In particular, since we know that the first two of three dimensions have been subscripted, we can conclude that two of the three "array of" phrases within the attribute list of c have been "removed". In other words, the object *(c[2]+1) is an array of (4) <chars>. When used as an expression, it is treated as a pointer to the 1st of 4 chars. Adding 3 to this then provides a pointer to the 4th char.
Alternatively, we could start with the attribute list of the identifier c, removing phrases as we add parts of the expression:
c is an array of (3) <arrays of (2) arrays of (4) chars>.
'c[2]' is (the 3rd) array of (2) <arrays of (4) chars>.
'c[2]' is a pointer to (the 1st of 2) <arrays of (4) chars>.
'c[2]+1' is a pointer to (the 2nd of 2) <arrays of (4) chars>.
'*(c[2]+1)' is (the 2nd) array of (4) <chars>.
'*(c[2]+1)' is a pointer to (the 1st of 4) <chars>.
'*(c[2]+1)+3' is a pointer to (the 4th of 4) <chars>.
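Since the expression yields only a pointer, comparing it against a fully subscripted address is an easy way to confirm the analysis. The sketch below, my own addition, checks that *(c[2]+1)+3 is exactly the address of c[2][1][3]:

#include <stdio.h>

int main(void)
{
    char c[3][2][4];

    // One subscript and one asterisk strip two "array of" layers;
    // adding 3 then steps to the 4th char of that innermost array.
    char *p = *(c[2] + 1) + 3;

    printf("%s\n", (p == &c[2][1][3]) ? "same element" : "different");
    return 0;
}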