[26.1] Can sizeof(char) be 2 on some machines? For example, what about double-byte characters?
No, sizeof(char) is always 1. Always. It is never 2. Never, never, never.
Even if you think of a "character" as a multi-byte thingy, char is not.
sizeof(char) is always exactly 1. No exceptions, ever.
Look, I know this is going to hurt your head, so please, please just
read the next few FAQs in sequence and hopefully the pain will go away by
sometime next week.
[26.2] What are the units of sizeof?
Bytes.
For example, if sizeof(Fred) is 8, the distance between two Fred objects
in an array of Freds will be exactly 8 bytes.
As another example, this means sizeof(char) is one byte. That's right: one byte. One, one, one, exactly one byte,
always one byte. Never two bytes. No exceptions.
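Here is a minimal sketch of both claims. Fred is a made-up struct and byte_distance_between_neighbors() is a hypothetical helper; neither comes from the FAQ itself.

```cpp
#include <cstddef>

// Fred is a hypothetical class used only for illustration.
struct Fred {
    int a;
    int b;
};

// sizeof(char) is 1 by definition on every conforming implementation,
// and the same holds for the signed and unsigned variants.
static_assert(sizeof(char) == 1, "sizeof(char) is always 1");
static_assert(sizeof(signed char) == 1, "sizeof(signed char) is always 1");
static_assert(sizeof(unsigned char) == 1, "sizeof(unsigned char) is always 1");

// Measures, in bytes, how far apart two neighboring array elements are.
std::size_t byte_distance_between_neighbors() {
    Fred freds[2] = {};
    const char* first = reinterpret_cast<const char*>(&freds[0]);
    const char* second = reinterpret_cast<const char*>(&freds[1]);
    return static_cast<std::size_t>(second - first);
}
```

Whatever sizeof(Fred) happens to be on your implementation, byte_distance_between_neighbors() should return that same number.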
[26.3] Whoa, but what about machines or compilers that support multibyte characters? Are you saying that a "character" and a char might be different?!?
Yes, that's right: the thing commonly referred to as a "character" might be
different from the thing C++ calls a char.
I'm really sorry if that hurts, but believe me, it's better to get all the
pain over with at once. Take a deep breath and repeat after me: "character
and char might be different." There, doesn't that feel better? No? Well,
keep reading: it gets worse.
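One way to see the difference for yourself is sketched below. It assumes UTF-8 as the multibyte encoding (the FAQ does not name one), and e_acute_utf8 and chars_used() are hypothetical names chosen for illustration.

```cpp
#include <cstddef>
#include <cstring>

// One "character" as a human would count it: U+00E9, e with an acute accent.
// Assumption: it is encoded as UTF-8, which takes two chars for this character.
// Hex escapes are used so the example does not depend on the source encoding.
const char e_acute_utf8[] = "\xC3\xA9";

// Number of chars (not characters) before the terminating '\0'.
std::size_t chars_used() {
    return std::strlen(e_acute_utf8);
}
```

The array holds one character but two chars (plus the terminator), and each of those chars still has a sizeof of exactly 1.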
[26.4] But, but, but what about machines where a char has more than 8 bits? Surely you're not saying a C++ byte might have more than 8 bits, are you?!?
Yep, that's right: a C++ byte might have more than 8 bits.
The C++ language guarantees a byte must always have at least 8 bits.
But there are implementations of C++ that have more than 8 bits per byte.
[26.5] Okay, I could imagine a machine with 9-bit bytes. But surely not 16-bit bytes or 32-bit bytes, right?
Wrong.
I have heard of one implementation of C++ that has 64-bit "bytes." You read
that right: a byte on that implementation has 64 bits. 64 bits per byte. 64.
As in 8 times 8.
And yes, you're right: since sizeof(char) is always 1, a char on that
implementation has 64 bits.
[26.6] I'm sooooo confused. Would you please go over the rules about bytes, chars, and characters one more time?
Here are the rules:
- The C++ language gives the programmer the impression that memory is
  laid out as a sequence of something C++ calls "bytes."
- Each of these things that the C++ language calls a byte has at
  least 8 bits, but might have more than 8 bits.
- The C++ language guarantees that a char* (a char pointer) can
  address individual bytes.
- The C++ language guarantees there are no bits between two
  bytes. This means every bit in memory is part of a byte. If you grind your
  way through memory via a char*, you will be able to see every
  bit.
- The C++ language guarantees there are no bits that are part of two
  distinct bytes. This means a change to one byte will never cause a change
  to a different byte.
- The C++ language gives you a way to find out how many bits are in a
  byte in your particular implementation: include the header <climits>,
  then the actual number of bits per byte will be given by the CHAR_BIT
  macro.
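The rules above can be exercised with a short sketch. CHAR_BIT and <climits> are standard; bits_in() and count_nonzero_bytes() are hypothetical helpers invented for this illustration.

```cpp
#include <climits>
#include <cstddef>

// The language guarantees at least 8 bits per byte; it may be more.
static_assert(CHAR_BIT >= 8, "a C++ byte has at least 8 bits");

// Number of bits an object of type T occupies: bytes times bits per byte.
template <typename T>
constexpr std::size_t bits_in() {
    return sizeof(T) * CHAR_BIT;
}

// Grinds through an object one byte at a time via an unsigned char*
// and counts how many of its bytes are nonzero.
template <typename T>
std::size_t count_nonzero_bytes(const T& object) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&object);
    std::size_t count = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        if (p[i] != 0) {
            ++count;
        }
    }
    return count;
}
```

Code that silently assumes 8-bit bytes can make that assumption explicit with something like static_assert(CHAR_BIT == 8, "...") so it fails to compile elsewhere instead of misbehaving.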
Let's work an example to illustrate these rules. The PDP-10 has 36-bit words
with no hardware facility to address anything within one of those words. That
means a pointer can point only at things on a 36-bit boundary: it is not
possible for a pointer to point 8 bits to the right of where some other
pointer points.
One way to abide by all the above rules is for a PDP-10 C++ compiler to define
a "byte" as 36 bits. Another valid approach would be to define a "byte" as 9
bits, and simulate a char* by two words of memory: the first could point to
the 36-bit word, the second could be a bit-offset within that word. In that
case, the C++ compiler would need to add extra instructions when compiling
code using char* pointers. For example, the code generated for *p =
'x' might read the word into a register, then use bit-masks and bit-shifts
to change the appropriate 9-bit byte within that word. An int* could
still be implemented as a single hardware pointer, since C++ allows
sizeof(char*) != sizeof(int*).
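To make the 9-bit scheme concrete, here is a rough simulation in ordinary C++. It is not what any real PDP-10 compiler emitted: the 36-bit word is modeled in the low bits of a std::uint64_t, and FatCharPtr, store(), and load() are hypothetical names. The bit offset is assumed to count from the left (most significant) end of the word.

```cpp
#include <cstdint>

const unsigned kBitsPerWord = 36;
const unsigned kBitsPerByte = 9;
const std::uint64_t kByteMask = 0x1FF;  // nine 1-bits

// A simulated char*: the word it points into, plus a bit offset
// (0, 9, 18, or 27) measured from the left end of that word.
struct FatCharPtr {
    std::uint64_t* word;
    unsigned bit_offset;
};

// How far the selected byte must be shifted to reach the low 9 bits.
unsigned shift_for(FatCharPtr p) {
    return kBitsPerWord - kBitsPerByte - p.bit_offset;
}

// Simulates `*p = value`: read the word, mask out the old byte,
// shift the new byte into place, and write the word back.
void store(FatCharPtr p, unsigned value) {
    const unsigned shift = shift_for(p);
    std::uint64_t w = *p.word;
    w &= ~(kByteMask << shift);
    w |= (static_cast<std::uint64_t>(value) & kByteMask) << shift;
    *p.word = w;
}

// Simulates reading `*p`.
unsigned load(FatCharPtr p) {
    return static_cast<unsigned>((*p.word >> shift_for(p)) & kByteMask);
}
```

Note that storing through one FatCharPtr leaves the other three bytes of the word untouched, which is exactly the "no bits shared between bytes" rule.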
Using the same logic, it would also be possible to define a PDP-10 C++ "byte"
as 12 bits or 18 bits. However, the above technique wouldn't allow us to
define a PDP-10 C++ "byte" as 8 bits: since 4*8 is 32, we would skip 4 bits
after every fourth byte. A more complicated approach could be used to cover
those 4 bits, e.g., by packing nine bytes (of 8 bits each) into two adjacent
36-bit words. The important point here is that memcpy() has to be
able to see every bit of memory: there can't be any bits between two adjacent
bytes.
Note: one of the popular non-C/C++ approaches on the PDP-10 was to pack 5
bytes (of 7-bits each) into each 36-bit word. However this won't work in C or
C++ since 5*7 = 35, meaning using char*s to walk through memory would "skip"
a bit every fifth byte (and also because C++ requires bytes to have at least 8
bits).
[26.7] What is a "POD type"?
A type that consists of nothing but Plain Old Data.
A POD type is a C++ type that has an equivalent in C, and that uses the same
rules as C uses for initialization, copying, layout, and addressing.
As an example, the C declaration struct Fred x; does not initialize the
members of the Fred variable x. To make this same behavior happen in C++,
Fred would need to not have any constructors. Similarly to make the
C++ version of copying the same as the C version, the C++ Fred must not have
overloaded the assignment operator. To make sure the other rules match, the
C++ version must not have virtual functions, base classes, non-static members
that are private or protected, or a destructor. It can, however, have
static data members, static member functions, and non-static non-virtual
member functions.
The actual definition of a POD type is recursive and gets a little gnarly.
Here's a slightly simplified definition of POD: a POD type's
non-static data members must be public and can be of any of these types:
bool, any numeric type including the various char variants, any
enumeration type, any data-pointer type (that is, any type convertible to
void*), any pointer-to-function type, or any POD type, including arrays of
any of these. Note: data-pointers and pointers-to-function are okay, but
pointers-to-member are not. Also note that
references are not allowed. In addition, a POD type can't have constructors,
virtual functions, base classes, or an overloaded assignment operator.
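If your compiler supports C++11 or later, you can ask it for a rough second opinion. The sketch below assumes the C++11 formulation, in which a POD is a type that is both trivial and standard-layout; that formulation is somewhat looser than the rules described above, so treat it as an approximation. PlainFred, VirtualFred, and is_pod_like are hypothetical names.

```cpp
#include <type_traits>

// Satisfies the rules above: public data of built-in types only, no
// constructors, no virtual functions, no base classes.
struct PlainFred {
    int count;
    char* name;
    static int instances;                    // static data members are okay
    int twice() const { return 2 * count; }  // so are non-virtual member functions
};

// Has a virtual function (its destructor), so it is not a POD.
struct VirtualFred {
    int count;
    virtual ~VirtualFred() {}
};

// Assumption: "POD" here means trivial and standard-layout (the C++11 view).
template <typename T>
struct is_pod_like
    : std::integral_constant<bool,
                             std::is_trivial<T>::value &&
                             std::is_standard_layout<T>::value> {};

static_assert(is_pod_like<PlainFred>::value, "PlainFred should be a POD");
static_assert(!is_pod_like<VirtualFred>::value, "VirtualFred is not a POD");
static_assert(is_pod_like<int[4]>::value, "arrays of PODs are PODs");
```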
[26.8] When initializing non-static data members of built-in / intrinsic / primitive types, should I use the "initialization list" or assignment?
For symmetry, it is usually best to initialize all non-static data members in
the constructor's "initialization list," even those that are of a built-in /
intrinsic / primitive type. The FAQ shows you why and
how.
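A minimal sketch of what that looks like, using a made-up Fred class with two built-in members:

```cpp
// Fred is a hypothetical class used only for illustration.
class Fred {
public:
    // Every member, built-in or not, is set in the initialization list,
    // in the same order the members are declared.
    Fred(int count, double ratio)
        : count_(count), ratio_(ratio) {}

    int count() const { return count_; }
    double ratio() const { return ratio_; }

private:
    int count_;
    double ratio_;
};
```

For built-in members the assignment form would produce the same values; the point here is consistency with how members of class type are initialized.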
[26.9] When initializing static data members of built-in / intrinsic / primitive types, should I worry about the "static initialization order fiasco"?
Yes, if you initialize your built-in / intrinsic / primitive variable by an
expression that the compiler doesn't evaluate solely at compile-time. The FAQ
provides several solutions for
this (subtle!) problem.
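One commonly used remedy, sketched here under the assumption that it suits your situation, is to wrap the variable in a function so it is initialized the first time it is used. compute_limit() and limit() are hypothetical names.

```cpp
// Hypothetical function whose result the compiler is not required to
// evaluate at compile time.
int compute_limit() {
    return 6 * 7;
}

// Risky at namespace scope: an initializer in another source file that
// reads this variable might run before this one has been initialized.
//     int limit = compute_limit();

// Safer: a function-local static is initialized the first time control
// passes through its declaration, so callers always see a ready value.
int& limit() {
    static int value = compute_limit();
    return value;
}
```

Callers write limit() instead of limit; the returned reference can be read or assigned like the original variable.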
[26.10] Can I define an operator overload that works with built-in / intrinsic / primitive types?
No, the C++ language requires that your operator overloads take at least one
operand of a "class type" or enumeration type. The C++ language will not let you define an
operator all of whose operands / parameters are of primitive types.
If C++ let you redefine the meaning of operators on built-in types, you
wouldn't ever know what 1 + 1 is: it would depend on which headers got
included and whether one of those headers redefined addition to mean, for
example, subtraction.
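Here is a small sketch of where the line falls. Weekday is a hypothetical enumeration; because one operand is of an enumeration type, the overload is allowed.

```cpp
// Weekday is a hypothetical enumeration used only for illustration.
enum Weekday { Mon, Tue, Wed, Thu, Fri, Sat, Sun };

// Okay: the left operand is of an enumeration type.
// Adds n days, wrapping around the week (n may be negative).
Weekday operator+(Weekday day, int n) {
    // The addition below is plain int + int, so it uses the built-in
    // operator and does not call this overload recursively.
    const int wrapped = ((static_cast<int>(day) + n) % 7 + 7) % 7;
    return static_cast<Weekday>(wrapped);
}

// Not okay: both operands are built-in types, so the compiler rejects it.
//     int operator+(int a, int b);
```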
[26.11] When I delete an array of some built-in / intrinsic / primitive type, why can't I just say delete a instead of delete[] a?
Because you can't.
Look, please don't write me an email asking me why C++ is what it is.
It just is. If you really want a rationale, buy Bjarne Stroustrup's excellent
book, "The Design and Evolution of C++" (Addison-Wesley). But if your
real goal is to write some code, don't waste too much time figuring out
why C++ has these rules; just abide by them.
So here's the rule: if a points to an array of thingies that was
allocated via new T[n], then you must,
must, must delete it via delete[] a. Even if the
elements in the array are built-in types. Even if they're of type char or
int or void*. Even if you don't understand why.
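In code, the rule looks like this. sum_of_squares() is a hypothetical function whose only job is to pair a new[] with its delete[].

```cpp
#include <cstddef>

// Allocates an array of a built-in type, uses it, and releases it.
int sum_of_squares(std::size_t n) {
    int* a = new int[n];          // allocated via new T[n] ...
    int total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        a[i] = static_cast<int>(i * i);
        total += a[i];
    }
    delete[] a;                   // ... so it must be released via delete[]
    // Writing `delete a;` here instead would be undefined behavior,
    // even though the elements are plain ints.
    return total;
}
```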