\documentstyle[12pt,fullpage]{article}
\newcommand{\putfig}[3]%
{\begin{figure}%
\centerline{%
\psfig{figure=#1.ps,width=#3}}%
\caption{#2}%
\label{fig:#1}%
\end{figure}}
\input{psfig}
\begin{document}
\begin{figure*}[t]
\begin{center}
{\LARGE\bf A Quick Introduction to C++}
\vspace{3.0ex}
{\Large Tom Anderson}
\end{center}
\end{figure*}
\renewcommand{\thefootnote}{\fnsymbol{footnote}}
\footnotetext{This article is based on an earlier version written by Wayne Christopher.}
\renewcommand{\thefootnote}{}
\renewcommand{\thefootnote}{\arabic{footnote}}
\begin{quote}
``If programming in Pascal is like being put in a straightjacket,
then programming in C is like playing with knives, and programming
in C++ is like juggling chainsaws.'' \\ \hbox{} \hfill Anonymous.
\end{quote}
\section{Introduction}
This note introduces some simple C++ concepts and outlines a
subset of C++ that is easier to learn and use than
the full language. Although we originally wrote this note to
explain the C++ used in the Nachos project, I believe it is
useful to anyone learning C++.

I assume that you are already somewhat familiar with C concepts
like procedures, for loops, and pointers; these are pretty easy
to pick up from reading Kernighan and Ritchie's ``The C Programming
Language.''

I should admit up front that I am quite opinionated about C++, if
that isn't obvious already.
I know several C++ purists (an oxymoron perhaps?) who violently
disagree with some of
the prescriptions contained here; most of the objections are of
the form, ``How could you have possibly left out feature X?''
However, I've found from teaching C++ to nearly 1000 undergrads
over the past several years that the subset of C++ described here is
pretty easy to learn, taking only a day or so for most students
to get started.

The basic premise of this note is that while object-oriented
programming is a useful way to simplify programs, C++ is a wildly
over-complicated
language, with a host of features that only very, very rarely find a
legitimate use. It's not too far off the mark to say that C++ includes
every programming language feature ever imagined, and more.
The natural tendency when faced with a new language feature
is to try to use it, but in C++ this approach leads to disaster.
Thus, we need to carefully distinguish between (i) those concepts
that are fundamental (e.g., classes, member functions, constructors)
-- ones that everyone should know and use, (ii) those that are sometimes
but rarely useful (e.g., single inheritance, templates) -- ones that
beginner programmers should be able to recognize (in case they run across
them) but avoid using in their own programs, at least for a while,
and (iii) those that are just a bad idea and should be avoided like
the plague (e.g., multiple inheritance, exceptions, overloading,
references, etc.).
Of course, all the items in this last category have their proponents,
and I will admit that, like the hated goto, it is possible to
construct cases when the program would be simpler using a goto or
multiple inheritance. However, it is
my belief that most programmers will never encounter such cases,
and even if you do, you will be much more likely to misuse the
feature than properly apply it.
For example, I seriously doubt an undergraduate would need any of
the features listed under (iii) for any course project (at least
at Berkeley this is true). And if you find yourself wanting to use
a feature like multiple inheritance, then, my advice is to fully
implement your program both with and without the feature, and choose
whichever is simpler. Sure, this takes more effort, but
pretty soon you'll know from experience when a feature is useful and when
it isn't, and you'll be able to skip the dual implementation.

A really good way to learn a language is to read clear programs in that
language. I have tried to make the Nachos code as readable as possible;
it is written in the subset of C++ described in this note.
It is a good idea to look over the first assignment as you read this
introduction. Of course, your TAs will answer any questions you may
have.

You should not need a book on C++ to do the Nachos assignments, but if
you are curious, there is a large selection of C++ books
at Cody's and other technical bookstores. (My wife quips that C++ was
invented to make researchers at Bell Labs rich from writing
``How to Program in C++'' books.) Most new software development
these days is being done in C++, so it is a pretty good bet you'll
run across it in the future. I use Stroustrup's ``The C++
Programming Language'' as a reference manual, although other
books may be more readable. I would also recommend Scott Meyers's
``Effective C++'' for people just beginning to learn the language,
and Coplien's ``Advanced C++'' once you've been programming in C++
for a couple years and are familiar with the language basics.
Also, C++ is continually evolving, so be careful to buy books that describe
the latest version (currently 3.0, I think!).
\section{C in C++}
To a large extent, C++ is a superset of C, and most carefully written
ANSI C will compile as C++. There are a few major caveats though:
\begin{enumerate}
\item All functions must be declared before they are used, rather than
defaulting to type {\tt int}.
\item All function declarations and definition headers must use
new-style declarations, e.g.,
\begin{verbatim}
extern int foo(int a, char* b);
\end{verbatim}
The form {\tt extern int foo();} means that {\tt foo} takes {\it no}
arguments, rather than arguments of an unspecified type and number.
In fact, some advise using a C++ compiler even
on normal C code, because it will catch errors like misused functions that
a normal C compiler will let slide.
\item If you need to link C object files together with C++, when you
declare the C functions for the C++ files, they must be done like this:
\begin{verbatim}
extern "C" int foo(int a, char* b);
\end{verbatim}
Otherwise the C++ compiler will alter the name in a strange manner.
\item There are a number of new keywords, which you may not use as
identifiers --- some common ones are {\tt new}, {\tt delete}, {\tt
class}, and {\tt virtual}.
\end{enumerate}
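
The caveats above can be seen together in a small sketch. The names
{\tt CAdd} and {\tt Twice} are hypothetical, invented for this illustration;
in a real program the body of {\tt CAdd} would live in a {\tt .c} file
compiled by a C compiler, and it is defined here only so that the
sketch is self-contained.
\begin{verbatim}
// New-style declaration with C linkage: the C++ compiler will not
// alter the name, so the linker can match it against a C object file.
extern "C" int CAdd(int a, int b);

// New-style declaration of an ordinary C++ function, given before use.
int Twice(int x);

// Hypothetical definitions (CAdd would normally be written in C).
extern "C" int CAdd(int a, int b) {
    return a + b;
}

int Twice(int x) {
    return CAdd(x, x);   // legal only because CAdd is declared above
}
\end{verbatim}
Calling a function that has not been declared, or calling {\tt Twice}
with two arguments, is rejected by the C++ compiler instead of being
silently accepted.
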
\section{Basic Concepts}
Before giving examples of C++ features, I will first go over some of
the basic concepts of object-oriented languages. If this discussion
at first seems a bit obscure, it will become clearer when we get
to some examples.
\begin{enumerate}
\item {\bf Classes and objects}. A class is similar to a C {\em structure},
except that the definition of the data structure, {\em and} all of the
functions that operate on the data structure are grouped together
in one place. An {\em object} is an instance of a class (an instance
of the data structure); objects share the same functions with other objects
of the same class, but each object (each instance) has its own copy of
the data structure. A class thus defines two aspects of the objects:
the {\em data} they contain, and the {\em behavior} they have.
\item {\bf Member functions}. These are functions which are
considered part of the object and are declared in the class
definition. They are often referred to as {\em methods} of the class.
In addition to member functions, a class's behavior is also defined
by:
\begin{enumerate}
\item What to do when you create a new object (the {\bf constructor}
for that object) -- in other words, initialize the object's data.
\item What to do when you delete an object (the {\bf destructor} for
that object).
\end{enumerate}
\item {\bf Private vs. public members}. A public member of a class is
one that can be read or written by anybody, in the case of a data
member, or called by anybody, in the case of a member function. A
private member can only be read, written, or called by a member
function of that class.
\end{enumerate}
Classes are used for two main reasons: (1) it makes it much easier to
organize your programs if you can group together data with the
functions that manipulate that data, and (2) the use of private
members makes it possible to do {\em information hiding}, so that you
can be more confident about the way information flows in your
programs.
\subsection{Classes}
C++ classes are similar to C structures in many ways. In fact, a C++
struct is really a class whose members are public by default.
In the following explanation of how classes work, we will use a stack
class as an example.
\begin{enumerate}
\item {\bf Member functions.} Here is a (partial) example of a class
with a member function and some data members:
\begin{verbatim}
class Stack {
  public:
    void Push(int value); // Push an integer, checking for overflow.
    int top;              // Index of the lowest unused position.
    int stack[10];        // The elements of the stack.
};

void
Stack::Push(int value) {
    ASSERT(top < 10);     // stack should never overflow
    stack[top++] = value;
}
\end{verbatim}
This class has two data members, {\tt top} and {\tt stack}, and one
member function, {\tt Push}.
The notation {\em class}::{\em function} denotes the
{\em function} member of the class {\em class}. (In the style we use,
most function names are capitalized.) The body of {\tt Push} is defined
beneath the class definition.
As an aside, note that we use a call to {\tt ASSERT} to check that
the stack hasn't overflowed; ASSERT drops into the debugger if the condition
is false. It is an extremely good idea for you to use ASSERT
statements liberally throughout your code to document assumptions
made by your implementation. Better to catch errors automatically
via ASSERTs than to let them go by and have your program overwrite
random locations.
In actual usage, the definition of {\tt class Stack} would typically
go in the file {\tt stack.h} and the definitions of the member
functions, like {\tt Stack::Push}, would go in the file {\tt
stack.cc}.
If we have a pointer to a {\tt Stack} object called {\tt s}, we can
access the {\tt top} element as {\tt s->top}, just as in C. However,
in C++ we can also call the member function using the following syntax:
\begin{verbatim}
s->Push(17);
\end{verbatim}
Of course, as in C, {\tt s} must point to a valid {\tt Stack} object.
Inside a member function, one may refer to the members of the class
by their names alone. In other words, the class definition
creates a scope that includes the member (function and data) definitions.
Note that if you are inside a member function, you can get a pointer
to the object you were called on by using the variable {\tt this}.
If you want to call another member function on the same object, you
do not need to use the {\tt this} pointer, however. Let's extend the Stack
example to illustrate this by adding a {\tt Full()} function.
\begin{verbatim}
class Stack {
  public:
    void Push(int value); // Push an integer, checking for overflow.
    bool Full();          // Returns TRUE if the stack is full, FALSE otherwise.
    int top;              // Index of the lowest unused position.
    int stack[10];        // The elements of the stack.
};
\end{verbatim}
\newpage
\begin{verbatim}
bool
Stack::Full() {
    return (top == 10);
}
\end{verbatim}
Now we can rewrite {\tt Push} this way:
\begin{verbatim}
void
Stack::Push(int value) {
    ASSERT(!Full());
    stack[top++] = value;
}
\end{verbatim}
We could have also written the ASSERT:
\begin{verbatim}
ASSERT(!(this->Full()));
\end{verbatim}
but in a member function, the \verb+this->+ is implicit.
The purpose of member functions is to encapsulate the functionality of
a type of object along with the data that the object contains. A
member function does not take up space in an object of the class.
\item {\bf Private members.} One can declare some
members of a class to be {\it private}, which are hidden to all but
the member functions of that class, and some to be {\it public}, which
are visible and accessible to everybody. Both data and function members
can be either public or private.
In our stack example, note that once we have the {\tt Full()}
function, we really don't need to look at the {\tt top} or {\tt stack}
members outside of the class -- in fact, we'd rather that users of the Stack
abstraction {\em not} know about its internal implementation, in case
we change it. Thus we can rewrite the class as follows:
\begin{verbatim}
class Stack {
  public:
    void Push(int value); // Push an integer, checking for overflow.
    bool Full();          // Returns TRUE if the stack is full, FALSE otherwise.
  private:
    int top;              // Index of the lowest unused position.
    int stack[10];        // The elements of the stack.
};
\end{verbatim}
Before, given a pointer to a {\tt Stack} object, say {\tt s}, any part
of the program could access {\tt s->top}, in potentially bad ways.
Now, since the {\tt top} member is private, only a member function,
such as {\tt Full()}, can access it. If any other part of the
program attempts to use {\tt s->top} the compiler will report an error.
You can have alternating {\tt public:} and {\tt private:} sections in
a class. Before you specify either of these, class members are
private, thus the above example could have been written:
\begin{verbatim}
class Stack {
    int top;              // Index of the lowest unused position.
    int stack[10];        // The elements of the stack.
  public:
    void Push(int value); // Push an integer, checking for overflow.
    bool Full();          // Returns TRUE if the stack is full, FALSE otherwise.
};
\end{verbatim}
Which form you prefer is a matter of style, but it's usually best
to be explicit, so that it is obvious what is intended. In Nachos,
we make everything explicit.
What is not a matter of style: {\bf all data members of a class
should be private.} All operations on data should be via that
class's member functions. Keeping data private adds to the modularity
of the system, since you can redefine how the data members are stored
without changing how you access them.
\item {\bf Constructors and the operator new.} In C, in
order to create a new object of type {\tt Stack}, one might write:
\begin{verbatim}
struct Stack *s = (struct Stack *) malloc(sizeof (struct Stack));
InitStack(s, 17);
\end{verbatim}
The {\tt InitStack()} function might take the second argument as the
size of the stack to create, and use {\tt malloc()} again to get an
array of 17 integers.
The way this is done in C++ is as follows:
\begin{verbatim}
Stack *s = new Stack(17);
\end{verbatim}
The {\tt new} operator takes the place of {\tt malloc()}. To
specify how the object should be initialized, one declares a {\it
constructor} function as a member of the class, with the name of the
function being the same as the class name:
\begin{verbatim}
class Stack {
  public:
    Stack(int sz);        // Constructor: initialize variables, allocate space.
    void Push(int value); // Push an integer, checking for overflow.
    bool Full();          // Returns TRUE if the stack is full, FALSE otherwise.
  private:
    int size;             // The maximum capacity of the stack.
    int top;              // Index of the lowest unused position.
    int* stack;           // A pointer to an array that holds the contents.
};

Stack::Stack(int sz) {
    size = sz;
    top = 0;
    stack = new int[size]; // Let's get an array of integers.
}
\end{verbatim}
There are a few things going on here, so we will describe them one at
a time.
The {\tt new} operator automatically creates (i.e. allocates) the object
and then calls the constructor function for the new object.
This same sequence happens even if, for instance, you declare an object
as an automatic variable inside a function or block -- the compiler allocates
space for the object on the stack, and calls the constructor function on it.
In this example, we create two stacks of different sizes, one
by declaring it as an automatic variable, and one by using {\tt new}.
\begin{verbatim}
void
test() {
    Stack s1(17);
    Stack* s2 = new Stack(23);
}
\end{verbatim}
Note there are two ways of providing arguments to constructors: with
{\tt new}, you put the argument list after the class name, and with
automatic or global variables, you put them after the variable name.
It is crucial that you {\bf always} define a constructor
for every class you define, and that the constructor initialize
{\bf every} data member of the class. If you don't define
your own constructor, the compiler will automatically define
one for you, and believe me, it won't do what you want
(``the unhelpful compiler'').
The data members will be initialized to random, unrepeatable
values, and while your program may work anyway, it might not
the next time you recompile (or vice versa!).
As with normal C variables, variables declared inside a function
are deallocated automatically when the function returns; for
example, the {\tt s1} object is deallocated when {\tt test}
returns. Data allocated with {\tt new} (such as {\tt s2}) is
stored on the heap, however, and remains after the function returns;
heap data must be explicitly disposed of using {\tt delete}, described below.
The {\tt new} operator can also be used to allocate arrays, illustrated
above in allocating an array of {\tt ints}, of dimension {\tt size}:
\begin{verbatim}
stack = new int[size];
\end{verbatim}
Note that you can use {\tt new} and {\tt delete} (described below)
with built-in types like {\tt int} and {\tt char} as well as with
class objects like {\tt Stack}.
\item {\bf Destructors and the operator delete.} Just as {\tt new} is the
replacement for {\tt malloc()}, the replacement for {\tt free()} is
{\tt delete}. To get rid of the {\tt Stack} object we allocated
above with {\tt new}, one can do:
\begin{verbatim}
delete s2;
\end{verbatim}
This will deallocate the object, but first it will call the
{\it destructor} for the {\tt Stack} class, if there is one. This
destructor is a member function of {\tt Stack} called {\tt {\verb^~^}Stack()}:
\begin{verbatim}
class Stack {
  public:
    Stack(int sz);        // Constructor: initialize variables, allocate space.
    ~Stack();             // Destructor: deallocate space allocated above.
    void Push(int value); // Push an integer, checking for overflow.
    bool Full();          // Returns TRUE if the stack is full, FALSE otherwise.
  private:
    int size;             // The maximum capacity of the stack.
    int top;              // Index of the lowest unused position.
    int* stack;           // A pointer to an array that holds the contents.
};

Stack::~Stack() {
    delete [] stack;      // delete an array of integers
}
\end{verbatim}
The destructor has the job of deallocating the data the constructor
allocated. Many classes won't need destructors, and some will use
them to close files and otherwise clean up after themselves.
The destructor for an object is called when the object is deallocated.
If the object was created with {\tt new}, then you must call
{\tt delete} on the object, or else the object will continue
to occupy space until the program is over -- this is called
``a memory leak.'' Memory leaks are bad things -- although virtual
memory is supposed to be unlimited, you can in fact run out of it --
and so you should be careful to {\bf always} delete what you allocate.
Of course, it is even worse to call {\tt delete} too early --
{\tt delete} calls the destructor and puts the space back on the heap
for later re-use. If you are still using the object, you will
get random and non-repeatable results that will be very difficult
to debug. In my experience, using data that has already been deleted
is a major source of hard-to-locate bugs in student (and professional)
programs, so hey, be careful out there!
If the object is an automatic, allocated on the execution stack
of a function, the destructor will be called and the space deallocated when
the function returns; in the {\tt test()} example above, {\tt s1}
will be deallocated when {\tt test()} returns, without you having to
do anything.
In Nachos, we always explicitly allocate and deallocate objects with
{\tt new} and {\tt delete}, to make it clear when the constructor and
destructor are being called. For example, if an object contains another
object as a member variable, we use
{\tt new} to explicitly allocate and initialize the member variable,
instead of implicitly allocating it as part of the containing object.
C++ has strange, non-intuitive rules for the order in which the
constructors and destructors are called when you implicitly allocate
and deallocate objects. In practice, although simpler, explicit allocation
is slightly slower and it makes it more likely that you will forget
to deallocate an object (a bad thing!), and so some would disagree with
this approach.
When you deallocate an array, you have to tell the compiler that
you are deallocating an array, as opposed to a single element in the array.
Hence to delete the array of integers in {\tt Stack::{\verb^~^}Stack}:
\begin{verbatim}
delete [] stack;
\end{verbatim}
\end{enumerate}
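
Putting the pieces of this section together, a complete version of the
array-based {\tt Stack} might look like the following sketch. Two things
here are assumptions rather than part of the text above: the standard
{\tt assert} macro from {\tt <cassert>} stands in for the Nachos
{\tt ASSERT}, and the {\tt Pop} member function and the {\tt StackDemo}
routine are hypothetical additions so that the example has something
to observe.
\begin{verbatim}
#include <cassert>

class Stack {
  public:
    Stack(int sz);        // Constructor: initialize every data member.
    ~Stack();             // Destructor: release the array.
    void Push(int value); // Push an integer, checking for overflow.
    int Pop();            // Hypothetical: remove and return the top item.
    bool Full();          // Returns true if the stack is full.
  private:
    int size;             // The maximum capacity of the stack.
    int top;              // Index of the lowest unused position.
    int* stack;           // A pointer to an array that holds the contents.
};

Stack::Stack(int sz) {
    size = sz;
    top = 0;
    stack = new int[size];
}

Stack::~Stack() {
    delete [] stack;      // array form, matching new int[size]
}

void Stack::Push(int value) {
    assert(!Full());
    stack[top++] = value;
}

int Stack::Pop() {
    assert(top > 0);      // popping an empty stack is a bug
    return stack[--top];
}

bool Stack::Full() {
    return (top == size);
}

int StackDemo() {
    Stack s1(2);                // automatic: destroyed when StackDemo returns
    Stack* s2 = new Stack(3);   // heap: must be deleted explicitly
    s1.Push(10);
    s1.Push(20);
    s2->Push(s1.Pop());         // moves 20 from s1 onto s2
    int result = s2->Pop() + s1.Pop();
    delete s2;                  // runs ~Stack, then frees the object
    return result;
}
\end{verbatim}
{\tt StackDemo} returns 30. Every {\tt new} in the sketch is paired with
exactly one {\tt delete}, and the destructor for {\tt s1} runs without
any explicit call.
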
\subsection{Other Basic C++ Features}
Here are a few other C++ features that are useful to know.
\begin{enumerate}
\item When you define a {\tt class Stack}, the name {\tt Stack} becomes
usable as a type name as if created with {\tt typedef}. The same is
true for {\tt enum}s.
\item You can define functions inside of a {\tt class} definition,
whereupon they become {\it inline functions}, which are expanded in
the body of the function where they are used. The rule of thumb to
follow is to only consider inlining one-line functions, and even then
do so rarely.
As an example, we could make the {\tt Full} routine an inline.
\begin{verbatim}
class Stack {
    ...
    bool Full() { return (top == size); }
    ...
};
\end{verbatim}
There are two motivations for inlines: convenience
and performance. If overused, inlines can make your code more confusing,
because the implementation for an object is no longer in one place,
but spread between the {\tt .h} and {\tt .cc} files. Inlines can sometimes
speed up your code (by avoiding the overhead of a procedure call), but
that shouldn't be your principal concern as a student (rather, at least to
begin with, you should be most concerned with writing code that is simple
and bug free). Not to mention that inlining sometimes slows down a program,
since the object code for the function is duplicated wherever the function
is called, potentially hurting cache performance.
\item Inside a function body, you can declare some variables, execute
some statements, and then declare more variables. This can make code
a lot more readable. In fact, you can even write things like:
\begin{verbatim}
for (int i = 0; i < 10; i++) ;
\end{verbatim}
Depending on your compiler, however, the variable {\tt i} may still be visible
after the end of the {\tt for}
loop, which is not what one might expect or desire.
\item Comments can begin with the characters \verb+//+ and extend to
the end of the line. These are usually more handy than the
\verb+/* */+ style of comments.
\item C++ provides some new opportunities to use the
{\tt const} keyword from ANSI C. The basic idea of {\tt const}
is to provide extra information to the compiler about how a variable
or function is used, to allow it to flag an error if it is being
used improperly. You should always look for ways to get the compiler
to catch bugs for you. After all, which takes less time? Fixing
a compiler-flagged error, or chasing down the same bug using gdb?
For example, you can declare that a member function only reads the
member data, and never modifies the object:
\begin{verbatim}
class Stack {
    ...
    bool Full() const;  // Full() never modifies member data
    ...
};
\end{verbatim}
As in C, you can use {\tt const} to declare that a variable is never
modified:
\begin{verbatim}
const int InitialHashTableSize = 8;
\end{verbatim}
This is {\em much} better than using {\tt \#define} for constants,
since the above is type-checked.
\item Input/output in C++ can be done with the {\tt >>} and {\tt <<}
operators and the objects {\tt cin} and {\tt cout}. For example,
to write to {\tt stdout}:
\begin{verbatim}
cout << "Hello world! This is section " << 3 << "!\n";
\end{verbatim}
This is equivalent to the normal C code
\begin{verbatim}
fprintf(stdout, "Hello world! This is section %d!\n", 3);
\end{verbatim}
except that the C++ version is type-safe; with {\tt printf}, the
compiler won't complain if you try to print a floating point number
as an integer. In fact, you can use traditional {\tt printf} in a C++
program, but you will get bizarre behavior if you try to use both
{\tt printf} and {\tt <<} on the same stream. Reading from {\tt stdin}
works the same way as writing to {\tt stdout}, except using the shift
right operator instead of shift left.
In order to read two integers from {\tt stdin}:
\begin{verbatim}
int field1, field2;
cin >> field1 >> field2;
// equivalent to fscanf(stdin, "%d %d", &field1, &field2);
// note that field1 and field2 are implicitly modified
\end{verbatim}
In fact, {\tt cin} and {\tt cout} are implemented as normal C++
objects, using operator overloading and reference parameters, but
(fortunately!) you don't need to understand either of those to be able
to do I/O in C++.
\end{enumerate}
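
As a sketch of how several of these features fit together, the
following fragment uses a typed constant, member functions defined
inline in the class, a {\tt const} member function, and {\tt cout}.
The names {\tt MaxItems}, {\tt Counter}, and {\tt FillCounter} are
hypothetical. One assumption goes beyond the text above: current
compilers expect the header {\tt <iostream>} and place {\tt cout} in
the {\tt std} namespace, so the sketch writes {\tt std::cout}.
\begin{verbatim}
#include <iostream>

const int MaxItems = 4;   // type-checked, unlike a #define

class Counter {           // hypothetical class, for illustration only
  public:
    Counter() { count = 0; }               // initialize every data member
    void Increment() { count++; }          // modifies member data
    bool Full() const { return (count == MaxItems); } // read-only
  private:
    int count;
};

int FillCounter() {
    Counter c;
    int steps = 0;
    while (!c.Full()) {
        c.Increment();
        steps++;
    }
    // Through a pointer to const, only const member functions may be
    // called; view->Increment() would be a compile-time error.
    const Counter* view = &c;
    std::cout << "Full after " << steps << " steps: "
              << view->Full() << "\n";
    return steps;
}
\end{verbatim}
If {\tt Full} were not declared {\tt const}, the call through
{\tt view} would not compile, which is exactly the kind of mistake
the keyword lets the compiler catch.
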
\section{Advanced Concepts in C++: Dangerous but Occasionally Useful}
There are a few C++ features, namely (single) inheritance and templates,
which are easily abused, but can dramatically simplify an
implementation if used properly. I describe the basic idea
behind these ``dangerous but useful'' features here, in case you
run across them. Feel free to skip this section -- it's long,
complex, and you can understand 99\% of the code in Nachos without
reading this section.
Up to this point, there really hasn't been any fundamental difference
between programming in C and in C++. In fact, most experienced
C programmers organize their functions into modules that relate
to a single data structure (a ``class''), and often even use a naming
convention which mimics C++, for example, naming routines
{\tt StackFull()} and {\tt StackPush()}. However, the features
I'm about to describe {\em do} require a paradigm shift -- there
is no simple translation from them into a normal C program.
The benefit will be that, in some circumstances, you will be able to
write generic code that works with multiple kinds of objects.
Nevertheless, I would advise a beginning C++ programmer against trying
to use these features, because you will almost
certainly misuse them. It's possible (even easy!) to write completely
inscrutable code using inheritance and/or templates. Although
you might find it amusing to write code that is impossible for your
graders to understand, I assure you they won't find it amusing at all,
and will return the favor when they assign grades. In industry,
a high premium is placed on keeping code simple and readable.
It's easy to write new code, but the real cost comes when
you try to keep it working, even as you add new features to it.
Nachos contains a few examples of the correct use of inheritance
and templates, but realize that Nachos does {\em not} use them
everywhere. In fact, if you get confused by this section, don't worry,
you don't need to use any of these features in order to do the Nachos
assignments. I omit a whole bunch of details; if you find yourself
making widespread use of inheritance or templates, you should consult a C++
reference manual for the real scoop. This is meant to
be just enough to get you started, and to help you identify when it would
be appropriate to use these features and thus learn more
about them!
\subsection{Inheritance}
Inheritance captures the idea that certain classes of objects are
related to each other in useful ways. For example, lists
and sorted lists have quite similar behavior -- they both
allow the user to insert, delete, and find elements that are
on the list. There are two benefits to using inheritance:
\begin{enumerate}
\item You can write generic code that doesn't
care exactly which kind of object it is manipulating. For
example, inheritance is widely used in windowing systems.
Everything on the screen (windows, scroll bars, titles, icons)
is its own object, but they all share a set of member functions
in common, such as a routine {\tt Repaint} to redraw the object
onto the screen. This way, the code to repaint the entire screen
can simply call the {\tt Repaint} function on every object on the screen.
The code that calls {\tt Repaint} doesn't need to know which
kinds of objects are on the screen, as long as each implements
{\tt Repaint}.
\item You can share pieces of an implementation between two
objects. For example, if you were to implement both lists and
sorted lists in C, you'd probably find yourself repeating code
in both places -- in fact, you might be really tempted to
only implement sorted lists, so that you only had to debug
one version. Inheritance provides a way to re-use code
between nearly similar classes. For example, given an implementation
of a list class, in C++ you can implement sorted lists by replacing
the insert member function -- the other functions, delete, isFull,
print, all remain the same.
\end{enumerate}
\subsubsection{Shared Behavior}
Let me use our Stack example to illustrate the first of these.
Our Stack implementation
above could have been implemented with linked lists, instead of an array.
Any code using a Stack shouldn't
care which implementation is being used, except that the linked list
implementation can't overflow. (In fact, we could also change the
array implementation to handle overflow by automatically resizing
the array as items are pushed on the stack.)
To allow the two implementations to coexist, we first define an
{\em abstract} Stack, containing just the public member functions,
but no data.
\begin{verbatim}
class Stack {
  public:
    Stack();
    virtual ~Stack();                 // deallocate the stack
    virtual void Push(int value) = 0; // Push an integer, checking for overflow.
    virtual bool Full() = 0;          // Is the stack full?
};

// For g++, need these even though no data to initialize.
Stack::Stack() {}
Stack::~Stack() {}
\end{verbatim}
The {\tt Stack} definition is called a {\em base class} or sometimes a {\em
superclass}. We can then define two different {\em derived classes},
sometimes called {\em subclasses} which inherit behavior from the base
class. (Of course, inheritance is recursive -- a derived class can in
turn be a base class for yet another derived class, and so on.)
Note that I have prepended the keyword {\tt virtual} to the functions
in the base class, to signify that they can be redefined by
each of the two derived classes. The virtual functions are
initialized to zero, to tell the compiler that those functions
must be defined by the derived classes.
Here's how we could declare the array-based and list-based
implementations of {\tt Stack}. The syntax {\tt : public Stack} signifies
that both {\tt ArrayStack} and {\tt ListStack} are kinds
of {\tt Stacks}, and share the same behavior as the base class.
\begin{verbatim}
class ArrayStack : public Stack { // the same as the Stack class above
  public:
    ArrayStack(int sz);   // Constructor: initialize variables, allocate space.
    ~ArrayStack();        // Destructor: deallocate space allocated above.
    void Push(int value); // Push an integer, checking for overflow.
    bool Full();          // Returns TRUE if the stack is full, FALSE otherwise.
  private:
    int size;             // The maximum capacity of the stack.
    int top;              // Index of the lowest unused position.
    int *stack;           // A pointer to an array that holds the contents.
};

class ListStack : public Stack {
  public:
    ListStack();
    ~ListStack();
    void Push(int value);
    bool Full();
  private:
    List *list;           // list of items pushed on the stack
};

ListStack::ListStack() {
    list = new List;
}

ListStack::~ListStack() {
    delete list;
}
\end{verbatim}
\newpage
\begin{verbatim}
void ListStack::Push(int value) {
    list->Prepend(value);
}

bool ListStack::Full() {
    return FALSE;  // this stack never overflows!
}
\end{verbatim}
The neat concept here is that I can assign pointers to instances of
{\tt ListStack} or {\tt ArrayStack} to a variable of type {\tt Stack *}, and
then use them as if they were of the base type.
\begin{verbatim}
Stack *s1 = new ListStack;
Stack *s2 = new ArrayStack(17);

if (!s1->Full())
    s1->Push(5);
if (!s2->Full())
    s2->Push(6);
delete s1;
delete s2;
\end{verbatim}
The compiler automatically invokes {\tt ListStack} operations
for {\tt s1}, and {\tt ArrayStack} operations for {\tt s2};
this is typically done by creating a procedure table for each class
(each object carries a pointer to the table for its class),
where derived classes override the default entries in the table
defined by the base class. For the code above, the compiler invokes the
operations {\tt Full}, {\tt Push}, and {\tt delete} by indirection
through the procedure table, so that the code doesn't need to know
which kind of object it is.
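To make the procedure table less mysterious, here is a rough sketch of the
kind of thing a compiler might generate, written out by hand with plain
function pointers. All of the names ({\tt StackOps}, {\tt FakeStack},
{\tt PushIfRoom}, and so on) are hypothetical, and real compilers differ in
the details; the only point is to show the indirection.

```cpp
#include <cassert>

struct FakeStack;                     // hypothetical stand-in for an object

// One table of function pointers per kind of stack.
struct StackOps {
    bool (*Full)(FakeStack *self);
    void (*Push)(FakeStack *self, int value);
};

struct FakeStack {
    const StackOps *ops;              // which table this object uses
    int size;                         // capacity (ignored by the list kind)
    int top;                          // number of items pushed so far
    int items[8];                     // storage, kept tiny for the sketch
};

// "ArrayStack" versions of the operations.
static bool ArrayFull(FakeStack *s) { return s->top == s->size; }
static void ArrayPush(FakeStack *s, int value) { s->items[s->top++] = value; }

// "ListStack" versions: pretend it never fills up, and only count pushes.
static bool ListFull(FakeStack *) { return false; }
static void ListPush(FakeStack *s, int) { s->top++; }

static const StackOps arrayOps = { ArrayFull, ArrayPush };
static const StackOps listOps  = { ListFull, ListPush };

// The caller never needs to know which kind of stack it was handed;
// it just goes through whatever table the object points to.
void PushIfRoom(FakeStack *s, int value) {
    if (!s->ops->Full(s))
        s->ops->Push(s, value);
}
```

Calling {\tt PushIfRoom} on an object whose {\tt ops} field points at
{\tt arrayOps} runs the array routines; the same call on an object pointing at
{\tt listOps} runs the list routines. The {\tt virtual} keyword asks the
compiler to do this bookkeeping for you.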
In this example, since I never create an instance of the
abstract class {\tt Stack}, I do not need to {\em implement} its
functions. This might seem a bit strange, but remember that
the derived classes are the various implementations of Stack,
and Stack serves only to reflect the shared behavior between
the different implementations.
Also note that the destructor for {\tt Stack} is a virtual
function but the constructor is not. Clearly, when I create an
object, I have to know which kind of object it is, whether
{\tt ArrayStack} or {\tt ListStack}. The compiler
makes sure that no one creates an instance of the abstract {\tt Stack}
by mistake -- you cannot instantiate any class whose virtual
functions are not completely defined (in other words, if any of
its functions are set to zero in the class definition).
But when I deallocate an object, I may no longer know its exact
type. In the above code, I want to call the destructor for the
derived object, even though the code only knows that I am deleting
an object of class {\tt Stack}. If the destructor were not virtual,
then the compiler would invoke only {\tt Stack}'s destructor, which is
not at all what I want. This is an easy mistake to make (I made
it in the first draft of this article!) -- if you don't define
a destructor for the abstract class, the compiler will define one
for you implicitly (and by the way, it won't be virtual, since you
have a {\em really} unhelpful compiler). The result for the
above code would be a memory leak, and who knows how you would
figure that out!
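Here is a minimal sketch of the well-behaved case, using two hypothetical
classes {\tt Base} and {\tt Derived} and a counter that exists only for the
demonstration. Because the base class destructor is declared {\tt virtual},
deleting through a base class pointer still runs the derived class destructor,
so the buffer is freed.

```cpp
#include <cassert>

static int derivedDestroyed = 0;    // hypothetical counter, for the demo only

class Base {
  public:
    virtual ~Base() {}              // virtual: delete goes through the table
};

class Derived : public Base {
  public:
    Derived() { buffer = new int[16]; }
    ~Derived() { delete [] buffer; derivedDestroyed++; }
  private:
    int *buffer;                    // would leak if ~Derived never ran
};

// Delete a Derived object while knowing only that it is a Base.
void DestroyThroughBase() {
    Base *b = new Derived;
    delete b;                       // runs ~Derived, then ~Base
}
```

If you remove the {\tt virtual} keyword from {\tt Base}, the language no longer
promises that {\tt \~{}Derived} runs for this {\tt delete}, which is exactly the
trap described above.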
\subsubsection{Shared Implementation}
What about sharing code, the other reason for inheritance?
In C++, it is possible to use member functions
of a base class in its derived class. (You can also share
data between a base class and derived classes, but this
is a bad idea for reasons I'll discuss later.)
Suppose that I wanted to add a new member function,
{\tt NumPushed()}, to both implementations of {\tt Stack}.
The {\tt ArrayStack} class already keeps count of the number of
items on the stack, so I could duplicate that code in {\tt ListStack}.
Ideally, I'd like to be able to use the same code in both places.
With inheritance, we can move the counter into the
{\tt Stack} class, and then invoke the base class operations
from the derived class to update the counter.
\begin{verbatim}
class Stack {
public:
virtual ~Stack(); // deallocate data
virtual void Push(int value); // Push an integer, checking for overflow.
virtual bool Full() = 0; // return TRUE if full
int NumPushed(); // how many are currently on the stack?
protected:
Stack(); // initialize data
private:
int numPushed;
};
Stack::Stack() {
numPushed = 0;
}
void Stack::Push(int value) {
numPushed++;
}
int Stack::NumPushed() {
return numPushed;
}
\end{verbatim}
We can then modify both {\tt ArrayStack} and {\tt ListStack}
to make use of the new behavior of {\tt Stack}. I'll only list
one of them here:
\begin{verbatim}
class ArrayStack : public Stack {
public:
ArrayStack(int sz);
~ArrayStack();
void Push(int value);
bool Full();
private:
int size; // The maximum capacity of the stack.
int *stack; // A pointer to an array that holds the contents.
};
ArrayStack::ArrayStack(int sz) : Stack() {
size = sz;
stack = new int[size]; // Let's get an array of integers.
}
void
ArrayStack::Push(int value) {
    ASSERT(!Full());
    stack[NumPushed()] = value;
    Stack::Push(value);    // invoke base class to increment numPushed
}
\end{verbatim}
There are a few things to note:
\begin{enumerate}
\item The constructor for {\tt ArrayStack} needs to invoke the
constructor for {\tt Stack}, in order to initialize {\tt numPushed}.
It does that by adding {\tt : Stack()} to the first line in the constructor:
\begin{verbatim}
ArrayStack::ArrayStack(int sz) : Stack()
\end{verbatim}
The same thing applies to destructors. There are special rules for which
get called first -- the constructor/destructor for the base class or
the constructor/destructor for the derived class. All I will say is,
it's a bad idea to rely on whatever the rule is -- more generally, it is a
bad idea to write code which requires the reader to consult a manual
to tell whether or not the code works!
\item I introduced a new keyword, {\tt protected}, in the new definition
of {\tt Stack}. For a base class, {\tt protected} signifies that those
member data and functions are accessible to classes derived (recursively)
from this class, but inaccessible to other classes. In other words, protected
data is {\tt public} to derived classes, and {\tt private} to everyone else.
For example, we need {\tt Stack}'s constructor to be callable by
{\tt ArrayStack} and {\tt ListStack}, but we don't want anyone
else to create instances of {\tt Stack}. Hence, we make {\tt Stack}'s
constructor a protected function. In this case, this is not strictly
necessary since the compiler will complain if anyone tries to create an
instance of {\tt Stack} because {\tt Stack} still has an undefined virtual
function, {\tt Full}. By defining {\tt Stack::Stack} as {\tt protected},
you are safe even if someone comes along later and defines {\tt Stack::Full}.
Note however that I made {\tt Stack}'s data member {\tt private}, not
{\tt protected}. Although there is some debate on this point,
as a rule of thumb you should never allow one class to directly
access the data in another, even among classes related
by inheritance. Otherwise, if you ever change the implementation
of the base class, you will have to examine and change all the
implementations of the derived classes, violating modularity.
\item The interface for a derived class automatically includes all
functions defined for its base class, without having to explicitly
list them in the derived class. Although we didn't define
{\tt NumPushed()} in {\tt ArrayStack}, we can still call it for
those objects:
\begin{verbatim}
ArrayStack *s = new ArrayStack(17);
ASSERT(s->NumPushed() == 0); // should be initialized to 0
\end{verbatim}
\item Conversely, even though we have defined a routine {\tt Stack::Push()},
because it is declared as {\tt virtual}, if we invoke {\tt Push()}
on an {\tt ArrayStack} object, we will get {\tt ArrayStack}'s version
of {\tt Push}:
\begin{verbatim}
Stack *s = new ArrayStack(17);
if (!s->Full()) // ArrayStack::Full
s->Push(5); // ArrayStack::Push
\end{verbatim}
\item {\tt Stack::NumPushed()} is not {\tt virtual}. That means
that it cannot be overridden by {\tt Stack}'s derived classes: a call made
through a {\tt Stack} pointer always gets {\tt Stack}'s version.
Some people believe that you should mark {\em all} functions
in a base class as {\tt virtual}; that way, if you later want to
implement a derived class that redefines a function, you don't have
to modify the base class to do so.
\item Member functions in a derived class can explicitly invoke
public or protected functions in the base class, by the full
name of the function, {\tt Base::Function()}, as in:
\begin{verbatim}
void ArrayStack::Push(int value)
{
    ...
    Stack::Push(value);    // invoke base class to increment numPushed
}
\end{verbatim}
Of course, if we just called {\tt Push()} here (without prepending
{\tt Stack::}), the compiler would think we were referring
to {\tt ArrayStack}'s {\tt Push()}, and so that would recurse,
which is not exactly what we had in mind here.
\end{enumerate}
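Putting the pieces above together, a complete version might look roughly like
the sketch below. The bodies of {\tt Full()} and the destructors are my own
guesses at the obvious implementation, the helper {\tt PushTwo()} is
hypothetical, and the standard {\tt assert} stands in for {\tt ASSERT}. The
function bodies are written inside the class definitions only to keep the
sketch short.

```cpp
#include <cassert>

class Stack {
  public:
    virtual ~Stack() {}
    virtual void Push(int value) { (void) value; numPushed++; }
    virtual bool Full() = 0;               // still undefined: Stack is abstract
    int NumPushed() { return numPushed; }  // shared by all derived classes
  protected:
    Stack() { numPushed = 0; }             // callable only by derived classes
  private:
    int numPushed;                         // hidden even from derived classes
};

class ArrayStack : public Stack {
  public:
    ArrayStack(int sz) : Stack() { size = sz; stack = new int[size]; }
    ~ArrayStack() { delete [] stack; }
    void Push(int value) {
        assert(!Full());
        stack[NumPushed()] = value;
        Stack::Push(value);                // base class increments the counter
    }
    bool Full() { return NumPushed() == size; }
  private:
    int size;
    int *stack;
};

// Use an ArrayStack purely through the Stack interface.
int PushTwo() {
    Stack *s = new ArrayStack(2);
    s->Push(5);                            // ArrayStack::Push
    s->Push(6);
    int n = s->NumPushed();                // Stack::NumPushed
    bool full = s->Full();                 // ArrayStack::Full
    delete s;
    return full ? n : -1;
}
```

Notice that {\tt ArrayStack} never touches {\tt numPushed} directly; it reads
the count through {\tt NumPushed()} and updates it through {\tt Stack::Push}.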
Whew! Inheritance in C++ involves lots and lots of details.
But its real downside is that it tends to spread implementation
details across multiple files -- if you have a deep inheritance
tree, it can take some serious digging to figure out what code
actually executes when a member function is invoked.
So the question to ask yourself before using inheritance is:
what's your goal? Is it to write your programs with the
fewest number of characters possible? If so, inheritance is
really useful, but so is changing all of your function and variable
names to be one letter long -- ``a'', ``b'', ``c'' -- and once you
run out of lower case ones, start using upper case, then two character
variable names: ``XX XY XZ Ya ...'' (I'm joking here.)
Needless to say, it is really easy to write unreadable code
using inheritance.
So when is it a good idea to use inheritance and when should it be
avoided? My rule of thumb is to only use it for representing
{\em shared behavior} between objects, and to never use it for
representing {\em shared implementation}. With C++, you can use
inheritance for both concepts, but only the first will lead to
truly simpler implementations.
To illustrate the difference between shared behavior and shared
implementation, suppose you had a whole bunch of different kinds
of objects that you needed to put on lists. For example, almost everything
in an operating system goes on a list of some sort: buffers, threads,
users, terminals, etc.
A very common approach to this problem (particularly among people new
to object-oriented programming) is to make every object inherit from
a single base class {\em Object}, which contains the forward and backward
pointers for the list. But what if some object needs to go on multiple
lists? The whole scheme breaks down, and it's because we tried to use
inheritance to share implementation (the code for the forward and backward
pointers) instead of to share behavior. A much cleaner (although slightly
slower) approach would
be to define a list implementation that allocated forward/backward
pointers for each object that gets put on a list.
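As a rough sketch of that cleaner approach, the list below allocates its own
element record for each item, so the item itself carries no list pointers and
can sit on any number of lists at once. The names {\tt ListElement},
{\tt Append}, {\tt First}, {\tt Count}, and the {\tt Thread} structure are all
hypothetical, chosen just for this illustration.

```cpp
#include <cassert>

// A list element is allocated by the list, not embedded in the item.
struct ListElement {
    void *item;                 // the thing on the list
    ListElement *next;          // forward pointer
    ListElement *prev;          // backward pointer
};

class List {
  public:
    List() { first = 0; last = 0; count = 0; }
    ~List() {                   // frees the elements, never the items
        while (first != 0) {
            ListElement *e = first;
            first = first->next;
            delete e;
        }
    }
    void Append(void *item) {
        ListElement *e = new ListElement;
        e->item = item;
        e->next = 0;
        e->prev = last;
        if (last != 0)
            last->next = e;
        else
            first = e;
        last = e;
        count++;
    }
    void *First() { return first != 0 ? first->item : 0; }
    int Count() { return count; }
  private:
    ListElement *first;
    ListElement *last;
    int count;
};

struct Thread { int id; };      // hypothetical item: no list pointers inside

// The same object goes on two lists without any trouble.
bool OnTwoLists() {
    Thread t = { 7 };
    List ready;
    List all;
    ready.Append(&t);
    all.Append(&t);
    return ready.First() == &t && all.First() == &t
        && ready.Count() == 1 && all.Count() == 1;
}
```

The price is one extra allocation per insertion, which is the ``slightly
slower'' part mentioned above.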
In sum, if two classes share at least some of the same member function
signatures -- that is, the same behavior -- {\em and} if there's code that
only relies on the shared behavior, then there {\em may}
be a benefit to using inheritance. In Nachos, locks don't inherit from
semaphores, even though locks are implemented using semaphores. The
operations on semaphores and locks are different. Instead, inheritance is
only used for various kinds of lists (sorted, keyed, etc.),
and for different implementations of the physical disk abstraction,
to reflect whether the disk has a track buffer, etc. A disk is used
the same way whether or not it has a track buffer; the only difference is
in its performance characteristics.
\subsection{Templates}
Templates are another useful but dangerous concept in C++.
With templates, you can parameterize a class definition
with a {\em type}, to allow you to write generic type-independent
code. For example, our {\tt Stack} implementation above only worked
for pushing and popping {\em integers}; what if we wanted a stack
of characters, or floats, or pointers, or some arbitrary data structure?
In C++, this is pretty easy to do using templates:
\begin{verbatim}
template <class T>
class Stack {
public:
Stack(int sz); // Constructor: initialize variables, allocate space.
~Stack(); // Destructor: deallocate space allocated above.
void Push(T value); // Push a value of type T, checking for overflow.
bool Full(); // Returns TRUE if the stack is full, FALSE otherwise.
private:
int size; // The maximum capacity of the stack.
int top; // Index of the lowest unused position.
T *stack; // A pointer to an array that holds the contents.
};
\end{verbatim}
To define a template, we prepend the keyword {\tt template} to
the class definition, and we put the parameterized type for the
template in angle brackets. If we need to parameterize the implementation
with two or more types, it works just like an argument list:
{\tt template <class T, class S>}. We can use the type parameters
elsewhere in the definition, just as if they were normal types.
When we provide the implementation for each of the member functions
in the class, we also have to declare them as templates, and again,
once we do that, we can use the type parameters just like normal types:
\begin{verbatim}
// template version of Stack::Stack
template <class T>
Stack<T>::Stack(int sz) {
size = sz;
top = 0;
stack = new T[size]; // Let's get an array of type T
}
// template version of Stack::Push
template <class T>
void
Stack<T>::Push(T value) {
ASSERT(!Full());
stack[top++] = value;
}
\end{verbatim}
Creating an object of a template class is similar to creating
a normal object:
\begin{verbatim}
void
test() {
Stack<int> s1(17);
Stack<char> *s2 = new Stack<char>(23);
s1.Push(5);
s2->Push('z');
delete s2;
}
\end{verbatim}
Everything operates as if we defined two classes, one
called {\tt Stack<int>} -- a stack of integers, and one
called {\tt Stack<char>} -- a stack of characters.
{\tt s1} behaves just like an instance of the first;
{\tt s2} behaves just like an instance of the second.
In fact, that is exactly how templates are typically implemented --
you get a complete {\em copy} of the code for the template
for each different instantiated type. In the above example,
we'd get one copy of the code for {\tt ints} and one copy for {\tt chars}.
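The fragments above leave out the destructor and {\tt Full()}, so here is one
way the whole template might be filled in. The {\tt Pop()} routine and the
{\tt RoundTrip()} helper are additions of mine, included only so that the
result can be checked, and the standard {\tt assert} stands in for
{\tt ASSERT}.

```cpp
#include <cassert>

template <class T>
class Stack {
  public:
    Stack(int sz);
    ~Stack();
    void Push(T value);
    T Pop();                // assumed here; not part of the fragments above
    bool Full();
  private:
    int size;               // The maximum capacity of the stack.
    int top;                // Index of the lowest unused position.
    T *stack;               // A pointer to an array that holds the contents.
};

template <class T>
Stack<T>::Stack(int sz) {
    size = sz;
    top = 0;
    stack = new T[size];
}

template <class T>
Stack<T>::~Stack() {
    delete [] stack;
}

template <class T>
void Stack<T>::Push(T value) {
    assert(!Full());
    stack[top++] = value;
}

template <class T>
T Stack<T>::Pop() {
    assert(top > 0);
    return stack[--top];
}

template <class T>
bool Stack<T>::Full() {
    return top == size;
}

// Two instantiations: the compiler generates Stack<int> and Stack<char>.
char RoundTrip() {
    Stack<int> s1(17);
    Stack<char> *s2 = new Stack<char>(23);
    s1.Push(5);
    s2->Push('z');
    char c = s2->Pop();
    delete s2;
    return (s1.Pop() == 5) ? c : '?';
}
```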
So what's wrong with templates? You've all been taught to make
your code modular so that it can be re-usable, so {\em everything}
should be a template, right? Wrong.
The principal problem with templates is that they can be {\em very}
difficult to debug -- templates are easy to use if they work, but
finding a bug in them can be difficult. In part this is because
current generation C++ debuggers don't really understand templates
very well. Nevertheless, it is easier to debug a template than
two nearly identical implementations that differ only in their types.
So the best advice is -- don't make a class into a template
unless there really is a near term use for the template. And if you
do need to implement a template, implement and debug a non-template
version first. Once that is working, it won't be hard to convert
it to a template. Then all you have to worry about is code
explosion -- e.g., your program's object code is now megabytes
because of the 15 copies of the hash table/list/... routines, one for
each kind of thing you want to put in a hash table/list/...
(Remember, you have an unhelpful compiler!)
\section{Features To Avoid Like the Plague}
Despite the length of this note, there are numerous
features in C++ that I haven't explained. I'm sure each feature
has its advocates, but despite programming in C and C++ for over 15
years, I haven't found a compelling reason to use them in any code
that I've written (outside of a programming language class!)
Indeed, there is a compelling reason to avoid using these features -- they are
easy to misuse, resulting in programs that are harder, not easier,
to read and understand. In most cases, the features are also
redundant -- there are other ways of accomplishing the same end. Why have
two ways of doing the same thing? Why not stick with the simpler one?
I do not use any of the following features in Nachos.
If you use them, {\it caveat hacker}.
\begin{enumerate}
\item {\bf Multiple inheritance.} It is possible in C++ to define
a class as inheriting behavior from multiple classes (for instance,
a dog is both an animal and a furry thing). But if programs
using single inheritance can be difficult to untangle, programs
with multiple inheritance can get really confusing.
\item {\bf References.} Reference variables are rather hard to
understand in general; they play the same role as pointers, with
slightly different syntax (unfortunately, I'm not joking!)
Their most common use is to declare some parameters to a function
as {\it reference parameters}, as in Pascal. A call-by-reference
parameter can be modified by the called function, without the caller
having to pass a pointer. The effect is that parameters look
(to the caller) like they are called by value (and therefore can't change),
but in fact can be transparently modified by the called function.
Obviously, this can be a source of obscure bugs, not to mention
that the semantics of references in C++ are in general not obvious.
\item {\bf Operator overloading.} C++ lets you redefine the meanings
of the operators (such as {\tt +} and \verb+>>+) for class objects.
This is dangerous at best (``exactly which implementation of {\tt +} does
this refer to?''), and when used in non-intuitive ways, a
source of great confusion, made worse by the fact that C++ does
implicit type conversion, which can affect which operator
is invoked. Unfortunately, C++'s I/O facilities
make heavy use of operator overloading and references, so you
can't completely escape them, but think twice before you redefine
{\tt +} to mean ``concatenate these two strings''.
\item {\bf Function overloading.} You can also define different functions
in a class with the same name but different argument types. This is also
dangerous (since it's easy to slip up and get the unintended version),
and we never use it. We will also avoid using default arguments (for the
same reason). Note that it can be a good idea to use the same name for
functions in different classes, provided they use the same
arguments and behave the same way -- a good example of this is that
most Nachos objects have a {\tt Print()} method.
\item {\bf Standard template library.} An ANSI standard has emerged for a
library of routines implementing such things as lists, hash tables,
etc., called the standard template library. Using such a library
should make programming much simpler if the data structure you need
is already provided in the library. Alas, the standard template
library pushes the envelope of legal C++, and so virtually no
compilers (including g++) can support it today. Not to mention that
it uses (big surprise!) references, operator overloading, and
function overloading.
\item {\bf Exceptions.} There are two ways to return an error from
a procedure. One is simple -- just define the procedure to return
an error code if it isn't able to do its job. For example,
the standard library routine {\tt malloc} returns NULL if there
is no available memory. However, lots of programmers are lazy and
don't check error codes. So what's the solution? You might think
it would be to get programmers who aren't lazy, but no, the C++ solution
is to add a programming language construct! A procedure can
return an error by ``raising an exception'' which effectively
causes a {\tt goto} back up the execution stack to the last
place the programmer put an exception handler. You would think
this is too bizarre to be true, but unfortunately,
I'm not making this up.
\end{enumerate}
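To see why reference parameters worry me, compare the three hypothetical
routines in the sketch below. The first two are called with exactly the same
syntax, yet only one of them leaves the caller's variable alone; with a
pointer, the caller at least has to write an ampersand, which warns the reader.

```cpp
#include <cassert>

void ResetByValue(int x)      { x = 0; }    // changes only a private copy
void ResetByReference(int &x) { x = 0; }    // changes the caller's variable
void ResetByPointer(int *x)   { *x = 0; }   // caller must pass an address

int Demo() {
    int a = 1;
    int b = 1;
    int c = 1;
    ResetByValue(a);        // a is still 1
    ResetByReference(b);    // looks identical at the call site; b is now 0
    ResetByPointer(&c);     // the & tells the reader that c may change
    return a * 100 + b * 10 + c;    // 100: only a survived
}
```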
While I'm at it, there are a number of features of C that you also
should avoid, because they lead to bugs and make your code less easy
to understand. See Maguire's ``Writing Solid Code'' for a more complete
discussion of this issue. All of these features are legal C;
what's legal isn't necessarily good.
\begin{enumerate}
\item Pointer arithmetic. Runaway pointers are a principal source
of hard-to-find bugs in C programs, because the symptom of this happening
can be mangled data structures in a completely different part of the program.
Depending on exactly which objects are allocated on the heap in which
order, pointer bugs can appear and disappear, seemingly at random.
For example, {\tt printf} sometimes allocates memory on the heap,
which can change the addresses returned by all future calls to {\tt new}.
Thus, adding a {\tt printf} can change things so that a pointer
which used to (by happenstance) mangle a critical data structure
(such as the middle of a thread's execution stack), now overwrites memory
that may not even be used.
The best way to avoid runaway pointers is (no surprise) to be
{\em very} careful when using pointers. Instead of iterating
through an array with pointer arithmetic, use a separate index
variable, and assert that the index is never larger than the size
of the array. Optimizing compilers have gotten very good, so that the
generated machine code is likely to be the same in either case.
Even if you don't use pointer arithmetic, it's still easy
(easy is bad in this context!) to have an off-by-one error
that causes your program to step beyond the end of an array.
How do you fix this? Define a class to contain the array
{\em and its length}; before allowing any access to the array,
you can then check whether the access is legal or in error.
\item Casts from integers to pointers and back. Another source
of runaway pointers is that C and C++ allow you to convert
integers to pointers, and back again. Needless to say, using a
random integer value as a pointer is likely to result in unpredictable
symptoms that will be very hard to track down.
In addition, on some 64 bit machines, such as the Alpha, it is
no longer the case that the size of an integer is the same as the
size of a pointer. If you cast between pointers and integers,
you are also writing highly non-portable code.
\item Using bit shift in place of a multiply or divide.
This is a clarity issue. If you are doing arithmetic, use
arithmetic operators; if you are doing bit manipulation,
use bitwise operators. If I am trying to multiply by 8, which is
easier to understand, {\tt x << 3} or {\tt x * 8}? In the 70's,
when C was being developed, the former would yield more efficient
machine code, but today's compilers generate the same code in both
cases, so readability should be your primary concern.
\item Assignment inside conditional. Many programmers have the attitude
that simplicity equals saving as many keystrokes as possible.
The result can be to hide bugs that would otherwise be obvious.
For example:
\begin{verbatim}
if (x = y) {
...
\end{verbatim}
Was the intent really {\tt x == y}? After all, it's pretty easy
to mistakenly leave off the extra equals sign. By never using
assignment within a conditional, you can tell by code inspection
whether you've made a mistake.
\item Using {\tt \#define} when you could use {\tt enum}.
When a variable can hold one of a small number of values,
the original C practice was to use {\tt \#define} to set up
symbolic names for each of the values. {\tt enum} does this
in a type-safe way -- it allows the compiler to verify
that the variable is only assigned one of the enumerated values,
and none other. Again, the advantage is to eliminate a class of
errors from your program, making it quicker to debug.
\end{enumerate}
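The advice under pointer arithmetic, to wrap an array together with its length
and check every access, might be sketched as follows. The class name
{\tt IntArray} and its member functions are hypothetical, and the standard
{\tt assert} stands in for {\tt ASSERT}. A real version would also need to
decide what copying such an object means, which I have ignored here.

```cpp
#include <cassert>

class IntArray {
  public:
    IntArray(int n) {
        assert(n > 0);
        length = n;
        data = new int[length];
        for (int i = 0; i < length; i++)
            data[i] = 0;
    }
    ~IntArray() { delete [] data; }
    bool InRange(int i) { return i >= 0 && i < length; }
    int Get(int i) { assert(InRange(i)); return data[i]; }
    void Set(int i, int value) { assert(InRange(i)); data[i] = value; }
    int Length() { return length; }
  private:
    int length;             // the array always travels with its length
    int *data;
};

// Iterate with an index variable instead of pointer arithmetic.
int SumAll(IntArray *a) {
    int sum = 0;
    for (int i = 0; i < a->Length(); i++)
        sum += a->Get(i);
    return sum;
}
```

An off-by-one error such as {\tt a->Get(a->Length())} now stops the program at
the faulty access, instead of quietly reading whatever happens to follow the
array in memory.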
\newpage
\section{Style Guidelines}
Even if you follow the approach I've outlined above, it is still
as easy to write unreadable and undebuggable code in C++ as it
is in C, and perhaps easier, given the more powerful features the
language provides. For the Nachos project, and in general, we suggest
you adhere to the following guidelines (and tell us if you catch us
breaking them):
\begin{enumerate}
\item Words in a name are separated SmallTalk-style (i.e., capital
letters at the start of each new word). All class names and member
function names begin with a capital letter, except for member
functions of the form {\tt getSomething()} and {\tt setSomething()},
where {\tt Something} is a data element of the class (i.e., accessor
functions). Note that you would want to provide such functions only
when the data should be visible to the outside world, but you want to
force all accesses to go through one function. This is often a good
idea, since you might at some later time decide to compute the data
instead of storing it, for example.
\item All global functions should be capitalized,
except for {\tt main} and library
functions, which are kept lower-case for historical reasons.
\item Minimize the use of global variables. If you find yourself
using a lot of them, try and group some together in a class in a
natural way or pass them as arguments to the functions that need them
if you can.
\item Minimize the use of global functions (as opposed to member
functions). If you write a function that operates on some object,
consider making it a member function of that object.
\item For every class or set of related classes, create a separate
{\tt .h} file and {\tt .cc} file. The {\tt .h} file acts as the {\it
interface} to the class, and the {\tt .cc} file acts as the
{\it implementation} (a given {\tt .cc} file should {\tt \#include} its
respective {\tt .h} file). If using a particular {\tt .h} file requires
another {\tt .h} file to be included (e.g., {\tt synch.h} needs
class definitions from {\tt thread.h}) you should include the dependency
in the {\tt .h} file, so that the user of your class doesn't have to
track down all the dependencies himself.
To protect against multiple inclusion, bracket each {\tt .h}
file with something like:
\begin{verbatim}
#ifndef STACK_H
#define STACK_H
class Stack { ... };
#endif
\end{verbatim}
Sometimes this will not be enough, and you will have a circular
dependency. For example, you might have a {\tt .h} file that
uses a definition from one {\tt .h} file, but also defines something
needed by that {\tt .h} file. In this case, you will have to do
something ad-hoc. One thing to realize is that you don't always
have to completely define a class before it is used. If you
only use a pointer to class {\tt Stack} and do not access any
member functions or data from the class, you can write, in lieu of
including {\tt stack.h}:
\begin{verbatim}
class Stack;
\end{verbatim}
This will tell the compiler all it
needs to know to deal with the pointer. In a few cases this won't work,
and you will have to move stuff around or alter your definitions.
\item Use {\tt ASSERT} statements liberally to check that your program
is behaving properly. An assertion is a condition that if
FALSE signifies that there is a bug in the program;
{\tt ASSERT} tests an expression and aborts if the condition is
false. We used {\tt ASSERT} above in {\tt Stack::Push()} to check
that the stack wasn't full. The idea is to catch errors as early
as possible, when they are easier to locate, instead of waiting until
there is a user-visible symptom of the error (such as a segmentation
fault, after memory has been trashed by a rogue pointer).
Assertions are particularly useful at the beginnings and ends of
procedures, to check that the procedure was called with the right
arguments, and that the procedure did what it is supposed to.
For example, at the beginning of {\tt List::Insert}, you could assert that
the item being inserted isn't already on the list, and at the end of
the procedure, you could assert that the item is now on the list.
If speed is a concern, ASSERTs can be defined to make the check
in the debug version of your program, and to be a no-op in the production
version. But many people run with ASSERTs enabled even in production.
\item Write a module test for every module in your program.
Many programmers have the notion that testing code means running
the entire program on some sample input; if it doesn't crash, that
means it's working, right? Wrong. You have no way of knowing
how much code was exercised for the test. Let me urge you to
be methodical about testing. Before you put a new module
into a bigger system, make sure the module works as advertised
by testing it standalone. If you do this for every module,
then when you put the modules together, instead of {\em hoping}
that everything will work, you will {\em know} it will work.
Perhaps more importantly, module tests provide an opportunity
to find as many bugs as possible in a localized context.
Which is easier: finding a bug in a 100 line program, or in a
10000 line program?
\end{enumerate}
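To make the last two guidelines concrete, here is a small sketch of an
{\tt ASSERT} macro together with a module test. The macro is only one
plausible definition, not necessarily the one you will be given, and the
module under test ({\tt Clamp}) and its test routine ({\tt ClampSelfTest}) are
hypothetical.

```cpp
#include <cstdio>
#include <cstdlib>

// One possible ASSERT: report where the failed check is, then abort.
#define ASSERT(condition) \
    do { \
        if (!(condition)) { \
            fprintf(stderr, "Assertion failed: line %d, file \"%s\"\n", \
                    __LINE__, __FILE__); \
            abort(); \
        } \
    } while (0)

// Tiny module under test (hypothetical): force a value into [lo, hi].
int Clamp(int value, int lo, int hi) {
    ASSERT(lo <= hi);                       // check the arguments on entry
    int result = value;
    if (result < lo)
        result = lo;
    if (result > hi)
        result = hi;
    ASSERT(result >= lo && result <= hi);   // check the result on exit
    return result;
}

// Module test: exercise Clamp standalone, including the edge cases.
void ClampSelfTest() {
    ASSERT(Clamp(5, 0, 10) == 5);           // already in range
    ASSERT(Clamp(-3, 0, 10) == 0);          // below the range
    ASSERT(Clamp(11, 0, 10) == 10);         // above the range
    ASSERT(Clamp(0, 0, 0) == 0);            // range of a single value
}
```

Running {\tt ClampSelfTest()} before {\tt Clamp} goes into a bigger system
takes a few seconds, and any failure points at a specific line in a routine
that is a dozen lines long.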
\section{Compiling and Debugging}
The Makefiles we will give you work only with the GNU version of
make, called ``gmake''. You may want
to put ``alias make gmake'' in your .cshrc file.
You should use {\bf gdb} to debug your program rather than {\bf dbx}.
Dbx doesn't know how to decipher C++ names, so you will see function
names like \verb+Run__9SchedulerP6Thread+.
On the other hand, in GDB (but not DBX) when you do a stack backtrace
when in a forked thread (in homework 1), after printing out the
correct frames at the top of the stack, the debugger will sometimes
go into a loop printing the lower-most frame ({\tt ThreadRoot}), and
you have to type control-C when it says ``more?''. If you understand
assembly language and can fix this, please let me know.
\section{Example: A Stack of Integers}
We've provided the complete, working code for the stack example. You should
read through it and play around with it to make sure you understand
the features of C++ described in this paper.
To compile the examples, type {\tt make all} --
this will compile the simple stack test ({\tt stack.cc}),
the inherited stack test ({\tt inheritstack.cc}), and
the template version of stacks ({\tt templatestack.cc}).
\section{Epilogue}
I've argued in this note that you should avoid using certain C++
and C features. But you're probably thinking I must be leaving
something out -- if someone put the
feature in the language, there must be a good reason, right? I believe that
every programmer should strive to write code whose behavior would be
immediately obvious to a reader;
if you find yourself writing code that would require someone reading the code
to thumb through a manual in order to understand it, you are almost certainly
being way too subtle. There's probably a much simpler and more obvious
way to accomplish the same end. Maybe the code will be a little longer
that way,
but in the real world, it's whether the code works and how simple it is for
someone else to modify, that matters a whole lot more than how many
characters you had to type.
A final thought to remember:
\begin{quote}
``There are two ways of constructing a software design: one way is to
make it so simple that there are {\em obviously} no deficiencies and
the other way is to make it so complicated that there are no {\em
obvious} deficiencies.'' \\ \hbox{} \hfill C. A. R. Hoare, ``The Emperor's
Old Clothes'', CACM Feb. 1981
\end{quote}
\section{Further Reading}
\begin{itemize}
\item[] James Coplien, ``Advanced C++'', Addison-Wesley.
This book is only for experts, but it has some good ideas in it,
so keep it in mind once you've been programming in C++ for a few years.
\item[] James Gosling. ``The Java Language.'' Online at
``http://java.sun.com/'' Java is a safe subset of C++. Its main
application is the safe extension of Web browsers by allowing
you to download Java code as part of clicking on a link to
interpret and display the document. Safety is key here, since
after all, you don't want to click on a Web link and have
it download code that will crash your browser. Java was defined
independently of this document, but interestingly, it enforces a
very similar style (for example, no multiple inheritance and
no operator overloading).
\item[] C.A.R. Hoare, ``The Emperor's Old Clothes.''
{\em Communications of the ACM}, Vol. 24, No. 2, February 1981,
pp. 75-83. Tony Hoare's Turing Award lecture. How do you build
software that really works? Attitude is everything -- you need
a healthy respect for how hard it is to build working software.
It might seem that adding this whiz-bang feature is only
``a small matter of code'', but that's the path to late, buggy
products that don't work.
\item[] Brian Kernighan and Dennis Ritchie, ``The C Programming Language'',
Prentice-Hall. The original C book -- a very easy read. But the
language has evolved since it was first designed, and this book doesn't
describe all of C's newest features. But still the best place for
a beginner to start, even when learning C++.
\item[] Steve Maguire, ``Writing Solid Code'', Microsoft Press.
How to write bug-free software; I think this should be required
reading for all software engineers. This really {\em will} change
your life -- if you don't follow the recommendations in this book,
you'll probably never write code that completely works, and you'll
spend your entire life struggling with hard to find bugs.
There is a better way! Contrary to the programming language types,
this doesn't involve proving the correctness of your programs, whatever
that means. Instead, Maguire has a set of practical engineering
solutions to writing solid code.
\item[] Steve Maguire, ``Debugging the Development Process'', Microsoft Press.
Maguire's follow up book on how to lead an effective team, and
by the way, how to be an effective engineer. Maguire's background is
that he is a turnaround artist for Microsoft -- he gets assigned to
floundering teams, and figures out how to make them effective.
After you've pulled a few all-nighters to get that last bug out
of your course project, you're probably wondering why in heck you're
studying computer science anyway. This book will explain how
to write programs that work, {\em and} still have a life!
\item[] Scott Meyers, ``Effective C++''. This book describes
50 easy ways to make mistakes in C++; if you avoid these, you will
be a lot more likely to write C++ code that works.
\item[] Bjarne Stroustrup, ``The C++ Programming Language'', Addison-Wesley.
This should be the definitive reference manual, but it isn't.
You probably thought I was joking when I said the C++ language was
continually evolving. I bought the second edition of this
book three years ago, and it is already out of date.
Fortunately, it's still OK for the subset of C++ that I use.
\end{itemize}
\end{document}