CS50 Final exam: January 16, 1997
Suggested Solutions


1. Understanding C Constructs (10 pts)

Answer

A

B

SAME
int d1, d2;
if (!(d1 || d2))
   return 0;
int d1, d2;
if (d1 == 0 && d2 == 0)
   return 0;
DIFF
i = 0;
while (s[i] != '\0')
   putchar(s[i++]);
i = 0;
do {
   putchar(s[i++]);
} while (s[i] != '\0');
DIFF
char *null = NULL;
char *null = "";
SAME
char *s = "%s",
     *t = "CS50";
printf(s, t);
printf("%s", "CS50");
SAME
int c1, c2;
c1 = c2 = getchar();
int c1, c2;
c2 = getchar(); c1 = c2;

2. True/False (10 Points)

True/False Statement
F The program counter (PC) in a RISC machine counts the number of instructions that have been executed by a program.
F Machines named mach1.bigco.com and mach2.bigco.com will be located within 500 meters of each other.
F sizeof(int *) == sizeof(int) in all cases.
F The IP address 128.103.50.55 is on a Class A network.
T The hexadecimal constant 0xdeadbeef is an odd number.
F The | operator is the same as the || operator except that it doesn't guarantee "short-circuit" evaluation of its operands.
F It is legal to copy and sell source code that you find on the Internet so long as you change all variable names, all comments, and all printf strings.
F A C program compiled by gcc on course1 will run on fas without recompilation.
F The Java programming language was invented before C++ was.
F strlen("\0") is 1.

3. Short Answer (50 points)

1.
Val binary Decimal
0x56 0101 0110 86
0x2E 0010 1110 46

2. List two different C errors that can cause a segmentation fault or bus error.

3.The following code may produce different output on different systems or with different compilers. Explain why.

        int i = 10;
        printf("%d %d %d", --i, i, ++i);

The order of evaluation of side effects like ++ and -- is not defined in C, so different compilers can validly give different answers.

4. What is the worst-case time complexity, i.e., O(?), of this tree comparison routine as a function of the number of nodes n in the larger tree. Explain why.

int tree_eq(tree_t *t1, tree_t *t2) { if (t1 == NULL || t2 == NULL) return (t1 == t2); else return (t1->val == t2->val && tree_eq(t1->left, t2->left) && tree_eq(t1->right, t2->right)); }

O(n) because it examines each node in a tree at most once.

5. Attorney Gene Nelson talked about the three main legal mechanisms for protecting intellectual property like software. Name two of the

patent, trade secret, copyright

6. Jim Waldo of Javasoft explained how several aspects of the Java language and environnment make it less prone to error than C is. List two of these.

7. (6 points) You are given the source code for a version of the standard library function strchr(s,c), which returns a pointer to the first instance of character c in string s, or NULL if c does not occur. Write 3 of the simplest combinations of s and c values that you could easily trace through by hand that would help you find errors before you compile the code. For each, state very briefly what case you are checking for.

8. A C string is represented as an array of bytes terminated by '\0'. Another possible representation would be to store the string length in the first byte and the data in subsequent bytes. List 1 advantage and 1 disadvantage of this alternate representation.

9. You have to ship one CD-ROM of data (600 megabytes) from the Science Center to another building that is a five minute walk away. You can either send the data over the network at a guaranteed 10 megabits per second or you can carry the CD-ROM to the recipient. Which is faster, and by how much? Ignore errors, overhead and startup time.

Ten megabits per second means 600 megabits per minute. Eight bits per byte, so 8 minutes to transmit 600 megabytes. Walking wins, by 3 minutes. Takes 480 seconds by network, 300 seconds on foot.

10. "Quicksort is an O(n log n) sort, so it is always faster than Selection Sort, which is an O(n^2) sort." List two ways in which this statement is inaccurate or imprecise.

Quicksort is O(n log n) on the average, but if badly implemented or if input is perverse, it can be O(n^2). Selection sort might be faster for small n because it has lower overhead.

11. How many times will the body of this loop be executed, and why?

        int i;
        for (i = 0; i <= 1; i += 0.1) ...

infinite, because i is an integer, and the incremented value is truncated.

12. If you send e-mail to a friend at Princeton from fas, what Internet protocols will be used? She has a PC and an Ethernet connection in her room.

SMTP, TCP, IP, or POP. Telnet is not correct. Neither is sendmail.


4. Debugging (15 points)

This C function is supposed to read non-negative integer values from the keyboard until a negative value is entered, then print the maximum, minimum and count of data values, if any. Unfortunately, it contains 5 serious errors of C or logic. Find the errors and indi cate how you would fix each one. Be brief but specific; writing correct code in place of wrong code is fine.
void minmax(void)
{
        int count, min, max = -1;

        for (count = 0; (val = GetInteger()) > 0; count++) {
            if (val > max)
                max = val;
            else if (count == 0 || val < min)
                min = val;
        }
        if (count > 0)
            fprintf("max %d, min %d, count %d\n",
                    max, min, count);
        return 0;
}


5. Coding Reading (20 points)

The functions m1 and m2 below implement a version of a familiar algorithm. A sample input and call might look like this:
    char *pets = "dogcatratgnu";
    m1(pets, 4, 3);
Hint: look for familiar patterns in the code before trying to step through it.
1)    void m1(char *a, int n, int width) {
2)         char *pi, *pj, *pn;
3)
4)         if (n <= 1) return;
5)         m2(a, a + (rand() % n) * width, width);
6)         pn = a + n*width;
7)         for (pi = a+width, pj = a; pi < pn; pi += width)
8)              if (m3(pi, a, width) <= 0)
9)                         m2(pj += width, pi, width);
10)         m2(a, pj, width);
11)         for (pi = pj-width; pi >= a && m3(pi,pj,width) == 0; )
12)              pi -= width;
13)         m1(a, (pi-a)/width + 1, width);
14)         m1(pj+width, (pn-pj)/width - 1, width);
15)   }
16)
17)   void m2(char *i, char *j, int n) {
18)        char c;
19)        while (n-- > 0) {
20)             c = *i;
21)             *i++ = *j;
22)             *j++ = c;
23)        }
24)   }

A.(3 points) What algorithm does this code implement? quicksort

B. (4 points) What precisely does function m2 do?

swap: interchanges n bytes pointed to by i with n bytes pointed to by j

C.(3 points) What must be the role of the function m3, which is not shown here?

compare function: returns <=0, 0, >0 depending on whether data pointed at by pi is <, ==, > data pointed at by pj.

D. (4 points) What is the purpose of line 5?

picks a random element for partitioning or pivoting, swaps it to front

E. (6 points) What is the purpose of lines 11-12?

pj is the pivot. this decrements pi down past elements that are equal to the pivot value so they won't be included in subsequent partition steps.


6. Implementation (35 points)

A. (12 points) Write a version of the standard library function strncat:
char *strncat(char *s, char *t, int n)
The function strncat(s,t,n) concatenates at most n characters of the string t to the end of the string s, and returns s. The strings s and t are assumed to be non-NULL and s points to enough memory to hold the result. The resulting string contains the charac ters of s, at most n characters of t, and a '\0'. Write strncat. You may use other standard library functions but not Roberts functions.
char *strncat(char *s, char *t, int n)
{
        int i, j;

        for (i = 0; s[i] != '\0'; i++)        /* find end of s */
                ;

        for (j = 0; j < n && t[j] != '\0'; )  /* copy up to n chars of t */
                s[i++] = t[j++];
 
        s[i] = '\0';                          /* terminate s properly */
        return s;
}

B. (12 points) Write a list-append function lappend.

The function lappend(ls1, ls2) appends list ls2 to the end of list ls1 and returns the resulting list. The prototype is List_t *lappend(List_t *ls1, List_t *ls2) and the List_t declaration is

        typedef struct _List {
                int val;
                struct _List *next;
        } List_t;
Write a version of lappend.
List_t *lappend(List_t *ls1, List_t *ls2) 
{
        List_t *c;

        if (ls1 == NULL)                              /* handle empty list */
                return ls2;
        for (c = ls1; c->next != NULL; c = c->next)   /* find end of ls1 */
                ;
        c->next = ls2;                                /* link in ls2 */

        return ls1;
}
C. (11 points) ANT Programming

Suppose you are translating a large C program into ANT assembly language, and you encounter this statement:

	nl = (ch == '\n');
Write a sequence of ANT assembly language instructions that is equivalent to the C statement. The variables nl and ch are integers; assume that they are already declared in the _data_ section. Document your register usage and comment the code clearly but briefly. This should take approximately 10-12 instructions. (There is a brief ANT reference manual on the last page of this exam if you need it; you may tear it off.)


# r2  - ch
# r3  - '\n' character
# r4  - address of "done" label
# r14 - n1
# r15 - 1
#
        lc    r2, $ch
        ld    r2, r2, r0            # r2 = ch
        lc    r3, '\n'
        lc    r15, 1                # assume result is 1
        lc    r4, $done
        beq   r4, r2, r3            # skip if ch == '\n'
        lc    r15, 0                # result is 0
done:   lc    r14, $nl
        st    r15, r14, r0          # nl = ch == '\n'
The code segment needs commenting of registers usage to get full credit.

7. Design (40 points)

Imagine yourself designing Version 1.0 of oops, a simple program for detecting spelling mistakes in a document. A document will be plain ASCII text stored in one or more files; it may be arbitrarily long. The output of oops is a sorted list of words from the document that might be spelling mistakes, each with a count of how many times it occurred in the document.

The oops program uses a dictionary file that contains approximately 100,000 correctly- spelled words, one per line, in alphabetic order. (The dictionary is small enough to store in memory.) An oops user may also optionally give the name of a small file of no more than 25 words that are by definition not spelling mistakes; this "ok-words" file is not sorted. Thus oops might be invoked by a command line like this: oops -k myokwords doc1 doc2 doc3 Version 1.0 of the program defines a spelling mistake to be a word that appears in the document but does not appear in the dictionary or in the ok-words, after capitalization has been removed. Thus the basic program structure is

read the dictionary; read the ok-words if requested

check each word in the document to see if it's in the dictionary or ok-words

print the misspelled words and their counts in alphabetical order

Your task is to design this program precisely enough that one of your classmates could implement the program from your design. You can express your design in any mixture of English and C that is precise enough to explain the processes, algorithms, and data structures you use; you are not constrained to follow the syntactic rules of C.

You should choose algorithms and data structures that are appropriate for the task. They should be reasonably economical of time and space: the average time and space requirements should make sense for the sizes of the various inputs and outputs.

A. Data Structure Design (21 points total): For each of (1) the dictionary, (2) the ok-words, and (3) the misspelled words, briefly answer the following:

(1) Big dictionary (7 points): What data structure, what algorithm, why?

The key design issue is to find a structure that can be quickly (efficiently) searched for words, since that is the primary function, and easily constructed.

Using an array of pointers to characters (as in 800.c) is one of the fastest and easiest solutions to this "static dictionary problem."

The algorithm for constructing the data structure is as per 800.c. The dictionary is opened and the number of words is counted in the first pass through the file. In the second file, each word is read, and strsave is used to fashion a copy of the word in the correct place in the array. Note that the dictionary is already sorted and requires no other processing.

This structure is an appropriate one for this problem since the spell checker is required to lookup each word of the input in the dictionary. Just as in 800.c, this can be accomplished using a binary search, which takes O(log n) time per lookup. Compare this to O(n) for a linked list. Creating a binary tree is tricky since the "naive" approach would result in a degenerate linked list (since the input is sorted). Rebalancing from the file is tricky because we must keep track of the start of each line somehow to avoid having to "Read" into the middle of the file each time we wanted to get the "sub-tree root node."

There are clearly more efficient ways to approach this problem as well. For example, using a double hashing scheme, we can create a large array, hash each word into a bin, and hash all of the words that hit a particular bin into a second, smaller table. This algorithm gives us O(1) search. In fact, it takes only 5 memory reads and a few multiplies to perform one lookup. We did not talk about hash trees, and so this is answer was not expected.

(2) OK-words (7 points): What data structure, what algorithm, why?

There are several approaches to this problem. First, we could use the solution above for the same reasons. However, since the list of words is only 25 elements, we need not count them-- it suffices to just keep a static array of 25 pointers to characters. Keep in mind that we would have to sort the array is we intended on using a binary search as in the first problem.

However, this brings to mind the fact that there are only 25 words. Hence, we could just as easily implement a linked list here, and use a linear search, which would take at most 25 string compares-- a constant added to O(log n) from above.

A quicker approach would be to use a binary tree. If assume that the words in the OK list are in some random (unsorted) order, then the construction of the tree is trivial (via algorithm from lecture), and searches will be < O(log 25), a smaller constant. The critical observation is that the list is small.

(3) Misspelled words (7 points): What data structure, what algorithm, why?

The data structure must support fast inserts, and it would be nice if we could easily print the list out in alphabetical order. We also need to keep a track of the words frequency, so we will create a struct that has a string and an int. We do not know the number of misspelled words, so we must use a dynamic structure.

A good one would be a binary tree. We can assume that the misspelled words will arrive in a random order, and hence we can achieve a O(log m) insert time, where m is the number of misspelled words in the document. An inorder traversal algorithm yields the alphabetized list.

If we assume that the number of misspelled words is on average, very small, then we can use a linked list. However, we must make this assumption clear. Without it, our list could turn into a horrifyingly long structure. Each insert would take O(m) time to insert, and this leads to O(m^2) behavior for the "task" (again, nothing that 1+2+3...+m = m(m+1)/2).

B. Program Design (19 points):

The main function for oops must (either itself or by calling functions) process the document files, looking up words in one or two dictionaries, and recording misspelled words. Specify main and any functions it calls to handle these tasks; be sure to include all significant processing steps. For each function, you should clearly indicate inputs and outputs and briefly describe what steps it does. Assume that the data structures from part A are all global variables, and ignore error conditions like files that can't be opened.

Assume that you are given a function char *getstr(FILE *) that returns the next non-blank string of characters from the input, or NULL if the end of the file is reached. You can also assume that you have suitable lookup and insert functions for the data structures in part A.

To help you, we have written part of main in approximately the style and level of detail that we are looking for. You should expand the part indicated. Again, we are looking for a clear description of a hierarchical decomposition, not detailed C code.

    Dict_t dict                              big dictionary
    OK_t okwords                             ok words
    Misspell_t misspell                      misspelled words

    main(argc, argv)
                read dictionary file into dict data structure
                if ok-words file is provided
                        read ok-words file into okwords data structure
                
                /* your design goes here */

                print misspell data structure in alphabetic order, with counts

This part should not be too long: no matter how you divide things up, it should take no more than 20-25 lines.

Here is a simple approach:

        for each input file argument
                f = fopen(filename)
                while ((w = getstr(f)) != NULL)
                        w1 = remove punctuation(w)
                        if (mistake(w1))
                                insert(w1, misspell)

remove punctuation(w)
        start = first alphabetic
        end = last alphabetic
        return substring (start..end)


mistake(w):
        convert w to lower case
        if lookup(w, dict)
                return 0
        if ok-words were requested and lookup(w, okwords))
                return 0 
        return 1
Note that the string returned by getstr does not take into consideration punctuation marks left at the end of words. Also, we must convert each of the words to all-lower case before we do the lookups in the respective lists. However, note that the word that we insert into the misspelled list is the original word with punctuation removed, but case intact.
shelat@fas