6: String Processing

What We Will Cover


Continuations

Questions from last class?

  • Exam reminder
  • Special guest: Denise Moss
  • What will be printed after the following C++ statements have executed?
    int count = 1;
    while (count <= 3) {
        cout << count << " ";
        count++;
    }
    
    1. 1 2
    2. 1 2 3
    3. 2 3
    4. 1 2 3 4

Homework Questions?

6.1: More About Strings and Characters

Learner Outcomes

At the end of the lesson the student will be able to:

  • Iterate through a string and extract each character
  • Convert characters to digits
  • Use string functions

6.1.1: Strings Versus Characters

  • Remember that a string is a series of characters enclosed in double quotes such as:
    "Hello"  "b"  "3.14159"  "$3.95"  "My name is Ed"
  • We can store text in a variable of type string, like:
    string firstName;             // definition
    firstName = "Edward";         // assignment
    string lastName = "Parrish";  // definition + assignment
    cout << firstName << " " << lastName << endl;
    
  • On the other hand, a character is a single letter, digit or special symbol
  • We enclose characters in single quotes, rather than double quotes, like:
    'a'   'b'   'Z'   '3'   'q'   '$'   '*'
  • To store a single character, we use a variable of type char, such as:
    char letterA = 'A';
    char letterB = 'B';
    
  • Each character is stored as a number, using its value from the ASCII table
  • By declaring a char variable, or by using single quotes, C++ knows to treat the number as a character
  • Thus, when we print a character, we see a letter rather than a number:
    char letter = 'A';
    cout << letter << 'B' << endl;
    
  • As we can see, a string is made up of characters and characters are numerical codes
  • We can use this information to work with characters and strings

String Concatenation and Functions

  • Recall that we can join (concatenate) two strings or a string with a character
    string str = "abc";
    str = str + "1"; // allowed
    str = str + '1'; // allowed
    str = str + 1;   // NO
    str = str + 1.2; // NO
    
  • However, we cannot concatenate a string with a number
  • Because strings are objects, they have member functions
  • Two useful member functions we have studied are length() and substr()
  • length(): Returns the number of characters in a string
    string str = "Hello";
    cout << "The number of characters is " << str.length()
         << ".\n";
    
  • substr(i, n): Returns a substring of length n starting at index i
    string greeting = "Hello, World!\n";
    string sub = greeting.substr(0, 4);
    cout << sub << endl;
    
  • The position numbers in a string start at 0. The index of the last character is always one less than the length of the string
    H  e  l  l  o  ,     W  o  r  l  d  !
    0  1  2  3  4  5  6  7  8  9  10 11 12
  • string w = greeting.substr(7, 5); // w is "World"
    H  e  l  l  o  ,     W  o  r  l  d  !
    0  1  2  3  4  5  6  7  8  9  10 11 12
                         ^^^^^^^^^^^^^

Check Yourself

  1. True or false: strings are a sequence of characters.
  2. True or false: "A" and 'A' are the same.
  3. The following code is wrong because ________.
    cout << "3.14159" * 2;
    
    1. you cannot double PI
    2. 3.14159 is not exact enough to represent PI
    3. strings may be added but not multiplied
    4. "3.14159" is not a number
  4. After the following code executes, it displays ________.
    char ch;
    ch = 'd' - 'a' + 'A';
    cout << ch << endl;
    
    1. 'D'
    2. D
    3. 68
    4. d
  5. When placed after the starter code, which of the following will compile in C++11?
    string str = "abc"; // starter code
    
    1. str = str + "1";
    2. str = str + '1';
    3. str = str + 1;
    4. str = str + 1.2;

6.1.2: Indexing a String

  • The characters of a string are stored in sequence, at positions numbered starting from 0 (zero)

    String character positions

  • We can access any individual character of a string variable using square brackets [ ]
  • The general syntax is:
    stringVariable[index];
    
  • Where:
    • stringVariable: the name of your string variable
    • index: the number of the character position
  • For example:
    string str = "Hello";
    char firstLetter = str[0];
    cout << firstLetter << str[1] << endl;
    
  • The above code displays:
    He
  • Notice that the square bracket notation returns a char data type

Check Yourself

  1. For the following string definition, answer the questions below:
    string str = "C++ Rules!";
    1. The value of str[0] is: ________
    2. The value of str[2] is: ________
    3. The value of str[4] is: ________
    4. The value of str[str.length() - 1] is: ________
  2. True or false: To extract a character from a string use square brackets [].
  3. True or false: To extract a string from a string use the substr() function.

6.1.3: Iterating Strings

  • Recall that member function length() returns the number of characters in a string variable:
    string s = "abcdef";
    unsigned n = s.length();
    
  • Since a string's length is always 0 or a positive number, the length() function returns an unsigned integer type
  • After we know the length, it is easy to iterate through the individual characters of a string using a counting loop:
    cout << "Enter a word: ";
    string msg;
    cin >> msg;
    for (unsigned i = 0; i < msg.length(); i++) {
        cout << "Char[" << i << "]: " << msg[i] << endl;
    }
    

Using unsigned

  • Note the use of unsigned i in the for loop
  • Specifying unsigned assumes unsigned int by default
  • So rather than coding unsigned int we may just code unsigned
  • Recall from lesson 3.1.3 that, on systems with a 32-bit int, unsigned ranges from 0 to 4294967295 rather than -2147483648 to 2147483647 for int
  • The length() function returns an unsigned number because the length of a string is never less than zero
  • If you compare a signed number with an unsigned number, the compiler may issue a warning:

    warning: comparison between signed and unsigned integer expressions

  • By using unsigned as the type for the counting variable in the for loop, we avoid the above warning

Try It: Iterating Strings (4m)

  1. Copy the following program into a text editor, save it as test.cpp, and then compile and run the starter program to make sure you copied it correctly.
    #include <iostream>
    #include <string>
    using namespace std;
    
    int main() {
        // Enter your code here
    
        return 0;
    }
    
  2. Add the code to prompt for and read a word from the user:
    cout << "Enter a word: ";
    string msg;
    cin >> msg;
    
  3. Next add the following for-loop code to the main() function.
    for (unsigned int i = 0; i < msg.length(); i++) {
        cout << i << ": " << msg[i] << endl;
    }
    
  4. Compile and run your code. What do you see when you compile?
  5. Be prepared to answer the following Check Yourself questions when called upon.

Check Yourself

  1. True or false: the length() function of a string returns an unsigned integer.
  2. For the following code, the output the second time through the loop is ________
    string msg = "aeiou";
    for (unsigned i = 0; i < msg.length(); i++) {
        cout << "Char[" << i << "]: " << msg[i] << endl;
    }
  3. True or false: the compiler may give a warning if you compare an unsigned int with a signed int.
  4. Each character in the above loop is printed on its own line because of the ________.

6.1.4: String Input With Spaces

  • We have been using the >> operator to enter data into a string variable:
    string something;
    cout << "Enter something: ";
    cin >> something;
    cout << "You entered: " << something << "END OF OUTPUT\n";
    
  • However, there are some complications
  • >> skips whitespace and stops on encountering more whitespace
  • Thus, we only get a single word for each input variable
  • If a user types in "Hello Mom!", we would only read "Hello" and not " Mom!"
  • This is because cin >> s1 works as follows:
    1. Skips whitespace
    2. Reads non-whitespace characters into the variable
    3. Stops reading when whitespace is found

Input Using getline()

  • To read an entire line we use function getline()
  • Syntax:
    getline(cin, stringVariable);
    
  • Where:
    • stringVariable: the name of the string variable
  • For example:
    string line;
    cout << "Enter a line of input:\n";
    getline(cin, line);
    cout << line << "END OF OUTPUT\n";
    
  • Note that getline() stops reading when it encounters a '\n'

The Problem with Newlines

  • When you press the Enter key, a newline character ('\n') is inserted as part of the input
  • The newline character can cause problems when you mix cin >> with getline()
  • Recall that cin >> s1:
    1. Skips whitespace
    2. Reads non-whitespace characters into the variable
    3. Stops reading when whitespace is found
  • Since whitespace includes newline characters, using cin >> will leave a newline character in the input stream
  • However, getline() just stops reading when it first finds a newline character, so it can read the leftover newline and return an empty string without waiting for new input
  • This can lead to mysterious results in code like the following:
    cout << "Enter your age: ";
    int age;
    cin >> age;
    cout << "Enter your full name: ";
    string name;
    getline(cin, name);
    cout << "Your age: " << age << endl
         << "Your full name: " << name << endl;
    
  • To correct this problem we use cin >> ws just before getline()
    cin >> ws; // clear whitespace from input stream
    
  • We can see how to use this fix in the following example

Example Using cin >> ws

#include <iostream>
#include <string>
using namespace std;

int main() {
    cout << "Enter your age: ";
    int age;
    cin >> age;
    cout << "Enter your full name: ";
    string name;
    cin >> ws; // clear whitespace from buffer
    getline(cin, name);
    cout << "Your age: " << age << endl
         << "Your full name: " << name << endl;
}

Check Yourself

  1. True or false: Using the >> operator with string variables only reads one word at a time.
  2. To read strings containing multiple words use the ________ function.
  3. True or false: before you switch from using the >> operator to using getline(), you must clear the next newline character from the input buffer.
  4. To clear whitespace from the input buffer use: ________.
  5. The following code has a problem.
    1  int age;
    2  string name;
    3  cout << "Enter your age: ";
    4  cin >> age;
    5  cout << "Enter your full name: ";
    6  getline(cin, name);
    
    To fix the problem, we insert the following statement between lines ________.
    cin >> ws;
    
    1. 1 and 2
    2. 2 and 3
    3. 3 and 4
    4. 4 and 5

6.1.5: Processing Text Input

  • Sometimes we need to read input as words and sometimes as lines
  • To input a sequence of words, use the loop:
    string word;
    while (cin >> word) {
       // process word
       cout << word << endl;
    }
    
  • The loop test cin >> word is true as long as the read succeeds, which is the same as testing !cin.fail() after the read (see lesson 5.3.7)
  • To process input one line at a time, use the getline() function
    string line;
    while (getline(cin, line)) {
       // process line
       cout << line << endl;
    }
    
  • getline(cin, line) returns cin, which tests as true as long as a line was read
  • The following example processes text input by counting words
  • When typing input for a loop that reads in the while test, you need to signal the end of input (close the stream) using:
    • Windows: Ctrl+Z
    • Linux, OS X: Ctrl+D
  • Closing the stream acts as a sentinel value for the loop
  • When the next read fails, the loop exits

Example Program that Counts Words in a File

#include <iostream>
#include <string>

using namespace std;

int main() {
    cout << "Enter a phrase followed by the Enter"
         << " key and Ctrl-Z/D.\n";
    string word;
    int count = 0;
    while (cin >> word) {
        count++;
    }
    cout << "Number of words: " << count << endl;

    return 0;
}

Redirection of Input and Output

  • We could use the above program by typing words at the command line
  • However, that quickly gets tedious
  • A better way is to use redirection of input (see textbook page 154)
  • The command line interfaces of most operating systems have a way to link a file to the input of a program
  • The content of the file gets fed into the program as if all the characters had been typed by a user
  • For example, after compiling the above program we type something like the following at the command line:
    ./words < input.txt
    
  • Where input.txt is the text file on which we want to count words
  • You can redirect program output to a file as well using something like:
    ./words > output.txt
    
  • You can combine input and output redirection in one command:
    ./words < input.txt > output.txt
    

Check Yourself

  1. True or false: the following code reads input one word at a time.
    string str;
    while (cin >> str) {
       cout << str << endl;
    }
    
  2. True or false: the following code reads input one line at a time.
    string str;
    while (getline(cin, str)) {
       cout << str << endl;
    }
    
  3. To close the cin input stream use the Ctrl key plus the ________ key.
  4. True or false: most operating systems let you redirect input and output at the command line.

Exercise 6.1: Finding Words (5m)

In this exercise we write code to find words in a text file. Compile and test after each step to verify your work.

Specifications

  1. Copy the following program into a text editor, save it to the home folder of Cygwin or your Terminal window as findword.cpp, and then compile and run the starter program to make sure you copied it correctly.
    #include <iostream>
    #include <string>
    using namespace std;
    
    int main() {
        // Enter your code here
    
        return 0;
    }
    
  2. Inside main(), declare both a string variable named word and an integer variable named count, like:
    string word;
    int count = 0;
    
  3. Add a while loop to read one word at a time from cin, like:
    while (cin >> word) {
        // Add if statements here
    }
    
  4. Inside the while loop write code to add one to the count variable.
    count++;
    
  5. Add two if-statements, one to test for the word "Shazam" and one to test for the word "bogus", reporting the word count where the word was found. For example:
    if (word == "Shazam") {
        cout << "Shazam is word " << count << endl;
    }
    
  6. Test your program by saving the words.txt file into the home folder of Cygwin or your Terminal window.

    words.txt

  7. Run the program from the command line using input redirection:
    ./findword < words.txt
    
  8. Save your program source code to submit to Canvas as part of assignment 6.

Finding Words

Code to process strings in a loop

As time permits, read the following sections and be prepared to answer the Check Yourself questions in the section: 6.1.6: Summary.

6.1.6: Summary

  • A string is a series of characters enclosed in double quotes
  • We can store text in a variable of type string, like:
    string s1 = "Hello Mom!";
  • A character is a single letter, digit or special symbol
  • We can store a single character using a variable of type char, such as:
    char letterA = 'A';
    char letterB = 'B';
    
  • Each character is stored as a number, using its ASCII code
  • The characters of a string are stored in sequence, at positions numbered starting from 0 (zero)

    String character position

  • We can access individual characters of a string using []
  • A string variable is a special type of variable called an object, just like a Turtle
  • Because a string is an object, it has member functions
  • We can iterate through a string using a loop and the length() member function:
    string s = "abcdef";
    for (unsigned i = 0; i < s.length(); i++) {
        cout << "Char[" << i << "]: " << s[i] << endl;
    }
    
  • To read an entire line, you need to use the getline() function:
    getline(cin, line);
  • Sometimes cin >> can leave a '\n' character in the input stream
  • To get around this problem you can use cin >> ws before getline()
    cin >> ws; // clear whitespace from buffer
    

Check Yourself

Answer these questions to check your understanding. You can find more information by following the links after the question.

  1. Strings are enclosed in double quotes. What type of quote marks enclose characters? (6.1.1)
  2. The characters of a string variable can be accessed using what brackets? (6.1.2)
  3. The leftmost character of a string is accessed using which index number? (6.1.2)
  4. To print the following string vertically down the page, what code do you write? (6.1.3)
    string str = "Hi mom!";
  5. To convert the following char variable to a number, what code do you write? (6.1.4)
    char ch = '7';
  6. What is the value of the expression: 'd' - 'a' + 'A'? (6.1.4)
  7. To convert the following string variable to a number, what code do you write? (6.1.4)
    string str = "7";
  8. How many words can you enter with the following code? (6.1.5)
    string something;
    cout << "Enter something: ";
    cin >> something;
    cout << "You entered: " << something << endl;
    
  9. How can you change the previous code to read a string that includes spaces? (6.1.5)
  10. What code can you use to clear newlines and other whitespace from the input stream? (6.1.5)

6.2: Preparing for Midterm 1

Learner Outcomes

At the end of the lesson the student will be able to:

  • Review possible exam questions
  • Know the study aids available for the exam

6.2.1: Reviewing the Exam Topics

  • Remember that we each chose a test topic last class meeting
  • Your homework, due two days before the exam, was to develop at least 5 possible test questions for your selected topic
  • We will break into groups and review the questions

Test Review Exercise (10m)

  • Within your group, review each set of test questions for accuracy and testing efficacy
  • Discuss ways to improve test questions and update postings in Canvas
  • Choose one excellent question to review with the entire class.
  • Write the question on the board.

Information + Practice are like Yin and Yang

6.2.2: Practicing for an Exam

  • One of the purposes of an exam is to give us a chance to review the material we have covered
  • Practicing for an exam is important to doing our best
  • We do not want to wait until the night before to prepare
  • To help us practice we will review some problems during class
  • Treat the practice like homework -- because it is!
  • Anything not completed in class must be completed at home and turned in before the test

Midterm 1 Review Supplements

6.2.3: Summary of Test Preparation Materials

  • To help your understanding of how the midterm will operate, I have provided a practice exam in Canvas
  • The questions are intended to help you get a "feel" for taking an exam in Canvas
  • The questions are NOT intended to tell you everything that is on the exam
  • Suggestion: prepare first and then try the practice exam
  • This will give you a better understanding of how much more preparation you need

Summary of Test Preparation Resources

For a complete list see lesson 5.4.3: Test Preparation Resources.

  • Practice midterm
  • Review worksheet turned in as part of A5-Midterm 1 Preparation
  • Reviewing student-prepared study questions (in Canvas)
  • Reviewing lecture notes
  • Reviewing programming projects
  • Reviewing CodeLab questions

Wrap Up

Due Next:
A5-Midterm 1 Preparation (10/4/18)
Quiz 6 (Canvas) (10/9/18)
A6-Loopy Programs (10/11/18)
  • When class is over, please shut down your computer
  • You may complete unfinished exercises at the end of the class or at any time before the next class.
Last Updated: November 16 2018 @18:04:32