5. Repetition
Q-Tip: You on point, Phife?
Phife Dawg: All the time, Tip.
Q-Tip: You on point, Phife?
Phife Dawg: All the time, Tip.
Q-Tip: You on point, Phife?
Phife Dawg: All the time, Tip.
Q-Tip: Well, then grab the microphone and let your words rip.
5.1. Problem: DNA searching
The world of bioinformatics is the intersection between biology and computer science. Sequencing genomes, determining the function of specific genes, the analysis and prediction of protein structures, and biomedical imaging are just a few of the areas under the umbrella of bioinformatics. Much fascinating research is being done in this area as biologists become better programmers and computer scientists apply their techniques to biology.
Because of its fundamental importance and the incredible amount of information involved, with tens or hundreds of millions of base pairs of DNA in each human chromosome, DNA is a central focus of bioinformatics. As you may know, a DNA strand is made up of a sequence of four nucleotide bases: adenine, cytosine, guanine, and thymine. These bases are usually abbreviated as A, C, G, and T, respectively.
Searching for a specific DNA subsequence within a larger sequence is a common task for biologists to perform. Your goal is to write a program that will search for a subsequence and report how many times it was found within the sequence. For example, if you are given the sequence ATTAGACCATATA and asked to search for CAT, your program should output 1, since there is exactly 1 occurrence of CAT within ATTAGACCATATA. One feature of this problem that makes it more interesting is that occurrences can overlap. For example, given the sequence TATTATTAGATTA and asked to search for TATTA, the correct answer is 2. The sequence begins with a TATTA, but the third T in the sequence is also the first T in a second instance of TATTA.
In Chapter 4, you learned tools that allow you to do
comparisons and make choices based on the results. These tools will
become even more useful. For example, when you come across a char
in a
sequence, you know how to compare it to a char
in the subsequence you
are searching for. The tools you do not yet have are those that allow
repetition. Because this problem requires the program to process a DNA
sequence of arbitrary length, we will need some way to perform an
action repeatedly.
5.2. Concepts: Repetition
You now know how to write choices into a Java program, but so far, each choice can only be made once. So if you want the computer to do a lot of things, you have to type a lot of things. One of the big disadvantages of computers is that they have no intelligence: They can follow instructions blindly, but they can’t do anything else. One of the big advantages of computers is that they’re fast. Modern computers can perform mathematical operations billions of times faster than human beings. To take advantage of this speed, we need to give computers instructions to perform tasks over and over. Such an instruction must have two components to be useful: It must have a way to change the task slightly each time so that each task accomplishes something different. It must also have a way to decide when to stop, otherwise it will continue forever.
The first component is the more subtle one. Crafting a set of instructions so that each repetition of the task is appropriate will be different for every problem. The second component is easier to describe: We’re going to rely on Boolean logic, just as we did for conditional statements. The main tool for repetition in Java and many other languages is called a loop. The body of a loop contains the task to be performed. The rest of the loop, at the very least, contains a condition. Every time the task given in the body of the loop completes, the computer will check the condition. If the condition is true, it’ll do the task again. If the condition is false, the computer is done with the loop and can move on to the code that comes afterward.
One of the difficulties of programming a computer is that we must be very explicit. Even the most obvious tasks must be spelled out in meticulous detail. Let’s consider a simple task, one that we perform every day. If we’re in a room and we want to leave, we simply walk out the nearest door. Assuming there is only one door in the room, how can we describe this process by breaking it down into the steps we (literally) take? Perhaps we could say the following.
Walk toward the door until you reach it.
This statement is a little more specific than Leave the room, but it doesn’t conform nicely to the paradigm of a loop, that is, a clearly separated task and a condition. The following is better.
While you’re not at the door,
take a step toward the door.
Now we have good separation between the work done and the condition for repeating. What’s the task performed in the body of this loop, and what’s the condition? The task is taking a step toward the door. The condition is not being at the door. It seems a little awkward to include that “not,” but in our definition of loops, the body is executed as long as the condition is true.
In a loop, we call each execution of the body of the loop an iteration. When we say that a program iterates over the statements in a loop, we are referring to a single pass through the body of a loop. In this case, the loop will iterate however many times there are steps to the door from the starting position. It is even possible that the loop will iterate zero times: The person following this set of instructions might already be at the door!
It’s hard to get away from numbers in a computer program, especially since everything is fundamentally stored as numbers inside of a computer. So, the most common kind of loop is one that iterates a fixed number of times. For example, your morning exercise routine might include jumping rope 100 times. We could formulate a loop to do that like so.
Set your counter to 0.
While your counter is less than 100,
jump rope.
Increase your counter by 1.
This loop requires set up to start the counter at the right value. Then, the work done by the loop is the actual rope jumping and the counter increment. The condition is the counter being less than 100. Note that this is strictly less than 100. After the first jump, the counter will be incremented to 1. After the 100th jump, the counter will be incremented to 100. Since 100 is not less than 100, the loop will exit. If the condition was the counter being less than or equal to 100, the person following the instructions would jump 101 times.
Input can also be a factor in loop repetitions. For example, you might
be a soldier training in the U.S. Marine Corps. Perhaps your drill
sergeant has commanded you to do push-ups until he says you can stop. We
might formulate a loop to do this as follows.
Do:
Push-up.
Ask the drill sergeant if you can stop.
While the answer is “no.”
As is the case with user input, you must often go into the loop at least once to get the input. This loop requires the soldier to do at least one push-up before asking to stop. Some systems might use input but have other constraints. A more realistic version of this loop might be the following.
Do:
Push-up.
Ask the drill sergeant if you can stop.
While the answer is “no” and you haven’t collapsed.
Remember, the condition for a loop should be a Boolean, and the loop runs as long as the condition is true. However, there is no reason why the Boolean can’t be a complicated expression using all the Boolean logic we have come to know and love.
It’s also possible to nest loops. Nesting loops means putting one loop inside of another, similar to the way that conditional statements could be nested inside of other conditional statements. Just like any other statement, an inner loop will be run as many times as the outer loop runs. Of course, the statements inside of the inner loop will be run according to the conditions of that loop. So, if an outer loop runs 10 times and an inner loop runs 50 times, a statement in the body of an inner loop would run 500 times!
As an example, if you’re working out, you might do several sets of
bench presses with a fixed number of reps in each set. If you did 3 sets
of 15 bench presses each, your workout program might look like this:
Set your set counter to 0.
While your set counter is less than 3,
set your rep counter to 0.
While your rep counter is less than 15,
do a bench press.
Increase your rep counter by 1.
Rest for 2 minutes.
Increase your set counter.
This way of describing the workout program seems tedious. Most of the description is structural: conditions for the loops and increments for the counters. The only “real” activities are the bench press and the resting. As you can see, the bench press is inside the inner rep loop and will be executed 15 times for each complete execution of the inner rep loop. Since the inner rep loop sits inside the outer set loop, it’ll be executed 3 times, giving a grand total of 45 bench presses. Resting, however, is after the inner rep loop but still contained in the outer set loop and will be executed 3 times, totaling 6 minutes of rest.
As with conditionals, writing out loops in English is tedious and imprecise. In the next section, we’ll discuss the tools for writing loops in Java. Because Java was designed with loops as a central tool, we can write loops more succinctly than in English, squeezing a lot of information into a small space. Because we pack so much information into them, loops can look daunting at first. Remember that the syntax we’ll introduce is only the formal Java way of expressing a condition and a list of instructions to execute repeatedly.
5.3. Syntax: Loops in Java
The Java programming language contains three differently named kinds of
loops: while
loops, for
loops, and do
-while
loops. All of them
allow you to write code that will be executed repeatedly. In fact, any
program that uses one kind of loop to solve a problem could be
converted to use either of the other two kinds. The three kinds are
provided in Java partly so that it’s easy to code certain typical forms
of repetition and partly because the C language, an ancestor of Java,
contained these three. We’ll begin by describing while
loops because
they have the simplest form and then move on to the other two kinds. We’ll
then explain the syntax for nesting together multiple loops and
finally discuss several of the common pitfalls encountered by
programmers when coding loops.
5.3.1. while
loops
Superficially, the syntax of a while
loop resembles an if
statement.
It starts with the keyword while
followed by a boolean
condition in
parentheses with a block of code surrounded by braces ({ }
)
afterward. This similarity is not accidental. The only difference
between the two is that the body of the if
statement will run a only
single time, while the body of the while
loop will run as long as the
condition remains true
. Figure 5.1 shows the pattern of
execution for while
loops.
true
, all of the statements in the body of the loop are executed, and then the condition is checked again. When the check is false
, execution skips past the body of the loop.If we assume that the boolean
value atDoor
says whether or not we
have reached the door and the method walkTowardsDoor()
allows us to
take one step closer to the door, we could formulate our example from
the beginning of the previous section as follows.
while (!atDoor) {
atDoor = walkTowardsDoor();
}
Here we assume that the walkTowardsDoor()
method gives back a
boolean
value that is true
if we have reached the door and false
otherwise. Unless the walkTowardsDoor()
method is able to change the
value of atDoor
, the loop will repeat forever, a phenomenon known as
an infinite loop.
while (true) {
System.out.println("Help me!");
}
This code gives an example of an infinite loop. If you run this code inside
of a program, it’ll print out an endless succession of Help me!
messages. Be prepared to stop the program by typing Ctrl+C
(hold down
the Control
key and press C
) because it won’t end otherwise. Not
all infinite loops are this obvious. A programmer will not usually use
true
as the condition of a loop, but doing so is not always wrong.
Some loops are expected to continue for quite some time with no definite
end. To leave a loop abruptly, you can use the break
command.
while (true) {
System.out.println("Help me!");
break;
}
This loop will only print out a single Help me!
before exiting. A
break
command can be used with an if
statement to make a loop that
repeats more than once.
int counter = 0;
while (true) {
System.out.print("the loop ");
counter++;
if(counter >= 3)
break;
}
System.out.println("is on fire!");
This loop will print out the loop the loop the loop is on fire!
Of
course, the break
statement unnecessarily complicates the code. We
could have written equivalent code as follows.
int counter = 0;
while (counter < 3) {
System.out.print("the loop ");
counter++;
}
System.out.println("is on fire!");
Now, we move on to a more complicated example that can print out the binary representation of a number.
As we discussed in Chapter 1, binary numbers
are the building blocks of every piece of data inside a modern
computer’s memory. Integers are stored in binary. The representation of
floating-point numbers is more complicated, but it also uses 1s and 0s.
Even the char
data type and the String
values built from them are
fundamentally stored as binary numbers. For this reason, computer
scientists tend to be familiar with the base 2 number system and how to
convert between it and base 10, our usual number system.
In base 10, the number 379 is equal to 3 · 100 + 7 · 10 + 9 · 1 = 3 · 102 + 7 · 101 + 9 · 100. Moving from right to left, the value of each place increases by a factor of 10. A binary number is the same, except that the increase is by a factor of 2 and no single digit is greater than 1. Thus, the number 1010112 = 1 · 25 + 0 · 24 + 1 · 23 + 0 · 22 + 1 · 2 + 1 · 20 = 1 · 32 + 0 · 16 + 1 · 8 + 0 · 4 + 1 · 2 + 1 · 0 = 43. In binary, the number 379 = 1011110112.
To convert a number n to binary, we first find the largest power of 2 that is not larger than n. Then, we begin a repetitive process that stops when the power of 2 under consideration is 0. If 2 raised to the current power is bigger than n, we print out a 0 because that power is too big for n. Otherwise, we print out a 1, subtract 2 raised to that power from n, and move on to the next smaller power of 2. This process will print a 0 for every power of 2 that’s not in n and a 1 for every one that is, giving exactly the definition of a number written in base 2.
import java.util.*;
public class DecimalToBinary {
public static void main(String[] args) {
Scanner in = new Scanner(System.in);
System.out.print("Please enter a base 10 number: ");
int number = in.nextInt();
int power = 1;
while(power <= number/2)
power *= 2;
while(power > 0) {
if(power > number)
System.out.print(0);
else {
System.out.print(1);
number -= power;
}
power /= 2;
}
}
}
The first while
loop in this program doubles the value of power
until doubling it again would make it larger than number
. We go up to
and including number/2
, otherwise we’d stop when power
was
larger than number
. After that loop, we begin repeatedly checking to
see if a given power of 2 is bigger than the value left in number
. If
it is, we know that we do not use that power. If it’s not, we do and
must remove that power from the value of number
.
You may have been tempted to solve this problem by determining if a
given number is even or odd. If it’s even, then you record a 0, and if
it’s odd, then you record a 1. You could then divide the number by two
and repeat the process of determining whether it is even or odd. You
would continue this process until the number became 0. This procedure
requires only a single while
loop and would give the digits of the
number in base 2. Unfortunately, you would get the digits in reverse
order. Because we write our numbers with the most significant digit on
the left, we had to use the code given above to first find the largest
value and work backward, in order to determine the binary digits in the
correct sequence.
5.3.2. for
loops
Let’s return to our code that prints out
the loop the loop the loop is on fire!
int counter = 0;
while (counter < 3) {
System.out.print("the loop ");
counter++;
}
System.out.println("is on fire!");
This code involves some initialization, a condition, and an update, as
many loops do. The initialization sets counter
to 0
, the condition
checks to make sure that counter
is less than 3
, and the update
increments counter
by 1 every iteration of the loop. These three
elements are so common that a special kind of loop called the for
loop
was designed with them explicitly in mind. Most for
loops are
dependent on a single counting variable. To make the loop easy to read,
the initialization, condition, and update, all of which relate to this
variable, are pulled into the header of the loop. We could code the
previous while
loop example more cleanly, using a for
loop, as
follows.
for(int i = 0; i < 3; i++) {
System.out.print("the loop ");
}
System.out.println("is on fire!");
The header of a for
loop consists of those three parts: the
initialization, the condition, and the update, all separated by
semicolons. Figure 5.2 shows the pattern of execution for
for
loops.
true
, all of the statements in the body of the loop are executed, followed by the increment step. Then the condition is checked again. When the check is false
, execution skips past the body of the loop.You may have noticed that we changed the variable name used within
the loop from counter
to i
. Doing so doesn’t change the function of
the code. We did so because using the variables i
, j
, and sometimes
k
is a very common practice with for
loops. By using variables named
like this, we are indicating that the variable is just a dummy counter
that we are using to make the loop work, not some variable with a
grander purpose. Also, with three uses of a single variable in the
header of a for
loop, a long variable name takes up a lot of space.
for
loops are used in Java programs more than the other two loops.
They work well when you know how many times you want to iterate through
the loop, which you often do. You can think of the first part of the
for
loop header as the starting point, the second part as the ending
point, and the third part as how you get from the start to the end. Many
beginning programmers get stuck on the idea that every for
loop starts
with int i = 0
and ends with i++
. While this pattern is often true,
there are many other ways to use a for loop. For example, we could print
the powers of 2 that are less than 1000.
for(int i = 1; i < 1000; i *= 2) {
System.out.println(i);
}
This segment of code prints out 1
, 2
, 4
, 8
, 16
, 32
, 64
,
128
, 256
, and 512
on separate lines, which are the powers of
2 from 20 up to 29.
Both of the examples of for
loops we
have given have only had a single executable line in the body of the
loop. Like if
statements, loops only require braces if their bodies
have more than one executable line. Many of the while
loops from the
previous subsection could have been written without braces.
Just because a for
loop already has a counting mechanism doesn’t mean
that we won’t need other variables to perform useful tasks. For
example, given a String
, we could try to find the letter of the
alphabet in the String
which is closest to the end of the alphabet.
For the String
"Pluto is no longer a planet"
, the latest letter in
the alphabet is 'u'
. To write code that will do this job, we must use
the counting variable from the for
loop as an index into the
String
. Then, we must also have a temporary variable where we keep the
latest letter found so far. To get the ith char
from a
String
, we can use the charAt()
method. Recall that the index of the first
char
in a String
is 0, and the index of the last char
is one less
than the length of the String
.
String s = "The quick brown fox jumps over the lazy dog.";
String lower = s.toLowerCase();
char latest = ' ';
char c;
for(int i = 0; i < lower.length(); i++) {
c = lower.charAt(i);
if(c >= 'a' && c <= 'z' && c > latest)
latest = c;
}
System.out.println("The latest character in the alphabet from your message is: '"
+ latest + "'.");
The first thing we do in this example is convert s
to lowercase, so
that we are comparing all char
values in the same case. Next, we run
through lower
, starting at index 0 and going until we reach the end of
the String
. For each char
, we check to see if it is an alphabetic
character and then if it is later in the alphabet than our current
latest. If it is, we store it into latest
. After the loop, we print
out the value in latest
. We have chosen the char
' '
because it is
numerically earlier than all the letters in the alphabet. If the output
is a space, we’d know that none of the characters in s
were
alphabetic.
For the example given, the latest character in the alphabet is 'z'
because of the word "lazy"
. One weakness in this code is that it will
always search through the entire String
, even if the letter 'z'
has
already been found. For the String
"The quick brown fox jumps over the lazy dog."
, we’re not wasting too
much time. However, if the String
were "Zanzibar!"
followed by the
full text of War and Peace, we’d be wasting thousands and
thousands of operations reading characters when we knew that 'z'
was
going to be the latest letter, no matter what. We can rewrite our
for
loop so that it quits early if it reaches a 'z'
.
for(int i = 0; i < lower.length(); i++) {
c = lower.charAt(i);
if(c >= 'a' && c <= 'z' && c > latest)
latest = c;
if(latest == 'z')
break;
}
This version of the for
loop will break out immediately if the latest
is already a 'z'
. This code will work efficiently, but many
professional programmers discourage the use of break
except when
absolutely necessary (like in a switch
statement). If a break
is
used to exit the loop, this logic can be encoded into the condition of
the loop. Thus, the same loop written with better style would be the
following.
for(int i = 0; i < lower.length() && latest != 'z'; i++) {
c = lower.charAt(i);
if(c >= 'a' && c <= 'z' && c > latest)
latest = c;
}
For this final version of the loop, we have made the conditional portion
of the header more complex. The comparison using <
gives a boolean
that we combine using &&
with the boolean
from the comparison
using !=
. As always, remember that the loop will continue iterating as
long as the condition is true
. Since we need both parts of the
condition to be true
to continue executing, we use the &&
operator
to connect them.
We apologize to international readers for focusing on the Latin alphabet
used by English and many other Western European languages. It should be
possible to make a localized version of this example with any alphabet
by checking the return value of Character.isLetter(c)
, which is valid
for all single-character Unicode values, although the idea of
alphabetical order doesn’t really apply to some character systems like
the hanzi and kanji of Chinese and Japanese. Regardless, using the
Character.isLetter()
method is recommended for almost all
applications, since it’s more general and more readable.
Prime numbers are integers greater than 1 whose only factors are 1 and themselves. If you’ve encountered prime numbers before, they probably seemed like a mathematical curiosity and nothing more. In fact, prime numbers are the basis of a very practical application of mathematics: cryptography. With the use of some math and very large prime numbers, computer scientists have devised techniques that make messages sent over the Internet more secure.
These techniques are beyond the scope of this book, but we can at least write some code to determine if a number n is prime. To do so, we can simply divide n by all the numbers between 2 and n - 1. If none of the numbers divide it evenly, it must be prime. Here is this basic solution.
import java.util.*;
public class PrimalityTester0 {
public static void main(String[] args) {
Scanner in = new Scanner(System.in);
System.out.print("Please enter a number: ");
long number = in.nextLong();
boolean prime = true;
for(long i = 2; i < number && prime; i++)
if(number % i == 0)
prime = false;
if(prime)
System.out.println(number + " is prime.");
else
System.out.println(number + " is not prime.");
}
}
This program has a for
loop that runs from 2
up to number - 1
,
provided that we don’t find a number that evenly divides number
. This
optimization means that the program will output the moment that it knows
that the number is not prime, but we’ll have to wait for it to
check all the other possibilities before it is sure that the number is
prime.
One insight that we can use to make the program more efficient is that, after checking 2, we don’t have to divide it by any even numbers. So, we can do half the checking with a few simple modifications.
import java.util.*;
public class PrimalityTester1 {
public static void main(String[] args) {
Scanner in = new Scanner(System.in);
System.out.print("Please enter a number: ");
long number = in.nextLong();
boolean prime = number == 2 || number % 2 != 0;
for(long i = 3; i < number && prime; i += 2 )
if(number % i == 0)
prime = false;
if(prime)
System.out.println(number + " is prime.");
else
System.out.println(number + " is not prime.");
}
}
This version of the program sets the boolean
variable prime
to
false
if number
is divisible by 2 (unless it’s 2 itself) and true
otherwise. Then, it
starts the search at 3 and continues in jumps of 2. Although we save half the checks,
we can do much better. Note that if a number n is divisible by 2, then it’s also divisible by
n/2. So, if a number is not divisible by 2, it’s also not divisible by any
number larger than n/2. If it’s not divisible by 2 or 3, then it’s also not divisible by any number
larger than n/3. If it’s not divisible by 2 or 3
or 4, it’s not divisible by any number larger than
n/4, and so on. Thus, we don’t have to check all
the way up to n - 1. If we’re checking to see if
n is divisible by x and learning that
n is not divisible by anything larger than
n/x, the point where x = n/x
is as follows.
Thus, we only need to search up to the square root of n, which will save much more time.
import java.util.*;
public class PrimalityTester2 {
public static void main(String[] args) {
Scanner in = new Scanner(System.in);
System.out.print("Please enter a number: ");
long number = in.nextLong();
boolean prime = number == 2 || number % 2 != 0;
long root = (long)Math.sqrt(number);
for(long i = 3; i <= root && prime; i += 2 )
if(number % i == 0)
prime = false;
if(prime)
System.out.println(number + " is prime.");
else
System.out.println(number + " is not prime.");
}
}
Note in this version of the program we do go up to and including root
,
because there’s the possibility that number
is a perfect square.
DNA is usually double stranded, with each base paired to another specific base, called its complementary base. The following table shows the association between each base and its complementary base.
Base | Abbreviation | Complementary Base |
---|---|---|
Adenine |
A |
T |
Cytosine |
C |
G |
Guanine |
G |
C |
Thymine |
T |
A |
A simple but common task is finding the reverse complement of a DNA sequence. The reverse complement of a DNA sequence is its sequence of complementary bases given in reverse order. For example, the reverse complement of ACATGAG is CTCATGT. This sequence is found by first finding the complement of ACATGAG, which is TGTACTC, and then reversing its order.
We’ll write a program that finds the reverse complement of a DNA
sequence entered by a user. This sequence will be entered as a sequence
of characters made up of the four abbreviations for the bases: A, C, G,
and T. We’ll store this sequence as a String
and perform some
manipulations on it to get the reverse complement.
import java.util.*;
public class ReverseComplement {
public static void main(String[] args) {
Scanner in = new Scanner(System.in);
System.out.print("Please enter a DNA sequence: ");
String sequence = in.next().toUpperCase();
String complement = "";
for (int i = 0; i < sequence.length(); i++) (1)
switch (sequence.charAt(i)) { // Get complements
case 'A': complement += "T"; break;
case 'C': complement += "G"; break;
case 'G': complement += "C"; break;
case 'T': complement += "A"; break;
}
String reverseComplement = "";
// Reverse the complement
for (int i = complement.length() - 1; i >= 0; i--) (2)
reverseComplement += complement.charAt(i);
System.out.println("Reverse complement: " + reverseComplement);
}
}
1 | This example first creates a String filled with the complement of the
base pairs from the input String . |
2 | Then, it creates a new String that is the reverse of the complement
sequence. |
Note how
complement
is created by appending the char
corresponding to the
complementary base at the end of complement
. If we inserted each
char
at the beginning of complement
, we wouldn’t need to reverse in
a separate step.
import java.util.*;
public class CleverReverseComplement {
public static void main(String[] args) {
Scanner in = new Scanner(System.in);
System.out.print("Please enter a DNA sequence: ");
String sequence = in.next().toUpperCase();
String reverseComplement = "";
for (int i = 0; i < sequence.length(); i++)
switch (sequence.charAt(i)) { // Get complements
case 'A': reverseComplement = "T" + reverseComplement; break;
case 'C': reverseComplement = "G" + reverseComplement; break;
case 'G': reverseComplement = "C" + reverseComplement; break;
case 'T': reverseComplement = "A" + reverseComplement; break;
}
System.out.println("Reverse complement: " + reverseComplement);
}
}
Since String
values are immutable, they can never be changed. Thus, the +=
operator doesn’t change the String
; instead, it creates a new String
with
the new information concatenated onto it. Unfortunately, it’s inefficient to do
so if there are many repeated concatenations. DNA sequences could contain
billions of bases, and building up the reverse complement of a sequence that
long would generate billions of new String
objects, putting a strain on memory
management. In situations where repetitive String
concatenation occurs, it’s
more efficient to use a StringBuilder
object, which is similar to a String
but mutable. StringBuilder
objects have an append()
method that will insert
data at the end of the current text representation. They also have an insert()
method that will insert data at the given index. Once the text has been built,
calling the toString()
method on the StringBuilder
object will return the
String
object that contains the text represented inside the StringBuilder
.
The following code is a version of the clever reverse complement using a
StringBuilder
instead of String
concatenation.
StringBuilder
.import java.util.*;
public class StringBuilderReverseComplement {
public static void main(String[] args) {
Scanner in = new Scanner(System.in);
System.out.print("Please enter a DNA sequence: ");
String sequence = in.next().toUpperCase();
StringBuilder reverseComplement = new StringBuilder();
for (int i = 0; i < sequence.length(); i++)
switch (sequence.charAt(i)) { // Get complements
case 'A': reverseComplement.insert(0, "T"); break;
case 'C': reverseComplement.insert(0, "G"); break;
case 'G': reverseComplement.insert(0, "C"); break;
case 'T': reverseComplement.insert(0, "A"); break;
}
System.out.println("Reverse complement: " + reverseComplement.toString());
}
}
Some IDEs will prompt you to automatically convert String
concatenation inside
of a loop into equivalent StringBuilder
code.
5.3.3. do
-while
loops
Use this rule of thumb for deciding which kind of loop to use: If you
know how many times you want the loop to execute, use a for
loop. If
you don’t know how many times you want it to execute, use a while
loop. Clearly, this rule is not iron-clad. In the previous example, we
used a for
loop even though it would stop executing as soon as a 'z'
was encountered. Nevertheless, it seems like we have covered all of the
possible situations with while
and for
loops. When should we use
do
-while
loops? The simple answer is: never.
You never have to use a do
-while
loop. With a little bit of effort,
you could use a single kind of loop for every job. The key difference between
a do
-while
loop and a regular while
loop is that a do
-while
loop
will always run at least once. Neither of the other two loops give you
that guarantee. The syntax for a do
-while
loop is a do
at the top of
a loop body enclosed in braces, with a normal while
header at the end,
including a condition in parentheses, followed by a semicolon.
Figure 5.3 shows the pattern of execution for do
-while
loops.
false
, execution skips past the body of the loop. A do
-while
loop is guaranteed to run at least once.We can use a do
-while
loop to print out the first 10 perfect squares
as follows.
int x = 1;
do {
System.out.println(x*x);
x++;
} while (x <= 10);
This loop behaves exactly the same as the following loop.
int x = 1;
while (x <= 10) {
System.out.println(x*x);
x++;
}
The time when a do
-while
loop is really going to shine is when your
program will work incorrectly if the loop doesn’t run at least once.
This situation often occurs with input, when the loop must run at least
once before checking the condition. For example, imagine that you want
to write a program that picks a random number between 1 and 100 and lets
the user guess what it is until the user gets it right. You need a loop
because it’s a repetitive activity, but you need to let the user guess
at least once so that you can check if he or she was right. The
following program fragment does exactly that.
Scanner in = new Scanner(System.in);
Random random = new Random();
int guess;
int number = random.nextInt(100) + 1;
do {
System.out.print("What's your guess? ");
guess = in.nextInt();
} while (guess != number);
System.out.println("You got it! The number was " + number + ".");
You could perform the same function with a while
loop, but you’d
need to get input from the user before the loop starts. Using the
do
-while
loop is a little more elegant.
5.3.4. Nested loops
Just as you can nest if
statements, it’s possible to nest loops inside of other
loops. In the simplest case, you may have some repetitive activity that
itself needs to be performed several times. For example, when you were
younger, you probably had to learn your multiplication tables. For each
number, a multiplication table gave the value of the product of that
number by every integer between 1 and 12. We can write code to print out
out the multiplication table for every number from 1 to 10 by simply
repeating the process.
for(int number = 1; number <= 10; number++) {
for(int factor = 1; factor <= 12; factor++) {
System.out.println(number + " x " + factor +
" = " + number*factor);
}
System.out.println();
}
The outer loop incrementing number
runs 10 times. The inner loop
incrementing factor
will run 12 times for each iteration of the outer
loop. Thus, the code in the inner loop will run a total of 120 times.
Every 12 iterations, the inner loop will stop, and an extra blank line
will be added by the System.out.println()
method in the outer loop.
The sequence consisting of 1, 3, 6, 10, 15, and so on is known as the triangular numbers. The ith triangular number is the sum of the first i integers. They are called triangular numbers because they can be drawn as equilateral triangles in a very natural way, if you use a number of dots equal to the number.
We can use nested loops to print out the first n triangular numbers, where n is specified by the user.
import java.util.*;
public class TriangularNumbers {
public static void main(String[] args) {
Scanner in = new Scanner(System.in);
System.out.print("How many triangular numbers? ");
int n = in.nextInt();
int sum;
for(int i = 1; i <= n; i++) {
sum = 0;
for(int j = 1; j <= i; j++)
sum += j;
System.out.println(sum);
}
}
}
As you can see, the outer loop iterates through each of the n
different triangular numbers. Then, the inner loop does the summation
needed to compute the given triangular number. However, producing a sequence of
triangular numbers this way is inefficient. Nested loops are an effective way to solve many problems,
particularly certain types of problems using arrays, but we can generate
triangular numbers using only a single for
loop. The key insight is
that we can keep track of the previous triangular number and add
i to it, as i increases.
import java.util.*;
public class CleverTriangularNumbers {
public static void main(String[] args) {
Scanner in = new Scanner(System.in);
System.out.print("How many triangular numbers? ");
int n = in.nextInt();
int triangular = 0;
for(int i = 1; i <= n; i++) {
triangular += i;
System.out.println(triangular);
}
}
}
By removing the inner for
loop, the total amount of work needed is
greatly reduced.
5.3.5. Common pitfalls
With great power comes great responsibility. The power to repeat things a large number of times means that we can also repeat our mistakes a large number of times. Many classic bugs occur as a result of logical or typographical errors in loops. We list a few of the most common below.
Pitfall: Infinite loops
It’s possible to create a loop that never terminates. Your program may take a long time to finish, but if it takes much longer than you expect, an infinite loop might be the culprit. Infinite loops might occur because you forgot to include an appropriate statement to advance a counter.
This code is presumably intended to print out the first 100 integers,
but there’s no code that increases the value of
One might expect this code to print out 20 lines of output. However,
remember that |
Pitfall: Almost infinite loops
Many loops are truly infinite; others take a really long time. For example, if you intended to run a loop down from 10 to 0, but increment your counter instead of decrementing it, overflow means that you will eventually get to a number less than 0, but it will take more than 2 billion increments instead of the expected 10 decrements.
This loop will significantly slow your code. Everyone will be so tired of waiting that they might leave the rocket launch. Of course, another problem with almost infinite loops is that you are dealing with the wrong values. No one expects to hear the number 2,147,483,647 in a countdown. |
Pitfall: Fencepost errors
Perhaps the most common loop errors are fencepost errors, often known as off-by-one errors. The name “fencepost” comes from a related mistake that someone might make when putting up a fence. Imagine that you want to erect a 10 meter long chain link fence with a support post every meter, how many posts do you need? In fact, we haven’t given you enough information to answer the question correctly. If your fence is built in a straight line, you’ll need 11 posts so that you have a post at each end. However, if your fence is a rectangular enclosure, say 3 meters by 2 meters, you’ll only need 10 posts. In loops, fencepost errors are often due to zero-based counting. A
Of course, sometimes we need one-based counting instead. After being used to zero-based counting, a programmer might make the following loop that incorrectly iterates 9 times.
The correct version that iterates 10 times is below.
If you want to iterate n times, start at 0 and go up to
but not including n or alternately start at 1 and go up to and
including n. To keep loop headers consistent, some
programmers always start at 0 and then adjust the values inside the
loop, printing out |
Pitfall: Skipped loops
A loop runs as long as its condition is For example, we can write a program that will add any number of positive values. When the user is finished using the adder, he or she enters a negative number. This negative number, called a sentinel value, tells the program to stop executing the loop. Below is an incorrect implementation of such a program.
This loop will never be executed because |
Pitfall: Misplaced semicolons
The idea of a statement in Java is often amorphous in the minds of beginning programmers. An entire loop (with any number of loops nested inside of it) is considered one statement. An executable statement ending with a semicolon is one statement as well, even when that executable statement is empty. Thus, the following is a legal (but infinite) loop.
This code was supposed to count down from 100, just like in the game of
Hide and Seek; however, there is a semicolon after the condition of the
This error is common especially for those new to loops and conditional
statements and are in the habit of putting semicolons after everything.
A misplaced semicolon doesn’t always result in an infinite loop. Here
is the
This version of the code will execute similarly, except the decrement is
built into the header of the loop. So, the loop will execute the empty
statement, but it will also decrement There are some cases when an empty statement for a loop body is useful although it is never necessary. In future chapters, we’ll point out situations in which you may wish to use an empty statement this way. |
5.4. Solution: DNA searching
Below we give a solution to the DNA searching problem posed at the
beginning of the second half of this chapter. Our solution prints out
the index within the sequence when it finds a match with the
subsequence it’s looking for. Afterward, it prints out the total number of
matches. Our code also does error checking to make sure that the user
only enters valid DNA sequences containing the letters A, C, G, and T.
We begin our code with the standard import
statement and class
definition.
import java.util.*;
public class DNASearch {
public static void main(String[] args) {
Scanner in = new Scanner(System.in); (1)
String sequence, subsequence;
boolean valid; (2)
char c;
1 | The main() method instantiates a Scanner object and declares both of
the String variables we’ll need to store the DNA sequences. |
2 | The method
also declares a boolean and a char we’ll use for input checking. |
do {
System.out.print("Enter the DNA sequence you wish to search in: "); (1)
sequence = in.next().toUpperCase(); (2)
valid = true;
for(int i = 0; i < sequence.length() && valid; i++) { (3)
c = sequence.charAt(i);
if(c != 'A' && c != 'C' && c != 'G' && c != 'T') { (4)
System.out.println("Invalid DNA sequence!");
valid = false;
}
}
} while(!valid); (5)
1 | Next, the user is prompted for a DNA sequence to search in. |
2 | This
String stored in sequence is converted to uppercase just in case
the user is not being consistent. |
3 | The inner for loop in this code checks each char inside of sequence . |
4 | If any char is not an
'A' , 'C' , 'G' , or 'T' , then valid is set to false . As a
result, the for loop terminates. Also, the do -while loop repeats the
prompt and gets a new String for sequence from the user. |
5 | This outer
do -while loop continues as long as the user keeps entering invalid DNA
sequences. |
do {
System.out.print("Enter the subsequence you wish to search for: ");
subsequence = in.next().toUpperCase();
valid = true;
for(int i = 0; i < subsequence.length() && valid; i++) {
c = subsequence.charAt(i);
if(c != 'A' && c != 'C' && c != 'G' && c != 'T') {
System.out.println("Invalid DNA sequence!");
valid = false;
}
}
} while(!valid);
The code used to input subsequence
while doing error checking is
virtually identical to the code to input sequence
.
int found = 0;
for(int i = 0; i < sequence.length() - subsequence.length() + 1; i++) { (1)
for(int j = 0; j < subsequence.length(); j++) { (2)
if(subsequence.charAt(j) != sequence.charAt(i + j)) (3)
break;
if(j == subsequence.length() - 1) { //matches (4)
System.out.println("Match found at index " + i);
found++;
}
}
}
1 | The workhorse of the search is found in these nested for loops.
The outer loop iterates through every index in sequence , until it
comes to an index that is too late to be the start of a new subsequence
(since the subsequence would be too long to fit anymore). This happens
to be when the value of i is greater than or equal to
sequence.length() - subsequence.length() + 1 . It may take some thought
to verify that this condition is the correct one. One way to think about
this problem is by noting that, when sequence and subsequence have
the same length, you need to check starting at index 0 of sequence
but not any later indexes. Also, if subsequence is one char longer
than sequence , there can never be a match. In that case, the value of
sequence.length() - subsequence.length() + 1 would be 0 . Since 0
is not less than 0 , the outer for loop would never execute. |
2 | The inner for loop iterates through the length of subsequence ,
making sure that every char in sequence , starting at the appropriate
offset, exactly matches a char in subsequence . |
3 | If, at any point, the
two char values do not match, the inner for loop will immediately
exit, using the break command. |
4 | However, on the last iteration of the
inner for loop, when j is one less than the length of subsequence ,
we know that all of subsequence matched a part of sequence . As a
result, we print out the index of sequence where subsequence started
and increment the found counter. |
If you know the String
class well, you can use the indexOf()
method
to replace the inner for
loop. We leave that approach as an exercise.
Finally, we print out the total number of matches found. In order to
avoid awkward output like 1 matches found.
, we used an if
-else
to
customize the output based on the value of found
.
if(found == 1)
System.out.println("One match found.");
else
System.out.println(found + " matches found.");
}
}
The ideas needed to correctly implement the solution are not difficult, but catching all the off-by-one errors and getting every detail right takes care. There’s also more than one way to code this solution. For example, we could have written the nested loops that do the searching as follows.
int found = 0;
for(int i = 0; i < sequence.length() - subsequence.length() + 1; i++) {
for(int j = 0; j < subsequence.length() &&
subsequence.charAt(j) == sequence.charAt(i + j); j++)
if(j == subsequence.length() - 1) { // Matches
System.out.println("Match found at index " + i);
found++;
}
}
}
This design is preferred by many since it removes the break
. By using
an empty statement, it’s possible to move the check to see if the
matching process is done outside of the inner for
loop.
int found = 0;
int j;
for(int i = 0; i < sequence.length() - subsequence.length() + 1; i++) {
for(j = 0; j < subsequence.length() &&
subsequence.charAt(j) == sequence.charAt(i + j); j++);
if(j == subsequence.length()) { // Matches
System.out.println("Match found at index " + i);
found++;
}
}
In this case, note that we must declare j
outside of the inner for
loop, since it will be used outside. This approach is more efficient
because we only need to perform the check once. Note that the
condition of the if
statement has also changed. Now, we know that all of
subsequence
matches because the loop ran to completion. If the loop
did not run to completion, then j
would be smaller than
subsequence.length()
and the loop must have terminated because the two
char
values did not match. Although more efficient, some programmers
would avoid this approach because it uses confusing syntax in which
the body of the for
loop is a single empty statement followed by a
semicolon. Likewise, the logic about exiting the loop and the condition
of the if
statement is murkier.
5.5. Concurrency: Loops
Many programmers use concurrency for speedup, to make their programs to run faster. Most programs that run for a long time use loops to do repetitive tasks. If these loops are doing the same operation to many different pieces of data, we may be able to speed up the process by splitting up the data and letting different threads operate on their own segment of the data. Splitting up data this way is called domain decomposition which allows us to achieve data parallelism. These topics are discussed further in Section 14.3.
Performing repetitive tasks is one of the great strengths of computers. For most programs that run a long time, incredible amounts of computation are being done inside of (usually nested) loops. Domain decomposition will not work for all of these programs. Some cannot be parallelized at all, but this book is about finding problems that can have parallel and concurrent solutions.
In Chapter 14, we’ll introduce tools for writing a concurrent program with different threads of execution running at the exactly the same time and potentially interacting. Using only the power of loops, you can see parallelism in action now.
Consider the problem of computing the sum of the sines of a range of integers. At its heart is a loop from the start of the range to the end.
for(int i = start; i <= end; i++)
sum += Math.sin(start);
If we want to allow the user to specify the start and the end and print out the sum, we need to make a program with a little bit of input and output around this loop.
import java.util.Scanner;
public class SumSines {
public static void main(String[] args) {
Scanner in = new Scanner(System.in);
System.out.print("Enter starting value: ");
int start = in.nextInt();
System.out.print("Enter ending value: ");
int end = in.nextInt();
double sum = 0;
for(int i = start; i <= end; i++)
sum += Math.sin(start);
System.out.println("Sum of sines: " + sum);
}
}
If you compile and run this program with 1
as the start value and
100000000
as the end, the answer should be 1.7136493465700542
. One
hundred million values is a lot to find the sine for. Depending on your
machine, this task should take between 10 seconds and over a minute. Try
to time how long this takes as accurately as possible.
Now, open a total of four terminal windows and navigate them all to the
directory with SumSines.class
in it. Run SumSines
in each one. For
the first terminal, enter 1
as the start and 25000000
as the end. For
the second, enter 25000001
and 50000000
. For the third, enter
50000001
and 75000000
. For the last, enter 75000000
and
100000000
. Once they have run, you should get, respectively,
1.4912473269134603
, -0.6795491754132104
, -0.2893142602684644
, and
1.1912654553381272
. If you add these together using a calculator, you
should get 1.7136493465699127
, which is almost exactly the same answer
we got before. (Floating-point rounding errors cause the slight
difference.)
If you try to start them computing at about the same time, you can try to see how long it takes for all of them to complete. Did it take less time than before? If you have a single core processor, it might have taken just as long or longer. If you have a dual-core processor, it should have taken less time, and if you have a quad core processor, even less. Since we’re dividing the problem into four pieces, we don’t expect to see any improvement with more than four cores.
Most operating systems provide a graphical way of viewing the load on each processor. If you examine your CPU usage while running those programs, you should see it spike up when the programs start and then come down when they finish. For multiple cores, how did we say which core we wanted each program to run on? We didn’t. In general, it’s difficult to specify which core we want to run a program, process, or thread on. The OS does the job of scheduling and picks a free processor when it needs to run a program. It’s even possible for programs and threads to change from one core to another while running if the OS needs to balance out the workload.
This sines example is similar to Example 14.10 in Chapter 14. As you may have noticed, running four programs simultaneously is not convenient. You have to open several windows, you have to type starting and ending points very carefully, and you have to combine the answers at the end since your programs cannot interact directly with each other. Features of Java will make this job easier, allowing us to run more than one thread of execution at a time without the need to run multiple programs by hand.
5.6. Exercises
Conceptual Problems
-
If you have a
String
containing a long text and you want to count the number of words in the text that begin with the letter'm'
, which of the three kinds of loops would you use, and why? -
In Example 5.2, our last version of the primality tester
PrimalityTester2
computes the square root of the number being tested. Instead of computing this value before the loop, how would performance be affected by changing the head of thefor
loop to the following?for(long i = 3; i <= Math.sqrt(number) && prime; i += 2)
-
How many different DNA sequences of length n are there?
-
There are three different errors in the following loop intended to print out the numbers 1 through 10. What are they?
for(int i = 1; i < 10; i--); { System.out.println(i); }
-
Consider the following code containing nested
for
loops.Scanner in = new Scanner(System.in); int n = in.nextInt(); int count = 0; for(int i = 1; i <= n; i++) for(int j = 1; j <= i; j++) count++;
In terms of the value of
n
, how many times iscount
incremented? If it’s not immediately obvious, trace through the execution of the program by hand or run the code for several different values ofn
and try to detect a pattern.
Programming Practice
-
Write a program that converts base 10 numbers into base 3 numbers. If you find that task too easy, write a program that will convert base 10 numbers to any base in the range 2 to 16. Hint: Use letters A through Z, in order, to represent digits larger than 9.
-
The greatest common divisor (GCD) of two integers is the largest integer that divides both of them evenly. The GCD for any two positive integers is at least 1 and at most the smaller of the two numbers. Write a program that prompts a user for two
int
values and finds their GCD. Although there are more efficient methods, you can count down from either number. If the counter ever divides both numbers evenly, it’s the GCD. The counter is guaranteed to divide them both if it reaches 1. -
In the solution to the DNA searching problem given in Section 5.4, we used two
for
loops to find occurrences of a DNA subsequence inside of a larger sequence. Professional Java developers would have used a singlefor
loop and theindexOf()
method in theString
class. One version of this method returns the index of a substring within aString
object, starting from a particular offset, as shown below.String text = "fun dysfunction"; String search = "fun"; System.out.println("Location: " + text.indexOf(search, 4));
This code will output
Location: 7
since the first occurrence of"fun"
from index4
or later starts at index7
. If there are no more occurrences of the substring beyond the starting index, the method will return-1
. Rewrite the solution to the DNA searching problem, replacing the inner searchingfor
loop with theindexOf()
method. -
Write a program that reads a number n from a user and then prints all possible DNA sequences of length n. Be careful not to supply too large of a value when you run this program. Hint: Represent the sequence as a
String
. On each iteration, focus on the lastchar
in theString
. If it is an'A'
, change it to a'C'
. If it is a'C'
, change it to a'G'
. If it is a'G'
, change it to a'T'
. If it is a'T'
, change it back to an'A'
, but “carry” the increment over to the nextchar
, like a rolling odometer. You will have to design loops that can deal with carries that cascade across multiple indexes. -
Re-implement the solution to the DNA searching program given in Section 5.4 using
JOptionPane
to generate GUIs for input and output.
Experiments
-
Using a
for
loop, record the Monty Hall simulation so that you can run it 100 times, always choosing to switch doors. Keep a record of how many times you win. Change your code again to run the Monty Hall simulation 100 more times, always choosing to keep your initial choice. Again, keep a record of how many times you win. Compare the two records. Choosing to switch should perform roughly twice as well as sticking with your initial choice. Increase the number of iterations to 1,000 and then 10,000 times. Does the performance of switching get closer to twice the performance of not switching? -
Write three nested
for
loops, each of which run 1,000 times. Increment a counter in the innermostfor
loop. If that counter starts at 0, its final value should be 1,000,000,000. Time how long your program takes to run to completion using either a stopwatch or, if you’re on a Unix or Linux system, thetime
command. Feel free to increase and decrease the amount that each loop runs to see the effect on the time. However, if you increase the values of all three loops too much, you may have to wait a long while. -
In Section 5.3.5, one of the common loop mistakes we discuss is an almost infinite loop. Create your own almost infinite loop that runs from
10
to0
, incrementing instead of decrementing. Time the execution of your program. Unlike our example, do not use an output statement or your code will take too long to run. How much longer would your code take to run if you used along
instead of anint
? -
In Example 5.2, we gave three programs to test a number for primality. Run each of these prime testers on a large prime such as 982,451,653 and time them. Is there a significant difference in the running time of
PrimalityTester0
andPrimalityTester1
? What aboutPrimalityTester1
andPrimalityTester2
? -
In Example 5.6, we ran four programs at the same time to solve a problem in parallel. Use the same framework (combined with your knowledge of primes from Example 5.2) to write a program that can see how many prime numbers are in a user specified range of integers. Then, use it to find the total number of primes between 2 and 500,000,000. Now, run two copies of the program with one starting at 2 and going up to 250,000,000 and the other starting at 250,000,001 and going up to 500,000,000. If you add the numbers together, do you get the same answer? (If not, there is a bug in your program.) Now, divide the work into four pieces. How much quicker, if at all, is running all four programs instead of one? Does one of the four pieces run significantly faster or slower than the others?