21. File I/O
Kira: What are those funny marks?
Jen: This is all writing.
Kira: What’s writing?
Jen: Words that stay, my master said.
21.1. Problem: A picture is worth 1,000 bytes
If you’re familiar with bitmap (bmp
) files, you know that they can take
up a lot of space compared to other popular image formats. People often use
a technique called data compression to reduce the size of large files for
storage. There are many different kinds of compression and many which are
tailored to work well on images. Your task is to write a program that will
do a particular kind of compression called run length encoding (RLE), which
we’ll test on bitmaps. The idea behind RLE is simple: Imagine a file as a
stream of bytes. As you look through the stream, replace repeating sequences
of a single byte with a count telling how many times it repeats followed by
the byte that repeats. Consider the following sequence.
215 7 7 7 7 7 7 7 7 7 123 94 94 94 71
Its RLE compressed version could be shown as follows.
1 215 9 7 1 123 3 94 1 71
Since there’s no simple way to keep track of which numbers are counts and which ones are actual bytes of data, we have to keep a count for every byte, even unrepeated ones. In this example, we went from 15 down to 10 numbers, a savings of a third. In the worst case, a file that has no repetition at all would actually double in size after being “compressed” with this kind of RLE. Nevertheless, RLE compression is used in practice and performs well in some situations.
Your job is to write a Java program that takes two arguments from the
command line. The first is either -c
for compress or -d
for
decompress. The second argument is the name of the file to be compressed
or decompressed. When compressing, append the suffix .compress
to the
end of the file name. When decompressing, remove that suffix.
Executing the following on the command line should generate an RLE compressed
file called test.bmp.compress
.
java BitmapCompression -c test.bmp
Likewise, executing the following should create an uncompressed file called
test.bmp
.
java BitmapCompression -d test.bmp.compress
Be sure to make a backup of the original file (in this case test.bmp
) because
decompressing will overwrite a file of the same name.
To perform the compression, go byte by byte through the file. For each
repeating sequence of a byte (which can be as short as a single byte
long), write the length of the sequence as a byte and then the byte
itself into the compressed file. If a repeating sequence is more than
127 bytes, you break the sequence into more than one piece since
the largest value the byte
data type can hold in Java is 127. (The
byte
type in Java is always signed, giving a range of -128 to 127.)
Decompression simply reads the count then the byte value and writes the
appropriate number of copies of the byte value into the decompressed
file.
21.1.1. Command-line arguments
You might be wondering how to read the command-line arguments such as
-c test.bmp
or -d test.bmp.compress
. These arguments can’t be read
using Scanner
as we read most text. Instead, they’re passed directly into
your program. By this point, you’ve written so many main()
methods that you
might have stopped paying attention to their syntax. Remember that main()
methods always take a single parameter of type String[]
. We’ve always called
this parameter args
in this book, but you’re free to call it whatever you
like.
This array of String
values is how command-line arguments are passed into
your program. A Java program invoked by typing
java BitmapCompression -c test.bmp
will have an array of length 2 passed in.
The first String
stored in this array (args[0]
) will be "-c"
. The second
String
stored in this array (args[1]
) will be "test.bmp"
. The operating
system passes in these command-line arguments to the JVM which passes them along
to your program as String
values.
Command-line programs often take these arguments to specify which options the
program will be run with. Although they’re useful, we didn’t focus on these
arguments in early chapters partly because they involve arrays and partly because
all arguments, even those that look like numbers, will be passed in as String
values. Furthermore, it’s cumbersome to specify command-line arguments when
using IDEs like IntelliJ or Eclipse.
21.2. Concepts: File I/O
Before you can tackle the problem of compressing, or even reading from and writing to, files, some background on files is necessary. By now, you’ve had many experiences with files: editing Java source files, compiling those files, and running them on a virtual machine, at the very least. You’ve probably done some word processing, looked at photos, listened to music, and watched videos, all on a computer. Each of these activities centers on one or more files. In order to reach some files, you probably had to look through directories (often called folders), which are a special kind of file as well. But what is a file?
21.2.1. Non-volatile storage
A computer program is like a living thing, always moving and restless. The variables in a program are stored in RAM, which is volatile storage. The data in volatile memory will only persist as long as there’s electricity powering it. But most programs don’t run constantly, and neither do most computers. We need a place to store data between runs of a particular program. Likewise, we need a place to store data when our computer isn’t turned on. Both scenarios share a common solution: secondary storage such as flash drives, hard drives, and optical media like DVD and Blu-ray discs.
Files are not always stored in non-volatile memory. It’s possible to load entire files into RAM and keep them there for long periods of time. Likewise, all input and output on Unix and Linux systems is viewed as file operations. Nevertheless, the characteristics of non-volatile memory are often associated with file I/O: slow access times and the possibility of errors in the event of inaccessible files or hardware problems.
21.2.2. Stream model
While discussing RLE encoding, we described a file as a stream of bytes, and that’s a good definition for a file, especially in Java. Since Java is platform independent, and different operating systems and hardware will deal with the nitty gritty details of storing files in different ways, we want to think about files as abstractly as possible. Reading from and writing to a stream of bytes is not so different from the other input and output you’ve done so far. For the most part, file I/O will be similar to command-line I/O and, in fact, can use some of the same classes.
Although reading and writing from the files can be like reading from the
keyboard and writing to the screen, there are a few additional complications.
For one thing, you must open a file before you
can read or write. Sometimes opening the file will fail: You could try
to open a file for reading which doesn’t exist or try to open a file
for writing which you don’t have permissions for. When reading data,
you might try to read past the end of the file or try to read an int
when the next item is a String
. Unlike reading from the keyboard, you
can’t ask the user to try again if there’s a mistake in input. To deal
with these possible errors, exception handling will accompany many
different file I/O operations in Java.
21.2.3. Text files and binary files
When talking about files, many people divide files into two categories: text files and binary files. A text file can be read by humans. That is, when you open a text file with a simple text editor, it won’t be filled with gibberish and nonsense characters. A Java source file is an excellent example of a text file.
In contrast, a binary file is a file meant only to be read by a computer. Instead of printing out characters meant to be read by a human, the raw bytes of memory for specific pieces of data are written to binary files. To clarify, if we wanted to write the number 2,127,480,645 in a text file, the file would contain the following.
2127480645
However, if we wanted to write the same number in a binary file, the file would contain the following.
~ÎÇE
If you recall, an int
in Java uses four bytes of storage. There’s a
system of encoding called the ASCII table which maps each of the 256
(0–255) numerical bytes to a character. The four characters given above
are the extended ASCII representation of the four bytes of the number
2,127,480,645.
In some sense, the idea of a text file is artificial. All files are binary in the sense that they’re readable by a computer. You’ll take different steps and create different objects depending on whether you want to do file I/O in the text or binary paradigms, but the overall process will be similar in either case.
21.3. Syntax: File operations in Java
21.3.1. The Path
interface
An important tool for interacting with files in Java is the Path
interface. A
Path
object allows you to interact with a file at the operating system level.
You can create a new file, test to see if a file is a directory, find out the
size of a file, and so on. A number of file I/O classes require a Path
object
as a parameter. To use the Path
interface, import java.nio.file.Path
or
java.nio.file.*
.
It’s a little confusing that we’re going to use an object that implements
the Path
interface yet not know what its true type is. In general, it’s not
important to know the object’s type, since it will probably be a type
specialized for your operating system. Instead, all that matters is that the
object has the Path
methods.
In most situations, the Path
interface has replaced the File
class
(java.io.File
), the older way of representing files in Java. The Path
interface works with other tools in the newer, non-blocking java.nio
package,
resulting in file I/O that is generally faster than the older system. For
interoperability, a File
can be turned into a Path
by calling its toPath()
method, and a Path
can be turned into a File
by calling its toFile()
method.
Because Path
is an interface, not a class, we need another class to create
objects that implement the Path
object. In this case, the confusingly named
Paths
class is how we can create an appropriate Path
. Note that s
on
the end of the class’s name. To create a Path
object, call the Paths
class’s
static get()
method with a String
specifying the name of the file.
Path file = Paths.get("file.txt");
Doing so will create a virtual file object associated with the name
file.txt
(which might not exist yet) in the working directory of the
Java program. In this case, the extension txt
doesn’t have any real
meaning. On many systems, the extension (like pdf
or html
) is used
by the operating system to guess which application should open the file.
To Java, however, the extension is just part of the file name. A file
name passed to the get()
method can have any number of periods in it (or
none).
A file name without a directory is all well and good, but file systems are
useful in part because of their hierarchical structure. If we want to create a
file in a particular location, we specify the path in the String
before the name of the file.
Path file = Paths.get("/homes/owilde/documents/file.txt");
In this case, the prefix /homes/owilde/documents/
is the path, and
file.txt
is still the file name. Each slash (/
) separates a
parent directory from the files or directories inside of it. This path
specifies that we start at the root, go into the homes
directory, then
the owilde
directory, and then the documents
directory. Note that we
can also use a single period (.
) in a path to refer to the current
working directory and two periods (..
) to refer to a parent directory.
This is one of those sticky places where Java’s trying to be platform
independent, but the platforms each have different needs. The example we
gave above is for a macOS or Linux system. In Windows, the way to specify
the path is slightly different. Creating a similar Path
object on
Windows might be done as follows.
Path file = Paths.get("C:\\Users\\owilde\\Documents\\file.txt");
Then, the path specifies that we start in the C
drive, go into the
Users
directory, the owilde
directory, and then the Documents
directory.
Windows systems use a backslash (\
) to separate a parent directory from its
children. But in Java a backslash isn’t allowed to be by itself in a
string literal, and so each backslash must be escaped with another
backslash. To simplify things somewhat, Java allows Windows paths to be
separated with regular slashes as well, so we’ll use this style for
the rest of the book.
A further complication is that file and directory names are case sensitive in Linux, aren’t case sensitive in Windows, and could be either in macOS depending on file system settings.
Returning to Path
objects, they aren’t particularly useful on their own. The
best way to think of a Path
object is as a String
broken into parts where
each part represents a directory or a file name. You can go up a directory by
calling the getParent()
method. If the current Path
represents a directory,
you can select a file or another directory inside it by using the resolve()
method. In addition, the Files
class (java.nio.file.Files
) has methods that
can test if a file associated with a Path
exists, if it’s readable, if it’s
writable, if it’s a directory, and many other things. Because there are so many
classes associated with file I/O and each class has so many methods, now’s a
good time to remind you of the usefulness of the Java API. If you visit
the Java API documentation site,
you can get detailed documentation for the entire standard library, including
file I/O classes.
21.3.2. Reading text files
Once you have a Path
object, most of its usefulness comes from combining
it with other classes. You’re already familiar with the Scanner
class. The Scanner
constructor can take a Path
object (instead of
System.in
), creating a Scanner
that reads from a text file instead of the
keyboard.
Scanner in = null;
try {
in = new Scanner(file);
while (in.hasNextInt()) {
process(in.nextInt());
}
} catch (IOException e) {
System.out.println("File " + file + " not found!");
} finally { if (in != null) { in.close(); } }
Assuming that file
is linked to a file which the program has read
access to, this block of code will extract int
values from the file
and pass them to the process()
method. If the file doesn’t exist or
isn’t readable to the program, an IOException
will be thrown
and an error message printed. Creating a Scanner
from a Path
object
instead of System.in
can throw a checked exception, so the try
and catch
are needed before the program will compile. Note that you’ll need to import
java.util.Scanner
or java.util.*
just like any other time you use the
Scanner
class.
And that’s all there is to it. After opening the file, using the
Scanner
class will be almost the same as before. One difference is
that you should close the Scanner
object (and by extension the file)
when you’re done reading from it, as we do in the example. Closing
files is key to writing robust code.
21.3.3. Using try
-with-resources
In the reading example above, you’ll notice that we put in.close()
in a
finally
block. File operations could fail for any number of reasons, but you
still need to close the file afterward. We put in the null
check in case the
file didn’t exist and the reference in
never pointed to a valid object.
As you can see, adding this finally
block is cumbersome, but if you forget to
add it or write the code inside incorrectly, you could leave the file open. Open
files are a drain on operating system resources, and there’s a limit to how many
open files a program can have at once. More importantly, if you don’t close a
file your program has been writing to, some of the data written to the file
might be lost.
To make it easier to write code that correctly closes files, Java 7 extended the
syntax of try
blocks, adding a version called try
-with-resources. This
version adds parentheses after the try
where variables can be declared and
instantiated. These variables will only be available during the try
block, and
their close()
methods will automatically be called afterwards, even if an
exception is thrown.
The code below shows how we can rewrite the earlier example of reading from a
text file using try
-with-resources.
try (Scanner in = new Scanner(file)) {
while (in.hasNextInt()) {
process(in.nextInt());
}
} catch (IOException e) {
System.out.println("File " + file + " not readable!");
}
As you can see, the code is both shorter and less prone to errors. In situations
where there are multiple I/O objects that need to be used within a try
block,
their declarations can be separated by semicolons. For the rest of the book, we
will use this try
-with-resources style for file and network I/O code, as it is
preferred by professionals.
21.3.4. Writing text files
Writing information to a file is similar to using System.out
. First,
you need to create a PrintWriter
object. Unlike Scanner
, you can’t
create a PrintWriter
object directly from a Path
object. Since PrintWriter
was designed for the older File
class, we have to call the toFile()
method
on our Path
object first.
If we want to write a list of 100 random numbers to the file we were reading from earlier, we could do it as follows.
try (PrintWriter out = new PrintWriter(file.toFile())) {
Random random = new Random();
for (int i = 0; i < 100; ++i) {
out.println(random.nextInt());
}
} catch (FileNotFoundException e) {
System.out.println("File " + file + " not writable!");
}
Again, once you have a PrintWriter
object, the methods for outputting
data are just like using System.out
. Be sure to import java.io.*
in order
to have access to the PrintWriter
class.
Pitfall: Destroying file contents
Programmers new to file I/O are sometimes unsure what will happen when an existing file is opened for writing. Will new content be written at the end of the old file? Will it overwrite the data, line by line? While there are Java tools that will allow output to be appended to the end of a
file, the default for most output, including Be especially careful when opening data files that can’t easily be recreated since there might not be any way to retrieve the data. Remember: Opening a file for reading is safe, but opening a file for writing will usually delete all its existing contents. |
21.3.5. Reading and writing binary files
We covered text files first because their input and output is similar to command-line I/O. When reading and writing text files, you can visually verify that file reading and writing operations were successful. Although it’s harder to check the contents of binary files, they have other advantages. Data can often be stored more compactly in binary files, as in the example with the integer 2,127,480,645. Even better, Java provides facilities for easily dumping (and later retrieving) primitive data types, objects, and even complex data structures to binary files.
The simplest object for reading input from a binary file is a
FileInputStream
object. As with a Path
object, you can create a
FileInputStream
object from a String
specifying the file path and name.
FileInputStream in = new FileInputStream("file.bin");
Unfortunately, you can’t do much with a FileInputStream
object.
Its methods allow you to read single bytes, either one at a time or into an
array as a group. The basic read()
method returns the next byte in the file
or a -1 if the end of the file has been reached. Working only at the level of
bytes, we can still write useful code like the following method that prints the
size of a file.
public static void printFileSize(String fileName) {
try (FileInputStream in = new FileInputStream(fileName)) {
int count = 0;
while (in.read() != -1) {
++count;
}
System.out.println("File size: " + count + " bytes");
} catch (IOException e) {
System.out.println("File " + fileName + " not readable!");
}
}
To output a sequence of bytes, you can create a FileOutputStream
object.
Its write()
methods are the mirror images of the read()
methods in
FileInputStream
. It would be convenient if there were ways to read
and write any primitive type instead of just byte
values, and
DataInputStream
and DataOutputStream
provide exactly that functionality.
For output, a DataOutputStream
chops up primitive data types into their
component bytes and sends those bytes to a FileOutputStream
. For input, a
DataInputStream
reads a sequence of bytes from a FileInputStream
and
reassembles them into whatever kind of primitive data they’re supposed to be.
To create an DataInputStream
, you supply a FileInputStream
to its
constructor, usually one that you’ve just created on the fly for this purpose.
DataInputStream in = new DataInputStream(new FileInputStream("baseball.bin"));
Now, let’s assume that baseball.bin
contains baseball statistics. The
first thing in the file is an int
indicating the number of records it
contains. Then, for each record, it’ll list home runs, RBI, and
batting average, as an int
, an int
, and a double
, respectively.
We can read these statistics into three arrays with the following code.
try (DataInputStream in = new DataInputStream(new FileInputStream("baseball.bin"))) {
int records = in.readInt();
int[] homeRuns = new int[records];
int[] rbi = new int[records];
double[] battingAverage = new double[records];
for (int i = 0; i < records; ++i) {
homeRuns[i] = in.readInt();
rbi[i] = in.readInt();
battingAverage[i] = in.readDouble();
}
} catch (IOException e) {
System.out.println("File reading failed.");
}
When opening the file in the FileInputStream
constructor, a
FileNotFoundException
will be thrown if the file doesn’t exist or is
inaccessible. If the readInt()
or readDouble()
methods fail, they’ll throw
an IOException
. If the DataInputStream
object tries to read past the end
of a file, it’ll throw an EOFException
exception. If you want to deal with
these exceptions separately, you can, but since FileNotFoundException
and
EOFException
are both children of IOException
, a single catch
clause
for IOException
handles all three.
As expected, the DataOutputStream
methods for writing to a file match
DataInputStream
methods for reading from a file. If you substitute write
for
read
, DataOutputStream
methods are almost the same as DataInputStream
methods. Below is a companion piece of code which assumes that homeRuns
,
rbi
, and battingAverage
are filled with data and writes them to a file.
try (DataOutputStream out = new DataOutputStream(new FileOutputStream("baseball.bin"))) {
out.writeInt(homeRuns.length);
for (int i = 0; i < homeRuns.length; ++i) {
out.writeInt(homeRuns[i]);
out.writeInt(rbi[i]);
out.writeDouble(battingAverage[i]);
}
} catch (IOException e) {
System.out.println("File writing failed.");
}
Using DataInputStream
and DataOutputStream
in this way isn’t
too difficult, but it seems cumbersome. The programmer has the responsibility
to read and write every piece of primitive data separately. It would be
convenient if there was a way to read an entire object at once, including any
references to other objects that it contains. If a tool exists for reading
an entire object, we’d also want a matching tool for writing an entire
object at once.
Such tools can be found in the ObjectInputStream
and
ObjectOutputStream
classes, respectively. These file I/O objects provide
methods that elegantly allow you to read or write a whole object at a time. To
use them with our baseball data example, we need to define a new class.
import java.io.Serializable;
public class BaseballPlayer implements Serializable {
private int homeRuns;
private int rbi;
private double battingAverage;
public BaseballPlayer(int homeRuns, int rbi, double battingAverage) {
this.homeRuns = homeRuns;
this.rbi = rbi;
this.battingAverage = battingAverage;
}
public int getHomeRuns() { return homeRuns; }
public int getRbi() { return rbi; }
public double getBattingAverage() { return battingAverage; }
}
The new class BaseballPlayer
encapsulates the three pieces of
information we want. Note that it also implements the interface
Serializable
, but it doesn’t seem to implement any special methods to
conform to the interface. We’ll discuss this interface more after we show how
using this new class can simplify file I/O. Our input code will change to
the following. In addition to the IOException
that could be caused by a
missing or unreadable file, we must also catch a ClassNotFoundException
in the
event that the data file contains a class that our program doesn’t recognize.
try (ObjectInputStream in = new ObjectInputStream(new FileInputStream("players.bin"))){
int records = in.readInt();
BaseballPlayer[] players = new BaseballPlayer[records];
for(int i = 0; i < players.length; ++i) {
players[i] = (BaseballPlayer)in.readObject();
}
}
catch (IOException | ClassNotFoundException e) {
System.out.println("File reading failed.");
}
The corresponding output code will become the following.
try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream("players.bin"))) {
out.writeInt(players.length);
for (int i = 0; i < players.length; ++i) {
out.writeObject(players[i]);
}
} catch (IOException e) {
System.out.println("File writing failed.");
}
This process of outputting an entire object at a time is called
serialization. The BaseballPlayer
class is very simple, but even
complex objects can be serialized, and Java takes care of almost
everything for you. The only magic needed is for the class that’s
going to be serialized to implement Serializable
. There are no methods
in Serializable
. It’s just a tag for a class that can be packed up
and stored. The catch is that, if there are any references to other
objects inside of the object being serialized, they must also be
serializable. Otherwise, a NotSerializableException
will be thrown
when the JVM tries to perform the serialized output. Many classes are
serializable, including the vast majority of the Java API.
However, objects that have some kind of special system-dependent state,
like a Thread
or a FileInputStream
object, can’t be serialized. If you
need to serialize a class with references to objects like these, add the
transient
keyword to the declaration of each unserializable reference. That
said, these should be few and far between. For BaseballPlayer
, adding
implements Serializable
was all we needed, and we can still get
more mileage out of serialization! An array can be treated liked an
Object
and is also serializable. We can further simplify the input as
shown below.
try (ObjectInputStream in = new ObjectInputStream(new FileInputStream("players.bin"))){
BaseballPlayer[] players = (BaseballPlayer[])in.readObject();
}
catch (IOException | ClassNotFoundException e) {
System.out.println("File reading failed.");
}
And the corresponding output code can be simplified as well.
try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream("players.bin"))) {
out.writeObject(players);
} catch (IOException e) {
System.out.println("File writing failed.");
}
When you go to write your code, which binary file I/O classes should you use?
It depends on the situation. FileInputStream
and FileOutputStream
are very
low level. You’ll use those classes to construct DataInputStream
,
DataOutputStream
, ObjectInputStream
, and ObjectOutputStream
objects, but
you probably won’t use them on their own unless your application is focused on
byte-level input and output.
If you only want to read and write primitive types, you can use
DataInputStream
and DataOutputStream
objects. These classes have methods
to read and write all primitive types. Ultimately, all objects are made up of
primitive types, though those primitive types might be buried inside of other
objects. Most languages provide binary I/O tools like DataInputStream
and
DataOutputStream
, so code using these objects will be similar to code in other
languages that writes individual pieces of primitive data.
Finally, if you want to read and write whole objects (or even arrays of objects)
at a time, ObjectInputStream
and ObjectOutputStream
are powerful tools.
Using them leverages Java serialization, making the JVM do the work of dividing
up your objects into primitive types and writing them (for output) or
reading that primitive data and reassembling it into objects (for input). Even
complex objects with many (potentially circular) references to other objects,
like linked list classes, can be serialized. The Java implementation of
serialization is smart enough to write each unique object only once and then
refer to it later. This power feels like magic, but serialization has
limitations. Only classes that implement the Serializable
interface can
be serialized. Also, serialization carries with it some overhead: The files
must contain additional metadata describing the class being stored. Using
DataInputStream
and DataOutputStream
can allow you to write only the
necessary member data, resulting in a smaller file. You can also run into
trouble if you make changes to a class. If one version of an object is
serialized and then you try to read that object back after making the smallest
change to the class, your code will fail. One last issue is that Java
serializes objects according to its own rules, making files written in this
way difficult (if not impossible) to use with code written in other languages.
This consideration is not insignificant, since files are often written by one
program and read by another, perhaps on a completely different computer.
The following table summarizes the three approaches to binary I/O we’ve discussed. Be sure to consult each class’s documentation for more information.
Class | Use | Purpose | Limitations |
---|---|---|---|
|
Input |
Simplest binary I/O |
Can only read and write |
|
Output |
||
|
Input |
Binary I/O for primitive types |
Can’t read and write whole objects |
|
Output |
||
|
Input |
Powerful binary I/O for all types |
Depends on Java serialization and can’t work with files created in other ways |
|
Output |
21.3.6. Using JFileChooser
One of the more tedious aspects of working with files in command-line programs
is typing the name of the file correctly. Although most command-line shells
have time-saving autocomplete features to help users choose the right file name,
typing the name of a file directly into a prompt so that it can be read by
a Scanner
object is error prone. Navigating long directory paths can also
be a headache when using a command-line interface.
Indeed, most users are used to selecting files to open or to save via a GUI
file chooser instead of typing file names explicitly. The Java Swing library
provides such a file chooser called JFileChooser
.
We discussed fully featured GUI programs in Chapter 16, but like the
JOptionPane
class covered in Chapter 7,
JFileChooser
can be used with or without a complex GUI.
Unlike the JOptionPane
class whose functionality is accessed through static
methods, you must create a JFileChooser
object to use it.
JFileChooser chooser = new JFileChooser();
Once you’ve created the JFileChooser
object, you can call either the
showOpenDialog()
method to show a dialog to open existing files or the
showSaveDialog()
to save a potentially new file. The dialog looks similar
in either case, with only minor differences such as title and buttons names.
int result = chooser.showOpenDialog(null);
Both methods take a Component
object as an argument. If you’re creating a
GUI program, you can pass in a JFrame
or a JDialog
for this argument to pop
up a modal file chooser dialog that must be dealt with before returning control
to the parent frame or dialog. If your program doesn’t otherwise use a GUI,
you can pass in null
.
Both methods also return an int
value indicating the result of user input.
A return value of JFileChooser.APPROVE_OPTION
means that the user selected a
file. The value JFileChooser.CANCEL_OPTION
means that the user canceled
instead of picking a file. Finally, JFileChooser.ERROR_OPTION
means some
error occurred.
Once the user has selected a file for opening or for saving, you can call the
getSelectedFile()
method to retrieve the File
that the user selected.
File file = chooser.getSelectedFile();
If the user canceled or an error occurred, this File
object could be null
.
Because JFileChooser
is an older tool, it gives back a File
instead of a
Path
, but we can call the toPath()
method on the File
to get an equivalent
Path
.
In many cases, a programmer will want to focus the user on files of a certain
kind. For example, a program that plays audio files might display only
files that have extensions associated with audio formats such as wav
, mp3
,
and flac
. To include this functionality, you can create a
FileNameExtensionFilter
object and set it as a file filter on your
JFileChooser
.
FileNameExtensionFilter filter = new FileNameExtensionFilter("Audio", "wav", "mp3", "flac");
chooser.setFileFilter(filter);
The first argument to the FileNameExtensionFilter
constructor is a
user-friendly description of the kinds of files displayed by the filter. After
the description, the constructor takes a variable number of arguments, each of
which gives one of the included file extensions. Extensions are case
insensitive and should not include a dot (.
) at the beginning. You should set
the file filter before displaying a dialog with either showOpenDialog()
or
showSaveDialog()
.
FileNameExtensionFilter
covers most of what people want from a file filter,
but it’s possible to create your own class that extends FileFilter
if you
need to filter files based on more complex criteria.
Next, we’ll give a short example that uses a JFileChooser
.
JFileChooser
The short program below allows a user to select a file using
JFileChooser
. Only image files with jpg
or png
extensions will be
displayed. Once the file has been selected, the program will print out the
number of bytes of storage that the file uses.
JFileChooser
.import javax.swing.JFileChooser; (1)
import javax.swing.filechooser.FileNameExtensionFilter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
public class FileChooserExample {
public static void main(String[] args) {
JFileChooser chooser = new JFileChooser(); (2)
FileNameExtensionFilter filter = new FileNameExtensionFilter("Images", "jpg", "png");
chooser.setFileFilter(filter); (3)
int result = chooser.showOpenDialog(null); (4)
if (result == JFileChooser.APPROVE_OPTION) { (5)
Path file = chooser.getSelectedFile().toPath(); (6)
if (Files.exists(file)) { (7)
try {
long size = Files.size(file);
System.out.println("The file contains " + size + " bytes.");
} catch (IOException e) {
System.out.println("There was a problem accessing the file.");
}
} else {
System.out.println("The file doesn't exist.");
}
} else {
System.out.println("The user probably canceled.");
}
}
}
1 | Imports are needed for the JFileChooser , the FileNameExtensionFilter ,
the Path object, the Files class that we can use to interact with the
Path , and an appropriate exception type. |
2 | First, we create the JFileChooser object. |
3 | Next, we create and then set a file filter appropriate for image files. |
4 | We show the open dialog to allow the user to select a file. |
5 | This constant tells us that the user selected a file rather than canceling. |
6 | We get a File object from the file chooser and then convert it into a
Path . |
7 | If the file exists, we get its size in bytes and print that out. |
If the file doesn’t exist, it’s inaccessible, or the user hits cancel, we print
appropriate messages in those cases as well. Note that the Files
class is used
for many interactions with Path
objects.
Figure 21.1 shows the dialog displayed by this program.
JFileChooser
showing an open dialog.The purpose of JFileChooser
is to allow users to select a file. It
doesn’t guarantee that the file exists or that the user has rights to read
from the file or to write to it. Most commercial software asks the user if
he or she wants to overwrite an existing file when saving. JFileChooser
doesn’t have that functionality built in, requiring additional program logic to
to prompt the user if an existing file is about to be overwritten.
Like most of the Swing library, JFileChooser
has many options and features
that we don’t have time to cover. Its display can be customized, and it can be
configured to interact with the file system in a number of ways, such as
displaying only normal files, displaying only directories, or both. There are
even settings that allow the user to select multiple files at once.
21.4. Examples: File examples
Let’s return to the Path
interface and look at another example of how to
use it. It’s often useful to know the contents of a directory. At the
Windows command prompt, this is usually done using the dir
command; in
Linux and macOS, the ls
command is generally used. In a few lines of
code, we can write a directory listing tool that lists all the files in
a directory, the date each file was last modified, and whether or not a file
is a directory.
import java.io.IOException;
import java.nio.file.*;
import java.text.DateFormat;
import java.util.Date;
import java.util.stream.Stream;
public class Directory {
public static void main(String[] args) {
Path directory = Paths.get("."); (1)
try (Stream<Path> files = Files.list(directory)) { (2)
files.forEach(file -> printFile(file));
}
catch (IOException e) {
System.out.println("Files in the directory could not be listed.");
}
}
public static void printFile(Path file) {
try {
long milliseconds = Files.getLastModifiedTime(file).toMillis(); (3)
String date = DateFormat.getDateInstance().format(new Date(milliseconds));
System.out.print(date + "\t");
if (Files.isDirectory(file)) { (4)
System.out.print("directory");
} else {
System.out.print("\t");
}
System.out.println("\t" + file.getFileName()); (5)
}
catch (IOException e) {
System.out.println("Could not get last modified time from " + file);
}
}
}
1 | The code first creates a Path object using "." to
specify the current working directory. |
2 | The Files.list() method returns a stream of File objects which we can
process. Refer to Section 13.4 for more information about streams. |
3 | We use two more method calls in Files to get the time each file was
last modified and then convert to the number of milliseconds since January 1, 1970.
This time can then be formatted as a date. |
4 | We then use the isDirectory() method to see if the file is a directory. |
5 | Finally, we print the name of the file without any preceding path, given by
getFileName() . |
The output for this program might look like the following.
Aug 5, 2024 AreaFromRadiusBinary.java Aug 1, 2024 AreaFromRadiusText.java Aug 5, 2024 BaseballPlayer.java Aug 5, 2024 BitmapCompression.java Aug 5, 2024 ConcurrentFileAccess.java Aug 8, 2024 Directory.class Aug 8, 2024 Directory.java Aug 1, 2024 FileChooserExample.java Aug 8, 2024 directory Images Aug 5, 2024 areas.bin Aug 5, 2024 areas.txt Aug 5, 2024 radiuses.bin Aug 5, 2024 radiuses.txt
Now, let’s look at a data processing application of files. Let’s assume
that there’s a file called radiuses.txt
which holds the radiuses of a
number of circles formatted as text, one on each line of the file. It’s
our job to read each radius r, compute the areas of each circle using the
formula Area = πr2, and write those areas to a file called areas.txt
.
import java.io.*;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.*;
public class AreaFromRadiusText {
public static void main(String[] args) {
Path inFile = Paths.get("radiuses.txt");
Path outFile = Paths.get("areas.txt");
try (var in = new Scanner(inFile); (1)
var out = new PrintWriter(outFile.toFile())) { (2)
while (in.hasNextDouble()) { (3)
double radius = in.nextDouble(); (4)
out.format("%.3f%n", Math.PI*radius*radius); (5)
}
}
catch (IOException e) { (6)
System.out.println(e.getMessage());
}
}
}
1 | Inside the try -with-resources, we create a Scanner to read text from a
file. |
2 | We also create a PrintWriter to write text to a file. Because both file
I/O objects are in the header of the try , they’ll both be closed after the
try block. |
3 | We continue reading as long as there’s another piece of text formatted
as a legal double . |
4 | We read in the double , just as we would from a user typing on the
keyboard. |
5 | We use the format() method to output to a file a double formatted with
exactly three digits after the decimal point followed by a newline, just as
we’ve used System.out.format() in the past. |
6 | As is typical with file I/O, we have to catch exceptions. An
IOException would have been thrown if either file was inaccessible. |
Perhaps the input file radiuses.txt
contains the following 10 values.
33.675 4.156 8.608 60.350 86.501 78.581 23.935 2.263 26.827 73.358
Then, the output file areas.txt
would be filled with these corresponding
10 values. Note that formatting the output to have three digits after the
decimal point is easier to read, but it loses some precision.
3562.584 54.263 232.785 11442.065 23506.725 19399.252 1799.769 16.089 2260.966 16906.155
The previous class did all of its input and output with text files. We’ll
also implement this program to read from a binary file called radiuses.bin
and write to a binary file called areas.bin
.
import java.io.*;
public class AreaFromRadiusBinary {
public static void main(String[] args) {
try (var in = new DataInputStream(new FileInputStream("radiuses.bin")); (1)
var out = new DataOutputStream(new FileOutputStream("areas.bin"))) { (2)
while (true) { (3)
double radius = in.readDouble();
out.writeDouble(Math.PI*radius*radius);
}
}
catch (EOFException e) {} // End of file reached (4)
catch (IOException e) {
System.out.println(e.getMessage());
}
}
}
1 | We change the Scanner from the text version of this program to a
DataInputStream . |
2 | We change the PrintWriter to a DataOutputStream . Again, both objects are
in the header of the try so that they’ll be closed after the try block. |
3 | We make what appears to be the strange choice of changing the while
loop to an infinite loop. We do this because the easiest way to see if there’s
any more data in a binary file is to keep reading until an
EOFException is thrown. |
4 | When the EOFException is thrown, we do nothing to handle it because, in
this case, it’s the signal to stop reading. |
21.5. Solution: A picture is worth 1,000 bytes
Now, we’ll give the solution to the problem posed at the beginning of
the chapter. First, let’s look at the class definition and main()
method.
import java.io.*;
public class BitmapCompression {
public static void main(String[] args) {
if (args.length != 2) { (1)
System.out.println("Usage: java BitmapCompression (-c|-d) file");
} else {
try (var in = new DataInputStream(new FileInputStream(args[1]))) { (2)
if (args[0].equals("-c")) { (3)
compress(in, args[1]);
} else if(args[0].equals("-d")) {
decompress(in, args[1]);
}
}
catch (IOException e) { (4)
System.out.println("File not found: " + e.getMessage());
}
}
}
1 | We first check that there are exactly two command-line arguments. Otherwise, our program would crash if we try to access invalid array locations. For the wrong number of arguments, we print a usage message. |
2 | If we have the right number of arguments, we open a DataInputStream
based on the file named passed as the second command-line parameter. |
3 | Then, we either compress or decompress the file depending on which switch was passed as the first command-line parameter. |
4 | The catch will print an error message if a file can’t be opened, read, or
written. |
public static void compress(DataInputStream in, String file) {
String compressed = file + ".compress";
try (var out = new DataOutputStream(new FileOutputStream(compressed))) { (1)
byte current = 0;
int count = 1;
try {
current = in.readByte(); (2)
while (true) {
byte temp = in.readByte(); (3)
if (temp == current && count < 127) {
++count; (4)
} else { (5)
out.writeByte(count);
out.writeByte(current);
count = 1;
current = temp;
}
}
}
catch (EOFException e) { // Last bytes (6)
out.writeByte(count);
out.writeByte(current);
}
}
catch (IOException e) { (7)
System.out.println("Compression failed: " + e.getMessage());
}
}
1 | In the compress() method, we first open a new DataOutputStream for
a file whose name is the input file name with .compress tacked on the end. |
2 | We read in the first byte of data. |
3 | Then, we keep reading bytes of data from the input file. |
4 | As long as we keep seeing the same byte, we increment a counter. |
5 | When we run into a new byte (or when we reach the limit of 127 of the same consecutive byte), we write the count and the byte we’ve been reading and move on, resetting the counter. |
6 | When an EOFException is thrown, we’ve reached the end of the file.
Because of the way our code is structured, we’ll always have at least one byte
(and possibly a long sequence of matching bytes) that we haven’t yet written
to the file. Consequently, we have to write the counter and the current byte
to finish. |
7 | The method finishes with the usual catch for error cases. |
public static void decompress(DataInputStream in, String file) {
String original = file.substring(0, file.lastIndexOf(".compress")); (1)
try (var out = new DataOutputStream(new FileOutputStream(original))){ (2)
while (true) {
int count = in.readByte(); (3)
byte temp = in.readByte();
for (int i = 0; i < count; ++i) {
out.writeByte(temp);
}
}
}
catch (EOFException e) {} // Input finished (4)
catch (IOException e) { (5)
System.out.println("Decompression failed: " + e.getMessage());
}
}
}
1 | The decompress() method is simpler than compress() . It begins
by finding the original name of the file by removing .compress . Note that
this code will crash if the file being decompressed doesn’t end with
.compress . |
2 | Then, we open a new DataOutputStream for a file with the original name
we’ve found. |
3 | Next, we read a counter, read a byte value, and write the byte value as many times as the count specifies. |
4 | As before, an EOFException signals the end of the input file. |
5 | An additional catch deals with errors. |
21.6. Concurrency: File I/O
By now, you’ve seen threads behave in unpredictable ways because of the way they’re reading and writing to shared variables. Isn’t a file a shared resource as well? What happens when two threads try to access a file at the same time? If both threads are reading from the file, everything should work fine. If the threads are both writing or doing a combination of reading and writing, there can be problems.
As we mentioned in Section 21.3, file operations are operating system dependent. Although Java tries to give a uniform interface, different system calls are happening at a low level. Consequently, the results may be different as well.
Consider the following program that spawns two threads that both print a
series of numbers to a file called concurrent.out
. The first thread
prints the even numbers between 0 and 9,999 while the second thread
prints the odd ones.
import java.io.*;
import java.nio.file.Paths;
public class ConcurrentFileAccess implements Runnable {
private boolean even;
public static void main(String args[]) {
Thread writer1 = new Thread(new ConcurrentFileAccess(true)); (1)
Thread writer2 = new Thread(new ConcurrentFileAccess(false));
writer1.start(); (2)
writer2.start();
}
public ConcurrentFileAccess(boolean even) {
this.even = even;
}
public void run() {
int start = even ? 0 : 1; (3)
try (var out = new PrintWriter(Paths.get("concurrent.out").toFile())) { (4)
for (int i = start; i < 10000; i += 2) { (5)
out.println(i);
}
}
catch (FileNotFoundException e) {
System.out.println("concurrent.out not accessible!");
}
}
}
1 | The code in this program should have few surprises. The main() method
creates two Thread objects from ConcurrentFileAccess objects,
each with a different value for its even field. |
2 | Then, the main()
method starts the threads running. |
3 | In each thread’s run() method, it first decides on a starting point of
0 or 1 depending on whether it’s supposed to be even or odd. Note the use of
the ternary operator. |
4 | Each thread opens the file concurrent.out . |
5 | Then, it starts printing out even or odd numbers, depending on its starting point. |
What do you expect the file concurrent.out
to look like after the
program’s completed? Run it several times, on Windows, Linux, and macOS
systems if you can. The file might contain runs switching back
and forth between even numbers and odd numbers, but it’s likely that half the
numbers, either the evens or the odds, will be missing.
Why are half the numbers getting lost? When you open a file for writing,
it overwrites the contents of the file. Thus, entire sequences of numbers are
getting saved and then lost. We can change this behavior by changing the code
inside the header of the try
given below.
var out = new PrintWriter(Paths.get("concurrent.out").toFile())
We replace it with the following.
var out = new PrintWriter(new FileOutputStream(Paths.get("concurrent.out").toFile(), true))
The PrintWriter
constructor that takes a File
object actually calls another
constructor internally that builds an output stream object. Instead, we can
pass in a FileOutputStream
object created with a second boolean
parameter
set to true
. Doing so creates a FileOutputStream
stream (and consequently a
PrintWriter
) whose output will be appended to the file instead of
overwriting it.
After this change, what does the file look like when we run the program?
Since we’re going to append to the file instead of overwriting, make sure that
you delete concurrent.out
before running the program again. As usual, the
file might look different on different systems. The file probably contains long
runs of numbers from each thread. In fact, it’s possible to have the complete
output from one thread followed by the complete output from the other.
For performance reasons, file operations are usually done in batches.
Instead of writing each number to the file as the thread produces it,
output is usually stored in a buffer which is written as a whole. By calling
out.flush()
after each out.println()
call, we could flush
the buffer to the file after each number is generated. Doing so won’t
be as efficient, but it may give us some insight into how concurrent
writes on files work.
Using flushes, the output from the two threads should be thoroughly intermixed. On a Windows machine, if you copy the data from the file and sort it, it’s possible that you’ll see some numbers missing. This lost output is similar to situations where updates to variables were lost because they were overwritten by another thread. On the other hand, most Linux systems have better concurrent file writing and won’t lose any numbers. (Even on Linux, it’s possible for a number to be printed in the middle of another number, but no digits should be lost.)
Under ideal circumstances, no two threads or processes should be writing
to the same file. However, this situation is sometimes unavoidable, as
with a database program that must support concurrent writes for the sake
of speed. If you need to enforce file locking, you can prevent threads
within your own program from accessing a file concurrently by using
normal Java synchronization tools. If you expect other programs to
interact with the same files that your program will use, Java provides a
FileLock
class which allows the user to lock (portions of) a file, either in
an exclusive way for writing or in a shared way for reading. Using FileLock
requires use of the FileChannel
class, a different way of opening and
interacting with files.
21.7. Exercises
Conceptual Problems
-
What’s the difference between volatile and non-volatile memory? Which is usually associated with files and why?
-
What’s the difference between text and binary files? What are the pros and cons of using each?
-
We can define compression ratio to be the size of the uncompressed data in bytes divided by the size of the compressed data in bytes. What’s the theoretical maximum compression ratio you could get out of the RLE encoding we used? What’s the theoretical lowest compression ratio you could get out of the RLE encoding we used?
-
What’s serialization in Java? What do you have to do to serialize an object?
Programming Practice
-
Write methods with the following signatures.
-
public static int readInt(FileInputStream in)
-
public static long readLong(FileInputStream in)
-
public static short readShort(FileInputStream in)
In each case, the method should read the appropriate number of bytes (4 for
int
, 8 forlong
, and 2 forshort
) using theFileInputStream
object and reassemble those bytes into the integer type specified. Your methods should be compatible with integers written by aDataOutputStream
object. Note that such data is written in big-endian format. In other words, the first byte of data corresponds to the most significant byte in an integer, the second byte of data corresponds to the second-most significant byte, and so on. -
-
Program 21.5 from Example 21.3 computes the areas of circles whose radiuses are given as
double
values stored in a binary file. Although we provided a sample text file, we didn’t show a sample binary file since the contents would look like gibberish. Write a program that reads a file filled withdouble
values stored as text and writes those same values into another file, storing thedouble
values in binary. Afterward, you should be able to convert our sample text file into a sample binary file that can be used with Program 21.5. -
Re-implement the RLE bitmap compression program from Section 21.5 using only
FileInputStream
andFileOutputStream
for file input and output. In some ways, doing so is simpler since you only needbyte
input and output for this program. -
Update the RLE bitmap compression program from Section 21.5 to use
JFileChooser
to allow the user to select a file with a GUI instead of using command-line arguments. -
Re-implement the maze solving program from Section 20.5 to ask the user for a file instead of reading from standard input.
-
An HTML file contains many tags such as
<p>
, which marks the beginning of a paragraph, and</p>
, which marks the end of a paragraph. A lesser known feature of HTML is that ampersand (&
) can mark special HTML entities used to produce symbols on a web page. For example,π
is the entity for the Greek letter π. Because of these features of the language, raw text that’s going to be marked up in HTML should not contain less than signs (<
), greater than signs (>
), or ampersands (&
).Write a program that reads in an input text file specified by the user and writes to an output text file also specified by the user. The output file should be exactly the same as the input file except that every occurrence of a less than sign should be replaced by
<
, every occurrence of a greater than sign should be replaced by>
, and every occurrence of an ampersand should be replaced by&
. -
Write a program that prompts the user for an input text file. Open the file and read each word from the file, where a word is defined as any
String
made up only of upper- and lowercase letters. You can use thenext()
method in theScanner
class to break up text by whitespace, but your code will still need to examine the input character by character, ending a word when any punctuation or other characters are reached. Store each word (with a count of the number of times you find it) in a binary search tree such as those described in Example 20.10. Then, traverse the tree, printing all the words found (and the number of times found) to the screen in alphabetical order. -
Expand the program from Exercise 21.12 so that it also prompts for a second file containing a dictionary in the form of a word list with one word on each line. Store the words from the dictionary in another binary search tree. Then, for each word in the larger document that you can’t find in the dictionary tree, add it to a third binary search tree. Finally, print out the contents of this third binary search tree to the screen, and you will have implemented a rudimentary spell checker. You can test the quality of your implementation by using a novel from Project Gutenberg and a dictionary file from an open-source spell checker or a Scrabble word list.
-
Files can become corrupted when they’re transferred over a network. It’s common to make a checksum, a short code generated using the entire contents of a file. The checksum can be generated before and after file transmission. If both of the checksums match, there’s a good chance that there were no transmission errors. Of course, there can be problems sending checksums, but checksums are much smaller and therefore less likely to be corrupted. Modern checksums are often generated using cryptographic hash functions, which are more complex than we want to deal with here. An older checksum algorithm works in the following way. Although we use mathematical notation, the operations specified below are integer modulus and integer division.
-
Add up the values of all the bytes, storing this sum in a
long
variable -
Set sum = sum mod 232
-
Let r = (sum mod 216) + (sum ÷ 216)
-
Let s = (r mod 216) + (r ÷ 216)
-
The final checksum is s
Remember that finding powers of 2 is easy with bitwise shifts. Write a program that opens a file for binary reading using
FileInputStream
and outputs the checksum described. On Linux systems, you can check the operation of your program with thesum
utility, using the-s
option. The following is an example of the command used on a file calledwombat.dat
. The first number in the output below it,6892
, is the checksum. -
sum -s wombat.dat 6892 213 wombat.dat
Experiments
-
Reading single bytes using either
FileInputStream
orDataInputStream
is slow. It’s much faster to read a block of bytes all at once. Re-implement the RLE bitmap compression program from Section 21.5 using theint read(byte[] b)
method from theDataInputStream
class, which tries to fill the arrayb
with as manybyte
values as it can. If there are enough bytes left in the file, the array will be filled completely. If the array is longer than the remaining bytes, only the first part of the array will contain valid bytes. In either case, this method will return the number ofbyte
values successfully read into the array.Using a
byte
array of length 1,024, time the original program against the new version on files with sizes of about 500 KB, 1 MB, and 2 MB. There’s also avoid write(byte[] b, int off, int len)
method inDataOutputStream
that can write an entire array ofbyte
values at once. Using it for output would further increase the speed of your program at the price of greater complexity. -
Write the RLE bitmap compression program from Section 21.5 in parallel so that a file is evenly divided into as many pieces as you have threads, compressed, and then each compressed portion is output in the correct order. Compare the speed for 2, 4, and 8 threads to the sequential implementation. Are any of the threaded versions faster? Why or why not? Run some experiments to see how long it takes to read 1,000,000 bytes from a file compared to the time it takes to compress 1,000,000 bytes which are already stored in an array.