Overview

Today, I will show you how to unzip (extract) a ZIP file into a directory in Java. I recently needed a code snippet for extracting a ZIP file for QA purposes, but the top results on the search engine did not work, so I decided to share my own implementation. After reading this article, you will understand:

  • How to unzip a given ZIP file
  • Required and optional parameters before launching the unzip command
  • Limitations

Now let’s get started.

TL;DR

If you don’t have time to read the entire article, here is the summary. You can copy-paste the following code snippet and supply 2 parameters: the path of the ZIP file to extract (sourceZip) and the target directory that receives the extracted files (targetDir). Note that entries are written relative to the target directory. For example, if tomcat.zip contains a top-level tomcat folder, extracting it to the ~/Downloads target directory stores the files at ~/Downloads/tomcat.

/**
 * Execute the unzip command.
 *
 * @throws IOException if any I/O error occurs
 */
public void exec() throws IOException {
  Path root = targetDir.normalize();
  try (InputStream is = Files.newInputStream(sourceZip);
      ZipInputStream zis = new ZipInputStream(is)) {
    ZipEntry entry = zis.getNextEntry();
    while (entry != null) {
      Path path = root.resolve(entry.getName()).normalize();
      if (!path.startsWith(root)) {
        throw new IOException("Invalid ZIP");
      }
      if (entry.isDirectory()) {
        Files.createDirectories(path);
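      } else {
        // Archives may omit directory entries, so create the parent first.
        Path parent = path.getParent();
        if (parent != null) {
          Files.createDirectories(parent);
        }
        try (OutputStream os = Files.newOutputStream(path)) {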
          byte[] buffer = new byte[1024];
          int len;
          while ((len = zis.read(buffer)) > 0) {
            os.write(buffer, 0, len);
          }
        }
      }
      entry = zis.getNextEntry();
    }
    zis.closeEntry();
  }
}
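
To try the snippet without wiring it into a class of your own, here is a minimal round-trip sketch. The class name UnzipRoundTrip, the helper createSampleZip and the sample entry names are hypothetical and exist only for this illustration. The unzip method follows the extraction loop above, takes the two parameters as arguments, and also makes sure parent directories exist.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical demo class, not part of the original implementation. */
class UnzipRoundTrip {

  /** Writes a small archive whose single file sits under a "tomcat/" folder. */
  static Path createSampleZip(Path dir) throws IOException {
    Path zip = dir.resolve("tomcat.zip");
    try (OutputStream os = Files.newOutputStream(zip);
        ZipOutputStream zos = new ZipOutputStream(os)) {
      zos.putNextEntry(new ZipEntry("tomcat/"));
      zos.closeEntry();
      zos.putNextEntry(new ZipEntry("tomcat/README.txt"));
      zos.write("hello".getBytes(StandardCharsets.UTF_8));
      zos.closeEntry();
    }
    return zip;
  }

  /** Extracts sourceZip into the existing directory targetDir. */
  static void unzip(Path sourceZip, Path targetDir) throws IOException {
    Path root = targetDir.normalize();
    try (InputStream is = Files.newInputStream(sourceZip);
        ZipInputStream zis = new ZipInputStream(is)) {
      ZipEntry entry = zis.getNextEntry();
      while (entry != null) {
        Path path = root.resolve(entry.getName()).normalize();
        if (!path.startsWith(root)) {
          throw new IOException("Invalid ZIP");
        }
        if (entry.isDirectory()) {
          Files.createDirectories(path);
        } else {
          // Make sure the parent exists even if the archive has no directory entry.
          Path parent = path.getParent();
          if (parent != null) {
            Files.createDirectories(parent);
          }
          try (OutputStream os = Files.newOutputStream(path)) {
            byte[] buffer = new byte[1024];
            int len;
            while ((len = zis.read(buffer)) > 0) {
              os.write(buffer, 0, len);
            }
          }
        }
        entry = zis.getNextEntry();
      }
    }
  }

  public static void main(String[] args) throws IOException {
    Path work = Files.createTempDirectory("unzip-demo");
    Path zip = createSampleZip(work);
    Path target = Files.createDirectory(work.resolve("out"));
    unzip(zip, target);
    // The file lands under the target directory, inside the archive's own folder.
    System.out.println(Files.exists(target.resolve("tomcat").resolve("README.txt")));
  }
}
```

The sample archive is written to a temporary directory, so running the demo does not touch any existing files.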

Now, if you are interested in the complete version, let me explain the longer story.

Usage

My unzip command implementation uses the builder pattern so that you can pass arguments as named parameters before launching the unzip command. There are currently 3 parameters:

Parameter    Required   Description
sourceZip    yes        Source filepath to unzip.
targetDir    yes        Target directory where the unzipped files should be placed. The given input has to be an existing directory.
bufferSize   no         Byte size for the unzip buffer. The value must be positive. Defaults to 1024 bytes.

Here are two examples of usage:

UnzipCommand cmd =
    UnzipCommand.newBuilder()
        .sourceZip(sourceZip)
        .targetDir(targetDir)
        .build();
cmd.exec();

UnzipCommand cmd =
    UnzipCommand.newBuilder()
        .sourceZip(sourceZip)
        .targetDir(targetDir)
        .bufferSize(2048)  // optional
        .build();
cmd.exec();

Any I/O failure is reported as an I/O exception (java.io.IOException).
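
Since the target directory has to exist already, it can help to check the inputs before launching the command, so that a wrong path fails with a clear message. The builder shown in the next section only rejects null paths and non-positive buffer sizes. The following is a minimal sketch of such a pre-check; the class UnzipPreconditions and its check method are hypothetical helpers, not part of my implementation.

```java
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper that validates the three parameters up front. */
class UnzipPreconditions {

  /** Throws IllegalArgumentException if a parameter cannot work. */
  static void check(Path sourceZip, Path targetDir, int bufferSize) {
    if (!Files.isRegularFile(sourceZip)) {
      throw new IllegalArgumentException("Not a regular file: " + sourceZip);
    }
    if (!Files.isDirectory(targetDir)) {
      throw new IllegalArgumentException("Not an existing directory: " + targetDir);
    }
    if (bufferSize <= 0) {
      throw new IllegalArgumentException("Required positive value, but bufferSize=" + bufferSize);
    }
  }
}
```

Calling UnzipPreconditions.check(sourceZip, targetDir, 1024) before build() keeps configuration mistakes separate from real I/O failures during extraction.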

Implementation

Here is my implementation (see it on GitHub):

package io.mincongh.io;

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Objects;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/**
 * @author Mincong Huang
 * @since 0.1
 */
public class UnzipCommand {

  public static Builder newBuilder() {
    return new Builder();
  }

  public static class Builder {
    private Path targetDir;
    private Path sourceZip;
    private int byteSize = 1024;

    private Builder() {}

    /**
     * (REQUIRED) Source filepath to unzip.
     *
     * @param zip the filepath to unzip
     * @return this
     */
    public Builder sourceZip(Path zip) {
      this.sourceZip = zip;
      return this;
    }

    /**
     * (REQUIRED) Target directory where the unzipped files should be placed. The given input has to
     * be an existing directory.
     *
     * <p>Example: Unzipping "/source/foo.zip", whose entries sit under a top-level "foo/" folder,
     * to target directory "/target/" puts the results in directory "/target/foo/".
     *
     * @param dir existing target directory
     * @return this
     */
    public Builder targetDir(Path dir) {
      this.targetDir = dir;
      return this;
    }

    /**
     * (OPTIONAL) Byte size for the unzip buffer. The value must be positive. Defaults to 1024
     * bytes.
     *
     * @param byteSize byte size for the unzip buffer
     * @return this
     */
    public Builder bufferSize(int byteSize) {
      this.byteSize = byteSize;
      return this;
    }

    public UnzipCommand build() {
      Objects.requireNonNull(sourceZip);
      Objects.requireNonNull(targetDir);
      if (byteSize <= 0) {
        throw new IllegalArgumentException("Required positive value, but byteSize=" + byteSize);
      }
      return new UnzipCommand(this);
    }
  }

  private final int byteSize;
  private final Path sourceZip;
  private final Path targetDir;

  private UnzipCommand(Builder builder) {
    this.byteSize = builder.byteSize;
    this.sourceZip = builder.sourceZip;
    this.targetDir = builder.targetDir;
  }

  /**
   * Execute the unzip command.
   *
   * @throws IOException if any I/O error occurs
   */
  public void exec() throws IOException {
    Path root = targetDir.normalize();
    try (InputStream is = Files.newInputStream(sourceZip);
        ZipInputStream zis = new ZipInputStream(is)) {
      ZipEntry entry = zis.getNextEntry();
      while (entry != null) {
        Path path = root.resolve(entry.getName()).normalize();
        if (!path.startsWith(root)) {
          throw new IOException("Invalid ZIP");
        }
        if (entry.isDirectory()) {
          Files.createDirectories(path);
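        } else {
          // Archives may omit directory entries, so create the parent first.
          Path parent = path.getParent();
          if (parent != null) {
            Files.createDirectories(parent);
          }
          try (OutputStream os = Files.newOutputStream(path)) {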
            byte[] buffer = new byte[byteSize];
            int len;
            while ((len = zis.read(buffer)) > 0) {
              os.write(buffer, 0, len);
            }
          }
        }
        entry = zis.getNextEntry();
      }
      zis.closeEntry();
    }
  }
}

In my implementation, a file input stream and a ZIP input stream are used to read and extract entries. Both are closed automatically and safely by the try-with-resources statement. Each item in the ZIP file is represented as a ZIP entry (java.util.zip.ZipEntry) and is visited through the ZIP input stream. Every entry is visited exactly once, and the iteration ends when getNextEntry() returns null. Note that a ZIP entry can be either a directory or a regular file, and the two need to be treated differently: a directory is created, whereas a regular file has its content copied to disk. The size of the buffer (byte array) used for copying is controlled by the parameter bufferSize, which defaults to 1024 bytes.
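
To make the iteration easier to see in isolation, here is a small sketch that only lists the entries of an archive without writing anything to disk. The class ZipEntryLister, its methods and the sample entry names are hypothetical and used only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical demo class that walks a ZIP stream entry by entry. */
class ZipEntryLister {

  /** Returns one line per entry, prefixed with its kind. */
  static List<String> list(InputStream in) throws IOException {
    List<String> lines = new ArrayList<>();
    try (ZipInputStream zis = new ZipInputStream(in)) {
      ZipEntry entry;
      // getNextEntry() returns null once every entry has been visited.
      while ((entry = zis.getNextEntry()) != null) {
        String kind = entry.isDirectory() ? "dir  " : "file ";
        lines.add(kind + entry.getName());
      }
    }
    return lines;
  }

  /** Builds a tiny in-memory archive with one directory and one file. */
  static byte[] sampleZip() throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
      zos.putNextEntry(new ZipEntry("docs/"));
      zos.closeEntry();
      zos.putNextEntry(new ZipEntry("docs/a.txt"));
      zos.write("a".getBytes(StandardCharsets.UTF_8));
      zos.closeEntry();
    }
    return bytes.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    for (String line : list(new ByteArrayInputStream(sampleZip()))) {
      System.out.println(line);
    }
  }
}
```

An entry whose name ends with a slash is reported as a directory by isDirectory(), which is what the extraction code relies on to choose between creating a directory and copying a file.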

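Update: my friend Florent Guillaume pointed out that the previous version was vulnerable to the Zip Slip attack, where a crafted entry name such as ../../evil.sh escapes the target directory. The source code has since been updated: each resolved path is normalized and must start with the target directory, otherwise the extraction fails with an I/O exception.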
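
The check can be illustrated on its own, without any archive. Below is a minimal sketch, where the class ZipSlipGuard, the method resolveSafely and the example paths are hypothetical names chosen for this demonstration.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical demo class isolating the Zip Slip check. */
class ZipSlipGuard {

  /** Resolves an entry name under root, rejecting names that escape it. */
  static Path resolveSafely(Path root, String entryName) throws IOException {
    Path normalizedRoot = root.normalize();
    // normalize() collapses ".." segments, so an escaping name ends up outside root.
    Path resolved = normalizedRoot.resolve(entryName).normalize();
    // Path.startsWith compares whole name elements, not raw string prefixes.
    if (!resolved.startsWith(normalizedRoot)) {
      throw new IOException("Invalid ZIP entry: " + entryName);
    }
    return resolved;
  }

  public static void main(String[] args) {
    Path root = Paths.get("/tmp/out");
    String[] names = {"docs/readme.txt", "../../etc/passwd"};
    for (String name : names) {
      try {
        System.out.println("accepted: " + resolveSafely(root, name));
      } catch (IOException e) {
        System.out.println("rejected: " + name);
      }
    }
  }
}
```

Comparing with Path.startsWith rather than String.startsWith matters here: a sibling directory such as /tmp/out-evil shares the string prefix /tmp/out but is not inside it, and the path-based comparison rejects it.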

Limitations

  • File permissions are not preserved. When the ZIP file contains an executable entry, for example one with mode rwxr-xr-x, the execute permission is lost on extraction.
  • On Windows (Windows 10), the source code has been tested manually, because Travis CI does not support Windows builds for Java projects. Let me know if you find a bug.
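
For the first limitation, one possible workaround is to restore the permissions yourself after extraction, assuming you already know which extracted files must be executable. The sketch below is not part of my implementation: the class ExecutableBitRestorer and its method are hypothetical, and the mode rwxr-xr-x is just the example used above. It does nothing on file systems without POSIX permissions, such as a typical Windows setup.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

/** Hypothetical helper that re-applies an executable mode after extraction. */
class ExecutableBitRestorer {

  /**
   * Sets the mode rwxr-xr-x on the given file.
   *
   * @return true if the permissions were set, false if the file system has no POSIX view
   */
  static boolean markExecutable(Path file) throws IOException {
    if (!file.getFileSystem().supportedFileAttributeViews().contains("posix")) {
      return false; // e.g. NTFS on Windows: nothing to restore
    }
    Set<PosixFilePermission> mode = PosixFilePermissions.fromString("rwxr-xr-x");
    Files.setPosixFilePermissions(file, mode);
    return true;
  }
}
```

The caller decides which paths to pass in, for instance the scripts of a bin folder. The original mode stored in the archive is not read here, so this only approximates what a permission-aware extractor would do.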

Conclusion

Today, we saw how to unzip a ZIP file in Java 8+ using java.util.zip.*, more precisely using ZipEntry and ZipInputStream. The source code is available on GitHub in mincong-h/java-examples as UnzipCommand.java. Interested in learning more about Java? You can subscribe to my feed or follow me on Twitter or GitHub. I hope you enjoyed this article. See you next time!
