Glob Expression Understanding

Glob expression syntax, and its usage in Java through Path Matcher and Directory Stream.

Overview

In computer programming, glob patterns specify sets of filenames with wildcard characters. There are many cases you can use glob expression, when using Bash, your IDE, or other programs for file searching. The origin of glob comes from the glob command, and was provided as a library function, glob() later on. In this article, we will take look together the glob expression in Java.

After reading this article, you will understand:

  • Basic Glob Syntax
  • Glob in Path Matcher
  • Glob in Directory Stream

Now, let’s get started!

Basic Glob Syntax

Wildcard Description
* matches any number of any characters including none
? matches any single character
[abc] matches one character given in the bracket
[a-z] matches one character from the (locale-dependency) range given in the bracket.

In all cases, the path separator (/ on Unix or \ on Windows) will never be matched. Now, let’s take a look on some examples:

The * Character

The * character matches ≥ 0 characters of a name component without crossing directory boundaries. For example, given the following expression:

*.txt

The matched / unmatched items are:

  • /bar.txt
  • /bar.md
  • /foo/bar.txt
  • /foo/bar.md
  • bar.txt
  • bar.md

The ** Characters

The ** characters matches ≥ 0 characters crossing directory boundaries. For example, given the following expression:

**.txt

The matched / unmatched items are:

  • /bar.txt
  • /bar.md
  • /foo/bar.txt
  • /foo/bar.md
  • bar.txt
  • bar.md

The ? Character

The ? character matches exactly one character of a name component. For example, given the following expression:

?.txt

The matched / unmatched items are:

  • /foo/a.txt
  • /foo/b.txt
  • a.txt
  • b.txt
  • .txt
  • foo.txt

The [] Characters

The [ ] characters are a bracket expression that match a single character of a name component out of a set of characters. For example, [abc] matches “a”, “b”, or “c”. The hyphen (-) may be used to specify a range so [a-z] specifies a range that matches from “a” to “z” (inclusive). These forms can be mixed so [abce-g] matches “a”, “b”, “c”, “e”, “f” or “g”. If the character after the [ is a ! then it is used for negation so [!a-c] matches any character except “a”, “b”, or “c”.

For example, given the following expression:

[abc].txt

The matched / unmatched items are:

  • /foo/a.txt
  • /foo/b.txt
  • /foo/c.txt
  • a.txt
  • b.txt
  • c.txt
  • d.txt
  • ab.txt

Another example, with the following expression:

/foo/[!abc]*.txt

The matched / unmatched items are:

  • /foo/a.txt
  • /foo/b.txt
  • /foo/c.txt
  • /foo/d.txt
  • /foo/e.txt
  • /foo/efg.txt
  • a.txt
  • b.txt
  • c.txt
  • d.txt

Within a bracket expression the *, ? and \ characters match themselves. The (-) character matches itself if it is the first character within the brackets, or the first character after the ! if negating.

Wildcard Expressions

After all these examples, we have a basic understanding of how different glob syntax works individually. But it’s still not clear how their combination works. In particular, the wildcard expressions looks very similiar and confusing. Here is a table of comparison to clarify the usage of wildcard expressions *.txt, **.txt, **/*.txt, and /**/*.txt. Character “M” means matched and “-“ means unmatched:

Path *.txt **.txt **/*.txt /**/*.txt
/bar.txt - M M -
/foo/bar.txt - M M M
/foo/bar/baz.txt - M M M
foo/bar.txt - M M -
bar.txt M M - -

Glob in Path Matcher

In Java, you can match the path with glob expression via java.nio.file.PathMatcher. A Path Match can be created using FileSystem#getPathMatcher(String), which accepts a syntax (glob / regex) and pattern as input parameter:

syntax:pattern

For example, using **.java to find all Java files in current directory and all sub-directories:

PathMatcher m = FileSystems.getDefault().getPathMatcher("glob:**.java");
m.matches(Paths.get("/src/Foo.java")); // true
m.matches(Paths.get("/src/Bar.java")); // true

You can combine it with many other use cases.

Glob in Directory Stream

Glob expression can also be applied to directory stream, an object to iterate over the entries in a directory. DirectoryStream extends Iterable, so you can iterate all the paths in the stream. For example:

try (DirectoryStream<Path> paths = Files.newDirectoryStream(dir, "*.txt")) {
  for (Path path : paths) {
    ...
  }
}

However, you should be aware that the directory stream is a listing on the target directory, without going through the child directories recursively.

  • a.txt
  • b.txt
  • sub/a.txt
  • sub/b.txt

Going Further

How to go further from here?

You can also find the source code of this article on GitHub.

Conclusion

In this article, we learnt the basic syntax of glob expression with examples, we compared different wildcard expressions, we saw how to use glob via PathMatcher, and how to use glob via DirectoryStream. Interested to know more? You can subscribe to the feed of my blog, follow me on Twitter or GitHub. Hope you enjoy this article, see you the next time!

References