[SATLUG] character classes with *

Christopher Lemire christopher.lemire at gmail.com
Mon Oct 27 18:50:39 CDT 2014


On Thu, Oct 2, 2014 at 2:44 PM, Wes Henderson <whendersonii at gmail.com> wrote:
> Sorry for not addressing the question, I hope the examples help
> nonetheless; feel free to hit me up off list if you like. As for the
> question, Bruce is correct that Bash is interpreting the special characters
> and quoting is needed. As for the examples that worked without quotes, I
> would advise that you use the quotes anyway as doing so is POSIX compliant
> and thus you can ensure that your command/script will work between
> environments (being explicit is never a bad thing).

Do not confuse globbing with regular expressions.

http://tldp.org/LDP/abs/html/globbingref.html
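
To make the difference concrete, here is a minimal sketch that applies the
same pattern text, [a-z]*, once under glob rules (via case) and once under
regex rules (via grep). The variable names are only for illustration.

```shell
# Compare the same pattern text under glob rules and regex rules.
s='123'

# Glob rules (shell pattern matching): one lowercase letter, then anything.
case "$s" in
  [a-z]*) glob_result=match ;;
  *)      glob_result=no-match ;;
esac

# Regex rules (grep): zero or more lowercase letters, which also
# matches the empty string, so every line matches.
if printf '%s\n' "$s" | grep -q '[a-z]*'; then
  regex_result=match
else
  regex_result=no-match
fi

echo "glob: $glob_result, regex: $regex_result"
```

The string 123 fails the glob because it does not start with a lowercase
letter, but it passes the regex because "zero or more" is satisfied by
nothing at all.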

> Let's do some testing just to be 100% sure what is happening. To do this
> let's run the same command in a new bash instance in debug mode:
>
> [wes@localhost bash-test]$ bash -x -c 'grep [a-z]\* test'
> + grep '[a-z]*' test
> aaLLyLayLaaya*

I'll use -o here to show what is and isn't being matched.

$ echo 'aaLLyLayLaaya*' | grep -o '[a-z]*'
aa
y
ay
aaya

Instead of quoting the whole pattern, you can also escape individual
characters with a backslash, much like escape sequences in Java and C
string literals (strictly speaking, C has no string type, only a
pointer to characters terminated by a null character \0).

In a regular expression, * is a quantifier meaning zero or more of the
preceding item. To match a literal *, escape it with a backslash.

$ echo 'aaLLyLayLaaya*' | grep -o '[a-zA-Z]\*'
a*
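
As a rough sketch, these are three ways I know of to make grep treat the *
literally: a backslash, a bracket expression, and -F for fixed strings.
Note that -o is a GNU/BSD extension rather than POSIX, and the variable
names are just for illustration.

```shell
line='aaLLyLayLaaya*'

# Backslash: \* is a literal asterisk in a basic regular expression.
escaped=$(printf '%s\n' "$line" | grep -o '[a-zA-Z]\*')

# Bracket expression: inside [...] the asterisk has no special meaning.
bracketed=$(printf '%s\n' "$line" | grep -o '[a-zA-Z][*]')

# Fixed string: -F turns off regular expressions entirely.
fixed=$(printf '%s\n' "$line" | grep -oF 'a*')

printf '%s\n' "$escaped" "$bracketed" "$fixed"
```

All three print a*, the final letter followed by the literal asterisk.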

> [wes@localhost bash-test]$ bash -x -c 'grep "[a-z]*" test'
> + grep '[a-z]*' test
> aaLLyLayLaaya*
> [wes@localhost bash-test]$ bash -x -c 'grep [a-z]* test'
> + grep test test

Both bash and grep give * a special meaning. Bash treats it as a
filename expansion (globbing) wildcard, not as a regular expression
quantifier.

In your last example above, grep never sees the * character because
bash has already expanded the pattern to the file name test.

Here's an example with another character that has special meaning, $
for variables.

$ echo "$BASH"
/usr/bin/bash

$ echo '$BASH'
$BASH

$ bash -xvc 'echo "aaLLyLayLaaya'*'" | grep -o "'\[a-zA-Z\]\*'"'
echo "aaLLyLayLaaya*" | grep -o "[a-zA-Z]*"
+ grep -o '[a-zA-Z]*'
+ echo 'aaLLyLayLaaya*'
aaLLyLayLaaya

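At first this looked to me like a bug in the -v flag, but it is not.
The first line of the output comes from -v, which prints the command
string as the inner bash read it, after the outer shell removed the
single quotes. The + lines come from -x, which prints each command
after expansion, re-quoted with single quotes. The * sits inside
double quotes in both places, so bash does not expand it.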

$ touch aaLLyLayLaaya{1,2,3}
$ ls aaLLyLayLaaya*
aaLLyLayLaaya1  aaLLyLayLaaya2  aaLLyLayLaaya3
$ echo aaLLyLayLaaya*
aaLLyLayLaaya1 aaLLyLayLaaya2 aaLLyLayLaaya3
$ rm aaLLyLayLaaya{1..3}
$ echo aaLLyLayLaaya*
aaLLyLayLaaya*             # when no file matches, bash leaves the
pattern unexpanded and passes it on literally (unless nullglob or
failglob is set). That is the same for built-ins like echo and for
external commands like ls, which then looks for a file of that name.
$ ls aaLLyLayLaaya*
ls: cannot access aaLLyLayLaaya*: No such file or directory
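
What bash does with a pattern that matches nothing depends on shell
options. A small sketch, run in an empty scratch directory, comparing the
default with the nullglob and failglob options (the variable names are
hypothetical):

```shell
# Work in an empty scratch directory so the pattern cannot match anything.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Default: the unmatched pattern is passed through literally.
default_out=$(bash -c 'echo aaLLyLayLaaya*')

# nullglob: the unmatched pattern expands to nothing at all.
nullglob_out=$(bash -c 'shopt -s nullglob; echo aaLLyLayLaaya*')

# failglob: the unmatched pattern is an error and the command is not run.
if bash -c 'shopt -s failglob; echo aaLLyLayLaaya*' >/dev/null 2>&1; then
  failglob_result=ran
else
  failglob_result=failed
fi

echo "default:  [$default_out]"
echo "nullglob: [$nullglob_out]"
echo "failglob: $failglob_result"

# Clean up the scratch directory.
cd / && rm -rf "$workdir"
```

With the default settings both echo and ls receive the literal string
aaLLyLayLaaya*; echo prints it, and ls reports that no such file exists.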

It appears that in your first example you were trying to match the
entire line. I thought you were the one having the issue; my mistake.

> [wes@localhost bash-test]$ bash -x -c 'grep [a-z]\* test'
> + grep '[a-z]*' test
> aaLLyLayLaaya*

I didn't notice the \ in your command; the debugging output does not
show it because bash has already removed it. If you want a literal
backslash to reach grep, you can escape the escape :P

$ bash -x -c 'grep -o [a-z]\\* test'
+ grep -o '[a-z]\*' test
a*

That gives different results from your command. Here is your original
command with only -o added:

$ bash -x -c 'grep -o [a-z]\* test'
+ grep -o '[a-z]*' test
aa
y
ay
aaya

> As you can see the third example did NOT work and bash interpreted the
> command to be 'grep test test'. Odd. Let's make some new files and test some
> more:

It may seem odd, but under bash filename expansion (globbing), [a-z]*
matches your file test. Bash has already expanded the pattern, so grep
never sees it. If you want the pattern to reach grep unchanged, nesting
single quotes inside double quotes should do it.

$ bash -x -c "grep -o '[a-z]*' test"
+ grep -o '[a-z]*' test
aa
y
ay
aaya
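
As a quick check that the different quoting styles in this thread hand
grep the same pattern, here is a sketch that runs three spellings against
a scratch copy of the test file and compares the results. The scratch
directory and the variable names are assumptions made for the example.

```shell
# Use the C locale so [a-z] means exactly the ASCII lowercase letters.
LC_ALL=C; export LC_ALL

# Recreate the one-line test file in an otherwise empty scratch directory.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
printf '%s\n' 'aaLLyLayLaaya*' > test

# Single quotes nested inside double quotes.
nested=$(bash -c "grep -o '[a-z]*' test")

# Double quotes nested inside single quotes.
swapped=$(bash -c 'grep -o "[a-z]*" test')

# No quotes for the inner shell, only a backslash in front of the *.
escaped=$(bash -c 'grep -o [a-z]\* test')

if [ "$nested" = "$swapped" ] && [ "$swapped" = "$escaped" ]; then
  echo "all three spellings give the same matches"
fi

# Clean up the scratch directory.
cd / && rm -rf "$workdir"
```

The backslash spelling only works here because no file in the directory is
named like a lowercase letter followed by a literal *; quoting the whole
pattern avoids depending on that.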

> [wes@localhost bash-test]$ touch test2 test3 another_file
>
> [wes@localhost bash-test]$ bash -x -c 'grep [a-z]* test'
> + grep another_file test test2 test3 test
>
> So it is pretty clear that bash, rather than grep, is interpreting the
> special characters. I believe that the big take away here should be that
> being explicit is the safest and most reliable way to achieve your goal.
>
> [wes@localhost bash-test]$ python -c 'import this'|grep Explicit
> Explicit is better than implicit.

Agreed. However, Python is less explicit in one respect, as it is a
dynamically typed language.

> http://www.gnu.org/software/bash/manual/bashref.html#Double-Quotes

Christopher Lemire

