How to determine the number of grouped numbers in a string in bash
I have a string in bash:
string="123 abc 456"
Digits that are grouped together count as one number, so "123" and "456" are the numbers in this case.
How can I determine how many such grouped numbers a string contains?
For example, "123" is a string with just one number, and "123 abc 456" is a string with 2 numbers.
2 Answers
egrep -o '[0-9]+' <<<"$string" | wc -l
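egrep: equivalent to grep -E; matches using extended regular expressions.
-o: prints only the matched parts of the input, one match per line.
'[0-9]+': matches one or more consecutive digits.
<<<: a here-string, which feeds the value of $string to the command's standard input.
|: pipes the output of egrep into wc.
wc: counts lines, words and bytes.
-l: prints only the number of lines, which here equals the number of matches.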
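The command can also be wrapped in a small function for reuse. This is a minimal sketch rather than part of the original answer: the function name count_digit_groups is hypothetical, printf stands in for the here-string, and it uses grep -E with the POSIX class [[:digit:]], on the assumption that these behave the same as egrep and [0-9] for this kind of input.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print how many groups of consecutive digits
# appear in the first argument.
count_digit_groups() {
    # -E: extended regular expressions; -o: print each match on its own line.
    # wc -l then counts those lines, i.e. the number of matches.
    printf '%s\n' "$1" | grep -Eo '[[:digit:]]+' | wc -l
}

count_digit_groups "123"          # one group
count_digit_groups "123 abc 456"  # two groups
```

Some implementations of wc pad the count with leading spaces, so compare the result numerically rather than as a string.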
Is there any way to adapt your solution to work with floats?
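The regular expression that matches both integer numbers and floating point decimal numbers would be something like this: '[0-9]*\.?[0-9]+'. Inserting this into the command above in place of its predecessor forms this command chain:
egrep -o '[0-9]*\.?[0-9]+' <<<"$string" | wc -l
Focussing now only on the regular expression, here's how it works:
[0-9]* matches the digits before the decimal point. [0-9] matches a single digit and * repeats it any number of times, so it matches "2", "26" or "4839583". In "9.99", it matches the "9" before the point, and the "99" after the point is left to the final part of the expression. Because * also allows zero repetitions, the leading digits may be absent.
\. matches a literal decimal point, ".". The backslash matters: an unescaped . matches any single character, such as the "s" in "28s".
? makes the decimal point optional, which is what lets plain integers match. Like *, it permits zero occurrences, but it permits at most one.
[0-9]+ matches the digits that end the number; + requires at least one of them.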
Applying this to the following string:
"The value of pi is approximately 3.14159. The value of e is about 2.71828. The Golden Ratio is approximately 1.61803, which can be expressed as (√5 + 1)/2."
yields the following matches (one per line):
3.14159
2.71828
1.61803
5
1
2
And when this is piped through the wc -l command, it returns a count of the lines, which is 6, i.e. the supplied string contains 6 occurrences of number strings, which includes integers and floating point decimals.
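As a quick check of that count, the sketch below runs the pattern against the example string. It is an illustration only: it assumes grep -E is available as the equivalent of egrep, it pipes the string in with printf rather than a here-string, and the names pattern and count_numbers are hypothetical.

```shell
#!/usr/bin/env bash
string="The value of pi is approximately 3.14159. The value of e is about 2.71828. The Golden Ratio is approximately 1.61803, which can be expressed as (√5 + 1)/2."

# Optional leading digits, an optional literal decimal point,
# then at least one digit.
pattern='[0-9]*\.?[0-9]+'

# Hypothetical helper: count the matches of the pattern in the first argument.
count_numbers() {
    printf '%s\n' "$1" | grep -Eo "$pattern" | wc -l
}

# List the matches, one per line, then print how many there are.
printf '%s\n' "$string" | grep -Eo "$pattern"
count_numbers "$string"
```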
If you wanted only the floating point decimals, and to exclude the integers, the regular expression is this:
'[0-9]*\.[0-9]+'
If you look carefully, it's identical to the previous regular expression, except for the missing ? operator. If you recall, the ? made the decimal point an optional feature to match; removing this operator now means the decimal point must be present. Likewise, the + operator is matching at least one instance of a digit following the decimal point. However, the * operator before it matches any number of digits, including zero digits. Therefore, "0.61803" would be a valid match (if it were present in the string, which it isn't), and ".33333" would also be a valid match, since the digits before the decimal point needn't be there thanks to the * operator. However, whilst "1.1111" could be a valid match, "1111." would not be, because the + operator dictates that there must be at least one digit following the decimal point.
Putting it into the command chain:
egrep -o '[0-9]*\.[0-9]+' <<<"$string" | wc -l
returns a value of 3, for the three floating point decimals occurring in the string. If you remove the | wc -l portion of the command, you will see them in the terminal output as:
3.14159
2.71828
1.61803
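The edge cases described above can be checked directly. This is a small sketch, not part of the original answer: the helper name count_floats is hypothetical, and grep -E is assumed to be equivalent to egrep.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count matches of the floats-only pattern
# (optional leading digits, a literal decimal point, at least one digit).
count_floats() {
    printf '%s\n' "$1" | grep -Eo '[0-9]*\.[0-9]+' | wc -l
}

# "1111." is the only candidate with no digit after the decimal point.
for candidate in "0.61803" ".33333" "1.1111" "1111."; do
    echo "$candidate -> $(count_floats "$candidate")"
done
```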
For reasons I won't go into, matching integers exclusively and excluding floating point decimals is harder to accomplish without Perl-flavoured regular expression matching (which egrep is not). However, since you're really only interested in the number of these occurrences, rather than the matches themselves, we can create a regular expression that doesn't need to worry about accurate matching of integers, as long as it produces the same number of matched items. This expression:
'[^.0-9][0-9]+(\.([^0-9]|$)|[^.])'
seems to be good enough for counting the integers in the string, which includes the 5, 1 and 2 (ignoring, of course, the √ symbol), returning these approximately matched substrings:
√5
1)
/2.
I haven't tested it that thoroughly, however, and only formulated it tonight when I read your comment. But, hopefully, you are beginning to get a rough sense of what's going on.
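A different way to get the same count, sketched here as an alternative and not as the method used above, is to split the work into two steps: extract every number token first, then count the tokens that contain no decimal point. The helper name count_integers and the tokenising pattern are assumptions made for this illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count integers, ignoring floating point decimals.
count_integers() {
    printf '%s\n' "$1" |
        # Step 1: extract every number token, float or integer, one per line.
        grep -Eo '[0-9]+(\.[0-9]+)?|\.[0-9]+' |
        # Step 2: count the tokens that contain no decimal point.
        grep -vc '\.'
}

count_integers "pi is 3.14159, and (5 + 1)/2. is an integer expression"
```

Because the first step consumes each float as a whole token, the digits on either side of a decimal point are never counted separately. A trailing point, as in "/2.", is left out of the token, so the 2 still counts as an integer.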
\d should be supported by ERE, hence the use of egrep rather than grep, which does work in my version of bash on macOS. But, that said, I can always update the answer to use [0-9] instead. – CJK, Aug 20 at 15:32
The POSIX ERE spec doesn't mention \d; it looks like it's a BSD extension to ERE. grep -Eo '[[:digit:]]+' would be fully portable, I think; [0-9] potentially expands to something other than [0123456789] in some locales. – Benjamin W., Aug 20 at 15:41
You still need the -E or egrep for + to work, though. – Benjamin W., Aug 20 at 15:41
Thanks for the tips and corrections, @BenjaminW. Most helpful. – CJK, Aug 20 at 15:46
In case you need to know the number of groups of digits in a string, the following may help you.
string="123 abc 456"
echo "$string" | awk '{print gsub(/[0-9]+/,"")}'
Explanation: the following annotated version is for explanation purposes only.
string="123 abc 456"       ##Create a variable named string with the value 123 abc 456.
echo "$string" |           ##Print the value of string and pipe it into the awk command.
awk '{                     ##Start the awk program.
  print gsub(/[0-9]+/,"")  ##gsub replaces each group of digits with "" (the empty string) and returns
                           ##the number of substitutions made, which equals the number of digit groups.
}'                         ##Close the awk program.
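The one-liner prints one count per input line. If the string may span several lines, the counts can be summed, as in this minimal sketch; the function name count_digit_groups_awk and the variable total are hypothetical and not part of the original answer.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count digit groups across all lines of the argument.
count_digit_groups_awk() {
    printf '%s\n' "$1" | awk '
        { total += gsub(/[0-9]+/, "") }   # gsub returns the number of replacements
        END { print total + 0 }           # + 0 prints 0 rather than an empty line
    '
}

count_digit_groups_awk "123 abc 456"
count_digit_groups_awk "1 2
3 four"
```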
\d is supported by neither BRE nor ERE; you'd probably have to use grep -P for this to work, or use [[:digit:]] instead. – Benjamin W., Aug 20 at 15:29