How to determine the number of grouped numbers in a string in bash




I have a string in bash


string="123 abc 456"



Where numbers that are grouped together are considered 1 number.
"123" and "456" would be considered numbers in this case.



How can I determine the number of grouped-together numbers?



so


"123"



is determined to be a string with just one number, and


"123 abc 456"



is determined to be a string with 2 numbers.




2 Answers


egrep -o '[0-9]+' <<<"$string" | wc -l


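egrep: searches the input using extended regular expressions.
-o: prints only the matched parts of the input, one match per line.
'[0-9]+': matches one or more consecutive digits.
<<<: a here-string, which feeds the value of $string to egrep's standard input.
|: pipes the output of egrep into wc.
wc -l: counts the lines it receives, which here equals the number of matches.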
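For reuse, the pipeline above can be wrapped in a small function. This is only a minimal sketch: the name count_numbers is hypothetical, grep -E stands in for egrep, and printf replaces the here-string so the function also runs in shells that lack <<<.

```shell
# count_numbers (hypothetical helper name): print how many groups of
# consecutive digits the first argument contains.
count_numbers() {
  # grep -E behaves like egrep; -o prints each match on its own line.
  # Some wc implementations pad the count with spaces, so tr removes them.
  printf '%s\n' "$1" | grep -Eo '[0-9]+' | wc -l | tr -d ' '
}

count_numbers "123"          # prints 1
count_numbers "123 abc 456"  # prints 2
```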



Is there any way to adapt your solution to work with floats?



The regular expression that matches both integer numbers and floating point decimal numbers would be something like this: '[0-9]*\.?[0-9]+'. Inserting this into the command above in place of its predecessor forms this command chain:


egrep -o '[0-9]*\.?[0-9]+' <<<"$string" | wc -l



Focussing now only on the regular expression, here's how it works:


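[0-9]* matches zero or more digits, the part of the number before the decimal point. On its own, [0-9] matches a single digit; with the * operator it matches "2", "26" or "4839583" in full, or nothing at all.
\. matches a literal decimal point ".", so "9.99" is matched as one number rather than as "9" and "99". The backslash matters: an unescaped . matches any character, so in "28s" it would also match the "s".
? makes the decimal point optional, which is what lets the expression match integers as well as decimals.
[0-9]+ matches the digits at the end of the number. Unlike *, the + operator requires at least one digit, so every match ends in a digit.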



Applying this to the following string:



"The value of pi is approximately 3.14159. The value of e is about 2.71828. The Golden Ratio is approximately 1.61803, which can be expressed as (√5 + 1)/2."





yields the following matches (one per line):


3.14159
2.71828
1.61803
5
1
2



When this is piped through the wc -l command, it returns a count of the lines, which is 6; that is, the supplied string contains 6 number strings, counting both integers and floating point decimals.
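The count can be reproduced with a short script. This is a sketch under a few assumptions: the variable names are hypothetical, grep -E is used in place of egrep, and the decimal point is escaped as \. so that it matches only a literal dot.

```shell
# Sample text from the answer above.
text="The value of pi is approximately 3.14159. The value of e is about 2.71828. The Golden Ratio is approximately 1.61803, which can be expressed as (√5 + 1)/2."

# number_regex (hypothetical name): optional integer part, optional
# literal dot, then at least one digit.
number_regex='[0-9]*\.?[0-9]+'

# List the matches, one per line.
printf '%s\n' "$text" | grep -Eo "$number_regex"

# Count the matches; tr strips any padding that wc adds.
match_count=$(printf '%s\n' "$text" | grep -Eo "$number_regex" | wc -l | tr -d ' ')
echo "$match_count"  # prints 6
```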



If you wanted only the floating point decimals, and to exclude the integers, the regular expression is this:


'[0-9]*\.[0-9]+'



If you look carefully, it's identical to the previous regular expression, except for the missing ? operator. If you recall, the ? made the decimal point an optional feature to match; removing this operator now means the decimal point must be present. Likewise, the + operator matches at least one digit following the decimal point. However, the * operator before the decimal point matches any number of digits, including zero digits. Therefore, "0.61803" would be a valid match (if it were present in the string, which it isn't), and ".33333" would also be a valid match, since the digits before the decimal point needn't be there thanks to the * operator. However, whilst "1.1111" would be a valid match, "1111." would not be, because the + operator dictates that there must be at least one digit following the decimal point.
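The four cases discussed above can be checked directly. The sketch below is illustrative only: is_float_match is a hypothetical helper name, and it uses grep -E with the decimal point escaped as \. so that only a literal dot is accepted.

```shell
# Decimal-only expression: optional leading digits, a required literal
# dot, then at least one digit.
float_regex='[0-9]*\.[0-9]+'

# is_float_match (hypothetical helper): succeeds if the argument
# contains a match for float_regex. -q suppresses grep's output.
is_float_match() {
  printf '%s\n' "$1" | grep -Eq "$float_regex"
}

for candidate in "0.61803" ".33333" "1.1111" "1111."; do
  if is_float_match "$candidate"; then
    echo "$candidate: match"
  else
    echo "$candidate: no match"
  fi
done
```

Only "1111." should report no match, because nothing follows its decimal point.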



Putting it into the command chain:


egrep -o '[0-9]*\.[0-9]+' <<<"$string" | wc -l



returns a value of 3, for the three floating point decimals occurring in the string, which, if you remove the | wc -l portion of the command, you will see in the terminal output as:




3.14159
2.71828
1.61803



For reasons I won't go into, matching integers exclusively and excluding floating point decimals is harder to accomplish without Perl-flavoured regular expression matching (which egrep does not offer). However, since you're really only interested in the number of these occurrences, rather than the matches themselves, we can create a regular expression that doesn't need to worry about accurate matching of integers, as long as it produces the same number of matched items. This expression:




'[^.0-9][0-9]+(\.([^0-9]|$)|[^.])'



seems to be good enough for counting the integers in the string, which are the 5, 1 and 2 (ignoring, of course, the √ symbol), returning these approximately matched substrings:



√5
1)
/2.



I haven't tested it that thoroughly, however, and only formulated it tonight when I read your comment. But, hopefully, you are beginning to get a rough sense of what's going on.
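To see the integer count on the sample text, a sketch like the one below can be used. The variable names are hypothetical, and the expression is written with the dot escaped (\.) where a literal decimal point is meant. It is only checked against this one string; for instance, a number at the very start of a string is not counted, because the expression requires a preceding character.

```shell
# Sample text from the answer above.
text="The value of pi is approximately 3.14159. The value of e is about 2.71828. The Golden Ratio is approximately 1.61803, which can be expressed as (√5 + 1)/2."

# integer_regex (hypothetical name): a non-dot, non-digit character,
# then digits, then either a dot that ends the number or a non-dot.
integer_regex='[^.0-9][0-9]+(\.([^0-9]|$)|[^.])'

# Show the approximate matches, one per line.
printf '%s\n' "$text" | grep -Eo "$integer_regex"

# Count them; only the count is meant to be accurate.
integer_count=$(printf '%s\n' "$text" | grep -Eo "$integer_regex" | wc -l | tr -d ' ')
echo "$integer_count"  # prints 3
```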





\d is supported by neither BRE nor ERE; you'd probably have to use grep -P for this to work, or use [[:digit:]] instead.
– Benjamin W.
Aug 20 at 15:29





\d should be supported by ERE, hence the use of egrep rather than grep, and it does work in my version of bash on macOS. That said, I can always update the answer to use [0-9] instead.
– CJK
Aug 20 at 15:32





The POSIX ERE spec doesn't mention \d; it looks like it's a BSD extension to ERE. grep -Eo '[[:digit:]]+' would be fully portable, I think; [0-9] potentially expands to something other than [0123456789] in some locales.
– Benjamin W.
Aug 20 at 15:41





You still need the -E or egrep for + to work, though.
– Benjamin W.
Aug 20 at 15:41







Thanks for the tips and corrections, @BenjaminW. Most helpful.
– CJK
Aug 20 at 15:46
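The point about -E can be illustrated with a small comparison. This is a sketch with hypothetical variable names and a made-up sample string: without -E, grep uses basic regular expressions, where + is an ordinary character, so the pattern only matches a digit followed by a literal plus sign.

```shell
# Made-up sample containing a literal plus sign.
sample="1+ 22 abc"

# Basic regular expressions: + is literal, so only "1+" matches.
bre_count=$(printf '%s\n' "$sample" | grep -o '[[:digit:]]+' | wc -l | tr -d ' ')

# Extended regular expressions: + means "one or more", so "1" and "22" match.
ere_count=$(printf '%s\n' "$sample" | grep -Eo '[[:digit:]]+' | wc -l | tr -d ' ')

echo "BRE: $bre_count"  # prints BRE: 1
echo "ERE: $ere_count"  # prints ERE: 2
```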



If you need to know the number of grouped digits in a string, the following may help.


string="123 abc 456"
echo "$string" | awk '{print gsub(/[0-9]+/,"")}'



Explanation: the following annotated version of the same command is for explanation purposes only.


string="123 abc 456"        ## Create a variable named string with the value 123 abc 456.
echo "$string" |            ## Print the value of string and pipe it to awk as input.
awk '{                      ## Start the awk program.
  print gsub(/[0-9]+/,"")   ## gsub replaces every group of digits with "" (the empty string) and returns
                            ## the number of substitutions made, which equals the number of digit groups.
}'                          ## End the awk program.
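One detail worth noting is that awk runs its program once per input line, so the command above prints a separate count for each line of a multi-line string. The sketch below, with the hypothetical helper name count_groups_awk, sums the counts instead and prints a single total.

```shell
# count_groups_awk (hypothetical helper): print the total number of digit
# groups across all lines of the first argument.
count_groups_awk() {
  # gsub returns the number of replacements made on the current line;
  # total + 0 forces a numeric 0 when no digits are found.
  printf '%s\n' "$1" | awk '{ total += gsub(/[0-9]+/, "") } END { print total + 0 }'
}

count_groups_awk "123 abc 456"                   # prints 2
count_groups_awk "$(printf '123 abc\n456 7')"    # prints 3
```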





