Find the number of files for each extension in a directory
I want to count the number of files for each extension in a directory as well as the files without extension.
I have tried a few options, but I haven't found a working solution yet:
find "$folder" -type f | sed 's/.*\.//' | sort | uniq -c
is an option but doesn't work if there is no file extension. I need to know how many files do not have an extension.
I have also tried running a find loop that fills an array and then summing the results, but at the moment that code throws an undeclared variable error, and only outside of the loop:
declare -a arr
arr=()
echo ${arr[@]}
This throws an undeclared variable error, and the same error appears once the find loop completes.
5 Answers
find "$path" -type f | sed -e '/.*\/[^\/]*\.[^\/]*$/!s/.*/(none)/' -e 's/.*\.//' | LC_COLLATE=C sort | uniq -c
Explanation:
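find "$path" -type f
    lists all regular files under "$path", recursively.
sed -e '/.*\/[^\/]*\.[^\/]*$/!s/.*/(none)/' -e 's/.*\.//'
    applies two regular-expression steps:
    /.*\/[^\/]*\.[^\/]*$/!s/.*/(none)/
        replaces every path whose file name contains no dot with (none).
    s/.*\.//
        strips everything up to and including the last dot, leaving only the extension.
LC_COLLATE=C sort
    sorts the result in byte order, so symbols such as (none) come first.
uniq -c
    counts how many times each entry occurs.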
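For comparison, the same pipeline can be sketched in Python. This is only a rough equivalent, not part of the answer above: the function names `extension_of` and `count_extensions` are made up for illustration, and `os.walk` also reports symlinks and other non-directory entries, whereas `find -type f` lists regular files only.

```python
import os
from collections import Counter


def extension_of(path):
    """Mimic the sed step: text after the last dot of the file name, else "(none)"."""
    base = os.path.basename(path)
    if "." not in base:
        return "(none)"
    return base.rsplit(".", 1)[1]


def count_extensions(root):
    """Walk root recursively and tally extensions (hypothetical helper)."""
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            counts[extension_of(os.path.join(dirpath, name))] += 1
    return counts


if __name__ == "__main__":
    # Sorting plain strings compares code points, similar to LC_COLLATE=C.
    for ext, n in sorted(count_extensions(".").items()):
        print(f"{n:7d} {ext}")
```

As in the sed version, a dot in a directory name does not count as an extension, and a hidden file such as .bashrc is tallied under bashrc.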
Using Python:
import os
from collections import Counter
from pprint import pprint
lst = []
for file in os.listdir('./'):
    name, ext = os.path.splitext(file)
    lst.append(ext)
pprint(Counter(lst))
The output:
Counter({'': 7,
         '.png': 4,
         '.mp3': 3,
         '.jpg': 3,
         '.mkv': 3,
         '.py': 1,
         '.swp': 1,
         '.sh': 1})
Thanks for the suggestion, I was just trying to write it as clearly as I could ...
– Ravexina
Sep 3 at 18:26
Clarity is the virtue :) Especially when it comes to code and engineering documentation.
– Sergiy Kolodyazhnyy
Sep 3 at 18:29
If you have GNU awk, you could do something like
printf '%s\0' * | gawk 'BEGIN{RS="\0"; FS="."; OFS="\t"}
{a[(NF>1 ? $NF : "(none)")]++}
END{for(i in a) print a[i],i}
'
i.e. construct / increment an associative array keyed on the last . separated field, or some arbitrary fixed string such as (none) if there is no extension.
mawk doesn't seem to allow a null-byte record separator - you could use mawk with the default newline separator if you are confident that you don't need to deal with newlines in your file names:
printf '%s\n' * | mawk 'BEGIN{FS="."; OFS="\t"} {a[(NF>1 ? $NF : "(none)")]++} END{for(i in a) print a[i],i}'
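The reason for the null-byte separator is that a file name may itself contain a newline. A minimal Python sketch of the same tally over NUL-terminated names might look like this; `tally_nul_separated` is a hypothetical name, and decoding as UTF-8 is an assumption about the file names:

```python
from collections import Counter


def tally_nul_separated(blob):
    """Count extensions in a bytes blob of NUL-terminated file names."""
    counts = Counter()
    for raw in blob.split(b"\0"):
        if not raw:  # skip the empty piece after the final NUL
            continue
        name = raw.decode("utf-8", errors="replace")  # assumed encoding
        fields = name.split(".")  # same role as FS="."
        # Same rule as NF>1 ? $NF : "(none)" in the awk program.
        key = fields[-1] if len(fields) > 1 else "(none)"
        counts[key] += 1
    return counts
```

It could be fed from `printf '%s\0' *` by passing `sys.stdin.buffer.read()`. Like the awk version, it expects bare file names; with full paths, a dot in a directory name would be mistaken for the start of an extension.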
With basic /bin/sh or even bash the task can be a little difficult, but as you can see in the other answers, tools that work on aggregate data can handle such a task particularly easily. One such tool is the sqlite database.
A very simple way to use an sqlite database is to create a .csv file with two fields: file name and extension. sqlite can then use the aggregate function COUNT() with GROUP BY ext to count files by the extension field:
$ { printf "file,ext\n"; find -type f -exec sh -c 'f=${1##*/};printf "%s,%s\n" "${1}" "${1##*.}"' sh {} \; ; } > files.csv
$ sqlite3 <<EOF
> .mode csv
> .import ./files.csv files_tb
> SELECT ext,COUNT(file) FROM files_tb GROUP BY ext;
> EOF
csv,1
mp3,6
txt,1
wav,27
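The same GROUP BY idea can be tried without a CSV file using Python's built-in sqlite3 module. This is only a sketch: the sample rows are invented for illustration, and storing an empty string for files without an extension is an assumption of the sketch, not something the shell command above does.

```python
import sqlite3

# Hypothetical sample rows (file name, extension); not the answer's data.
rows = [
    ("song1.mp3", "mp3"),
    ("song2.mp3", "mp3"),
    ("notes.txt", "txt"),
    ("README", ""),  # assumption: no extension is stored as an empty string
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files_tb (file TEXT, ext TEXT)")
conn.executemany("INSERT INTO files_tb VALUES (?, ?)", rows)

# Same aggregate as in the answer, with an explicit ORDER BY for stable output.
result = conn.execute(
    "SELECT ext, COUNT(file) FROM files_tb GROUP BY ext ORDER BY ext"
).fetchall()
conn.close()

for ext, n in result:
    print(ext or "(none)", n)
```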
files_tb table I think is being referenced but the table columns are not defined anywhere I can see?
– WinEunuuchs2Unix
Sep 3 at 20:34
@WinEunuuchs2Unix They're defined in the csv file itself. That's what the first printf does. And SQLite will default to treating the first line of the csv file as column names.
– Sergiy Kolodyazhnyy
Sep 3 at 20:38
Very impressive! +1
– WinEunuuchs2Unix
Sep 3 at 21:53
Using PowerShell if that's an option:
Get-ChildItem -File | Group-Object Extension -NoElement
or shorter, using aliases:
ls -file | group -n Extension
Wow! Great first answer! I didn't even know PowerShell existed for Linux... +1
– Fabby
Sep 3 at 22:25
Thanks. It has existed cross-platform and open-source for a while, but there's been a pattern on SO and SU where questions for shell scripting on Windows have often been answered with "Well, install cygwin and use bash, then you can do the following", so I've been hesitant to do the same for Linux SE sites with tools that originated on Windows. But this has been a nice task that shows PowerShell's strengths quite nicely without inviting the old argument about verbosity.
– Joey
Sep 4 at 7:46
You can probably get away with a list comprehension, like
ext = [ f.split('.')[-1] for f in os.listdir('./') ]
That'll make it a couple of lines shorter and perhaps more Pythonic.
– Sergiy Kolodyazhnyy
Sep 3 at 18:24
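One thing to keep in mind with the list comprehension: `f.split('.')[-1]` and `os.path.splitext` disagree on names without a dot and on hidden files. A small comparison, using made-up file names, may make the difference clearer:

```python
import os

# Hypothetical file names chosen to show the edge cases.
names = ["photo.png", "archive.tar.gz", "README", ".bashrc"]

# split('.') returns the whole name when there is no dot at all.
by_split = [n.split(".")[-1] for n in names]

# splitext keeps the leading dot and treats a leading-dot name as having no extension.
by_splitext = [os.path.splitext(n)[1] for n in names]

print(by_split)
print(by_splitext)
```

So with the split approach, each file without an extension ends up counted under its own name rather than in a single "no extension" bucket, which matters for the original question.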