Bash script: move files based on matching part of filenames from a list
I have millions of xml files. The name of the xml file follows this pattern:
ABC_20180912_12345.xml
ABC_20180412_98765.xml
ABC_20180412_45678.xml
From these, I want to copy files to a different folder based on the part of the name after the underscore. To identify the files, I have a list saved in a CSV file that provides the required names. An example:
vcfile="/home/mycomp/Documents/wd/vehicles.csv"
vcpvr=`cat $vcfile`
echo $vcpvr provides me with this list:
2894 4249 5464
I am able to loop through the XML files in the folder, open each file, grep to see whether the file contains the string, and if it does, move the file to a new location. This is working.
The complete code:
#filesToExtract is the interim folder
fold="/home/mycomp/filesToExtract";
query=$fold/*.xml
vcfile="/home/mycomp/Documents/wd/vehicles.csv"
vcpvr=`cat $vcfile`
#xmlfiles - keep all tar.gz files here
cd ~/xmlfiles/
COUNTER=1
for f in *.tar.gz
do
echo " $COUNTER "
tar zxf "$f" -C ~/filesToExtract
for k in $query
do
file $k | if grep -q "$vcpvr"
then
mv $k ~/xmlToWork/
fi
done
#xmltowork is the final folder
#rm -r ~/filesToExtract/*.xml
COUNTER=$((COUNTER + 1))
done
But since this looks for the string inside the file, instead of filename, it takes longer to process millions of files. Instead, I want to look for the string in the filename and if it is there, move the files. This is what I have tried:
target="/home/mycomp/xmlToWork"
for k in $query
do
if [[ $k =~ "$vcpvr" ]]; then
cp -v $k $target
fi
done
But this gives me an error:
tarextract.sh: 12: tarextract.sh: [[: not found
bash
Can you share what the vehicles.csv file looks like? Is it just numbers separated by spaces?
– Inder
Aug 21 at 10:17
@Inder Yes, it is just numbers, with each number in a subsequent cell / line. Think of Excel: all the numbers are in the first column, without a header.
– Apricot
Aug 21 at 10:26
1 Answer
This will work just fine, although I was hesitant to suggest it: it iterates over the list, so it is a slower approach, but it is certainly faster than looking into the files.
nn=($(cat vehicles.csv));for x in "${nn[@]}";do ls *.xml|grep "$x"|xargs -I{} mv {} folder/;done
A multi-line version of the same:
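# Read the IDs into a bash array (split on whitespace)
nn=($(cat test.csv))
for x in "${nn[@]}"
do
# Move every .xml file whose name contains the current ID
ls *.xml|grep "$x"|xargs -I{} mv {} /home/inderss/dumps/
done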
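The loop above lists the folder once per ID. A minimal single-pass sketch follows. It assumes the ID is the text between the last underscore and `.xml`, and that an exact match on that ID is wanted, whereas `grep "$x"` matches the ID anywhere in the name. `move_listed` is a hypothetical helper name, and stripping a trailing carriage return is a precaution in case the CSV was saved with Windows line endings, which the question does not confirm.

```bash
#!/usr/bin/env bash
# Sketch only: move_listed is a hypothetical helper, not part of the answer above.
# Assumption: the wanted ID sits between the last underscore and ".xml".
move_listed() {
    list=$1   # file with one ID per line, e.g. vehicles.csv
    src=$2    # folder holding the extracted .xml files
    dest=$3   # folder that receives the matching files

    # find streams the paths, so no huge argument list is built.
    find "$src" -maxdepth 1 -type f -name '*.xml' |
    awk -v list="$list" '
        BEGIN {
            # Load every ID from the list file, dropping a trailing CR if present
            while ((getline line < list) > 0) {
                sub(/\r$/, "", line)
                if (line != "") ids[line] = 1
            }
        }
        {
            # Reduce the path to the ID: strip up to the last underscore,
            # then strip the .xml suffix
            id = $0
            sub(/.*_/, "", id)
            sub(/\.xml$/, "", id)
            if (id in ids) print
        }
    ' |
    while IFS= read -r f; do
        mv -- "$f" "$dest"/
    done
}
```

It could be called as `move_listed vehicles.csv ~/filesToExtract ~/xmlToWork`. The function uses no bash-only syntax, so it should also run if the script is started with `sh`.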
Thank you. When I tried the code, I got a Bad Substitution error for line 2, and when I replaced ${nn[@]} with $nn, I could echo all the numbers in $x. But when I used the 4th line, i.e. ls *.xml, I could sense the machine taking time to read each file, yet the output was missing. However, when I replace $x with just a single number, the code in line 4 works. For example, if I hard-code $x as 2252, it moves all the files that have 2252 in their names. I just could not figure out why the for loop is not working.
– Apricot
Aug 22 at 3:34
Can you please just copy and paste a few lines of the csv file? @Apricot
– Inder
Aug 22 at 5:38
Thanks a ton for your support. I have created a few dummy files and a test csv here: drive.google.com/open?id=1SJ-zQz-JFHR_scmr7ZAkfIQxwP-yTslV
– Apricot
Aug 22 at 5:47
@Apricot The -I argument after xargs was off; there should be no space in it.
– Inder
Aug 22 at 6:52
I ran the edited code with the test.csv provided in the link, and it worked perfectly fine. Bash version: GNU bash, version 4.4.19(1)-release (x86_64-pc-linux-gnu)
– Inder
Aug 22 at 6:52
[[: not found -> you are not running bash.
– PesaThe
Aug 21 at 8:29
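To illustrate the comment above: `[[` is a bash keyword, so the error suggests the script was started with a different shell, for example via `sh tarextract.sh`. Running it as `bash tarextract.sh`, or adding a `#!/bin/bash` first line and executing the file directly, should make `[[` available. Separately, `$vcpvr` holds the whole list, so each ID would need to be tested on its own. The sketch below sidesteps `[[` with a portable `case` match. `contains_id` is a hypothetical helper name, and it assumes the ID is the last underscore-separated field before `.xml`.

```bash
#!/bin/sh
# Sketch only: contains_id is a hypothetical helper, not from the question.
# Usage: contains_id FILENAME ID...
# Succeeds when FILENAME ends in _<ID>.xml for any of the given IDs.
contains_id() {
    name=$1
    shift
    for id in "$@"; do
        case $name in
            *_"$id".xml) return 0 ;;   # quoted so the ID is matched literally
        esac
    done
    return 1
}

# Possible use inside the question's loop (variables as in the question);
# $vcpvr is left unquoted on purpose so it splits into one word per ID:
#   for k in "$fold"/*.xml; do
#       if contains_id "${k##*/}" $vcpvr; then
#           cp -v "$k" "$target"
#       fi
#   done
```

Because `case` is part of POSIX sh, this form behaves the same whichever shell ends up running the script.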