Extract domain from file name
I am given one zip file which includes exactly 4 files with the following naming convention:
2018.foo.abc.example.co.uk.20183740273
2018.foo.bcd.example.co.uk.20183740474
2018.foo.dce.example.co.uk.20183749769
2018.foo.def.example.co.uk.20183746483
My task is to extract the domain from any one of the file names (any, because the domain is the same in all of them) and store it in a bash variable. These file names are just examples; the key point is the domain. The split takes place after the third dot.
The domain is example.co.uk. Sorry if I didn't make that clear enough in the question.
– NarrowVision
Sep 11 '18 at 18:46
How do you determine where the domain ends?
– melpomene
Sep 11 '18 at 18:46
That’s the tricky part. I can specify a constant array with all the points of sale that I will allow. Also, I did try it, and it’s not a problem if you don’t help me; I can do it myself. I wanted opinions and help, not hate from the start. The attempt is at my office. I can’t access both files, laptop.
– NarrowVision
Sep 11 '18 at 18:49
2 Answers
If you want to extract domains with different numbers of fields (e.g. example.co.uk and example.com) you can use sed:
sed 's/\([^.]*\.\)\{3\}//;s/\.[0-9]*$//' filename
or a combination of sed and cut, which is more readable:
sed 's/\.[0-9]*$//' filename | cut -d '.' -f4-
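The sed commands above read their input from a file called filename. To get the result into a bash variable, as the question asks, one option is to feed the name itself to sed on standard input and capture the output with command substitution. This is only a minimal sketch: the variable names `name` and `domain` are illustrative and do not come from the question, and the sample name is copied from the question's list.

```shell
#!/usr/bin/env bash
# Hypothetical variable holding one of the sample file names from the question.
name='2018.foo.abc.example.co.uk.20183740273'

# First expression: delete the first three dot-terminated fields.
# Second expression: delete the trailing ".<digits>" suffix.
domain=$(printf '%s\n' "$name" | sed 's/\([^.]*\.\)\{3\}//;s/\.[0-9]*$//')

echo "$domain"   # prints: example.co.uk
```

Because the expressions strip a fixed prefix and the numeric suffix rather than selecting fixed field numbers, the same sketch should also return example.com for a name such as 2018.foo.abc.example.com.20183740273.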
If the files are all named with the same convention, you can use some basic AWK to find the answer:
awk -F'.' -v OFS='.' '{print $4,$5,$6}' <filename>
-F sets the field delimiter, in this case a dot (.). Once the text has been split on the dots into what are essentially columns of data, we tell AWK which fields to print out.
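To check which field numbers a particular name produces, it can help to print every field next to its index. The following is a small sketch only, using one of the sample names from the question; the variable names `name` and `fields` are illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical variable holding one of the sample file names from the question.
name='2018.foo.abc.example.co.uk.20183740273'

# Split on "." and print each field prefixed with its 1-based index.
fields=$(printf '%s\n' "$name" | awk -F'.' '{ for (i = 1; i <= NF; i++) print i ": " $i }')

echo "$fields"
# 1: 2018
# 2: foo
# 3: abc
# 4: example
# 5: co
# 6: uk
# 7: 20183740273
```

For this sample name the domain parts sit in fields 4 to 6, which matches the question's statement that the split takes place after the third dot.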
Or you can use cut:
cut -d '.' -f4-6 <filename>
Same principle as AWK, just a different way to do it.
Or < filename cut -d '.' -f3-6 (saves a cat).
– melpomene
Sep 11 '18 at 18:56
Oh my, it actually worked like a dream. I will use <filename> + piped cut. Do you have any suggestions for handling different cases such as .com and .co.uk? With .com I don’t get the correct value.
– NarrowVision
Sep 11 '18 at 19:02
@NarrowVision: if you want to handle such cases, you should say so in your question.
– Beta
Sep 11 '18 at 22:32
So, it is delimiting on the ".". x.x.com has only two ".", which means there are essentially three columns (if you think of it like a spreadsheet), whereas x.x.co.uk has three ".", which gives four columns. You just have to change the code to reflect the number of columns. You could add some regex, or a grep, or use SED. There are a million ways to do it. If that doesn't make sense I'll write a better explanation when I have time.
– Ambrose
Sep 12 '18 at 19:32
But @Beta has a better answer. Always trust in SED for string manipulation.
– Ambrose
Sep 12 '18 at 21:03
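The varying column count raised in the comments can also be sidestepped without sed, by trimming from both ends with bash parameter expansion instead of picking fixed fields. This is a sketch under two assumptions taken from the question's examples: the domain always starts after the third dot, and the name always ends in a single dot-separated suffix. The function name `extract_domain` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the domain part of a name shaped like
# <a>.<b>.<c>.<domain...>.<suffix>
extract_domain() {
    local name=$1
    # "#" removes the shortest matching prefix, i.e. everything up to the third dot.
    local rest=${name#*.*.*.}
    # "%" removes the shortest matching suffix, i.e. the last dot and what follows it.
    printf '%s\n' "${rest%.*}"
}

domain=$(extract_domain '2018.foo.abc.example.co.uk.20183740273')
echo "$domain"   # prints: example.co.uk
```

Since nothing here depends on how many labels the domain has, a name such as 2018.foo.abc.example.com.20183740273 should yield example.com with the same function.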
Which part is the domain?
– melpomene
Sep 11 '18 at 18:45