Search a word in docx file and copy the file in keyword folder
Search a word in docx file and copy the file in keyword folder
I am doing a task in which I have a text file containing specific keyword likes (C#,Angular,Python,Rest,MySQL) and have to create folder. After this I have to search these keyword in resumes given and if found copy it to each folder.
For example A has skill of C# and Angular so his/her Resume will be in both folder.
I have completed the folder creation , I need help in search the word in .docx file and copying to requested folder. I have looked for online stuffs but unable to proceed. Can anyone provide me some lead how to search word/string in docs. file
Here is my code:
Folder creation
InputFile = "ResumeKeyword.txt"
fileOpen = open(InputFile)
for keyword in fileOpen.readline().split(','):
print(keyword)
os.makedirs(keyword)
fileOpen.close()
And for reading the docx
from docx import Document
document = Document('A.docx')
word = "Angular"
for x in document.paragraphs:
print(x.text)
1 Answer
1
Hope this helps:
from docx import Document
from shutil import copyfile
import os, re, random
# Folder which contains all the resumes
ALL_RESUMES = "all_resumes/"
# The folder which will contain the separated resumes
SEGREGATED_RESUMES = "topic_wise_resumes/"
def get_keywords(keywords_file, create_new = False):
"""
Get all keywords from file keywords_file. We get all keywords in lower case to remove confusion down the line.
"""
fileOpen = open(keywords_file, "r")
words = [x.strip().lower() for x in fileOpen.readline().split(',')]
keywords =
for keyword in words:
keywords.append(keyword)
if(not(os.path.isdir(SEGREGATED_RESUMES))):
os.makedirs(SEGREGATED_RESUMES + keyword)
return keywords
def segregate_resumes(keywords):
"""
Copy the resumes to the appropriate folders
"""
# The pattern for regex match
keyword_pattern = "|".join(keywords)
# All resumes
for filename in os.listdir(ALL_RESUMES):
# basic sanity check
if filename.endswith(".docx"):
document = Document(ALL_RESUMES + filename)
all_texts =
for p in document.paragraphs:
all_texts.append(p.text)
# The entire text in the resume in lowercase
all_words_in_resume = " ".join(all_texts).lower()
# The matching keywords
matches = re.findall(keyword_pattern, all_words_in_resume)
# Copy the resume to the keyword folder
for match in matches:
copyfile(ALL_RESUMES + filename, SEGREGATED_RESUMES + match + "/" + filename)
def create_sample_resumes(keywords, num = 5):
"""
Function to create sample resumes for testing
"""
for i in range(num):
document = Document()
document.add_heading('RESUME'.format(i))
skills_ = random.sample(keywords, 2)
document.add_paragraph("I have skills - and ".format(*skills_))
document.save(ALL_RESUMES + "resume.docx".format(i))
keywords = get_keywords("ResumeKeyword.txt")
print(keywords)
# create_sample_resumes(keywords)
segregate_resumes(keywords)
Thanks for contributing an answer to Stack Overflow!
But avoid …
To learn more, see our tips on writing great answers.
Some of your past answers have not been well-received, and you're in danger of being blocked from answering.
Please pay close attention to the following guidance:
But avoid …
To learn more, see our tips on writing great answers.
Required, but never shown
Required, but never shown
By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.
Thank You. It's works perfectly well.
– Sunny Prakash
Sep 1 at 7:25