Taking Same Worksheet from a Folder of xlsm Files with Python

Taking Same Worksheet from a Folder of xlsm Files with Python



I'm new to pandas/python and Ive come up with the following code to extract data from a specific part of a worksheet.


import openpyxl as xl
import pandas as pd

rows_with_data = [34,37,38,39,44,45,46,47,48,49, 50,54,55,57,58,59,60,62,63,64,65,66,70,71,72,76,77, 78,79,80,81,82,83,84,88,89,90,91,92]

path = r'XXX'
xpath = input('XXX')
file = r'**.xlsm'
xfile = input('Change file name, current is ' + file + ' :')
sheetname = r'Summary'

wb = xl.load_workbook(filename = xpath + '\' +file, data_only = True)
sheet = wb.get_sheet_by_name(sheetname)

rows = len(rows_with_data)
line_items =
for i in range(rows) :
line_items.append(sheet.cell(row = rows_with_data[i], column = 13).value)

period =
for col in range(17,35):
period.append(sheet.cell(row = 20, column = col).value)

print(line_items)

vals =
x =
for i in range(rows):
if i != 0:
vals.append(x)
x =
for col in range(17,35):

x.append(sheet.cell(row = rows_with_data[i], column = col).value)


vals.append(x)
all_values =
all_values['Period'] = period
for i in range(rows):
print(line_items[i])
all_values[line_items[i]] = vals[i]

print(all_values)

period_review = input('Enter a period (i.e. 2002): ')
item = input('Enter a period (i.e. XXX): ')

time = period.index(period_review)
display_item = str(all_values[item][time])
print(item + ' for ' + period_review + " is " + display_item)

Summary_Dataframe = pd.DataFrame(all_values)

writer = pd.ExcelWriter(xpath + '\' + 'values.xlsx')
Summary_Dataframe.to_excel(writer,'Sheet1')
writer.save()
writer.close()



I have the same worksheet (summary results) across a library of 60 xlsm files and I'm having a hard time figuring out how to iterate this across the entire folder of files. I also want change this from extracting specific rows to taking the entire "Summary" worksheet, pasting it to the new file and naming the worksheet by its filename ("Experiment_A") when pasted to the new excel file. Any advice?





I don't think you need to use pandas in this at all but please simplify the question.
– Charlie Clark
Sep 3 at 8:33





the code i came up with initially was to extract a range within the sheet. rather than just take a subset of the sheet, i am looking to see how to extract an entire sheet and also looping across a folder of 60 xlsm files.
– user6907218
Sep 3 at 15:04




1 Answer
1



I was having hard time to read your code to understand that what you want to do finally. So it is just an advice not a solution. You can iterate through all files in the folder using os then read the files in to one dataframe then save the single big data frame in to csv. I usually avoid excel but I guess you need the excel conversion. In the example below I have read all txt file from a directory put them in to dataframe list then store the big data frame as json. You can also store it as excel/csv.


os


import os
import pandas as pd

def process_data():
# input file path in 2 part in case it is very long
input_path_1 = r'\pathtothefolder'
input_path_2 = r'secondpartofthepath'
# adding the all file path
file_path = input_path_1 + input_path_2
# listing all file in the file folder
file_list = os.listdir(os.path.join(file_path))
# selecting only the .txt files in to a list object
file_list = [file_name for file_name in file_list if '.txt' in file_name]
# selecting the fields we need
field_names = ['country', 'ticket_id']
# defining a list to put all the datafremes in one list
pd_list =
inserted_files =
# looping over txt files and storing in to database
for file_name in file_list:
# creating the file path to read the file
file_path_ = file_path + '\' + file_name
df_ = pd.read_csv(os.path.join(file_path_), sep='t', usecols=field_names)
# converting the datetime to date
# few internal data transformation example before writting
df_['sent_date'] = pd.to_datetime(df_['sent_date'])
df_['sent_date'] = df_['sent_date'].values.astype('datetime64[M]')
# adding each dataframe to the list
pd_list.append(df_)
# adding file name to the inserted list to print later
inserted_files.append(file_name)
print(inserted_files)
# sql like union all dataframes and create a single data source
df_ = pd.concat(pd_list)

output_path_1 = r'\pathtooutput'
output_path_2 = r'pathtooutput'
output_path = output_path_1 + output_path_2
# put the file name
file_name = 'xyz.json'
# adding the day the file processed
df_['etl_run_time'] = pd.to_datetime('today').strftime('%Y-%m-%d')
# write file to json
df_.to_json(os.path.join(output_path, file_name), orient='records')
return print('Data Stored as json successfully')

process_data()





thanks for trying to read through my code and replying. i am looking to take one sheet from a group of files and copy/paste them into excel. since i am working with others, im stuck with excel.
– user6907218
Sep 2 at 22:18






So still you can use the first part to iterate through all files in the folder using os and then apply something according to your need.
– DataPsycho
Sep 3 at 9:39




Thanks for contributing an answer to Stack Overflow!



But avoid



To learn more, see our tips on writing great answers.



Some of your past answers have not been well-received, and you're in danger of being blocked from answering.



Please pay close attention to the following guidance:



But avoid



To learn more, see our tips on writing great answers.



Required, but never shown



Required, but never shown




By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.

Popular posts from this blog

𛂒𛀶,𛀽𛀑𛂀𛃧𛂓𛀙𛃆𛃑𛃷𛂟𛁡𛀢𛀟𛁤𛂽𛁕𛁪𛂟𛂯,𛁞𛂧𛀴𛁄𛁠𛁼𛂿𛀤 𛂘,𛁺𛂾𛃭𛃭𛃵𛀺,𛂣𛃍𛂖𛃶 𛀸𛃀𛂖𛁶𛁏𛁚 𛂢𛂞 𛁰𛂆𛀔,𛁸𛀽𛁓𛃋𛂇𛃧𛀧𛃣𛂐𛃇,𛂂𛃻𛃲𛁬𛃞𛀧𛃃𛀅 𛂭𛁠𛁡𛃇𛀷𛃓𛁥,𛁙𛁘𛁞𛃸𛁸𛃣𛁜,𛂛,𛃿,𛁯𛂘𛂌𛃛𛁱𛃌𛂈𛂇 𛁊𛃲,𛀕𛃴𛀜 𛀶𛂆𛀶𛃟𛂉𛀣,𛂐𛁞𛁾 𛁷𛂑𛁳𛂯𛀬𛃅,𛃶𛁼

Edmonton

Crossroads (UK TV series)