Taking Same Worksheet from a Folder of xlsm Files with Python

I'm new to pandas/python and Ive come up with the following code to extract data from a specific part of a worksheet.

import openpyxl as xl import pandas as pd rows_with_data = [34,37,38,39,44,45,46,47,48,49, 50,54,55,57,58,59,60,62,63,64,65,66,70,71,72,76,77, 78,79,80,81,82,83,84,88,89,90,91,92] path = r'XXX' xpath = input('XXX') file = r'**.xlsm' xfile = input('Change file name, current is ' + file + ' :') sheetname = r'Summary' wb = xl.load_workbook(filename = xpath + '\' +file, data_only = True) sheet = wb.get_sheet_by_name(sheetname) rows = len(rows_with_data) line_items = for i in range(rows) : line_items.append(sheet.cell(row = rows_with_data[i], column = 13).value) period = for col in range(17,35): period.append(sheet.cell(row = 20, column = col).value) print(line_items) vals = x = for i in range(rows): if i != 0: vals.append(x) x = for col in range(17,35): x.append(sheet.cell(row = rows_with_data[i], column = col).value) vals.append(x) all_values = all_values['Period'] = period for i in range(rows): print(line_items[i]) all_values[line_items[i]] = vals[i] print(all_values) period_review = input('Enter a period (i.e. 2002): ') item = input('Enter a period (i.e. XXX): ') time = period.index(period_review) display_item = str(all_values[item][time]) print(item + ' for ' + period_review + " is " + display_item) Summary_Dataframe = pd.DataFrame(all_values) writer = pd.ExcelWriter(xpath + '\' + 'values.xlsx') Summary_Dataframe.to_excel(writer,'Sheet1') writer.save() writer.close()

I have the same worksheet (summary results) across a library of 60 xlsm files and I'm having a hard time figuring out how to iterate this across the entire folder of files. I also want change this from extracting specific rows to taking the entire "Summary" worksheet, pasting it to the new file and naming the worksheet by its filename ("Experiment_A") when pasted to the new excel file. Any advice?

I don't think you need to use pandas in this at all but please simplify the question.
– Charlie Clark
Sep 3 at 8:33

the code i came up with initially was to extract a range within the sheet. rather than just take a subset of the sheet, i am looking to see how to extract an entire sheet and also looping across a folder of 60 xlsm files.
– user6907218
Sep 3 at 15:04

1 Answer
1

I was having hard time to read your code to understand that what you want to do finally. So it is just an advice not a solution. You can iterate through all files in the folder using os then read the files in to one dataframe then save the single big data frame in to csv. I usually avoid excel but I guess you need the excel conversion. In the example below I have read all txt file from a directory put them in to dataframe list then store the big data frame as json. You can also store it as excel/csv.

os

import os import pandas as pd def process_data(): # input file path in 2 part in case it is very long input_path_1 = r'\pathtothefolder' input_path_2 = r'secondpartofthepath' # adding the all file path file_path = input_path_1 + input_path_2 # listing all file in the file folder file_list = os.listdir(os.path.join(file_path)) # selecting only the .txt files in to a list object file_list = [file_name for file_name in file_list if '.txt' in file_name] # selecting the fields we need field_names = ['country', 'ticket_id'] # defining a list to put all the datafremes in one list pd_list = inserted_files = # looping over txt files and storing in to database for file_name in file_list: # creating the file path to read the file file_path_ = file_path + '\' + file_name df_ = pd.read_csv(os.path.join(file_path_), sep='t', usecols=field_names) # converting the datetime to date # few internal data transformation example before writting df_['sent_date'] = pd.to_datetime(df_['sent_date']) df_['sent_date'] = df_['sent_date'].values.astype('datetime64[M]') # adding each dataframe to the list pd_list.append(df_) # adding file name to the inserted list to print later inserted_files.append(file_name) print(inserted_files) # sql like union all dataframes and create a single data source df_ = pd.concat(pd_list) output_path_1 = r'\pathtooutput' output_path_2 = r'pathtooutput' output_path = output_path_1 + output_path_2 # put the file name file_name = 'xyz.json' # adding the day the file processed df_['etl_run_time'] = pd.to_datetime('today').strftime('%Y-%m-%d') # write file to json df_.to_json(os.path.join(output_path, file_name), orient='records') return print('Data Stored as json successfully') process_data()

thanks for trying to read through my code and replying. i am looking to take one sheet from a group of files and copy/paste them into excel. since i am working with others, im stuck with excel.
– user6907218
Sep 2 at 22:18

So still you can use the first part to iterate through all files in the folder using os and then apply something according to your need.
– DataPsycho
Sep 3 at 9:39

Thanks for contributing an answer to Stack Overflow!

But avoid …

To learn more, see our tips on writing great answers.

Some of your past answers have not been well-received, and you're in danger of being blocked from answering.

Please pay close attention to the following guidance:

But avoid …

To learn more, see our tips on writing great answers.

Required, but never shown

By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.

搜尋此網誌

Dfyjkt