mege two csv files using python or pandas

**csv file 1** date yearMonth deviceCategory channelGrouping eventCategory Totalevents 20160719 201607 desktop Direct _GW_Legal_RM_false 149 20160719 201607 desktop Direct _GW_Risk_RM_false 298 20160719 201607 desktop Direct _GW_Risk_RM_true 149 20160719 201607 desktop Direct _GW__Product-Sign-In__ 895 20160719 201607 desktop Organic Search _GW_Legal_RM_false 149 20160719 201607 desktop Organic Search _GW_Risk_RM_false 746 20160719 201607 desktop Organic Search _GW__Product-Sign-In__ 1342 20160719 201607 desktop Referral _GW__Product-Sign-In__ 1044 20160719 201607 mobile Direct _GW_Legal_RM_false 149 20160719 201607 mobile Social _GW_Legal_RM_false 149 20160719 201607 tablet Direct _GW_Legal_RM_false 149 20160720 201607 desktop Branded Paid Search _GW_Legal_RM_false 149 20160720 201607 desktop Direct _GW_Legal_RM_false 149 20160720 201607 desktop Direct _GW__Product-Sign-In__ 746 20160720 201607 desktop Non-Branded Paid Search _GW_Legal_RM_false 149 20160720 201607 desktop Non-Branded Paid Search _GW_Risk_RM_false 149 20160720 201607 desktop Organic Search _GW_Legal_RM_false 1939 20160720 201607 desktop Organic Search _GW_Risk_RM_false 298

I have 2 CSV Files, I want to merge based on one common column , but the common column lengths are different!Is there any way to merge/combine this without duplicating values

csv file 2

eventCategory event_type _GW_Legal_RM_false Legal _GW_Legal_RM_true Legal _GW_Legal_RM_ Legal _GW_Risk_RM_false Risk _GW_Risk_RM_true Risk _GW_Risk_RM_ Risk _GW__Product-Sign-In__ Sign-in

Output.csv

eventCategory event_type date yearMonth deviceCategory channelGrouping Totalevents _GW_Legal_RM_false Legal 20160719 201607 desktop Direct 149 _GW_Legal_RM_false Legal 20160719 201607 desktop Organic Search 149 _GW_Legal_RM_false Legal 20160719 201607 mobile Direct 149 _GW_Legal_RM_false Legal 20160719 201607 mobile Social 149

Can you give an example of what the two input csvs look like and an example of what you want the output to be?

– jobevers
Sep 18 '18 at 13:15

will modify the question

– Karan Bala
Sep 18 '18 at 13:17

There's always pandas.DataFrame.merge

– ALollz
Sep 18 '18 at 13:23

pandas.DataFrame.merge

Is that the full output you expect, or just a subset of the full output? There doesn't seem to be any particular logic I can find as to why it's only those 4 rows.

– ALollz
Sep 18 '18 at 13:32

@ALollz , It's just a subset of the output, more precisely that's an example of the output format that i require.

– Karan Bala
Sep 18 '18 at 15:27

3 Answers
3

df1 = pd.read_csv("csv1.csv") df2 = pd.read_csv("csv2.csv") df = pd.merge(df1, df2, on='eventCategory', how='left')

some modification to @FrankZhu 's answer.

Thanks@Keshav Sharma..Very helpful(i know cats)

– Karan Bala
Sep 19 '18 at 10:53

Use map with set_index:

map

set_index

import pandas as pd from io import StringIO csv1 = StringIO("""date yearMonth deviceCategory channelGrouping eventCategory Totalevents 20160719 201607 desktop Direct _GW_Legal_RM_false 149 20160719 201607 desktop Direct _GW_Risk_RM_false 298 20160719 201607 desktop Direct _GW_Risk_RM_true 149 20160719 201607 desktop Direct _GW__Product-Sign-In__ 895 20160719 201607 desktop Organic Search _GW_Legal_RM_false 149 20160719 201607 desktop Organic Search _GW_Risk_RM_false 746 20160719 201607 desktop Organic Search _GW__Product-Sign-In__ 1342 20160719 201607 desktop Referral _GW__Product-Sign-In__ 1044 20160719 201607 mobile Direct _GW_Legal_RM_false 149 20160719 201607 mobile Social _GW_Legal_RM_false 149 20160719 201607 tablet Direct _GW_Legal_RM_false 149 20160720 201607 desktop Branded Paid Search _GW_Legal_RM_false 149 20160720 201607 desktop Direct _GW_Legal_RM_false 149 20160720 201607 desktop Direct _GW__Product-Sign-In__ 746 20160720 201607 desktop Non-Branded Paid Search _GW_Legal_RM_false 149 20160720 201607 desktop Non-Branded Paid Search _GW_Risk_RM_false 149 20160720 201607 desktop Organic Search _GW_Legal_RM_false 1939 20160720 201607 desktop Organic Search _GW_Risk_RM_false 298""") csv2= StringIO("""eventCategory event_type _GW_Legal_RM_false Legal _GW_Legal_RM_true Legal _GW_Legal_RM_ Legal _GW_Risk_RM_false Risk _GW_Risk_RM_true Risk _GW_Risk_RM_ Risk _GW__Product-Sign-In__ Sign-in""") df1 = pd.read_csv(csv1,sep='ss+') df2 = pd.read_csv(csv2, sep='ss+') df1['event_type'] = df1['eventCategory'].map(df2.set_index('eventCategory')['event_type']) df1

Output:

date yearMonth deviceCategory channelGrouping eventCategory Totalevents event_type 0 20160719 201607 desktop Direct _GW_Legal_RM_false 149 Legal 1 20160719 201607 desktop Direct _GW_Risk_RM_false 298 Risk 2 20160719 201607 desktop Direct _GW_Risk_RM_true 149 Risk 3 20160719 201607 desktop Direct _GW__Product-Sign-In__ 895 Sign-in 4 20160719 201607 desktop Organic Search _GW_Legal_RM_false 149 Legal 5 20160719 201607 desktop Organic Search _GW_Risk_RM_false 746 Risk 6 20160719 201607 desktop Organic Search _GW__Product-Sign-In__ 1342 Sign-in 7 20160719 201607 desktop Referral _GW__Product-Sign-In__ 1044 Sign-in 8 20160719 201607 mobile Direct _GW_Legal_RM_false 149 Legal 9 20160719 201607 mobile Social _GW_Legal_RM_false 149 Legal 10 20160719 201607 tablet Direct _GW_Legal_RM_false 149 Legal 11 20160720 201607 desktop Branded Paid Search _GW_Legal_RM_false 149 Legal 12 20160720 201607 desktop Direct _GW_Legal_RM_false 149 Legal 13 20160720 201607 desktop Direct _GW__Product-Sign-In__ 746 Sign-in 14 20160720 201607 desktop Non-Branded Paid Search _GW_Legal_RM_false 149 Legal 15 20160720 201607 desktop Non-Branded Paid Search _GW_Risk_RM_false 149 Risk 16 20160720 201607 desktop Organic Search _GW_Legal_RM_false 1939 Legal 17 20160720 201607 desktop Organic Search _GW_Risk_RM_false 298 Risk

So, @ScottBoston, both the input files are in .csv format, not a string..Can you please guide me !

– Karan Bala
Sep 18 '18 at 14:54

Oh.. that was just a test to mimic your csv input. You can do the same method just change out csv1 and csv2 for the path of your .csv files.

– Scott Boston
Sep 18 '18 at 19:39

To extend ALollz's reply,

import pandas as pd df1 = pd.read_csv("1.csv", sep=" ") df2 = pd.read_csv("2.csv", sep=" ") df = pd.merge([df1, df2], on='eventCategory', how='left')

Thanks for contributing an answer to Stack Overflow!

But avoid …

To learn more, see our tips on writing great answers.

Required, but never shown

By clicking “Post Your Answer”, you agree to our terms of service, privacy policy and cookie policy

搜尋此網誌

Dfyjkt