mege two csv files using python or pandas

mege two csv files using python or pandas


**csv file 1**

date yearMonth deviceCategory channelGrouping eventCategory Totalevents
20160719 201607 desktop Direct _GW_Legal_RM_false 149
20160719 201607 desktop Direct _GW_Risk_RM_false 298
20160719 201607 desktop Direct _GW_Risk_RM_true 149
20160719 201607 desktop Direct _GW__Product-Sign-In__ 895
20160719 201607 desktop Organic Search _GW_Legal_RM_false 149
20160719 201607 desktop Organic Search _GW_Risk_RM_false 746
20160719 201607 desktop Organic Search _GW__Product-Sign-In__ 1342
20160719 201607 desktop Referral _GW__Product-Sign-In__ 1044
20160719 201607 mobile Direct _GW_Legal_RM_false 149
20160719 201607 mobile Social _GW_Legal_RM_false 149
20160719 201607 tablet Direct _GW_Legal_RM_false 149
20160720 201607 desktop Branded Paid Search _GW_Legal_RM_false 149
20160720 201607 desktop Direct _GW_Legal_RM_false 149
20160720 201607 desktop Direct _GW__Product-Sign-In__ 746
20160720 201607 desktop Non-Branded Paid Search _GW_Legal_RM_false 149
20160720 201607 desktop Non-Branded Paid Search _GW_Risk_RM_false 149
20160720 201607 desktop Organic Search _GW_Legal_RM_false 1939
20160720 201607 desktop Organic Search _GW_Risk_RM_false 298



I have 2 CSV Files, I want to merge based on one common column , but the common column lengths are different!Is there any way to merge/combine this without duplicating values



csv file 2


eventCategory event_type
_GW_Legal_RM_false Legal
_GW_Legal_RM_true Legal
_GW_Legal_RM_ Legal
_GW_Risk_RM_false Risk
_GW_Risk_RM_true Risk
_GW_Risk_RM_ Risk
_GW__Product-Sign-In__ Sign-in



Output.csv


eventCategory event_type date yearMonth deviceCategory channelGrouping Totalevents
_GW_Legal_RM_false Legal 20160719 201607 desktop Direct 149
_GW_Legal_RM_false Legal 20160719 201607 desktop Organic Search 149
_GW_Legal_RM_false Legal 20160719 201607 mobile Direct 149
_GW_Legal_RM_false Legal 20160719 201607 mobile Social 149






Can you give an example of what the two input csvs look like and an example of what you want the output to be?

– jobevers
Sep 18 '18 at 13:15






will modify the question

– Karan Bala
Sep 18 '18 at 13:17






There's always pandas.DataFrame.merge

– ALollz
Sep 18 '18 at 13:23



pandas.DataFrame.merge






Is that the full output you expect, or just a subset of the full output? There doesn't seem to be any particular logic I can find as to why it's only those 4 rows.

– ALollz
Sep 18 '18 at 13:32






@ALollz , It's just a subset of the output, more precisely that's an example of the output format that i require.

– Karan Bala
Sep 18 '18 at 15:27




3 Answers
3


df1 = pd.read_csv("csv1.csv")

df2 = pd.read_csv("csv2.csv")

df = pd.merge(df1, df2, on='eventCategory', how='left')



some modification to @FrankZhu 's answer.






Thanks@Keshav Sharma..Very helpful(i know cats)

– Karan Bala
Sep 19 '18 at 10:53



Use map with set_index:


map


set_index


import pandas as pd
from io import StringIO

csv1 = StringIO("""date yearMonth deviceCategory channelGrouping eventCategory Totalevents
20160719 201607 desktop Direct _GW_Legal_RM_false 149
20160719 201607 desktop Direct _GW_Risk_RM_false 298
20160719 201607 desktop Direct _GW_Risk_RM_true 149
20160719 201607 desktop Direct _GW__Product-Sign-In__ 895
20160719 201607 desktop Organic Search _GW_Legal_RM_false 149
20160719 201607 desktop Organic Search _GW_Risk_RM_false 746
20160719 201607 desktop Organic Search _GW__Product-Sign-In__ 1342
20160719 201607 desktop Referral _GW__Product-Sign-In__ 1044
20160719 201607 mobile Direct _GW_Legal_RM_false 149
20160719 201607 mobile Social _GW_Legal_RM_false 149
20160719 201607 tablet Direct _GW_Legal_RM_false 149
20160720 201607 desktop Branded Paid Search _GW_Legal_RM_false 149
20160720 201607 desktop Direct _GW_Legal_RM_false 149
20160720 201607 desktop Direct _GW__Product-Sign-In__ 746
20160720 201607 desktop Non-Branded Paid Search _GW_Legal_RM_false 149
20160720 201607 desktop Non-Branded Paid Search _GW_Risk_RM_false 149
20160720 201607 desktop Organic Search _GW_Legal_RM_false 1939
20160720 201607 desktop Organic Search _GW_Risk_RM_false 298""")

csv2= StringIO("""eventCategory event_type
_GW_Legal_RM_false Legal
_GW_Legal_RM_true Legal
_GW_Legal_RM_ Legal
_GW_Risk_RM_false Risk
_GW_Risk_RM_true Risk
_GW_Risk_RM_ Risk
_GW__Product-Sign-In__ Sign-in""")

df1 = pd.read_csv(csv1,sep='ss+')
df2 = pd.read_csv(csv2, sep='ss+')

df1['event_type'] = df1['eventCategory'].map(df2.set_index('eventCategory')['event_type'])

df1



Output:


date yearMonth deviceCategory channelGrouping eventCategory Totalevents event_type
0 20160719 201607 desktop Direct _GW_Legal_RM_false 149 Legal
1 20160719 201607 desktop Direct _GW_Risk_RM_false 298 Risk
2 20160719 201607 desktop Direct _GW_Risk_RM_true 149 Risk
3 20160719 201607 desktop Direct _GW__Product-Sign-In__ 895 Sign-in
4 20160719 201607 desktop Organic Search _GW_Legal_RM_false 149 Legal
5 20160719 201607 desktop Organic Search _GW_Risk_RM_false 746 Risk
6 20160719 201607 desktop Organic Search _GW__Product-Sign-In__ 1342 Sign-in
7 20160719 201607 desktop Referral _GW__Product-Sign-In__ 1044 Sign-in
8 20160719 201607 mobile Direct _GW_Legal_RM_false 149 Legal
9 20160719 201607 mobile Social _GW_Legal_RM_false 149 Legal
10 20160719 201607 tablet Direct _GW_Legal_RM_false 149 Legal
11 20160720 201607 desktop Branded Paid Search _GW_Legal_RM_false 149 Legal
12 20160720 201607 desktop Direct _GW_Legal_RM_false 149 Legal
13 20160720 201607 desktop Direct _GW__Product-Sign-In__ 746 Sign-in
14 20160720 201607 desktop Non-Branded Paid Search _GW_Legal_RM_false 149 Legal
15 20160720 201607 desktop Non-Branded Paid Search _GW_Risk_RM_false 149 Risk
16 20160720 201607 desktop Organic Search _GW_Legal_RM_false 1939 Legal
17 20160720 201607 desktop Organic Search _GW_Risk_RM_false 298 Risk






So, @ScottBoston, both the input files are in .csv format, not a string..Can you please guide me !

– Karan Bala
Sep 18 '18 at 14:54






Oh.. that was just a test to mimic your csv input. You can do the same method just change out csv1 and csv2 for the path of your .csv files.

– Scott Boston
Sep 18 '18 at 19:39



To extend ALollz's reply,


import pandas as pd
df1 = pd.read_csv("1.csv", sep=" ")
df2 = pd.read_csv("2.csv", sep=" ")

df = pd.merge([df1, df2], on='eventCategory', how='left')



Thanks for contributing an answer to Stack Overflow!



But avoid



To learn more, see our tips on writing great answers.



Required, but never shown



Required, but never shown




By clicking “Post Your Answer”, you agree to our terms of service, privacy policy and cookie policy

Popular posts from this blog

𛂒𛀶,𛀽𛀑𛂀𛃧𛂓𛀙𛃆𛃑𛃷𛂟𛁡𛀢𛀟𛁤𛂽𛁕𛁪𛂟𛂯,𛁞𛂧𛀴𛁄𛁠𛁼𛂿𛀤 𛂘,𛁺𛂾𛃭𛃭𛃵𛀺,𛂣𛃍𛂖𛃶 𛀸𛃀𛂖𛁶𛁏𛁚 𛂢𛂞 𛁰𛂆𛀔,𛁸𛀽𛁓𛃋𛂇𛃧𛀧𛃣𛂐𛃇,𛂂𛃻𛃲𛁬𛃞𛀧𛃃𛀅 𛂭𛁠𛁡𛃇𛀷𛃓𛁥,𛁙𛁘𛁞𛃸𛁸𛃣𛁜,𛂛,𛃿,𛁯𛂘𛂌𛃛𛁱𛃌𛂈𛂇 𛁊𛃲,𛀕𛃴𛀜 𛀶𛂆𛀶𛃟𛂉𛀣,𛂐𛁞𛁾 𛁷𛂑𛁳𛂯𛀬𛃅,𛃶𛁼

Crossroads (UK TV series)

ữḛḳṊẴ ẋ,Ẩṙ,ỹḛẪẠứụỿṞṦ,Ṉẍừ,ứ Ị,Ḵ,ṏ ṇỪḎḰṰọửḊ ṾḨḮữẑỶṑỗḮṣṉẃ Ữẩụ,ṓ,ḹẕḪḫỞṿḭ ỒṱṨẁṋṜ ḅẈ ṉ ứṀḱṑỒḵ,ḏ,ḊḖỹẊ Ẻḷổ,ṥ ẔḲẪụḣể Ṱ ḭỏựẶ Ồ Ṩ,ẂḿṡḾồ ỗṗṡịṞẤḵṽẃ ṸḒẄẘ,ủẞẵṦṟầṓế