mege two csv files using python or pandas
mege two csv files using python or pandas
**csv file 1**
date yearMonth deviceCategory channelGrouping eventCategory Totalevents
20160719 201607 desktop Direct _GW_Legal_RM_false 149
20160719 201607 desktop Direct _GW_Risk_RM_false 298
20160719 201607 desktop Direct _GW_Risk_RM_true 149
20160719 201607 desktop Direct _GW__Product-Sign-In__ 895
20160719 201607 desktop Organic Search _GW_Legal_RM_false 149
20160719 201607 desktop Organic Search _GW_Risk_RM_false 746
20160719 201607 desktop Organic Search _GW__Product-Sign-In__ 1342
20160719 201607 desktop Referral _GW__Product-Sign-In__ 1044
20160719 201607 mobile Direct _GW_Legal_RM_false 149
20160719 201607 mobile Social _GW_Legal_RM_false 149
20160719 201607 tablet Direct _GW_Legal_RM_false 149
20160720 201607 desktop Branded Paid Search _GW_Legal_RM_false 149
20160720 201607 desktop Direct _GW_Legal_RM_false 149
20160720 201607 desktop Direct _GW__Product-Sign-In__ 746
20160720 201607 desktop Non-Branded Paid Search _GW_Legal_RM_false 149
20160720 201607 desktop Non-Branded Paid Search _GW_Risk_RM_false 149
20160720 201607 desktop Organic Search _GW_Legal_RM_false 1939
20160720 201607 desktop Organic Search _GW_Risk_RM_false 298
I have 2 CSV Files, I want to merge based on one common column , but the common column lengths are different!Is there any way to merge/combine this without duplicating values
csv file 2
eventCategory event_type
_GW_Legal_RM_false Legal
_GW_Legal_RM_true Legal
_GW_Legal_RM_ Legal
_GW_Risk_RM_false Risk
_GW_Risk_RM_true Risk
_GW_Risk_RM_ Risk
_GW__Product-Sign-In__ Sign-in
Output.csv
eventCategory event_type date yearMonth deviceCategory channelGrouping Totalevents
_GW_Legal_RM_false Legal 20160719 201607 desktop Direct 149
_GW_Legal_RM_false Legal 20160719 201607 desktop Organic Search 149
_GW_Legal_RM_false Legal 20160719 201607 mobile Direct 149
_GW_Legal_RM_false Legal 20160719 201607 mobile Social 149
will modify the question
– Karan Bala
Sep 18 '18 at 13:17
There's always
pandas.DataFrame.merge
– ALollz
Sep 18 '18 at 13:23
pandas.DataFrame.merge
Is that the full output you expect, or just a subset of the full output? There doesn't seem to be any particular logic I can find as to why it's only those 4 rows.
– ALollz
Sep 18 '18 at 13:32
@ALollz , It's just a subset of the output, more precisely that's an example of the output format that i require.
– Karan Bala
Sep 18 '18 at 15:27
3 Answers
3
df1 = pd.read_csv("csv1.csv")
df2 = pd.read_csv("csv2.csv")
df = pd.merge(df1, df2, on='eventCategory', how='left')
some modification to @FrankZhu 's answer.
Thanks@Keshav Sharma..Very helpful(i know cats)
– Karan Bala
Sep 19 '18 at 10:53
Use map
with set_index
:
map
set_index
import pandas as pd
from io import StringIO
csv1 = StringIO("""date yearMonth deviceCategory channelGrouping eventCategory Totalevents
20160719 201607 desktop Direct _GW_Legal_RM_false 149
20160719 201607 desktop Direct _GW_Risk_RM_false 298
20160719 201607 desktop Direct _GW_Risk_RM_true 149
20160719 201607 desktop Direct _GW__Product-Sign-In__ 895
20160719 201607 desktop Organic Search _GW_Legal_RM_false 149
20160719 201607 desktop Organic Search _GW_Risk_RM_false 746
20160719 201607 desktop Organic Search _GW__Product-Sign-In__ 1342
20160719 201607 desktop Referral _GW__Product-Sign-In__ 1044
20160719 201607 mobile Direct _GW_Legal_RM_false 149
20160719 201607 mobile Social _GW_Legal_RM_false 149
20160719 201607 tablet Direct _GW_Legal_RM_false 149
20160720 201607 desktop Branded Paid Search _GW_Legal_RM_false 149
20160720 201607 desktop Direct _GW_Legal_RM_false 149
20160720 201607 desktop Direct _GW__Product-Sign-In__ 746
20160720 201607 desktop Non-Branded Paid Search _GW_Legal_RM_false 149
20160720 201607 desktop Non-Branded Paid Search _GW_Risk_RM_false 149
20160720 201607 desktop Organic Search _GW_Legal_RM_false 1939
20160720 201607 desktop Organic Search _GW_Risk_RM_false 298""")
csv2= StringIO("""eventCategory event_type
_GW_Legal_RM_false Legal
_GW_Legal_RM_true Legal
_GW_Legal_RM_ Legal
_GW_Risk_RM_false Risk
_GW_Risk_RM_true Risk
_GW_Risk_RM_ Risk
_GW__Product-Sign-In__ Sign-in""")
df1 = pd.read_csv(csv1,sep='ss+')
df2 = pd.read_csv(csv2, sep='ss+')
df1['event_type'] = df1['eventCategory'].map(df2.set_index('eventCategory')['event_type'])
df1
Output:
date yearMonth deviceCategory channelGrouping eventCategory Totalevents event_type
0 20160719 201607 desktop Direct _GW_Legal_RM_false 149 Legal
1 20160719 201607 desktop Direct _GW_Risk_RM_false 298 Risk
2 20160719 201607 desktop Direct _GW_Risk_RM_true 149 Risk
3 20160719 201607 desktop Direct _GW__Product-Sign-In__ 895 Sign-in
4 20160719 201607 desktop Organic Search _GW_Legal_RM_false 149 Legal
5 20160719 201607 desktop Organic Search _GW_Risk_RM_false 746 Risk
6 20160719 201607 desktop Organic Search _GW__Product-Sign-In__ 1342 Sign-in
7 20160719 201607 desktop Referral _GW__Product-Sign-In__ 1044 Sign-in
8 20160719 201607 mobile Direct _GW_Legal_RM_false 149 Legal
9 20160719 201607 mobile Social _GW_Legal_RM_false 149 Legal
10 20160719 201607 tablet Direct _GW_Legal_RM_false 149 Legal
11 20160720 201607 desktop Branded Paid Search _GW_Legal_RM_false 149 Legal
12 20160720 201607 desktop Direct _GW_Legal_RM_false 149 Legal
13 20160720 201607 desktop Direct _GW__Product-Sign-In__ 746 Sign-in
14 20160720 201607 desktop Non-Branded Paid Search _GW_Legal_RM_false 149 Legal
15 20160720 201607 desktop Non-Branded Paid Search _GW_Risk_RM_false 149 Risk
16 20160720 201607 desktop Organic Search _GW_Legal_RM_false 1939 Legal
17 20160720 201607 desktop Organic Search _GW_Risk_RM_false 298 Risk
So, @ScottBoston, both the input files are in .csv format, not a string..Can you please guide me !
– Karan Bala
Sep 18 '18 at 14:54
Oh.. that was just a test to mimic your csv input. You can do the same method just change out csv1 and csv2 for the path of your .csv files.
– Scott Boston
Sep 18 '18 at 19:39
To extend ALollz's reply,
import pandas as pd
df1 = pd.read_csv("1.csv", sep=" ")
df2 = pd.read_csv("2.csv", sep=" ")
df = pd.merge([df1, df2], on='eventCategory', how='left')
Thanks for contributing an answer to Stack Overflow!
But avoid …
To learn more, see our tips on writing great answers.
Required, but never shown
Required, but never shown
By clicking “Post Your Answer”, you agree to our terms of service, privacy policy and cookie policy
Can you give an example of what the two input csvs look like and an example of what you want the output to be?
– jobevers
Sep 18 '18 at 13:15