Pandas: Search if substring contains key in dictionary, and return value
I have a dictionary (key, value) and a pandas DataFrame.

    mydict = {'KULAR LUMPUR': 'MY',
              'SINGAPORE': 'SG',
              'HONG KONG': 'HK',
              'VIETNAM': 'VN'}

and a DataFrame with a column ['Address']:

                                  Address
    0  234 JALAN ST KULAR LUMPUR MALAYSIA
    1       123 BUILDING STREET SINGAPORE
    2          67 CANNING VALE, HONG KONG

How do I search each address for a dictionary key that occurs in it as a substring, and return the corresponding value from the dictionary?

e.g.

                                  Address  Code
    0  234 JALAN ST KULAR LUMPUR MALAYSIA    MY
    1       123 BUILDING STREET SINGAPORE    SG
    2          67 CANNING VALE, HONG KONG    HK

python string pandas dictionary
edited Nov 12 '18 at 6:53 by jezrael
asked Nov 12 '18 at 6:40 by newtoCS
Format your dict and dataframes properly, so we can run it directly without adding all quotes and commas.
– Vineeth Sai
Nov 12 '18 at 6:43
1 Answer
Use str.extract with a regex built from the keys of the dictionary, then map the extracted key to its value:

    import pandas as pd

    df = pd.DataFrame({'Address': ['234 JALAN ST KULAR LUMPUR MALAYSIA',
                                   '123 BUILDING STREET SINGAPORE',
                                   '67 CANNING VALE, HONG KONG']})
    print (df)
                                  Address
    0  234 JALAN ST KULAR LUMPUR MALAYSIA
    1       123 BUILDING STREET SINGAPORE
    2          67 CANNING VALE, HONG KONG

    mydict = {'KULAR LUMPUR': 'MY',
              'SINGAPORE': 'SG',
              'HONG KONG': 'HK',
              'VIETNAM': 'VN'}

    # Join the keys into one alternation, each wrapped in word boundaries
    pat = '|'.join(r"\b{}\b".format(x) for x in mydict.keys())
    # Extract the first matching key as a Series, then look up its code
    df['Code'] = df['Address'].str.extract('('+ pat + ')', expand=False).map(mydict)
    print (df)
                                  Address  Code
    0  234 JALAN ST KULAR LUMPUR MALAYSIA    MY
    1       123 BUILDING STREET SINGAPORE    SG
    2          67 CANNING VALE, HONG KONG    HK

Explanation:

    print (pat)
    \bKULAR LUMPUR\b|\bSINGAPORE\b|\bHONG KONG\b|\bVIETNAM\b

\b is a word boundary, so each key is matched only as a whole word or phrase between the two \b.

| is the regex OR, so any one of the keys can match.
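If the dictionary keys might contain regex metacharacters (for example a dot or parentheses), the pattern above could misbehave. A minimal sketch of one way to guard against that follows. The `re.escape` call, the longest-first ordering, the extra unmatched row, and the helper name `build_pattern` are all additions for illustration and are not part of the original answer.

```python
import re
import pandas as pd

mydict = {'KULAR LUMPUR': 'MY',
          'SINGAPORE': 'SG',
          'HONG KONG': 'HK',
          'VIETNAM': 'VN'}

def build_pattern(keys):
    """Hypothetical helper: build a word-bounded alternation from the keys."""
    # Put longer keys first so a short key cannot win over a longer one
    # that starts with the same text.
    ordered = sorted(keys, key=len, reverse=True)
    # re.escape makes every key match literally.
    return '|'.join(r"\b{}\b".format(re.escape(k)) for k in ordered)

df = pd.DataFrame({'Address': ['234 JALAN ST KULAR LUMPUR MALAYSIA',
                               '123 BUILDING STREET SINGAPORE',
                               '67 CANNING VALE, HONG KONG',
                               '1 UNKNOWN ROAD']})  # illustrative row with no key

pat = build_pattern(mydict)
# expand=False returns a Series for a single capture group, so .map works
df['Code'] = df['Address'].str.extract('(' + pat + ')', expand=False).map(mydict)
print(df)
```

Rows where none of the keys occur get NaN in the Code column, which can be filled afterwards with fillna if a default code is wanted.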
Jezrael, can you please add an explanation of pat? That will help readers and me :-)
– pygo
Nov 12 '18 at 6:52
I am getting this error: AttributeError: 'DataFrame' object has no attribute 'map'
– newtoCS
Nov 12 '18 at 6:56
@newtoCS - are you using df['Address'].str.extract('('+ pat + ')', expand=False).map(mydict)?
– jezrael
Nov 12 '18 at 6:59
I have used it. If the key is KULARLUMPUR, without spacing, this answer works. However, if it is KULAR LUMPUR with spacing (2 words), the error comes up.
– newtoCS
Nov 12 '18 at 7:13
@newtoCS - For me it works fine. Are the spaces the same, only a single space both in the data and in the dictionary?
– jezrael
Nov 12 '18 at 7:14
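The thread does not establish what caused the AttributeError. One plausible cause, stated here only as an assumption, is that expand=False was left out or the pattern had more than one capture group, since str.extract then returns a DataFrame rather than a Series, and in the pandas versions current at the time of the question a DataFrame had no map method. A small sketch that shows the difference in return type (the names `as_frame`, `as_series` and `codes` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'Address': ['234 JALAN ST KULAR LUMPUR MALAYSIA',
                               '67 CANNING VALE, HONG KONG']})
mydict = {'KULAR LUMPUR': 'MY', 'HONG KONG': 'HK'}
pat = '|'.join(r"\b{}\b".format(x) for x in mydict)

# expand=True always returns a DataFrame, one column per capture group
as_frame = df['Address'].str.extract('(' + pat + ')', expand=True)
# expand=False with a single capture group returns a Series
as_series = df['Address'].str.extract('(' + pat + ')', expand=False)

print(type(as_frame).__name__)
print(type(as_series).__name__)

# If a DataFrame was returned, select its single column before mapping
codes = as_frame[0].map(mydict)
print(codes.tolist())
```

Checking type(...) on the result of str.extract is a quick way to see which case applies before calling map.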
edited Nov 12 '18 at 7:35
answered Nov 12 '18 at 6:46 by jezrael