Fill in NaN values for left join by sampling from right table

I cannot figure out a nice, pandas-idiomatic way to fill in the NaN values produced by a left join by sampling rows from the right table.

For example,
joined_left = left.merge(right, how="left", left_on=[attr1], right_on=[attr2])
computed from left and right

produces something like

   0  1_x  2_x  1_y  2_y
0  1    1    1  2.0  2.0
1  1    1    1  2.0  3.0
2  2    2    2  NaN  NaN
3  3    3    3  2.0  2.0
4  3    3    3  2.0  9.0
5  3    3    3  2.0  2.0
6  9    9    9  NaN  NaN
7  1    3    2  2.0  2.0
8  1    3    2  2.0  3.0

How do I fill each unmatched row with a row sampled from the right table instead of leaving NaNs?

This is what I have tried so far:

import numpy as np
import pandas as pd

left = [[1,1,1], [2,2,2], [3,3,3], [9,9,9], [1,3,2]]
right = [[1,2,2], [1,2,3], [3,2,2], [3,2,9], [3,2,2]]
left = np.asarray(left)
right = np.asarray(right)
left = pd.DataFrame(left)
right = pd.DataFrame(right)
joined_left = left.merge(right, how="left", left_on=[0], right_on=[0])

# attempt: keep sampling a right row and filling one NaN at a time
while joined_left.isnull().values.any():
    right_sample = right.sample().drop(0, axis=1)
    joined_left.fillna(value=right_sample, limit=1)

print(joined_left)

Basically, I sample a random row and use fillna() to fill the first occurrence of a NaN value, but for some reason I get no output.

Thank you!

One possible output would be

   0  1_x  2_x  1_y  2_y
0  1    1    1  2.0  2.0
1  1    1    1  2.0  3.0
2  2    2    2  2.0  2.0
3  3    3    3  2.0  2.0
4  3    3    3  2.0  9.0
5  3    3    3  2.0  2.0
6  9    9    9  2.0  9.0
7  1    3    2  2.0  2.0
8  1    3    2  2.0  3.0

with the sampled rows 3 2 2 and 3 2 9

edited Nov 11 '18 at 16:38

asked Nov 11 '18 at 2:40

YohanRoth

9331919

What is your expected output? Please provide a Minimal, Complete, and Verifiable example.

– Sandeep Kadapa
Nov 11 '18 at 2:55

@SandeepKadapa provided

– YohanRoth
Nov 11 '18 at 3:06

python pandas

1 Answer

Using sample with fillna

joined_left = left.merge(right, how="left", left_on=[0], right_on=[0], indicator=True)  # add the _merge indicator column
joined_left
Out[705]: 
   0  1_x  2_x  1_y  2_y     _merge
0  1    1    1  2.0  2.0       both
1  1    1    1  2.0  3.0       both
2  2    2    2  NaN  NaN  left_only
3  3    3    3  2.0  2.0       both
4  3    3    3  2.0  9.0       both
5  3    3    3  2.0  2.0       both
6  9    9    9  NaN  NaN  left_only
7  1    3    2  2.0  2.0       both
8  1    3    2  2.0  3.0       both
nnull = joined_left['_merge'].eq('left_only').sum()  # count the left rows that found no match in right
s = right.sample(nnull)  # sample that many rows from the right table
s.index = joined_left.index[joined_left['_merge'].eq('left_only')]  # give the sampled rows the index of the unmatched rows
joined_left.fillna(s.rename(columns={1: '1_y', 2: '2_y'}))  # returns a new frame; assign it back to keep the result
Out[706]: 
   0  1_x  2_x  1_y  2_y     _merge
0  1    1    1  2.0  2.0       both
1  1    1    1  2.0  3.0       both
2  2    2    2  2.0  2.0  left_only
3  3    3    3  2.0  2.0       both
4  3    3    3  2.0  9.0       both
5  3    3    3  2.0  2.0       both
6  9    9    9  2.0  3.0  left_only
7  1    3    2  2.0  2.0       both
8  1    3    2  2.0  3.0       both
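
For anyone who wants to run this outside an interactive session, here is a self-contained sketch of the same idea. It is not part of the session above: the helper name fill_unmatched_by_sampling and its key and random_state parameters are made up for illustration, and it assumes sampling with replacement (so it still works when there are more unmatched rows than rows in right) and direct assignment through .loc rather than fillna.

```python
import pandas as pd


def fill_unmatched_by_sampling(left, right, key, random_state=None):
    """Left-join `left` and `right` on `key`, then fill the rows that found
    no match with rows sampled at random from `right` (hypothetical helper)."""
    joined = left.merge(right, how="left", on=[key],
                        suffixes=("_x", "_y"), indicator=True)

    # Rows that exist only in `left` are the ones whose right-hand columns are NaN.
    unmatched = joined["_merge"].eq("left_only")
    n_unmatched = int(unmatched.sum())

    # Non-key columns of `right`, and the names they carry after the merge
    # (a suffix is added only when the name also exists in `left`).
    value_cols = [c for c in right.columns if c != key]
    target_cols = [f"{c}_y" if c in left.columns else c for c in value_cols]

    # Assumption: sample with replacement, so n_unmatched may exceed len(right).
    sampled = right[value_cols].sample(n_unmatched, replace=True,
                                       random_state=random_state)

    # Assign by position into the unmatched rows; the sampled index is ignored.
    joined.loc[unmatched, target_cols] = sampled.to_numpy()
    return joined.drop(columns="_merge")


left = pd.DataFrame([[1, 1, 1], [2, 2, 2], [3, 3, 3], [9, 9, 9], [1, 3, 2]])
right = pd.DataFrame([[1, 2, 2], [1, 2, 3], [3, 2, 2], [3, 2, 9], [3, 2, 2]])

result = fill_unmatched_by_sampling(left, right, key=0, random_state=0)
print(result)
```

The filled values change with the random seed, so no expected output is shown here; the matched rows are left exactly as the merge produced them.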

edited Nov 11 '18 at 16:22

answered Nov 11 '18 at 3:06

W-B

106k83165

could you pls briefly explain the logic

– YohanRoth
Nov 11 '18 at 3:07

@YohanRoth added

– W-B
Nov 11 '18 at 3:10

@YohanRoth you should assign it back df=df.fillna(s)

– W-B
Nov 11 '18 at 3:17

it's not really sampling from right table, but rather from right table values that we brought in the join

– YohanRoth
Nov 11 '18 at 3:34

@YohanRoth got you , let me fix

– W-B
Nov 11 '18 at 3:43
