Pandas - show percentage of values in one column, grouped by other column




I have a pandas DataFrame with two columns:
the first is Grade, with values 0 to 9;
the second is Criteria, with values 0 or 1.



Grade (0-9) / Criteria (0/1)


   Grade  Criteria
0      0         1
1      1         0
2      2         1
3      2         0
4      5         1
5      2         1



etc



I need to compute a "Criteria rate" for each Grade: the number of 1s in the Criteria column divided by the number of rows, taken within each group of rows that share a Grade value.
For example, three rows have Grade = 2 and two of them have Criteria = 1, so the rate for Grade 2 is 2/3, approximately 0.67.
For my example, the answer should look like:



Grade / Criteria rate


   Grade  Criteria
0      0  1.000000
1      1  0.000000
2      2  0.666667
3      5  1.000000



Any ideas on how to do this?
As an additional question: how would I do this if the Criteria column holds "yes"/"no" text values?
I've searched here, but found only solutions that group and then divide by the total row count, etc.



Thank you!




2 Answers



You can aggregate sum with size and then divide columns:



df = df.groupby('Grade')['Criteria'].agg(['sum', 'size'])
df['new'] = df['sum'] / df['size']
print(df)
       sum  size       new
Grade                     
0        1     1  1.000000
1        0     1  0.000000
2        2     3  0.666667
5        1     1  1.000000
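
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the sum/size approach. The DataFrame is rebuilt from the six sample rows in the question; the names stats and rate are my own choice for illustration, not something from the original code.

```python
import pandas as pd

# Sample data copied from the six rows shown in the question
df = pd.DataFrame({
    "Grade":    [0, 1, 2, 2, 5, 2],
    "Criteria": [1, 0, 1, 0, 1, 1],
})

# One row per Grade: number of 1s ("sum") and number of rows ("size")
stats = df.groupby("Grade")["Criteria"].agg(["sum", "size"])

# Criteria rate = count of 1s divided by the number of rows in the group
stats["rate"] = stats["sum"] / stats["size"]
print(stats)
```

Keeping the result in a separate variable (instead of overwriting df) leaves the original data available for further checks.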



Or use custom function:


# NaNs are not excluded: len(x) counts every row in the group
df = df.groupby('Grade')['Criteria'].agg(lambda x: x.sum() / len(x)).reset_index(name='new')

# possible NaNs are excluded: x.count() counts only non-NaN values
df = df.groupby('Grade')['Criteria'].agg(lambda x: x.sum() / x.count()).reset_index(name='new')
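
To make the difference between the two variants visible, here is a small sketch on made-up data. The missing value in Grade 2 is hypothetical (the question's data has no NaNs), and the variable names are mine.

```python
import numpy as np
import pandas as pd

# Hypothetical data: Grade 2 has one missing Criteria value
df = pd.DataFrame({
    "Grade":    [2, 2, 2, 5],
    "Criteria": [1, 0, np.nan, 1],
})
grouped = df.groupby("Grade")["Criteria"]

# len(x) counts every row, so the NaN row enlarges the denominator: 1 / 3
with_nan = grouped.agg(lambda x: x.sum() / len(x))

# x.count() skips NaN, so only observed values are counted: 1 / 2
without_nan = grouped.agg(lambda x: x.sum() / x.count())

print(with_nan[2], without_nan[2])
```

Which one is right depends on whether a missing Criteria should count as "not met" or be ignored entirely.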



For yes/no values, work with a boolean mask; Trues are processed like 1s:


print(df)
   Grade Criteria
0      0      yes
1      1       no
2      2      yes
3      2       no
4      5      yes
5      2      yes

df = (df['Criteria'] == 'yes').groupby(df['Grade']).agg(lambda x: x.sum() / len(x)).reset_index(name='new')
print(df)
   Grade       new
0      0  1.000000
1      1  0.000000
2      2  0.666667
3      5  1.000000





@Emilkindt - If my or another answer was helpful, don't forget to accept it - click on the check mark beside the answer to toggle it from greyed out to filled in. Thanks.
– jezrael
Sep 13 '18 at 14:16



If Criteria is 1 or 0, or even True or False, you can use mean with groupby:


df.groupby('Grade').mean()

       Criteria
Grade          
0      1.000000
1      0.000000
2      0.666667
5      1.000000


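Or use set_index and take the mean per index level (mean(level=0) was removed in pandas 2.0, so group on the level instead):


df.set_index('Grade').groupby(level=0).mean()

       Criteria
Grade          
0      1.000000
1      0.000000
2      0.666667
5      1.000000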
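
The reason mean works here is that the mean of a 0/1 (or True/False) column is the number of 1s divided by the number of values, which is exactly the rate being asked for. A minimal sketch, assuming Criteria is stored as booleans (my own variation on the question's data) and using variable names of my choosing:

```python
import pandas as pd

# Question's sample rows, with Criteria stored as booleans for illustration
df = pd.DataFrame({
    "Grade":    [0, 1, 2, 2, 5, 2],
    "Criteria": [True, False, True, False, True, True],
})

# Mean of a boolean column per group = share of True values
rate = df.groupby("Grade")["Criteria"].mean()

# Same result by moving Grade into the index and grouping on that level
same = df.set_index("Grade")["Criteria"].groupby(level=0).mean()

print(rate)
```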



In the case that 'Criteria' holds 'yes' and 'no' strings:


df

   Grade Criteria
0      0      yes
1      1       no
2      2      yes
3      2       no
4      5      yes
5      2      yes



You can group the boolean evaluation:


df.Criteria.eq('yes').groupby(df.Grade).mean()

Grade
0    1.000000
1    0.000000
2    0.666667
5    1.000000
Name: Criteria, dtype: float64
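
Here is the same idea as a runnable sketch, with the boolean step split out so it is easier to inspect. The data is the yes/no sample from above; is_yes and rate are names I picked for illustration.

```python
import pandas as pd

# The yes/no sample data shown above
df = pd.DataFrame({
    "Grade":    [0, 1, 2, 2, 5, 2],
    "Criteria": ["yes", "no", "yes", "no", "yes", "yes"],
})

# True where Criteria is exactly 'yes'; any other string becomes False
is_yes = df["Criteria"].eq("yes")

# Group the boolean Series by the Grade column and average it
rate = is_yes.groupby(df["Grade"]).mean()
print(rate)
```

Note that the comparison is exact, so values such as 'Yes' or ' yes' would be treated as False unless they are normalized first.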



Use reset_index on any of these results to get the desired DataFrame.
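
As a sketch of that last step, the following turns the grouped Series into a two-column DataFrame. The column label "Criteria rate" is my assumption, taken from the wording of the question; as_index=False is an alternative that keeps the original column name.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Grade":    [0, 1, 2, 2, 5, 2],
    "Criteria": [1, 0, 1, 0, 1, 1],
})

# Series indexed by Grade
rate = df.groupby("Grade")["Criteria"].mean()

# Move Grade back into a regular column and name the value column
result = rate.reset_index(name="Criteria rate")

# Alternative: ask groupby not to put Grade in the index at all
flat = df.groupby("Grade", as_index=False)["Criteria"].mean()

print(result)
```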





and add answer for yes and no values.
– jezrael
Sep 5 '18 at 16:27


