Pandas - show percentage of values in one column, grouped by other column

I have a Pandas DataFrame with two columns:
the first is Grade, with values 0 to 9;
the second is Criteria, with values 0 or 1.

Grade (0-9) / Criteria (0/1)

       Grade  Criteria
    0      0         1
    1      1         0
    2      2         1
    3      2         0
    4      5         1
    5      2         1

etc.

I need to compute a "Criteria rate": the number of 1s in the Criteria column divided by the number of rows, computed separately for each value of Grade.
For example, for Grade = 2 we sum the 1s in the Criteria column and divide by the number of rows with Grade 2: 2/3, so for Grade 2 we get approximately 0.67.
For my example, the answer should look like:

Grade / Criteria rate

       Grade  Criteria
    0      0  1.000000
    1      1  0.000000
    2      2  0.666667
    3      5  1.000000

Any ideas on how to do this?
A follow-up question: how would I do this if the Criteria column holds "yes"/"no" text values instead?
I've searched here, but found only groupby solutions that divide by the total row count, etc.

Thank you!

2 Answers

You can aggregate with sum and size, then divide the two columns:

    df = df.groupby('Grade')['Criteria'].agg(['sum','size'])
    df['new'] = df['sum'] / df['size']
    print (df)
           sum  size       new
    Grade
    0        1     1  1.000000
    1        0     1  0.000000
    2        2     3  0.666667
    5        1     1  1.000000

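Or use a custom function (pick one of the two variants, since each reassigns df):

    # NaNs are not excluded: len(x) counts every row in the group
    df = df.groupby('Grade')['Criteria'].agg(lambda x: x.sum() / len(x)).reset_index(name='new')

    # possible NaNs are excluded: x.count() counts only non-null rows
    df = df.groupby('Grade')['Criteria'].agg(lambda x: x.sum() / x.count()).reset_index(name='new')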
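To make the difference between the two lambdas concrete, here is a small sketch on made-up data in which one Criteria value is missing. The sample frame and the names `with_nan` and `without_nan` are illustrative assumptions, not part of the question's data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: Grade 2 has three rows, one with a missing Criteria.
df = pd.DataFrame({
    "Grade": [0, 1, 2, 2, 2],
    "Criteria": [1, 0, 1, np.nan, 1],
})

grouped = df.groupby("Grade")["Criteria"]

# len(x) counts every row, so the NaN row stays in the denominator.
with_nan = grouped.agg(lambda x: x.sum() / len(x))

# x.count() counts only non-null rows, so the NaN row is left out.
without_nan = grouped.agg(lambda x: x.sum() / x.count())

print(with_nan)
print(without_nan)
```

For Grade 2 the first variant divides by 3 and the second by 2. The second variant should match `grouped.mean()`, which also skips NaN by default.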

For yes/no values, work with a boolean mask; True values are treated like 1s:

    print (df)
       Grade Criteria
    0      0      yes
    1      1       no
    2      2      yes
    3      2       no
    4      5      yes
    5      2      yes

    df = ((df['Criteria'] == 'yes')
              .groupby(df['Grade'])
              .agg(lambda x: x.sum() / len(x))
              .reset_index(name='new'))
    print (df)
       Grade       new
    0      0  1.000000
    1      1  0.000000
    2      2  0.666667
    3      5  1.000000


If Criteria holds 1 and 0, or even True and False, you can use mean: the mean of a 0/1 column is exactly the fraction of 1s.

With groupby:

    df.groupby('Grade').mean()

           Criteria
    Grade
    0      1.000000
    1      0.000000
    2      0.666667
    5      1.000000

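With set_index and mean (note that the level argument of mean was deprecated in pandas 1.3 and removed in 2.0, so this form only works on older versions):

    df.set_index('Grade').mean(level=0)

           Criteria
    Grade
    0      1.000000
    1      0.000000
    2      0.666667
    5      1.000000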
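On pandas 2.0 and later, where `mean` no longer accepts `level`, the same index-based grouping can be written with `groupby(level=0)`. A minimal sketch, assuming the question's sample data rebuilt by hand (the name `rates` is only illustrative):

```python
import pandas as pd

# The question's sample data, rebuilt by hand.
df = pd.DataFrame({
    "Grade": [0, 1, 2, 2, 5, 2],
    "Criteria": [1, 0, 1, 0, 1, 1],
})

# Move Grade into the index, then group on that index level.
rates = df.set_index("Grade").groupby(level=0).mean()
print(rates)
```

This form also runs on older pandas versions, so it may be the safer choice when the installed version is unknown.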

In the case that the 'Criteria' values are the strings 'yes' and 'no':

    df

       Grade Criteria
    0      0      yes
    1      1       no
    2      2      yes
    3      2       no
    4      5      yes
    5      2      yes

You can compare against 'yes' to get a boolean Series, group that by Grade, and take the mean:

    df.Criteria.eq('yes').groupby(df.Grade).mean()

    Grade
    0    1.000000
    1    0.000000
    2    0.666667
    5    1.000000
    Name: Criteria, dtype: float64

Use reset_index on any of these results to get the desired DataFrame.
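As a sketch of that last step for the yes/no case, the boolean mean can be chained with `reset_index` to get a flat two-column frame. The column name `"Criteria rate"` and the variable `result` are assumptions chosen for illustration; the data is the sample shown above, rebuilt by hand.

```python
import pandas as pd

# The yes/no sample from above, rebuilt by hand.
df = pd.DataFrame({
    "Grade": [0, 1, 2, 2, 5, 2],
    "Criteria": ["yes", "no", "yes", "no", "yes", "yes"],
})

# Boolean mask -> per-grade mean -> flat frame with Grade as a column.
result = (
    df["Criteria"].eq("yes")
    .groupby(df["Grade"])
    .mean()
    .reset_index(name="Criteria rate")
)
print(result)
```

Passing `name=` to `reset_index` works here because the grouped result is a Series; for the DataFrame results above, a plain `reset_index()` is enough.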

