Ranking variables according to their percent contribution to total

Ranking variables according to their percent contribution to total



Consider the following example data:


psu | sumsc sumst sumobc sumother sumcaste
-------|-----------------------------------------------
10018 | 3 2 0 4 9
|
10061 | 0 0 2 5 7
|
10116 | 1 1 2 4 8
|
10121 | 3 0 1 2 6
|
20002 | 4 1 0 1 6
-------------------------------------------------------



I want to rank the variables sumsc, sumst, sumobc, and sumother according to their percent contribution to sumcaste (this is the total of all variables) within psu.


sumsc


sumst


sumobc


sumother


sumcaste


psu



Could anyone help me do this in Stata?




2 Answers
2



First we enter the data:


clear all
set more off

input psu sumsc sumst sumobc sumother sumcaste
10018 3 2 0 4 9
10061 0 0 2 5 7
10116 1 1 2 4 8
10121 3 0 1 2 6
20002 4 1 0 1 6
end



Second, we prepare the reshape:


reshape


local j=1
foreach var of varlist sumsc sumst sumobc sumother
gen temprl`j' = `var' / sumcaste
ren `var' addi`j'
local ++j


reshape long temprl addi, i(psu) j(ord)
lab def ord 1 "sumsc" 2 "sumst" 3 "sumobc" 4 "sumother"
lab val ord ord



Third, we order before presenting:


gsort psu -temprl
by psu: gen nro=_n
drop temprl
order psu nro ord



Fourth, presenting the data:


br psu nro ord addi



EDIT:



This is a combination of Aron's solution with mine (@PearlySpencer):


clear

input psu sumsc sumst sumobc sumother sumcaste
10018 3 2 0 4 9
10061 0 0 2 5 7
10116 1 1 2 4 8
10121 3 0 1 2 6
20002 4 1 0 1 6
end

local i = 0
foreach var of varlist sumsc sumst sumobc sumother
local ++i
generate pct`i' = 100 * `var' / sumcaste
rename `var' temp`i'
local rvars "`rvars' r`i'"


rowranks pct*, generate("`rvars'") field lowrank

reshape long pct temp r, i(psu) j(name)

label define name 1 "sumsc" 2 "sumst" 3 "sumobc" 4 "sumother"
label values name name

keep psu name pct r
bysort psu (r): replace r = sum(r != r[_n-1])



Which gives you the desired output:


list, sepby(psu) noobs

+---------------------------------+
| psu name pct r |
|---------------------------------|
| 10018 sumother 44.44444 1 |
| 10018 sumsc 33.33333 2 |
| 10018 sumst 22.22222 3 |
| 10018 sumobc 0 4 |
|---------------------------------|
| 10061 sumother 71.42857 1 |
| 10061 sumobc 28.57143 2 |
| 10061 sumsc 0 3 |
| 10061 sumst 0 3 |
|---------------------------------|
| 10116 sumother 50 1 |
| 10116 sumobc 25 2 |
| 10116 sumst 12.5 3 |
| 10116 sumsc 12.5 3 |
|---------------------------------|
| 10121 sumsc 50 1 |
| 10121 sumother 33.33333 2 |
| 10121 sumobc 16.66667 3 |
| 10121 sumst 0 4 |
|---------------------------------|
| 20002 sumsc 66.66666 1 |
| 20002 sumst 16.66667 2 |
| 20002 sumother 16.66667 2 |
| 20002 sumobc 0 3 |
+---------------------------------+



This approach will be useful if you need the variables for further analysis as opposed to just displaying the results.






This does not answer the OP's question in the sense that it does not get the rank as he wants it (see row columns in my answer).

– Pearly Spencer
Jul 23 '18 at 16:13







I have taken the liberty of adapting your solution to display the ranks.

– Pearly Spencer
Jul 23 '18 at 16:46



First you need to calculate percentages:


clear

input psu sumsc sumst sumobc sumother sumcaste
10018 3 2 0 4 9
10061 0 0 2 5 7
10116 1 1 2 4 8
10121 3 0 1 2 6
20002 4 1 0 1 6
end

foreach var of varlist sumsc sumst sumobc sumother
generate pct_`var' = 100 * `var' / sumcaste


egen pcttotal = rowtotal(pct_*)

list pct_* pcttotal, abbreviate(15) noobs

+--------------------------------------------------------------+
| pct_sumsc pct_sumst pct_sumobc pct_sumother pcttotal |
|--------------------------------------------------------------|
| 33.33333 22.22222 0 44.44444 100 |
| 0 0 28.57143 71.42857 100 |
| 12.5 12.5 25 50 100 |
| 50 0 16.66667 33.33333 100 |
| 66.66666 16.66667 0 16.66667 99.99999 |
+--------------------------------------------------------------+



Then you need to get the ranks and do some gymnastics:


rowranks pct_*, generate(r_sumsc r_sumst r_sumobc r_sumother) field lowrank

mkmat r_*, matrix(A)
matrix A = A'
svmat A, names(row)

local matnames : rownames A
quietly generate name = " "

forvalues i = 1 / `: word count `matnames''
quietly replace name = substr(`"`: word `i' of `matnames''"', 3, .) in `i'


ds row*

foreach var in `r(varlist)'
sort `var' name
generate `var'b = sum(`var' != `var'[_n-1])
drop `var'
rename `var'b `var'
list name `var' if name != " ", noobs
display ""



The above will give you what you want:


+-----------------+
| name row1 |
|-----------------|
| sumother 1 |
| sumsc 2 |
| sumst 3 |
| sumobc 4 |
+-----------------+

+-----------------+
| name row2 |
|-----------------|
| sumother 1 |
| sumobc 2 |
| sumsc 3 |
| sumst 3 |
+-----------------+

+-----------------+
| name row3 |
|-----------------|
| sumother 1 |
| sumobc 2 |
| sumsc 3 |
| sumst 3 |
+-----------------+

+-----------------+
| name row4 |
|-----------------|
| sumsc 1 |
| sumother 2 |
| sumobc 3 |
| sumst 4 |
+-----------------+

+-----------------+
| name row5 |
|-----------------|
| sumsc 1 |
| sumother 2 |
| sumst 2 |
| sumobc 3 |
+-----------------+



Note that you will first need to install the community-contributed command rowranks before you execute the above code:


rowranks


net install pr0046.pkg



Thanks for contributing an answer to Stack Overflow!



But avoid



To learn more, see our tips on writing great answers.



Required, but never shown



Required, but never shown




By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.

Popular posts from this blog

𛂒𛀶,𛀽𛀑𛂀𛃧𛂓𛀙𛃆𛃑𛃷𛂟𛁡𛀢𛀟𛁤𛂽𛁕𛁪𛂟𛂯,𛁞𛂧𛀴𛁄𛁠𛁼𛂿𛀤 𛂘,𛁺𛂾𛃭𛃭𛃵𛀺,𛂣𛃍𛂖𛃶 𛀸𛃀𛂖𛁶𛁏𛁚 𛂢𛂞 𛁰𛂆𛀔,𛁸𛀽𛁓𛃋𛂇𛃧𛀧𛃣𛂐𛃇,𛂂𛃻𛃲𛁬𛃞𛀧𛃃𛀅 𛂭𛁠𛁡𛃇𛀷𛃓𛁥,𛁙𛁘𛁞𛃸𛁸𛃣𛁜,𛂛,𛃿,𛁯𛂘𛂌𛃛𛁱𛃌𛂈𛂇 𛁊𛃲,𛀕𛃴𛀜 𛀶𛂆𛀶𛃟𛂉𛀣,𛂐𛁞𛁾 𛁷𛂑𛁳𛂯𛀬𛃅,𛃶𛁼

Edmonton

Crossroads (UK TV series)