Ranking variables according to their percent contribution to total
Ranking variables according to their percent contribution to total
Consider the following example data:
psu | sumsc sumst sumobc sumother sumcaste
-------|-----------------------------------------------
10018 | 3 2 0 4 9
|
10061 | 0 0 2 5 7
|
10116 | 1 1 2 4 8
|
10121 | 3 0 1 2 6
|
20002 | 4 1 0 1 6
-------------------------------------------------------
I want to rank the variables sumsc
, sumst
, sumobc
, and sumother
according to their percent contribution to sumcaste
(this is the total of all variables) within psu
.
sumsc
sumst
sumobc
sumother
sumcaste
psu
Could anyone help me do this in Stata?
2 Answers
2
First we enter the data:
clear all
set more off
input psu sumsc sumst sumobc sumother sumcaste
10018 3 2 0 4 9
10061 0 0 2 5 7
10116 1 1 2 4 8
10121 3 0 1 2 6
20002 4 1 0 1 6
end
Second, we prepare the reshape
:
reshape
local j=1
foreach var of varlist sumsc sumst sumobc sumother
gen temprl`j' = `var' / sumcaste
ren `var' addi`j'
local ++j
reshape long temprl addi, i(psu) j(ord)
lab def ord 1 "sumsc" 2 "sumst" 3 "sumobc" 4 "sumother"
lab val ord ord
Third, we order before presenting:
gsort psu -temprl
by psu: gen nro=_n
drop temprl
order psu nro ord
Fourth, presenting the data:
br psu nro ord addi
EDIT:
This is a combination of Aron's solution with mine (@PearlySpencer):
clear
input psu sumsc sumst sumobc sumother sumcaste
10018 3 2 0 4 9
10061 0 0 2 5 7
10116 1 1 2 4 8
10121 3 0 1 2 6
20002 4 1 0 1 6
end
local i = 0
foreach var of varlist sumsc sumst sumobc sumother
local ++i
generate pct`i' = 100 * `var' / sumcaste
rename `var' temp`i'
local rvars "`rvars' r`i'"
rowranks pct*, generate("`rvars'") field lowrank
reshape long pct temp r, i(psu) j(name)
label define name 1 "sumsc" 2 "sumst" 3 "sumobc" 4 "sumother"
label values name name
keep psu name pct r
bysort psu (r): replace r = sum(r != r[_n-1])
Which gives you the desired output:
list, sepby(psu) noobs
+---------------------------------+
| psu name pct r |
|---------------------------------|
| 10018 sumother 44.44444 1 |
| 10018 sumsc 33.33333 2 |
| 10018 sumst 22.22222 3 |
| 10018 sumobc 0 4 |
|---------------------------------|
| 10061 sumother 71.42857 1 |
| 10061 sumobc 28.57143 2 |
| 10061 sumsc 0 3 |
| 10061 sumst 0 3 |
|---------------------------------|
| 10116 sumother 50 1 |
| 10116 sumobc 25 2 |
| 10116 sumst 12.5 3 |
| 10116 sumsc 12.5 3 |
|---------------------------------|
| 10121 sumsc 50 1 |
| 10121 sumother 33.33333 2 |
| 10121 sumobc 16.66667 3 |
| 10121 sumst 0 4 |
|---------------------------------|
| 20002 sumsc 66.66666 1 |
| 20002 sumst 16.66667 2 |
| 20002 sumother 16.66667 2 |
| 20002 sumobc 0 3 |
+---------------------------------+
This approach will be useful if you need the variables for further analysis as opposed to just displaying the results.
I have taken the liberty of adapting your solution to display the ranks.
– Pearly Spencer
Jul 23 '18 at 16:46
First you need to calculate percentages:
clear
input psu sumsc sumst sumobc sumother sumcaste
10018 3 2 0 4 9
10061 0 0 2 5 7
10116 1 1 2 4 8
10121 3 0 1 2 6
20002 4 1 0 1 6
end
foreach var of varlist sumsc sumst sumobc sumother
generate pct_`var' = 100 * `var' / sumcaste
egen pcttotal = rowtotal(pct_*)
list pct_* pcttotal, abbreviate(15) noobs
+--------------------------------------------------------------+
| pct_sumsc pct_sumst pct_sumobc pct_sumother pcttotal |
|--------------------------------------------------------------|
| 33.33333 22.22222 0 44.44444 100 |
| 0 0 28.57143 71.42857 100 |
| 12.5 12.5 25 50 100 |
| 50 0 16.66667 33.33333 100 |
| 66.66666 16.66667 0 16.66667 99.99999 |
+--------------------------------------------------------------+
Then you need to get the ranks and do some gymnastics:
rowranks pct_*, generate(r_sumsc r_sumst r_sumobc r_sumother) field lowrank
mkmat r_*, matrix(A)
matrix A = A'
svmat A, names(row)
local matnames : rownames A
quietly generate name = " "
forvalues i = 1 / `: word count `matnames''
quietly replace name = substr(`"`: word `i' of `matnames''"', 3, .) in `i'
ds row*
foreach var in `r(varlist)'
sort `var' name
generate `var'b = sum(`var' != `var'[_n-1])
drop `var'
rename `var'b `var'
list name `var' if name != " ", noobs
display ""
The above will give you what you want:
+-----------------+
| name row1 |
|-----------------|
| sumother 1 |
| sumsc 2 |
| sumst 3 |
| sumobc 4 |
+-----------------+
+-----------------+
| name row2 |
|-----------------|
| sumother 1 |
| sumobc 2 |
| sumsc 3 |
| sumst 3 |
+-----------------+
+-----------------+
| name row3 |
|-----------------|
| sumother 1 |
| sumobc 2 |
| sumsc 3 |
| sumst 3 |
+-----------------+
+-----------------+
| name row4 |
|-----------------|
| sumsc 1 |
| sumother 2 |
| sumobc 3 |
| sumst 4 |
+-----------------+
+-----------------+
| name row5 |
|-----------------|
| sumsc 1 |
| sumother 2 |
| sumst 2 |
| sumobc 3 |
+-----------------+
Note that you will first need to install the community-contributed command rowranks
before you execute the above code:
rowranks
net install pr0046.pkg
Thanks for contributing an answer to Stack Overflow!
But avoid …
To learn more, see our tips on writing great answers.
Required, but never shown
Required, but never shown
By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.
This does not answer the OP's question in the sense that it does not get the rank as he wants it (see row columns in my answer).
– Pearly Spencer
Jul 23 '18 at 16:13