Adding observations to panel in Stata

I have panel data running from year t1 to year t2. Some individuals enter the sample after t1 and/or exit the sample before t2. For efficiency (the sample is large), the dataset only contains rows for the years in which each individual is observed.


I want to add one new observation per individual, holding the year after that individual left the sample. So if someone left in, say, 2003, I want the new observation to contain the individual's id and the value 2004 in the year variable. Every other variable in that observation should be missing.


This is my approach, using a sample dataset:


webuse nlswork, clear

* Here go plenty of lines of code modifying the dataset ... for generality *

timer on 1

preserve
keep id year
bysort id (year) : keep if _n == _N
replace year = year + 1
save temp.dta, replace
restore

append using temp.dta
sort id year
erase temp.dta

timer off 1
timer list



I think this might be a bit inefficient, as it involves a preserve/restore, saving and deleting an additional dataset, and an append, all of which are relatively time-consuming. Something like tsfill, last would be ideal, but that option doesn't exist. Is anyone aware of a more efficient method? The code above includes a timer, so anyone can benchmark it against another method.


The preserve / restore is not necessary strictly speaking.
– Pearly Spencer
Aug 28 at 17:12




@PearlySpencer do you mean by changing the order in which the datasets are loaded? That works in this example, but in reality I do a lot with the dataset beforehand.
– luchonacho
Aug 28 at 17:13





@NickCox updated
– luchonacho
Aug 29 at 15:15




1 Answer



I am never that impressed by attempts to save seconds when coding takes minutes. This is more direct than your approach.


bysort id (year) : gen byte last = _n == _N
expand 2 if last
bysort id (year) : replace year = year + 1 if _n == _N



EDIT: You need to loop over the other variables in your dataset to replace their values with missing. For simplicity, I will assume that they are all numeric.


bysort id (year) : replace last = _n == _N
ds id year, not
quietly foreach v in `r(varlist)' {
    replace `v' = . if last
}
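If some of the other variables are strings, the loop needs to pick the missing value by storage type. The following is only a sketch on a made-up toy panel: the variables wage and region and their values are hypothetical, chosen purely for illustration, and the helper variable last is excluded from the loop and dropped at the end.

```stata
* Toy panel (hypothetical data): id 1 observed 2001-2003, id 2 observed in 2002 only
clear
input id year wage str3 region
1 2001 10 "N"
1 2002 11 "N"
1 2003 12 "S"
2 2002  9 "S"
end

* Flag each individual's last observed year and duplicate that row
bysort id (year) : gen byte last = _n == _N
expand 2 if last

* The two copies tie on year; move one of them a year ahead and re-flag it
bysort id (year) : replace year = year + 1 if _n == _N
bysort id (year) : replace last = _n == _N

* Blank every other variable, using . for numeric and "" for string variables
ds id year last, not
foreach v in `r(varlist)' {
    capture confirm numeric variable `v'
    if _rc == 0 {
        quietly replace `v' = . if last
    }
    else {
        quietly replace `v' = "" if last
    }
}
drop last
list, sepby(id)
```

Wrapping the same steps between timer on and timer off, as in the question, should allow a direct comparison with the preserve/append approach on the real dataset.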





