Randomly selecting different percentages of data in Python




Python beginner, here. I have a dataset with 101 rows which I have imported into Python (as a csv file) using Pandas. I essentially want to randomly generate a number between 0 and 1 and, based on the result, randomly select the percent equivalent from the dataset. So, for instance, a randomly generated number of 0.89 would require 89% of the data to be selected.



I also want to specify different percentages such that I have, for instance, 89%, 8% and 3% of the data randomly selected at once. This is so I can make different assumptions based on X% of data that has been selected (for instance, for 3% of rows selected print('A'), etc.). I finally want to simulate the whole thing several times and store the results.



I have been experimenting with different approaches, such as df.sample(frac=0.89), but I'm not sure how to extend this to select different percentages at the same time.



My current code is:


import random
import pandas as pd

df = pd.read_csv(r'R_100.csv', encoding='cp1252')
df_1 = df['R_MD'].sample(frac=0.8889)
Total = df['PR_MD'].sum()
print(df_1, 'Total=', Total)



Any advice is much appreciated. Thanks in advance.






Can you frame this question more clearly?

– CSMaverick
Sep 18 '18 at 12:27






Can you post snippets of the code you've used already? Are you using pandas to store your data or some other format?

– vealkind
Sep 18 '18 at 12:29


pandas






Please put this code inside your question, by simply editing the question. You can also edit some of your phrasings there to make it clearer, as long as it does not change the meaning of the question.

– dennlinger
Sep 18 '18 at 12:43




1 Answer



Here is one approach: wrap the sampling in a function so that you can call it whenever you need a new sample.


import pandas as pd
df = pd.read_csv(r'R_100.csv', encoding='cp1252')



After reading the dataframe, define a helper function:


def frac(dataframe, fraction, other_info=None):
    """Returns fraction of data"""
    return dataframe.sample(frac=fraction)



Here other_info could be, for example, a specific column name (it is not used in this sketch). You can then call the function as many times as you want:


df_1 = frac(df, 0.3)



It returns a new dataframe that you can use for anything you want. Since I infer from your example that you are taking the sum of a column, you could use it like this:


import random

def random_gen():
    """Generates a random float in [0.0, 1.0)"""
    # random.randint(0, 1) would only ever return the integers 0 or 1,
    # so random.random() is used to get a usable fraction
    return random.random()


def print_sum(column_name):
    """Prints sum"""

    # call random_gen() to get a random fraction
    rand_num = random_gen()

    # pass the number as the fraction parameter to frac()
    df_tmp = frac(df, rand_num)

    print(df_tmp[str(column_name)].sum())



Or, regarding this part of your question:

but I'm not sure how to extend this to select different percentages at the same time.

Then just change print_sum as follows:


def print_sum(column_name):
    """Returns the column sums for 10 iterations"""
    # list to store all the results
    results = []

    # use a different random fraction on each of the 10 iterations;
    # alternatively, keep a list of all the fractions you want
    # and loop over that list instead
    for i in range(10):
        # generate a random number
        fracr = random_gen()
        # pass the number as the fraction parameter to frac()
        df_tmp = frac(df, fracr)
        results.append(df_tmp[str(column_name)].sum())

    return results
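If "89%, 8% and 3% at once" means three groups that do not share any rows, one possible approach is to shuffle the dataframe once and cut it into consecutive pieces. The following is only a sketch under that assumption: split_by_fractions, the labels A/B/C and the stand-in demo dataframe are made up for illustration and are not part of the question's data.

```python
import numpy as np
import pandas as pd


def split_by_fractions(dataframe, fractions, seed=None):
    """Shuffle the rows once, then cut them into consecutive, non-overlapping groups."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    # frac=1 returns every row in random order
    shuffled = dataframe.sample(frac=1, random_state=seed)
    # Rounded cumulative row counts mark where each group ends;
    # the last group takes whatever rows remain.
    cut_points = (np.cumsum(fractions)[:-1] * len(shuffled)).round().astype(int)
    starts = [0, *cut_points]
    stops = [*cut_points, len(shuffled)]
    return [shuffled.iloc[start:stop] for start, stop in zip(starts, stops)]


# Stand-in data: 101 rows with a made-up PR_MD column (replace with the real CSV).
demo = pd.DataFrame({"PR_MD": np.arange(101)})
labels = ["A", "B", "C"]  # hypothetical labels, one per percentage
groups = split_by_fractions(demo, [0.89, 0.08, 0.03], seed=0)
for label, group in zip(labels, groups):
    print(label, len(group), group["PR_MD"].sum())
```

Because 101 rows cannot be divided exactly, the group sizes are rounded (here 90, 8 and 3 rows). Calling df.sample three times independently would instead give groups that may overlap, so which version fits depends on what you want to assume.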



Hope this helps! Feedback is much appreciated :)






Hi there. Thanks so much for your comment, I think this is what I was looking for. Yet, I am having trouble printing the results. Where exactly in the code would I specify the column name, 'PR_MD'?

– EarlofMar
Sep 18 '18 at 15:43






You specify it when you call the function: print_sum('PR_MD'). Please accept and upvote the answer if that is what you were looking for.

– WiLL_K
Sep 18 '18 at 15:45







One final thing: I am trying to visualise my data using some distribution. Would I have to put this code in the for loop for it to take effect?

– EarlofMar
Sep 19 '18 at 11:31






@EarlofMar No, you can keep another list to store that information and then loop over that list to visualize the distribution.

– WiLL_K
Sep 19 '18 at 11:38
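The suggestion in the last comment (collect the results first, look at the distribution afterwards) might be sketched as follows. This is an illustration only: simulate_sums, the 200 runs and the stand-in demo dataframe are assumptions, and np.histogram is used here just to bin the totals without drawing anything.

```python
import random

import numpy as np
import pandas as pd


def simulate_sums(dataframe, column_name, n_runs, seed=None):
    """Repeat the random-fraction sampling n_runs times; return one row per run."""
    rng = random.Random(seed)
    records = []
    for run in range(n_runs):
        fraction = rng.random()  # float in [0.0, 1.0)
        sample = dataframe.sample(frac=fraction, random_state=rng.randrange(2**32))
        records.append({
            "run": run,
            "fraction": fraction,
            "rows": len(sample),
            "total": sample[column_name].sum(),
        })
    return pd.DataFrame(records)


# Stand-in data: 101 rows with a made-up PR_MD column (replace with the real CSV).
demo = pd.DataFrame({"PR_MD": np.ones(101)})
runs = simulate_sums(demo, "PR_MD", n_runs=200, seed=1)

# Bin the stored totals outside the simulation loop.
counts, bin_edges = np.histogram(runs["total"], bins=10)
print(runs["total"].describe())
print(counts)
```

The stored runs["total"] column can equally be passed to a plotting function such as matplotlib's plt.hist once the loop has finished, so no plotting code needs to live inside the loop.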


