Randomly selecting different percentages of data in Python
Python beginner here. I have a dataset with 101 rows, which I have imported into Python (as a CSV file) using pandas. I want to randomly generate a number between 0 and 1 and, based on the result, randomly select that percentage of the dataset. For instance, a randomly generated number of 0.89 would mean that 89% of the data is selected.
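As a rough illustration of that idea (a sketch, assuming a synthetic 101-row DataFrame with a hypothetical `PR_MD` column standing in for R_100.csv): `random.random()` gives a float in [0.0, 1.0), and `DataFrame.sample(frac=...)` turns the fraction into a row count by rounding `frac * len(df)`. With 101 rows, 0.89 therefore gives 90 rows rather than 89.

```python
import random

import pandas as pd

# Hypothetical stand-in for R_100.csv: 101 rows with a numeric 'PR_MD' column
df = pd.DataFrame({"PR_MD": range(101)})

# A uniform random fraction in [0.0, 1.0), e.g. 0.89 -> "89% of the rows"
fraction = random.random()
subset = df.sample(frac=fraction)

# pandas converts frac into a row count by rounding frac * len(df)
print(round(fraction, 2), len(subset))

# The example from the question: 0.89 of 101 rows
print(len(df.sample(frac=0.89)))  # 90
```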
I also want to specify different percentages such that I have, for instance, 89%, 8% and 3% of the data randomly selected at once. This is so I can make different assumptions based on X% of data that has been selected (for instance, for 3% of rows selected print('A'), etc.). I finally want to simulate the whole thing several times and store the results.
I have been experimenting with different approaches, such as df.sample(frac=0.89), but I'm not sure how to extend this to select different percentages at the same time.
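If the percentages are meant to be non-overlapping groups that together cover the whole dataset (an assumption; 89% + 8% + 3% = 100% suggests so), one approach is to shuffle once with `df.sample(frac=1)` and cut the shuffled frame into consecutive slices. `split_by_fractions`, the `group` column and the labels 'B' and 'C' are hypothetical names for illustration; only the 3% group mapping to 'A' comes from the question.

```python
import pandas as pd


def split_by_fractions(df, fractions, random_state=None):
    """Shuffle df once, then cut it into consecutive, non-overlapping chunks.

    Hypothetical helper: fractions such as [0.89, 0.08, 0.03] are assumed to
    sum to 1. The last chunk takes whatever rows remain, so rounding never
    drops or duplicates a row.
    """
    shuffled = df.sample(frac=1, random_state=random_state)  # random row order
    n = len(shuffled)
    parts = []
    start = 0
    for i, f in enumerate(fractions):
        stop = n if i == len(fractions) - 1 else min(n, start + round(f * n))
        parts.append(shuffled.iloc[start:stop])
        start = stop
    return parts


# Hypothetical stand-in for the 101-row CSV
df = pd.DataFrame({"PR_MD": range(101)})
big, medium, small = split_by_fractions(df, [0.89, 0.08, 0.03], random_state=0)
print(len(big), len(medium), len(small))  # 90 8 3

# Tag each row with its group so each group can be handled differently later
df["group"] = ""
df.loc[big.index, "group"] = "C"
df.loc[medium.index, "group"] = "B"
df.loc[small.index, "group"] = "A"
```

Because every row lands in exactly one slice, the groups never overlap; calling `df.sample` three times independently would not give that guarantee.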
My current code is:
import random
import pandas as pd
df = pd.read_csv(r'R_100.csv', encoding='cp1252')
df_1 = df['R_MD'].sample(frac=0.8889)
Total = df['PR_MD'].sum()
print(df_1, 'Total=', Total)
Any advice is much appreciated. Thanks in advance.
Can you post snippets of the code you've used already? Are you using pandas to store your data or some other format?
– vealkind
Sep 18 '18 at 12:29
pandas
Please put this code inside your question, by simply editing the question. You can also edit some of your phrasings there to make it clearer, as long as it does not change the meaning of the question.
– dennlinger
Sep 18 '18 at 12:43
1 Answer
Here is something you can do: write a function that does the sampling, so you can call it every time you need it.
import pandas as pd
df = pd.read_csv(r'R_100.csv', encoding='cp1252')
After you read the DataFrame, define:
def frac(dataframe, fraction, other_info=None):
    """Returns fraction of data"""
    return dataframe.sample(frac=fraction)
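A slightly extended version of `frac` might turn the `other_info` idea into an optional column name and forward `random_state` (a real `DataFrame.sample`/`Series.sample` parameter) so that a given draw can be reproduced. The `column` and `random_state` arguments and the sample data are hypothetical additions, not part of the answer as posted.

```python
import pandas as pd


def frac(dataframe, fraction, column=None, random_state=None):
    """Return a random `fraction` of the rows, optionally of one column only."""
    data = dataframe if column is None else dataframe[column]
    # random_state=None gives a fresh draw each call; an int makes it repeatable
    return data.sample(frac=fraction, random_state=random_state)


# Hypothetical stand-in for the CSV data
df = pd.DataFrame({"PR_MD": range(101), "R_MD": range(101, 202)})

first = frac(df, 0.3, column="PR_MD", random_state=42)
second = frac(df, 0.3, column="PR_MD", random_state=42)
print(len(first), first.equals(second))  # 30 True
```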
Here, other_info could be a specific column name. You can then call the function as many times as you want:
df_1 = frac(df, 0.3)
It returns a new DataFrame that you can use however you like. Since your example takes the sum of a column, you could use it like this:
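import random

def random_gen():
    """Generate a random fraction between 0 and 1."""
    # random.randint(0, 1) would only return the integers 0 or 1;
    # random.random() returns a float in [0.0, 1.0)
    return random.random()

def print_sum(column_name):
    """Print the sum of column_name over a randomly sized sample."""
    # call random_gen() to get a random fraction
    rand_num = random_gen()
    # pass the number as the fraction parameter to frac()
    df_tmp = frac(df, rand_num)
    print(df_tmp[str(column_name)].sum())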
As for this part of your question:
"but I'm not sure how to extend this to select different percentages at the same time."
Just change print_sum as follows:
def print_sum(column_name):
    """Return the results of 10 iterations."""
    # list to store all the results
    results = []
    # use a different random fraction on each of the 10 iterations;
    # alternatively, keep a list of all the fractions you want and loop over that list
    for i in range(10):
        # generate a random fraction
        fracr = random_gen()
        # pass the number as the fraction parameter to frac()
        df_tmp = frac(df, fracr)
        results.append(df_tmp[str(column_name)].sum())
    return results
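The comment in the loop mentions looping over a list of chosen fractions instead of random ones. A minimal sketch of that variant, repeated over several runs with every result stored in a DataFrame, might look like this; `run_fractions`, its seeding scheme and the sample data are hypothetical. Note that each fraction is sampled independently here, so the 89%, 8% and 3% samples can share rows.

```python
import pandas as pd


def run_fractions(df, column, fractions, n_runs=5, seed=0):
    """For each run and each listed fraction, sample that share and sum column.

    Hypothetical helper: returns one row per (run, fraction) pair. The
    random_state for every draw is derived from `seed`, so runs are repeatable.
    """
    records = []
    for run in range(n_runs):
        for j, f in enumerate(fractions):
            sampled = df.sample(frac=f, random_state=seed + run * len(fractions) + j)
            records.append({"run": run, "fraction": f,
                            "n_rows": len(sampled), "total": sampled[column].sum()})
    return pd.DataFrame(records)


df = pd.DataFrame({"PR_MD": range(101)})  # hypothetical stand-in data
results = run_fractions(df, "PR_MD", [0.89, 0.08, 0.03], n_runs=5)
# Average column total for each fraction across the runs
print(results.groupby("fraction")["total"].mean())
```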
Hope this helps! Feedback is much appreciated :)
Hi there. Thanks so much for your comment, I think this is what I was looking for. Yet, I am having trouble printing the results. Where exactly in the code would I specify the column name, 'PR_MD'?
– EarlofMar
Sep 18 '18 at 15:43
When you call the function: print_sum('PR_MD'). Please accept and upvote the answer if that is what you were looking for.
– WiLL_K
Sep 18 '18 at 15:45
One final thing: I am trying to visualise my data using some distribution. Would I have to put this code in the for loop for it to take effect?
– EarlofMar
Sep 19 '18 at 11:31
@EarlofMar No, you can keep another list to store that information and then loop over that list to visualize the distribution.
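A sketch of that pattern, assuming a synthetic DataFrame and a hypothetical `PR_MD` column: inside the loop, only append the numbers of interest to a list; after the loop, summarise the whole list at once. Here `numpy.histogram` bins the totals and prints a text histogram, which avoids a plotting dependency.

```python
import random

import numpy as np
import pandas as pd

df = pd.DataFrame({"PR_MD": range(101)})  # hypothetical stand-in data
rng = random.Random(0)

# 1. Inside the loop: only collect the values you want to look at later
totals = []
for _ in range(200):
    sampled = df.sample(frac=rng.random(), random_state=rng.randrange(2**32))
    totals.append(sampled["PR_MD"].sum())

# 2. After the loop: summarise the whole distribution in one go
counts, edges = np.histogram(totals, bins=10)
for count, left, right in zip(counts, edges[:-1], edges[1:]):
    print(f"{left:8.0f}-{right:8.0f} | {'#' * int(count)}")
```

With matplotlib installed, `plt.hist(totals, bins=10)` followed by `plt.show()` would draw the same distribution graphically.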
– WiLL_K
Sep 19 '18 at 11:38
Can you frame this question more clearly?
– CSMaverick
Sep 18 '18 at 12:27