How to take a random sample from a CSV file

I have a csv file with 10 rows:

    Text,Class
    text0,class0
    text1,class1
    ...
    text9,class9

I am classifying the text, and then comparing it to the correct class labeled in the csv file. I want to take a random sample of 4 pieces of text and their class from it. I have:

    import random
    textt = data['Text']
    class_one = data['Class']
    c = textt[0:]
    random_sample = random.sample(c, 4)

My classification then starts with:

for i in random_sample:

but when I calculate the accuracy of the classification, it is calculated for the entire dataset. How can I get it to calculate the accuracy for just the sample of 4 pieces of data?

Edit: for classification, I do:

    for i in textt:
        # classify text

The results will look like:

    choice 1
    choice 2
    choice 1
    ...

and this is compared to the correct class from the csv file:

    choice 1
    choice 2
    choice 2
    ...

and accuracy will be calculated as 66.6% with:

    for i in class_one:
        # if predicted_class == correct_class:
        #     accuracy = number_correct / total_number

I want to run the classification only on the random sample, so that it classifies 4 examples instead of all 10.

You haven't shown us how you're doing any of the stuff you're talking about; without a Minimal, Complete, and Verifiable example it's pretty hard to give you a concrete answer. But most likely, it's just a matter of calling <something>(random_sample) instead of <something>(c), or similar.
– abarnert
Aug 22 at 23:24


Just use dataframe.sample(4); refer to the docs: pandas.pydata.org/pandas-docs/version/0.17.0/generated/…
– bigbounty
Aug 22 at 23:29

I just added an edit, does that help?
– Spongebob Squarepants
Aug 22 at 23:33

2 Answers

The best way to do it is to use pandas:

    import pandas as pd

    df = pd.read_csv("filename.csv")
    print(df.sample(4))  # use whatever sample size you want
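To get the accuracy for only the sampled rows, one option is to sample whole rows, so each text stays paired with its correct class, and then classify and score just that sample. A minimal sketch, assuming the column names Text and Class from the question; `classify` and `sample_accuracy` are hypothetical names, and the inline DataFrame stands in for the result of pd.read_csv("filename.csv"):

```python
import pandas as pd

# Stand-in for pd.read_csv("filename.csv"); same layout as the question's file.
df = pd.DataFrame({
    "Text": [f"text{i}" for i in range(10)],
    "Class": [f"class{i}" for i in range(10)],
})

def classify(text):
    # Hypothetical placeholder classifier: swap in your own model here.
    return text.replace("text", "class")

def sample_accuracy(frame, n, seed=None):
    # Sample whole rows so each text keeps its correct label.
    sample = frame.sample(n=n, random_state=seed)
    # Classify only the sampled texts.
    predicted = sample["Text"].map(classify)
    # Fraction of sampled rows where the prediction equals the true class.
    accuracy = (predicted == sample["Class"]).mean()
    return sample, accuracy

sample, accuracy = sample_accuracy(df, 4, seed=0)
print(sample)
print(f"accuracy on {len(sample)} sampled rows: {accuracy:.1%}")
```

Because the loop in the question iterates over the full `textt` and `class_one` columns, the accuracy covers all 10 rows; iterating over the sampled rows instead keeps the denominator at 4.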

Most probably the pandas solution is the right one for you. If you want to split any CSV file generically into randomly shuffled 80%:20% training and test splits, you can use core Python:


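    import random

    with open("dataset.csv") as f:
        # assumes the first line is a header row; keep it out of the shuffle
        header, *x = f.readlines()
    random.shuffle(x)
    total = len(x)  # number of data rows
    train = x[:int(total*0.8)]  # first 80% of the shuffled rows
    test = x[int(total*0.8):]   # remaining 20%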

As it seems you are trying to evaluate some kind of classification (machine learning?) task, I would highly recommend looking up scikit-learn's train_test_split(), as it can stratify for other variables and also works with pandas DataFrames.
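As a rough illustration of that suggestion, here is a sketch using made-up data with two classes (the column names follow the question; the class values and the 40% test size are assumptions chosen so that 4 of 10 rows are held out):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up data: 10 rows, two classes, 5 rows each.
df = pd.DataFrame({
    "Text": [f"text{i}" for i in range(10)],
    "Class": ["choice 1", "choice 2"] * 5,
})

# Hold out 40% (4 of 10 rows); stratify keeps the class ratio in both parts.
train, test = train_test_split(
    df, test_size=0.4, stratify=df["Class"], random_state=0
)
print(test)
```

Stratifying on the Class column keeps the held-out rows balanced across classes, which a plain random sample of 4 rows does not guarantee.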

