How to take random sample from a csv file


I have a csv file with 10 rows:


Text,Class
text0,class0
text1,class1
...
text9,class9



I am classifying the text, and then comparing it to the correct class labeled in the csv file. I want to take a random sample of 4 pieces of text and their class from it. I have:


import random
textt=data['Text']
class_one=data['Class']
c=textt[0:]
random_sample=random.sample(c,4)



My classification then starts with:


for i in random_sample:



But when I calculate the accuracy of the classification, it is calculated for the entire dataset. How can I get it to calculate the accuracy for only the sample of 4 pieces of data?



Edit: for classification, I do:

for i in textt:
    #classify text

The results will look like:


choice 1
choice 2
choice 1
...



and this is compared to the correct class from the csv file:


choice 1
choice 2
choice 2
...



and accuracy will be calculated as 66.6% with:


for i in class_one:
    #if predicted_class == correct_class:
    #accuracy = number_correct / total_number



I only want to run the classification on the random sample, so instead of classifying all 10 examples, it would classify only 4.





You haven't shown us how you're doing any of the stuff you're talking about; without a Minimal, Complete, and Verifiable example it's pretty hard to give you a concrete answer. But most likely, it's just a matter of calling <something>(random_sample) instead of <something>(c), or similar.
– abarnert
Aug 22 at 23:24







Just use dataframe.sample(4); see the docs: pandas.pydata.org/pandas-docs/version/0.17.0/generated/…
– bigbounty
Aug 22 at 23:29





I just added an edit, does that help?
– Spongebob Squarepants
Aug 22 at 23:33




2 Answers



The best way to do it is to use pandas:


import pandas as pd
df=pd.read_csv("filename.csv")
print(df.sample(4)) #whatever number of random sample size you want
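To score only the sampled rows, it may help to sample whole rows so that each text stays paired with its label, then classify and compare within that sample alone. The sketch below is a minimal illustration under assumptions: the in-memory CSV stands in for the question's file, and `classify` is a hypothetical placeholder for the real classifier, not anything from the original post.

```python
import io
import pandas as pd

# Stand-in for the CSV from the question (10 rows, columns Text and Class).
csv_text = "Text,Class\n" + "\n".join(f"text{i},class{i}" for i in range(10))
df = pd.read_csv(io.StringIO(csv_text))


def classify(text):
    # Hypothetical placeholder classifier: swap in the real model here.
    return "class0"


# Draw 4 random rows; Text and Class stay aligned because whole rows are sampled.
sample = df.sample(n=4, random_state=0)

# Classify only the sampled texts and score only against their own labels.
predicted = sample["Text"].apply(classify)
accuracy = (predicted == sample["Class"]).mean()
print(f"accuracy on {len(sample)} sampled rows: {accuracy:.1%}")
```

Because the loop over `textt` is replaced by an operation on `sample`, the remaining 6 rows are never classified and never enter the accuracy figure. Dropping `random_state` gives a different sample on each run.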



The pandas solution is most probably the right one for you. If you want to split any CSV file generically in Python into randomly shuffled 80%:20% training and test splits, you can use core Python:



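import random

# Read all lines; keep the header row out of the shuffle
with open("dataset.csv") as f:
    header, *x = f.readlines()
random.shuffle(x)
total = len(x)  # number of data rows
train = x[:int(total * 0.8)]  # first 80% for training
test = x[int(total * 0.8):]   # remaining 20% for testing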
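For the four-row sample asked about in the question, the same core-Python approach can be sketched with the standard `csv` module and `random.sample`. This is only an illustration: the in-memory file below is an assumed stand-in for the real CSV, with contents mirroring the question's example.

```python
import csv
import io
import random

# Stand-in for open("dataset.csv"); the contents mirror the question's file.
csv_file = io.StringIO(
    "Text,Class\n" + "\n".join(f"text{i},class{i}" for i in range(10))
)

# DictReader consumes the header row, so it can never end up in the sample.
rows = list(csv.DictReader(csv_file))

# random.sample draws 4 distinct rows (sampling without replacement).
sample_rows = random.sample(rows, 4)

for row in sample_rows:
    # Each row keeps its text and its correct label together.
    print(row["Text"], row["Class"])
```

Iterating over `sample_rows` instead of the full column means both the classification and the accuracy calculation see only those 4 examples.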



As it seems you are trying to evaluate some kind of classification (machine learning?) task, I would highly recommend looking up scikit-learn's train_test_split(), as it can stratify for other variables and also works with pandas DataFrames.
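A minimal sketch of that suggestion follows. The DataFrame is made-up illustrative data (two alternating labels, so that stratification has something to balance), and the 80/20 ratio and `random_state` value are assumptions rather than anything taken from the question.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Illustrative data only: 10 texts with two alternating class labels.
df = pd.DataFrame({
    "Text": [f"text{i}" for i in range(10)],
    "Class": ["choice 1" if i % 2 == 0 else "choice 2" for i in range(10)],
})

# Shuffle and split 80/20; stratify keeps the class proportions equal
# in both parts. random_state only makes the split repeatable.
train_df, test_df = train_test_split(
    df, test_size=0.2, stratify=df["Class"], random_state=0
)

print(len(train_df), "training rows,", len(test_df), "test rows")
print(test_df)
```

Note that stratifying needs at least two rows per class, so it would not work on a file where every row has a unique label, as in the `class0` ... `class9` example at the top of the question.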







