get list of pandas dataframe columns based on data type



If I have a dataframe with the following columns:


1. NAME object
2. On_Time object
3. On_Budget object
4. %actual_hr float64
5. Baseline Start Date datetime64[ns]
6. Forecast Start Date datetime64[ns]



I would like to be able to say: here is a dataframe, give me a list of the columns which are of type Object or of type DateTime?



I have a function which converts numbers (Float64) to two decimal places, and I would like to use this list of dataframe columns, of a particular type, and run it through this function to convert them all to 2dp.



Maybe something like this?


col_list = []
for c in df.columns:
    if df[c].dtype == "Something":
        col_list.append(c)





When I came to this question, I was looking for a way to create exactly the list at the top. df.dtypes does that.
– Martin Thoma
Aug 17 at 6:19






10 Answers



If you want a list of columns of a certain type, you can use groupby:




>>> df = pd.DataFrame([[1, 2.3456, 'c', 'd', 78]], columns=list("ABCDE"))
>>> df
A B C D E
0 1 2.3456 c d 78

[1 rows x 5 columns]
>>> df.dtypes
A int64
B float64
C object
D object
E int64
dtype: object
>>> g = df.columns.to_series().groupby(df.dtypes).groups
>>> g
{dtype('int64'): ['A', 'E'], dtype('float64'): ['B'], dtype('O'): ['C', 'D']}
>>> {k.name: v for k, v in g.items()}
{'object': ['C', 'D'], 'int64': ['A', 'E'], 'float64': ['B']}
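
The same grouping can also be written as a plain loop, which gives a dictionary of lists keyed by dtype name. This is only a minimal sketch: the function name columns_by_dtype is made up for illustration and is not part of pandas. Text columns show up as 'object' in the pandas versions used in this thread; newer versions may report a dedicated string dtype instead.

```python
from collections import defaultdict

import pandas as pd


def columns_by_dtype(df: pd.DataFrame) -> dict:
    """Map each dtype name (for example 'int64') to the columns of that dtype."""
    groups = defaultdict(list)
    # df.dtypes is a Series: index = column names, values = dtype objects
    for column, dtype in df.dtypes.items():
        groups[str(dtype)].append(column)
    return dict(groups)


df = pd.DataFrame([[1, 2.3456, "c", "d", 78]], columns=list("ABCDE"))
by_dtype = columns_by_dtype(df)
print(by_dtype)
```

Looking up by_dtype["float64"] then returns the list of float columns, ready to pass to a rounding function.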





This is useful as a Data Quality check, where one ensures that columns are of the type that one expects.
– prismalytics.io
Apr 14 '16 at 15:18





this doesn't work if all your dataframe columns are returning object type, regardless of their actual contents
– user5359531
Jul 17 '17 at 23:46







@user5359531 that doesn't mean it's not working, that actually means your DataFrame columns weren't cast to the type you think they should be, which can happen for a variety of reasons.
– Marc
Sep 5 '17 at 13:56





If you are just selecting columns by data type, then this answer is obsolete. Use select_dtypes instead
– Ted Petrou
Nov 3 '17 at 16:58







How do you index this grouped dataframe afterwards?
– Allen Wang
Jul 31 at 0:43



As of pandas v0.14.1, you can utilize select_dtypes() to select columns by dtype




In [2]: df = pd.DataFrame({'NAME': list('abcdef'),
   ...:                    'On_Time': [True, False] * 3,
   ...:                    'On_Budget': [False, True] * 3})

In [3]: df.select_dtypes(include=['bool'])
Out[3]:
On_Budget On_Time
0 False True
1 True False
2 False True
3 True False
4 False True
5 True False

In [4]: mylist = list(df.select_dtypes(include=['bool']).columns)

In [5]: mylist
Out[5]: ['On_Budget', 'On_Time']
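
For the case in the question (object or datetime columns), select_dtypes accepts several dtypes at once. The sketch below uses a small made-up frame that borrows the question's column names. The NAME column is forced to object dtype so the example does not depend on how a given pandas version stores strings.

```python
import pandas as pd

# Small illustrative frame; the values are invented for this example.
df = pd.DataFrame(
    {
        "NAME": pd.Series(["a", "b"], dtype="object"),
        "%actual_hr": [1.2345, 6.789],
        "Baseline Start Date": pd.to_datetime(["2014-01-01", "2014-02-01"]),
        "Forecast Start Date": pd.to_datetime(["2014-01-15", "2014-02-15"]),
    }
)

# Columns that are either object or datetime, in their original order
object_or_datetime = df.select_dtypes(include=["object", "datetime"]).columns.tolist()

# Columns that are float64 (the ones the asker wants to round)
float_cols = df.select_dtypes(include=["float64"]).columns.tolist()

print(object_or_datetime)
print(float_cols)
```

The exclude argument works the same way when it is easier to name the dtypes you do not want.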



You can use a boolean mask on the dtypes attribute:


In [11]: df = pd.DataFrame([[1, 2.3456, 'c']])

In [12]: df.dtypes
Out[12]:
0 int64
1 float64
2 object
dtype: object

In [13]: msk = df.dtypes == np.float64 # or object, etc.

In [14]: msk
Out[14]:
0 False
1 True
2 False
dtype: bool



You can look at just those columns with the desired dtype:


In [15]: df.loc[:, msk]
Out[15]:
1
0 2.3456



Now you can use round (or whatever) and assign it back:


In [16]: np.round(df.loc[:, msk], 2)
Out[16]:
1
0 2.35

In [17]: df.loc[:, msk] = np.round(df.loc[:, msk], 2)

In [18]: df
Out[18]:
0 1 2
0 1 2.35 c
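
If you already have your own conversion function, the mask can also be turned into a list of column names and the function applied column by column. A rough sketch follows, where to_two_dp is a hypothetical stand-in for the asker's function and the column names n, x and s are invented:

```python
import pandas as pd


def to_two_dp(series: pd.Series) -> pd.Series:
    """Hypothetical stand-in for the asker's two-decimal-place conversion."""
    return series.round(2)


df = pd.DataFrame([[1, 2.3456, "c"]], columns=["n", "x", "s"])

mask = df.dtypes == "float64"            # boolean Series indexed by column name
float_cols = mask[mask].index.tolist()   # names of the columns where the mask is True

result = df.copy()                       # leave the original frame untouched
for col in float_cols:
    result[col] = to_two_dp(result[col])

print(float_cols)
print(result)
```

Working on a copy is a design choice for the example; assigning back into df directly works just as well.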





I'd love to be able to write a function which takes in the name of a dataframe, and then returns a dictionary of lists, with the dictionary key being the datatype and the value being the list of columns from the dataframe which are of that datatype.
– yoshiserry
Mar 18 '14 at 8:05





def col_types(x,pd):
– itthrill
Aug 28 at 3:03



Using dtype will give you the data type of a single column:




dataframe['column1'].dtype



If you want to know the data types of all the columns at once, use the plural form, dtypes:




dataframe.dtypes
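
As a small illustration of both attributes, plus the type-checking helpers in pandas.api.types that save you from comparing dtype names by hand (the frame and its column names hours and start are invented for this sketch):

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype, is_float_dtype

df = pd.DataFrame(
    {
        "hours": [1.5, 2.25],
        "start": pd.to_datetime(["2014-03-01", "2014-03-02"]),
    }
)

print(df["hours"].dtype)  # dtype of one column
print(df.dtypes)          # Series of dtypes, one entry per column

# Helper predicates accept a Series (or a dtype) and return a bool
hours_is_float = is_float_dtype(df["hours"])
start_is_datetime = is_datetime64_any_dtype(df["start"])
```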





This should be the accepted answer, it prints the data types in almost exactly the format OP wants.
– abhi divekar
Dec 1 '17 at 17:25





The question was about listing only the columns of specific data types, for example using df.select_dtypes(include=['object', 'datetime']).columns, as discussed below
– DfAC
Jan 27 at 12:47





Use df.info(), where df is a pandas DataFrame.







What I needed, Thanks!
– Gabriel Fair
Apr 15 at 20:01



df.select_dtypes(['object'])



This should do the trick



If you want a list of only the non-numeric columns (here, anything that is not float64 or int64), you could do:


non_numerics = [x for x in df.columns
if not (df[x].dtype == np.float64
or df[x].dtype == np.int64)]



and then if you want to get another list of only the numerics:


numerics = [x for x in df.columns if x not in non_numerics]
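
Note that the comparison above only recognises float64 and int64, so an int32 or float32 column would land in non_numerics. One possible alternative, sketched here with an invented frame, is to let select_dtypes match every numeric dtype through the generic 'number' selector:

```python
import numpy as np
import pandas as pd

# Invented frame with a 32-bit integer column to show the difference
df = pd.DataFrame(
    {
        "a": np.array([1, 2], dtype="int32"),
        "b": [1.5, 2.5],
        "c": pd.Series(["x", "y"], dtype="object"),
    }
)

numerics = df.select_dtypes(include="number").columns.tolist()
non_numerics = df.select_dtypes(exclude="number").columns.tolist()

print(numerics)
print(non_numerics)
```

Boolean columns are not matched by 'number', so they end up in the non-numeric list with this approach.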



The most direct way to get the columns of a certain dtype, e.g. 'object':


df.select_dtypes(include='object').columns



For example:


>>> df = pd.DataFrame([[1, 2.3456, 'c', 'd', 78]], columns=list("ABCDE"))
>>> df.dtypes

A int64
B float64
C object
D object
E int64
dtype: object



To get all 'object' dtype columns:


>>> df.select_dtypes(include='object').columns

Index(['C', 'D'], dtype='object')



For just the list:


>>> list(df.select_dtypes(include='object').columns)

['C', 'D']



I came up with this three-liner.



Essentially, here's what it does:


inp = pd.read_csv('filename.csv')  # read input; add read_csv arguments as needed
columns = pd.DataFrame({'column_names': inp.columns, 'datatypes': inp.dtypes})
columns.to_csv('columns_list.csv', encoding='utf-8')  # write the schema; encoding is optional



This made my life much easier in trying to generate schemas on the fly. Hope this helps



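For yoshiserry:


def col_types(x, pd):
    """Return a dict mapping each column name of DataFrame x to its dtype (pd is unused)."""
    dtypes = x.dtypes
    dtypes_col = dtypes.index
    dtypes_type = dtypes.values  # Series.values holds the dtype objects
    column_types = dict(zip(dtypes_col, dtypes_type))
    return column_types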
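
If the mapping needs to be written out or compared as text, a similar dictionary can hold dtype names as plain strings instead of dtype objects. This is just a sketch with an invented frame and variable name (schema):

```python
import pandas as pd

df = pd.DataFrame({"A": [1], "B": [2.3456], "E": [78]})

# Column name -> dtype name, using str() so the values are ordinary strings
schema = {column: str(dtype) for column, dtype in df.dtypes.items()}

print(schema)
```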



