get list of pandas dataframe columns based on data type
If I have a dataframe with the following columns:
NAME                   object
On_Time                object
On_Budget              object
%actual_hr             float64
Baseline Start Date    datetime64[ns]
Forecast Start Date    datetime64[ns]
I would like to be able to say: here is a dataframe, give me a list of the columns which are of type object or of type datetime.
I have a function which rounds numbers (float64) to two decimal places, and I would like to take this list of columns of a particular type and run them all through that function to convert them to 2dp.
Maybe something like:

col_list = []
for c in df.columns:
    if df[c].dtype == "float64":
        col_list.append(c)

df.dtypes
10 Answers
If you want a list of columns of a certain type, you can use groupby:
>>> df = pd.DataFrame([[1, 2.3456, 'c', 'd', 78]], columns=list("ABCDE"))
>>> df
A B C D E
0 1 2.3456 c d 78
[1 rows x 5 columns]
>>> df.dtypes
A int64
B float64
C object
D object
E int64
dtype: object
>>> g = df.columns.to_series().groupby(df.dtypes).groups
>>> g
{dtype('int64'): ['A', 'E'], dtype('float64'): ['B'], dtype('O'): ['C', 'D']}
>>> {k.name: v for k, v in g.items()}
{'object': ['C', 'D'], 'int64': ['A', 'E'], 'float64': ['B']}
This is useful as a Data Quality check, where one ensures that columns are of the type that one expects.
– prismalytics.io
Apr 14 '16 at 15:18
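Building on the comment above, a minimal sketch of that kind of data-quality check; the expected mapping here is an assumption for illustration:

```python
import pandas as pd

df = pd.DataFrame([[1, 2.3456, 'c', 'd', 78]], columns=list("ABCDE"))

# group column names by dtype, keyed by the dtype's string name
g = df.columns.to_series().groupby(df.dtypes).groups
by_name = {k.name: list(v) for k, v in g.items()}

# schema check: fail loudly if the column types drift
expected = {'int64': ['A', 'E'], 'float64': ['B'], 'object': ['C', 'D']}
assert by_name == expected
```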
this doesn't work if all your dataframe columns are returning object type, regardless of their actual contents – user5359531
Jul 17 '17 at 23:46
@user5359531 that doesn't mean it's not working, that actually means your DataFrame columns weren't cast to the type you think they should be, which can happen for a variety of reasons.
– Marc
Sep 5 '17 at 13:56
If you are just selecting columns by data type, then this answer is obsolete. Use select_dtypes instead – Ted Petrou
Nov 3 '17 at 16:58
How do you index this grouped dataframe afterwards?
– Allen Wang
Jul 31 at 0:43
As of pandas v0.14.1, you can use select_dtypes() to select columns by dtype:
In [2]: df = pd.DataFrame({'NAME': list('abcdef'),
   ...:                    'On_Time': [True, False] * 3,
   ...:                    'On_Budget': [False, True] * 3})
In [3]: df.select_dtypes(include=['bool'])
Out[3]:
On_Budget On_Time
0 False True
1 True False
2 False True
3 True False
4 False True
5 True False
In [4]: mylist = list(df.select_dtypes(include=['bool']).columns)
In [5]: mylist
Out[5]: ['On_Budget', 'On_Time']
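Applied to the question's actual case (the object and datetime columns together), a sketch with made-up data standing in for the question's frame:

```python
import pandas as pd

df = pd.DataFrame({
    "NAME": ["x", "y"],
    "%actual_hr": [1.2345, 2.3456],
    "Baseline Start Date": pd.to_datetime(["2014-01-01", "2014-02-01"]),
})

# object and datetime64 columns in one call
wanted = list(df.select_dtypes(include=["object", "datetime64"]).columns)
# the float64 column "%actual_hr" is left out
```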
You can use a boolean mask on the dtypes attribute:
In [11]: df = pd.DataFrame([[1, 2.3456, 'c']])
In [12]: df.dtypes
Out[12]:
0 int64
1 float64
2 object
dtype: object
In [13]: msk = df.dtypes == np.float64 # or object, etc.
In [14]: msk
Out[14]:
0 False
1 True
2 False
dtype: bool
You can look at just those columns with the desired dtype:
In [15]: df.loc[:, msk]
Out[15]:
1
0 2.3456
Now you can use round (or whatever) and assign it back:
In [16]: np.round(df.loc[:, msk], 2)
Out[16]:
1
0 2.35
In [17]: df.loc[:, msk] = np.round(df.loc[:, msk], 2)
In [18]: df
Out[18]:
0 1 2
0 1 2.35 c
I'd love to be able to write a function which takes in the name of a dataframe, and then returns a dictionary of lists, with the dictionary key being the datatype and the value being the list of columns from the dataframe which are of that datatype.
– yoshiserry
Mar 18 '14 at 8:05
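A sketch of the function that comment asks for (the name cols_by_dtype is made up): it walks df.dtypes and buckets the column names by dtype name.

```python
import pandas as pd

def cols_by_dtype(df):
    """Return {dtype name: [columns of that dtype]} — illustrative sketch."""
    out = {}
    for col, dtype in df.dtypes.items():
        out.setdefault(dtype.name, []).append(col)
    return out

df = pd.DataFrame({"a": [1], "b": [2.5], "c": ["x"]})
groups = cols_by_dtype(df)
# groups -> {'int64': ['a'], 'float64': ['b'], 'object': ['c']}
```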
def col_types(x,pd):
– itthrill
Aug 28 at 3:03
Using dtype will give you a single column's data type:

dataframe['column1'].dtype

If you want the data types of all the columns at once, use the plural, dtypes:

dataframe.dtypes
This should be the accepted answer, it prints the data types in almost exactly the format OP wants.
– abhi divekar
Dec 1 '17 at 17:25
Question was about listing only a specific datatype, for example using df.select_dtypes(include=['object', 'datetime']).columns as discussed below – DfAC
Jan 27 at 12:47
use df.info() where df is a pandas dataframe
What I needed, Thanks!
– Gabriel Fair
Apr 15 at 20:01
This should do the trick:

df.select_dtypes(['object'])
If you want a list of only the object columns you could do:
non_numerics = [x for x in df.columns
if not (df[x].dtype == np.float64
or df[x].dtype == np.int64)]
and then if you want to get another list of only the numerics:
numerics = [x for x in df.columns if x not in non_numerics]
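For comparison, select_dtypes can express the same split more robustly — the 'number' selector also catches numeric dtypes the explicit check misses, such as int32. A sketch:

```python
import pandas as pd

df = pd.DataFrame({"a": [1], "b": [2.5], "c": ["x"]})

# split columns into numeric and non-numeric in two calls
non_numerics = list(df.select_dtypes(exclude=["number"]).columns)
numerics = list(df.select_dtypes(include=["number"]).columns)
```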
The most direct way to get a list of columns of a certain dtype, e.g. 'object':
df.select_dtypes(include='object').columns
For example:
>>> df = pd.DataFrame([[1, 2.3456, 'c', 'd', 78]], columns=list("ABCDE"))
>>> df.dtypes
A int64
B float64
C object
D object
E int64
dtype: object
To get all 'object' dtype columns:
>>> df.select_dtypes(include='object').columns
Index(['C', 'D'], dtype='object')
For just the list:
>>> list(df.select_dtypes(include='object').columns)
['C', 'D']
I came up with this three-liner.
Essentially, here's what it does:

inp = pd.read_csv('filename.csv')  # read input; add read_csv arguments as needed
columns = pd.DataFrame({'column_names': inp.columns, 'datatypes': inp.dtypes})
columns.to_csv('columns_list.csv', encoding='utf-8')  # encoding is optional

This made my life much easier when trying to generate schemas on the fly. Hope this helps
for yoshiserry:

def col_types(x):
    dtypes = x.dtypes
    dtypes_col = dtypes.index
    dtypes_type = dtypes.values
    column_types = dict(zip(dtypes_col, dtypes_type))
    return column_types
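For completeness, a self-contained sketch of that idea in use (the unused pd parameter dropped, and dtypes.values with an s):

```python
import pandas as pd

def col_types(x):
    """Return {column name: numpy dtype} for a dataframe — sketch."""
    dtypes = x.dtypes
    return dict(zip(dtypes.index, dtypes.values))

df = pd.DataFrame({"a": [1], "b": ["s"]})
types = col_types(df)
# {n: t.name for n, t in types.items()} -> {'a': 'int64', 'b': 'object'}
```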
When I came to this question, I was looking for a way to create exactly the list at the top. df.dtypes does that. – Martin Thoma
Aug 17 at 6:19