text delimiter shifting values in dataframe










-2















I have a DataFrame like the data_df example below, which I create by reading data from a CSV file with the code below. The problem I'm running into is that some of the values in some of the columns are shifted to the right. For example, the values in the second record are shifted one column to the right, starting at the name column. I think there may be a "\" in the name for that record that is causing the shift. Does anyone know how to fix this? Is there something I can do in read_csv that would address it?



Code:



import pandas as pd

data_df = pd.read_csv(filepath)

with pd.option_context('display.max_rows', None, 'display.max_columns', None):
    print(data_df[:5])


Output:



   Unnamed: 0  call_history_id                            calllog_id
0       16358       1210746736  ca58d850-6fe6-4673-a049-ea4a2d8d7ecf
1       16361       1210976828  c005329b-955d-4d88-98a5-1c47e6a1cb80
2       16402       1217791595  050e9b83-54c2-4c87-abdd-32225c0d3189
3       16471       1228495414  45705ed1-a8e2-4a15-8941-5b0a40b7d409
4       27906       1245173592  04e56818-04a0-4704-ac86-31c31dac2370

        call_id  connection_id  pbx_name    pbx_id  extension_number
0  1.509170e+12   1.509170e+12  sales8x8  sales8x8               595
1  1.509170e+12   1.509170e+12  sales8x8  sales8x8               595
2  1.509170e+12   1.509170e+12  sales8x8  sales8x8               595
3  1.509170e+12   1.509170e+12  sales8x8  sales8x8               595
4  1.509170e+12   1.509170e+12  sales8x8  sales8x8               595

   extension_id   customer_id      address                 name
0           595  2.525100e+29  14086694428           Sun Basket
1           595  2.525100e+29  13214371589                PEREZ
2           595  2.525100e+29  14088566290          14088566290
3           595  2.525100e+29   8059316676              Dialing
4           595  2.525100e+29  12028071151  Implementation Team

   start_timestamp     direction  call_internal  call_missed  duration
0     1/8/18 19:49             I              0            0      4414
1            BRYAN  1/8/18 20:09              I            0         0
2     1/9/18 20:31             I              0            0     14766
3    1/11/18 17:16             I              0            0      1686
4    1/15/18 22:55             I              0            0      3491

   device_model   group_call  group_name  group_number           device_id
0   mediaserver            0           N             N  MasterSlaveService
1          8300  mediaserver           0             N                   N
2   mediaserver            0           N             N  MasterSlaveService
3   mediaserver            0           N             N  MasterSlaveService
4   mediaserver            0           N             N  MasterSlaveService

   history_event_state   created_time   updated_time    group_type
0                    A   1/8/18 19:49   1/8/18 19:49             N
1   MasterSlaveService              A   1/8/18 20:09  1/8/18 20:09
2                    A   1/9/18 20:31   1/9/18 20:31             N
3                    A  1/11/18 17:16  1/11/18 17:16             N
4                    A  1/15/18 22:55  1/15/18 22:55             N































  • The delimiter of your file is ',', yet some of your fields contain that character. That name is almost certainly written as 'PEREZ, BRYAN', which won't parse properly. If you have control over the file, you should choose a different delimiter.

    – ALollz
    Nov 10 '18 at 23:18







  • @ALollz Thank you for getting back to me so quickly. The value looks like ",PEREZ\,BRYAN,". Is there a handy way to deal with the "\," when reading the data in with read_csv? Or do I need to find and replace them all before reading the file?

    – user3476463
    Nov 11 '18 at 2:52











  • @ALollz Thanks, your previous comment that just disappeared worked nicely: sep=r'(?<!\\),'

    – user3476463
    Nov 11 '18 at 4:05











  • I think my suggestion wasn't nearly as correct as dmitriys' solution. That will solve a lot of the issues at once.

    – ALollz
    Nov 11 '18 at 4:13
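
The regex-separator idea from the comments can be sketched roughly as follows. This assumes the commas inside names are written as \, in the file; the sample rows and the column names id, name, and duration are made up for illustration. Unlike the escapechar approach, the backslash stays in the parsed value and has to be removed afterwards.

```python
import io

import pandas as pd

# Hypothetical sample: the comma inside the name is preceded by a backslash.
raw = "id,name,duration\n1,Sun Basket,4414\n2,PEREZ\\,BRYAN,8300\n"

# Split only on commas that are NOT preceded by a backslash.
# A regular-expression separator needs the slower Python parsing engine.
df = pd.read_csv(io.StringIO(raw), sep=r"(?<!\\),", engine="python")

# The backslash is still part of the value at this point, so strip it.
df["name"] = df["name"].str.replace("\\,", ",", regex=False)

print(df)
```

This keeps the columns aligned, but it is a workaround: the parser itself never learns that the backslash is an escape character.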





















python-3.x pandas csv
















asked Nov 10 '18 at 23:13









user3476463
























1 Answer


















2














The \ is an escape character. Since I take it the values in your file are not enclosed in quotes, the \ is placed before the comma so that PEREZ, BRYAN is treated as one value.



Try passing \ to the escapechar option of pd.read_csv and this should take care of it, e.g. pd.read_csv(filename, escapechar="\\"). Note that the backslash must itself be escaped inside a Python string literal.
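
As a minimal sketch of this approach, the snippet below parses a small in-memory CSV instead of a file. The sample rows and the column names id, name, and duration are invented for illustration and are not taken from the asker's data; it assumes the commas inside names are preceded by a backslash.

```python
import io

import pandas as pd

# Hypothetical sample: the comma inside the name is escaped with a backslash.
raw = "id,name,duration\n1,Sun Basket,4414\n2,PEREZ\\,BRYAN,8300\n"

# escapechar tells the parser that a backslash makes the next character
# literal, so the escaped comma stays inside the name field instead of
# starting a new column.
df = pd.read_csv(io.StringIO(raw), escapechar="\\")

print(df)
```

With the escape character set, the second row should keep its name in one column and its duration in the right place.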






        answered Nov 11 '18 at 3:52









        dmitriys


























