Reading through a small CSV file returns "Regex too complicated"



I have a method that reads through a CSV file and, if any of the columns match my key set, prepares them in an sObject. With my test files, which contained 1,000 records but only 4 columns, this worked fine. However, increasing the number of columns to 22 now returns "Regex too complicated", even though the file has only around 100 rows.



How can I get around this issue? I have googled and found articles like this, but they are missing key parts of the code, which renders them unusable.




2 Answers



The "Regex too complicated" error occurs when the regex engine accesses the input sequence more than 1,000,000 times, which can happen even on incredibly small strings. For example, according to the knowledge article:


Pattern pat = Pattern.compile('(A)?(B)?(C)?(D)?(E)?(F)?(G)?(H)?(I)?(J)?(K)?(L)?(M)?(N)?(O)?(P)?(Q)?(R)?(S)?(T)?(U)?(V)?(W)?(X)?(Y)?(Z)?(AA)?(AB)?(AC)?(AD)?(AE)?(AF)?(AG)?(AH)?(AI)?(AJ)?(AK)?(AL)?(AM)?(AN)?(AO)?(AP)?(AQ)?(AR)?(AS)?(AT)?(AU)?(AV)?(AW)?(AX)?(AY)?(AZ)?$');
Matcher mat = pat.matcher('asdfasdfasdfasdfasdfasdf');



This code would fail because of the complexity of the pattern. Realistically, any Apex code that tries to parse a CSV is doomed to fail if it uses regex matching on any reasonably sized file. Your best bet is to implement a finite state parser to parse your CSV file (note: link goes to a Java-based finite state CSV parser, not written by me).



Because of the overhead of Apex, I wouldn't necessarily try to directly duplicate the Java-based version above, but hopefully it will get you started in a direction that will allow you to process larger CSV files more efficiently.



Your processing input loop can make use of a switch to implement a state engine efficiently:


public enum PARSER_STATE { BEGIN_FIELD, ERROR, FOUND_QUOTE, QUOTED_FIELD, UNQUOTED_FIELD }

Integer index = 0;
PARSER_STATE currentState = PARSER_STATE.BEGIN_FIELD;
while (index < input.length()) {
    // Read one character at a time; no regex is involved.
    String charAtX = input.substring(index, index + 1);
    switch on currentState {
        when BEGIN_FIELD {
            // ...
        }
        when FOUND_QUOTE {
            // ...
        }
        // ...
    }
    index++;
}
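To make the outline above more concrete, here is a minimal sketch of how the state engine could be filled in. It is an illustration, not the linked Java parser ported to Apex: the class name `CsvStateParser`, the `parse` method and `CsvParseException` are hypothetical names, and the rules are my assumptions (comma delimiter, `"` as the quote character, `""` as an escaped quote, `\r` ignored outside quoted fields). It replaces the `ERROR` state with an exception to keep the example short.

```apex
public class CsvStateParser {
    // Hypothetical exception type used instead of an ERROR state.
    public class CsvParseException extends Exception {}

    private enum ParserState { BEGIN_FIELD, UNQUOTED_FIELD, QUOTED_FIELD, FOUND_QUOTE }

    // Parses CSV text into rows of fields, one character at a time, with no regex.
    public static List<List<String>> parse(String input) {
        List<List<String>> rows = new List<List<String>>();
        List<String> row = new List<String>();
        String field = '';
        ParserState state = ParserState.BEGIN_FIELD;

        for (Integer i = 0; i < input.length(); i++) {
            String c = input.substring(i, i + 1);
            switch on state {
                when QUOTED_FIELD {
                    // Inside quotes everything is literal except a quote character.
                    if (c == '"') {
                        state = ParserState.FOUND_QUOTE;
                    } else {
                        field += c;
                    }
                }
                when FOUND_QUOTE {
                    if (c == '"') {
                        // Doubled quote: an escaped quote inside the field.
                        field += '"';
                        state = ParserState.QUOTED_FIELD;
                    } else if (c == ',') {
                        row.add(field);
                        field = '';
                        state = ParserState.BEGIN_FIELD;
                    } else if (c == '\n') {
                        row.add(field);
                        field = '';
                        rows.add(row);
                        row = new List<String>();
                        state = ParserState.BEGIN_FIELD;
                    } else if (c != '\r') {
                        throw new CsvParseException('Unexpected character after closing quote at index ' + i);
                    }
                }
                when else {
                    // BEGIN_FIELD or UNQUOTED_FIELD
                    if (c == '"' && state == ParserState.BEGIN_FIELD) {
                        state = ParserState.QUOTED_FIELD;
                    } else if (c == ',') {
                        row.add(field);
                        field = '';
                        state = ParserState.BEGIN_FIELD;
                    } else if (c == '\n') {
                        row.add(field);
                        field = '';
                        rows.add(row);
                        row = new List<String>();
                        state = ParserState.BEGIN_FIELD;
                    } else if (c != '\r') {
                        field += c;
                        state = ParserState.UNQUOTED_FIELD;
                    }
                }
            }
        }

        if (state == ParserState.QUOTED_FIELD) {
            throw new CsvParseException('Unterminated quoted field');
        }
        // Flush the last row when the input does not end with a newline.
        if (!row.isEmpty() || state != ParserState.BEGIN_FIELD) {
            row.add(field);
            rows.add(row);
        }
        return rows;
    }
}
```

Calling `CsvStateParser.parse(csvText)` returns one `List<String>` per row. Building each field by string concatenation is simple but costs CPU time on large inputs, so treat this as a starting point and watch the Apex CPU limit.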







N.B. I can recall solving this using a custom iterator/iterable and a batch class - works if each row can be considered independent of each other

– cropredy
Sep 17 '18 at 22:42






@cropredy Yes, that's also viable. You just can't put a full csv through a single pattern.

– sfdcfox
Sep 17 '18 at 23:25



Not sure if this is the best approach but what I have done is a custom iterator with a batch class and split each line for processing (mentioned in my initial question).



This has allowed me to bypass the regex errors.
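For anyone wanting to try the same approach, the following is a rough sketch of what a custom iterator plus batch class can look like. It is not the code I actually used: `CsvRowIterator`, `CsvRowIterable` and `CsvImportBatch` are hypothetical names, each class would live in its own file, and the sketch assumes that no quoted field contains a line break, so every line is an independent row. Rows are located with `indexOf` instead of `String.split`, because `split` takes a regex.

```apex
// Walks the CSV text line by line without any regex.
public class CsvRowIterator implements Iterator<String> {
    private String csv;
    private Integer pos = 0;

    public CsvRowIterator(String csv) {
        this.csv = csv;
    }

    public Boolean hasNext() {
        return pos < csv.length();
    }

    public String next() {
        Integer newline = csv.indexOf('\n', pos);
        Integer endPos = (newline == -1) ? csv.length() : newline;
        // Strip a trailing carriage return left over from Windows line endings.
        String row = csv.substring(pos, endPos).removeEnd('\r');
        pos = endPos + 1;
        return row;
    }
}

// Wrapper that lets the batch framework obtain the iterator.
public class CsvRowIterable implements Iterable<String> {
    private String csv;

    public CsvRowIterable(String csv) {
        this.csv = csv;
    }

    public Iterator<String> iterator() {
        return new CsvRowIterator(csv);
    }
}

// Each execute() call receives one chunk of rows and runs in its own transaction.
public class CsvImportBatch implements Database.Batchable<String> {
    private String csv;

    public CsvImportBatch(String csv) {
        this.csv = csv;
    }

    public Iterable<String> start(Database.BatchableContext bc) {
        return new CsvRowIterable(csv);
    }

    public void execute(Database.BatchableContext bc, List<String> rows) {
        for (String row : rows) {
            // Parse the single row here and build the sObjects;
            // left out because it depends on your own key set.
        }
    }

    public void finish(Database.BatchableContext bc) {}
}
```

The job would be started with something like `Database.executeBatch(new CsvImportBatch(csvText), 200);`, where `csvText` holds the file contents and the scope size of 200 is only an example value.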


