Read through small CSV file returns Regex too complicated
I have a method that reads through a CSV file and, if any of the columns match my keyset, prepares them in an sObject. With my test files, which contained 1,000 records but only 4 columns, this worked fine. However, increasing the number of columns to 22 now returns "Regex too complicated", despite the file having only around 100 rows.
How can I get around this issue? I have googled and found articles like this, but they are missing key parts of the code, which renders them unusable.
2 Answers
The "Regex too complicated" error occurs when the matcher accesses the input sequence more than 1,000,000 times, which can happen even on incredibly small strings. For example, according to the knowledge article:
Pattern pat = Pattern.compile('(A)?(B)?(C)?(D)?(E)?(F)?(G)?(H)?(I)?(J)?(K)?(L)?(M)?(N)?(O)?(P)?(Q)?(R)?(S)?(T)?(U)?(V)?(W)?(X)?(Y)?(Z)?(AA)?(AB)?(AC)?(AD)?(AE)?(AF)?(AG)?(AH)?(AI)?(AJ)?(AK)?(AL)?(AM)?(AN)?(AO)?(AP)?(AQ)?(AR)?(AS)?(AT)?(AU)?(AV)?(AW)?(AX)?(AY)?(AZ)?$');
Matcher mat = pat.matcher('asdfasdfasdfasdfasdfasdf');
This code would fail because of the complexity of the pattern. Realistically, any Apex code that tries to parse a CSV is doomed to fail if it uses regex matching on any reasonably sized file. Your best bet is to implement a finite state parser to parse your CSV file (note: link goes to a Java-based finite state CSV parser, not written by me).
Because of the overhead of Apex, I wouldn't necessarily try to directly duplicate the Java-based version above, but hopefully it will get you started in a direction that will allow you to process larger CSV files more efficiently.
Your processing input loop can make use of a switch to implement a state engine efficiently:
public enum PARSER_STATE { BEGIN_FIELD, ERROR, FOUND_QUOTE, QUOTED_FIELD, UNQUOTED_FIELD }

Integer index = 0;
PARSER_STATE currentState = PARSER_STATE.BEGIN_FIELD;
// input is the CSV text as a String, so use length() rather than size()
while (index < input.length()) {
    // Examine one character at a time
    String charAtX = input.substring(index, index + 1);
    switch on currentState {
        when BEGIN_FIELD {
            ...
        }
        when FOUND_QUOTE {
            ...
        }
        ...
    }
}
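To make the state engine idea more concrete, here is a minimal sketch of how the elided branches might be filled in. It is not the linked Java parser and not necessarily what the outline above had in mind: the class name CsvStateParser, the CsvParseException type, and the choice to throw instead of entering an ERROR state are all assumptions made for illustration. It handles commas, quoted fields, doubled quotes, and LF or CRLF line endings, and never touches Pattern or Matcher.

```apex
public class CsvStateParser {
    // Hypothetical exception type used in place of an ERROR state.
    public class CsvParseException extends Exception {}

    private enum ParserState { BEGIN_FIELD, UNQUOTED_FIELD, QUOTED_FIELD, FOUND_QUOTE }

    // Parses CSV text into rows of fields, one character at a time, without regex.
    public static List<List<String>> parse(String input) {
        List<List<String>> rows = new List<List<String>>();
        List<String> row = new List<String>();
        String field = '';
        ParserState state = ParserState.BEGIN_FIELD;

        for (Integer index = 0; index < input.length(); index++) {
            String ch = input.substring(index, index + 1);

            // Outside quotes, a comma ends the field and a newline ends the row.
            if (state != ParserState.QUOTED_FIELD && (ch == ',' || ch == '\n')) {
                row.add(field);
                field = '';
                if (ch == '\n') {
                    rows.add(row);
                    row = new List<String>();
                }
                state = ParserState.BEGIN_FIELD;
                continue;
            }

            switch on state {
                when BEGIN_FIELD {
                    if (ch == '"') {
                        state = ParserState.QUOTED_FIELD;
                    } else if (ch != '\r') {
                        field += ch;
                        state = ParserState.UNQUOTED_FIELD;
                    }
                }
                when UNQUOTED_FIELD {
                    // Drop carriage returns so CRLF files behave like LF files.
                    if (ch != '\r') {
                        field += ch;
                    }
                }
                when QUOTED_FIELD {
                    // Inside quotes, commas and newlines are literal data.
                    if (ch == '"') {
                        state = ParserState.FOUND_QUOTE;
                    } else {
                        field += ch;
                    }
                }
                when FOUND_QUOTE {
                    if (ch == '"') {
                        // A doubled quote is an escaped quote inside the field.
                        field += ch;
                        state = ParserState.QUOTED_FIELD;
                    } else if (ch != '\r') {
                        throw new CsvParseException(
                            'Unexpected character after closing quote at index ' + index);
                    }
                }
            }
        }

        if (state == ParserState.QUOTED_FIELD) {
            throw new CsvParseException('Unterminated quoted field');
        }
        // Emit the last row when the input does not end with a newline.
        if (state != ParserState.BEGIN_FIELD || !row.isEmpty()) {
            row.add(field);
            rows.add(row);
        }
        return rows;
    }
}
```

Calling CsvStateParser.parse(csvText) returns one List<String> per row. Building each field by string concatenation keeps the sketch short, but it costs CPU time and heap, so a large file would probably still need to be processed in chunks.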
@cropredy Yes, that's also viable. You just can't put a full csv through a single pattern.
– sfdcfox
Sep 17 '18 at 23:25
Not sure if this is the best approach, but what I have done is use a custom iterator with a batch class and split each line for processing (mentioned in my initial question).
This has allowed me to bypass the regex errors.
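For anyone wanting the missing pieces, a rough sketch of the iterator-plus-batch approach could look like the following. The class names CsvLineIterator and CsvImportBatch are made up for illustration, each class would live in its own file, and the body of execute is left as a placeholder because the keyset and sObject mapping are specific to your org. The iterator finds line breaks with indexOf instead of splitting the whole file, so no regex ever runs over the full CSV. It assumes each row is independent and that no quoted field contains a line break.

```apex
// Hypothetical iterator that hands out one CSV line at a time.
public class CsvLineIterator implements Iterator<String>, Iterable<String> {
    private final String csv;
    private Integer position = 0;

    public CsvLineIterator(String csv) {
        this.csv = csv;
    }

    public Iterator<String> iterator() {
        return this;
    }

    public Boolean hasNext() {
        return position < csv.length();
    }

    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        // Locate the next line break with indexOf, not a regex split.
        Integer lineEnd = csv.indexOf('\n', position);
        if (lineEnd == -1) {
            lineEnd = csv.length();
        }
        // Strip a trailing carriage return so CRLF files work too.
        String line = csv.substring(position, lineEnd).removeEnd('\r');
        position = lineEnd + 1;
        return line;
    }
}

// Hypothetical batch class that receives the lines in chunks.
public class CsvImportBatch implements Database.Batchable<String> {
    private final String csv;

    public CsvImportBatch(String csv) {
        this.csv = csv;
    }

    public Iterable<String> start(Database.BatchableContext bc) {
        return new CsvLineIterator(csv);
    }

    public void execute(Database.BatchableContext bc, List<String> lines) {
        for (String line : lines) {
            // split() takes a regex, but it only sees one short line here.
            // It does not handle commas inside quoted fields.
            List<String> columns = line.split(',');
            // Placeholder: match columns against the keyset and build sObjects.
        }
    }

    public void finish(Database.BatchableContext bc) {}
}
```

It would be started with something like Database.executeBatch(new CsvImportBatch(csvText), 200), where the scope size controls how many lines each execute call receives.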
N.B. I can recall solving this using a custom iterator/iterable and a batch class - works if each row can be considered independent of each other
– cropredy
Sep 17 '18 at 22:42