How to speed up evaluation of expression (multiple variables) to gain speed? Compile?

How to speed up evaluation of expression (multiple variables) to gain speed? Compile?



I obtained some long expression and need to use them with a lot of different parameters inside a simulation. However, evaluation of the expressions is pretty slow, so that I would like to speed it up. The first (and only) thing that came to my mind was Compile. But, unfortunately, speed is not improved significantly.


Compile



This is an example of the shortest expression (unable to paste the larger ones because they exceed the limits on pastebin):


fun[tt_, sig_, time_, expon_, bShift_, lam_] = Uncompress@Import["https://pastebin.com/raw/nj83nT2b"];
funComp = Compile[tt, _Real, sig, _Real, time, _Real, expon, _Integer, bShift, _Real, lam, _Real, fun[tt, sig, time, expon, bShift, lam]];
fun[3.4, 10/4, 10, 3, 1.3456, Sqrt[2]] // AbsoluteTiming
(* 0.0836853, 0.045048 *)
funComp[3.4, 10/4, 10, 3, 1.3456, Sqrt[2]] // AbsoluteTiming
(* 0.0718691, 0.045048 *)



As you can see, Compile basically has no effect. The bigger issue is that the harmful expressions I am dealing with rather take 10s to evaluate and I'd like to see one or more orders of magnitude speed up (if possible). I hope that the small example here is essentially limited by the same effect as the bigger expressions, so that it serves as a proper toy example.


Compile



Background on fun: It essentially is composed of sums and products of Gaussians and their derivatives to higher orders (also powers of them).


fun



Is there a way to significantly speed up the evaluation process? Potentially by more advanced usage of Compile or some other trick? Currently this is a serious bottleneck in my simulations.


Compile





You haven't generated the expression for the function fun by hand, have you? Please give us the code that produces this expression. If it contains many Sum: That's great because the Sums can be more efficiently compiled as this humongous symbolic expression. Plus one might use vectorized code instead of Compile which may perform even better.
– Henrik Schumacher
Aug 23 at 9:13


fun


Sum


Sum


Compile





@HenrikSchumacher Sorry, with "obtained" I mean from external sources. The derivation does not explicitly contain Sum in the sense of using the corresponding MMA function. Indeed, the humongous expression was obtained "by hand" in the form of manually adding/multiplying Gaussians with certain parameters. Was "vectorized code" explicitly referring to a possible solution of Sums were used or is it a general idea?
– Lukas
Aug 23 at 9:19


Sum


Sum





Well, vectorization often boils down to vector-vector and matrix-vector products, so many sums can be replace by vectorized code. But also many built-in arithmetic operations are already vectorized (in particular, they have the attribute Listable). For example, if a is a list of machine-precision real or complex numbers (and a should also be a packed array), Exp[a] will be compute much faster than Exp /@ a or Table[ Exp[x],x,a]. And Total[Exp[a]] is mush faster than Sum[Exp[x], x, a].
– Henrik Schumacher
Aug 23 at 9:22



Listable


a


Exp[a]


Exp /@ a


Table[ Exp[x],x,a]


Total[Exp[a]]


Sum[Exp[x], x, a]





Hm. What is the external source? If it is wirtten in another programming language, maybe its code can be processed to a simple symbolic Mathematica expression (and a list of replacement rules for the constants).
– Henrik Schumacher
Aug 23 at 9:31





@HenrikSchumacher External source is someone who is working with me. I will get in touch to see if things can be boiled down so that vectorization may be used. But the answer you provide below already helps significantly, so a major step towards bearable evaluation times can be made!
– Lukas
Aug 23 at 10:04




1 Answer
1



The code for fun[tt, sig, time, expon, bShift, lam]] was not correctly inlined into the compiled function funComp. Most robust ways is to use With as follows. I also apply N to avoid some integer to double typecasts that would otherwise take place at runtime.


fun[tt, sig, time, expon, bShift, lam]]


funComp


With


N


funComp2 = With[code = N[fun[tt, sig, time, expon, bShift, lam]],
Compile[tt, _Real, sig, _Real, time, _Real, expon, _Integer, bShift, _Real, lam, _Real,
code,
CompilationTarget -> "C"
]
];



(Compilation takes longer now because now there happens a lot.)



Resulting timings on my machine:


fun[3.4, 10./4, 10., 3, 1.3456, Sqrt[2.]] // AbsoluteTiming
funComp[3.4, 10/4, 10, 3, 1.3456, Sqrt[2]] // AbsoluteTiming



0.072316, 0.045048



0.000197, 0.045048



PS. You might get a further minor performance boost by changing expon, _Integer to expon, _Real because there are still some integer to double typecasts left in the C code. But that was hardly measurable with the simple test I tried so far.


expon, _Integer


expon, _Real



PPS. You might probably want to apply this function to large lists of inputs. The the following way to compile the function (with options RuntimeAttributes -> Listable, Parallelization -> True) might be your friend:


RuntimeAttributes -> Listable, Parallelization -> True


funComp3 = With[code = N[fun[tt, sig, time, expon, bShift, lam]],
Compile[tt, _Real, sig, _Real, time, _Real, expon, _Real, bShift, _Real, lam, _Real,
code,
CompilationTarget -> "C",
RuntimeAttributes -> Listable,
Parallelization -> True
]
];



Here some timing examples.


a = RandomReal[-1, 1, 100];

r1 = fun[#, 10/4, 10, 3, 1.3456, Sqrt[2]] & /@ a; // RepeatedTiming // First
r2 = fun[a, 10/4, 10, 3, 1.3456, Sqrt[2]]; // RepeatedTiming // First
r3 = funComp3[#, 10/4, 10, 3, 1.3456, Sqrt[2]] & /@ a; // RepeatedTiming // First
r4 = funComp3[a, 10/4, 10, 3, 1.3456, Sqrt[2]]; // RepeatedTiming // First
errors = Max[Abs[r1 - r2]], Max[Abs[r2 - r3]], Max[Abs[r3 - r4]]



9.128



0.151



0.00040



0.000047



2.60209*10^-18, 6.07153*10^-18, 0.





This is brilliant! Thank you very much for the detailed analysis. It helps a lot!
– Lukas
Aug 23 at 10:02





You're welcome!
– Henrik Schumacher
Aug 23 at 10:11





Wow, that's quite an improvment! I am a little bit confused tho which function is which. Is the very first code block supposed to define funComp, not funComp2? Or the funComp is the OP's function, and you have two examples with timing results for two different funComp2, as in the 1st and 3rd code sections (they differ by Listable and Parallel)?
– kkm
Aug 29 at 7:25


funComp


funComp2


funComp


funComp2


Listable


Parallel





@kkm Well observed! I corrected that and also replaced AbsoluteTiming by RepeatedTiming. The latter takes much longer to compute a timing estimate but it is also much more accurate, in particular with very small timings. In order to clarify: The Parallelization->True gets only active if (i) the compiled function has also the attribute Listable and (ii) the function is also called in the appropriate way: For r3, funComp3 is Mapped over the input vector; this does not exploit the Listable property while r4 profits from both listability and (a bit) from parallelization.
– Henrik Schumacher
Aug 29 at 7:39


AbsoluteTiming


RepeatedTiming


Parallelization->True


Listable


r3


funComp3


Map


Listable


r4





It's clear now which is which, thanks! As for the parallel matrix ops, If MMA uses MKL automatic parallelization, then it depends a lot of CPU architecture, problem size, L3 cache, but the most on the present alignment of stars. I've experienced bizarre results with a pure C++ programs, like a VM (Hyper-V) benefitting from parallel computation greatly, being on the order of a few times faster, while its own host physical computer experiencing some degradation on the same problem. Same OS, same MKL version, same program and task, and VM computed faster than its host. Black magic ,no less!
– kkm
Aug 29 at 7:48







By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.

Popular posts from this blog

𛂒𛀶,𛀽𛀑𛂀𛃧𛂓𛀙𛃆𛃑𛃷𛂟𛁡𛀢𛀟𛁤𛂽𛁕𛁪𛂟𛂯,𛁞𛂧𛀴𛁄𛁠𛁼𛂿𛀤 𛂘,𛁺𛂾𛃭𛃭𛃵𛀺,𛂣𛃍𛂖𛃶 𛀸𛃀𛂖𛁶𛁏𛁚 𛂢𛂞 𛁰𛂆𛀔,𛁸𛀽𛁓𛃋𛂇𛃧𛀧𛃣𛂐𛃇,𛂂𛃻𛃲𛁬𛃞𛀧𛃃𛀅 𛂭𛁠𛁡𛃇𛀷𛃓𛁥,𛁙𛁘𛁞𛃸𛁸𛃣𛁜,𛂛,𛃿,𛁯𛂘𛂌𛃛𛁱𛃌𛂈𛂇 𛁊𛃲,𛀕𛃴𛀜 𛀶𛂆𛀶𛃟𛂉𛀣,𛂐𛁞𛁾 𛁷𛂑𛁳𛂯𛀬𛃅,𛃶𛁼

ữḛḳṊẴ ẋ,Ẩṙ,ỹḛẪẠứụỿṞṦ,Ṉẍừ,ứ Ị,Ḵ,ṏ ṇỪḎḰṰọửḊ ṾḨḮữẑỶṑỗḮṣṉẃ Ữẩụ,ṓ,ḹẕḪḫỞṿḭ ỒṱṨẁṋṜ ḅẈ ṉ ứṀḱṑỒḵ,ḏ,ḊḖỹẊ Ẻḷổ,ṥ ẔḲẪụḣể Ṱ ḭỏựẶ Ồ Ṩ,ẂḿṡḾồ ỗṗṡịṞẤḵṽẃ ṸḒẄẘ,ủẞẵṦṟầṓế

⃀⃉⃄⃅⃍,⃂₼₡₰⃉₡₿₢⃉₣⃄₯⃊₮₼₹₱₦₷⃄₪₼₶₳₫⃍₽ ₫₪₦⃆₠₥⃁₸₴₷⃊₹⃅⃈₰⃁₫ ⃎⃍₩₣₷ ₻₮⃊⃀⃄⃉₯,⃏⃊,₦⃅₪,₼⃀₾₧₷₾ ₻ ₸₡ ₾,₭⃈₴⃋,€⃁,₩ ₺⃌⃍⃁₱⃋⃋₨⃊⃁⃃₼,⃎,₱⃍₲₶₡ ⃍⃅₶₨₭,⃉₭₾₡₻⃀ ₼₹⃅₹,₻₭ ⃌