FLD instruction x64 bit

FLD instruction x64 bit



I have a little problem with FLD instruction in x64 bit ...
want to load Double value to the stack pointer FPU in st0 register, but it seem to be impossible.
In Delphi x32, I can use this code :


function DoSomething(X:Double):Double;
asm

FLD X
// Do Something ..
FST Result

end;



Unfortunately, in x64, the same code does not work.





Define "does not work". Does it crash? Does it not compile? Does it not return the expected result?
– Michael
Apr 3 '13 at 11:53





Did you read about Win64 compatibility in Delphi help ? They tell that there is not 10-bytes Extended type in Win64. And that shows that Delphi Win64 does not use FPU (x86). It uses SSE instead. Thus using FPU instructions is problematic. Also be careful when using BAsm x64 - there are bugs that destroy data or even inverse program control flow.
– Arioch 'The
Apr 3 '13 at 12:55



Extended





in x86_64 the FPU shouldn't be used unless you need extended precision. SSE is faster and more consistent in its results
– phuclv
Sep 4 at 6:11




3 Answers
3



In x64 mode floating point parameters are passed in xmm-registers. So when Delphi tries to compile FLD X, it becomes FLD xmm0 but there is no such instruction. You first need to move it to memory.



The same goes with the result, it should be passed back in xmm0.



Try this (not tested):


function DoSomething(X:Double):Double;
var
Temp : double;
asm
MOVQ qword ptr Temp,X
FLD Temp
//do something
FST Temp
MOVQ xmm0,qword ptr Temp
end;





>So when Delphi tries to compile FLD X, it becomes FLD XMM0 ... WHAT About this FLD Result !!! why the compiler accepet loading Result .. is this a bug !!
– SMP3
Apr 3 '13 at 13:41






@SMP3 : It turns out that when you do "FST Result" BASM allocates a temporary storage on stack for result and then adds a extra instruction at the end to load xmm0 with this value. I did not know that. See for yourself in disassembly view in debugger.
– Ville Krumlinde
Apr 3 '13 at 16:08





Is this a bug? No. On x64 use SSE and not x87. But you should stop doing asm and let the compiler do the work.
– David Heffernan
Apr 3 '13 at 17:13



Delphi inherite Microsoft x64 Calling Convention.
So if arguments of function/procedure are float/double, they are passed in XMM0L, XMM1L, XMM2L, and XMM3L registers.



But you can use var before parameter as workaround like:


var


function DoSomething(var X:Double):Double;
asm
FLD qword ptr [X]
// Do Something ..
FST Result
end;





Nice workaround. Limitation though that you cannot pass constant literals such as DoSomething(1.0) or variables declared as Single.
– Ville Krumlinde
Apr 3 '13 at 16:06





@Ville Krumlinde: Indeed, if you need to call function with constant param than in section const first declare the constant. :)
– GJ.
Apr 3 '13 at 16:27


const



You don't need to use legacy x87 stack registers in x86-64 code, because SSE2 is baseline, a required part of the x86-64 ISA. You can and should do your scalar FP math using addsd, mulsd, sqrtsd and so on, on XMM registers. (Or addss for float)


addsd


mulsd


sqrtsd


addss



The Windows x64 calling convention passes float/double FP args in XMM0..3, if they're one of the first four args to the function. (i.e. the 3rd total arg goes in xmm2 if it's FP, rather than the 3rd FP arg going in xmm2.) It returns FP values in XMM0.



Only use x87 if you actually need 80-bit precision inside your function. (Instructions like fsin and fyl2x are not fast, and can usually be done just as well by normal math libraries using SSE/SSE2 instructions.


fsin


fyl2x


function times2(X:Double):Double;
asm
addsd xmm0, xmm0 // upper 8 bytes of XMM0 are ignored
ret
end



Storing to memory and reloading into an x87 register costs you about 10 cycles of latency for no benefit. SSE/SSE2 scalar instructions are just as fast, or faster, than their x87 equivalents, and easier to program for and optimize because you never need fxch; it's a flat register design instead of stack-based. (https://agner.org/optimize/). Also, you have 15 XMM registers.


fxch



Of course, you usually don't need inline asm at all. It could be useful for manually-vectorizing if the compiler doesn't do that for you.



Thanks for contributing an answer to Stack Overflow!



But avoid



To learn more, see our tips on writing great answers.



Some of your past answers have not been well-received, and you're in danger of being blocked from answering.



Please pay close attention to the following guidance:



But avoid



To learn more, see our tips on writing great answers.



Required, but never shown



Required, but never shown




By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.

Popular posts from this blog

𛂒𛀶,𛀽𛀑𛂀𛃧𛂓𛀙𛃆𛃑𛃷𛂟𛁡𛀢𛀟𛁤𛂽𛁕𛁪𛂟𛂯,𛁞𛂧𛀴𛁄𛁠𛁼𛂿𛀤 𛂘,𛁺𛂾𛃭𛃭𛃵𛀺,𛂣𛃍𛂖𛃶 𛀸𛃀𛂖𛁶𛁏𛁚 𛂢𛂞 𛁰𛂆𛀔,𛁸𛀽𛁓𛃋𛂇𛃧𛀧𛃣𛂐𛃇,𛂂𛃻𛃲𛁬𛃞𛀧𛃃𛀅 𛂭𛁠𛁡𛃇𛀷𛃓𛁥,𛁙𛁘𛁞𛃸𛁸𛃣𛁜,𛂛,𛃿,𛁯𛂘𛂌𛃛𛁱𛃌𛂈𛂇 𛁊𛃲,𛀕𛃴𛀜 𛀶𛂆𛀶𛃟𛂉𛀣,𛂐𛁞𛁾 𛁷𛂑𛁳𛂯𛀬𛃅,𛃶𛁼

ữḛḳṊẴ ẋ,Ẩṙ,ỹḛẪẠứụỿṞṦ,Ṉẍừ,ứ Ị,Ḵ,ṏ ṇỪḎḰṰọửḊ ṾḨḮữẑỶṑỗḮṣṉẃ Ữẩụ,ṓ,ḹẕḪḫỞṿḭ ỒṱṨẁṋṜ ḅẈ ṉ ứṀḱṑỒḵ,ḏ,ḊḖỹẊ Ẻḷổ,ṥ ẔḲẪụḣể Ṱ ḭỏựẶ Ồ Ṩ,ẂḿṡḾồ ỗṗṡịṞẤḵṽẃ ṸḒẄẘ,ủẞẵṦṟầṓế

⃀⃉⃄⃅⃍,⃂₼₡₰⃉₡₿₢⃉₣⃄₯⃊₮₼₹₱₦₷⃄₪₼₶₳₫⃍₽ ₫₪₦⃆₠₥⃁₸₴₷⃊₹⃅⃈₰⃁₫ ⃎⃍₩₣₷ ₻₮⃊⃀⃄⃉₯,⃏⃊,₦⃅₪,₼⃀₾₧₷₾ ₻ ₸₡ ₾,₭⃈₴⃋,€⃁,₩ ₺⃌⃍⃁₱⃋⃋₨⃊⃁⃃₼,⃎,₱⃍₲₶₡ ⃍⃅₶₨₭,⃉₭₾₡₻⃀ ₼₹⃅₹,₻₭ ⃌