XMM SSE2 floating-point instructions
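The SSE2 (Streaming SIMD Extensions 2) floating-point instructions use the 128-bit XMM registers and operate on double-precision (64-bit) floating-point values. A few of them also work with single-precision (32-bit) values. SSE2 was introduced with the Pentium 4 and Xeon processors.
These instructions are very similar to the SSE floating-point instructions, except for the size of the data they operate on.
Before using these instructions in your code, you must check that the machine supports them. Set EAX=1 and execute the CPUID instruction, then test bit 26 of EDX: if it is 1, SSE2 is supported.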
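The same check can be sketched in C. This is only an illustration, assuming a GCC- or Clang-style toolchain that ships the `<cpuid.h>` helper header; the names `has_sse2` and `SSE2_EDX_BIT` are made up for this example and do not come from the original test program.

```c
#include <cpuid.h>   /* GCC/Clang helper header (toolchain assumption) */

/* Bit 26 of EDX, the same mask as 4000000h in the assembly listings. */
#define SSE2_EDX_BIT (1u << 26)

/* Hypothetical helper: returns 1 if CPUID leaf 1 reports SSE2, else 0. */
int has_sse2(void)
{
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid returns 0 when the requested leaf is not available. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;

    return (edx & SSE2_EDX_BIT) != 0;
}
```

A caller would branch on `has_sse2()` once at start-up and fall back to non-SSE2 code when it returns 0, much as the listings call NOSSE2FPMESS and return.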
The test programs in this article all use the following data declarations:
DOUBLEFP1 DQ 1.1
          DQ 3.3
DOUBLEFP2 DQ 20.66
          DQ 40.66
DOUBLEFPN DQ -5.1
          DQ +6.3
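Because this data is not guaranteed to be aligned on a 16-byte boundary, transfers from memory to an XMM register must use MOVUPD. MOVUPD (move unaligned packed double-precision values) does not care about alignment, whereas MOVAPD raises a general-protection exception if its memory operand is not 16-byte aligned. If you declare the data with 16-byte alignment, you can use the faster MOVAPD (move aligned packed double-precision values) instead. For register-to-register transfers, either MOVUPD or MOVAPD can be used.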
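As a rough C illustration of the aligned and unaligned loads, the sketch below uses the SSE2 intrinsics `_mm_load_pd` (which corresponds to MOVAPD) and `_mm_loadu_pd` (which corresponds to MOVUPD) from `<emmintrin.h>`. The array and function names are hypothetical, `_Alignas` assumes a C11 compiler, and the compiler remains free to pick the exact instruction encoding.

```c
#include <emmintrin.h>
#include <stdint.h>

/* No alignment requested: only the natural 8-byte alignment of double is guaranteed. */
const double unaligned_pair[2] = {1.1, 3.3};

/* 16-byte alignment requested explicitly (C11 _Alignas). */
_Alignas(16) const double aligned_pair[2] = {20.66, 40.66};

/* Returns 1 if p sits on a 16-byte boundary. */
int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Loads two doubles with the aligned form only when that is safe. */
void load_pair(const double *src, double out[2])
{
    __m128d v = is_aligned16(src) ? _mm_load_pd(src)    /* MOVAPD-style load */
                                  : _mm_loadu_pd(src);  /* MOVUPD-style load */
    _mm_storeu_pd(out, v);
}
```

In real code the run-time check is usually unnecessary: data declared with 16-byte alignment can always go through the aligned form, and everything else through the unaligned one.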
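The instructions covered here fall into two groups. Those in the first group process two 64-bit floating-point values at a time; their names contain "PD", for "packed double-precision". Those in the second group process a single 64-bit floating-point value; their names contain "SD", for "scalar double-precision". The scalar instructions work only on the low part of the XMM register, that is, its low 64 bits (bits 0-63).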
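The packed/scalar distinction can be seen in a small C sketch using the intrinsics `_mm_add_pd` (ADDPD) and `_mm_add_sd` (ADDSD). The function names are invented for this example; the scalar form adds only the low elements and copies the high element from the first operand.

```c
#include <emmintrin.h>

/* Packed add: both 64-bit lanes are added (ADDPD). */
void add_packed(const double a[2], const double b[2], double out[2])
{
    _mm_storeu_pd(out, _mm_add_pd(_mm_loadu_pd(a), _mm_loadu_pd(b)));
}

/* Scalar add: only the low lane is added (ADDSD);
   the high lane of the result is copied unchanged from a. */
void add_scalar(const double a[2], const double b[2], double out[2])
{
    _mm_storeu_pd(out, _mm_add_sd(_mm_loadu_pd(a), _mm_loadu_pd(b)));
}
```

With a = (1.1, 3.3) and b = (20.66, 40.66), `add_packed` sums both lanes, while `add_scalar` sums only the low lane and leaves 3.3 in the high lane.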
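You can set suitable breakpoints in the test programs below and single-step through them, watching the XMM registers change as the program runs.
SSE2 instructions
SSE2 data transfer instructions
This test program demonstrates moving data between registers. MOVUPD and MOVAPD (the aligned version), MOVSD, MOVLPD and MOVHPD can also be used to move data to and from memory. MOVMSKPD copies the sign bits of the two double-precision values into the low two bits of EAX; it is typically used after a compare instruction so that the result can be analysed in EAX.
As an experiment, we also try doing the same transfer with the SSE2 integer instruction MOVDQU and with the SSE floating-point instruction MOVUPS, which looks much like MOVUPD. Both appear simply to copy the bits into the XMM register. However, Intel warns against using instructions in this untyped way because of possible performance penalties.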
XMMSSE2_FPDATA:
MOV EAX,1                    ;request CPU feature flags
CPUID                        ;0Fh, 0A2h CPUID instruction
TEST EDX,4000000h            ;test bit 26 (SSE2)
JNZ >L20                     ;SSE2 available
CALL NOSSE2FPMESS            ;displays message if SSE2 not available
RET
L20:
;***** display XMM registers in SSE2 mode ..
MOVUPD XMM0,[DOUBLEFP1]      ;move two double precision fp values into XMM0
MOVAPD XMM7,XMM0             ;copying to XMM7
MOVSD  XMM2,[DOUBLEFP2]      ;move fp value to XMM2 low only (high qword is zeroed)
MOVLPD XMM3,[DOUBLEFP2]      ;also loads the low qword, but leaves the high qword unchanged
MOVHPD XMM4,[DOUBLEFP2]      ;loads the same value into the high qword of XMM4
MOVUPD XMM0,[DOUBLEFPN]      ;move two new values, one is negative
MOVMSKPD EAX,XMM0            ;get both sign bits in XMM0 into eax
;************ and as an experiment, see if this does the same as MOVUPD ..
MOVDQU XMM1,[DOUBLEFPN]      ;use integer instruction to transfer the bits
;************ as this too (one byte smaller) ..
MOVUPS XMM2,[DOUBLEFPN]      ;use SSE instruction to transfer the bits
RET
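The differences between the three low/high loads and the effect of MOVMSKPD can be checked with a short C sketch. It uses the intrinsics `_mm_load_sd` (MOVSD from memory), `_mm_loadl_pd` (MOVLPD), `_mm_loadh_pd` (MOVHPD) and `_mm_movemask_pd` (MOVMSKPD); the function names and the filler value 1.0 are assumptions made for illustration only.

```c
#include <emmintrin.h>

/* Loads the double at p in three ways into a register pre-filled with (1.0, 1.0). */
void load_variants(const double *p, double sd[2], double lo[2], double hi[2])
{
    __m128d ones = _mm_set1_pd(1.0);

    _mm_storeu_pd(sd, _mm_load_sd(p));         /* (*p, 0.0): high lane zeroed     */
    _mm_storeu_pd(lo, _mm_loadl_pd(ones, p));  /* (*p, 1.0): high lane kept       */
    _mm_storeu_pd(hi, _mm_loadh_pd(ones, p));  /* (1.0, *p): value goes high      */
}

/* Returns the two sign bits: bit 0 for the low lane, bit 1 for the high lane. */
int sign_mask(const double v[2])
{
    return _mm_movemask_pd(_mm_loadu_pd(v));
}
```

For the DOUBLEFPN data (-5.1 in the low lane, +6.3 in the high lane), `sign_mask` returns 1, which is the value to expect in EAX after the MOVMSKPD in the listing.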
SSE2 arithmetic instructions
XMMSSE2_FPARITH:
MOV EAX,1               ;request CPU feature flags
CPUID                   ;0Fh, 0A2h CPUID instruction
TEST EDX,4000000h       ;test bit 26 (SSE2)
JNZ >L22                ;SSE2 available
CALL NOSSE2FPMESS       ;displays message if SSE2 not available
RET
L22:
;***** display XMM registers in SSE2 mode ..
MOVUPD XMM0,[DOUBLEFP1] ;move two double precision fp values into XMM0
MOVAPD XMM2,XMM0        ;copying to XMM2
MOVUPD XMM1,[DOUBLEFP2] ;move 2nd tester fp values into XMM1
MOVAPD XMM3,XMM1        ;copying to XMM3
ADDPD  XMM0,XMM1        ;add both fp values result in XMM0
MOVAPD XMM0,XMM2        ;restore value in XMM0
SUBPD  XMM0,XMM1        ;subtract both fp values result in XMM0
;*******
MOVAPD XMM0,XMM2        ;restore value in XMM0
ADDSD  XMM0,XMM1        ;add low fp value result in XMM0
SUBSD  XMM0,XMM1        ;subtract low fp value result in XMM0
;*******
MOVAPD XMM0,XMM2        ;restore value in XMM0
MULPD  XMM0,XMM1        ;multiply both fp values result in XMM0
;*******
MOVAPD XMM0,XMM2        ;restore value in XMM0
MULSD  XMM0,XMM1        ;multiply low fp value result in XMM0
;*******
MOVAPD XMM0,XMM2        ;restore value in XMM0
DIVPD  XMM0,XMM1        ;divide both fp values result in XMM0
;*******
MOVAPD XMM0,XMM2        ;restore value in XMM0
DIVSD  XMM0,XMM1        ;divide low fp value result in XMM0
;*******
MOVAPD XMM0,XMM2        ;restore value in XMM0
SQRTPD XMM0,XMM1        ;get square roots of both fp values result in XMM0
;*******
MOVAPD XMM0,XMM2        ;restore value in XMM0
SQRTSD XMM0,XMM1        ;get square root of low fp value result in XMM0
;*******
MOVAPD XMM0,XMM2        ;restore value in XMM0
MAXPD  XMM0,XMM1        ;get numerically greater fp values result in XMM0
;*******
MOVAPD XMM0,XMM2        ;restore value in XMM0
MAXSD  XMM0,XMM1        ;get numerically greater of low fp values result in XMM0
;*******
MOVAPD XMM0,XMM2        ;restore value in XMM0
MINPD  XMM0,XMM1        ;get numerically smaller fp values result in XMM0
;*******
MOVAPD XMM0,XMM2        ;restore value in XMM0
MINSD  XMM0,XMM1        ;get numerically smaller of low fp values result in XMM0
RET
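A few of the arithmetic instructions above can be tried from C with the matching intrinsics: `_mm_sqrt_pd` (SQRTPD), `_mm_sqrt_sd` (SQRTSD), `_mm_max_pd` (MAXPD) and `_mm_min_sd` (MINSD). This is a sketch with hypothetical function names and illustrative input values, not a translation of the test program. Note that the square root is taken of the second operand, and that the scalar forms copy the high lane from the first operand.

```c
#include <emmintrin.h>

/* out_pd: square roots of both lanes of b (SQRTPD).
   out_sd: square root of b's low lane, high lane copied from a (SQRTSD). */
void sqrt_both_ways(const double a[2], const double b[2],
                    double out_pd[2], double out_sd[2])
{
    __m128d va = _mm_loadu_pd(a);
    __m128d vb = _mm_loadu_pd(b);

    _mm_storeu_pd(out_pd, _mm_sqrt_pd(vb));
    _mm_storeu_pd(out_sd, _mm_sqrt_sd(va, vb));
}

/* out_max: lane-wise maximum (MAXPD).
   out_min_sd: minimum of the low lanes, high lane copied from a (MINSD). */
void max_and_min(const double a[2], const double b[2],
                 double out_max[2], double out_min_sd[2])
{
    __m128d va = _mm_loadu_pd(a);
    __m128d vb = _mm_loadu_pd(b);

    _mm_storeu_pd(out_max, _mm_max_pd(va, vb));
    _mm_storeu_pd(out_min_sd, _mm_min_sd(va, vb));
}
```

With a = (1.1, 3.3) and b = (4.0, 9.0), the packed square root gives (2.0, 3.0) while the scalar one gives (2.0, 3.3).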
SSE2 logical instructions
XMMSSE2_FPLOGIC:
MOV EAX,1               ;request CPU feature flags
CPUID                   ;0Fh, 0A2h CPUID instruction
TEST EDX,4000000h       ;test bit 26 (SSE2)
JNZ >L24                ;SSE2 available
CALL NOSSE2FPMESS       ;displays message if SSE2 not available
RET
L24:
;***** display XMM registers in SSE2 mode ..
MOVUPD XMM0,[DOUBLEFP1] ;move two double precision fp values into XMM0
MOVAPD XMM2,XMM0        ;copying to XMM2
MOVUPD XMM1,[DOUBLEFP2] ;move 2nd tester fp values into XMM1
MOVAPD XMM3,XMM1        ;copying to XMM3
ANDPD  XMM0,XMM1        ;perform AND on both fp values result in XMM0
;*******
MOVAPD XMM0,XMM2        ;restore value in XMM0
ANDNPD XMM0,XMM1        ;perform AND NOT, i.e. (NOT XMM0) AND XMM1, result in XMM0
;*******
MOVAPD XMM0,XMM2        ;restore value in XMM0
ORPD   XMM0,XMM1        ;perform OR on both fp values result in XMM0
;*******
MOVAPD XMM0,XMM2        ;restore value in XMM0
XORPD  XMM0,XMM1        ;perform XOR on both fp values result in XMM0
RET
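The logical instructions operate on the raw bits of the values, so their results on ordinary numbers such as 1.1 and 20.66 are hard to interpret. A common practical use, sketched here in C as an illustration that is not part of the original test program, is manipulating the sign bit: `_mm_andnot_pd` (ANDNPD) clears it and `_mm_xor_pd` (XORPD) flips it. The function names are hypothetical.

```c
#include <emmintrin.h>

/* Absolute value of both lanes: (NOT sign_mask) AND v clears each sign bit. */
void abs_pair(const double v[2], double out[2])
{
    __m128d sign = _mm_set1_pd(-0.0);   /* only bit 63 set in each lane */
    _mm_storeu_pd(out, _mm_andnot_pd(sign, _mm_loadu_pd(v)));
}

/* Negation of both lanes: XOR with the sign mask flips each sign bit. */
void negate_pair(const double v[2], double out[2])
{
    __m128d sign = _mm_set1_pd(-0.0);
    _mm_storeu_pd(out, _mm_xor_pd(_mm_loadu_pd(v), sign));
}
```

Applied to the DOUBLEFPN pair (-5.1, +6.3), `abs_pair` yields (5.1, 6.3) and `negate_pair` yields (5.1, -6.3).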
SSE2 compare instructions
XMMSSE2_FPCOMP:
MOV EAX,1               ;request CPU feature flags
CPUID                   ;0Fh, 0A2h CPUID instruction
TEST EDX,4000000h       ;test bit 26 (SSE2)
JNZ >L26                ;SSE2 available
CALL NOSSE2FPMESS       ;displays message if SSE2 not available
RET
L26:
;***** display XMM registers in SSE2 mode ..
MOVUPD XMM0,[DOUBLEFP1] ;move two double precision fp values into XMM0
MOVAPD XMM2,XMM0        ;copying to XMM2
MOVUPD XMM1,[DOUBLEFP2] ;move 2nd tester fp values into XMM1
MOVAPD XMM3,XMM1        ;copying to XMM3
;********************* compare instructions working on both fp values
CMPPD XMM0,XMM1,0       ;=CMPEQPD see whether equal, result in XMM0
MOVAPD XMM0,XMM2        ;restore original value to XMM0
CMPPD XMM0,XMM1,1       ;=CMPLTPD see whether less than, result in XMM0
MOVAPD XMM0,XMM2        ;restore original value to XMM0
CMPPD XMM0,XMM1,2       ;=CMPLEPD see whether less than or equal, result in XMM0
MOVAPD XMM0,XMM2        ;restore original value to XMM0
CMPPD XMM0,XMM1,3       ;=CMPUNORDPD see whether unordered, result in XMM0
MOVAPD XMM0,XMM2        ;restore original value to XMM0
CMPPD XMM0,XMM1,4       ;=CMPNEQPD see whether not equal, result in XMM0
MOVAPD XMM0,XMM2        ;restore original value to XMM0
CMPPD XMM0,XMM1,5       ;=CMPNLTPD see whether not less than, result in XMM0
MOVAPD XMM0,XMM2        ;restore original value to XMM0
CMPPD XMM0,XMM1,6       ;=CMPNLEPD see whether not less than or equal, result in XMM0
MOVAPD XMM0,XMM2        ;restore original value to XMM0
CMPPD XMM0,XMM1,7       ;=CMPORDPD see whether ordered, result in XMM0
;********************* compare instructions working on low value only
MOVAPD XMM0,XMM2        ;restore original value to XMM0
CMPSD XMM0,XMM1,0       ;=CMPEQSD see whether equal, result in XMM0
MOVAPD XMM0,XMM2        ;restore original value to XMM0
CMPSD XMM0,XMM1,1       ;=CMPLTSD see whether less than, result in XMM0
MOVAPD XMM0,XMM2        ;restore original value to XMM0
CMPSD XMM0,XMM1,2       ;=CMPLESD see whether less than or equal, result in XMM0
MOVAPD XMM0,XMM2        ;restore original value to XMM0
CMPSD XMM0,XMM1,3       ;=CMPUNORDSD see whether unordered, result in XMM0
MOVAPD XMM0,XMM2        ;restore original value to XMM0
CMPSD XMM0,XMM1,4       ;=CMPNEQSD see whether not equal, result in XMM0
MOVAPD XMM0,XMM2        ;restore original value to XMM0
CMPSD XMM0,XMM1,5       ;=CMPNLTSD see whether not less than, result in XMM0
MOVAPD XMM0,XMM2        ;restore original value to XMM0
CMPSD XMM0,XMM1,6       ;=CMPNLESD see whether not less than or equal, result in XMM0
MOVAPD XMM0,XMM2        ;restore original value to XMM0
CMPSD XMM0,XMM1,7       ;=CMPORDSD see whether ordered, result in XMM0
;********************* compare and give result in eflags
MOVAPD XMM0,XMM2        ;restore original value to XMM0
COMISD XMM0,XMM1        ;look at lowest only, result in eflags
UCOMISD XMM0,XMM1       ;(unordered compare)
MOVUPD XMM1,[DOUBLEFPN] ;move one -ve and one +ve value into XMM1
COMISD XMM0,XMM1        ;look at lowest only, result in eflags
UCOMISD XMM0,XMM1       ;(unordered compare)
RET
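The packed compares write an all-ones or all-zeros mask into each lane rather than setting flags, which is why MOVMSKPD is the usual follow-up. The C sketch below, with invented function names, pairs `_mm_cmplt_pd` (CMPPD with predicate 1) with `_mm_movemask_pd`, and uses `_mm_comilt_sd` for the flag-setting COMISD style of comparison on the low lane.

```c
#include <emmintrin.h>

/* Bit 0 is set if a[0] < b[0], bit 1 is set if a[1] < b[1]. */
int less_than_mask(const double a[2], const double b[2])
{
    /* Each lane of m is all ones where the comparison holds, else all zeros. */
    __m128d m = _mm_cmplt_pd(_mm_loadu_pd(a), _mm_loadu_pd(b));

    /* The top bit of each lane is the sign bit, which MOVMSKPD collects. */
    return _mm_movemask_pd(m);
}

/* COMISD-style compare of the low lanes only: returns 1 if a[0] < b[0]. */
int low_less_than(const double a[2], const double b[2])
{
    return _mm_comilt_sd(_mm_loadu_pd(a), _mm_loadu_pd(b));
}
```

Comparing DOUBLEFP1 (1.1, 3.3) against DOUBLEFP2 (20.66, 40.66) gives a mask of 3; against DOUBLEFPN (-5.1, +6.3) it gives 2, because only the high lane is smaller.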
SSE2 shuffle and unpack instructions
XMMSSE2_SHUFF:
MOV EAX,1               ;request CPU feature flags
CPUID                   ;0Fh, 0A2h CPUID instruction
TEST EDX,4000000h       ;test bit 26 (SSE2)
JNZ >L28                ;SSE2 available
CALL NOSSE2FPMESS       ;displays message if SSE2 not available
RET
L28:
;***** display XMM registers in SSE2 mode ..
MOVUPD XMM0,[DOUBLEFP1] ;move two double precision fp values into XMM0
MOVAPD XMM2,XMM0        ;copying to XMM2
MOVUPD XMM1,[DOUBLEFP2] ;move 2nd tester fp values into XMM1
MOVAPD XMM3,XMM1        ;copying to XMM3
SHUFPD XMM0,XMM1,3h     ;shuffle pack into destination
SHUFPD XMM0,XMM0,1h     ;swap the values in XMM0
MOVAPD XMM0,XMM2        ;restore original value to XMM0
UNPCKHPD XMM0,XMM1      ;unpack (high) and put into destination
MOVAPD XMM0,XMM2        ;restore original value to XMM0
UNPCKLPD XMM0,XMM0      ;unpack (low) and put into destination
RET
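What each shuffle and unpack produces is easier to see with concrete lanes. In the C sketch below, `_mm_shuffle_pd` corresponds to SHUFPD: bit 0 of the immediate selects which lane of the first operand becomes the low result, and bit 1 selects which lane of the second operand becomes the high result. `_mm_unpackhi_pd` and `_mm_unpacklo_pd` correspond to UNPCKHPD and UNPCKLPD. The function name is hypothetical, and unlike the listing, the swap here is applied to the original a rather than to the previous shuffle result.

```c
#include <emmintrin.h>

void shuffle_demo(const double a[2], const double b[2],
                  double shuf3[2], double swapped[2],
                  double unpack_hi[2], double unpack_lo[2])
{
    __m128d va = _mm_loadu_pd(a);
    __m128d vb = _mm_loadu_pd(b);

    _mm_storeu_pd(shuf3,     _mm_shuffle_pd(va, vb, 3));  /* (a[1], b[1]) */
    _mm_storeu_pd(swapped,   _mm_shuffle_pd(va, va, 1));  /* (a[1], a[0]) */
    _mm_storeu_pd(unpack_hi, _mm_unpackhi_pd(va, vb));    /* (a[1], b[1]) */
    _mm_storeu_pd(unpack_lo, _mm_unpacklo_pd(va, va));    /* (a[0], a[0]) */
}
```

With the two-operand unpack forms, a shuffle with immediate 3 and UNPCKHPD give the same result; using the same register for both operands of UNPCKLPD simply duplicates the low value.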
SSE2 conversion instructions
XMMSSE2_CONV:
MOV EAX,1                  ;request CPU feature flags
CPUID                      ;0Fh, 0A2h CPUID instruction
TEST EDX,4000000h          ;test bit 26 (SSE2)
JNZ >L30                   ;SSE2 available
CALL NOSSE2FPMESS          ;displays message if SSE2 not available
RET
L30:
;***** display XMM registers in both SSE and SSE2 modes ..
;***** conversion between single and double-precision fp values ..
CVTPS2PD XMM0,[SINGLEFP1]  ;put single-precision fp values into XMM0 as double-precision
CVTPD2PS XMM6,XMM0         ;convert double precision to single precision in XMM6
CVTSS2SD XMM1,[SINGLEFP1]  ;as CVTPS2PD but working with only one value
CVTSD2SS XMM7,XMM1         ;as CVTPD2PS but working with only one value
;***** conversion between integers and double-precision fp values ..
;***** open the MMX integer pane for these tests ..
CVTPD2PI MM0,XMM0          ;convert fp values in XMM0 to integers in MM0
CVTTPD2PI MM1,XMM0         ;same as above with truncation
CVTPI2PD XMM0,[DINTEGER]   ;convert 23 and 24 to double-precision fp values
;***** open the XMM integer display and switch to dword display
CVTPD2DQ XMM7,XMM0         ;and convert 23 and 24 to dword integers into XMM7 (low)
CVTTPD2DQ XMM7,XMM0        ;same as above with truncation
CVTDQ2PD XMM3,XMM7         ;and back into fp values in XMM3
CVTSD2SI EAX,XMM0          ;take low fp value and convert as integer in EAX
CVTTSD2SI EDX,XMM0         ;same as above with truncation
CVTSI2SD XMM4,EAX          ;and back again into XMM4 (low)
;***** conversion between single-precision and integers ..
;***** watch these in XMM integer display switched to dword display
CVTPS2DQ XMM0,[SINGLEFP1]  ;move 4 single-precision fp values to dwords as integers
CVTTPS2DQ XMM1,[SINGLEFP1] ;same as above with truncation
;***** and watch this in the SSE fp pane ..
CVTDQ2PS XMM6,XMM0         ;and convert back to 4 single-precision fp values
CVTDQ2PS XMM7,XMM1         ;ditto
RET
Reference: "XMM SSE2 floating point instructions", http://www.godevtool.com/TestbugHelp/XMMfpins2.htm
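The difference between the rounding conversions and their truncating "T" variants in the conversion listing can be illustrated in C. This sketch assumes the default MXCSR rounding mode (round to nearest, ties to even) and uses the intrinsics `_mm_cvtsd_si32` (CVTSD2SI), `_mm_cvttsd_si32` (CVTTSD2SI) and `_mm_cvtps_pd` (CVTPS2PD); the function names and sample values are made up for the example.

```c
#include <emmintrin.h>

/* CVTSD2SI: converts using the current MXCSR rounding mode. */
int to_int_rounded(double x)
{
    return _mm_cvtsd_si32(_mm_set_sd(x));
}

/* CVTTSD2SI: always truncates toward zero. */
int to_int_truncated(double x)
{
    return _mm_cvttsd_si32(_mm_set_sd(x));
}

/* CVTPS2PD: widens the two low single-precision values to double precision. */
void widen_pair(const float in[2], double out[2])
{
    /* _mm_set_ps takes its arguments from the highest lane down to the lowest. */
    __m128 v = _mm_set_ps(0.0f, 0.0f, in[1], in[0]);
    _mm_storeu_pd(out, _mm_cvtps_pd(v));
}
```

Under the default rounding mode, 2.7 converts to 3 with the rounding form and to 2 with the truncating form, while 2.5 converts to 2 in both cases because ties round to the even integer.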