
Why is 2*(i*i) faster than 2*i*i in Java?

Published: 2024/4/11 java 25 豆豆

Someone asked on Stack Overflow why, in Java, 2 * (i * i) is faster than 2 * i * i.

The asker ran the following test:

On average, the following Java code takes 0.50 to 0.55 seconds to run:

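    public class Main {
        public static void main(String[] args) {
            long startTime = System.nanoTime();
            int n = 0;
            for (int i = 0; i < 1000000000; i++) {
                n += 2 * (i * i);   // the expression being measured
            }
            // Elapsed time in seconds, then the result (printing n keeps the loop from being discarded)
            System.out.println((double) (System.nanoTime() - startTime) / 1000000000 + " s");
            System.out.println("n = " + n);
        }
    }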

If 2 * (i * i) is replaced with 2 * i * i, the run time rises to between 0.60 and 0.65 seconds. Why does this happen?

I ran each version of the program 15 times, alternating between the two. The results (in seconds) were:

     2*(i*i)  |  2*i*i
    ----------+----------
    0.5183738 | 0.6246434
    0.5298337 | 0.6049722
    0.5308647 | 0.6603363
    0.5133458 | 0.6243328
    0.5003011 | 0.6541802
    0.5366181 | 0.6312638
    0.515149  | 0.6241105
    0.5237389 | 0.627815
    0.5249942 | 0.6114252
    0.5641624 | 0.6781033
    0.538412  | 0.6393969
    0.5466744 | 0.6608845
    0.531159  | 0.6201077
    0.5048032 | 0.6511559
    0.5232789 | 0.6544526
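The alternating measurement behind this table can be approximated with a single-JVM harness along the lines below. This is a sketch, not the asker's program: the class name AlternatingBench, its method names, and the command-line iteration count are hypothetical, and both variants run inside one JVM rather than as separate program runs. For trustworthy numbers, a dedicated harness such as JMH, which handles JIT warm-up and dead-code elimination, is the better tool.

```java
public class AlternatingBench {
    // Variant A: i * i is computed first, then doubled.
    static int sumGrouped(int iterations) {
        int n = 0;
        for (int i = 0; i < iterations; i++) {
            n += 2 * (i * i);
        }
        return n;
    }

    // Variant B: evaluated left to right as (2 * i) * i.
    static int sumLeftToRight(int iterations) {
        int n = 0;
        for (int i = 0; i < iterations; i++) {
            n += 2 * i * i;
        }
        return n;
    }

    public static void main(String[] args) {
        int iterations = args.length > 0 ? Integer.parseInt(args[0]) : 1_000_000_000;
        int rounds = 15;
        System.out.println(" 2*(i*i)  |  2*i*i");
        for (int r = 0; r < rounds; r++) {
            // Alternate the variants so slow drift (thermal state, background load) affects both.
            long t0 = System.nanoTime();
            int a = sumGrouped(iterations);
            long t1 = System.nanoTime();
            int b = sumLeftToRight(iterations);
            long t2 = System.nanoTime();
            // Consuming the results keeps the JIT from eliminating the loops as dead code.
            if (a != b) {
                throw new IllegalStateException("variants disagree: " + a + " vs " + b);
            }
            System.out.printf("%.7f | %.7f%n", (t1 - t0) / 1e9, (t2 - t1) / 1e9);
        }
    }
}
```

Run it as `java AlternatingBench` (or pass a smaller iteration count as the first argument); the first round or two typically include JIT compilation time.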

The fastest 2 * i * i run took longer than the slowest 2 * (i * i) run. If the two versions were equally efficient, the probability of this happening would be less than 1/2^15 * 100% = 0.00305%.
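The 1/2^15 figure treats each of the 15 alternating pairs as an independent coin flip: for the fastest 2 * i * i run to be slower than the slowest 2 * (i * i) run, every pair must go the same way, so the event can be no more likely than 15 heads in a row. A quick arithmetic check of that bound, together with the exact probability of "all 15 slower than all 15" when all 30 timings are drawn independently from one continuous distribution (an assumption of this sketch; the class name SeparationOdds is hypothetical), which is 1/C(30, 15) and therefore smaller still:

```java
import java.math.BigInteger;

public class SeparationOdds {
    // Bound from the question: each alternating pair is a fair coin flip.
    static double pairwiseBoundPercent(int pairs) {
        return 100.0 / Math.pow(2, pairs);
    }

    // Binomial coefficient C(n, k), built incrementally so every intermediate value is exact.
    static BigInteger binomial(int n, int k) {
        BigInteger result = BigInteger.ONE;
        for (int i = 1; i <= k; i++) {
            result = result.multiply(BigInteger.valueOf(n - k + i)).divide(BigInteger.valueOf(i));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.printf("1/2^15 * 100%% = %.5f%%%n", pairwiseBoundPercent(15));
        // With 2k i.i.d. timings, every assignment of ranks to the two groups is equally likely,
        // and exactly one of the C(2k, k) assignments puts group B entirely above group A.
        BigInteger ways = binomial(30, 15);
        System.out.println("C(30,15) = " + ways + ", probability = " + (1.0 / ways.doubleValue()));
    }
}
```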

Answer by rustyx (1172 upvotes):

There is a slight difference in the ordering of the bytecode.

2 * (i * i):

    iconst_2
    iload0
    iload0
    imul
    imul
    iadd

Compared with 2 * i * i:

    iconst_2
    iload0
    imul
    iload0
    imul
    iadd

At first glance there is no real difference; if anything, the second version looks slightly better, since it needs one fewer stack slot.
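"One fewer slot" refers to the maximum depth of the JVM operand stack. A tiny simulator makes this concrete. It assumes n has already been pushed before the sequences shown (the listings omit that load, but iadd needs n as its second operand), understands only the opcodes that appear above, and uses hypothetical class and method names; List.of needs Java 9 or later.

```java
import java.util.List;

public class StackDepth {
    // Net effect of each instruction on the operand stack (each int occupies one slot).
    static int delta(String insn) {
        switch (insn) {
            case "iconst_2":
            case "iload0":
            case "iload_0":
                return +1;   // push one int
            case "imul":
            case "iadd":
                return -1;   // pop two ints, push one result
            default:
                throw new IllegalArgumentException("unknown instruction: " + insn);
        }
    }

    // Walks the sequence and returns the deepest the stack gets, starting from initialDepth.
    static int maxDepth(List<String> code, int initialDepth) {
        int depth = initialDepth;
        int max = initialDepth;
        for (String insn : code) {
            depth += delta(insn);
            if (depth < 0) {
                throw new IllegalStateException("stack underflow at " + insn);
            }
            max = Math.max(max, depth);
        }
        return max;
    }

    public static void main(String[] args) {
        List<String> grouped = List.of("iconst_2", "iload0", "iload0", "imul", "imul", "iadd");
        List<String> leftToRight = List.of("iconst_2", "iload0", "imul", "iload0", "imul", "iadd");
        // Start at depth 1: n is assumed to be on the stack already.
        System.out.println("2*(i*i): max depth " + maxDepth(grouped, 1)); // 4
        System.out.println("2*i*i:   max depth " + maxDepth(leftToRight, 1)); // 3
    }
}
```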

So we need to dig deeper, into the JIT-compiled code.

Keep in mind that the JIT aggressively unrolls small loops. For 2 * (i * i), the loop is in fact unrolled 16x:
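To picture what "unrolled 16x" means at the source level, here is a hand-unrolled Java equivalent. It only illustrates the transformation and is not what the JIT literally emits (C2 works on its own intermediate representation and, as the listing shows, also rearranges the index arithmetic); the class and method names are hypothetical. Because int addition wraps modulo 2^32, regrouping the sums cannot change the result.

```java
public class UnrollSketch {
    // One loop body: the value added to n for index j.
    static int term(int j) {
        return 2 * (j * j);
    }

    // The loop as written in the question.
    static int simple(int limit) {
        int n = 0;
        for (int i = 0; i < limit; i++) {
            n += term(i);
        }
        return n;
    }

    // The same loop unrolled 16 times, plus a remainder loop. Assumes limit >= 0.
    // With limit = 1_000_000_000 the main-loop bound is i < 999999985,
    // the same constant as "cmpl R13, #999999985" in the listing.
    static int unrolled16(int limit) {
        int n = 0;
        int i = 0;
        for (; i < limit - 15; i += 16) {
            n += term(i)      + term(i + 1)  + term(i + 2)  + term(i + 3)
               + term(i + 4)  + term(i + 5)  + term(i + 6)  + term(i + 7)
               + term(i + 8)  + term(i + 9)  + term(i + 10) + term(i + 11)
               + term(i + 12) + term(i + 13) + term(i + 14) + term(i + 15);
        }
        for (; i < limit; i++) {   // the remaining limit % 16 iterations
            n += term(i);
        }
        return n;
    }

    public static void main(String[] args) {
        int limit = 1_000_003;
        System.out.println(simple(limit) == unrolled16(limit)); // true
    }
}
```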

    030   B2: # B2 B3 <- B1 B2  Loop: B2-B2 inner main of N18 Freq: 1e+006
    030     addl    R11, RBP    # int
    033     movl    RBP, R13    # spill
    036     addl    RBP, #14    # int
    039     imull   RBP, RBP    # int
    03c     movl    R9, R13 # spill
    03f     addl    R9, #13 # int
    043     imull   R9, R9  # int
    047     sall    RBP, #1
    049     sall    R9, #1
    04c     movl    R8, R13 # spill
    04f     addl    R8, #15 # int
    053     movl    R10, R8 # spill
    056     movdl   XMM1, R8    # spill
    05b     imull   R10, R8 # int
    05f     movl    R8, R13 # spill
    062     addl    R8, #12 # int
    066     imull   R8, R8  # int
    06a     sall    R10, #1
    06d     movl    [rsp + #32], R10    # spill
    072     sall    R8, #1
    075     movl    RBX, R13    # spill
    078     addl    RBX, #11    # int
    07b     imull   RBX, RBX    # int
    07e     movl    RCX, R13    # spill
    081     addl    RCX, #10    # int
    084     imull   RCX, RCX    # int
    087     sall    RBX, #1
    089     sall    RCX, #1
    08b     movl    RDX, R13    # spill
    08e     addl    RDX, #8 # int
    091     imull   RDX, RDX    # int
    094     movl    RDI, R13    # spill
    097     addl    RDI, #7 # int
    09a     imull   RDI, RDI    # int
    09d     sall    RDX, #1
    09f     sall    RDI, #1
    0a1     movl    RAX, R13    # spill
    0a4     addl    RAX, #6 # int
    0a7     imull   RAX, RAX    # int
    0aa     movl    RSI, R13    # spill
    0ad     addl    RSI, #4 # int
    0b0     imull   RSI, RSI    # int
    0b3     sall    RAX, #1
    0b5     sall    RSI, #1
    0b7     movl    R10, R13    # spill
    0ba     addl    R10, #2 # int
    0be     imull   R10, R10    # int
    0c2     movl    R14, R13    # spill
    0c5     incl    R14 # int
    0c8     imull   R14, R14    # int
    0cc     sall    R10, #1
    0cf     sall    R14, #1
    0d2     addl    R14, R11    # int
    0d5     addl    R14, R10    # int
    0d8     movl    R10, R13    # spill
    0db     addl    R10, #3 # int
    0df     imull   R10, R10    # int
    0e3     movl    R11, R13    # spill
    0e6     addl    R11, #5 # int
    0ea     imull   R11, R11    # int
    0ee     sall    R10, #1
    0f1     addl    R10, R14    # int
    0f4     addl    R10, RSI    # int
    0f7     sall    R11, #1
    0fa     addl    R11, R10    # int
    0fd     addl    R11, RAX    # int
    100     addl    R11, RDI    # int
    103     addl    R11, RDX    # int
    106     movl    R10, R13    # spill
    109     addl    R10, #9 # int
    10d     imull   R10, R10    # int
    111     sall    R10, #1
    114     addl    R10, R11    # int
    117     addl    R10, RCX    # int
    11a     addl    R10, RBX    # int
    11d     addl    R10, R8 # int
    120     addl    R9, R10 # int
    123     addl    RBP, R9 # int
    126     addl    RBP, [RSP + #32 (32-bit)]   # int
    12a     addl    R13, #16    # int
    12e     movl    R11, R13    # spill
    131     imull   R11, R13    # int
    135     sall    R11, #1
    138     cmpl    R13, #999999985
    13f     jl     B2   # loop end  P=1.000000 C=6554623.000000

As the listing shows, one register is spilled to the stack.

For the 2 * i * i version:

    05a   B3: # B2 B4 <- B1 B2  Loop: B3-B2 inner main of N18 Freq: 1e+006
    05a     addl    RBX, R11    # int
    05d     movl    [rsp + #32], RBX    # spill
    061     movl    R11, R8 # spill
    064     addl    R11, #15    # int
    068     movl    [rsp + #36], R11    # spill
    06d     movl    R11, R8 # spill
    070     addl    R11, #14    # int
    074     movl    R10, R9 # spill
    077     addl    R10, #16    # int
    07b     movdl   XMM2, R10   # spill
    080     movl    RCX, R9 # spill
    083     addl    RCX, #14    # int
    086     movdl   XMM1, RCX   # spill
    08a     movl    R10, R9 # spill
    08d     addl    R10, #12    # int
    091     movdl   XMM4, R10   # spill
    096     movl    RCX, R9 # spill
    099     addl    RCX, #10    # int
    09c     movdl   XMM6, RCX   # spill
    0a0     movl    RBX, R9 # spill
    0a3     addl    RBX, #8 # int
    0a6     movl    RCX, R9 # spill
    0a9     addl    RCX, #6 # int
    0ac     movl    RDX, R9 # spill
    0af     addl    RDX, #4 # int
    0b2     addl    R9, #2  # int
    0b6     movl    R10, R14    # spill
    0b9     addl    R10, #22    # int
    0bd     movdl   XMM3, R10   # spill
    0c2     movl    RDI, R14    # spill
    0c5     addl    RDI, #20    # int
    0c8     movl    RAX, R14    # spill
    0cb     addl    RAX, #32    # int
    0ce     movl    RSI, R14    # spill
    0d1     addl    RSI, #18    # int
    0d4     movl    R13, R14    # spill
    0d7     addl    R13, #24    # int
    0db     movl    R10, R14    # spill
    0de     addl    R10, #26    # int
    0e2     movl    [rsp + #40], R10    # spill
    0e7     movl    RBP, R14    # spill
    0ea     addl    RBP, #28    # int
    0ed     imull   RBP, R11    # int
    0f1     addl    R14, #30    # int
    0f5     imull   R14, [RSP + #36 (32-bit)]   # int
    0fb     movl    R10, R8 # spill
    0fe     addl    R10, #11    # int
    102     movdl   R11, XMM3   # spill
    107     imull   R11, R10    # int
    10b     movl    [rsp + #44], R11    # spill
    110     movl    R10, R8 # spill
    113     addl    R10, #10    # int
    117     imull   RDI, R10    # int
    11b     movl    R11, R8 # spill
    11e     addl    R11, #8 # int
    122     movdl   R10, XMM2   # spill
    127     imull   R10, R11    # int
    12b     movl    [rsp + #48], R10    # spill
    130     movl    R10, R8 # spill
    133     addl    R10, #7 # int
    137     movdl   R11, XMM1   # spill
    13c     imull   R11, R10    # int
    140     movl    [rsp + #52], R11    # spill
    145     movl    R11, R8 # spill
    148     addl    R11, #6 # int
    14c     movdl   R10, XMM4   # spill
    151     imull   R10, R11    # int
    155     movl    [rsp + #56], R10    # spill
    15a     movl    R10, R8 # spill
    15d     addl    R10, #5 # int
    161     movdl   R11, XMM6   # spill
    166     imull   R11, R10    # int
    16a     movl    [rsp + #60], R11    # spill
    16f     movl    R11, R8 # spill
    172     addl    R11, #4 # int
    176     imull   RBX, R11    # int
    17a     movl    R11, R8 # spill
    17d     addl    R11, #3 # int
    181     imull   RCX, R11    # int
    185     movl    R10, R8 # spill
    188     addl    R10, #2 # int
    18c     imull   RDX, R10    # int
    190     movl    R11, R8 # spill
    193     incl    R11 # int
    196     imull   R9, R11 # int
    19a     addl    R9, [RSP + #32 (32-bit)]    # int
    19f     addl    R9, RDX # int
    1a2     addl    R9, RCX # int
    1a5     addl    R9, RBX # int
    1a8     addl    R9, [RSP + #60 (32-bit)]    # int
    1ad     addl    R9, [RSP + #56 (32-bit)]    # int
    1b2     addl    R9, [RSP + #52 (32-bit)]    # int
    1b7     addl    R9, [RSP + #48 (32-bit)]    # int
    1bc     movl    R10, R8 # spill
    1bf     addl    R10, #9 # int
    1c3     imull   R10, RSI    # int
    1c7     addl    R10, R9 # int
    1ca     addl    R10, RDI    # int
    1cd     addl    R10, [RSP + #44 (32-bit)]   # int
    1d2     movl    R11, R8 # spill
    1d5     addl    R11, #12    # int
    1d9     imull   R13, R11    # int
    1dd     addl    R13, R10    # int
    1e0     movl    R10, R8 # spill
    1e3     addl    R10, #13    # int
    1e7     imull   R10, [RSP + #40 (32-bit)]   # int
    1ed     addl    R10, R13    # int
    1f0     addl    RBP, R10    # int
    1f3     addl    R14, RBP    # int
    1f6     movl    R10, R8 # spill
    1f9     addl    R10, #16    # int
    1fd     cmpl    R10, #999999985
    204     jl     B2   # loop end  P=1.000000 C=7419903.000000

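Here there are many more spills and stack accesses ([RSP + …]), needed to hold intermediate results.

So the answer to the question is simple: 2 * (i * i) is faster than 2 * i * i because the JIT generates better-optimized assembly for the former.

Clearly, though, neither version is particularly good. Since every x86-64 CPU supports at least SSE2, the loop could benefit from vectorization.

So this is an optimizer problem: as is often the case, it unrolls too aggressively, which backfires, while missing out on other optimization opportunities.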

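In fact, modern x86-64 CPUs break instructions down further into micro-operations (µops), and with features such as register renaming, µop caches, and loop buffers, getting optimal loop performance takes far more finesse than simple unrolling. According to Agner Fog's optimization guide: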

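The gain from the µop cache can be considerable if the average instruction length is more than 4 bytes. The following methods of optimizing the use of the µop cache may be considered:

  • Make sure that critical loops are small enough to fit into the µop cache.

  • Align the most critical loop entries and function entries by 32.

  • Avoid unnecessary loop unrolling.

  • Avoid instructions that have extra load time ..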

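As for those load times: even the fastest L1D hit costs 4 cycles, plus an extra register and µop, so even a few memory accesses will hurt performance in a tight loop.

Back to the vectorization opportunity: to see how fast the loop could be, we can compile a similar C program with GCC, which vectorizes it outright (the AVX2 output is shown; SSE2 is similar):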

      vmovdqa ymm0, YMMWORD PTR .LC0[rip]
      vmovdqa ymm3, YMMWORD PTR .LC1[rip]
      xor eax, eax
      vpxor xmm2, xmm2, xmm2
    .L2:
      vpmulld ymm1, ymm0, ymm0
      inc eax
      vpaddd ymm0, ymm0, ymm3
      vpslld ymm1, ymm1, 1
      vpaddd ymm2, ymm2, ymm1
      cmp eax, 125000000      ; 8 calculations per iteration
      jne .L2
      vmovdqa xmm0, xmm2
      vextracti128 xmm2, ymm2, 1
      vpaddd xmm2, xmm0, xmm2
      vpsrldq xmm0, xmm2, 8
      vpaddd xmm0, xmm2, xmm0
      vpsrldq xmm1, xmm0, 4
      vpaddd xmm0, xmm0, xmm1
      vmovd eax, xmm0
      vzeroupper
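The data flow of that loop is easier to follow in scalar form. The Java sketch below mirrors it lane by lane: eight partial sums, each updated once per iteration, followed by a single horizontal reduction. It is plain scalar Java written only to explain the structure; it is not itself vectorized code, and it says nothing about whether HotSpot would auto-vectorize it. The listing does not show the contents of .LC0 and .LC1, so treating them as {0, 1, ..., 7} and eight copies of 8 is an assumption, as are the class and method names.

```java
public class LaneSketch {
    static final int LANES = 8; // a 256-bit ymm register holds eight 32-bit ints

    // Scalar mirror of the vectorized loop: eight lane-wise partial sums, then one reduction.
    static int sumByLanes(int iterations) {
        int[] idx = new int[LANES];   // role of ymm0: {i, i+1, ..., i+7}
        int[] acc = new int[LANES];   // role of ymm2: per-lane running sums
        for (int l = 0; l < LANES; l++) {
            idx[l] = l;
        }
        int steps = iterations / LANES;   // 125000000 when iterations is one billion
        for (int s = 0; s < steps; s++) {
            for (int l = 0; l < LANES; l++) {
                int sq = idx[l] * idx[l];   // vpmulld ymm1, ymm0, ymm0
                acc[l] += sq << 1;          // vpslld ymm1, ymm1, 1 then vpaddd ymm2, ymm2, ymm1
                idx[l] += LANES;            // vpaddd ymm0, ymm0, ymm3 (ymm3 assumed to hold eight 8s)
            }
        }
        int total = 0;
        for (int v : acc) {                 // horizontal add (vextracti128 / vpsrldq / vpaddd)
            total += v;
        }
        // Tail for counts that are not a multiple of 8 (not needed for one billion).
        for (int i = steps * LANES; i < iterations; i++) {
            total += 2 * (i * i);
        }
        return total;
    }

    public static void main(String[] args) {
        int n = 0;
        for (int i = 0; i < 1_000_000; i++) {
            n += 2 * (i * i);
        }
        System.out.println(n == sumByLanes(1_000_000)); // true
    }
}
```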

Run times:

  • SSE: 0.24 s, about 2x as fast.

  • AVX: 0.15 s, about 3x as fast.

  • AVX2: 0.08 s, about 5x as fast.

To print the JIT-generated assembly, use a debug build of the JVM and run it with -XX:+PrintOptoAssembly.

The C program was compiled with the -fwrapv flag, which lets GCC treat signed integer overflow as two's-complement wraparound, matching the semantics of Java's int arithmetic.
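Java never needs such a flag: int multiplication is defined to keep the low-order 32 bits of the product (JLS §15.17.1). Under that wraparound, multiplication is associative, so 2 * (i * i) and 2 * i * i yield the same int for every i, even when i * i overflows; the speed gap in the question therefore cannot stem from the two forms computing different values. A quick check, with hypothetical class and method names:

```java
public class WrapDemo {
    static int grouped(int i)     { return 2 * (i * i); }
    static int leftToRight(int i) { return 2 * i * i; }

    // Returns true if both forms agree for every i in [from, to).
    static boolean agreeOnRange(int from, int to) {
        for (int i = from; i < to; i++) {
            if (grouped(i) != leftToRight(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int i = 50_000;                          // i * i = 2,500,000,000 does not fit in an int
        System.out.println(grouped(i));          // 705032704
        System.out.println(leftToRight(i));      // 705032704
        System.out.println((int) (2L * i * i));  // 705032704: exact 64-bit product truncated to 32 bits
        System.out.println(agreeOnRange(-1_000_000, 1_000_000)); // true
    }
}
```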

Translation: 唐尤華. Original: stackoverflow.com/questions/53452713/why-is-2-i-i-faster-than-2-i-i-in-java

