
linux perf - performance testing and optimization tool

Published 2025/5/22 by 生活随笔. This article introduces linux perf, a performance testing and optimization tool.

Introduction to Perf

Perf is the system performance profiling tool that ships with the Linux kernel. Although it is only at version 0.0.2, perf has already shown enough power to rival OProfile, currently the most popular profiler on Linux.

Perf's advantage is its tight integration with the Linux kernel: it is the first to support new kernel features, whereas tools such as OProfile and GProf usually lag a step behind. Its basic principle is similar to OProfile's: it gets/sets performance counters in the CPU's PMU registers to collect metrics such as instructions executed, cache misses suffered, and branches mispredicted. The Linux kernel abstracts these registers, so you can view sample information per process, per CPU, or per counter group.


Using Perf

Perf's workflow is very similar to OProfile's, so if you already know OProfile, perf is easy to pick up. Below is a simple translation of the perf examples from [1]; more will be added as I find them.

$ perf record -f -- git gc
Counting objects: 1283571, done.
Compressing objects: 100% (206724/206724), done.
Writing objects: 100% (1283571/1283571), done.
Total 1283571 (delta 1070675), reused 1281443 (delta 1068566)
[ perf record: Captured and wrote 31.054 MB perf.data (~1356768 samples) ]

$ perf report --sort comm,dso,symbol | head -10
# Samples: 1355726
#
# Overhead    Command                         Shared Object  Symbol
# ........  .........  ...................................   ......
#
    31.53%        git  /usr/bin/git                          [.] 0x0000000009804f
    13.41%  git-prune  /usr/bin/git-prune                    [.] 0x000000000ad06d
    10.05%        git  /lib/tls/i686/cmov/libc-2.8.90.so     [.] _nl_make_l10nflist
     5.36%  git-prune  /usr/lib/libz.so.1.2.3.3              [.] 0x00000000009d51
     4.48%        git  /lib/tls/i686/cmov/libc-2.8.90.so     [.] memcpy

perf record corresponds to opcontrol --start, and perf report corresponds to opreport.

Perf examples

To see all available counters, use 'perf list':

titan:~> perf list
[...]
  kmem:kmalloc                    [Tracepoint event]
  kmem:kmem_cache_alloc           [Tracepoint event]
  kmem:kmalloc_node               [Tracepoint event]
  kmem:kmem_cache_alloc_node      [Tracepoint event]
  kmem:kfree                      [Tracepoint event]
  kmem:kmem_cache_free            [Tracepoint event]
  kmem:mm_page_free_direct        [Tracepoint event]
  kmem:mm_pagevec_free            [Tracepoint event]
  kmem:mm_page_alloc              [Tracepoint event]
  kmem:mm_page_alloc_zone_locked  [Tracepoint event]
  kmem:mm_page_pcpu_drain         [Tracepoint event]
  kmem:mm_page_alloc_extfrag      [Tracepoint event]

You can run your test program against any combination of these counters. For example, the following counts page allocations/frees while running hackbench:

titan:~> perf stat -e kmem:mm_page_pcpu_drain -e kmem:mm_page_alloc -e kmem:mm_pagevec_free -e kmem:mm_page_free_direct ./hackbench 10
Time: 0.575

 Performance counter stats for './hackbench 10':

          13857  kmem:mm_page_pcpu_drain
          27576  kmem:mm_page_alloc
           6025  kmem:mm_pagevec_free
          20934  kmem:mm_page_free_direct

    0.613972165  seconds time elapsed

Perf can also report how the counts fluctuate across N runs:

titan:~> perf stat --repeat 5 -e kmem:mm_page_pcpu_drain -e kmem:mm_page_alloc -e kmem:mm_pagevec_free -e kmem:mm_page_free_direct ./hackbench 10
Time: 0.627
Time: 0.644
Time: 0.564
Time: 0.559
Time: 0.626

 Performance counter stats for './hackbench 10' (5 runs):

          12920  kmem:mm_page_pcpu_drain    ( +-   3.359% )
          25035  kmem:mm_page_alloc         ( +-   3.783% )
           6104  kmem:mm_pagevec_free       ( +-   0.934% )
          18376  kmem:mm_page_free_direct   ( +-   4.941% )

    0.643954516  seconds time elapsed   ( +-   2.363% )

With these statistics in hand, you can sample a tracepoint you care about (for example, page allocations):

titan:~/git> perf record -f -e kmem:mm_page_alloc -c 1 ./git gc
Counting objects: 1148, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (450/450), done.
Writing objects: 100% (1148/1148), done.
Total 1148 (delta 690), reused 1148 (delta 690)
[ perf record: Captured and wrote 0.267 MB perf.data (~11679 samples) ]

Then see which commands caused the page allocations:

titan:~/git> perf report
# Samples: 10646
#
# Overhead          Command             Shared Object
# ........  ...............  ........................
#
    23.57%       git-repack  /lib64/libc-2.5.so
    21.81%              git  /lib64/libc-2.5.so
    14.59%              git  ./git
    11.79%       git-repack  ./git
     7.12%              git  /lib64/ld-2.5.so
     3.16%       git-repack  /lib64/libpthread-2.5.so
     2.09%       git-repack  /bin/bash
     1.97%               rm  /lib64/libc-2.5.so
     1.39%               mv  /lib64/ld-2.5.so
     1.37%               mv  /lib64/libc-2.5.so
     1.12%       git-repack  /lib64/ld-2.5.so
     0.95%               rm  /lib64/ld-2.5.so
     0.90%  git-update-serv  /lib64/libc-2.5.so
     0.73%  git-update-serv  /lib64/ld-2.5.so
     0.68%             perf  /lib64/libpthread-2.5.so
     0.64%       git-repack  /usr/lib64/libz.so.1.2.3
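Flat `perf report` output like the above is plain text and easy to post-process. As a small illustration (not part of perf itself; `parse_report_line` is a hypothetical helper), a Python sketch that splits the overhead/command/DSO columns:

```python
def parse_report_line(line):
    """Parse one data line of flat `perf report` output into
    (overhead_percent, command, shared_object)."""
    overhead, command, dso = line.split(None, 2)
    return float(overhead.rstrip('%')), command, dso.strip()

# Example line taken from the report above:
print(parse_report_line("23.57%  git-repack  /lib64/libc-2.5.so"))
# → (23.57, 'git-repack', '/lib64/libc-2.5.so')
```

Sorting or aggregating such tuples (e.g. summing overhead per DSO) is then a one-liner.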

Drilling down further:

titan:~/git> perf report --sort comm,dso,symbol

# Samples: 10646
#
# Overhead     Command       Shared Object  Symbol
# ........  ..........  ..................  ......
#
     9.35%  git-repack  ./git               [.] insert_obj_hash
     9.12%         git  ./git               [.] insert_obj_hash
     7.31%         git  /lib64/libc-2.5.so  [.] memcpy
     6.34%  git-repack  /lib64/libc-2.5.so  [.] _int_malloc
     6.24%  git-repack  /lib64/libc-2.5.so  [.] memcpy
     5.82%  git-repack  /lib64/libc-2.5.so  [.] __GI___fork
     5.47%         git  /lib64/libc-2.5.so  [.] _int_malloc
     2.99%         git  /lib64/libc-2.5.so  [.] memset

The call graph can also be recorded, showing the percentage each call path accounts for:

titan:~/git> perf record -f -g -e kmem:mm_page_alloc -c 1 ./git gc
Counting objects: 1148, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (450/450), done.
Writing objects: 100% (1148/1148), done.
Total 1148 (delta 690), reused 1148 (delta 690)
[ perf record: Captured and wrote 0.963 MB perf.data (~42069 samples) ]

titan:~/git> perf report -g
# Samples: 10686
#
# Overhead     Command       Shared Object
# ........  ..........  ..................
#
    23.25%  git-repack  /lib64/libc-2.5.so
            |
            |--50.00%-- _int_free
            |
            |--37.50%-- __GI___fork
            |           make_child
            |
            |--12.50%-- ptmalloc_unlock_all2
            |           make_child
            |
             --6.25%-- __GI_strcpy

    21.61%         git  /lib64/libc-2.5.so
            |
            |--30.00%-- __GI_read
            |      |
            |       --83.33%-- git_config_from_file
            |                  git_config
            |
    [...]

The following command counts page allocations across the whole system for 10 seconds:

titan:~/git> perf stat -a -e kmem:mm_page_pcpu_drain -e kmem:mm_page_alloc -e kmem:mm_pagevec_free -e kmem:mm_page_free_direct sleep 10

 Performance counter stats for 'sleep 10':

         171585  kmem:mm_page_pcpu_drain
         322114  kmem:mm_page_alloc
          73623  kmem:mm_pagevec_free
         254115  kmem:mm_page_free_direct

   10.000591410  seconds time elapsed

To see how system-wide page allocation fluctuates from second to second:

titan:~/git> perf stat --repeat 10 -a -e kmem:mm_page_pcpu_drain -e kmem:mm_page_alloc -e kmem:mm_pagevec_free -e kmem:mm_page_free_direct sleep 1

 Performance counter stats for 'sleep 1' (10 runs):

          17254  kmem:mm_page_pcpu_drain    ( +-   3.709% )
          34394  kmem:mm_page_alloc         ( +-   4.617% )
           7509  kmem:mm_pagevec_free       ( +-   4.820% )
          25653  kmem:mm_page_free_direct   ( +-   3.672% )

    1.058135029  seconds time elapsed   ( +-   3.089% )
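The "( +- N% )" figures above are perf's noise estimate over the repeated runs. As a rough illustration of the idea (plain relative standard deviation here; perf's exact estimator may differ, and the sample values below are hypothetical per-run counts):

```python
from statistics import mean, stdev

def noise(samples):
    """Return (mean, relative standard deviation in %) for a list of
    per-run counter values, similar in spirit to perf's '+- N%' column."""
    m = mean(samples)
    return m, 100.0 * stdev(samples) / m

# Hypothetical per-run kmem:mm_page_pcpu_drain counts:
m, pct = noise([13857, 12513, 12390, 13211, 12629])
print(round(m), round(pct, 1))  # → 12920 4.7
```

A low percentage means the workload is stable enough for before/after comparisons.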

Disassembly often reveals which line of code generates the instructions causing the problem.

titan:~/git> perf annotate __GI___fork
------------------------------------------------
 Percent |      Source code & Disassembly of libc-2.5.so
------------------------------------------------
         :      Disassembly of section .plt:
         :      Disassembly of section .text:
         :
         :      00000031a2e95560 <__fork>:
[...]
    0.00 :        31a2e95602:  b8 38 00 00 00        mov    $0x38,%eax
    0.00 :        31a2e95607:  0f 05                 syscall
   83.42 :        31a2e95609:  48 3d 00 f0 ff ff     cmp    $0xfffffffffffff000,%rax
    0.00 :        31a2e9560f:  0f 87 4d 01 00 00     ja     31a2e95762 <__fork+0x202>
    0.00 :        31a2e95615:  85 c0                 test   %eax,%eax

The output above shows that 83.42% of __GI___fork's time comes from the 0x38 system call.

Is a particular function worth optimizing?

You may want to know whether a specific function in your program is worth optimizing. A good example is the discussion on the git mailing list about optimizing the SHA1 hash; perf can help predict the payoff of such an optimization in advance. See Linus's reply for details [2].

"perf report --sort comm,dso,symbol" profiling shows the following for 'git fsck --full' on the kernel repo, using the Mozilla SHA1: 47.69% git /home/torvalds/git/git [.] moz_SHA1_Update 22.98% git /lib64/libz.so.1.2.3 [.] inflate_fast 7.32% git /lib64/libc-2.10.1.so [.] __GI_memcpy 4.66% git /lib64/libz.so.1.2.3 [.] inflate 3.76% git /lib64/libz.so.1.2.3 [.] adler32 2.86% git /lib64/libz.so.1.2.3 [.] inflate_table 2.41% git /home/torvalds/git/git [.] lookup_object 1.31% git /lib64/libc-2.10.1.so [.] _int_malloc 0.84% git /home/torvalds/git/git [.] patch_delta 0.78% git [kernel] [k] hpet_next_event

Clearly, the performance of the SHA1 hash is critical here.
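Given the 47.69% of runtime spent in moz_SHA1_Update above, Amdahl's law gives a quick prediction of the overall payoff. A small sketch (the local speedup factors are hypothetical, and `overall_speedup` is an illustrative helper):

```python
def overall_speedup(fraction, local_speedup):
    """Amdahl's law: total speedup when `fraction` of the runtime
    is accelerated by a factor of `local_speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)

sha1 = 0.4769  # share of 'git fsck --full' spent in moz_SHA1_Update

print(round(overall_speedup(sha1, 2), 2))    # SHA1 made 2x faster  → 1.31
print(round(overall_speedup(sha1, 1e9), 2))  # SHA1 made (nearly) free → 1.91
```

So even an infinitely fast SHA1 would make this workload at most about 1.9× faster, which bounds how much effort the optimization deserves.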

How to measure latency

If you enabled the following options when building the kernel:

CONFIG_PERF_COUNTER=y
CONFIG_EVENT_TRACING=y

then you can use several new performance counters (available in the -tip kernel tree) to measure scheduler latencies.

perf stat -e sched:sched_stat_wait -e task-clock ./hackbench 20

The command above measures how much time was spent waiting for a CPU. You can repeat it 10 times:

aldebaran:/home/mingo> perf stat --repeat 10 -e \
  sched:sched_stat_wait:r -e task-clock ./hackbench 20
Time: 0.251
Time: 0.214
Time: 0.254
Time: 0.278
Time: 0.245
Time: 0.308
Time: 0.242
Time: 0.222
Time: 0.268
Time: 0.244

 Performance counter stats for './hackbench 20' (10 runs):

          59826  sched:sched_stat_wait  #      0.026 M/sec   ( +-   5.540% )
    2280.099643  task-clock-msecs       #      7.525 CPUs    ( +-   1.620% )

    0.303013390  seconds time elapsed   ( +-   3.189% )

To list the scheduling event counters:

# perf list 2>&1 | grep sched:
  sched:sched_kthread_stop      [Tracepoint event]
  sched:sched_kthread_stop_ret  [Tracepoint event]
  sched:sched_wait_task         [Tracepoint event]
  sched:sched_wakeup            [Tracepoint event]
  sched:sched_wakeup_new        [Tracepoint event]
  sched:sched_switch            [Tracepoint event]
  sched:sched_migrate_task      [Tracepoint event]
  sched:sched_process_free      [Tracepoint event]
  sched:sched_process_exit      [Tracepoint event]
  sched:sched_process_wait      [Tracepoint event]
  sched:sched_process_fork      [Tracepoint event]
  sched:sched_signal_send       [Tracepoint event]
  sched:sched_stat_wait         [Tracepoint event]
  sched:sched_stat_sleep        [Tracepoint event]
  sched:sched_stat_iowait       [Tracepoint event]

For latency analysis, the stat_wait/sleep/iowait events are the ones worth watching. If you want to see all the delays and their min/max/avg values, run:

perf record -e sched:sched_stat_wait:r -f -R -c 1 ./hackbench 20
perf trace

The following is quoted from "Perf stats for doing nothing": http://blog.csdn.net/bluebeach/article/details/5912062

Perf stats for "doing nothing"

I've recently discovered the perf Linux tool. I heard that oprofile was deprecated and that there is a new tool, and I noted down to try it sometime.

Updated: more languages, fixed typos, more details, some graphs. Apologies if this shows twice in your feed.

The problem with perf stats is that I hate bloat, or even perceived bloat. Even when it doesn't affect me in any way, the concept of wasted cycles makes me really sad.

You probably can guess where this is going… I said, well, let's see what perf says about a simple "null" program. Surely doing nothing should be just a small number of instructions, right?

Note: I think that perf also records kernel-side code, because the lowest I could get was about ~50K instructions for starting a null program in assembler that doesn't use libc and just executes the syscall asm instruction. However, these ~50K instructions are noise the moment you start to use more high-level languages. Yes, this is expected, but I was still shocked. And there's lots of delta between languages I'd expected to behave somewhat identically.

Again, this is not important in the real world. At all. They are just numbers, and probably the noise (due to short runtime) has lots of influence on the resulting numbers. And I might have screwed up the measurements somehow.

Test setup

Each program was the equivalent of 'exit 0' in the appropriate form for the language. During the measurements, the machine was as idle as possible (single-user mode, measurements run at real-time priority, etc.). For compiled languages, -O2 was used. For scripts, a simple #!/path/to/interpreter (without options, except in the case of Python, see below) was used. Each program/script was run 500 times (perf's -r 500) and I've checked that the variations were small (±0.80% on the metrics I used).

You can find all the programs I've used at http://git.k1024.org/perf-null.git/; the current tests are at the tag version perf-null-0.1.

The raw data for the tables/graphs below is at log-4.

Results

Compiled languages

Language          Cycles    Instructions
asm               63K       51K
c-dietlibc        74K       57K
c-libc-static     177K      107K
c-libc-shared     506K      300K
c++-static        178K      107K
c++-dynamic       1,750K    1,675K
haskell-single    2,229K    1,338K
haskell-threaded  2,629K    1,522K
ocaml-bytecode    3,271K    2,741K
ocaml-native      1,042K    666K

Going from dietlibc to glibc doubles the number of instructions, and for libc going from static to dynamic linking again roughly doubles it. I didn't manage to compile a program dynamically-linked against dietlibc.

C++ is interesting. Linked statically, it is in the same ballpark as C, but when linked dynamically, it executes an order of magnitude more instructions. I would guess that the initialisation of the standard C++ library is complex?

Haskell, which has a GC and quite a complex runtime, executes slightly fewer instructions than C++, but uses more cycles. Not bad, given the capabilities of the runtime. The two versions of the Haskell program use the single-threaded runtime and the multi-threaded one; not much difference. A fully statically-linked Haskell binary (not usually recommended) goes below 1M instructions, but not by much.

OCaml is a very nice surprise. The bytecode runtime is a bit slow to start up, but the (native) compiled version is quite fast to start: only 2× the instructions and cycles of C, for an advanced language. And twice as fast as Haskell. Nice!

Shells

Language     Cycles    Instructions
dash         766K      469K
bash         1,680K    1,044K
mksh         1,258K    942K
mksh-static  504K      322K

So, dash takes ~470K instructions to start, which is way below the C++ count and a bit higher than the C one. Hence, I'd guess that dash is implemented in C.

Next, bash is indeed slower on startup than dash, and by slightly more than 2× (both instructions and cycles). So yes, switching /bin/sh from bash to dash makes sense.

I wasn't aware of mksh, so thanks for the comments. It is, in the static variant, more efficient than dash, by about 1.5×. However, the dynamically linked version doesn't look too great (dash is also dynamically linked; I would guess a statically-linked dash "beats" mksh-static).

Text processing

I've added perl here (even though it's a 'full' language) just for comparison; it's also in the next section.

Language  Cycles    Instructions
mawk      849K      514K
gawk      1,363K    980K
perl      2,946K    2,213K

A normal spread. I knew the reason mawk is "Priority: required" is that it's faster than gawk, but I wouldn't have guessed it's almost twice as fast.

Interpreted languages

Here is where the fun starts…

Language       Cycles      Instructions
lua 5.1        1,947K      1,485K
lua 5.2        1,724K      1,335K
lua jit        1,209K      803K
perl           2,946K      2,213K
tcl 8.4        5,011K      4,552K
tcl 8.5        6,888K      6,022K
tcl 8.6        8,196K      7,236K
ruby 1.8       7,013K      6,128K
ruby 1.9.3     35,870K     35,022K
python 2.6 -S  11,752K     10,247K
python 2.7 -S  11,438K     10,198K
python 3.2 -S  29,003K     27,409K
pypy -S        21,106K     10,036K
python 2.6     25,143K     21,989K
python 2.7     47,325K     50,217K
python 2.7 -O  47,341K     50,185K
python 3.2     113,567K    124,133K
python 3.2 -O  113,424K    124,133K
pypy           90,779K     68,455K

The numbers here are not quite what I expected. There's a huge delta between the fastest (hi Lua!) and the slowest (bye Python!).

I wasn't familiar with Lua, so I tested it thanks to the comments. It is, I think, the only language which actually improves from one version to the next (bonus points), and where the JIT version makes it faster still. For context, LuaJIT starts up faster than C++.

Perl is the one that goes above C++'s instruction count, but not by much. From the point of view of the system, a Perl 'hello world' is only about 1.3×-1.6× slower than a C++ one. Not bad, not bad.

Next category is composed of TCL and Ruby, both of which had older versions 2-3× slower than Perl, but whose most recent versions are slower still. TCL has an almost constant slowdown across versions (5M, 6.9M, 8.2M cycles), but Ruby seems to have taken a significant step backwards: 1.9.3 is 5× slower than 1.8. I wonder why? As for TCL, I didn't expect it to be slower to start up than Perl; good to know.

Last category is Python. Oh my. If you run perf stat python -c 'pass' you get some unbelievable numbers, like 50M instructions to do, well, nothing. Yes, it has a GC, yes, it does import modules at runtime, but still… On closer investigation, the site module and the imports it performs eat a lot of time. Running the simpler python -S brings it back to a more reasonable 10M instructions, which is in line with the other interpreted languages.
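The difference -S makes is exactly the skipped site initialisation, which you can confirm from the interpreter itself via sys.flags. A small sketch (run against whatever Python is current; `no_site_flag` is an illustrative helper):

```python
import subprocess
import sys

def no_site_flag(*extra):
    """Run this interpreter with extra options and report sys.flags.no_site."""
    out = subprocess.run(
        [sys.executable, *extra, "-c", "import sys; print(sys.flags.no_site)"],
        capture_output=True, text=True,
    )
    return out.stdout.strip()

print(no_site_flag("-S"))  # → 1  (site module skipped)
print(no_site_flag())      # → 0  (normal startup, site imported)
```

Wrapping such invocations in `perf stat` is then the obvious next step to reproduce the table above.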

However, even with the -S taken into account, Python also slows down across versions: a tiny improvement from 2.6 to 2.7, but (like Ruby) a 3× slowdown from 2.7 to 3.2. Trying the “optimised” version (-O) doesn't help at all. Pypy, which is based on Python 2.7, is around 2× slower to start (both with and without -S).

So in the interpreted languages, it seems only Lua is trying to improve, the rest of the languages are piling up bloat with every version. Note: I should have tried multiple perl versions too.

Java

Java is in its own category; you can guess why, right?

GCJ was version 4.6, whereas by java below I mean OpenJDK Runtime Environment (IcedTea6 1.11) (6b24-1.11-4).

Language      Cycles      Instructions
null-gcj      97,156K     74,576K
java -jamvm   85,535K     80,102K
java -server  147,174K    136,803K
java -zero    132,967K    124,977K
java -cacao   229,799K    205,312K

Using gcj to compile to “native code” (not sure whether that's native-native or something else) results in a binary that uses less than 100M cycles to start, but the jamvm VM is faster than that (85M cycles). Not bad for Java! Python 3.2 is slower to start up; yes, I think the world has gone crazy.

However, the other VMs are a few times slower: server (the default one) is ~150M cycles, and cacao is ~230M cycles. Wow.

The other thing about Java is that it was the only one that couldn't be put nicely in a file that you just 'exec' (there is binfmt_misc, indeed, but that doesn't allow different Java classes to use different Java VMs, so I don't count it), as opposed to every single other thing I tested here. Someone didn't grow up on Unix?

Comparative analysis

Since there are almost 4 orders of magnitude difference between all the things tested here, a graph of cycles or instructions is not really useful. However, cycles/instruction, branches percentage and branches miss-predicted percentage can be. Hence first the cycles/instructions:

Pypy is jumping out of the graph here, with the top value of over 2 cycles/instruction. Lua JIT is also bigger than Lua non-JIT, so maybe there's something to this (mostly joking, two data points don't make a series). On the other hand, Python wins as best cycles/instruction (0.91). Lots of ILP, to get below 1?

Java gets, irrespective of VM, consistently near 1.0-1.1. C++ gets very different numbers between static linking (1.666) and dynamic linking (1.045), whereas C has basically identical numbers. mksh also has a difference between dynamic and static linking. Hmm…
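These cycles/instruction figures can be recomputed directly from the result tables above; a quick check (table values are rounded to K, so the results differ slightly from the raw-data figures quoted in the text, and `cpi` is an illustrative helper):

```python
def cpi(cycles_k, instructions_k):
    """Cycles per instruction from the K-suffixed table values."""
    return cycles_k / instructions_k

print(round(cpi(178, 107), 2))        # c++-static  → 1.66
print(round(cpi(1750, 1675), 2))      # c++-dynamic → 1.04
print(round(cpi(113567, 124133), 2))  # python 3.2  → 0.91
```

The same ratio for any other row of the tables takes one call.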

Ruby, TCL and Python have consistent values across versions.

And that's about what I can see from that graph. Next up, percentage of branches out of total instructions and percentage of branches missed:

Note that the two lines shouldn't really be on the same graph; for the branch %, the 100% is the total instructions count, but for the branch miss %, the 100% is the total branch count. Anyway.

There are two low-value outliers:

  • dynamically-linked C++ has a low branch percentage (17.46%) and a very low branch miss percentage (only 4.32%)
  • gcj-compiled java has a very low branch miss percentage (only 2.82%!!!), even though it has a "regular" branch percentage (20.85%)

So it seems the gcj libraries are well optimised? I'm not familiar enough with this topic, but on the graph it does indeed stand out.

On the other end, mksh-static has a high branch miss percentage: 11.60%, which jumps clearly ahead of all the others; this might be why it has a high cycles/instruction count, due to all the stalls in misprediction; one has to wonder why it confuses the branch predictor?

I find it interesting that the overall branch count is very similar across languages, both when most of the cost is in the kernel (e.g. asm) and when the user-space cost heavily over-weighs the kernel (e.g. Java). The average is 20.85%, minimum is 17.46%, max 22.93%, standard deviation (if I used gnumeric correctly) is just 0.01. This seems a bit suspicious to me. On the other hand, the mispredicted branches percentage varies much more: from a measly 2.82% to 11.60% (about a 4× difference).

Summary

So to recap, counting just instructions:

  • going from dietlibc to glibc: 2× increase
  • going from statically-linked libc to dynamically-linked libc: doubles it again
  • going from C to C++: 5× increase
  • C++ to Perl: 1.3×
  • Perl to Ruby: 3×
  • Ruby to Python (-S): 1.6×
  • Python -S to regular Python: 5×
  • Python to Java: 1×-2×, depending on version/runtime
  • branch percentage (per total instructions) is quite consistent across all of the programs

Overall, you get roughly three orders of magnitude slower startup between a plain C program using dietlibc and Python. And all, to do basically nothing.
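The headline figure follows from dividing the instruction counts in the tables above (c-dietlibc at 57K vs. python 3.2, without -S, at 124,133K):

```python
dietlibc_instr = 57        # K instructions, c-dietlibc (from the table)
python32_instr = 124_133   # K instructions, python 3.2 without -S (from the table)

ratio = python32_instr / dietlibc_instr
print(round(ratio))  # → 2178, i.e. about three orders of magnitude
```

Multiplying the per-step factors from the bullet list lands in the same range.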

On the other hand, I learned some interesting things while doing it, so it wasn't quite for nothing.
