Machine Learning week 10 quiz: Large Scale Machine Learning

Published: 2025/3/21
This article, collected and organized by 生活随笔, mainly presents the Machine Learning week 10 quiz: Large Scale Machine Learning, and is shared here for reference.

Large Scale Machine Learning

5 questions

1.

Suppose you are training a logistic regression classifier using stochastic gradient descent. You find that the cost (say, cost(θ, (x^(i), y^(i))), averaged over the last 500 examples), plotted as a function of the number of iterations, is slowly increasing over time. Which of the following changes are likely to help?

Try using a larger learning rate α.

Try averaging the cost over a larger number of examples (say 1000 examples instead of 500) in the plot.

Try using a smaller learning rate α.

This is not an issue, as we expect this to occur with stochastic gradient descent.
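For intuition, here is a minimal sketch of the monitoring procedure this question describes, assuming NumPy and Matplotlib and hypothetical arrays X and y that are already loaded: run stochastic gradient descent for logistic regression, record cost(θ, (x^(i), y^(i))) before each update, and plot the average over every block of 500 examples. If the plotted average slowly increases over time, a smaller learning rate α is the usual remedy.

import numpy as np
import matplotlib.pyplot as plt

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def example_cost(theta, x, y):
    # Logistic regression cost for a single training example.
    h = sigmoid(x @ theta)
    eps = 1e-12  # guard against log(0)
    return -y * np.log(h + eps) - (1 - y) * np.log(1 - h + eps)

def sgd_with_monitoring(X, y, alpha=0.01, window=500):
    m, n = X.shape
    theta = np.zeros(n)
    recent, averages = [], []
    for i in np.random.permutation(m):
        recent.append(example_cost(theta, X[i], y[i]))       # cost before the update
        theta -= alpha * (sigmoid(X[i] @ theta) - y[i]) * X[i]
        if len(recent) == window:
            averages.append(np.mean(recent))                 # average over the last 500 examples
            recent = []
    plt.plot(averages)
    plt.xlabel('blocks of %d iterations' % window)
    plt.ylabel('average cost over the last %d examples' % window)
    plt.show()
    return theta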

2.

Which of the following statements about stochastic gradient descent are true? Check all that apply.

One of the advantages of stochastic gradient descent is that it uses parallelization and thus runs much faster than batch gradient descent.

In order to make sure stochastic gradient descent is converging, we typically compute J_train(θ) after each iteration (and plot it) in order to make sure that the cost function is generally decreasing.

Before running stochastic gradient descent, you should randomly shuffle (reorder) the training set.

If you have a huge training set, then stochastic gradient descent may be much faster than batch gradient descent.
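To make the mechanics concrete, here is a minimal sketch (assuming NumPy and hypothetical arrays X and y) of linear regression trained with stochastic gradient descent: the training set is randomly shuffled before each pass, and the parameters are updated after every single example rather than after a full sweep over the data as in batch gradient descent, which is why it can be much faster on a huge training set.

import numpy as np

def sgd_linear_regression(X, y, alpha=0.01, epochs=1):
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(epochs):
        for i in np.random.permutation(m):             # randomly shuffle (reorder) the training set
            gradient = (X[i] @ theta - y[i]) * X[i]    # gradient of the cost for one example
            theta -= alpha * gradient                  # update immediately, one example at a time
    return theta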

3.

Which of the following statements about online learning are true? Check all that apply.

In the approach to online learning discussed in the lecture video, we repeatedly get a single training example, take one step of stochastic gradient descent using that example, and then move on to the next example.

One of the advantages of online learning is that there is no need to pick a learning rate α.

One of the disadvantages of online learning is that it requires a large amount of computer memory/disk space to store all the training examples we have seen.

When using online learning, in each step we get a new example (x, y), perform one step of (essentially stochastic gradient descent) learning on that example, and then discard that example and move on to the next.
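The loop described here can be written down very compactly. Below is a minimal sketch, where get_new_example is a hypothetical function standing in for whatever stream supplies one (x, y) pair at a time (for example, one user interaction on a website): each example is used for a single step of (essentially stochastic gradient descent) learning and is then discarded, so nothing needs to be stored.

import numpy as np

def online_logistic_learning(get_new_example, n_features, alpha=0.01, steps=10000):
    theta = np.zeros(n_features)
    for _ in range(steps):
        x, y = get_new_example()                       # one fresh example from the stream
        h = 1.0 / (1.0 + np.exp(-(x @ theta)))         # logistic hypothesis for this example
        theta -= alpha * (h - y) * x                   # one step of stochastic gradient descent
        # x and y are simply discarded here; no training examples are kept in memory
    return theta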

4.

Assuming that you have a very large training set, which of the following algorithms do you think can be parallelized using map-reduce and splitting the training set across different machines? Check all that apply.

Since stochastic gradient descent processes one example at a time and updates the parameter values after each, it cannot be easily parallelized.

Linear regression trained using stochastic gradient descent.

Computing the average of all the features in your training set, μ = (1/m) ∑_{i=1}^m x^(i) (say, in order to perform mean normalization).

Logistic regression trained using batch gradient descent.
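To see why a computation like μ = (1/m) ∑_{i=1}^m x^(i) map-reduces so naturally, here is a minimal sketch (assuming NumPy and a hypothetical design matrix X) in which each "machine" is simulated by a shard of the data: every shard contributes a partial sum, and a single combine step adds the partial sums and divides by m.

import numpy as np

def mapreduce_feature_mean(X, n_machines=4):
    m = X.shape[0]
    shards = np.array_split(X, n_machines)                   # split the training set across machines
    partial_sums = [shard.sum(axis=0) for shard in shards]   # "map": each machine sums its own shard
    return sum(partial_sums) / m                             # "reduce": combine the partial sums, divide by m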

5.

Which of the following statements about map-reduce are true? Check all that apply.

If we run map-reduce using N computers, then we will always get at least an N-fold speedup compared to using 1 computer.

Because of network latency and other overhead associated with map-reduce, if we run map-reduce using N computers, we might get less than an N-fold speedup compared to using 1 computer.

When using map-reduce with gradient descent, we usually use a single machine that accumulates the gradients from each of the map-reduce machines, in order to compute the parameter update for that iteration.

If you have only 1 computer with 1 computing core, then map-reduce is unlikely to help.
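To make the accumulation step concrete, here is a minimal sketch (assuming NumPy and hypothetical arrays X, y, and theta) of one iteration of batch gradient descent for linear regression with the gradient computation split across several simulated machines: each machine computes the gradient over its own shard of the training set, and a single central machine adds the pieces together and applies the parameter update.

import numpy as np

def distributed_gradient_step(X, y, theta, alpha=0.01, n_machines=4):
    m = X.shape[0]
    X_shards = np.array_split(X, n_machines)
    y_shards = np.array_split(y, n_machines)
    # "map": each machine computes the unscaled gradient over its own portion of the data
    partial_grads = [Xs.T @ (Xs @ theta - ys) for Xs, ys in zip(X_shards, y_shards)]
    # "reduce": the central machine accumulates the pieces and performs the update
    gradient = sum(partial_grads) / m
    return theta - alpha * gradient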
