Beating the Stock Market
Personal Projects
Note from Towards Data Science’s editors: While we allow independent authors to publish articles in accordance with our rules and guidelines, we do not endorse each author’s contribution. You should not rely on an author’s works without seeking professional advice. See our Reader Terms for details.
Summary
This is a personal project in which I have tried to develop a trading application using machine learning tools. Starting with data modelling along with a categorisation based on distribution and machine learning techniques, I have developed a trading strategy for beginner investors to generate low-risk profit with the help of this application.
Introduction
Market analysis is both interesting and complex, as the efficient-market hypothesis illustrates [1]. Nevertheless, several machine-learning studies have tried to shed light on this field.
In this piece of work, I have created an application consisting of two main points:
A screen where a stock market index can be analysed over different time horizons. It contains a candlestick chart; a chart of technical indicators [2]; a line chart showing the percentage price change between days; and a box plot of that same series, which helps to understand its distribution.
A screen for analysing the trading strategy I have developed (Strategyone). The strategy has two parts: the first predicts stock market index movements by means of machine learning, while the second compares the vector of current data being predicted with what happened in the past. The chosen time horizons are 7, 14, 21 and 28 days.
This last part is explained thoroughly in “How to beat the market” and “Trading Strategy”.
The price data were obtained through the Alpha Vantage API [3], and the list of stock market indices through the Finnhub API [4].
Context
As a physicist, I have always been fascinated by the world of complex systems: how the same formulae can be applied, with interesting results, to biological systems, to financial ones, or to a group of electrons.
Likewise, I am fascinated by how an element studied in isolation may behave differently when it is studied within the system.
Consequently, this project arises from curiosity about the stock market, together with the software and intellectual challenge of understanding a system as complex as the market.
The project has gone through three stages. The first version was developed as the final thesis of the Master’s Degree in Data Science that I took at [5]; its only aim was to create a classification model that could predict the future movement of a stock using machine learning. The second version was developed outside the Master’s programme and tried to improve on the first. Finally, the third version, discussed here, adds a significant improvement: the development of a trading strategy.
How to beat the market
In order to use a classification model to predict market movements, I needed to categorise the data. The prediction categories are: “Strong bull”, when the price increase is significant; “Bull”, when there is a price increase; “Keep”, when the price remains the same; “Bear”, when the price decreases; and “Strong bear”, when the price decrease is significant [6].
How are the stock market index categories chosen?
This is done using the distribution of the percentage variation in the stock price. As our aim is to predict the future, the percentage variation column of each daily record must hold how the price varies over the time horizon we want to predict.
The percentage variation to be categorised is then compared with the distribution of the last 4 months, and one of the above categories is selected according to the percentile range of that distribution in which it falls.
In this way, all the data can be categorised for a given time horizon, and each category always refers to the future.
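The categorisation described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the percentile boundaries in `cuts` (10, 40, 60, 90) are invented for the example, since the article does not state which percentile ranges were used.

```python
from statistics import quantiles

# Ordered from the most negative to the most positive variation.
CATEGORIES = ["Strong bear", "Bear", "Keep", "Bull", "Strong bull"]


def forward_variation(closes, horizon):
    """Percentage change from each day to `horizon` days later.

    The last `horizon` days have no future price yet, so they get None.
    """
    result = []
    for i, price in enumerate(closes):
        j = i + horizon
        if j < len(closes):
            result.append(100.0 * (closes[j] - price) / price)
        else:
            result.append(None)
    return result


def categorise(value, history, cuts=(10, 40, 60, 90)):
    """Place `value` in a category using percentiles of `history`.

    `history` stands for the variations of the trailing window (4 months in
    the article). `cuts` are ASSUMED percentile boundaries, not the author's.
    """
    # quantiles(..., n=100) returns 99 cut points; index k-1 is the k-th percentile.
    percentiles = quantiles(history, n=100, method="inclusive")
    thresholds = [percentiles[c - 1] for c in cuts]
    # Count how many thresholds the value exceeds: 0 -> Strong bear, 4 -> Strong bull.
    index = sum(value > t for t in thresholds)
    return CATEGORIES[index]
```

With a horizon of 7, for instance, `forward_variation(closes, 7)` gives the column to be categorised, and each value is then passed to `categorise` together with the variations observed in the preceding window.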
Once the categorisation was done, the next step was to find the most precise way of applying a classification algorithm. After a number of trials and different ideas, the selected process was to scale the data with the robust scaler technique and to use Random Forest as the classification algorithm. These were chosen because they gave the highest average precision across all the categories.
By following only these steps, we obtain a model which is able to predict “Strong bull” with a 40 % level of accuracy.
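To make the scaling step concrete, here is a minimal sketch of what robust scaling does to a single column: it centres the values on the median and divides by the interquartile range, so outliers have little influence on the scale. The function name is hypothetical and this is not the project's code; libraries such as scikit-learn provide the same idea as `RobustScaler`, and a Random Forest classifier would then be trained on the scaled columns.

```python
from statistics import median, quantiles


def robust_scale(column):
    """Centre a column on its median and divide by its interquartile range."""
    q1, _, q3 = quantiles(column, n=4, method="inclusive")
    centre = median(column)
    iqr = q3 - q1
    if iqr == 0:
        iqr = 1.0  # constant column: avoid dividing by zero
    return [(x - centre) / iqr for x in column]


# The outlier (100) does not distort the scale of the other values.
print(robust_scale([1, 2, 3, 4, 100]))
```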
Trading Strategy
The trading strategy is based on what happened in the past and on the idea that a guess counts as correct whenever we win, leaving aside the fact that a strictly correct guess would also require predicting the exact category.
That is, if the prediction is “Bull”, we open a long position and the actual outcome is “Strong bull”, the prediction is considered accurate. The same holds if we predict “Strong bull” and the result is “Bull”. Likewise, if the prediction is “Strong bear”, we open a short position and the outcome is “Bear”, or the other way round, the prediction is also considered accurate.
If none of the above cases occurs, the operation is considered a failure.
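The win rule above can be written as a small function. This sketch makes two assumptions that the text implies but does not spell out: an exact match between prediction and outcome also counts as a win, and a “Keep” prediction opens no position and therefore never counts as a win. All names are hypothetical.

```python
BULLISH = {"Bull", "Strong bull"}
BEARISH = {"Bear", "Strong bear"}


def position_for(prediction):
    """Return the position opened for a prediction, or None for "Keep"."""
    if prediction in BULLISH:
        return "long"
    if prediction in BEARISH:
        return "short"
    return None


def is_win(prediction, actual):
    """A long wins on any bullish outcome, a short on any bearish one."""
    position = position_for(prediction)
    if position == "long":
        return actual in BULLISH
    if position == "short":
        return actual in BEARISH
    return False  # ASSUMPTION: no position opened, so no win
```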
With this in mind, the strategy consists only of long positions, opened only when the model predicts “Strong bull”, since this is the category with the highest accuracy in the classification model.
How does the strategy work?
Once the robust scaler has been applied to all the records, and each record has its predicted category and its actual category, a PCA is applied to reduce the number of dimensions to 4 while retaining 95 % of the data variability. Each record therefore has 4 new variables, together with its prediction and its actual category. To find out what the variables look like for each combination of predicted and actual category, we group the records by prediction and category and calculate the median of each variable, which gives a profile curve describing each group.
As a result, we have a description of the variables for the cases in which “Strong bull” was predicted and the actual outcome was “Strong bull” or any other category.
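The construction of the profile curves can be sketched as a group-by followed by a per-variable median. The record layout used here, a tuple of (prediction, actual category, list of principal components), and the function name are assumptions made for the example; the PCA step itself is taken as already done.

```python
from collections import defaultdict
from statistics import median


def profile_curves(records):
    """Median of each principal component per (prediction, actual) pair.

    `records` is an iterable of (prediction, actual, components) tuples,
    a hypothetical layout chosen for this sketch.
    """
    groups = defaultdict(list)
    for prediction, actual, components in records:
        groups[(prediction, actual)].append(components)
    # zip(*rows) turns the rows of a group into columns, one per component.
    return {
        key: [median(column) for column in zip(*rows)]
        for key, rows in groups.items()
    }
```

Each key of the returned dictionary, for example `("Strong bull", "Strong bull")`, maps to one profile vector with as many entries as there are principal components.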
All of this is limited to the 6 months of data preceding the prediction day, in order to avoid the influence of an old market state on the strategy. The results obtained are summarised below:
Description of the variables for each prediction-category after the PCA.
The interpretation of this table is that, in the last 6 months, when “Strong bull” was predicted and the category was guessed correctly, the principal-component variables had the tabulated values as their medians.
Consequently, in order to carry out an operation, we must apply the robust scaler and the PCA to the data of the day for which we are making the prediction.
If the prediction obtained is “Strong bull”, the first condition for carrying out the operation is met. The second step is to check which of the previous profile curves is most similar to the data being predicted. This is done with cosine similarity, which identifies the profile vector closest to the data. If it corresponds to “Strong bull-Strong bull”, we have the key to performing a safer operation.
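The two-step check can be sketched as follows, assuming the profile curves are stored in a dictionary keyed by (prediction, actual category) and that the day's data has already been scaled and projected onto the same principal components. The function names and the dictionary layout are hypothetical.

```python
from math import sqrt


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def closest_profile(vector, profiles):
    """Key of the profile curve most similar to `vector`."""
    return max(profiles, key=lambda key: cosine_similarity(vector, profiles[key]))


def should_operate(prediction, vector, profiles):
    """Step 1: the model predicts "Strong bull".
    Step 2: the closest profile is the "Strong bull-Strong bull" one."""
    if prediction != "Strong bull":
        return False
    return closest_profile(vector, profiles) == ("Strong bull", "Strong bull")
```

Cosine similarity compares only the direction of the vectors, not their magnitude, so a day whose components are uniformly larger or smaller than a profile can still match it.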
Following this trading strategy, we obtain almost a 50 % level of accuracy; but, as mentioned at the beginning, guessing correctly does not require guessing the exact category.
Guessing correctly does not imply guessing the category too
In our case, predicting “Strong bull” and obtaining “Bull” as the final result also counts as a correct guess. When this is taken into account, the accuracy of the strategy reaches 58 %.
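The difference between the two accuracy figures comes only from what is counted as a hit. The sketch below computes both for a list of actual outcomes of operations opened on a “Strong bull” prediction. The function name is hypothetical and the sample outcomes are invented purely to show the calculation; they are not the article's results.

```python
def strategy_accuracies(outcomes):
    """Return (strict, relaxed) accuracy for "Strong bull" operations.

    strict:  only an actual "Strong bull" counts as a hit.
    relaxed: an actual "Bull" counts as a hit too, since the long position wins.
    """
    if not outcomes:
        raise ValueError("no operations to evaluate")
    total = len(outcomes)
    strict = sum(actual == "Strong bull" for actual in outcomes) / total
    relaxed = sum(actual in {"Strong bull", "Bull"} for actual in outcomes) / total
    return strict, relaxed


# Invented outcomes, for illustration only.
sample = ["Strong bull", "Bull", "Bear", "Strong bull"]
print(strategy_accuracies(sample))
```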
Conclusion
The aim of this work was to develop a strategy that allows a beginner investor to generate low-risk profit without suffering a total loss. As mentioned, the strategy reaches a 58 % level of accuracy under the described conditions but, on a personal note, it is not a strategy to be run automatically, because the error level it carries is still around 40 %.
However, it is interesting to see that a level of accuracy above 50 % is obtained in the operations performed, following a strategy based only on data and with limited, minimal knowledge of the stock market.
All the project code can be read on: GitHub/esan94/bsm03
Next Steps
Possible next steps for improvement include:
- Changing the data model.
- Improving the classification algorithm.
- Adding more knowledge about the stock market to the project.
- Assigning weights to the principal components when applying the cosine similarity.
Resources
[1] https://en.wikipedia.org/wiki/Efficient-market_hypothesis
[2] https://www.investopedia.com/technical-analysis-4689657
[3] https://www.alphavantage.co/
[4] https://finnhub.io/
[5] https://kschool.com/
[6] https://www.investopedia.com/insights/digging-deeper-bull-and-bear-markets/
You can follow me on LinkedIn, GitHub or Medium.
Translation by Paloma Sánchez Narváez.
Translated from: https://towardsdatascience.com/beating-stock-market-8b33c5afb633