Level Up Your Kaplan-Meier Curves with Tableau

Published: 2023/11/29

In a previous article, I showed how we can create the Kaplan-Meier curves using Python. As much as I love Python and writing code, there might be some alternative approaches with their unique set of benefits. Enter Tableau!

Tableau is a business intelligence tool used for creating elegant and interactive visualizations on top of data coming from a vast number of sources (you would be surprised how many distinct ones there are!). To make the definition even shorter, Tableau is used for building dashboards.

So why would a data scientist be interested in using Tableau instead of Python? When creating a Notebook/report with the results of a survival analysis exercise in Python, the reader will always be limited to:

  • what the creator of the visualization had in mind,
  • what data was available at the moment of creating the report.

In other words, there is little freedom for the reader to explore alternative angles. What is more, if someone in the company stumbles upon the report a few years later, the only way to bring the analysis up to date would be to find the data scientist and have them rerun the Notebook and generate another report. Definitely not the best situation.

This is where a solution based on Tableau (or other business intelligence tools such as PowerBI, Looker, etc.) shines. As the visualizations are built directly on top of a data source, the visualization will be updated together with the data. Less work for the data scientist!

Another extra benefit is the possibility to include some filters, so the readers can play around and try to explore different subsets of the data. From experience, this is a feature often used by product owners, who want to dive deep into the details and at the same time do not want to constantly come to the data person with another request for a new filter or feature. Another win :)

Lastly, by using such tools, the analysts democratize the access to the data and analyses, as basically anyone in the company can access the dashboard and try to answer their own questions or verify their hypotheses.

After this introduction, let’s jump right into re-creating the very same Kaplan-Meier curves we created in the previous article. Once again, we use the Telco Churn dataset, which requires close to no extra preparation before the analysis. Please refer to that article if you need a refresher on the Kaplan-Meier estimator, as we will not cover theory this time. Also, we assume some basic knowledge of Tableau.

Note: Tableau is commercial software and requires a license. You can get access to a 14-day trial by following the instructions here.

Image by mohamed Hassan from Pixabay

Approach #1: Easy mode

The first approach is dubbed easy, as it will favor speed and simplicity, while at the same time introducing some shortcomings. First, we load the data from a text file (available here).

To carry out the survival analysis in Tableau, we will need the following variables:

  • time-to-event: the number of time periods (for example, days or months) elapsed between joining the sample and the event of interest or censoring.
  • event-of-interest: a binary variable, where 1 indicates that the event happened and 0 that it did not.
  • additional categorical variables: used for filtering and/or grouping.

The tenure variable does not require any preparation, as it already expresses the number of months since signing up for the services of the Telco company. But the Churn variable is expressed as a yes/no string, so we need to encode it to binary using a calculated field:

To create this field, right-click on the Churn variable in the variable selector on the left (Data tab), select Create -> Calculated Field.

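For readers who want to check the encoding outside Tableau, the same yes/no to binary mapping can be sketched in plain Python. This is only an illustration of the logic: the function name `encode_churn` is hypothetical, and it assumes the column contains the strings "Yes" and "No".

```python
def encode_churn(value: str) -> int:
    """Map a yes/no string to 1/0, where 1 means the customer churned."""
    normalized = value.strip().lower()
    if normalized == "yes":
        return 1
    if normalized == "no":
        return 0
    # Fail loudly on anything unexpected instead of silently coding it as 0.
    raise ValueError(f"unexpected Churn value: {value!r}")


# A few illustrative values (not taken from the dataset).
churn_flags = [encode_churn(v) for v in ["Yes", "No", "No", "Yes"]]
print(churn_flags)
```

Raising on unknown values is a design choice for the sketch; a calculated field would more likely fall through to a default branch.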

As the next step, we create a new calculated field, d_i, which represents the number of events that occur at each time period:

The names we used for the variables correspond to the elements you can find in the formula for the Kaplan-Meier estimator.

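As a reminder, the standard form of the Kaplan-Meier estimator is shown below. Here d_i is the number of events at time t_i, and n_i (a symbol assumed here for the denominator) is the number of observations still at risk just before t_i.

```latex
\hat{S}(t) = \prod_{i:\, t_i \le t} \left( 1 - \frac{d_i}{n_i} \right)
```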

The next variable we create is the denominator used for calculating the hazard function at a given time. It represents the number of observations still at risk at that time, that is, those that had not experienced the event or been censored before it:

The Number of Records variable is a helper variable used for, as you might have guessed, counting the observations. For that purpose, newer versions of Tableau create a variable based on the name of the data source. However, you can easily create this variable manually by creating a calculated field and placing 1 in the field’s definition. Lastly, we define the Kaplan-Meier curve as:

Here, the probability of surviving each time period is defined as 1 - hazard function, and the curve is the running product of those probabilities over time.
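To make the roles of these building blocks concrete, here is a minimal Python sketch of the same calculation: count the events per period, track how many observations are still at risk, and multiply the per-period survival probabilities. It follows the textbook estimator rather than the Tableau fields themselves; the function name `kaplan_meier` and the toy data are assumptions made for illustration.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(t, S(t))] for each distinct duration, in increasing order."""
    d = Counter()        # d_i: events observed at time t_i
    leaving = Counter()  # observations (events or censored) that end at t_i
    for t, e in zip(durations, events):
        leaving[t] += 1
        d[t] += e

    n_at_risk = len(durations)  # everyone is at risk at the start
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        hazard = d[t] / n_at_risk     # share of at-risk observations with an event
        survival *= 1.0 - hazard      # running product of (1 - hazard)
        curve.append((t, survival))
        n_at_risk -= leaving[t]       # drop everyone whose observation ended at t
    return curve


# Toy data: five customers, tenure in months, 1 = churned, 0 = censored.
for t, s in kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]):
    print(t, round(s, 3))
```

On the toy data the survival estimate falls to roughly 0.8, 0.6 and 0.3 and then stays flat at the last, censored observation.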

All the building blocks are ready. Now, we place tenure on the x-axis and the Kaplan-Meier Curve on the y-axis, format the curve as a percentage, add the title, and place the PaymentMethod variable on Color. This way, we create the following visualization:

The result is very similar to what we obtained last time using lifelines:

Some quick observations:

  • the survival curves obtained in Tableau are drawn as more or less straight lines between points, without the characteristic step structure,
  • there are no confidence intervals, as their calculation is not that simple in Tableau.

Using Tableau, we can easily add some additional filters to the visualization, such as the cohort date, age, or any of the available categorical variables.

Approach #2: Normal mode

In this approach, we will focus on recreating the characteristic step-like shape of the Kaplan-Meier curves. This approach is dubbed the normal mode, as it requires a bit more preparation.

For the additional data preprocessing, we need to complete two steps. First, add a column called link to the CSV file with the Telco Customer Churn data. The column should be populated with a ‘link’ string. As a matter of fact, this string can be arbitrary, just like the column name. What matters is consistency, but all will become clear in a second. The second step is to create a new CSV file (we called it blending.csv), which contains the following:

link, set
link, 1
link, 2

Yep, that’s pretty much it. For your convenience, I stored both files on my GitHub.

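If you would rather generate the two inputs than edit them by hand, the preprocessing can be sketched with Python's standard csv module. The helper names `add_link_column` and `blending_csv` are hypothetical, and the sample rows stand in for the real Telco file, whose path is not assumed here.

```python
import csv
import io


def add_link_column(rows, column="link", value="link"):
    """Return copies of the rows with a constant join-key column added."""
    return [{**row, column: value} for row in rows]


def blending_csv(column="link", value="link"):
    """Build the contents of the two-row helper file as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([column, "set"])
    writer.writerow([value, 1])
    writer.writerow([value, 2])
    return buffer.getvalue()


# Stand-in rows; in practice these would come from csv.DictReader on the Telco file.
customers = [{"tenure": "1", "Churn": "Yes"}, {"tenure": "34", "Churn": "No"}]
print(add_link_column(customers))
print(blending_csv())
```

The key name and value are parameters because, as noted above, they are arbitrary as long as both files use the same ones.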

Armed with the two files, we load them into Tableau and left-join the tables on the link variable. You can see that in the following image.
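The effect of this join is easy to miss: because every customer row carries the same link value, each one matches both rows of the helper file and therefore appears twice, once with set = 1 and once with set = 2. A small Python sketch of a left join on in-memory rows (the function `left_join` and the sample data are illustrative assumptions) shows the duplication:

```python
def left_join(left_rows, right_rows, key):
    """Left-join two lists of dicts on `key`, keeping unmatched left rows."""
    joined = []
    for left in left_rows:
        matches = [r for r in right_rows if r[key] == left[key]]
        if not matches:
            joined.append(dict(left))  # a left join keeps rows without a match
        for right in matches:
            joined.append({**left, **right})
    return joined


customers = [
    {"customer": "A", "tenure": 1, "link": "link"},
    {"customer": "B", "tenure": 34, "link": "link"},
]
blending = [{"link": "link", "set": 1}, {"link": "link", "set": 2}]

for row in left_join(customers, blending, "link"):
    print(row["customer"], row["tenure"], row["set"])
```

Each customer comes out twice, which is what later allows two points to be drawn per time period.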

As this is the “normal mode”, we will combine a few steps at the same time and create a calculated field called Kaplan-Meier Dots:

You can easily recognize the contents of this field from the “easy mode”; this time, we have put everything into one field. Next comes the new part. We define the Kaplan-Meier Curve as:

This convoluted formula will enable us to obtain the step-like shape of the curves. Lastly, we need one more helper variable:

When doing so, please click on the Default Table Calculation and specify to compute the results along tenure.

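One way to picture how the duplicated rows produce steps is the following Python sketch. It illustrates the general idea, not the calculated fields themselves: for each time period, the set = 1 copy stays at the previous survival level and the set = 2 copy drops to the new one, while a running index fixes the order in which the points are connected. The function name `step_path` and the starting level of 1.0 are assumptions.

```python
def step_path(curve, start=1.0):
    """Turn [(t, S(t))] into ordered (index, t, y) points that draw a step line."""
    path = []
    previous = start
    for t, s in curve:
        path.append((t, 1, previous))  # set 1: still at the previous level
        path.append((t, 2, s))         # set 2: drop to the new level
        previous = s
    # The index plays the role of the path order along the time axis.
    return [(index, t, y) for index, (t, _set, y) in enumerate(path, start=1)]


for point in step_path([(1, 0.8), (2, 0.6), (3, 0.3)]):
    print(point)
```

Connecting the points in index order gives a horizontal segment up to each time period followed by a vertical drop, which is the characteristic Kaplan-Meier shape.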

Finally, we have all the building blocks to create the curves. We approach the setup similarly to the “easy mode”, with the difference of placing the Index as the Path and set as the Detail. To recreate the curves from Python, we once again use the PaymentMethod as the Color.

In the picture above, we accurately recreated the curves we previously obtained using the lifelines library in Python. This definitely required a bit more work but can pay off in the end.

We can additionally use the Kaplan-Meier Dots to visualize the events as they happen along the curve. In this case, I believe this would simply clutter the visualization. It would be more suitable for a smaller dataset.

We can further improve the dashboard by adding some filters/splits and then share it with our colleagues via the company’s reporting portal (in this case, an instance of Tableau Server).

Conclusion

In this article, I explained the potential benefits of using business intelligence tools such as Tableau for survival analysis and showed how to create dashboards with the Kaplan-Meier curves.

As is often the case, nothing comes for free and there are also some disadvantages to this approach:

  • Calculating the confidence intervals is definitely harder and takes considerable effort.
  • In Tableau, there is no simple way to carry out the log-rank test to compare different survival curves (unless we use R from Tableau, but this might be an idea for a future article).
  • If new features are added to the data, for example, a new customer segmentation or another category for each observation, an analyst will still need to do some work to add them to an existing dashboard. However, this rarely happens, and when it does it usually requires little extra work.

I hope you enjoyed this alternative approach to visualizing the Kaplan-Meier curves. As always, any constructive feedback is welcome. You can reach out to me on Twitter or in the comments.

Translated from: https://towardsdatascience.com/level-up-your-kaplan-meier-curves-with-tableau-bc4a10ec6a15
