Information Organization with Infographics Using Machine Learning

Published: 2023/11/29
Infographics are crucial for presenting information to an audience in a more digestible form. Their use is expanding to many (if not all) professions, including journalism, science and research, advertising, and business, and research on automating the generation of beautiful, user-centric infographics has become one of the latest topics in the data visualization community.

In this series of posts, we will discuss 5 pioneering research papers focusing on automating the process of generating beautiful infographics for different types of data.

Infographics examples. [1]

Presently, there are many very powerful design tools and code libraries that support generating infographics from data. The lists below mention some that you might want to check out. However, designing an infographic is not a straightforward process. Creating an engaging piece requires expensive labor and is generally very time-consuming. The skill set required is also very diverse, covering every small decision from selecting which topics to highlight all the way to choosing color combinations.

Software Supporting Infographic Generation

  • Microsoft PowerPoint (Design Ideas)
  • Microsoft Power BI, for developing data visualization dashboards
  • Adobe Illustrator
  • Tableau

JavaScript packages to help create infographics

  • D3.js
  • Highcharts.js

Automated Infographics Design

Recent research in information visualization has seen increased interest in automating or semi-automating the complicated process of infographic generation. The main purpose of this research, however, has not been to take control away from humans completely, but to develop techniques that support designers in their decision-making. To survey this research, we have divided the papers broadly into 5 categories:

  • Timeline infographics design automation
  • Icon design automation
  • Information flow based automation
  • Text-based automation
  • Image-chart fusion automation

Timeline Infographics Generation [1,2]

Extracting template components from a time series infographic and fitting new data. [1]

As the name suggests, these methods try to automatically design an infographic for time-based data. One approach works directly on the bitmap image of an existing timeline infographic to extract global and local information. Global information covers the orientation, the layout (Unified, Faceted, Segmented, etc.), and the representation type (Radial, Linear, etc.). Local information covers the bounding boxes that each contain a piece of information in the infographic, for example text boxes and icons. These methods use existing Convolutional Neural Networks to draw bounding boxes or segment the infographic for the local information, and to predict the values of the global information via classification. Once the template information is extracted, the old content can be replaced with new content to obtain a new infographic automatically.
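The last step, refitting an extracted template with new data, can be sketched roughly as follows. This is a minimal illustration and not the method of [1]: the `Element` and `TimelineTemplate` classes, the `fill_template` helper, and all box coordinates are hypothetical, and the CNN detection and classification are assumed to have already run.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    kind: str          # "text" or "icon", as predicted by the (assumed) detector
    box: tuple         # bounding box (x, y, width, height) in pixels
    content: str = ""


@dataclass
class TimelineTemplate:
    orientation: str       # global information, assumed to come from a classifier
    layout: str
    representation: str
    elements: list = field(default_factory=list)   # local information


def fill_template(template, events):
    """Return a copy of the template whose text boxes hold the new events."""
    text_boxes = [e for e in template.elements if e.kind == "text"]
    if len(events) > len(text_boxes):
        raise ValueError("more events than text boxes in the template")
    # Order text boxes along the timeline axis so events stay chronological.
    axis = 0 if template.orientation == "horizontal" else 1
    text_boxes.sort(key=lambda e: e.box[axis])
    new_text = {id(e): event for e, event in zip(text_boxes, events)}
    filled = [
        # Text boxes get new (or empty) text; icons keep their old content.
        Element(e.kind, e.box, new_text.get(id(e), "" if e.kind == "text" else e.content))
        for e in template.elements
    ]
    return TimelineTemplate(template.orientation, template.layout,
                            template.representation, filled)


# Hypothetical template, as if extracted from an existing infographic.
template = TimelineTemplate("horizontal", "Unified", "Linear", [
    Element("text", (300, 40, 120, 30), "2001: old event"),
    Element("icon", (40, 90, 32, 32), "rocket.png"),
    Element("text", (40, 40, 120, 30), "1999: old event"),
])
new_infographic = fill_template(template, ["2019: founded", "2021: launch"])
```

Keeping the global fields separate from the list of local elements mirrors the global/local split described above; a real system would also have to re-render the text inside each box.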

Timeline Storyteller. [2]

On the other hand, there is a visualization dashboard called Timeline Storyteller [2], which takes the raw CSV/Excel sheet of timeline data directly and generates infographics that users can later customize according to their own design choices. Users can design infographics and animations even with very large time-series datasets, and import pictures of their choice into these infographics, as shown in the example. Try the Timeline Storyteller here.

Icon Design Automation [3]

Compound Icon Generation. [3]

The next category on our list covers techniques for designing complex icons. Given an input text, for example "House Cleaning", the task is to come up with a semantically meaningful icon. The problem might look simple: search for an icon for each word in the query, for example one for "House" and another for "Cleaning", then combine the two and there we go, we have a compound icon. This works for simple queries, but data for semantically labeled icons is scarce. So we need ways to extend the existing semantic knowledge of labeled icons to other areas that are not so well explored. For this purpose, the well-studied word embeddings from Natural Language Processing can be useful.

Developing a compound icon pipeline from a text query. [3]

Given a query text, for each unigram we find the nearest word that is annotated and associated with an icon in the existing dataset. The icons extracted for the query unigrams are then ranked on the basis of style compatibility. To measure style compatibility, an embedding vector describing its style is generated for each icon. The closer the style vectors of two icons, the more similar the icons are in style. A CNN can be trained to generate these style embeddings. This model is trained on an existing dataset of 1000 human-curated compound icons, where the individual icons inside one compound icon are considered more similar in style to each other than to a differently styled version of the same icon occurring in another compound icon.
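The two lookups described above, nearest annotated word and style compatibility, can be sketched as follows. This is only an illustration under loud assumptions: the word vectors, style vectors, and icon names are made up, cosine similarity and Euclidean distance are stand-ins for whatever measures [3] uses, and in a real system the vectors would come from a trained embedding model and a trained CNN.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


# Toy 3-d word embeddings (made-up values, for illustration only).
WORD_VECS = {
    "house": (0.9, 0.1, 0.0),
    "home": (0.8, 0.2, 0.1),
    "cleaning": (0.1, 0.9, 0.2),
    "broom": (0.2, 0.8, 0.3),
}

# Hypothetical icons, each annotated with a word and a made-up 2-d style embedding.
ICONS = [
    {"name": "home_outline", "word": "home", "style": (1.0, 0.0)},
    {"name": "home_filled", "word": "home", "style": (0.0, 1.0)},
    {"name": "broom_outline", "word": "broom", "style": (0.9, 0.1)},
    {"name": "broom_filled", "word": "broom", "style": (0.2, 0.8)},
]


def nearest_annotated_word(query_word):
    """Pick the icon-annotated word whose embedding is closest to the query word."""
    annotated = {icon["word"] for icon in ICONS}
    return max(annotated, key=lambda w: cosine(WORD_VECS[query_word], WORD_VECS[w]))


def best_icon_pair(query):
    """For a two-word query, return the icon pair with the closest style vectors."""
    first, second = (nearest_annotated_word(w) for w in query.lower().split())
    pairs = [(a, b)
             for a in ICONS if a["word"] == first
             for b in ICONS if b["word"] == second]
    return min(pairs, key=lambda p: math.dist(p[0]["style"], p[1]["style"]))


pair = best_icon_pair("House Cleaning")
print([icon["name"] for icon in pair])
```

Neither "house" nor "cleaning" is annotated in this toy dataset, so the query falls back to "home" and "broom", which is the point of using word embeddings when labeled icons are scarce.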

For the final piece of the jigsaw, once the icons have been filtered by semantics and style compatibility, they are placed according to space compatibility. To calculate space compatibility, the icons from the 1k human-curated compound icons are studied to generate a template for each icon (shown in the image above). The template gives an idea of where another icon can be placed relative to the current one. Using this information, the icons are placed in the template to generate the compound icon.

Information Flow Based Automation [4]

Moving on to the next category, this work focuses on extracting the information flow in infographics.

Given an infographic image, the information flow is basically a way to display the direction in which visual groups are placed inside that image. Visual groups are the information-carrying segments inside an infographic, which are repeated to present the full picture. The flow of these visual groups is called the narrative flow.

Information flow direction in infographics. [4]

This paper classifies these narrative flow patterns into 12 classes, based on the visual groups and their placements studied in a dataset of 13k infographic images. Object detection CNNs were first used to detect the visual groups containing icons and text inside the infographics, and the placements were then studied to generate the information flow diagram.

Generating the information flow path from the bounding boxes detected by the YOLO network. [4]

This paper discusses a flow extraction algorithm that groups the bounding boxes detected by the CNN (YOLO) into visual groups based on proximity and size, and then detects the flow of these visual groups to predict the final visual information flow. The system can also perform a reverse selection and classification, where the user draws the direction of information flow and the system fetches infographics with a similar direction of flow. As mentioned above, the 12 classification categories of information flow are shown in the image below.
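The grouping-then-ordering idea can be sketched as below. This is not the flow extraction algorithm from [4]; it is a simplified stand-in that assumes boxes are given as `(x, y, width, height)`, merges boxes whose centers are close relative to their size (the `factor` threshold is an arbitrary assumption), and orders the resulting groups along the axis with the larger spread.

```python
import math


def center(box):
    x, y, w, h = box
    return (x + w / 2, y + h / 2)


def group_boxes(boxes, factor=1.5):
    """Merge boxes whose centers are closer than `factor` times their mean size."""
    parent = list(range(len(boxes)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            size = (max(boxes[i][2:]) + max(boxes[j][2:])) / 2
            if math.dist(center(boxes[i]), center(boxes[j])) < factor * size:
                parent[find(i)] = find(j)

    groups = {}
    for i, box in enumerate(boxes):
        groups.setdefault(find(i), []).append(box)
    return list(groups.values())


def flow_path(groups):
    """Order group centroids along the axis with the larger spread."""
    centroids = []
    for group in groups:
        cs = [center(b) for b in group]
        centroids.append((sum(c[0] for c in cs) / len(cs),
                          sum(c[1] for c in cs) / len(cs)))
    spread_x = max(c[0] for c in centroids) - min(c[0] for c in centroids)
    spread_y = max(c[1] for c in centroids) - min(c[1] for c in centroids)
    axis = 0 if spread_x >= spread_y else 1
    return sorted(centroids, key=lambda c: c[axis])


# Made-up detections: three rows, each with an icon box and a text box.
boxes = [
    (10, 10, 40, 40), (60, 15, 100, 30),
    (10, 210, 40, 40), (60, 215, 100, 30),
    (10, 410, 40, 40), (60, 415, 100, 30),
]
groups = group_boxes(boxes)
path = flow_path(groups)
```

With these boxes each icon is merged with the text beside it, giving three visual groups whose centroids run top to bottom. Sorting along a single axis only captures straight flows; the curved and zigzag patterns among the 12 classes would need a more careful path search.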

12 categories of information flow in infographics. [4]

This paper also studies the spatial distribution of different elements inside the infographics based on these 12 classes, as shown below.

Spatial distribution of elements in each narrative flow category. [4]

Text-Based Automation [5]

Automatic infographics generated for the statement: "More than 20% of Smartphone users are social network users." [5]

Another system in this series is known as Text-to-Viz. Given a statistical statement, this system tries to come up with a complete infographic design directly. Unlike other infographic tools, where the user can or must edit the final design, Text-to-Viz generates well-defined, aesthetic infographics that need no editing. The best use case for this system is when the user doesn't need a very design-rich infographic but wants something simple and quick that presents a piece of statistical information in a better way. According to this paper, there are 4 most common types of infographics:

Infographics for each category with their occurrence percentage. [5]

  • Statistical-based: Infographics containing charts, pictographs, etc. for presenting statistical information.
  • Timeline-based: Presenting timeline information.
  • Process-based: Step by step action presentation.
  • Location-based: Showing information on a map.

Since, according to this research, around 50% of infographics are statistical-based, and around 45% of those are proportion-based, the authors only tried to build an automatic infographic generation system for this subset. The next step was to study the different parts of a proportion-based statement. An example is shown below, where the pieces of information are classified and extracted so that each can be designed separately.

Classifying different parts of the proportion-based information. [5]
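To make the idea of splitting a proportion statement into parts concrete, here is a minimal sketch that uses a regular expression as a stand-in. It is not how Text-to-Viz does it: the pattern, the `parse_proportion` helper, and the part names (`modifier`, `number`, `whole`, `part`) are assumptions chosen for illustration, and the pattern only covers statements shaped like the example above.

```python
import re

# Hypothetical pattern for statements of the form
# "<modifier> <number>% of <whole> are <part>."
PROPORTION = re.compile(
    r"^(?P<modifier>more than|less than|about|over|nearly)?\s*"
    r"(?P<number>\d+(?:\.\d+)?%)\s+of\s+"
    r"(?P<whole>.+?)\s+(?:are|is|were|was)\s+"
    r"(?P<part>.+?)\.?$",
    re.IGNORECASE,
)


def parse_proportion(statement):
    """Split a proportion statement into labeled parts, or return None."""
    match = PROPORTION.match(statement.strip())
    return match.groupdict() if match else None


parts = parse_proportion("More than 20% of Smartphone users are social network users.")
print(parts)
```

Each extracted part can then be styled on its own, for example the number as a large figure and the whole and part as captions. A regular expression breaks down quickly on free-form text, which is presumably why a learned model is the better fit for this step.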

Next up, the design space needs to be separated based on where different elements are to be placed. The researchers came up with 20 template designs where different elements could be placed based on the rules mentioned in the paper.

Designing the infographic from a template. [5]

Rules that were considered while designing the templates. [5]

Image Chart Fusion Automation [6]

Images with embedded charts. [6]

The last techniques on the list of automatic infographic generation are those for designing images that contain charts, as shown in the image above. A survey of photographic infographics showed which chart types are most frequently used to present data embedded inside images [6]: bar charts (41.2%), pie charts (21.4%), line charts (9.4%), and scatterplots (2.2%). Besides charts, there are other ways of embedding this information. One is the Single Divided Object, where the graphic is divided into smaller parts along a horizontal or vertical axis, and the area of each division can be based on the ratio of the quantities being compared. This is followed by Multiple Resized Objects, where the objects inside an image are sized according to the data they portray. Using this information about how and where the data is represented, researchers generally follow the pipeline shown below to generate the final infographics.
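The Single Divided Object idea comes down to simple proportional arithmetic, sketched below. The `divide_object` helper is hypothetical, and the sketch assumes a rectangular object so that dividing the length along one axis in proportion to the values also divides the area in the same proportion.

```python
def divide_object(length, values):
    """Split a span of `length` pixels into segments proportional to `values`."""
    total = sum(values)
    if total <= 0:
        raise ValueError("values must sum to a positive number")
    segments, start = [], 0.0
    for value in values:
        end = start + length * value / total
        segments.append((start, end))
        start = end
    return segments


# A 200-pixel-tall object split for two quantities in a 20:80 ratio.
print(divide_object(200, [20, 80]))
```

For an object with an irregular outline, equal lengths no longer mean equal areas, so the cut positions would have to be found from the object's mask instead.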

The workflow of embedding charts into images. [6]

So, from a given dataset, the relevant variables are selected and images corresponding to those variables are collected. When the user selects one of these images, charts are generated for the selected variables, to be embedded inside the selected image. At this stage, the user can either drag out an area on the image in which to embed the chart, or choose features from the image (e.g. Hough lines) to use as anchors for overlaying the chart.

Chart embedding with the masking technique. [6]

Chart embedding based on image features (Hough lines in this case). [6]

Overall, it is reasonable to represent trend or timeline data (line charts) with Hough lines, and pie charts, bar charts, etc. with the masking technique. To fine-tune these embeddings, different types of distortion can be calculated for each type of chart. For example, comparing the slope of the Hough lines with that of the line chart gives an estimate of how well the line chart is embedded in the image. These values are used to optimize the fit of the charts on the images to generate aesthetic info-images. Finally, all of this is implemented in an interface where users can apply their domain knowledge or design skills to fine-tune the automatically generated results.
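One plausible way to turn the slope comparison into a number is the angular difference between a Hough line and the overall trend of the line chart, as in the sketch below. The definition is an assumption for illustration, not the distortion measure used in [6]; the function names are hypothetical, and both inputs are assumed to be in the same coordinate system.

```python
import math


def line_angle(p, q):
    """Angle in degrees of the segment from point p to point q."""
    return math.degrees(math.atan2(q[1] - p[1], q[0] - p[0]))


def trend_angle(points):
    """Angle in degrees of the least-squares line through the chart's points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return math.degrees(math.atan2(sxy, sxx))


def slope_distortion(hough_segment, chart_points):
    """Absolute angular difference, between 0 and 90 degrees."""
    diff = abs(line_angle(*hough_segment) - trend_angle(chart_points)) % 180
    return min(diff, 180 - diff)


chart = [(0, 0), (1, 1), (2, 2)]                       # made-up line chart data
aligned = slope_distortion(((0, 0), (100, 100)), chart)
tilted = slope_distortion(((0, 0), (100, 0)), chart)
```

Here `aligned` is close to 0 because the chart's trend matches the line's direction, while `tilted` is close to 45. An optimizer could then rotate or rescale the chart to bring this value down.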

Conclusion

We discussed methods for generating infographics from different types of data: timelines, icons, text, and charts. Each of these methods focuses on a certain aspect of infographics, depending on the type of data it is trying to represent. Their design cues generally come from a survey of already existing infographics, and that information is then used to automate the process. This is still a new research area with a very promising future. Future research can explore a wider variety of infographics and combine existing techniques with new ones to create a more holistic, generalized way to automate or semi-automate the tedious process of infographic generation.

Translated from: https://towardsdatascience.com/information-organization-with-infographics-using-machine-learning-a-survey-54b2169c1f21
