Large-Scale Video Semantic Search with GNES and Tensorflow 2.0


https://github.com/gnes-ai/gnes?github.com


Background
Nov 22, 2019
Many people may know me from bert-as-service (and of course from Fashion-MNIST </bragging>). So when they first heard about my new project GNES: Generic Neural Elastic Search, people naturally thought I was building a semantic text search solution. But actually, GNES has a more ambitious goal: to become the next-generation semantic search engine for all content forms, including text, image, video and audio. In this post, I will show you how to use the latest GNES Flow API and Tensorflow 2.0 to build a video semantic search system. For the impatient, feel free to watch the teaser video below before continuing.


  • Formulating the Problem in GNES Framework
  • Preprocessing Videos
  • Encoding Chunks into Vectors
  • Indexing Chunks and Documents
  • Scoring Results
  • Putting it All Together
    • What Should We Send/Receive?
  • Summary

I plan to write a series on video semantic search using GNES; this article serves as the first part. Readers who are looking for benchmarking, evaluations and model comparisons, stay tuned and feel free to subscribe to my blog.

Formulating the Problem in GNES Framework

The data we are using is Tumblr GIF (TGIF) dataset, which contains 100K animated GIFs and 120K sentences describing visual contents. Our problem is the following: given a video database and a query video, find the top-k semantically related videos from the database.


a woman in a car is singing.

A well-dressed young guy with gelled red hair glides across a room and scans it with his eyes.

a man wearing a suit smiles at something in the distance.

“Semantic” is a casual and ambiguous word, I know. Depending on your application and scenario, it could mean motion-wise similar (sports videos), emotionally similar (e.g. memes), etc. For now, I will simply consider semantically related as visually similar.

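Concretely, "visually similar" is operationalized as nearness between embedding vectors, typically measured by cosine similarity. A minimal plain-NumPy sketch (illustrative only, not GNES code; the vectors are made up):

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# two hypothetical frame embeddings
u = np.array([0.2, 0.8, 0.1])
v = np.array([0.25, 0.75, 0.05])
print(cosine_sim(u, v))  # close to 1.0 for visually similar frames
```

The closer the score is to 1, the more similar two frames are under whatever notion of similarity the encoder has learned.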


Text descriptions of the videos, though potentially very useful, are ignored for the moment. We are not building a cross-modality search solution (e.g. from text to video or vice versa), nor do we leverage textual information when building the video search solution. Nonetheless, those text descriptions can be used to evaluate/compare the effectiveness of the system in a quantitative manner.


Putting the problem into the GNES framework, it breaks down into the following steps:


Index time

  • segment each video into workable semantic units (aka “Chunk” in GNES);
  • encode each chunk as a fixed-length vector;
  • store all vector representations in a vector database.
Query time

  • do steps 1 and 2 from index time for each incoming query;
  • retrieve relevant chunks from database;
  • aggregate the chunk-level score back to document-level;
  • return the top-k results to users.


If you find these steps hard to follow, then please first read this blog post to understand the philosophy behind GNES. These steps can be accomplished by using the preprocessor, encoder, indexer and router microservices in GNES. Before we dig into the concrete design of each service, we can first write down these two runtimes using the GNES Flow API.


```python
num_rep = 1

index_flow = (Flow()
              .add_preprocessor(name='chunk_proc', replicas=num_rep)
              .add_indexer(name='doc_idx')
              .add_encoder(replicas=num_rep, recv_from='chunk_proc')
              .add_indexer(name='vec_idx')
              .add_router(name='sync_barrier', yaml_path='BaseReduceRouter',
                          num_part=2, recv_from=['vec_idx', 'doc_idx']))

query_flow = (Flow()
              .add_preprocessor(name='chunk_proc', replicas=num_rep)
              .add_encoder(replicas=num_rep)
              .add_indexer(name='vec_idx')
              .add_router(name='scorer')
              .add_indexer(name='doc_idx', sorted_response='descend'))
```


One can visualize these flows via flow.build(backend=None).to_url(), which gives:

Index flow


Query flow


More usages and specifications of the GNES Flow API can be found in this post. We now move on to the concrete logic behind each component.

Preprocessing Videos

In the previous post, I stated that a good neural search is only possible when documents and queries are comparable semantic units. The preprocessor serves exactly this purpose. It segments a document into a list of semantic units, each of which is called a "chunk" in GNES. For video, a meaningful unary chunk could be a frame or a shot (i.e. a series of frames that runs for an uninterrupted period of time). In the Tumblr GIF dataset, most animations have fewer than three shots. Thus, I will simply use a frame as a chunk to represent the document.

GNES itself does not contain such a preprocessor (implementing all possible preprocessors/encoders is not the design philosophy of GNES), so we need to write our own. Thanks to the well-designed GNES component API, this can easily be done by inheriting from BaseImagePreprocessor and implementing apply(), for example:


```python
import numpy as np
from PIL import Image

from gnes.component import BaseImagePreprocessor
from gnes.proto import array2blob


class GifPreprocessor(BaseImagePreprocessor):
    img_shape = 96

    def apply(self, doc: 'gnes_pb2.Document') -> None:
        super().apply(doc)
        im = Image.open(doc.raw_bytes.decode())
        idx = 0
        # get_frames is defined in the accompanying gif_reader.py module
        for frame in get_frames(im):
            try:
                new_frame = frame.convert('RGB').resize([self.img_shape] * 2)
                img = (np.array(new_frame) / 255).astype(np.float32)
                c = doc.chunks.add()
                c.doc_id = doc.doc_id
                c.offset = idx
                c.weight = 1.
                c.blob.CopyFrom(array2blob(img))
            except Exception as ex:
                self.logger.error(ex)
            finally:
                idx = idx + 1
```


This preprocessor loads the animation, reads its frames into RGB format, resizes each of them to 96x96 and stores them in doc.chunks.blob as numpy.ndarray. At the moment we don't implement any keyframe detection in the preprocessor, so every chunk has a uniform weight, i.e. c.weight=1.

One may think of more sophisticated preprocessors. For example, smart sub-sampling to reduce the number of near-duplicated frames; using seam carving for better cropping and resizing of frames; or adding image effects and enhancements. Everything is possible, and I will leave these possibilities to the readers.

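To make the first idea concrete, here is a hypothetical sub-sampling step that drops a frame whenever it barely differs from the last kept one. The function name and the 0.05 threshold are my own choices, not part of GNES:

```python
import numpy as np

def subsample_frames(frames, threshold=0.05):
    """Keep a frame only if its mean absolute pixel difference from
    the last kept frame exceeds `threshold` (frames are float32
    arrays in [0, 1], as produced by the preprocessor above)."""
    kept = []
    for f in frames:
        if not kept or np.abs(f - kept[-1]).mean() > threshold:
            kept.append(f)
    return kept

# three frames: two near-duplicates and one clearly different
a = np.zeros((4, 4, 3), dtype=np.float32)
b = a + 0.01   # almost identical to a -> dropped
c = a + 0.5    # clearly different -> kept
print(len(subsample_frames([a, b, c])))  # 2
```

Such a filter could run inside apply() before chunks are added, trading a little recall for a smaller index.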
Encoding Chunks into Vectors

In the encoding step, we want to represent each chunk by a fixed-length vector. This can easily be done with the pretrained models in Tensorflow 2.0. For the sake of clarity and simplicity, we will employ MobileNetV2 as our encoder. The pretrained weights on ImageNet are downloaded automatically when instantiating the encoder in post_init. The full list of pretrained models can be found here.

```python
from typing import List

import numpy as np
import tensorflow as tf

from gnes.component import BaseImageEncoder
from gnes.helper import batching, as_numpy_array


class TF2ImageEncoder(BaseImageEncoder):
    batch_size = 128
    img_shape = 96
    pooling_strategy = 'avg'
    model_name = 'MobileNetV2'

    def post_init(self):
        self.model = getattr(tf.keras.applications, self.model_name)(
            input_shape=(self.img_shape, self.img_shape, 3),
            include_top=False,
            pooling=self.pooling_strategy,
            weights='imagenet')
        self.model.trainable = False

    @batching
    @as_numpy_array
    def encode(self, img: List['np.ndarray'], *args, **kwargs) -> np.ndarray:
        img = np.stack(img, axis=0)
        return self.model(img)
```


The code should be fairly straightforward. I create a new encoder class by inheriting from BaseImageEncoder, in which the most important function, encode(), simply calls the model to extract features. The batching decorator is a very handy helper for controlling the size of the data flowing into the encoder. After all, an OOM error is the last thing you want to see.

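To see why such a decorator matters, here is a stripped-down, hypothetical version of the idea (GNES's real batching helper is more featureful; this sketch only shows the split-then-concatenate behavior that caps how much data reaches the model at once):

```python
import functools
from typing import Callable, List

def simple_batching(batch_size: int) -> Callable:
    """Split a long input list into fixed-size batches, call the
    wrapped function on each batch, and concatenate the results."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(data: List, *args, **kwargs):
            out = []
            for i in range(0, len(data), batch_size):
                out.extend(fn(data[i:i + batch_size], *args, **kwargs))
            return out
        return wrapper
    return decorator

@simple_batching(batch_size=2)
def fake_encode(batch):
    # stand-in for a model forward pass on one batch
    return [x * 10 for x in batch]

print(fake_encode([1, 2, 3, 4, 5]))  # [10, 20, 30, 40, 50]
```

However many chunks arrive, the wrapped function only ever sees batch_size items at a time, which is exactly the property that keeps the encoder from running out of memory.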
Indexing Chunks and Documents

For indexing, I will use the built-in chunk indexers and document indexers of GNES. Chunk indexing is essentially vector indexing: we need to store a map of chunk ids and their corresponding vector representations. As GNES supports the Faiss indexer already, you don't need to write any Python code. Simply write a YAML config vec.yml as follows:

```yaml
!FaissIndexer
parameters:
  num_dim: -1  # automatically determined
  index_key: HNSW32
  data_path: $WORKDIR/idx.binary
gnes_config:
  name: my_vec_indexer  # a customized name
  work_dir: $WORKDIR
```


As at query time we are eventually interested in documents, not chunks, the map of doc ids to chunk ids should also be stored. This is essentially a key-value database, and a simple Python dict structure will do the job. Again, only a YAML config doc.yml is required:



```yaml
!DictIndexer
gnes_config:
  name: my_doc_indexer  # a customized name
  work_dir: $WORKDIR
```
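For intuition, such a key-value doc indexer amounts to little more than a dict from doc ids to serialized documents. A toy sketch (my own illustration, not GNES's actual DictIndexer):

```python
class ToyDocIndexer:
    """Minimal doc-id -> document store, illustrating the role the
    key-value indexer plays in the flow."""
    def __init__(self):
        self._store = {}

    def add(self, doc_ids, docs):
        # index time: remember which document each id refers to
        self._store.update(zip(doc_ids, docs))

    def query(self, doc_ids):
        # query time: map retrieved ids back to full documents
        return [self._store.get(d) for d in doc_ids]

idx = ToyDocIndexer()
idx.add([1, 2], ['cat.gif', 'dog.gif'])
print(idx.query([2, 1, 3]))  # ['dog.gif', 'cat.gif', None]
```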


Note that the doc indexer does not require the encoding step, so it can run in parallel with the chunk indexer. Notice how chunk_proc broadcasts its output to the encoder and the doc indexer, and how a sync barrier is placed afterwards to ensure all jobs are completed.

Scoring Results

Scoring is important but hard; it often requires domain-specific expertise and many iterations. You can simply take the average of all chunk scores as the document score, or you can weight chunks differently and combine them with some heuristics. In the current GNES, a scorer or ranker can be implemented by inheriting from BaseReduceRouter and overriding its apply method.

When designing your own score function, make sure to use the existing ones from gnes.score_fn.base as your basic building blocks. Stacking and combining these score functions can create a complicated yet explainable score function, greatly reducing the effort spent on debugging. Besides, all score functions from gnes.score_fn.base are trainable (via the .train() method), enabling advanced scoring techniques such as learning to rank.


```python
class ScoreOps:
    multiply = CombinedScoreFn('multiply')
    sum = CombinedScoreFn('sum')
    max = CombinedScoreFn('max')
    min = CombinedScoreFn('min')
    avg = CombinedScoreFn('avg')
    none = ModifierScoreFn('none')
    log = ModifierScoreFn('log')
    log1p = ModifierScoreFn('log1p')
    log2p = ModifierScoreFn('log2p')
    ln = ModifierScoreFn('ln')
    ln1p = ModifierScoreFn('ln1p')
    ln2p = ModifierScoreFn('ln2p')
    square = ModifierScoreFn('square')
    sqrt = ModifierScoreFn('sqrt')
    abs = ModifierScoreFn('abs')
    reciprocal = ModifierScoreFn('reciprocal')
    reciprocal1p = ModifierScoreFn('reciprocal1p')
    const = ConstScoreFn()
```
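For intuition about the aggregation itself, here is a plain-Python stand-in for the simplest strategy mentioned above: a weighted average of chunk scores per document (illustrative only, not the GNES ranker API; the tuple layout is my own):

```python
from collections import defaultdict

def doc_scores(chunk_hits):
    """chunk_hits: list of (doc_id, chunk_weight, chunk_score) tuples.
    Returns doc_id -> weighted average of its chunk scores."""
    num = defaultdict(float)
    den = defaultdict(float)
    for doc_id, w, s in chunk_hits:
        num[doc_id] += w * s
        den[doc_id] += w
    return {d: num[d] / den[d] for d in num}

# doc 7 matched with two chunks, doc 8 with one
hits = [(7, 1.0, 1.0), (7, 1.0, 0.5), (8, 1.0, 0.25)]
print(doc_scores(hits))  # {7: 0.75, 8: 0.25}
```

Because every chunk currently gets weight 1 from the preprocessor, this reduces to a plain average; a keyframe-aware preprocessor could bias the result toward the more informative frames simply by emitting larger weights.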


Putting it All Together

With all the YAML configs and Python modules we just made, we can import them into the flow by specifying py_path and yaml_path. Besides scaling out the preprocessor and encoder to 4 replicas, I also made a small tweak in the flow: I added a thumbnail preprocessor thumbnail_proc to store all extracted frames in a row as a JPEG file.


```python
replicas = 4

index_flow = (Flow()
              .add_preprocessor(name='chunk_proc', yaml_path='gif2chunk.yml',
                                py_path=['gif_reader.py', 'gif2chunk.py'],
                                replicas=replicas)
              .add_preprocessor(name='thumbnail_proc', yaml_path='chunks2jpg.yml',
                                py_path='chunks2jpg.py', replicas=replicas)
              .add_indexer(name='doc_idx', yaml_path='doc.yml')
              .add_encoder(yaml_path='encode.yml', py_path='encode.py',
                           replicas=replicas, recv_from='chunk_proc')
              .add_indexer(name='vec_idx', yaml_path='vec.yml')
              .add_router(name='sync_barrier', yaml_path='BaseReduceRouter',
                          num_part=2, recv_from=['vec_idx', 'doc_idx']))

query_flow = (Flow()
              .add_preprocessor(name='chunk_proc', yaml_path='gif2chunk.yml',
                                py_path=['gif_reader.py', 'gif2chunk.py'],
                                replicas=replicas)
              .add_preprocessor(name='thumbnail_proc', yaml_path='chunks2jpg.yml',
                                py_path='chunks2jpg.py', replicas=replicas)
              .add_encoder(yaml_path='encode.yml', py_path='encode.py',
                           replicas=replicas, recv_from='chunk_proc')
              .add_indexer(name='vec_idx', yaml_path='vec.yml')
              .add_router(name='scorer', yaml_path='score.yml', py_path='videoscorer.py')
              .add_indexer(name='doc_idx', yaml_path='doc.yml', sorted_response='descend')
              .add_router(name='sync_barrier', yaml_path='BaseReduceRouter',
                          num_part=2, recv_from=['thumbnail_proc', 'doc_idx']))
```


Visualizing these two flows gives:

Index flow


Query flow


What Should We Send/Receive?

Sending data to the flow is easy: simply build an Iterator[bytes] and feed it to flow.index(). The example below gets the absolute paths of all animation files and sends those paths to the flow:


```python
import glob

bytes_gen = (g.encode() for g in glob.glob('dataset/*.gif'))

with index_flow.build(backend='process') as fl:
    fl.index(bytes_gen, batch_size=64)
```


Of course, one can first read() the animation into memory and send the binary animation directly to the flow, but that would be very inefficient. We do not want IO ops to become the bottleneck, which is why we spawn four preprocessors in the flow.



The indexing procedure is pretty fast. On my i7-8850H desktop with no GPU, indexing the full dataset (~100K videos) takes 4 hours. Things can be much faster if you have a powerful GPU.



Once the flow is indexed, we can throw a video query at it and retrieve relevant videos. To do that, we randomly sample some videos as queries:


```python
bytes_gen = (g.encode() for g in random.sample(glob.glob(GIF_BLOB), num_docs))

with query_flow.build(backend='process') as fl:
    fl.query(bytes_gen, callback=dump_result_to_json, top_k=60, batch_size=32)
```


Note the callback=dump_result_to_json in the code. Every time a search result is returned, this callback function is invoked. In this example, I simply dump the search result into JSON format so that I can later visualize it in the web frontend.


```python
import json

from google.protobuf.json_format import MessageToDict

fp = open('/topk.json', 'w', encoding='utf8')


def dump_result_to_json(resp):
    # remove_envelope strips the GNES message envelope (helper defined elsewhere)
    resp = remove_envelope(resp)
    for r in resp.search.results:
        v = MessageToDict(r, including_default_value_fields=True)
        v['doc']['rawBytes'] = r.doc.raw_bytes.decode()
        for k, kk in zip(v['topkResults'], r.topk_results):
            k['doc']['rawBytes'] = kk.doc.raw_bytes.decode()
            k['score']['explained'] = json.loads(kk.score.explained)
        fp.write(json.dumps(v, sort_keys=True) + '\n')
```


Summary

Video semantic search is not only fun (seriously, I have spent even more time watching cat videos since building this system), but also has many uses in customer-facing applications, e.g. short-video apps and movie/film editors. Though it is too early to say GNES is the de facto solution to video semantic search, I hope this article sends a good signal: GNES goes far beyond bert-as-service, enabling the search of almost any content form, including text, image, video and audio.



In the second part, I will use the token-based similarity of textual descriptions (e.g. Rouge-L) as the ground truth to evaluate our video search system. I also plan to benchmark different pretrained models, preprocessors, indexers and their combinations. If you are interested in reading more on this thread or knowing more about my plans for GNES, stay tuned.


© 2017 - 2019 Han Xiao. Opinions are solely my own.
