Applying the ffmpeg API: Extracting Pictures from a Video

Published: 2023/11/27

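In recent years, short videos have captured the attention of countless internet users. Compared with the rich and entertaining content, we programmers are probably more interested in the underlying technology. This series of articles uses ffmpeg to walk through several video-processing cases. (When reposting, please credit breaksoftware's CSDN blog.)

Short videos are all stored on servers as files. Any file meant to be passed around easily has a well-defined format, and video is no exception. This series will not analyze those formats at the micro level, because doing so has little practical value. Instead we take a macro view: what information should a video file contain?

What we can say for certain is that, in most cases, we see the picture with our eyes and hear the sound with our ears. If we shut off either organ, we stop receiving the corresponding information, while the other organ keeps receiving its information exactly as before.

So for now we can split a video into at least two modules: picture and sound. How are these two modules combined? Are the picture and the sound of a very short time span (say, what we are seeing and hearing at this very moment) fused into a single "block"?

From a design standpoint, coupling is very bad. Fusing picture and sound information into one block would be a very strong form of coupling. A good design resembles the films we watched in cinemas as children (I do not know how films are projected today): one file plays the picture and another plays the sound. That way we can provide a Mandarin audio file, an English one, a French one, and so on, without touching the picture file. But the video files we watch on a PC are single standalone files, so how is that done?

The design therefore has to strike a balance between ease of use and maintainability: at the macro level the picture and sound are merged into one file, while at the micro level the picture and sound information remain separate. In ffmpeg terms: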

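The picture and the sound are each a stream, represented by the AVStream structure;
At the micro level their separation shows in the fact that each consists of independent packets, represented by AVPacket;
At the macro level they are merged by a video/audio multiplexer, the Muxer.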
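
To make the stream/packet/muxer relationship concrete, here is a minimal sketch in plain C++ that does not use ffmpeg at all. ToyPacket and toy_mux are hypothetical names invented for this illustration; a real muxer also writes headers, indexes and container-specific framing. The only point is that packets from different streams stay separate units and are merely interleaved in one sequence.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for AVPacket: the stream it belongs to, the time at
// which it should be presented, and an opaque payload.
struct ToyPacket {
    int stream_index;
    std::int64_t pts;
    std::string payload;
};

// Toy "muxer": merge the packets of several streams into one sequence ordered
// by presentation time. Packets are not fused; only their order is shared.
std::vector<ToyPacket> toy_mux(const std::vector<std::vector<ToyPacket>>& streams) {
    std::vector<ToyPacket> container;
    for (const auto& stream : streams) {
        container.insert(container.end(), stream.begin(), stream.end());
    }
    // stable_sort keeps the insertion order of packets with equal timestamps.
    std::stable_sort(container.begin(), container.end(),
                     [](const ToyPacket& a, const ToyPacket& b) { return a.pts < b.pts; });
    return container;
}
```

Swapping the audio stream for another language would only change one of the input vectors; the video packets are untouched, which is the decoupling described above.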

The code below uses the API of ffmpeg 4.0.2 as an example.

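extern "C" {
#include <libavcodec/avcodec.h>
#include <libavformat/avformat.h>
}
#include <cstdio>
#include <functional>
#include <iostream>
#include <memory>
#include <string>
#include <vector>

void get_video_pictures(const char* file_path) {
    AVFormatContext* avfmt_ctx = avformat_alloc_context();
    // On failure avformat_open_input frees the context itself and sets the
    // pointer to NULL, so ownership is handed to the smart pointer only after
    // the call has succeeded.
    if (avformat_open_input(&avfmt_ctx, file_path, NULL, NULL) < 0) {
        std::cerr << "avformat_open_input error" << std::endl;
        return;
    }
    std::unique_ptr<AVFormatContext, std::function<void(AVFormatContext*)>> avfmt_ctx_t(
        avfmt_ctx,
        [](AVFormatContext *s) {
            if (s) {
                avformat_close_input(&s);
            }
        });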

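First we need to construct an AVFormatContext object, which carries the context for analyzing the file. The notion of a Context is very important in ffmpeg: through its parameters we can influence ffmpeg's low-level behavior, and through it we can obtain information at the corresponding level. We will meet many kinds of Context later. They are used in a fairly fixed pattern:

Allocate with XXXXX_alloc_context. For AVFormatContext this is avformat_alloc_context.
Initialize with XXXXX_openXXX. For AVFormatContext this is avformat_open_input.
Release with XXXXX_free_context. For AVFormatContext this is avformat_free_context. Because avformat_close_input performs additional cleanup and calls avformat_free_context internally, we use it here instead.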
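
The allocate/open/free pattern maps naturally onto RAII. The sketch below shows one way to wrap it, using a made-up C-style API (ToyContext, toy_alloc_context, toy_open and toy_free_context are all hypothetical stand-ins, not ffmpeg functions) so that it compiles on its own. The free function takes a pointer to a pointer, which is the same shape as avcodec_free_context.

```cpp
#include <memory>

// Hypothetical C-style API that mimics the alloc/open/free shape of an
// ffmpeg context. None of these names exist in ffmpeg.
struct ToyContext { bool opened = false; };
static int g_live_contexts = 0;  // number of contexts not yet freed

ToyContext* toy_alloc_context() { ++g_live_contexts; return new ToyContext(); }
int toy_open(ToyContext* ctx) { ctx->opened = true; return 0; }
void toy_free_context(ToyContext** ctx) {  // frees *ctx and sets it to null
    if (ctx && *ctx) { delete *ctx; *ctx = nullptr; --g_live_contexts; }
}

// A stateless deleter type adapts the pointer-to-pointer free function to
// what std::unique_ptr expects, without the overhead of std::function.
struct ToyContextDeleter {
    void operator()(ToyContext* ctx) const { toy_free_context(&ctx); }
};
using ToyContextPtr = std::unique_ptr<ToyContext, ToyContextDeleter>;

// Allocate and open a context; returns an empty pointer on failure.
ToyContextPtr make_opened_context() {
    ToyContextPtr ctx(toy_alloc_context());
    if (!ctx || toy_open(ctx.get()) < 0) {
        return nullptr;  // a partially set up context is freed automatically
    }
    return ctx;
}
```

Whether a deleter struct or a std::function with a lambda is preferable is a matter of taste; the struct keeps the smart pointer the same size as a raw pointer.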

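AVFormatContext has two members related to streams (AVStream): nb_streams and streams. The latter is the address of the first element of an array of AVStream pointers, and the former is the number of elements in that array. We can iterate over all the streams: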

    for (unsigned int i = 0; i < avfmt_ctx->nb_streams; i++) {
        AVStream *st = avfmt_ctx->streams[i];

As mentioned earlier, picture and sound belong to different streams, so we can tell streams apart through AVStream::codecpar::codec_type:

enum AVMediaType {
    AVMEDIA_TYPE_UNKNOWN = -1,  ///< Usually treated as AVMEDIA_TYPE_DATA
    AVMEDIA_TYPE_VIDEO,
    AVMEDIA_TYPE_AUDIO,
    AVMEDIA_TYPE_DATA,          ///< Opaque data information usually continuous
    AVMEDIA_TYPE_SUBTITLE,
    AVMEDIA_TYPE_ATTACHMENT,    ///< Opaque data information usually sparse
    AVMEDIA_TYPE_NB
};

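Among these enumerators we also see AVMEDIA_TYPE_SUBTITLE, the subtitle stream type. This shows that subtitles need not be burned into the picture. In practice a player lets us choose among different subtitles and different dubbing languages (English/Chinese); these are all stored as streams inside the video file, which acts as a container, and there can be several of each. For example, the Chinese dub is one stream, the English dub is another, the Chinese subtitles are one stream, and the English subtitles are another.

As the title says, we want to extract pictures from the picture stream, so we work on streams of type AVMEDIA_TYPE_VIDEO: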

        if (st->codecpar->codec_type == AVMEDIA_TYPE_VIDEO) {
            std::unique_ptr<AVCodecContext, std::function<void(AVCodecContext*)>> avcodec_ctx(
                avcodec_alloc_context3(NULL),
                [](AVCodecContext *avctx) {
                    if (avctx) {
                        avcodec_free_context(&avctx);
                    }
                });
            if (0 > avcodec_parameters_to_context(avcodec_ctx.get(), st->codecpar)) {
                std::cerr << "avcodec_parameters_to_context error.stream " << i << std::endl;
                continue;
            }
            AVCodec *avcodec = avcodec_find_decoder(avcodec_ctx->codec_id);
            if (!avcodec) {  // no decoder is available for this codec id
                std::cerr << "Decoder not found" << std::endl;
                continue;
            }
            if (avcodec_open2(avcodec_ctx.get(), avcodec, NULL) < 0) {
                std::cerr << "Failed to open codec" << std::endl;
                continue;
            }
            save_video_pic(avfmt_ctx, i, avcodec_ctx.get());
        }
    }
}

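Each stream also has its own format. We need a decoder to decode and analyze the stream, which brings in the AVCodecContext structure. It follows the same usage pattern as the earlier Context:

Allocate with avcodec_alloc_context3;
Release with avcodec_free_context;
Initialize it from the stream's decoder information with avcodec_parameters_to_context;
Find the matching decoder with avcodec_find_decoder;
Open the context with avcodec_open2 and the decoder found above.

This time we do not use avcodec_close, the counterpart of avcodec_open2, because its documentation in version 4.0.2 says not to use it: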

/**
 * Close a given AVCodecContext and free all the data associated with it
 * (but not the AVCodecContext itself).
 *
 * Calling this function on an AVCodecContext that hasn't been opened will free
 * the codec-specific data allocated in avcodec_alloc_context3() with a non-NULL
 * codec. Subsequent calls will do nothing.
 *
 * @note Do not use this function. Use avcodec_free_context() to destroy a
 * codec context (either open or closed). Opening and closing a codec context
 * multiple times is not supported anymore -- use multiple codec contexts
 * instead.
 */
int avcodec_close(AVCodecContext *avctx);

Similarly, we do not use the AVCodecContext *codec member of AVStream directly, because it is marked as deprecated:

    attribute_deprecated
    AVCodecContext *codec;

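After opening a decoder context with avcodec_open2 we can start decoding. Before that we need to get familiar with two lower-level structures: AVPacket and AVFrame. AVPacket holds encoded (not yet decoded) data, while AVFrame holds raw (not yet encoded) data. So what av_read_frame reads from a video file is data that has not been decoded yet: an AVPacket.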

void save_video_pic(AVFormatContext *avfmt_ctx, int stream_index, AVCodecContext *avcodec_ctx) {
    // Rewind to the start of the file; a failed seek is reported but not fatal.
    if (av_seek_frame(avfmt_ctx, -1, avfmt_ctx->start_time, 0) < 0) {
        std::cerr << "av_seek_frame error" << std::endl;
    }
    do {
        std::unique_ptr<AVPacket, std::function<void(AVPacket*)>> avpacket_src(
            av_packet_alloc(),
            [](AVPacket *pkt) {
                if (pkt) {
                    av_packet_free(&pkt);
                }
            });
        av_init_packet(avpacket_src.get());
        if (av_read_frame(avfmt_ctx, avpacket_src.get()) < 0) {
            break;  // end of file or read error
        }
        if (avpacket_src->stream_index != stream_index) {
            continue;  // packet belongs to another stream
        }

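Note the stream_index comparison at the end of the snippet above: it checks whether the stream_index of the AVPacket just read equals the index of the video stream found earlier, and skips the packet otherwise. This shows that the AVPackets of different streams can be interleaved within the file. The design makes sense: picture, sound, subtitles and so on all have to be presented at the same moment, so reading and parsing sequentially reduces frequent seeking.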
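
The read-and-filter loop can be tried out without ffmpeg. In the sketch below, DemoPacket and DemoReader are hypothetical types made up for this illustration; DemoReader::read only imitates the way av_read_frame hands out packets in file order, whichever stream they belong to.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

struct DemoPacket { int stream_index; int pts; };  // hypothetical stand-in for AVPacket

// Hypothetical reader that returns packets in file order and reports false
// once the "file" is exhausted.
class DemoReader {
public:
    explicit DemoReader(std::vector<DemoPacket> packets) : packets_(std::move(packets)) {}
    bool read(DemoPacket& out) {
        if (pos_ >= packets_.size()) return false;
        out = packets_[pos_++];
        return true;
    }
private:
    std::vector<DemoPacket> packets_;
    std::size_t pos_ = 0;
};

// One sequential pass over the file that keeps only the packets of one stream.
std::vector<DemoPacket> read_stream(DemoReader& reader, int wanted_stream) {
    std::vector<DemoPacket> selected;
    DemoPacket pkt{};
    while (reader.read(pkt)) {
        if (pkt.stream_index != wanted_stream) continue;  // another stream's packet
        selected.push_back(pkt);
    }
    return selected;
}
```

A single forward pass is enough to collect one stream, and a player that needs all streams at once can dispatch each packet by its index in the same pass.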

Because encoding and decoding follow very similar procedures, I keep the intermediate results in a class template:

template<typename Component>
class AvComponentStore {
public:
    virtual ~AvComponentStore() = default;  // allow deletion through a base pointer
    virtual void save(Component *d) = 0;
};

template<typename Component>
class TransStore : public AvComponentStore<Component>
{
public:
    TransStore(std::function<Component*(const Component*)> clone, std::function<void(Component**)> free) {
        _clone = clone;
        _free = free;
    }

    // Release every stored clone with the supplied free function.
    ~TransStore() {
        for (auto it = _store.begin(); it != _store.end(); it++) {
            if (*it) {
                _free(&*it);
            }
        }
    }

public:
    // Call t on every stored element.
    void traverse(std::function<void(Component*)> t) {
        if (!t) {
            return;
        }
        for (auto it = _store.begin(); it != _store.end(); it++) {
            if (*it) {
                t(*it);
            }
        }
    }

public:
    // Store a clone of d; the store owns the clone.
    virtual void save(Component *d) {
        Component *p = _clone(d);
        _store.push_back(p);
    }

private:
    std::vector<Component*> _store;
    std::function<Component*(const Component*)> _clone;
    std::function<void(Component**)> _free;
};

using PacketsStore = TransStore<AVPacket>;
using FramesStore = TransStore<AVFrame>;

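FramesStore holds the results of decoding AVPackets. Each AVFrame produced along the way is duplicated with av_frame_clone, which creates a new AVFrame that references the same reference-counted data rather than copying the pixels. When the FramesStore object is destroyed, it releases these frames with av_frame_free.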
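
As a possible variation on the store, the clones can be held in std::unique_ptr so that no hand-written destructor is needed. This is only a sketch: OwningStore, ToyFrame, toy_frame_clone and toy_frame_free are hypothetical names, and the toy clone function makes a plain copy, whereas av_frame_clone shares reference-counted buffers.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Store that owns clones through unique_ptr; the deleter forwards to the
// supplied free function, which takes a pointer to a pointer.
template <typename T>
class OwningStore {
public:
    using Clone = std::function<T*(const T*)>;
    using Free = std::function<void(T**)>;

    OwningStore(Clone clone, Free free_fn)
        : clone_(std::move(clone)), free_(std::move(free_fn)) {}

    void save(const T* item) {
        T* copy = clone_(item);
        if (!copy) return;  // cloning failed, nothing to keep
        Free free_fn = free_;
        items_.emplace_back(copy, [free_fn](T* p) { free_fn(&p); });
    }

    void traverse(const std::function<void(T*)>& visit) const {
        if (!visit) return;
        for (const auto& item : items_) visit(item.get());
    }

    std::size_t size() const { return items_.size(); }

private:
    Clone clone_;
    Free free_;
    std::vector<std::unique_ptr<T, std::function<void(T*)>>> items_;
};

// Hypothetical stand-in for AVFrame, with clone/free functions shaped like
// av_frame_clone and av_frame_free.
struct ToyFrame { int value; };
ToyFrame* toy_frame_clone(const ToyFrame* src) { return new ToyFrame(*src); }
void toy_frame_free(ToyFrame** f) { delete *f; *f = nullptr; }
```

With the real types, the constructor arguments would be av_frame_clone and av_frame_free, exactly as for FramesStore.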

        std::shared_ptr<FramesStore> frames_store = std::make_shared<FramesStore>(av_frame_clone, av_frame_free);
        decode_packet(avcodec_ctx, avpacket_src.get(), frames_store);
        frames_store->traverse(traverse_frame);
    } while (true);
}

Decoding an AVPacket is done with avcodec_send_packet and avcodec_receive_frame. Semantically, we send not-yet-decoded data to a decoder context and then obtain the decoded data from that context.

int decode_packet(AVCodecContext *avctx, AVPacket *pkt, std::shared_ptr<FramesStore> store) {
    int ret = avcodec_send_packet(avctx, pkt);
    if (ret < 0 && ret != AVERROR_EOF) {
        return ret;
    }
    std::unique_ptr<AVFrame, std::function<void(AVFrame*)>> frame(
        av_frame_alloc(),
        [](AVFrame *frame) {
            if (frame) {
                av_frame_free(&frame);
            }
        });
    // One packet may yield zero, one or several frames, so drain the decoder
    // until it asks for more input (EAGAIN) or reports the end of the stream.
    while (true) {
        ret = avcodec_receive_frame(avctx, frame.get());
        if (ret == AVERROR(EAGAIN) || ret == AVERROR_EOF) {
            return 0;
        }
        if (ret < 0) {
            return ret;
        }
        store->save(frame.get());  // the store keeps its own reference
    }
}
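
The send/receive pair behaves like a small queue: a decoder may hold packets back before it emits a frame, and it releases the remainder only when it is told that the input has ended. The toy model below illustrates that behavior in plain C++. ToyDecoder and the TOY_* codes are hypothetical stand-ins for the decoder context and for 0, AVERROR(EAGAIN) and AVERROR_EOF; a null packet plays the role of the flush request.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical result codes standing in for 0, AVERROR(EAGAIN), AVERROR_EOF.
enum ToyStatus { TOY_OK = 0, TOY_AGAIN = -1, TOY_EOF = -2 };

// Toy decoder that holds one packet back before producing a frame, imitating
// codecs that need later packets to complete earlier frames.
class ToyDecoder {
public:
    // A null packet signals the end of the stream (flush).
    ToyStatus send_packet(const int* packet) {
        if (!packet) { flushing_ = true; return TOY_OK; }
        pending_.push_back(*packet);
        return TOY_OK;
    }
    ToyStatus receive_frame(int& frame) {
        const std::size_t keep_back = flushing_ ? 0 : 1;
        if (pending_.size() > keep_back) {
            frame = pending_.front() * 10;  // "decoding" is a multiplication here
            pending_.pop_front();
            return TOY_OK;
        }
        return flushing_ ? TOY_EOF : TOY_AGAIN;
    }
private:
    std::deque<int> pending_;
    bool flushing_ = false;
};

// Send one packet (or nullptr to flush), then drain every frame that is ready.
int decode_all_ready(ToyDecoder& dec, const int* packet, std::vector<int>& frames) {
    ToyStatus ret = dec.send_packet(packet);
    if (ret < 0) return ret;
    int frame = 0;
    while ((ret = dec.receive_frame(frame)) == TOY_OK) {
        frames.push_back(frame);
    }
    return (ret == TOY_AGAIN || ret == TOY_EOF) ? 0 : ret;
}
```

In this model the last frame only appears after the flush call, which suggests why a real extraction loop may want to send a final null packet once av_read_frame reports the end of the file.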

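Each decoded frame then has to be encoded into a picture file by a picture encoder.

Much as we built a decoder context earlier, we now build an encoder context. This time we use avcodec_find_encoder to look up the encoder: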

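void traverse_frame(AVFrame* avframe) {
    AVCodec *avcodec = avcodec_find_encoder(AV_CODEC_ID_MJPEG);
    if (!avcodec) {  // this ffmpeg build has no MJPEG encoder
        std::cerr << "MJPEG encoder not found" << std::endl;
        return;
    }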

Then we use avcodec_open2 to open a context associated with that encoder:

    std::unique_ptr<AVCodecContext, std::function<void(AVCodecContext*)>> avcodec_ctx_output(
        avcodec_alloc_context3(avcodec),
        [](AVCodecContext *avctx) {
            if (avctx) {
                avcodec_free_context(&avctx);
            }
        });
    avcodec_ctx_output->width = avframe->width;
    avcodec_ctx_output->height = avframe->height;
    avcodec_ctx_output->time_base.num = 1;
    avcodec_ctx_output->time_base.den = 1000;
    // The pixel format is hard-coded, which assumes the decoded frame is
    // planar 4:2:0 YUV; other formats would have to be converted first.
    avcodec_ctx_output->pix_fmt = AV_PIX_FMT_YUVJ420P;
    avcodec_ctx_output->codec_id = avcodec->id;
    avcodec_ctx_output->codec_type = AVMEDIA_TYPE_VIDEO;
    if (avcodec_open2(avcodec_ctx_output.get(), avcodec, nullptr) < 0) {
        std::cerr << "Failed to open codec" << std::endl;
        return;
    }

The encode_frame function packs each AVFrame into one or more AVPackets and saves them in a PacketsStore object:

    std::shared_ptr<PacketsStore> packets_store = std::make_shared<PacketsStore>(av_packet_clone, av_packet_free);
    if (encode_frame(avcodec_ctx_output.get(), avframe, packets_store) < 0) {
        std::cerr << "encode_frame error" << std::endl;
        return;
    }

Encoding uses avcodec_send_frame and avcodec_receive_packet. Semantically, we send raw (not yet encoded) data to an encoder context and then obtain the encoded data from that context.

int encode_frame(AVCodecContext *c, AVFrame *frame, std::shared_ptr<PacketsStore> store) {
    int ret;
    int size = 0;
    std::unique_ptr<AVPacket, std::function<void(AVPacket*)>> pkt(
        av_packet_alloc(),
        [](AVPacket *pkt) {
            if (pkt) {
                av_packet_free(&pkt);
            }
        });
    av_init_packet(pkt.get());
    ret = avcodec_send_frame(c, frame);
    if (ret < 0) {
        return ret;
    }
    do {
        ret = avcodec_receive_packet(c, pkt.get());
        if (ret >= 0) {
            store->save(pkt.get());
            size += pkt->size;
            av_packet_unref(pkt.get());
        }
        else if (ret < 0 && ret != AVERROR(EAGAIN) && ret != AVERROR_EOF) {
            return ret;
        }
    } while (ret >= 0);
    return size;
}

Once the data is encoded, we save it to a file.

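    // gen_pic_name is a helper whose definition is not shown in this article.
    std::string file_name = gen_pic_name(avframe);
    std::unique_ptr<std::FILE, std::function<int(FILE*)>> file(std::fopen(file_name.c_str(), "wb"), std::fclose);
    if (!file) {  // fopen failed, so there is nothing to write to
        std::cerr << "Failed to open " << file_name << std::endl;
        return;
    }
    packets_store->traverse([&file](AVPacket* packet) {
        fwrite(packet->data, 1, packet->size, file.get());
    });
}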
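
The gen_pic_name helper used above is not shown in the article. Purely as an illustration of what such a helper might look like, here is a hypothetical make_pic_name that builds a file name from a timestamp. The name, the parameters and the naming scheme are all assumptions; with a real AVFrame one might pass avframe->pts as the first argument.

```cpp
#include <cstdint>
#include <string>

// Hypothetical naming helper, not the author's gen_pic_name: builds
// "<dir>/frame_<pts>.jpg" from a frame timestamp.
std::string make_pic_name(std::int64_t pts, const std::string& dir = ".") {
    return dir + "/frame_" + std::to_string(pts) + ".jpg";
}
```

Naming files by timestamp keeps the names unique as long as no two frames of the stream share a pts value.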
