DataLoader and Dataset

發(fā)布時(shí)間:2025/4/5 编程问答 31 豆豆
生活随笔 收集整理的這篇文章主要介紹了 DataLoader 与 Dataset 小編覺得挺不錯(cuò)的,現(xiàn)在分享給大家,幫大家做個(gè)參考.

1. Overview

A DataLoader combines a Dataset and a Sampler and provides an iterable over the given dataset, with optional automatic batching (collation), shuffling, memory pinning, and single- or multi-process loading.

2. Detailed Walkthrough

DataLoader source code

class DataLoader(Generic[T_co]):r"""Data loader. Combines a dataset and a sampler, and provides an iterable overthe given dataset.The :class:`~torch.utils.data.DataLoader` supports both map-style anditerable-style datasets with single- or multi-process loading, customizingloading order and optional automatic batching (collation) and memory pinning.See :py:mod:`torch.utils.data` documentation page for more details.Arguments:dataset (Dataset): dataset from which to load the data.batch_size (int, optional): how many samples per batch to load(default: ``1``).shuffle (bool, optional): set to ``True`` to have the data reshuffledat every epoch (default: ``False``).sampler (Sampler or Iterable, optional): defines the strategy to drawsamples from the dataset. Can be any ``Iterable`` with ``__len__``implemented. If specified, :attr:`shuffle` must not be specified.batch_sampler (Sampler or Iterable, optional): like :attr:`sampler`, butreturns a batch of indices at a time. Mutually exclusive with:attr:`batch_size`, :attr:`shuffle`, :attr:`sampler`,and :attr:`drop_last`.num_workers (int, optional): how many subprocesses to use for dataloading. ``0`` means that the data will be loaded in the main process.(default: ``0``)collate_fn (callable, optional): merges a list of samples to form amini-batch of Tensor(s). Used when using batched loading from amap-style dataset.pin_memory (bool, optional): If ``True``, the data loader will copy Tensorsinto CUDA pinned memory before returning them. If your data elementsare a custom type, or your :attr:`collate_fn` returns a batch that is a custom type,see the example below.drop_last (bool, optional): set to ``True`` to drop the last incomplete batch,if the dataset size is not divisible by the batch size. If ``False`` andthe size of dataset is not divisible by the batch size, then the last batchwill be smaller. (default: ``False``)timeout (numeric, optional): if positive, the timeout value for collecting a batchfrom workers. Should always be non-negative. (default: ``0``)worker_init_fn (callable, optional): If not ``None``, this will be called on eachworker subprocess with the worker id (an int in ``[0, num_workers - 1]``) asinput, after seeding and before data loading. (default: ``None``)prefetch_factor (int, optional, keyword-only arg): Number of sample loadedin advance by each worker. ``2`` means there will be a total of2 * num_workers samples prefetched across all workers. (default: ``2``)persistent_workers (bool, optional): If ``True``, the data loader will not shutdownthe worker processes after a dataset has been consumed once. This allows to maintain the workers `Dataset` instances alive. (default: ``False``).. warning:: If the ``spawn`` start method is used, :attr:`worker_init_fn`cannot be an unpicklable object, e.g., a lambda function. See:ref:`multiprocessing-best-practices` on more details relatedto multiprocessing in PyTorch... warning:: ``len(dataloader)`` heuristic is based on the length of the sampler used.When :attr:`dataset` is an :class:`~torch.utils.data.IterableDataset`,it instead returns an estimate based on ``len(dataset) / batch_size``, with properrounding depending on :attr:`drop_last`, regardless of multi-process loadingconfigurations. 
This represents the best guess PyTorch can make because PyTorchtrusts user :attr:`dataset` code in correctly handling multi-processloading to avoid duplicate data.However, if sharding results in multiple workers having incomplete last batches,this estimate can still be inaccurate, because (1) an otherwise complete batch canbe broken into multiple ones and (2) more than one batch worth of samples can bedropped when :attr:`drop_last` is set. Unfortunately, PyTorch can not detect suchcases in general.See `Dataset Types`_ for more details on these two types of datasets and how:class:`~torch.utils.data.IterableDataset` interacts with`Multi-process data loading`_."""dataset: Dataset[T_co]batch_size: Optional[int]num_workers: intpin_memory: booldrop_last: booltimeout: floatsampler: Samplerprefetch_factor: int_iterator : Optional['_BaseDataLoaderIter']__initialized = Falsedef __init__(self, dataset: Dataset[T_co], batch_size: Optional[int] = 1,shuffle: bool = False, sampler: Optional[Sampler[int]] = None,batch_sampler: Optional[Sampler[Sequence[int]]] = None,num_workers: int = 0, collate_fn: _collate_fn_t = None,pin_memory: bool = False, drop_last: bool = False,timeout: float = 0, worker_init_fn: _worker_init_fn_t = None,multiprocessing_context=None, generator=None,*, prefetch_factor: int = 2,persistent_workers: bool = False):torch._C._log_api_usage_once("python.data_loader") # type: ignoreif num_workers < 0:raise ValueError('num_workers option should be non-negative; ''use num_workers=0 to disable multiprocessing.')if timeout < 0:raise ValueError('timeout option should be non-negative')if num_workers == 0 and prefetch_factor != 2:raise ValueError('prefetch_factor option could only be specified in multiprocessing.''let num_workers > 0 to enable multiprocessing.')assert prefetch_factor > 0if persistent_workers and num_workers == 0:raise ValueError('persistent_workers option needs num_workers > 0')self.dataset = datasetself.num_workers = num_workersself.prefetch_factor = prefetch_factorself.pin_memory = pin_memoryself.timeout = timeoutself.worker_init_fn = worker_init_fnself.multiprocessing_context = multiprocessing_context# Arg-check dataset related before checking samplers because we want to# tell users that iterable-style datasets are incompatible with custom# samplers first, so that they don't learn that this combo doesn't work# after spending time fixing the custom sampler errors.if isinstance(dataset, IterableDataset):self._dataset_kind = _DatasetKind.Iterable# NOTE [ Custom Samplers and IterableDataset ]## `IterableDataset` does not support custom `batch_sampler` or# `sampler` since the key is irrelevant (unless we support# generator-style dataset one day...).## For `sampler`, we always create a dummy sampler. This is an# infinite sampler even when the dataset may have an implemented# finite `__len__` because in multi-process data loading, naive# settings will return duplicated data (which may be desired), and# thus using a sampler with length matching that of dataset will# cause data lost (you may have duplicates of the first couple# batches, but never see anything afterwards). Therefore,# `Iterabledataset` always uses an infinite sampler, an instance of# `_InfiniteConstantSampler` defined above.## A custom `batch_sampler` essentially only controls the batch size.# However, it is unclear how useful it would be since an iterable-style# dataset can handle that within itself. 
Moreover, it is pointless# in multi-process data loading as the assignment order of batches# to workers is an implementation detail so users can not control# how to batchify each worker's iterable. Thus, we disable this# option. If this turns out to be useful in future, we can re-enable# this, and support custom samplers that specify the assignments to# specific workers.if shuffle is not False:raise ValueError("DataLoader with IterableDataset: expected unspecified ""shuffle option, but got shuffle={}".format(shuffle))elif sampler is not None:# See NOTE [ Custom Samplers and IterableDataset ]raise ValueError("DataLoader with IterableDataset: expected unspecified ""sampler option, but got sampler={}".format(sampler))elif batch_sampler is not None:# See NOTE [ Custom Samplers and IterableDataset ]raise ValueError("DataLoader with IterableDataset: expected unspecified ""batch_sampler option, but got batch_sampler={}".format(batch_sampler))else:self._dataset_kind = _DatasetKind.Mapif sampler is not None and shuffle:raise ValueError('sampler option is mutually exclusive with ''shuffle')if batch_sampler is not None:# auto_collation with custom batch_samplerif batch_size != 1 or shuffle or sampler is not None or drop_last:raise ValueError('batch_sampler option is mutually exclusive ''with batch_size, shuffle, sampler, and ''drop_last')batch_size = Nonedrop_last = Falseelif batch_size is None:# no auto_collationif drop_last:raise ValueError('batch_size=None option disables auto-batching ''and is mutually exclusive with drop_last')if sampler is None: # give default samplersif self._dataset_kind == _DatasetKind.Iterable:# See NOTE [ Custom Samplers and IterableDataset ]sampler = _InfiniteConstantSampler()else: # map-styleif shuffle:# Cannot statically verify that dataset is Sized# Somewhat related: see NOTE [ Lack of Default `__len__` in Python Abstract Base Classes ]sampler = RandomSampler(dataset, generator=generator) # type: ignoreelse:sampler = SequentialSampler(dataset)if batch_size is not None and batch_sampler is None:# auto_collation without custom batch_samplerbatch_sampler = BatchSampler(sampler, batch_size, drop_last)self.batch_size = batch_sizeself.drop_last = drop_lastself.sampler = samplerself.batch_sampler = batch_samplerself.generator = generatorif collate_fn is None:if self._auto_collation:collate_fn = _utils.collate.default_collateelse:collate_fn = _utils.collate.default_convertself.collate_fn = collate_fnself.persistent_workers = persistent_workersself.__initialized = Trueself._IterableDataset_len_called = None # See NOTE [ IterableDataset and __len__ ]self._iterator = None
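As a quick illustration of the sampler rules enforced in `__init__` above, here is a minimal sketch (not from the original post; it only uses standard `torch.utils.data` classes): when no sampler is given, the default sampler depends on `shuffle`, and an explicit sampler is mutually exclusive with `shuffle=True`.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, SequentialSampler

# Toy map-style dataset with 10 scalar samples.
ds = TensorDataset(torch.arange(10).float())

# No sampler given: DataLoader picks a default sampler based on `shuffle`.
print(type(DataLoader(ds, shuffle=True).sampler))   # RandomSampler
print(type(DataLoader(ds, shuffle=False).sampler))  # SequentialSampler

# A custom sampler and shuffle=True are mutually exclusive (see __init__ above).
try:
    DataLoader(ds, sampler=SequentialSampler(ds), shuffle=True)
except ValueError as e:
    print(e)  # "sampler option is mutually exclusive with shuffle"
```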

The main constructor parameters are summarized below:

```python
DataLoader(dataset,                       # Dataset object: decides where and how the data is read
           batch_size=1,                  # size of each mini-batch
           shuffle=False,                 # whether to reshuffle the data every epoch
           sampler=None,
           batch_sampler=None,
           num_workers=0,                 # number of worker subprocesses (0 = load in the main process)
           collate_fn=None,
           pin_memory=False,
           drop_last=False,               # drop the last incomplete batch if the sample count
                                          # is not divisible by batch_size
           timeout=0,
           worker_init_fn=None,
           multiprocessing_context=None)
```

Purpose: build an iterable data loader over a dataset.

- dataset: a Dataset object; decides where the data is read from and how it is read
- batch_size: batch size
- num_workers: number of subprocesses used to read data (0 means the data is read in the main process)
- shuffle: whether to reshuffle the data at every epoch
- drop_last: when the number of samples is not divisible by batch_size, whether to discard the last incomplete batch

Epoch, Iteration and batch size:

- Epoch: one full pass in which all training samples have been fed to the model
- Iteration: feeding one batch of samples to the model
- Batch size: the number of samples per batch; it determines how many Iterations one Epoch contains

For example, with 81 samples and batch_size = 8: 1 Epoch = 10 Iterations if drop_last=True, and 1 Epoch = 11 Iterations if drop_last=False (a runnable sketch of this arithmetic follows below).

Dataset

torch.utils.data.Dataset

Purpose: the abstract Dataset class. Every custom dataset must inherit from it and override __getitem__, which receives an index and returns a sample.

```python
class Dataset(Generic[T_co]):
    r"""An abstract class representing a :class:`Dataset`.

    All datasets that represent a map from keys to data samples should subclass
    it. All subclasses should overwrite :meth:`__getitem__`, supporting fetching a
    data sample for a given key. Subclasses could also optionally overwrite
    :meth:`__len__`, which is expected to return the size of the dataset by many
    :class:`~torch.utils.data.Sampler` implementations and the default options
    of :class:`~torch.utils.data.DataLoader`.

    .. note::
      :class:`~torch.utils.data.DataLoader` by default constructs a index
      sampler that yields integral indices. To make it work with a map-style
      dataset with non-integral indices/keys, a custom sampler must be provided.
    """

    def __getitem__(self, index) -> T_co:
        raise NotImplementedError

    def __add__(self, other: 'Dataset[T_co]') -> 'ConcatDataset[T_co]':
        return ConcatDataset([self, other])
```

Example:

```python
# Example: a custom map-style dataset that reads images from disk
class Dataset(object):
    def __getitem__(self, index):
        path_img, label = self.data_info[index]
        img = Image.open(path_img).convert('RGB')  # 0~255
        if self.transform is not None:
            img = self.transform(img)
        return img, label
```
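To make the Epoch/Iteration arithmetic and the `__getitem__` contract concrete, here is a self-contained sketch; `ToyDataset` is a made-up example class that returns random tensors instead of images.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class ToyDataset(Dataset):
    """Map-style dataset: __getitem__ takes an index and returns one sample."""
    def __init__(self, n=81):
        self.data = torch.randn(n, 3)            # stand-in for real samples
        self.labels = torch.randint(0, 2, (n,))  # stand-in for labels

    def __getitem__(self, index):
        return self.data[index], self.labels[index]

    def __len__(self):
        return len(self.data)

ds = ToyDataset(n=81)

# 81 samples with batch_size=8:
#   drop_last=True  -> 10 iterations per epoch (incomplete last batch is dropped)
#   drop_last=False -> 11 iterations per epoch (last batch holds only 1 sample)
for drop_last in (True, False):
    loader = DataLoader(ds, batch_size=8, shuffle=True, drop_last=drop_last)
    print(drop_last, len(loader))  # prints 10, then 11
```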

1. Which data to read: the indices output by the Sampler

2. Where to read the data from: the data_dir held by the Dataset

3. How to read the data: the Dataset's __getitem__
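These three points can be traced in a simplified, single-process sketch of what one DataLoader pass does (an illustration, not the actual PyTorch implementation): the batch sampler yields the indices to read, `Dataset.__getitem__` fetches each sample, and `collate_fn` merges the samples into a batch.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

ds = TensorDataset(torch.arange(20).float())
loader = DataLoader(ds, batch_size=8, shuffle=False)

# Simplified view of one epoch (single-process, auto-batching):
# 1. which data to read  -> the index lists yielded by loader.batch_sampler
# 2. where / how to read -> loader.dataset[i], i.e. Dataset.__getitem__
# 3. how to merge        -> loader.collate_fn (default_collate unless overridden)
for batch_indices in loader.batch_sampler:
    samples = [loader.dataset[i] for i in batch_indices]
    batch = loader.collate_fn(samples)
    print(batch_indices, batch[0].shape)
```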

Summary

The Sampler decides which indices to read, the Dataset's __getitem__ decides where and how each sample is read, and the DataLoader ties them together into an iterable of batches.
