Modelling a Pandemic

Published: 2023/11/29

Complex Systems

In our daily lives, we encounter many complex systems in which individuals interact with each other, such as the stock market or rush hour traffic. Finding appropriate models for these complex systems may give us a better understanding of their dynamics and allow us to simulate their behaviour under changing conditions. One way of modelling complex systems is to use agent-based models, meaning that we explicitly simulate individuals and their interactions instead of deriving the dynamics of the system in an aggregate way.

In this post, we want to develop such an agent-based model using Python. As an example, we try to model the behaviour of a pandemic. Please note that I am not an epidemiologist. The goal of this post is not to build a sophisticated model capable of making real-life predictions, but rather to see how we can build a simple agent-based model and study some of the resulting dynamics. Let's start with some basic considerations.

Foundations of Our Model

For our example, we assume a non-lethal disease that may spread between individuals who have been in contact with each other. The most basic approach is to consider three different groups:

Individuals that are not yet infected, called the susceptible group.

Individuals that are infected and may spread the disease.

Individuals that have recovered from the disease and are now immune.

Because of the three involved groups (Susceptible, Infected, Recovered), these models are also called SIR-Models.

Analytical SIR-Model

We will start with a mathematical SIR-model that will serve as our benchmark. In the basic SIR-model, the flow between the three groups is: S -> I -> R. It is a one-way street: in the beginning most individuals are in the S group, and they eventually cascade via the I group into the R group. At each time step t, a certain number of individuals move from S to I and from I to R, while the total number of individuals N = S+I+R stays constant. We can write these dynamics as a set of differential equations or, in a slightly more accessible form, we can write down by how much each of the groups changes in a given time step:

Figure: Basic SIR-Model
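
For reference, the per-step changes of the basic SIR-model are commonly written as below. This is the standard textbook form, given here as a reconstruction and not a copy of the original figure; β denotes the infection rate, γ the recovery rate, and N = S + I + R the constant population size.

```latex
\begin{aligned}
\Delta S &= -\beta \, \frac{S \, I}{N} \\
\Delta I &= \beta \, \frac{S \, I}{N} - \gamma \, I \\
\Delta R &= \gamma \, I
\end{aligned}
```

The three changes sum to zero, which is why the total N stays constant.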

The dynamics are governed by two parameters, β and γ. While β is the rate at which infectious individuals infect others, γ is the rate at which infectious individuals recover. These dynamics are visualized below for a fixed β and γ:

You can see that the number of infected individuals grows fast, peaking around day 40, which is when the number of susceptible individuals has dropped significantly, slowing down the rate of infections. This is simply because by then a significant number of individuals have already had the disease and cannot be infected anymore. Towards the end, the number of infected individuals drops to zero, eradicating the disease. Note that by then around 20% of the individuals were never infected. This so-called steady-state solution can also be calculated analytically and depends on the parameters β and γ.
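
The benchmark dynamics can be reproduced with a few lines of Python. The following is a minimal sketch, not the author's code: the function name `simulate_sir` is hypothetical, and the default values (β = 0.225, γ = 0.1, 10'000 individuals, 5 initially infected) are borrowed from the agent-based experiment later in this post on the assumption that they are a reasonable choice here as well.

```python
import numpy as np

def simulate_sir(n=10_000, i0=5, beta=0.225, gamma=0.1, n_steps=300):
    """Step the basic SIR-model forward one day at a time.

    Returns an array of shape (n_steps + 1, 3) holding S, I and R per day.
    All default values are assumptions made for illustration.
    """
    s, i, r = float(n - i0), float(i0), 0.0
    history = [(s, i, r)]
    for _ in range(n_steps):
        new_infections = beta * s * i / n   # flow from S to I
        new_recoveries = gamma * i          # flow from I to R
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return np.array(history)

history = simulate_sir()
peak_day = int(np.argmax(history[:, 1]))       # day with the most infected
never_infected = history[-1, 0] / 10_000       # remaining susceptible share
print(peak_day, round(never_infected, 3))
```

The day of the peak and the share of individuals who are never infected both depend on the chosen β and γ, so they need not match the figure described above exactly.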

With this simple SIR-model we can already observe some basic dynamics for our problem. However, we are looking at our groups only in an aggregate way. We assume that the individuals are a homogeneous, unstructured set organized into three well-defined, perfectly mixed groups. The interactions are modeled only on average: every infected individual infects a fixed number of contacts each day, and a constant fraction of all infected individuals is cured each day. There is no way of implementing complex social interactions between individuals within this model. In order to relax some of these assumptions, we will now set up an agent-based model that simulates each individual separately.

Agent-Based Model

Our first goal is to reproduce the results from the analytical SIR-model. As a data structure we want to use pandas dataframes. Let's start with initializing 10'000 agents represented as rows in the dataframe:

Currently, the dataframe has only one column, called state, which indicates the health state of the agent. We encode susceptible as 0, infected as 1 and recovered as 2.
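
The initialization code itself is not reproduced in this copy of the post. A minimal sketch of what it could look like is shown below; the name `init` and its two arguments are taken from the experiment loop further down, while the body is an assumption.

```python
import numpy as np
import pandas as pd

def init(nAgents=10_000, nPatientZero=5):
    """Create one row per agent with a single 'state' column.

    State encoding: 0 = susceptible, 1 = infected, 2 = recovered.
    The body of this function is an assumed implementation.
    """
    state = np.zeros(nAgents, dtype=int)
    # Pick the initially infected agents at random, without repetition
    patientZero = np.random.choice(nAgents, nPatientZero, replace=False)
    state[patientZero] = 1
    return pd.DataFrame({"state": state})

df = init(10_000, 5)
print(df["state"].value_counts())
```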

Now we need some function that infects an agent. We want this function to take a list of agents that were in contact with an infected agent. Additionally, we want to give a probability with which these contacts actually get infected. Here some Monte Carlo methods come into play in order to add randomness. The function below does the required job.

import numpy as np

def infect(df, contacts, probability=1.0):
    unique, counts = np.unique(contacts, return_counts=True)
    roll = np.random.uniform(0, 1, len(unique))
    # accounts for several contacts of the same agent
    probability = 1 - np.power(1 - probability, counts)
    change = np.array(roll <= probability).astype(int)
    state = df.loc[unique, "state"]
    # If change == 0, state is not updated
    # If change == 1, change the state only if the agent belongs
    # to the susceptible group: state 0 -> 1, 1 -> 1, 2 -> 2
    df.loc[unique, "state"] = state + change * np.maximum((1 - state), 0)

The list of contacts may contain the same agent multiple times. We roll a random number between 0 and 1 for each unique agent in the contact list and update the state from susceptible (0) to infected (1) if this roll is below a probability threshold. For an agent that appears several times in the list, the threshold is raised to the chance of being infected by at least one of these contacts. The last line of the function updates the state column accordingly.
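
The adjustment for repeated contacts can be looked at in isolation. The sketch below uses a hypothetical helper, `combined_probability`, that applies the same expression as the function above: the chance that at least one of several independent contacts leads to an infection.

```python
import numpy as np

def combined_probability(probability, counts):
    """Chance of at least one infection in `counts` independent contacts."""
    return 1 - np.power(1 - probability, counts)

# With a 50% chance per contact, one, two and three contacts
# give a combined chance of 0.5, 0.75 and 0.875
counts = np.array([1, 2, 3])
print(combined_probability(0.5, counts))
```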

Similarly, we need a function that recovers infected agents with a certain probability. Here, we use a flat chance of recovery in every time step.

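def recover(df, probability):
    # One roll per currently infected agent
    roll = np.random.uniform(0, 1, len(df[df["state"] == 1]))
    chance = np.array(roll <= probability).astype(int)
    # Infected agents stay at 1 or move on to recovered (2)
    df.loc[df["state"] == 1, "state"] = 1 + chance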

The infect and recover functions are called at every time step. For this we create a step function. Here, we generate the list of random contacts, whose length is a constant times the number of infected agents.

def step(df):
    nInfected = np.sum(df["state"] == 1)
    # Draw a fixed number of random contacts per infected agent
    contacts = np.random.choice(df.index, _randomContacts * nInfected, replace=True)
    infect(df, contacts, _chanceOfInfection)
    recover(df, _chanceOfRecovery)

In order to get a feeling for the variations in the outcome of our agent based model we will run the simulation ten times. For each experiment we initialize a set of 10'000 agents with 5 infected patients zero to start with. We then perform 150 time steps.

from tqdm import tqdm

_nExperiments = 10
_nAgents = 10000
_nSteps = 150
_nPatientZero = 5

for iExp in range(_nExperiments):
    df = init(_nAgents, _nPatientZero)
    for i in tqdm(range(_nSteps)):
        step(df)
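
To plot the group sizes afterwards, the counts have to be recorded at every step. The post does not show how this is done; one possible approach, assuming the 0/1/2 state encoding from above, is a small helper (here given the hypothetical name `count_states`) that is called after each `step(df)`.

```python
import numpy as np
import pandas as pd

def count_states(df):
    """Return the number of susceptible, infected and recovered agents."""
    # minlength=3 keeps all three groups even if one of them is empty
    return np.bincount(df["state"].to_numpy().astype(int), minlength=3)

# Small hand-made example: 2 susceptible, 1 infected, 3 recovered
df = pd.DataFrame({"state": [0, 0, 1, 2, 2, 2]})
nSusceptible, nInfected, nRecovered = count_states(df)
print(nSusceptible, nInfected, nRecovered)
```

Appending the result to a list inside the inner loop would give one row of three counts per time step and experiment.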

Baseline Results

Visualizing the size of each of the three groups (susceptible, infected and recovered) at each time step, we can see that the dynamics of our agent based model are in agreement with the basic SIR-model.

Figure: β=0.225 and γ=0.1.

The solid lines show the median of our 10 runs of the simulation, while the shaded area shows the range between the 25% and 75% quantiles. Even though there is some variance in the central part of the simulation, all runs arrive at a very similar endpoint, which matches the analytical solution.
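
The median and the quantile band can be computed with NumPy once the counts of all runs are stacked into one array. The sketch below uses random placeholder numbers in place of real simulation output, so only the shapes and the calls are meaningful; the array layout (runs along the first axis, time steps along the second) is an assumption.

```python
import numpy as np

# Placeholder data standing in for the infected counts of 10 runs x 150 steps
rng = np.random.default_rng(0)
infected = rng.integers(0, 1000, size=(10, 150))

# Statistics across runs, one value per time step
median = np.median(infected, axis=0)
lower, upper = np.quantile(infected, [0.25, 0.75], axis=0)
print(median.shape, lower.shape, upper.shape)
```

With matplotlib, `median` would be drawn as the solid line and the band between `lower` and `upper` with `fill_between`.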

Up to now we have not gained much in comparison to the basic SIR-model, but we have set up an agent-based baseline model and verified that it behaves similarly. With this setup we can now start to add extra complexity.

Spatial Agent-Based Model

Intuitively, the assumption that an infected agent will have contact with a set of completely random agents may not hold true in real life. You would rather expect some social neighborhood, a group of contacts the infected agent interacts with on a regular basis. An easy way of simulating this effect is to place the agents on a lattice and let them interact with their nine closest neighbors.
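
The lattice code is not included in this copy of the post, so the following is only a sketch of one way to build the contact list. It rests on several assumptions: agents are numbered row by row on a `width` x `height` grid, the "nine closest neighbors" are read as the 3x3 block around an agent (which includes the agent's own cell), and the grid wraps around at the edges. The name `lattice_contacts` is hypothetical.

```python
import numpy as np

def lattice_contacts(infected, width, height):
    """Indices of all cells in the 3x3 block around each infected agent.

    Agent k is assumed to sit at row k // width and column k % width.
    Edges wrap around (periodic boundary), which is an assumption.
    """
    infected = np.asarray(infected, dtype=int)
    rows, cols = infected // width, infected % width
    offsets = np.array([-1, 0, 1])
    dr, dc = np.meshgrid(offsets, offsets, indexing="ij")
    # Broadcast each agent's position against the nine offsets
    nRows = (rows[:, None] + dr.ravel()[None, :]) % height
    nCols = (cols[:, None] + dc.ravel()[None, :]) % width
    return (nRows * width + nCols).ravel()

# Agent 0 sits in the corner of a 4x4 grid; its block wraps around the edges
print(sorted(lattice_contacts([0], 4, 4)))
```

The returned array can be passed to `infect` in place of the random contacts. That the infected agent appears in its own contact list is harmless, because `infect` leaves agents in state 1 unchanged.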

Figure: β=0.54 and γ=0.1.

Note the prolonged x-axis. You can see that the dynamics are now much slower for the spatial agent-based model. I even had to increase the chanceOfInfection significantly to get it going. The structure that we introduced to the contacts means that an infected agent lives in an environment where many agents are already infected as well or have already recovered, which leads to a significant decrease in the spreading of the disease. We can have a look at the spatial distribution of the agents in the animation below:

Animation: Blue: Susceptible, Yellow: Infected, Green: Recovered

Adding Random Contacts

We saw that when we introduce spatial structure to the social interactions of the agents, the dynamics of the disease are slowed down significantly. What happens when we introduce for every agent an additional random contact besides its nine spatial neighbors?
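
One way to express this in code is to append one randomly drawn agent per infected agent to the spatial contact list before passing it to `infect`. This is a sketch under that assumption; the helper name `add_random_contacts` and its arguments are hypothetical.

```python
import numpy as np

def add_random_contacts(contacts, nInfected, nAgents, perAgent=1):
    """Extend a contact list by `perAgent` random agents per infected agent."""
    extra = np.random.choice(nAgents, perAgent * nInfected, replace=True)
    return np.concatenate([np.asarray(contacts, dtype=int), extra])

# Three spatial contacts plus one random contact for each of two infected agents
contacts = add_random_contacts([1, 2, 3], nInfected=2, nAgents=100)
print(len(contacts))
```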

Figure: β=0.6 and γ=0.1. Blue: Susceptible, Yellow: Infected, Green: Recovered

With only one additional random contact the dynamics of the infection are again much faster, quickly breaking the structure we introduced by placing the agents on the lattice.

Increasing Complexity

We have a working setup that one can now play with by increasing the complexity. One could think of modeling different separate clusters of agents that are only interconnected weakly, or introducing an age structure for the agents reflecting different kinds of interactions for different age groups. Additionally, one could start introducing measures to reduce the chance of infection at a certain time step or reducing the number of contacts.
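
As an illustration of the last idea, a measure that lowers the chance of infection from a certain time step onwards could be modelled with a small helper that is consulted in every step. The sketch below is purely hypothetical; the function name and all of its parameters are assumptions and not part of the presented setup.

```python
def chance_of_infection(iStep, baseChance, reducedChance, startStep):
    """Chance of infection at time step iStep.

    From startStep onwards the measure is in place and the reduced
    chance applies; before that, the base chance is used.
    """
    return reducedChance if iStep >= startStep else baseChance

# Example: the measure takes effect at step 30
for iStep in (0, 29, 30, 100):
    print(iStep, chance_of_infection(iStep, 0.025, 0.01, startStep=30))
```

The returned value would be passed to `infect` instead of a constant chance of infection.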

Performance

A word about the performance of the model. Usually, I like using an object-oriented approach for building agent-based models. Modelling the agents as a class makes the simulation and the coding quite intuitive. However, in Python the simulation may quickly become relatively slow. By storing the data in pandas dataframes, where one row represents one agent, we are losing a bit of flexibility, but we can rely on numpy functions to do most of the work, which makes the simulation reasonably fast. The presented examples run at about 50 steps per second on my machine for 100'000 simulated agents, producing the output of the simulation within a few seconds.

Conclusions

I showed you how to set up a basic agent-based model from scratch. We looked at the example of modelling a spreading disease. As a first step, we validated a minimal version of our model against a known mathematical model. We then started changing parameters in order to investigate changes in the dynamics of the system. By introducing a lattice structure to the agents, we observed that the spread of the disease slowed down significantly, but allowing for only one random contact per agent again led to much faster dynamics. The presented implementation is a flexible setup that allows for an easy implementation of more complex interactions, heterogeneity and structure within the agents. We are also able to study agents on an individual level, or subgroups of agents, within a complex, large-scale simulation.

Feel free to use this setup as a starter and play with it. The full code can be accessed here:

Translated from: https://towardsdatascience.com/modelling-a-pandemic-eb94025f248f
