Privacy-preserving WiFi-based Crowd Monitoring

The process of estimating the number of individuals within a defined area, commonly referred to as people counting, is of paramount importance for safety, security and crisis management. It is a crucial tool for accurately monitoring crowd dynamics and supporting well-informed decision-making during critical situations. In this study, we focus on the WiFi fingerprint technique, which uses the probe request messages emitted by smart devices as a proxy for people counting. However, it is essential to recognize the evolving landscape of privacy regulations and the concerted efforts by major smart-device manufacturers to enhance user privacy, exemplified by the introduction of MAC address randomization techniques. In this context, we designed a crowd monitoring solution that exploits Bloom filters to ensure formal deniability, in line with the stringent requirements set forth by regulations such as the European GDPR [1]. Our proposed solution not only addresses


I. INTRODUCTION
In the aftermath of the COVID-19 pandemic, social distancing measures have reshaped our world, leading to unprecedented restrictions on public gatherings. Even with the subsiding pandemic, limitations on gathering sizes continue to affect public events. Despite these challenges, we anticipate a resurgence of large public events in the post-COVID era, heralding excitement and also posing formidable challenges. The return of these mass gatherings introduces significant security and congestion management challenges. In this context, crowd management analytics is essential for effective decision-making and public safety. The ability to assess crowd dynamics, estimate resource requirements, and optimize emergency response efforts is critical for authorities tasked with managing large public events. Nevertheless, accurately counting and tracking individuals within large-scale gatherings remains a complex challenge. Traditional techniques, including surveillance cameras, LiDAR and infrared systems, as well as WiFi and Bluetooth fingerprint tracking, have been extensively employed for this purpose. These methods now face a set of new challenges, most notably those stemming from the European General Data Protection Regulation (GDPR) [1] and heightened concerns regarding user privacy, especially from prominent smart device manufacturers.
Apart from pandemic-related concerns, crowd monitoring is crucial for ensuring safety during large events for several reasons. It enables authorities to anticipate and mitigate potential security threats, manage crowd flows to prevent stampedes or congestion, and allocate resources effectively in emergencies. In the event of an unforeseen crisis, such as a terrorist threat or a natural disaster, accurate people counting and crowd tracking can be instrumental in orchestrating rapid evacuations and providing timely medical assistance.
In this study, we harness IoT technology to address the challenge of simultaneously characterizing flows of people and quantifying the number of devices/people in a crowd, all while safeguarding user privacy.
Within our proposed crowd counting framework, we use a versatile data structure that supports both single-scanner counting and flow analysis involving multiple scanners. When handling data captured from the WiFi scanners, our process begins with an initial stage of outlier removal. Subsequently, we apply a derandomization technique, as proposed in [2], to address the randomization of MAC addresses. This technique allows us to assess the probability that different probe requests, each with distinct randomized MAC addresses, belong to the same device. This step significantly enhances the counting accuracy. The output of the derandomization process is then stored in a Bloom filter, which is initialized with a set of n random MAC addresses. This initialization corresponds to the previously introduced anonymization noise technique [3]. Through this technique, we establish the formal concept of 1-deniability. This ensures that for every MAC address present in the Bloom filter, there exists at least one other MAC address not in the filter that, if queried, would yield a result indistinguishable from the original, so that the presence of the original address can always be denied. Equivalently, for every element stored, there is at least one other element not in the Bloom filter that, if added, would not change the filter's bitmap. With the added protection of the 1-deniability property, we can transmit the Bloom filters over the network and perform server-side intersections. This allows us to analyze the flow of people within a specific time window or simply count the number of elements stored within a Bloom filter.
The remainder of this paper is structured as follows. In Section II, we discuss pertinent related work. Section III describes the methodology and techniques employed in designing our privacy-preserving crowd monitoring solution. Sections IV and V introduce Bloom filters and the notions of deniability and anonymity. Section VI presents an in-depth analysis of the anonymization noise applied to Bloom filters, showing the effectiveness of 1-deniability and introducing the concept of γ-K-anonymity. Section VII presents the flow-monitoring architecture, offering an overview of the entire pipeline, from data detection to the Bloom filter intersection used to extract flow data. Lastly, in Section VIII, we present our concluding remarks.

II. RELATED WORKS
In recent years, the importance of estimating crowd sizes and understanding people's movements in specific areas has become increasingly evident. This knowledge can significantly improve the overall management of various situations and events.
Over the past few years, numerous techniques have been proposed to address the challenge of people counting. In [4], [5], the authors exploit a multi-camera surveillance system. Their method combines partial body detection and person re-identification to accurately count individuals in overlapping areas. In contrast, recent works, such as [6], [7], have employed LiDAR sensors, which, compared to video camera techniques, address privacy issues. However, in both solutions, hardware costs play a pivotal role and the suitability of LiDAR in different environmental scenarios remains problematic. Moreover, due to the recent regulations imposed by the GDPR [1], recording and storing face detection data raises privacy concerns.
Conversely, studies presented in [2] and [8] explore the use of WiFi probe request messages as an effective method for crowd monitoring in various scenarios, achieved by collecting WiFi fingerprints from mobile devices. Solutions based on such messages can be applied in both indoor and outdoor settings, require inexpensive equipment, have low computational requirements, and can mitigate privacy concerns. Newer approaches investigate the use of artificial intelligence algorithms to partition probe requests and identify the individual devices that sent them. Examples are given in [9] and [10], where the authors employed clustering techniques to retrieve groups of messages, each one representing a device. The former focused on some probe request fields, such as MAC address, arrival timestamp, throughput capabilities and RSSI, while the latter considered the length of the probe request field values. Nevertheless, WiFi fingerprints have their limitations, primarily linked to the absence of the ground truth data needed for testing and fine-tuning counting algorithms. In particular, the results of all the reported works cannot be compared against a certain ground truth. The number of devices that send messages within a capture can only be estimated with methods that are approximate and not easy to apply, especially in highly crowded environments.
In light of recent developments in European regulations, the GDPR [1] imposes significant restrictions on the storage and management of sensitive information. As previously mentioned, mobile devices regularly emit probe request messages that contain details, such as MAC addresses, which are crucial for device identification and monitoring. Notably, the GDPR categorizes MAC addresses as personal data, necessitating the implementation of privacy protection measures [11]. Numerous approaches have been presented in the literature to tackle this challenge, including [12] and [13]. Both these solutions address privacy concerns by using Bloom filters to store MAC address information and employing an asymmetric homomorphic encryption system to process the data within the Bloom filter. Unfortunately, as demonstrated in [14], while Bloom filters can indeed safeguard privacy, they may fail to meet the stringent anonymity constraints mandated by the GDPR. Additionally, the authors of [14] introduce two essential concepts for safeguarding anonymity: γ-deniability and γ-K-anonymity. In simple terms, a small number of inserted elements does not guarantee anonymity. Regrettably, this observation is not taken into account in [12] and [13], rendering their proposed solutions viable only for sufficiently large crowds.
In this work, we present an extensive analysis of privacy techniques applicable to Bloom filters. These techniques are designed to meet the stringent privacy requirements mandated by the GDPR. Additionally, we introduce a crowd-monitoring framework focused on understanding the dynamics of crowd movement across multiple scanners. Importantly, all of our approaches adhere to GDPR-compliant privacy standards, guaranteeing the protection of user data.

III. CROWD MONITORING THROUGH WIFI PROBE REQUEST MESSAGES
As detailed earlier, our methodology relies on WiFi signal detection and subsequent data analysis to estimate the presence and variety of smart devices, encompassing smartphones, tablets, laptops, and smartwatches, among others. Our method involves scanning the WiFi spectrum to capture packets emitted by these smart devices. The process begins when a smart device activates its WiFi interface, emitting probe requests as it searches for nearby Access Points (APs), a fundamental step for establishing a connection to a WiFi network. In the following, we provide an overview of the fundamental characteristics of WiFi that we exploit to extract information from the probe request messages. Furthermore, we introduce the Bloom filter data structure, which is integral to our development of the concept of 1-deniability.
Probe requests constitute a specific category of WiFi management frames primarily used for network discovery. Notably, these frames lack encryption, as their primary role is to facilitate the identification of available networks. Whenever a device activates its WiFi interface, it automatically initiates the transmission of these broadcast messages. Furthermore, even when a device is already connected to a network, it continues to send probe request messages with the objective of identifying potentially better access points, thereby aiming to enhance the quality of its network connection. Upon dispatching a probe request, a Probe Timer is initialized. If the device fails to receive a response, it automatically switches to the next channel frequency and repeats the network discovery process. If a probe response is received, the device proceeds to initiate the authentication process. It is worth noting that this behavior resembles Bluetooth, where similar messages are referred to as inquiry requests and inquiry responses. These Bluetooth messages serve the parallel purpose of discovering nearby devices and establishing connections.
Within the probe request messages, the MAC address serves as a primary key field. This 48-bit identifier is designed to uniquely distinguish each device globally. However, since 2014, device manufacturers have increasingly adopted MAC address randomization techniques as a means of bolstering user privacy [15]. These privacy-enhancing techniques involve the use of "pseudo" or "fake" MAC addresses when transmitting probe request messages. The objective of this technique, denoted as MAC randomization, is to make it more challenging to track a device, thereby safeguarding people's identity. Notably, there is no standardized method for randomizing MAC addresses, resulting in a variety of methods employed by different vendors.
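As an illustration of how a scanner might flag randomized addresses, the following minimal sketch checks the locally administered bit of the first octet, which IEEE 802 addressing reserves for addresses not assigned by a manufacturer. Treating this bit as a marker of randomization is an assumption of the sketch, not a method stated in this paper: since vendors follow no common scheme, it should be read as a heuristic only. The function name and the sample addresses are hypothetical.

```python
def is_locally_administered(mac: str) -> bool:
    """Return True if the locally administered bit of the first octet is set.

    Heuristic only (assumption of this sketch): many randomized MAC
    addresses set this bit, but not every vendor does.
    """
    first_octet = int(mac.split(":")[0], 16)
    # Bit with value 0x02 in the first octet is the universal/local flag.
    return bool(first_octet & 0x02)


# Hypothetical sample addresses, used only to exercise the function.
for sample in ("da:a1:19:00:00:01", "00:1a:2b:3c:4d:5e"):
    print(sample, is_locally_administered(sample))
```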
In our earlier research [16], we highlighted this evolving trend, particularly prevalent in newer devices. Our observations have revealed variations in MAC address randomization practices among these devices. While some devices randomize the entire 48-bit MAC address, others randomize only the second half of the address, retaining the first 24 bits, which are known as the Organizationally Unique Identifier (OUI). This selective approach to randomization strikes a balance between user privacy and network compatibility, ensuring seamless interaction with existing network infrastructure while enhancing personal data protection. Some past works focused on derandomization methods, which can approximately identify the same device behind a set of random MAC addresses. Works [2], [9], [10], [17] represent only a portion of the extensive body of literature on derandomization techniques for probe request messages. In practice, such methods are heuristic and their accuracy depends on the device models and on the usage conditions.
A possible framework for crowd monitoring and the study of people's movements, based on multiple access points distributed over a wide area, is depicted in Figure 1. It provides an overview of the operations conducted by the sniffer. The process starts with the sniffing of the WiFi probe request messages, followed by the application of a filter, e.g., based on the RSSI signal strength. This filter is instrumental in eliminating potential outliers from the analysis. Subsequently, a derandomizer script is employed to counteract the effects of MAC address randomization. Finally, the resulting MAC addresses are stored in a Bloom filter, which is initially populated with random MAC addresses, referred to as anonymization noise.
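The outlier-removal stage of the pipeline can be sketched as a simple threshold on the received signal strength. This is a minimal sketch under stated assumptions: the `ProbeRequest` record, the function name and the threshold of -80 dBm are hypothetical choices made for illustration, since the paper does not specify how the RSSI filter is configured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeRequest:
    mac: str
    rssi_dbm: int  # received signal strength in dBm (closer to 0 is stronger)


def filter_by_rssi(probes, min_rssi_dbm=-80):
    """Keep only probe requests at least as strong as the threshold.

    The default threshold is a hypothetical value, not taken from the paper.
    """
    return [p for p in probes if p.rssi_dbm >= min_rssi_dbm]


# Hypothetical capture: the second frame is too weak and is discarded.
captured = [
    ProbeRequest("da:a1:19:00:00:01", -55),
    ProbeRequest("da:a1:19:00:00:02", -92),
    ProbeRequest("da:a1:19:00:00:03", -80),
]
kept = filter_by_rssi(captured)
print([p.mac for p in kept])
```

A fixed threshold is the simplest possible choice; a deployment would likely tune it to the intended coverage area of each scanner.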

IV. BLOOM FILTERS AS A PRIVACY TOOL
Bloom filters are a probabilistic data structure, well known in the literature [18], [19], introduced to solve the approximate set membership problem. They were devised to optimize the performance of data storage systems whenever a set must be implemented efficiently with a minimum memory footprint.
A Bloom filter is used to represent a collection of elements. It is constructed using an array of bits, represented as $BF \in \{0, 1\}^m$, where $m$ is the array's length, and $k$ independent hash functions, denoted as $H_1, H_2, \ldots, H_k$. Each hash function maps an input element $x$ to one of the $m$ bits within the bit array. We refer to the $i$-th bit of $BF$ as $BF[i]$. Initially, all bits are set to 0. When adding an element $x$ to the Bloom filter (a visual representation can be found in Figure 2), the $k$ hash functions are applied to $x$, and the bits in $BF$ at the positions generated by the hash functions are set to 1:
$$BF[H_i(x)] = 1, \quad i = 1, \ldots, k \tag{1}$$
Fig. 2: When adding a new element to the Bloom filter, the input is processed using a set of k independent hash functions. The resulting values from these hash functions (e.g., 0, 2, 8) serve as indices to access the Bloom filter array. At each corresponding index, the bit is set to 1. This process is repeated for each new input value.
To check whether an element is present in a Bloom filter, the same $k$ hash functions are applied to the element, and the resulting positions are compared with the current values of the corresponding bits in $BF$. If all these bits are set to 1, the element is regarded as likely present in the Bloom filter. The element may therefore be reported as present even though it was never inserted; this event is named a "false positive". On the contrary, if even a single one of these bits is 0, the element is conclusively absent from the Bloom filter. In other words, a Bloom filter may yield false positives, but it never yields false negatives: it cannot indicate that an element is absent when it is, in fact, present.
Notably, elements can only be added to a Bloom filter and cannot be removed. Thus, in the WiFi scanner scenario, the Bloom filter keeps accumulating MAC addresses and its false positive probability increases until it saturates (i.e., all bits are ones). To avoid saturation, the Bloom filter must be periodically reset.
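The insertion and query operations described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the class name is hypothetical, and deriving the $k$ hash functions from SHA-256 with a per-function prefix is an assumption of the sketch.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter with m bits and k hash functions (illustrative)."""

    def __init__(self, m: int, k: int):
        self.m = m
        self.k = k
        self.bits = [0] * m  # all bits start at 0

    def _positions(self, item: str):
        # Assumption of this sketch: the k hash functions are obtained by
        # prefixing the item with the function index before hashing.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item: str) -> bool:
        # True means "possibly present"; False means "certainly absent".
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter(m=10_000, k=7)
bf.add("da:a1:19:00:00:01")
print("da:a1:19:00:00:01" in bf)  # an inserted element is always found
```

Note that the class offers no removal operation, which mirrors the need for a periodic reset discussed above.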
It is worth highlighting that:
• A smaller value of $k$ raises the number of 0 bits in the array, making them more likely to be available for elements that are not part of the stored set $S$.
• Conversely, a larger value of $k$ heightens the likelihood of encountering at least one 0 bit for an element that is not a member of $S$.
To minimize false positives, using standard results [19], it is possible to determine analytically the optimal value of $k$ based on the available memory $m$ and on the number $n$ of elements expected to be stored, according to (2):
$$k_{opt} = \frac{m}{n} \ln 2 \tag{2}$$
Bloom filters offer a versatile solution for storing MAC addresses observed by a WiFi scanner. To illustrate the process of parameter tuning, we examine three different scenarios where the Bloom filter is reset periodically with a period denoted as $T$.
In the first scenario, denoted as A, at most 1,000 people are expected to be observed by the scanner in a period of 120 s, so the flow rate is approximately 8 people per second. Assuming a memory of 10,000 bits, the optimal number $k_{opt}$ of hash functions, according to (2), is equal to 7. This first scenario can be a use case where we want to monitor the flow of people in a very crowded plaza. Let us now consider a different scenario, denoted as B, where we want to analyze the flow of people and cars at an intersection during the green light phase. In this case, the period is shorter than before: let us assume it is 40 s and that we expect to collect 4 probe requests per second, with only 1,000 bits available. In this case the value of $k_{opt}$ is 4. Finally, we focus on a different use case, e.g., the monitoring of people entering a lecture hall in the morning. In this case, we can use a longer period, e.g., 10 minutes, with an expected turnout of around 200 people. With a 2,000-bit Bloom filter, the optimal value for k would be 8. Table I provides a concise summary of the three previously mentioned scenarios.
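The parameter tuning for scenarios A and B can be reproduced with a short calculation. The sketch below assumes that (2) is the standard result $k_{opt} = (m/n) \ln 2$ rounded to the nearest integer, and it uses the usual approximation $(1 - e^{-kn/m})^k$ for the false positive probability; both function names are hypothetical.

```python
import math


def optimal_k(m_bits: int, n_elements: int) -> int:
    """Number of hash functions minimizing false positives (standard result)."""
    return max(1, round(m_bits / n_elements * math.log(2)))


def false_positive_rate(m_bits: int, n_elements: int, k: int) -> float:
    """Usual approximation of the false positive probability."""
    return (1.0 - math.exp(-k * n_elements / m_bits)) ** k


# Scenario A: 1,000 people in 120 s, 10,000 bits.
# Scenario B: 4 probe requests per second for 40 s, 1,000 bits.
scenarios = {"A": (10_000, 1_000), "B": (1_000, 40 * 4)}
for name, (m, n) in scenarios.items():
    k = optimal_k(m, n)
    print(name, "k_opt =", k, "false positive rate =", false_positive_rate(m, n, k))
```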
Tuning the parameters of a Bloom filter is closely tied to the specific use case under consideration, because it depends on the number of bits available, the length of the detection time window and the number of people/devices expected to be detected.
To determine the count of elements stored within a Bloom filter, following [20], we can resort to the following formulation (3):
$$\hat{n} = -\frac{m}{k} \ln\left(1 - \frac{t}{m}\right) \tag{3}$$
where $m$ corresponds to the total number of bits in the Bloom filter, $k$ represents the number of hash functions used, and $t$ indicates the number of bits set to 1.
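The count estimation can be sketched as follows. The code assumes that (3) is the standard estimator $\hat{n} = -(m/k) \ln(1 - t/m)$ associated with [20]; the function name is hypothetical, and the saturation check is an addition of this sketch.

```python
import math


def estimate_count(m: int, k: int, t: int) -> float:
    """Estimate how many elements were inserted, given t bits set out of m."""
    if t >= m:
        # A saturated filter carries no usable information: the estimate diverges.
        raise ValueError("Bloom filter is saturated")
    return -(m / k) * math.log(1.0 - t / m)


# With m = 10,000 and k = 7, about 5,034 set bits are expected after
# inserting 1,000 elements, so the estimate should be close to 1,000.
print(estimate_count(m=10_000, k=7, t=5_034))
```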

V. DENIABILITY AND ANONYMITY
False positive events have traditionally been considered a weakness, but the authors of [14] instead showed the potential of using Bloom filters to "hide" the presence of elements in the bitmap, thus achieving formal levels of privacy. Specifically, they introduced the concepts of deniability and anonymity.
Deniability refers to the ability to plausibly deny the presence or association of specific elements within the Bloom filter. It implies that, even if an entity gets hold of the Bloom filter, there can be no certainty as to whether a particular element was added or not.
Anonymity, instead, pertains to the protection of the identity or specific information associated with the elements stored in the Bloom filter. It ensures that the elements themselves remain concealed or unidentifiable when queried or retrieved from the Bloom filter.
More formally:
Definition 1 (taken from [14]), Hiding Set: A set $V$ is called a Hiding Set for a Bloom filter $BF(S)$ if $V$ contains all the elements $v_i \in U$ such that $v_i \notin S$ and a query for $v_i$ in the Bloom filter returns 1. Here $U$ is the universe of all possible elements; for MAC addresses, $|U| = 2^{48} \approx 2.8 \times 10^{14}$.
In other words, a hiding set is a set of elements not present in the Bloom filter that, when queried, falsely indicate their presence in the filter. Figure 3 shows an example of a Hiding Set. In this case, a 10-bit Bloom filter using 2 hash functions has been employed to insert 3 elements $x_1, x_2, x_3$. Additionally, 3 elements $v_1, v_2, v_3$ belonging to the Hiding Set yield positive results when the filter is queried, i.e., they are false positives.
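For a small universe, the hiding set of Definition 1 can be enumerated by brute force. The sketch below uses a 10-bit filter with 2 hash functions, as in the example of Figure 3, but the universe of 1,000 integers, the stored elements and the SHA-256-based hash functions are assumptions made for illustration; enumerating the real universe of $2^{48}$ MAC addresses is not practical.

```python
import hashlib


def positions(item: int, m: int, k: int) -> set:
    """Bit positions of an item (illustrative hash functions, an assumption)."""
    return {
        int.from_bytes(hashlib.sha256(f"{i}:{item}".encode()).digest()[:8], "big") % m
        for i in range(k)
    }


def build_filter(stored, m: int, k: int) -> list:
    bits = [0] * m
    for x in stored:
        for p in positions(x, m, k):
            bits[p] = 1
    return bits


def query(bits: list, item: int, k: int) -> bool:
    return all(bits[p] for p in positions(item, len(bits), k))


def hiding_set(universe, stored: set, m: int, k: int) -> set:
    """All elements outside the stored set for which the query returns 1."""
    bits = build_filter(stored, m, k)
    return {v for v in universe if v not in stored and query(bits, v, k)}


S = {1, 2, 3}  # hypothetical stored elements
V = hiding_set(range(1_000), S, m=10, k=2)
print("size of the hiding set:", len(V))
```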
Building on the hiding set, [14] defines γ-deniability. Informally, an element is termed "deniable" when it can be replaced with items not originally included in the Bloom filter's stored set without altering the filter's bitmap; a Bloom filter is γ-deniable when a randomly chosen inserted element is deniable with probability γ, and 1-deniability corresponds to γ = 1.
It is essential to emphasize that the 1-deniability approach only ensures that an element correctly inserted into the filter is indistinguishable from at least one element that was never part of the filter to begin with. To enhance the level of protection, we can resort to the concept of anonymity outlined in Definition 3.
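Definition 3 (taken from [14]), γ-K-Anonymity: Considering a Bloom filter $BF(S)$ and an element $x \in S$ inserted in $BF(S)$, $x$ is said to be K-anonymous if it is covered by at least $K - 1$ elements of the hiding set, i.e., non-inserted elements that could take its place without altering the bitmap. Consequently, a Bloom filter $BF(S)$ is γ-K-anonymous if each randomly chosen element is K-anonymous with probability γ.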
Using (4), proposed in [14], it is possible to compute the level γ of the γ-K-anonymity property for a specific Bloom filter storing $n$ MAC addresses, for $K \geq 2$, where $|U|$ is the number of all the possible MAC addresses, equal to $2^{48}$.

VI. ANONYMIZED COUNTING
We introduce the concept of anonymization noise applied to a Bloom filter. It is a privacy-enhancing technique that introduces uncertainty and randomness into the Bloom filter's data, making it more challenging to infer specific elements from the filter's contents. This concept is particularly useful when preserving the anonymity and confidentiality of the data elements stored in a Bloom filter is of utmost importance.
The anonymization noise consists of random MAC addresses added to the Bloom filter whenever the Bloom filter is reset. These random MAC addresses are not associated with any actual data but are used to obscure the data subsequently stored in the filter. If the 1-deniability property is satisfied, the anonymization noise makes it impossible for an external observer to determine whether an element is stored in the Bloom filter or not, thereby protecting the anonymity of the actual elements. This adds a layer of privacy and security to the Bloom filter. It is worth noting that, while anonymization noise enhances privacy, it also introduces a trade-off by potentially increasing the rate of false positives. If too much anonymization noise is introduced, privacy is very good but accuracy is poor, as the false positive rate increases. Conversely, too little noise leads to poor privacy, since the 1-deniability property is not satisfied, but to good accuracy in terms of false positive rate.
In essence, it is essential to strike the right balance when determining the optimal number of fake elements to be inserted as anonymization noise.This balance ensures that the level of privacy and anonymity is maximized while minimizing the impact on the filter's performance and accuracy.
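The accuracy side of this trade-off can be made concrete by computing the false positive probability of a filter that contains only noise, right after a reset. This is a minimal sketch that uses the standard Bloom filter fill-ratio approximation; it does not compute the privacy level γ, which depends on (4), and the function name and the tested noise levels are hypothetical.

```python
def noise_false_positive_rate(m: int, k: int, c_noise: int) -> float:
    """False positive probability of a filter holding only c_noise fake elements."""
    # Expected fraction of bits set after inserting c_noise elements with k hashes.
    fill = 1.0 - (1.0 - 1.0 / m) ** (k * c_noise)
    return fill ** k


# More noise means more bits set before any real MAC address is observed.
for c in (0, 30, 300, 3_000):
    print(c, noise_false_positive_rate(m=10_000, k=7, c_noise=c))
```

With few noise elements the accuracy penalty stays negligible, which is why the amount of noise is better chosen as the smallest value that meets the desired privacy level.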
Due to the recent GDPR regulations on users' private information, sniffing and analyzing WiFi probe request messages has become a sensitive issue. Therefore, it is crucial to incorporate anonymization noise. This ensures that, from the very first insertion of a legitimate MAC address, each address is shielded by a minimum of $K - 1$ additional elements.
Figure 4 shows the results of applying (4) to a Bloom filter with $m = 10{,}000$ bits and $k = 7$, for various values of $K$. This graph facilitates the evaluation of $c^{K}_{\min}$, i.e., the amount of noise that ensures that, for each element randomly drawn from the filter, there are at least $K - 1$ non-inserted elements available to provide the necessary cover and privacy protection. For example, setting $c_{\min} = 30$ is enough to guarantee 1-K-anonymity for $K = 2, 3, 4$.
We now explore the counting process in a Bloom filter while ensuring 1-deniability through the anonymization noise. In Algorithm 1, we present the pseudocode of the key steps involved, including the initialization of the Bloom filter, the insertion, and the count estimation that accounts for the anonymization noise.
In more detail, at the beginning of every new iteration the Bloom filter is initialized with all bits equal to 0. Then the random noise, consisting of $c_{\min}$ "fake" MAC addresses, is inserted into the Bloom filter. Notably, thanks to the standard properties of the independent hash functions used in Bloom filters, an efficient implementation of this step only requires generating $c_{\min} \times k$ random positions in the Bloom filter, since this is equivalent to adding $c_{\min}$ elements with $k$ hash functions. After this initialization phase, for every new probe request detected, the MAC address is inserted into the filter by setting the proper bits to 1. The indexes of the bits are given by the output of the $k$ hash functions applied to the MAC address. When the capturing window is over, the number of ones in the Bloom filter is counted and then (3) is applied to estimate the number of elements present in the filter. Finally, $c_{\min}$ is subtracted from the estimated value, to compensate for the anonymization noise inserted at the beginning.
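The counting procedure described above can be sketched as follows. This is an illustration under stated assumptions rather than the paper's Algorithm 1: the count estimate uses the standard estimator $-(m/k) \ln(1 - t/m)$, which is assumed to be (3), the hash functions are derived from SHA-256, and the function names, the seed and the 500 synthetic MAC addresses are hypothetical.

```python
import hashlib
import math
import random


def bit_positions(mac: int, m: int, k: int):
    """Positions of a 48-bit MAC address (illustrative hash functions)."""
    for i in range(k):
        digest = hashlib.sha256(f"{i}:{mac:012x}".encode()).digest()
        yield int.from_bytes(digest[:8], "big") % m


def count_with_noise(macs, m=10_000, k=7, c_min=30, rng=None) -> float:
    rng = rng or random.Random()
    bits = [0] * m                     # reset: all bits to 0
    for _ in range(c_min * k):         # noise: c_min * k random positions
        bits[rng.randrange(m)] = 1
    for mac in macs:                   # insertion of the observed MAC addresses
        for pos in bit_positions(mac, m, k):
            bits[pos] = 1
    t = sum(bits)                      # number of bits set to 1
    if t >= m:
        raise ValueError("Bloom filter is saturated")
    estimate = -(m / k) * math.log(1.0 - t / m)
    return estimate - c_min            # compensate for the noise


rng = random.Random(42)                # fixed seed, for repeatability only
observed = {rng.getrandbits(48) for _ in range(500)}
print(len(observed), count_with_noise(observed, rng=rng))
```

Since only the bitmap leaves the function, the raw MAC addresses never need to be stored alongside the estimate.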

VII. APPLICATION TO CROWD-FLOW ANALYSIS
Analyzing crowd flows through WiFi probe requests entails identifying common MAC addresses among various WiFi scanners. Consequently, the analysis cannot be performed within a single scanner; instead, the data from each scanner must be transmitted to a central server. This central server then analyzes the data from different scanners to derive information regarding crowd flows. In order to preserve privacy in this task, we exploit Bloom filters beyond the representation of individual sets; indeed, they also support set unions and intersections. The intersection operation is particularly crucial in the context of flow detection. The core concept is that, by performing the intersection of two Bloom filters, we can determine the number of MAC addresses that have been detected in both Bloom filters. Consider two subsets, $S_1$ and $S_2$, of the universal set $U$. The subsets are represented by Bloom filters $BF_1(S_1)$ and $BF_2(S_2)$, respectively, configured with the same parameters ($m$ and $k$). In order to find the intersection between these subsets, a bitwise logical AND operation is executed between the two Bloom filters. This operation yields a new Bloom filter representing the intersection:
$$BF_3 = BF_1(S_1) \wedge BF_2(S_2) \tag{5}$$
To determine the count of elements within this intersection, taking into account the anonymization noise, we can leverage (6), derived from [21], where $t_i$ represents the number of bits set to 1 in $BF_i$. This approach allows for an efficient determination of the cardinality of the intersection between the sets represented by these Bloom filters.
It is worth noting that the approach can be extended to the intersection of any number of Bloom filters, making it possible to identify very specific paths across a sequence of WiFi scanners.
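The server-side processing of two filters can be sketched as follows. Since (6) is not reproduced here, the sketch swaps in a different, well-known estimator: inclusion-exclusion over the bitwise OR of the two filters, with each cardinality obtained from the standard estimator $-(m/k) \ln(1 - t/m)$. In this sketch the noise of the two filters is drawn independently, so it cancels in the inclusion-exclusion and no explicit subtraction is performed. All function names, the seed and the synthetic MAC addresses are hypothetical.

```python
import hashlib
import math
import random

M, K, C_MIN = 10_000, 7, 30


def bit_positions(mac: int):
    for i in range(K):
        digest = hashlib.sha256(f"{i}:{mac:012x}".encode()).digest()
        yield int.from_bytes(digest[:8], "big") % M


def new_filter(rng) -> list:
    """Reset a filter and add the anonymization noise (C_MIN * K positions)."""
    bits = [0] * M
    for _ in range(C_MIN * K):
        bits[rng.randrange(M)] = 1
    return bits


def insert(bits: list, mac: int) -> None:
    for pos in bit_positions(mac):
        bits[pos] = 1


def estimate_count(bits: list) -> float:
    return -(M / K) * math.log(1.0 - sum(bits) / M)


def intersect(bf1: list, bf2: list) -> list:
    return [a & b for a, b in zip(bf1, bf2)]   # bitwise AND


def estimate_common(bf1: list, bf2: list) -> float:
    """Inclusion-exclusion stand-in for (6): |A| + |B| - |A or B|."""
    union = [a | b for a, b in zip(bf1, bf2)]
    return estimate_count(bf1) + estimate_count(bf2) - estimate_count(union)


rng = random.Random(7)
macs = rng.sample(range(2**48), 450)           # 450 distinct MAC addresses
only1, only2, common = macs[:200], macs[200:400], macs[400:]
bf1, bf2 = new_filter(rng), new_filter(rng)
for mac in only1 + common:
    insert(bf1, mac)
for mac in only2 + common:
    insert(bf2, mac)
bf3 = intersect(bf1, bf2)
print("true common:", len(common), "estimated:", estimate_common(bf1, bf2))
```

The same AND operation can be chained over more than two filters to follow a path across several scanners, although the estimator used here would need to be generalized accordingly.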

A. Numerical evaluation
In order to evaluate the accuracy of the intersection between two Bloom filters, we ran an experiment starting from two empty Bloom filters, namely $BF_1$ and $BF_2$, configured with $m = 10{,}000$ and $k = 7$. We set $c_{\min} = 30$, according to the reasoning made when discussing Figure 4. We then reset each filter with the anonymization noise, inserted 200 distinct random MAC addresses into each filter, inserted $n_c$ common random MAC addresses into both, computed the intersection $BF_3$ and estimated the number of common addresses.
Table II shows the estimated number of common MAC addresses, computed on $BF_3$, and compares it to the actual number $n_c$. The results show an accurate estimation of the common MAC addresses. Notably, when no common address is present in the two Bloom filters (i.e., $n_c = 0$), the estimator is still quite accurate, even though $200 + c_{\min}$ distinct MAC addresses were previously inserted into each of the Bloom filters. This is due to the approximations behind the adopted formulas and the adopted probabilistic approach. Figure 5 shows the relative error in estimating the common MAC addresses, based on the complete data set from which the results in Table II were derived. Clearly, the relative error remains very small even for a large number of common MAC addresses, thanks to the law of large numbers.

B. Implementation
As illustrated in Figure 1, the data capture and processing pipeline terminates with the insertion of the detected MAC addresses into a Bloom filter. This Bloom filter is subsequently relayed over the network to a central server, which serves as the hub for collecting, storing, and processing incoming Bloom filters from various access points. The architecture in Figure 6 provides a visual representation of this flow.
This approach extends our capabilities beyond merely assessing individual Bloom filters derived from single scanners. It allows us to perform intersections on the Bloom filters originating from diverse access points, each corresponding to a distinct geographical area. Through this intersection process, we can extract valuable information on the movement and flow of individuals across these areas. This level of analysis yields data that is instrumental in understanding crowd dynamics and behavior.
Fig. 6: Architecture for crowd-flow analysis. Each AP scanner periodically sends its Bloom filter to the server, where all the Bloom filters are processed to infer crowd flows.

VIII. CONCLUSIONS
In this paper we have tackled the significant challenge of crowd counting and tracking within large-scale gatherings, with a strong focus on the growing importance of privacy and compliance with regulations, such as the European GDPR.
Our proposed crowd monitoring framework leverages IoT technology and uses WiFi probe request messages as a key tool to offer a solution that not only characterizes people flows and quantifies crowd size but also places paramount importance on preserving individual privacy. The core objective is to ensure precise people counting while simultaneously upholding the principles of privacy, in line with the stringent regulations set out in the GDPR. We have employed techniques such as 1-deniability and anonymization noise within Bloom filters to guarantee the formal deniability of each element's presence. This approach ensures privacy when transmitting data over the network and conducting server-side intersections for flow analysis.
Looking ahead, our future research will focus on further refining these techniques and exploring additional privacy-enhancing concepts to advance the field of crowd counting and monitoring. Additionally, we are planning to develop a machine learning-driven derandomization algorithm to enhance device counting accuracy within the coverage of an AP scanner. Furthermore, within the European project TrialsNet [22], we plan to deploy several AP scanners in a public area to rigorously test the effectiveness of our solution.

Fig. 3: Elements $v_1, v_2, v_3$ belonging to the hiding set of a Bloom filter of 10 bits storing $x_1, x_2, x_3$. Arrows indicate the bits that are set to one according to two hash functions $H_1$ and $H_2$.

Fig. 4: Achievable level of γ-deniability, for different values of anonymity K, when n MAC addresses are inserted into a Bloom filter with m = 10,000 bits and k = 7.

Fig. 5: Relative error in counting the number of MAC addresses resulting from the intersection of two Bloom filters.

TABLE I: Real-world examples for Bloom filter settings

Algorithm 1: Counting algorithm with anonymization noise
Input: Bloom filter (BF) of size m bits, number of elements $c_{\min}$ for the anonymization noise, k independent hash functions
Output: Estimated count of elements inserted

Steps of the numerical evaluation in Section VII-A:
1) Reset each Bloom filter with anonymization noise equal to $c_{\min}$.
2) Insert 200 random MAC addresses into $BF_1$.
3) Insert 200 random MAC addresses into $BF_2$.
4) Generate $n_c$ random MAC addresses and insert all of them into both $BF_1$ and $BF_2$.
5) Compute the resulting Bloom filter $BF_3$ from the intersection of $BF_1$ and $BF_2$.
6) Count the MAC addresses observed in both WiFi scanners based on (6) minus $c_{\min}$ (to compensate for the anonymization noise).
At the end of step 3, by construction, the two Bloom filters have no common MAC addresses. At the end of step 4, the two Bloom filters have stored $200 + c_{\min}$ different MAC addresses, plus $n_c$ common MAC addresses, modeling $n_c$ devices detected by both scanners.

TABLE II: Numerical results of the estimation of common MAC addresses