Methods, Developments, and Technological Innovations for Population Surveys

This article reviews contemporary issues in survey research, connecting established methods to innovative tools and technologies such as real-time sensors and computer vision. This link builds on the idea of the "organic" nature of Big Data, which challenges population studies to modernize in the light of technological innovations. The dominant data-gathering paradigm adopted here is the web survey (computer-assisted web interviewing), explored through the formalization of chain-referral methods as respondent-driven sampling. The general orientation is toward a computational social science approach. Weaknesses of this methodology are examined and solutions are proposed, with insights from empirical research on panel management. The contribution of gamification techniques is critically discussed.


Introduction
In his attempt to record a historiography of survey research as a "scientific domain," the former director of the U.S. Census Bureau Robert Groves (2011) offers a scheme of the evolution of the discipline in three "Eras":
- 1930-1960, the Era of Invention: In this period, public institutions, in the United States in particular, shared enthusiasm about the possibility of mapping broad societal trends with probabilistic sampling and measurable sampling errors. Data collection was dominated by face-to-face interviewing and mailed questionnaires ("paper assisted" or PA). Telephone surveys entered the private sector only toward the end of the era. According to the author, response rates in this Era were often over 90%.
- 1960-1990, the Era of Expansion: These decades saw the ubiquitous spread of the telephone as the main communication medium among families. Already by the late 1960s, computer technology was used to present questions on a monitor and to record the answers entered by telephone interviewers, giving rise to computer-assisted telephone interviewing (CATI) and computer-assisted personal interviewing (CAPI), modes of data collection in which an interviewer is present but both the answering and the recording phases are mediated by computer technology. These developments took place not in the academic or government sector but in the private sector. The dominance of CATI had a strong impact on the development of survey research as a discipline. Waksberg (1978) noticed a shift in sampling designs from stratified sampling to cluster-based sampling methods and eventually to hybrid designs. The telephone brought with it the possibility of terminating the interview with a sudden "hang up." Consequently, the rate of partial interviews and missing values rose, so questionnaires grew gradually shorter and lost most of their potential as a tool to discover "latent features" in social research.
- 1990 to the present, which Groves calls the "Designed Data Supplemented by Organic Data Era": Groves stresses the continuous inflation of research costs due to increasing factors of disengagement of interviewees from the tasks requested. At the same time, he notices that "the rise of the Internet reenergized research by self-administered instruments" and that "volunteer Internet panels arose as a tool". Groves then presents the second and probably most important feature of the present Era, what he calls the growth of "organic data": "Collectively, society is assembling data on massive amounts of its behaviours. Indeed, if you think of these processes as an ecosystem, the ecosystem is self-measuring in increasingly broad scope.
We might label these data as 'organic,' a now-natural feature of this ecosystem". Such flows of organic information are currently exploited in several applications (see details in the Sensory New Technologies section below). His further argument, widely shared by Salganik (2017), is that the future of survey research will be defined by the development of methods able to integrate "organic data" within traditional designs.
New possibilities for research arise from the Internet, "web" technologies, and more generally what Groves calls "organic data," but according to Xu et al. (2019), concerns should be raised about the general validity of the research process. Figure 1 describes the data generation process and the research workflow of organic data. The authors assert that "threats to validity can arise in any part of the research process illustrated" (p. 5). They also provide a definition of organic data that fits the framework of Salganik (2017): "data that are generated without any explicit research design elements and are continuously documented by digital devices" (Xu et al., 2019, p. 1).
The consensus among authors is that while experimental design and procedures are regarded as optional (Figure 1), their absence should be a well-justified exception. In social research, experimental design is strongly tied to the concept of representativeness of the sample. While PA, CAPI, and CATI are uncontroversial terminologies, we argue that computer-assisted web interviewing (CAWI) may be a misleading concept in relation to (supposedly) experimental versus observational data. Baker et al. (2010) provided a classification of three types of CAWI:
- non-probability-based methods, which are also reported to be very common,
- probability-based methods, which have to deal with Internet diffusion among the population,
- so-called river sampling, a special case of non-probability-based methods where respondents are recruited through web advertising.
Only representative samples can be further embedded in an experimental design without the risk of compromising statistical inference, and representativeness is commonly tied to probabilistic samples. A comprehensive theoretical framework for the expected issues in representativeness of survey samples is the total survey error (TSE) paradigm (Biemer, 2010; Groves & Lyberg, 2010). Yaeger et al. (2011) provide an empirical study of the representativeness of CAWI methods.
However, in practice, these distinctions are not always clear, nor is it always clear whether the web is just the medium used to reach a sample already drawn (e.g., sending an email to a mailing list) or also a way to collect data from what Groves calls "volunteers." In all cases, we will present evidence that web-based solutions impact research mostly through radically lower costs to access bigger data sets.
The emergence of what Groves calls "the volunteers" may not reflect an actual increase in willingness to participate in a panel; it may instead be due to the pervasiveness of mobile technologies and social media, which emphasize acts of "sharing information". It is interesting to explore how such trends in the relationship between technology and society can be understood through the rise of "gamification techniques" of data gathering. These trends in "sharing" can also be linked to the formalization of snowball methodologies (Goodman, 2011). The concept of "snowball" sampling was indeed already under development in the era of PA interviews, as organically exposed in Handcock and Gile (2011).
The paper is organised as follows:
- in the A Non-Probabilistic Method for a Connected Population: Snowball Sampling section, we cover mathematical models that provide a methodological foundation for statistical inference from snowball sampling, developed under the common framework of respondent-driven sampling (RDS), which we find the most promising framework for population studies;
- in the Issues in Panel Studies and Technological Solutions section, we analyse panel methods and the topic of attrition rate to improve the quality of data;
- in the Gamification Techniques for Increasing Engagement section, we cover techniques of "gamification" for the development of the user experience (UX) of survey tools and the characteristics of web-based survey tools;
- in the New Frontiers of UX Design of Survey and Data Analysis of Pictures with Computer Vision section, we cover the role of UX design and the use of photography in surveys;
- in the Panel Management by Web-Based Solutions CAPI-CAWI section, we consider a CAPI-CAWI design for panel studies;
- in the Monitoring in Survey Studies section, we go in depth into the monitoring of survey studies and the integration of "organic data" or even Big Data in population studies, and see how data collection and user involvement can be influenced by modern habits of using information technology, including the massive use of social networks and sensory technologies.
By the end, we aim to have covered the major issues in sampling for population studies, providing as complete as possible an overview of the most recent methodologies and tools useful for performing survey studies.

A Non-Probabilistic Method for a Connected Population: Snowball Sampling
The terminology "snowball sampling" refers to that methodology of social research where "a small [random] sample of persons [is interviewed], asking who their best friends are, interviewing these friends, then asking them their friends, interviewing these, and so on" (Coleman, 1958, p. 29). Goodman (1961) proposed the first mathematical model to "make statistical inferences about various aspects of the relationships present in the population" by "data obtained using an s stage k name snowball sampling procedure" (Goodman, 1961, p. 148).
Goodman's model is rather rigid. It assumes k, the number of people interviewed at each of the s stages of the chain, to be a constant. "Stage 0" or s = 0 is a random sample at the beginning of the survey. Snijders (1992) offered a different model with no fixed k, where new participants are randomly sampled from a list J_i of "nominee friends" of participant i.
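A toy simulation may clarify the mechanics of an s-stage, k-name procedure. This is a minimal sketch, assuming the friendship network is known in full (which a real study never is); the network and all names are hypothetical:

```python
import random

def snowball_sample(network, seed_size, k, stages, seed=0):
    """Simulate an s-stage, k-name snowball sample.

    network: dict mapping each person to the friends they would name.
    Stage 0 is a simple random sample; at every later stage each new
    respondent names up to k friends, who join the sample if not
    already interviewed (Goodman's model fixes k; respondents with
    fewer than k friends simply name them all here).
    """
    rng = random.Random(seed)
    stage0 = rng.sample(sorted(network), seed_size)
    interviewed = set(stage0)
    current = list(stage0)
    for _ in range(stages):
        nominees = []
        for person in current:
            friends = network[person]
            named = rng.sample(friends, min(k, len(friends)))
            nominees.extend(n for n in named if n not in interviewed)
        current = list(dict.fromkeys(nominees))  # dedupe, keep order
        interviewed.update(current)
    return interviewed

# Toy friendship network with a single random seed at Stage 0.
net = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a", "d"],
       "d": ["b", "c", "e"], "e": ["d"]}
sample = snowball_sample(net, seed_size=1, k=2, stages=2)
```

Snijders's variant would replace the fixed k with a random draw from each respondent's full nominee list J_i.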
James Coleman, the first proponent of the terminology, called snowball the procedure of data collection, while Goodman called snowball the technique of data analysis. When Goodman (2011) returned to the topic of snowball sampling, he noticed that almost all literature in the previous 50 years had adopted the term snowball only for the procedure of data collection by chaining respondents, and had further applied this method to gather data from "hidden populations", relatively small groups of people associated with deviant activities. The epidemiology of drug consumption and sexually transmitted diseases has seen a broader recognition of the validity of this method, also referred to as chain-referral sampling (Atkinson & Flint, 2001). Goodman (2011) highlights a major issue here: While the model he developed after Coleman assumed that what he calls "Stage 0" - the first sample of interviewees - is designed to be random, this assumption is lacking in almost all studies on hidden populations, where the sample is often "convenient," that is, made of people who are willing to join the research. Snijders (1992) had already expressed concern about the actual possibility of randomly sampling Stage 0 for hidden populations. Lee et al. (2017) integrated key elements of RDS into the TSE perspective for a systematic assessment of RDS errors.

Advancements in Snowball Sampling
RDS is strongly related to the work of mathematician Douglas Heckathorn, who first proposed it in his seminal work on HIV (Heckathorn, 1997). RDS adds an estimate, for each participant, of the degree of potential links; this estimate, relative to its average value in the population, is adopted as a correction factor. Further developments (Volz & Heckathorn, 2008) completed the method, demonstrating that if the in-group bias estimator or "homophily index" is a constant (Crawford et al., 2017), the RDS estimates are asymptotically unbiased.
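A minimal sketch of the degree-correction idea behind the Volz-Heckathorn (RDS-II) estimator, under the simplifying assumption that self-reported degrees are accurate; the data are invented for illustration:

```python
def rds_ii_estimate(traits, degrees):
    """Volz-Heckathorn (RDS-II) estimator: inverse-degree-weighted mean.

    traits: 0/1 indicator of the trait for each respondent.
    degrees: each respondent's self-reported network size (degree).
    Referral chains over-sample high-degree respondents, so each
    observation is down-weighted by 1/degree.
    """
    weights = [1.0 / d for d in degrees]
    return sum(w * y for w, y in zip(weights, traits)) / sum(weights)

# Toy data: respondents with the trait report larger networks, so the
# naive sample mean overstates the trait's prevalence.
traits = [1, 1, 1, 0, 0, 0]
degrees = [10, 10, 5, 2, 2, 1]
naive = sum(traits) / len(traits)           # 0.5
adjusted = rds_ii_estimate(traits, degrees)  # 0.4 / 2.4 = 1/6
```

The down-weighting moves the estimate from 0.5 to about 0.17, illustrating how strongly degree correction can matter when trait and network size are correlated.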
The concept of "homophily" predicts that the higher the number of connections within a group of individuals, the more the individuals in the group will show statistical commonality in their observed variables (e.g., sociobehavioral traits). If groups are replaced by clusters in a network, higher homophily results in one or more clusters strongly interconnected within themselves (but not necessarily between themselves), with every cluster expressing uniformity in observed traits among its nodes. Goel and Salganik (2010) asserted on the topic of estimation by RDS: "Across a variety of traits we find that RDS is substantially less accurate than generally acknowledged and that reported RDS confidence intervals are misleadingly narrow. [ . . . ] Notably, the poor performance of RDS is driven not by the bias but by the high variance of estimates, a possibility that had been largely overlooked in the RDS literature. Given the consistency of our results across networks and our generous sampling conditions, we conclude that RDS as currently practiced may not be suitable for key aspects of public health surveillance where it is now extensively applied" (p. 6743). The methodology reached a definitive status as an innovative epidemiologic procedure of both data collection and statistical analysis after the publication of White et al. (2015), which provided a unified protocol (STROBE-RDS) for reporting cross-sectional RDS-based studies in epidemiological journals. The authors claimed that more than 450 studies had been performed under this framework since mid-2013. Griffith et al. (2016) claimed that RDS produced unbiased estimates of the geographical distribution of their sample. This is a relevant contribution because it helps to understand how snowball sampling can be thought of as a conceptual extension of stratification in sampling.
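One crude way to make the homophily intuition concrete is to compare the share of ties that stay inside a group: in a two-cluster network joined by a single bridge, almost all ties are within-cluster. This illustrative index is our own simplification, not the specific homophily estimator used in the RDS literature:

```python
def within_group_share(edges, membership):
    """Fraction of edges whose two endpoints share the same group label.

    A value close to 1 indicates that ties rarely cross group
    boundaries, i.e., a highly homophilous network.
    """
    same = sum(1 for u, v in edges if membership[u] == membership[v])
    return same / len(edges)

# Two dense clusters joined by one bridge edge ("c"-"d").
edges = [("a", "b"), ("b", "c"), ("a", "c"),   # cluster 1
         ("d", "e"), ("e", "f"), ("d", "f"),   # cluster 2
         ("c", "d")]                            # bridge
membership = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 2}
share = within_group_share(edges, membership)  # 6/7
```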
A possible interpretation of why studies performed by "chain referrals" (see Figure 2) surged after 2010 is that researchers shifted their web-based data collection from mailing lists to active engagement through social media, as we highlighted in Figure 1. The consensus is that social media easily connect people who may live very far apart (e.g., in two different countries) through a few common interests, but this does not translate into commonality in sociobehavioral traits; and that through a "sharing" function on social media, participants can recruit members of their own hidden population with much lower homophily than in the traditional "provide a contact" referral process.
The opinion that nominally probability-based online panels are a "gold standard" in survey research should not be accepted without understanding common practices of modern survey research. The lists of contacts willing to participate in the panels (often referred to as the "panel" itself) are bought from dedicated vendors. Vendors claim that their gathering procedures are randomized, but Craig et al. (2013) found that, across seven panels bought from vendors for the same target populations, on average a fifth of the participants in six of the seven panels was also involved in at least one of the other six. We like to think that samples can tolerate a certain "quota" of non-randomized individuals without a great loss of precision in estimates.
Theoretically, a CAWI should work exactly like any probabilistic method: A sample is randomly drawn from a finite list that approximates a population (e.g., in the past, telephone books proxied a population of households for CATI). In practice, we notice that researchers often have no tools to enlist an approximation of the target population. We found Etter and Perneger (2000) very noteworthy in this sense. The authors surveyed a random sample of 1,000 residents of Geneva aged 18-70 (primary participants). They asked every contacted subject, even those not willing to participate, to transmit the questionnaire "as much as they can" to smoker and ex-smoker residents of Geneva (secondary participants). In 1997, at the end of the data gathering process, 3,300 residents had been mailed the questionnaire and 1,167 individuals (35%) returned it filled in as smokers or ex-smokers. Of these, 578 were primary participants and 566 were secondary participants. Primary participants were 1.7 years older than secondary participants (p = .03) and were more likely to be men (50% vs. 43%, p = .009). However, proportions of current smokers, stages of change, confidence in ability to quit smoking, cigarettes per day, and attempts to quit smoking were similar in the two groups. Among ex-smokers, primary participants were less active than secondary participants in coping with the temptation to smoke. Associations between other smoking-related variables were, however, not very different between primary and secondary participants.
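For readers who want to check comparisons of this kind, a standard two-proportion z-test can be run on counts reconstructed from the reported percentages. Rounding means the result will not exactly reproduce the paper's p-value, and the authors may have used a different test; this is purely illustrative:

```python
import math

def two_prop_ztest(x1, n1, x2, n2):
    """Two-sided two-proportion z-test with pooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    pval = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return z, pval

# Men among primary (50% of 578) vs. secondary (43% of 566)
# participants; counts are rounded from the reported percentages.
z, pval = two_prop_ztest(289, 578, 243, 566)
```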
The literature shows a consensus that snowball CAWI procedures are less costly than CATI standards. A priori quantification of the final savings for the research budget can be hard, but Siddiqui et al. (2016) compared a traditional statistical approach ("gold standard") to a snowball-based approach to estimate morbidity and mortality of a rare disease in two districts of India in 2011. The study covered a stratified population of 537,153 people from 271 municipalities or villages and concluded that the snowball approach was not sensitive enough to be adopted instead of the gold standard probabilistic approach to estimate morbidity of the rare disease. At the same time, the authors noticed that, comparing costs, the snowball approach required one sixth of the man-days and half the financial costs of the alternative. While the core issue for analytical and empirical studies is to demonstrate that snowball sampling is asymptotically unbiased, we should not miss the opportunity to reach unbiased estimates at costs (in terms of time and money) inferior to classical sampling through further developments of snowball methods. The authors of the Indian study noticed that expectations about the effect of recruiters' network size were misleading; for example, schoolteachers were assumed to be "good" recruiters because of their influence over their class network. This is coherent with the research on TSE in Lee et al. (2017). Instead, the mobility of recruiters (e.g., traveling merchants) was the most impactful element of recruitment. Squiers et al. (2016) offered a protocol to measure differences in a treatment for smoking cessation among young smokers in the UK. In their empirical study, participants were recruited through a mix of web advertising and panel-based recruitment. They reached 27,360 contacts after collecting 153,936 unique visitors to their website, so roughly 17.8% of visitors were converted into potential contacts of the survey.
Of these, 5,604 passed a double screen:
- declared age between 18 and 29 years,
- an anti-fraud algorithm that censored multiple participations aimed at obtaining the financial benefits of participation in the program.
The authors admit that through the mere adoption of the anti-fraud screen they introduced a bias into their final sample, but their result is noteworthy.
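The study does not describe the anti-fraud algorithm, but the double screen can be sketched as a simple pipeline; the payment-identifier field used here for duplicate detection is our assumption, standing in for whatever signal a real anti-fraud system would use:

```python
def screen_contacts(contacts):
    """Double screen sketch: keep 18-29-year-olds and drop entries that
    reuse a payment identifier (a hypothetical stand-in for a real
    anti-fraud signal such as a device fingerprint or IP address).
    """
    seen_payment_ids = set()
    passed = []
    for c in contacts:
        if not (18 <= c["age"] <= 29):
            continue  # fails the declared-age screen
        if c["payment_id"] in seen_payment_ids:
            continue  # likely multiple participation for the incentive
        seen_payment_ids.add(c["payment_id"])
        passed.append(c)
    return passed

contacts = [
    {"id": 1, "age": 24, "payment_id": "p1"},
    {"id": 2, "age": 35, "payment_id": "p2"},  # fails age screen
    {"id": 3, "age": 27, "payment_id": "p1"},  # duplicate payment id
    {"id": 4, "age": 19, "payment_id": "p3"},
]
kept = screen_contacts(contacts)  # contacts 1 and 4 pass both screens
```

As the text notes, any such filter trades fraud reduction against a new selection bias: legitimate respondents who happen to share an identifier are silently excluded.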

Issues in Panel Studies and Technological Solutions
A standard scientific practice is to observe a phenomenon at at least two different points on a timeline ("waves"). Analyses performed over a timeline are also called "longitudinal". Trivellato (1999) supports the adoption of panels in longitudinal methodology, instead of repeated cross-sectional observations, in population studies when the object of the study is an observable feature of individuals and not a latent variable of a population. This implies that researchers are also interested in change over time and that they wish to infer causal models from data. Two general issues are involved in panel studies:
- First, at some wave a previous sample unit may refuse to take part in the survey. This is often called "attrition": "The analytic problems caused by attrition are more connected to the nature of attrition than to its amount. Indeed, random attrition only affects the efficiency of estimates. But non-random attrition, especially if it is associated with unobserved individual characteristics, can result in unacceptable biases" (Trivellato, 1999, p. 342). According to Fitzgerald et al. (1998), attrition is the main factor decreasing the validity of panel studies, because it not only injects bias into the population estimates but also makes it very hard to produce inferences over time by reducing the sample size.
- The second issue is related to the quality of measurement of item response: "Measurement errors are a major concern in panel studies precisely because the main aim of such studies is the measurement and analysis of change, and, typically, panel data on change tend to be more subject to measurement errors than are cross-sectional data on levels" (Trivellato, 1999, p. 342).
We think the two issues fall under the common qualifier of "engagement with the study", which can be understood as willingness to provide reliable and complete answers.
Until recently, the dominant strategy to decrease attrition was to correctly allocate monetary incentives to respondents. Monetary incentives can increase nominal engagement in the survey, but they can still bias the sample by lowering the quality of measures from item responses (Millar & Dillman, 2011; Parsons & Manierre, 2013). We think that an engagement plan should implement progressively increasing monetary incentives, so as to encourage each respondent to remain in the panel. Recent works show that the further adoption of strategies that engage participants in scientific research as a "civic value", or that retrieve missing values by explicitly requesting them again, has a stronger effect on decreasing attrition, improving the quality of answers, and reducing the costs of the survey (Biemer et al., 2017; Olsen, 2018).
In the design of the study, we can identify two different phases:
1. UX of the tool: the set of technological features which (i) help the respondent of a survey tool to fulfill a task faster and in a less stressful way (Tractinsky et al., 2000) and (ii) enable the researcher to collect new kinds of data through new possible interactions with the tool, or to pose old research questions from new perspectives;
2. panel management: all the other strategic approaches to increase engagement without improving the usability of the tool.
The research on attrition reduction is recent and still lacks a coherent, established framework. Knowledge of attrition reduction strategies is generally built on empirical discoveries (Fumagalli et al., 2012). We noticed a lack of established protocols for panel management.

Gamification Techniques for Increasing Engagement
We define as "gamification of experience" any case in which a task is perceived as entertaining by a user. This definition has two core ideas: (i) it involves something that aims, or at least is meant, to be entertaining to a degree, and (ii) this entertainment is not primarily designed for the user's own leisure or personal sake; it is instead designed toward a task that must then be converted into a desired and measurable result by the experience's provider.
The above definition, which can also be understood as a special case of nudging (Thaler & Sunstein, 2008), may be controversial mostly because it excludes educational purposes. Huotari and Hamari (2012) addressed both the theoretical complexity of understanding what exactly a "game" is, even from the perspective of "game studies", and the history of gamification techniques among marketing scholars outside mainstream thought in the 1970s and 1980s. This knowledge is important to understand the true historical route that made gamification the "buzzword" qualifying any psychological technique aimed at the optimization of digital architectures (like web surveys).
The real issue is that gamification refers neither to the abstract idea of game that Bateson (1972) had in mind nor to an established praxis of the predigital era in marketing. Since the emergence of the terminology in 2010, "to do gamification" has meant to employ knowledge and designs matured in the context of video game design. This engagement comes not only from the process of gaming itself or "content" (e.g., gaining badges, achieving a score) but from what is referred to as "mechanics": audios or visuals (e.g., a fanfare sound) providing good interactions with the items of the digital tool (Chorney, 2012). Bogost (2015) provided very harsh words against what he perceives as a mere fad for firms. At the same time, an extensive review of the reliability of gamification (Hamari et al., 2014) expresses a moderately positive outlook. We think that gamification can be implemented in CAWI methodologies if it is coherent with the purpose of solving the issues of panel surveys: attrition and validity of data. Bailey et al. (2015) propose a useful classification of the level of implementation of "gamified" experience within a web tool:
- soft gamification: a set of features impacting experience, aimed at increasing psychological engagement in users: sounds, badges for completed tasks, scores, and rankings among participants. This is meant to enable a sort of competition or sense of progression toward a common aim, and so on;
- hard gamification: the development of a video game within the tool. Tasks of the video game provide survey data for the research.
We think that while soft gamification increases engagement and data validity, hard gamification remains controversial because it lacks a clear methodological framework for interpreting the data it produces. Hard gamification is also difficult to integrate into traditional questionnaires, requiring both skill in software programming and creativity in game design (Bailey et al., 2015).
We already encountered the dichotomy between "mechanics" and "content" in video games. It can be helpful to clarify how they act in soft gamification:
- mechanics increase engagement directly, through the act of fulfilling a task. In this case, where possible, images replace texts, questions are made more "gestural," and answers, instead of being provided by clicking buttons or writing, can be transmitted by moving a cursor, dragging an object, highlighting a text with the mouse, or recording a short audio message;
- content increases engagement indirectly, by raising interest in being part of the project. While responses to mechanics are empirically testable, approaching content requires an abstract approach to infer the user's motivations.
Our practical proposal to implement gamification in a panel study is a peculiar system of storable rewards. Upon completing a task, the participant receives "points" that feed into a ranking system. Participants may also lose points if they are late in carrying out their duties. Extra points can be earned by providing extra personal information, for example, by linking the account on the research platform with social network accounts (Facebook, Twitter, Instagram, etc.) and authorizing the gathering of information from those. The rationale for implementing a ranking is that while not all participants may be eligible for a monetary reward, the "best contributors to the research" surely should be. This system prompts participants to be an "active part of the research" and provides more information for panel management operations.
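The reward system described above can be sketched as a small ledger; all point values, bonuses, and penalties here are illustrative choices, not part of the proposal itself:

```python
from dataclasses import dataclass

@dataclass
class Participant:
    name: str
    points: int = 0

class PanelLedger:
    """Storable-reward sketch: points for completed tasks, penalties
    for late tasks, bonuses for linking social accounts. The constants
    are arbitrary illustrative values.
    """
    TASK_POINTS = 10
    LATE_PENALTY = 5
    LINK_BONUS = 15

    def __init__(self):
        self.participants = {}

    def enroll(self, name):
        self.participants[name] = Participant(name)

    def complete_task(self, name, late=False):
        p = self.participants[name]
        p.points += self.TASK_POINTS
        if late:
            p.points -= self.LATE_PENALTY

    def link_social_account(self, name):
        self.participants[name].points += self.LINK_BONUS

    def ranking(self):
        """Participants sorted by points, best contributors first."""
        return sorted(self.participants.values(),
                      key=lambda p: p.points, reverse=True)

ledger = PanelLedger()
for name in ("ann", "bob"):
    ledger.enroll(name)
ledger.complete_task("ann")                 # ann: 10
ledger.complete_task("bob", late=True)      # bob: 10 - 5 = 5
ledger.link_social_account("bob")           # bob: 5 + 15 = 20
top = ledger.ranking()[0]
```

The ranking output is exactly the information the text argues panel managers need: a running measure of who the "best contributors" are.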
According to our definition of gamification, spending time watching an introductory video may potentially be classified as a gamification technique. But to justify the extra costs of videomaking and distribution, we should be sure that it is going to increase engagement among participants. According to Kalleitner et al. (2020), the evidence leads to the conclusion that, in general, an introductory video decreases willingness to participate in a survey, although different demographic groups react differently. This issue offers insight into an alternative employment of gamification: The UX of the tool may be geared to obtain a deliberate oversampling effect on certain demographics. Wenz et al. (2019) carried out a behavioral study in the UK on the willingness to complete a web survey or a more generic task (e.g., taking a photograph) on a mobile device. They found that respondents were less willing to download and install an app, while their engagement increased when they were asked to perform tasks that are generally perceived as enjoyable, like taking a photograph of something. Their engagement was lower when data were collected passively, instead. The authors' explanation for this lower engagement is that it results from a general fear of passive monitoring and a lack of control over what data are extracted from the device (e.g., GPS position and other information). Moreover, respondents who reported higher concerns about the security of data collected with mobile technologies, and those who use their devices less intensively, are less willing to participate in mobile data collection tasks.

New Frontiers of UX Design of Survey and Data Analysis of Pictures with Computer Vision
Another difference concerns engagement across mobile devices: 65% of smartphone users in the study were estimated to be "willing" or even "very willing" to use the camera of their smartphone to take photos or to scan bar codes for a survey, while only 55% expressed the same enthusiasm about filling in a questionnaire; 54% of users were willing to use the camera on a tablet, but 68% were willing to fill in the questionnaire on this device (see Figures 3 and 4). The interpretation is that the larger the screen, the easier it is to fill in the form. An estimated Kendall correlation of .49 in willingness to fill in the questionnaire between the two devices confirms that willingness to perform a task is rather independent of the device. The research concludes that the two main factors in willingness to engage with the survey are the type of requested data (e.g., GPS vs. opinions) and familiarity with the device. The results are consistent with studies made in South America (Revilla et al., 2018) and Germany (Keusch et al., 2019).
The possibility of taking photographs enables research to ask "old questions" while obtaining more robust and more reliable answers. If it is demonstrated that it is also more engaging and less stressful for participants, we can take this as a good example of advancement in the UX of the tool. Some works (Couper et al., 2004; 2007) report experiments exploring how images affect web surveys. These studies suggest that visual content can facilitate survey completion but may also affect the answers provided. Meanwhile, receiving an answer in the form of a picture poses a challenge for the statistical analysis of data. We therefore suggest that computer vision technologies are already useful tools for the analysis of survey data, and that their use should be planned for already at the research design stage. Indeed, state-of-the-art computer vision algorithms for image classification and object recognition achieve impressively high performance levels. Moreover, computer vision technologies are nowadays able to produce more than simple categorical information, even high-level semantic predictions such as text detection from images and detection of the different objects depicted in a picture without limits on the number of objects (Gu et al., 2018). These algorithms can detect multiple objects in cluttered images and indicate the position of each object and its category. Wojna et al. (2017) propose a method that extracts structured text information by reading only the interesting parts of the whole image. This system achieved 84.2% accuracy on a challenging task consisting of detecting and recognizing text in real-world images taken on streets. The detected texts belonged to several categories, such as street signs, street names, and business names. Yin et al. (2019) propose a framework named Context and Attribute Grounded Dense captioning (CAG-Net), able to localize semantic regions in a given image and describe these regions with short phrases or sentences in natural language (i.e., image dense captioning). The main improvement provided by the authors consists in modeling the contextual coherence among the detected regions of interest.
By combining computer vision and natural language processing (Shah et al., 2019), modern systems can answer questions about the semantic content of an image while interacting with the user in natural language (i.e., visual question answering).
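To illustrate how detector output could enter a survey pipeline, the following minimal sketch converts the result of an object detector into categorical survey variables. The detection list, labels, and confidence threshold are all assumptions for illustration; in practice, the detections would come from a pretrained model of the kind cited above.

```python
# Sketch: turning raw object-detection output into categorical survey
# variables. The detections below are hypothetical; in practice they
# would be produced by a pretrained detection model.

CONFIDENCE_THRESHOLD = 0.5  # discard low-confidence predictions

def detections_to_variables(detections, threshold=CONFIDENCE_THRESHOLD):
    """Aggregate per-image detections into counts per object category."""
    counts = {}
    for det in detections:
        if det["score"] >= threshold:
            counts[det["label"]] = counts.get(det["label"], 0) + 1
    return counts

# Hypothetical detector output for one respondent-submitted photo
detections = [
    {"label": "car", "score": 0.91},
    {"label": "car", "score": 0.78},
    {"label": "bicycle", "score": 0.34},  # below threshold, ignored
    {"label": "street_sign", "score": 0.88},
]

print(detections_to_variables(detections))
# {'car': 2, 'street_sign': 1}
```

The resulting counts can be stored alongside conventional questionnaire items, which is one concrete way photographic answers can be made amenable to statistical analysis.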

Panel Management by Web-Based Solutions: CAPI-CAWI
Browser-based surveys with dedicated software are establishing themselves as the standard for small-scale studies, social research, and business. Among other things, they make it possible both to carry out cheaper surveys and to manage the anonymity of users more easily. Flynn (2018) noted that a substantial feature of the browser-based online survey is the speed at which respondents reply. In this study, on a sample of 4,787 contacts selected from a premade panel of 60,000, 10% of the contacts responded within the first 2 hours, while roughly half of the sample responded within the first 2 days. The author noted from the literature review that this response pattern is reversed for email, where contacts tend to delay responses to survey activities. Another empirical finding in Flynn (2018) is that the literature seems to suggest that over the period 2010-2018 the dropout rate for browser-based surveys decreased, although it was still lower than that of CATI.
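Response-speed patterns of the kind reported by Flynn (2018) can be computed directly from invitation and response timestamps. The sketch below is a minimal version of that calculation; the response times and panel size are illustrative, not Flynn's data.

```python
# Sketch: measuring how quickly panel contacts respond after a browser-based
# survey invitation. Times are hours since the invitation; all figures are
# illustrative.

def share_responded_within(response_hours, cutoff_hours, n_contacted):
    """Fraction of all contacted panel members who responded by the cutoff."""
    return sum(1 for h in response_hours if h <= cutoff_hours) / n_contacted

# Illustrative response times for 10 contacts, of whom 8 eventually responded
responses = [1.5, 5.0, 20.0, 30.0, 45.0, 60.0, 100.0, 150.0]

print(share_responded_within(responses, 2, 10))   # share within 2 hours
print(share_responded_within(responses, 48, 10))  # share within 2 days
```

Tracking this cumulative response curve per wave gives panel managers an early signal of whether a reminder or a mode switch is needed.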
Based on the information we have collected, our proposal for panel management is to adhere to the practice of contacting respondents through means other than the web (Lugtig & Toepoel, 2015) and, among these, our choice would be CAPI, because planning a first wave of CAPI has many advantages: -to explain in detail the objectives and relevance of the study, -to collect and verify participants' eligibility criteria and consent, -to assist users in web-based registration and follow-up procedures to switch to CAWI and to collect further contacts (telephone numbers and addresses), -to assist participants in completing the interview.
In each of the follow-up waves, data can be collected through CAWI and CATI surveys of participants enrolled during the first CAPI wave, but CAWI methods are suggested because they: -allow faster response times and hence faster questionnaire review, -are device independent: the questionnaire can be completed on different devices (PC, laptop, tablet, and smartphone), -are self-administered, allowing for greater honesty and minimizing transcription errors, -allow the randomization of the order of the questions and of the statistical elements, -are generally more engaging through visual and interactive features.
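The question-order randomization mentioned above is straightforward to implement in CAWI software. A minimal sketch follows; seeding the shuffle with the respondent's identifier is an assumed design choice that makes each respondent's order reproducible across sessions.

```python
# Sketch of per-respondent question-order randomization in a CAWI
# questionnaire. Seeding by respondent ID (an assumed convention) makes
# the order stable if the respondent resumes the survey later.

import random

def randomized_order(questions, respondent_id):
    rng = random.Random(respondent_id)  # reproducible per-respondent seed
    order = list(questions)
    rng.shuffle(order)
    return order

questions = ["Q1", "Q2", "Q3", "Q4"]
print(randomized_order(questions, respondent_id=42))
print(randomized_order(questions, respondent_id=42))  # same respondent, same order
```

Because the order is deterministic per respondent but varies across respondents, order effects average out over the sample without complicating partial-completion handling.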
The CAWI questionnaire must be fully compatible with mobile devices, so that it can be used from any browser on tablets and mobile phones with an advanced and functional user interface. A mobile web application is preferable to a native App because it is compatible with a greater number of devices and operating systems, reducing sample bias. Toepoel and Lugtig (2014) concluded that adapting web surveys to make them easier to complete on mobile phones is an important step in optimizing survey studies.
Recent work by Wenz et al. (2020) investigated the effect of offering personalized data feedback to study participants. The experimental study showed that offering personalised feedback on reported data neither changed the reported data nor increased participant engagement. A comparison among probabilistic approaches is useful (Table 1) to evaluate some specific features. Couper and Peterson (2017) found that mobile users take longer to respond to web surveys than those using PCs, even though smartphone users are presumably more familiar with their device and one would therefore expect them to complete a survey faster via mobile. The number of words and the type of question asked play an important role when response times are compared within and between surveys and devices (Couper & Kreuter, 2013). Finally, further research on completion time and data quality is needed to identify the best alternatives for respondents who complete surveys on smartphones.
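Since word counts drive response times (Couper & Kreuter, 2013), a simple way to make device comparisons fairer is to normalize each response time by the length of the question. The sketch below shows this normalization; the timings and question text are illustrative.

```python
# Sketch: length-normalized response times for device comparisons.
# All figures are illustrative, not data from the cited studies.

def seconds_per_word(response_seconds, question_text):
    """Normalize a response time by the question's word count."""
    return response_seconds / len(question_text.split())

question = "How satisfied are you with your current internet provider"
pc_time, mobile_time = 8.1, 11.7  # illustrative median seconds per device

print(round(seconds_per_word(pc_time, question), 2))      # PC
print(round(seconds_per_word(mobile_time, question), 2))  # mobile
```

Comparing seconds-per-word rather than raw durations removes one obvious confound when questionnaires differ in length across waves or devices.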

Monitoring in Survey Studies
Monitoring refers to the continuous observation, over a period of time, of the progress or emergence of a feature of an object. There are many applications of monitoring protocols in the scientific literature: monitoring a health condition in patients, or epidemics, in medicine; monitoring the levels of substances, agents, or other qualities of a natural environment; sensory technology and the Internet of Things for industrial applications; the evaluation of public policies in political science; and so on. Today, thanks to available digital technologies, social researchers can observe behaviours, ask questions, perform experiments, and collaborate in ways that were impossible until a few years ago. Most human activities and sources are now digital (e.g., books, newspapers, payments, photography). As a result, the amount of information available has increased, and digitizing such a large amount of information facilitates its analysis, protection, and transmission. Survey studies always refer to a statistical population, so the implementation of advanced monitoring technologies is not an alternative to the sampling process but a complement that enriches data collection.
Some innovations enabled by digital technologies include: -the use of social media such as Facebook, Twitter, and Instagram, which require interaction and sharing between subscribers; -the connection with smart devices or activity trackers to analyse the behaviour of panel participants (what they do) along with their survey responses (what they say); -online crowd evaluation. This terminology, used in Geiger et al. (2012), refers to a method of collecting data from user opinions. Opinions can be expressed as textual data or, even more commonly, as a quantitative judgment on an ordinal scale (a "rating"). This methodology is commonly adopted through online rating platforms such as GoogleMaps, TripAdvisor, or Booking.com (Tomaselli & Cantone, 2020).
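Online crowd evaluation typically yields ordinal ratings, which are most informative when summarized as both a mean and a full distribution. A minimal sketch, with illustrative ratings on a 1-5 scale:

```python
# Sketch: aggregating ordinal crowd ratings (1-5 stars) of the kind
# collected on online rating platforms. The ratings are illustrative.

from collections import Counter

def summarize_ratings(ratings):
    """Return the mean rating and the full ordinal distribution."""
    dist = Counter(ratings)
    mean = sum(ratings) / len(ratings)
    return mean, {star: dist.get(star, 0) for star in range(1, 6)}

ratings = [5, 4, 4, 3, 5, 2, 4, 5]
mean, distribution = summarize_ratings(ratings)
print(mean)          # 4.0
print(distribution)  # {1: 0, 2: 1, 3: 1, 4: 3, 5: 3}
```

Reporting the distribution alongside the mean matters because ordinal scales can hide polarized opinions behind an average.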
Strategic tools for responding to surveys include: -a dedicated Internet platform with the same gamified structure as smartphone Apps, from which a lower dropout rate is expected; -a website dedicated to the study with a public and a private section. Each participant receives credentials to access the private section to check news about the study and any other communication collected during the survey. Furthermore, participants can verify and update their personal data (e.g., new addresses or contacts) or their preferences for the following waves (CAWI, CATI, etc.); -a virtual badge, received at the end of the questionnaire, that can be exchanged for rewards such as Amazon vouchers, PayPal credits, and Apple Store and Play Store gift cards. All subjects who complete the follow-up waves could participate in a final lottery; -regular personalized notices sent to participants to maintain contact through the study phases.
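The badge-and-lottery mechanics above can be sketched in a few lines. The participant identifiers, wave counts, and seed below are illustrative; fixing the lottery seed is an assumed design choice that makes the draw auditable.

```python
# Sketch of the reward mechanics: a badge on questionnaire completion and
# a seeded final lottery among participants who completed all follow-up
# waves. All identifiers and figures are illustrative.

import random

def award_badge(completed_waves, total_waves):
    return "gold" if completed_waves == total_waves else "participation"

def final_lottery(eligible_ids, n_winners, seed=2024):
    rng = random.Random(seed)  # fixed seed keeps the draw reproducible/auditable
    return rng.sample(eligible_ids, n_winners)

participants = {"p01": 3, "p02": 2, "p03": 3, "p04": 3, "p05": 1}  # waves completed
eligible = [pid for pid, waves in participants.items() if waves == 3]
print(eligible)
print(final_lottery(eligible, n_winners=1))
```

A reproducible draw is worth the small loss of theatrical randomness: it lets the research team demonstrate to participants that the lottery was fair.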

New Sensory Technologies
Data collection, in the context of current studies, is a crucial phase for guaranteeing safe and reliable protocols. The user involved in the data collection needs to be continuously monitored to maintain a high level of engagement. In this section, we sketch some possible scenarios. Figure 5 shows a general chart describing the interactions between new sensory technologies and the continuous collection and analysis of survey data, with the related specific tasks (e.g., tracking, profiling, monitoring, measuring). User tracking and profiling could be implemented by considering users' behaviour through different modalities involving hardware (physical tracking, e.g., mobile tracking in shops) and/or software. In some cases, facial recognition systems could also help, in practical situations, to recognize users in different places/shops. User profiles could also be built by considering online web-tracking and profiling strategies (e.g., cookies) and/or by implementing specific mobile Apps that send alerts whenever some condition concerning the user is met.
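The condition-alert pipeline just described reduces to rule checking over incoming user events. A minimal sketch follows; the event fields and the two example rules are assumptions for illustration.

```python
# Sketch of rule-based alert triggering in a monitoring pipeline: each
# incoming user event is checked against named conditions, and matching
# rule names would drive alerts (e.g., push notifications). Event fields
# and rules are illustrative.

def check_conditions(event, rules):
    """Return the names of all rules satisfied by this event."""
    return [name for name, predicate in rules if predicate(event)]

rules = [
    ("entered_partner_shop", lambda e: e.get("place_type") == "shop"),
    ("inactive_7_days", lambda e: e.get("days_inactive", 0) >= 7),
]

event = {"user": "u17", "place_type": "shop", "days_inactive": 2}
print(check_conditions(event, rules))  # ['entered_partner_shop']
```

Keeping conditions as named, declarative rules makes it easy to audit which triggers a participant has consented to.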
One of the main sources of information exploited through new technologies for survey design is represented by so-called "organic data" (Couper, 2011). The diffusion of smart devices and wearable sensors and the sharing of large amounts of information about ourselves, our behaviours, and our preferences have furthered the development of systems able to exploit the information that users produce by using common personal devices (e.g., smartphones) or browsing the web. Whenever a person interacts with a web-based application, he or she leaves traces of activity. Nowadays, several techniques and technologies able to collect, analyse, and draw useful inferences from this information have been developed and continuously process the data generated every day by users. Such approaches are classified as "intercept surveys" in the typology defined by Couper (2000).
This continuous flow of data allows these systems to improve their performance over time. An iconic example is the field of recommender systems, which predict users' preferences starting from a few cues (e.g., a web-page visit, a click, or a scroll on Facebook's home page) by exploiting the large store of previous users' interactions that constitutes the system's experience. These systems are now able to suggest movies, online purchases, and holiday destinations, and to create advertising campaigns personalised to the individual user profile.
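The core recommender idea of predicting preferences from previous users' interactions can be sketched with a minimal item co-occurrence approach. This is one of the simplest collaborative techniques, not the method of any cited system; the interaction histories are illustrative.

```python
# Sketch of a minimal item-to-item co-occurrence recommender: items that
# previous users consumed together with the current user's items are
# ranked as candidates. Interaction data are illustrative.

from collections import Counter

def recommend(history, all_histories, top_n=2):
    """Rank unseen items by co-occurrence with the user's own items."""
    scores = Counter()
    for other in all_histories:
        if set(history) & set(other):        # shares at least one item
            for item in other:
                if item not in history:
                    scores[item] += 1
    return [item for item, _ in scores.most_common(top_n)]

histories = [
    ["movieA", "movieB"],
    ["movieA", "movieC"],
    ["movieB", "movieC", "movieD"],
]
print(recommend(["movieA"], histories))
```

Even this crude scheme shows why a few cues suffice: each new interaction narrows the set of co-occurring items that previous users have already mapped out.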
Face recognition is one of the most powerful technologies for keeping track of people. Indeed, current techniques can recognise a person from a single frame by exploiting knowledge of a single picture of that person (Schroff et al., 2015). Future scenarios could involve a network of connected cameras placed in appropriate and authorised places (e.g., hospitals, clinics, pharmacies, and so on) and programmed to recognise participants when present. Such a system can be combined with the user's position, which can be obtained in different ways: cellular signal, GPS sensor, social media interactions, and tagging, among others. When such an event is detected, specific software activities can be triggered by means of the Apps installed on the user's personal devices. For instance, if the system detects that the user is standing in a line (e.g., waiting for a doctor's visit), it can propose a questionnaire to the user, who may be tempted to participate while he or she is waiting.
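The position-based trigger in this scenario can be sketched as a simple circular geofence check: when the user's coordinates fall inside an authorised place, the App would propose the questionnaire. The coordinates, place, and radius below are illustrative, and the distance formula is a rough small-distance approximation.

```python
# Sketch of a location trigger: a circular geofence around an authorised
# place (e.g., a clinic). Uses an equirectangular approximation, adequate
# for distances of tens of metres. All coordinates are illustrative.

import math

def inside_geofence(lat, lon, center, radius_m):
    """Approximate check that (lat, lon) lies within radius_m of center."""
    dlat = math.radians(lat - center[0])
    dlon = math.radians(lon - center[1]) * math.cos(math.radians(center[0]))
    dist_m = 6371000 * math.hypot(dlat, dlon)  # mean Earth radius in metres
    return dist_m <= radius_m

clinic = (37.5022, 15.0873)  # hypothetical authorised place
print(inside_geofence(37.5023, 15.0874, clinic, radius_m=50))  # nearby
print(inside_geofence(37.6000, 15.2000, clinic, radius_m=50))  # far away
```

In a deployed App, a positive check would fire the questionnaire prompt described above, subject to the participant's prior consent.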
Social networks are another pervasive medium through which it is possible to monitor users' activity and, at the same time, involve users in specific activities aimed at gathering specific information by means of the gamification paradigm. In this context, in recent years the research field of sentiment analysis applied to multimedia content shared through social media platforms has registered a rapid increase in applications, algorithms, and public large-scale data sets (Ortis et al., 2019a). The aim of sentiment analysis is to analyse and infer the opinion of individuals or specific groups of people toward a specific topic, such as an event or a product. Until images became widely shared and available, the main tasks in this field were limited to the textual domain (e.g., review analysis). Now, the growth of social media platforms has enabled several applications that benefit from the automatic analysis of the textual and visual information produced and/or consumed by social media users every day: predicting whether an image or text will evoke positive or negative reactions in viewers (i.e., content polarity; Ortis et al., 2020c), what content will be shared/liked most by users (content popularity; Ortis et al., 2019b), or what parts of a post contributed most to the "virality" of the content (Deza & Parikh, 2015).
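The polarity task can be illustrated with the oldest baseline in the field: a lexicon-based scorer over text. The cited systems use learned models over text and images; the tiny lexicon below is purely illustrative.

```python
# Sketch of lexicon-based polarity scoring, the classic textual baseline
# for the sentiment task described above. The lexicon is illustrative.

POSITIVE = {"great", "love", "amazing", "good"}
NEGATIVE = {"bad", "awful", "hate", "boring"}

def polarity(text):
    """Classify text as positive/negative/neutral by lexicon word counts."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(polarity("I love this event, the venue was great"))  # positive
print(polarity("Awful queue and boring talks"))            # negative
```

Even this baseline conveys the shape of the task: map user-generated content to a polarity label that can be aggregated per topic, event, or population segment.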
Some of these methods can be further specialised to the single user (i.e., profiling) by exploiting the specific user's preferences learned from the history of their interactions on the social media platform (e.g., past image/text reactions in terms of likes, shares, comments, and interactions with other users). As social media platforms are used daily by people to share their personal opinions and to keep up with news and other opinions, advanced research methods for the remote monitoring of people's activities, as well as for the estimation of their opinions toward specific topics, have been developed in the last decade (Ortis et al., 2020b). These technologies and methods can be exploited to increase the engagement of social media users in participating, and continuing to participate, in scientific surveys.
Wearable sensors (Mukhopadhyay, 2014) are now able to detect several signals, such as photoplethysmography (PPG), electrocardiography (ECG), skin conductance, temperature, and so on, that allow the inference of physiological values (e.g., blood pressure, heart rate, etc.) often correlated with emotional and mental states. Rundo et al. (2018) proposed a PPG-ECG pattern recognition system that exploits a mathematical correlation between the ECG signal and the derivative of the PPG signal. This system provides a way to obtain a reliable PPG signal, useful for monitoring physiological parameters such as blood volume pulsations over time by employing only a sensor on the skin, hence in a noninvasive way. Most wearable sensors can be connected to the smartphone through Bluetooth technology, and all the collected information can be sent through the smartphone to a cloud platform, which collects and tracks statistics about the user's state over time (Bonato, 2003; Pentland, 2004). A set of conditions based on the values reached by the collected measurements can be defined, and each condition can then trigger a specific alert to the user. For example, wearable sensors and smartphone applications have been combined to develop "smart" smoking detection systems, able to automatically infer the number of cigarettes smoked by a subject within a period of observation (Ortis et al., 2020a). In almost all cases, formal agreements will be explicitly provided to and shared with users, to comply with current regulation on data privacy and the related responsibilities.
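The threshold-alert idea can be sketched end to end: estimate a physiological value (here, heart rate from inter-beat intervals, as a PPG pipeline might output) and test it against a configured condition. The interval values and the 100-bpm threshold are illustrative, not taken from the cited systems.

```python
# Sketch of a wearable-data alert: mean heart rate estimated from
# inter-beat intervals (the kind of output a PPG pipeline can provide),
# checked against a configurable threshold. All values are illustrative.

def heart_rate_bpm(beat_intervals_s):
    """Mean heart rate (beats per minute) from inter-beat intervals in seconds."""
    mean_interval = sum(beat_intervals_s) / len(beat_intervals_s)
    return 60.0 / mean_interval

def should_alert(bpm, threshold=100.0):
    """Condition that would trigger an alert on the user's smartphone."""
    return bpm > threshold

intervals = [0.55, 0.52, 0.50, 0.53]  # seconds between successive beats
bpm = heart_rate_bpm(intervals)
print(round(bpm))          # 114
print(should_alert(bpm))   # True
```

In the architecture described above, the threshold check would run on the cloud platform (or on the device), and a positive result would be pushed back to the participant's App as an alert.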

Final Remarks
This study represents an attempt to collect and critically present various modalities of data collection that properly employ new technologies to solve common issues of traditional methods for data gathering. In particular, the advantages of using the available technologies, their potential, and the current limitations of each alternative are detailed. We reported how different studies, in various fields, have already examined the impact of such strategies, especially in terms of the level of engagement of the people involved.
Although several solutions have been proposed in the last few years, the most promising areas, in our opinion, concern the possibility to monitor, follow, and trigger reactions in participants by employing user-tracking strategies combined with ad hoc involvement strategies (e.g., gamification). Instruments and methodologies offered by the world of Information and Communication Technology (ICT) can be very useful but must be tailored and adapted to each user and to the specific context under observation.
The wide landscape of social media and sensory technologies gives access to huge amounts of user-generated data at large scale, which require proper strategies to be collected, analysed, and archived, leading to Big Data management. Considering the pervasive presence of social media platforms in our daily lives, such methods have been successfully used to infer people's opinions toward topics or events, resulting in a large body of research.
Future efforts can be devoted to the application of such strategies to monitor and raise the level of engagement of participants involved in survey studies. With this in mind, we are planning to carry out a simulation study on social media-based sampling, which will involve monitoring the behaviours of a large audience of users on multiple social media platforms. Focusing on selected topics, we aim to show that new technologies offer new possibilities for addressing the issues of traditional methods discussed in this study.