The Social Dilemma of Big Data: Donating Personal Data to Promote Social Welfare

When using digital devices and services, individuals provide their personal data to organizations in exchange for gains in various domains of life. Organizations use these data to run technologies such as smart assistants, augmented reality, and robotics. Most often, these organizations seek to make a profit. Individuals can, however, also provide personal data to public databases that enable nonprofit organizations to promote social welfare if sufficient data are contributed. Regulators have therefore called for efficient ways to help the public collectively benefit from its own data. In an online experiment with 1,696 US citizens, we find that individuals would donate their data even when the data are at risk of being leaked. The willingness to provide personal data depends on the risk level of a data leak but not on a realistic impact of the data on social welfare. Individuals are less willing to donate their data to the private industry than to academia or the government. Finally, individuals are not sensitive to whether the data are processed by a human-supervised or a self-learning smart assistant.


Introduction
In this study, we examine several factors that encourage individuals to provide personal data to a database that promotes social welfare. Modern data-driven technologies have the potential to improve the well-being of ordinary citizens in various domains of life. Organizations have implemented these technologies, for example, through smart assistants, augmented reality, and robotics. Individuals who benefit from these technologies provide sensitive and valuable data to academia, the government, and the private industry. Firms can generate additional profits using these data. By making their data available, individuals may, however, also support organizations that promote social welfare. The United Nations (UN) (2014) has publicly called for the mobilization of individual data and initiated a global data partnership (www.data4sdgs.org) and database (https://www.sdg.org/#catalog) to promote environmental and social sustainability. Yet the available data for digital technologies still need improving. As the Global Partnership for Sustainable Development Data (2020) notes, whole groups of people and important aspects of their lives are still not captured digitally. More diverse, integrated, and trustworthy data could lead to better decision making and real-time citizen feedback that has the potential to promote social welfare.
The necessity to mobilize individual user data has recently become omnipresent in the context of governmental policies to fight the COVID-19 pandemic. Various countries around the world have developed and implemented tracking apps that use personal infection and location data to help control the spread of the virus and to protect public health (e.g., Germany: Corona-Warn-App). Other scenarios in which the mobilization of individual user data can promote social welfare include the tracking of human migration to ensure medical support during an earthquake (UN 2015) and the tracking of deforestation combining satellite imagery and citizen-generated data (UN 2020). In the future, data mobilization will become technologically feasible in ever more scenarios and increasingly relevant because of the simultaneous increase in data availability and global challenges such as migration and climate change.
Data donations are not only beneficial to society but also, unfortunately, constitute a social dilemma. While society at large could benefit from data donations, each individual has an incentive to deviate from donating personal data, because data donations come with personal privacy risks. Individuals might thus free ride on the contributions of others. If not enough people donate their data, the respective database cannot be used to operate a data-driven technology, and everyone is worse off than if they had cooperated and donated their data. Conversely, if enough people donate their data, a public good emerges: a database large and diverse enough to operate data-driven technology to increase social welfare. Regulators therefore need to introduce efficient ways to help the public benefit from its own data.
To examine the factors that encourage individuals to provide their data, we focus on smart assistants as a data-driven technology to increase social welfare. Smart assistants can convert large amounts of data into personalized information and help users make socially desirable choices, for example, by selecting relevant information according to consumption patterns and providing tips that are tailored to individual habits and easy to follow. However, to offer informed and comprehensive decision support, a smart assistant must have access to a sufficiently large database of diverse, timely, and trustworthy data. Popular examples of smart assistants are Amazon's Alexa and Apple's Siri. Decision-support systems, however, also allow for efficient energy management in households (Kolokotsa et al. 2009). Other examples include big data-based smart farming systems (Wolfert et al. 2017) and clinical decision-support systems that improve health care decisions by providing intelligently filtered information to health care professionals (Musen et al. 2014). In the following analysis, we provide insights into what makes people donate their data to a database that is used to develop and operate a smart assistant to increase social welfare. In an online experiment, we investigate how the willingness to donate personal data (WDPD) to a database changes with the risk of data getting leaked, the data-driven smart assistant's impact on social welfare, the organization that manages the database to operate the smart assistant, and a potential human component of the smart assistant's underlying algorithm. We compare two domains of social welfare: a sustainable environment and a sustainable health system.
In the past two decades, an extensive literature on the sharing of data has emerged. However, this literature often neglects the social dilemma of data sharing and frequently presumes that data from a single individual alone are utilizable (UN 2014, Cai and Zhu 2015). Our study aims to better understand how individuals donate personal data in a scenario in which the corresponding database must be sufficiently large and diverse to develop and operate a technology that promotes social welfare. To some degree, we thereby contribute to solving the social dilemma of big data and help society benefit from the value of its personal data. The remainder of the article proceeds as follows: in section 2, we relate our research question to existing literature and develop testable hypotheses. In section 3, we describe the empirical implementation of our online experiment. Section 4 presents the data and outlines the empirical results. Section 5 presents a discussion of our findings and concludes.

Theoretical Considerations and Hypotheses Development
Data sharing can take place, for example, on an individual or institutional level, on a local or international level, and in a centralized or decentralized way (Chawinga and Zinn 2019). In recent years, data sharing on the individual level has received growing interest from academics and policy makers. This interest results, among other things, from citizens generating enormous amounts of data through the increasing use of digital devices and services as well as from increasing concerns about data privacy.
Different disciplines have investigated data sharing under privacy risks. Information systems scholars developed the privacy calculus model in the context of technology usage (Dinev and Hart 2006), which has become a well-established concept to investigate data sharing under privacy risks on an individual level. In the privacy calculus model, the term "calculus" refers to the rational cognitive cost-benefit trade-off that technology users face: realizing the expected utility by disclosing personal data versus avoiding the anticipated costs of a privacy violation by not disclosing personal data (Culnan and Armstrong 1999, Dinev and Hart 2006). The individual decision of whether to disclose personal data depends on the outcome of this cost-benefit analysis. Empirical research shows that the individual benefits are often perceived as outweighing the costs of privacy risks. Individuals are willing to share their personal data and take privacy risks, for example, to participate in social media networks (Dienlin and Metzger 2016, Choi et al. 2018), to receive financial rewards (Grossklags and Acquisti 2007), to make e-commerce transactions (Dinev and Hart 2006), and to receive personalized content or recommendations (Sun et al. 2015, Kim and Kim 2018). The emergence of the Internet of Things (IoT) and the use of mobile devices have further accelerated the spread of technologies that enable data sharing through, for example, smartphones (Keith et al. 2010) and IoT devices such as refrigerators (Kim et al. 2019).
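The cost-benefit trade-off at the core of the privacy calculus can be illustrated with a minimal sketch. The class name, the three input quantities, and the numbers below are hypothetical and chosen for illustration only; they are not taken from Dinev and Hart (2006) or from our experiment. The sketch simply assumes that a purely rational user compares the expected benefit of disclosure with the expected cost of a privacy violation.

```python
from dataclasses import dataclass


@dataclass
class DisclosureDecision:
    """Hypothetical inputs to a privacy-calculus trade-off (illustrative units)."""
    expected_benefit: float       # utility gained from disclosing the data
    violation_probability: float  # perceived chance of a privacy violation (0..1)
    violation_cost: float         # disutility if the violation occurs

    def net_utility(self) -> float:
        # Expected benefit minus the expected cost of a privacy violation.
        return self.expected_benefit - self.violation_probability * self.violation_cost

    def discloses(self) -> bool:
        # A purely "rational" user discloses only if the trade-off is positive.
        return self.net_utility() > 0


# Same benefit and cost, different perceived risk (assumed values).
low_risk = DisclosureDecision(expected_benefit=5.0, violation_probability=0.01, violation_cost=100.0)
high_risk = DisclosureDecision(expected_benefit=5.0, violation_probability=0.20, violation_cost=100.0)

print(low_risk.discloses(), high_risk.discloses())
```

Under these assumed values, the user discloses in the low-risk case and withholds in the high-risk case. The attitudinal and contextual factors discussed next are precisely what such a purely rational sketch leaves out.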
The digital marketing and behavioral sciences literature has extended the rational perspective of the privacy calculus model by examining attitudinal and contextual factors that work as antecedents of individual data disclosure. Factors that complement the cost-benefit analysis of the privacy calculus model include trust (Joinson et al. 2010), anonymity (Pu and Grossklags 2017), sensitivity of information (Mothersbaugh et al. 2012), past privacy experience (Xu et al. 2012), extroversion and attitude (Chen 2013), and social privacy norms (Zlatolas et al. 2015).

Data Philanthropy.
In many cases, using digital services with privacy risk equals an exchange of data for personal benefits (Xu et al. 2009). However, as in the case of sharing health data during a pandemic, data can also benefit society at large. How do individuals decide in a cost-benefit trade-off whether they should disclose their personal data to increase social welfare rather than their personal utility? When individuals rely on the rational approach of the privacy calculus model, they may not find it worthwhile to share their data and risk their privacy to promote social welfare. Nevertheless, individuals might engage in self-sacrificing behavior and disclose personal data, even though it may be rational for them to prioritize the protection of their privacy rights over societal benefits. Such behavior can occur if individuals view data disclosure less as a rational exchange of goods and more as a morally and emotionally motivated donation to a good cause. Kirkpatrick (2013) coined the term "data philanthropy" for the donation of data by individuals or profit-oriented organizations without expecting personal benefits in return. The literature on general donation behavior provides important insights into why individuals might be willing to donate their data and take a privacy risk to promote social welfare. De Groot and Steg (2008) and Schwartz (1970) show that individuals might consider a self-sacrifice morally obligatory if it helps others or maximizes social welfare. In addition to acting according to one's moral values, individuals often donate to provide public goods, because they have a desire for what is called a "warm glow" (e.g., Andreoni 1990, Ferguson et al. 2012). The term "warm glow" refers to the need to perform prosocial acts (e.g., helping others) and the simultaneous expectation to feel good afterward (Null 2011, Luccasen and Grossman 2017). Perceived moral obligations and the warm-glow effect might thus foster individuals' willingness to donate their data and to waive their right to privacy to promote social welfare.
Data donation behavior has been studied first and foremost empirically in three contexts: data donation in academia, medical data donation, and data disclosure for terror and disaster control.
Regarding data donation in academia, scholars have studied the donation of data from nonresearchers to academia (e.g., Liu et al. 2017) and the sharing of data sets between researchers (e.g., Fecher et al. 2015). The donation of data from nonresearchers to academia is associated with individual costs, such as effort and loss of control, while the benefits are favorable to the public in general, for example, in the form of new basic knowledge (Breeze et al. 2012, Bezuidenhout 2013). Through the sharing of data sets between researchers, new knowledge is generated by reanalyzing existing data (Woolfrey 2009). Open access to data can also provide transparency and protect against academic misconduct (Chawinga and Zinn 2019).
Research investigating what drives nonresearchers to donate their data to academia shows that key determinants include the perceived need for donation (Nov et al. 2014), the perceived reputation of the organization (Liu et al. 2017), altruism (Rotman et al. 2012, Goncalves et al. 2013), and social signals and attitude (Liu et al. 2017). Studies investigating what prevents researchers from sharing their data with the wider academic community have identified factors such as a loss of control and fear of misuse (e.g., Acord and Harley 2012, Bezuidenhout 2013), time and effort (e.g., Breeze et al. 2012, Huang et al. 2013, Chawinga and Zinn 2019), and sociodemographic variables such as age, nationality, and character traits (e.g., Acord and Harley 2012, Enke et al. 2012, Fecher et al. 2015).
While medical data are particularly sensitive (Soni et al. 2020), the factors influencing data donations in a medical and health context have considerable overlap with the findings from the academic domain. Research has shown that, for example, time and effort (Rudolph and Davis 2005, Morse 2007, Wright et al. 2010) and the fear of misuse (Lopez 2010) influence the decision to donate data in the medical and health context.
Moreover, research investigating the trade-off between security and privacy in the contexts of terrorist crises and disaster control shows that fear contributes significantly to people's willingness to disclose their personal data (Silver 2004, Pavone and Esposti 2012). Even if people do not benefit directly from the disclosure of their data, they are still willing to donate their data to protect the population from terrorist attacks (Reuter et al. 2016).

The Social Dilemma of Big Data.
Many people disclose their data to digital service providers and risk their privacy in exchange for even small rewards (Acquisti et al. 2013).
However, the data's positive impact on social welfare cannot unfold because the organizations managing these data often do not pursue social welfare goals. To generate benefits for society at large, the data could alternatively be managed by organizations that primarily strive for increasing social welfare. These organizations may include, for example, academia, the government, and the private industry. If enough people voluntarily provide these organizations with their personal data, they could operate technologies such as smart assistants and thereby help promote social welfare.
As outlined in the introduction, data donations for a public good constitute a social dilemma.
Social dilemmas are "situations in which a non-cooperative course of action is (at times) tempting for each individual in that it yields superior (often short-term) outcomes for self, and if all pursue this non-cooperative course of action, all are (often in the longer-term) worse off than if all had cooperated" (Van Lange et al. 2013, p. 126). The social dilemma of big data involves not only a social conflict (individual vs. collective interests) but also a temporal conflict (short-term vs. long-term consequences). For the individual, the protection of privacy expires immediately, while for society, the positive impact of a sufficiently large database comes with a time delay. We define the social dilemma of big data as a delayed public good dilemma, which means that individuals must give their data so that, over time, a large and diverse data set emerges that various organizations can then use with the goal to increase social welfare. Individuals cooperate in public good dilemmas, for example, because they feel a moral obligation (Chen et al. 2009), because they know that their cooperation contributes positively to the public good (Kerr 1992), and because they believe that other individuals will also cooperate and contribute (Dawes et al. 1976).
Another important characteristic of the social dilemma of big data is uncertainty. When disclosing personal data for a public good, individuals face two types of uncertainty: environmental uncertainty (i.e., uncertainty about the situation and conditions for obtaining the public good) and social uncertainty (i.e., uncertainty about the decisions of others) (Orbell et al. 1988). Thus, individuals do not know the exact threshold of data required to generate a usable database that can increase social welfare. The critical data mass depends on factors such as data quality, variety, and use. Furthermore, individuals do not know whether a sufficient number of other people are also cooperating and donating their data and, thus, whether an increase in social welfare can be achieved. Both social and environmental uncertainty lead to lower cooperation rates in public good dilemmas (Wit and Wilke 1998), in some cases through a reduced perceived obligation to cooperate (Fleishman 1980).
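The payoff structure of such a dilemma can be sketched as a simple threshold public-good game. The function name, the threshold, and the cost and benefit values are hypothetical and serve only to illustrate the incentive to free ride; they do not describe the payoffs of our experiment, and the sketch ignores the time delay of the societal benefit.

```python
def payoff(donates: bool, n_other_donors: int, threshold: int,
           privacy_cost: float, public_benefit: float) -> float:
    """Payoff of one individual in a threshold public-good game (hypothetical units)."""
    total_donors = n_other_donors + (1 if donates else 0)
    # The benefit is non-excludable: everyone receives it once the database is usable.
    benefit = public_benefit if total_donors >= threshold else 0.0
    # Only donors bear the privacy cost.
    cost = privacy_cost if donates else 0.0
    return benefit - cost


# Assumed parameters: 100 donors are needed, donating costs 2, a usable database is worth 10.
params = dict(threshold=100, privacy_cost=2.0, public_benefit=10.0)

# Enough others donate: free riding pays more than donating.
print(payoff(True, 150, **params), payoff(False, 150, **params))

# Too few others donate: donating only adds a cost.
print(payoff(True, 50, **params), payoff(False, 50, **params))

# Everyone donates versus nobody donates: universal cooperation is better for all.
print(payoff(True, 99, **params), payoff(False, 0, **params))
```

In this sketch, donating pays off individually only when a person happens to be pivotal for reaching the threshold. Because individuals know neither the threshold (environmental uncertainty) nor the number of other donors (social uncertainty), they cannot tell whether they are in that position.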

Hypotheses Development
The costs and benefits of a product or service affect individual behavior. This relationship applies to charitable donations (e.g., Acord and Harley 2012, Bezuidenhout 2013) and technology usage behavior (Dinev and Hart 2006) as well. Privacy costs are central in the disclosure of personal data. If an individual's privacy is violated, he or she may face severe negative long-term consequences. For example, the leakage of financial, health, or location data can serve as a diagnostic measure of sensitive individual attributes, such as religious or political views and possible health concerns (Gambs et al. 2011). Consequently, and as we argued previously, the sharing of personal data for a public good structurally resembles a social dilemma.
In developing our hypotheses, we rely on the literature that investigates cooperative behavior in self-sacrificing dilemmas under risk and social uncertainty. What influences individual decision making in a social dilemma is its payoff structure (e.g., Rapoport 1967, Komorita and Parks 1994). It is well documented that negative payoffs such as personal costs lead to significantly lower cooperation rates in social dilemmas (e.g., Dawes 1980, Cress et al. 2006, Gangadharan and Nemes 2009). Thus, we hypothesize the following:

Hypothesis 1a:
A lower privacy risk increases the WDPD to a database that can be used to promote social welfare.
The WDPD varies with the nature of the generated benefit (e.g., Sun et al. 2015, Dienlin and Metzger 2016). Benefits can be symbolic, hedonic (e.g., additional values such as better service or offer personalization), and utilitarian (e.g., goods, monetary advantages) (Xu et al. 2009, Sun et al. 2015). Individuals weigh their own privacy costs against the enhancement of social welfare using a cost-effectiveness analysis, a subcategory of a cost-benefit analysis (Newcomer et al. 2015). The perceived effectiveness of donations or social behavior increases the willingness to actually donate or perform social behavior. The greater the positive outcome of a donation, the greater is the willingness to donate (Ye et al. 2015).
The same is true for certain contexts of data donation. For example, people are more willing to release their data for terrorism protection if they believe the data will have an impact (Reuter et al. 2016). In social dilemmas, the positive outcome of individual cooperation is expressed in payoffs. Uncertainty in payoffs typically reduces the willingness to cooperate (Budescu et al. 1990, Levati and Morone 2013), for example, by providing a justification for noncooperative behavior (Van Dijk et al. 2004). Moreover, individuals are more willing to incur personal costs and contribute to a public good the higher the payoff levels, even if they are uncertain (Dawes 1980, Dickinson 1998, Balliet et al. 2011). Efficacy plays a major role in cooperative behavior as well (Kerr 1992). The greater the impact an individual can have through a cooperative action such as data disclosure, the more willing he or she is to incur personal costs such as privacy risk. We therefore hypothesize that individuals are more willing to donate their personal data to a database as a public good the greater the positive impact of the database on social welfare.

Hypothesis 1b:
A greater positive impact of the database-driven smart assistant on social welfare increases the WDPD.
In a social dilemma with privacy risk as a personal cost, no direct personal payoffs, and uncertain and delayed societal payoffs, it may not be rational for individuals to donate their data based on a cost-benefit analysis. However, individuals might do so anyway, because they perceive data donation as the morally appropriate action. Decisions are not always an outcome of a cost-benefit analysis, but of personal beliefs about what is right and wrong. The importance of normative concerns in the context of social dilemmas is emphasized in popular models, such as the appropriateness framework of Weber et al. (2004). The appropriateness framework posits that cooperation decisions are essentially influenced by three factors that make individuals ask themselves, "what should a person like me do in a situation like this?" One of the three factors is the use of decision rules and heuristics (e.g., treating others as one would like to be treated).
Morality plays a central role in general prosocial and environmental behavior (Van Liere and Dunlap 1978), in charitable-giving behavior (Sanghera 2016), and in cooperative behavior in public good dilemmas (Chen et al. 2009). People often judge the morality or the moral obligation of certain decisions based on utilitarian criteria (Kahane et al. 2015). According to classical utilitarianism, decisions should be made according to the criterion of maximizing social welfare, regardless of what would be best for oneself or loved ones (Bentham 1789, Sidgwick 1907). Moral judgments play a critical role in motivating and enforcing human cooperation in social dilemmas (Gray et al. 2012). One of the underlying mechanisms is that people experience positive emotions after behaving according to their perceived moral obligations (Andreoni 1990) and negative emotions such as guilt or remorse when ignoring perceived moral obligations (Rivis et al. 2009). We therefore expect individuals to be more likely to donate their data if they perceive data donation as morally obligatory based on their internal norms.

Hypothesis 2a:
The perceived moral obligation to donate data to a database is associated with a greater WDPD.
Emotion-based moral judgments are based on intuitions and feelings and are often formed quickly and intuitively (Greene et al. 2001, Wheatley and Haidt 2005, Haidt 2007). Moral reasoning follows ex post. Emotion-based moral evaluation has historical connections with the view of Hume (1751) and Smith (1759) (see also Cubitt et al. 2011). In quick and intuitive gut reactions, moral evaluations may differ even in nearly identical scenarios: moral evaluations are situation-specific and dependent on framing (e.g., Krebs 2008). Judging with utilitarian criteria, the greater the positive impact on social welfare from an action, the greater is the perceived moral obligation to perform this action. We thus argue that donating data could be perceived as morally more obligatory the greater the impact of the database-driven smart assistant on social welfare. The impact of the smart assistant on social welfare may thus support the decision to donate data because of an increased perceived moral obligation to do so. Moral judgments can also be self-serving if people evaluate actions differently when the consequences affect them personally and their loved ones than when a third group is affected (Greene 2014).
Thus, when a prosocial action implies negative consequences for the individual, such as the risk of a data leak, he or she subconsciously tends to evaluate the action as less morally obligatory. In this way, individuals intuitively reduce cognitive dissonance and negative emotions if a prosocial action is not actually undertaken. We argue that donating data could thus be perceived as less morally obligatory the higher the privacy risk the individual thereby incurs.

Hypothesis 2b:
The perceived moral obligation to donate data mediates the effects of the privacy risk and the impact of the smart assistant on the WDPD.
If a person chooses to donate his or her personal data to support a public good such as a database, he or she will subsequently have no insight into whether the data will actually be used for the declared purpose, such as to increase social welfare. Given this uncertainty, the reputation of the data-collecting organization is an important factor when making the donation decision. Bednall and Bove (2011) find that a positive reputation of the collecting organization motivated people to donate more blood. The more positive the organization's reputation, the greater are the perceived integrity and trustworthiness and the lower is the perceived risk associated with the donation. Comparable effects are also observed for other donation behavior. A charity's reputation has a significant influence on whether a donation is taken into consideration (Bendapudi et al. 1996). Drawing on these findings, Liu et al. (2017)  outcomes. We therefore assume that individuals ascribe different attributes to organizations that collect data to build a database to increase social welfare and, accordingly, vary in their willingness to disclose data to them. We expect the willingness to provide data to academic and governmental organizations to be greater than that to the private industry, because the private industry primarily pursues profit-maximizing interests (Bhattacharjee et al. 2017; Eyster et al. 2020) and is trusted less to promote social welfare (Lin-Hi et al. 2015).

Hypothesis 3:
The WDPD is different for a database operated by academia, the government, and the private industry to develop and run a smart assistant.
Computers and algorithms are becoming increasingly important components of decision-making processes (Esmaeilzadeh et al. 2015; Inthorn et al. 2015). Although individuals consistently rely on technological support to make decisions, they tend to rely less on algorithm-generated information than on human-generated information (Önkal et al. 2009). People tend to have an algorithm aversion (Dietvorst et al. 2015). This aversion is particularly pronounced when people have seen an algorithm generate erroneous information; even if the algorithm is known to provide better decision support on average than a human (Dietvorst et al. 2015), people are more intolerant of small errors made by algorithms than of large errors made by humans (Dietvorst et al. 2015). The technical nature of algorithms is increasingly characterized not only by automation but also by autonomy. De Visser et al. (2018, p. 1409) define autonomy as "technology designed to carry out a user's goals, but that does not require supervision." Smart assistants are based on autonomous algorithms as well. When investigating data donation choices, it is therefore important to consider that the technical nature of a smart assistant determines how the data are analyzed to derive personalized information (e.g., specific tips and action recommendations).
We distinguish, in simplified terms, two types of algorithms that differ in their degree of autonomy.
In case of a smart assistant with a self-learning algorithm, rules for personalization autonomously change depending on how the user reacted to past information. Consequently, the selected personalized recommendation will also change over time, depending on the rules the smart assistant automatically modified. In case of a smart assistant with a human-supervised algorithm, rules for personalization do not autonomously change depending on how the user reacted to past information; however, a human can manually change the rules. Consequently, the selected personalized recommendations will change over time, depending on the rules a human manually modified. We hypothesize that because of algorithm aversion, individuals would prefer a smart assistant whose decision support is not fully automated but can, to some degree, be modified by a human. Research on how to overcome algorithm aversion shows that people do not prefer complete autonomy and are significantly more likely to use even imperfect algorithms if they can easily modify the algorithm (Dietvorst et al. 2018).
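The difference between the two algorithm types can be made concrete with a deliberately simplified sketch. All class names, method names, tips, and scores are hypothetical; the sketch does not describe an actual smart assistant or the explanation shown to participants. It only assumes that personalization rules can be represented as priority scores per tip.

```python
class SmartAssistant:
    """Base class: recommends the tip with the highest rule score (illustrative only)."""

    def __init__(self, rules):
        self.rules = dict(rules)  # tip -> priority score

    def recommend(self):
        return max(self.rules, key=self.rules.get)


class SelfLearningAssistant(SmartAssistant):
    def record_feedback(self, tip, followed, step=1.0):
        # Rules change autonomously, depending on how the user reacted.
        self.rules[tip] += step if followed else -step


class HumanSupervisedAssistant(SmartAssistant):
    def __init__(self, rules):
        super().__init__(rules)
        self.feedback_log = []

    def record_feedback(self, tip, followed):
        # Feedback is only logged; the rules do not change on their own.
        self.feedback_log.append((tip, followed))

    def set_rule(self, tip, score):
        # A human operator changes the rule manually.
        self.rules[tip] = score


initial_rules = {"cycle to work": 2.0, "lower the heating": 1.0}
self_learning = SelfLearningAssistant(initial_rules)
supervised = HumanSupervisedAssistant(initial_rules)

# The user ignores the same tip twice.
for assistant in (self_learning, supervised):
    assistant.record_feedback("cycle to work", followed=False)
    assistant.record_feedback("cycle to work", followed=False)

print(self_learning.recommend())  # the recommendation has changed automatically
print(supervised.recommend())     # unchanged until a human intervenes
supervised.set_rule("lower the heating", 3.0)
print(supervised.recommend())     # changed after the manual rule update
```

In both cases the recommendation can change over time; what differs is whether the rule change is triggered by the algorithm itself or by a human.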
For a large group of people or entire societies, the service of a data-driven technology, such as a smart assistant's decision support, could not be provided entirely by humans, or only with disproportionate effort. The autonomy of the smart assistant could nevertheless be designed to varying degrees, as in the case of a human-supervised and a self-learning smart assistant.
Drawing from the literature on algorithm aversion, we therefore hypothesize that individuals would be more likely to donate their data to a database if the data were used to operate a smart assistant with reduced autonomy.

Hypothesis 4:
The WDPD is greater for a database that is used to develop a human-supervised smart assistant than for a database that is used to develop a self-learning smart assistant.

Experimental Design and Interventions
We conduct an online experiment with treatments that rely on between-subjects and within-subject designs to test our hypotheses. The experiment has a 3 × 3 design and is followed by an online survey to control for potential confounding variables and characteristics. The experiment considers two domains, both of which include the identical 3 × 3 design but vary in the social welfare domain promoted by the smart assistant: a sustainable environment (domain 1) and a sustainable health system (domain 2). The experiment has been preregistered at the AEA RCT Registry and obtained ethical approval from the Ethics Commission of the authors' university.
Before participating in the experiment, individuals received an explanation that the UN has launched a call for more data to support the Sustainable Development Goals and of how public goods benefit from these data. Participants learned what a smart assistant is, how it can use data to promote the goals, and why it needs a sufficient amount of data to do so. Participants were further advised that the disclosure of data always involves a certain privacy risk. We then provided the participants with the following scenario (domain 1): "A smart assistant could support US users in living environmentally friendlier everyday lives, thereby promoting a sustainable environment. Every English-speaking person with a smartphone in the United States could use the smart assistant. However, to develop and operate an assistant that offers informed and comprehensive decision support on environmentally friendlier behavior, there must be access to a sufficiently large database of diverse and trustworthy data. The database requires a given list of data sets in an anonymized form." We asked participants to imagine that they could easily and anonymously upload their personal data to the database. We presented participants with two options. Either they could donate their data to the database, accept a certain level of privacy risk, and contribute to the development of a smart assistant that has an impact on a sustainable environment, or they could not donate their data, completely avoid the associated privacy risk, and not contribute to the development of a smart assistant that has an impact on a sustainable environment.
The actual experiment consisted of three parts. In part 1, we provided participants with one of three varying levels of risk of their data getting leaked (treatment 1) combined with one of three varying levels of the impact of the smart assistant on social welfare (treatment 2). We randomly assigned the participants to the domains and nine treatment combinations through a designated function of the software Unipark. Figure 1 depicts an overview of the treatments per domain.
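The random assignment described above can be illustrated with a minimal sketch. It assumes simple (unstratified) randomization over the 2 × 3 × 3 cells; the actual Unipark function is not described in this article, so the names `CELLS` and `assign` and the fixed seed are purely illustrative.

```python
import random
from itertools import product

# Labels are illustrative; the article only names two domains and
# three levels each for the risk and impact treatments.
DOMAINS = ["environment", "health"]
RISK_LEVELS = ["low", "medium", "high"]
IMPACT_LEVELS = ["low", "medium", "high"]

# 2 domains x 3 risk levels x 3 impact levels = 18 cells.
CELLS = list(product(DOMAINS, RISK_LEVELS, IMPACT_LEVELS))


def assign(participant_ids, seed=0):
    """Assign each participant to one cell uniformly at random (assumed scheme)."""
    rng = random.Random(seed)
    return {pid: rng.choice(CELLS) for pid in participant_ids}


assignment = assign(range(1696))
```

Simple randomization of this kind does not guarantee equal cell sizes, which is one reason to check balance across treatments afterwards.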
[Figure 1 around here]
The operationalization of the risk treatment was identical for both domains, using the following wording: "The risk of data being leaked from this type of database is approx. [0.001/10/20]%. This corresponds to the leakage of data from [1 of 1,000/10 of 100/20 of 100] individuals." Because no reliable academic quantification of data leak probabilities exists, we consider the risk interval between 0.001% and 20% realistic, in line with a recent report of a large cybersecurity company (Varonis 2019). We also calibrated the chosen risk levels in a pretest with 195 students from the faculty of business studies and economics at the authors' university. In the online experiment, we showed all participants a list of data types they would provide if they donated, because the willingness to provide data clearly depends on the categories of data to be provided (Phelps et al. 2000). The shown list of data categories came from the Personal Information Protection Commission (2013) (see also Lim et al. 2018).  (2016) and Molinari et al. (2007).
All participants took part in part 1 of the experiment (see Figure 1 for an overview). Then, they were randomly assigned to either part 2a or part 2b of the experiment. In part 2a and 2b, participants could choose between different databases when donating their personal data. All databases required the same data, had an identical privacy risk, and were used to develop a smart assistant that promotes one of the two social welfare domains. The risk and impact levels corresponded to the treatment combination assigned in part 1. The databases in part 2a differed in terms of the organization that operates the respective database to develop and run a smart assistant: academia was operationalized by an Ivy League university, the government was operationalized by a federal US agency, and a profit-oriented organization was operationalized by a large US tech company. The databases in part 2b differed in terms of the technical nature of the smart assistant, which would be developed depending on the respective database: a smart assistant using a self-learning algorithm and a smart assistant using a human-supervised algorithm to derive personalized information (e.g., specific tips and action recommendations) to promote environmentally friendlier or more healthful user behavior. We did not operationalize the individual algorithm type further but briefly explained it to participants (see Supplementary Materials, B5).

Target Population and Sample
We test our hypotheses on US citizens. Participants were recruited from the crowdworking platform Amazon Mechanical Turk (MTurk). Although a sample from MTurk is not necessarily representative of the US population, various studies have successfully replicated a wide range of established economic and psychological effects and empirically validated the use of MTurk as a useful data collection tool (Schnoebelen and Kuperman 2010, Gibson et al. 2011, Becker et al. 2012, Crump et al. 2013), and relevant research that relies on MTurk respondents has achieved robust results (Bonnefon et al. 2016). Furthermore, MTurk samples are considerably more heterogeneous than student samples from laboratory studies (Hussy et al. 2010).
Crowdworkers on MTurk have particularly diverse backgrounds (Mason and Suri 2012), which are crucial for the external validity of our results. We targeted the online study exclusively at workers who are US citizens and over 18 years of age. We executed the experiment by posting a human intelligence task (HIT) on MTurk. The HIT provided a description of the task, the participation requirements, compensation, and instructions on how to proceed. Interested MTurk workers were instructed to click on a survey link, which forwarded them to an online survey in Unipark. Participants received between US$0.40 and US$0.55 for taking part in the HIT, depending on how they answered incentivized items on the expected donation behavior of others and their social value orientation in the survey. On the last page of the survey, the workers received an automatically generated unique code that they had to enter back on the MTurk website to trigger their payment. Workers could only participate once in the HIT.
We collected the data over a two-day period (September 14 and 15, 2020). Of the 2,552 workers who clicked on the link, 1,883 filled out the online survey completely. Responses from workers were excluded from the data set if they answered at least one comprehension question incorrectly, they stated being non-US citizens, or they reported an age of less than 18 years.
The final sample includes responses from 1,696 participants with an average response time of 12 minutes. In a pretest, the average response time was 19 minutes. Because we requested participants in the pretest to carefully check all potential mistakes in our survey, we consider the shorter response time during the HIT reasonable. The participants were randomly assigned to the domains and treatments.
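The exclusion rules can be sketched as a simple filter. The field names (`completed`, `comprehension_errors`, `us_citizen`, `age`) and the toy responses are hypothetical and do not reflect the actual structure of the Unipark export.

```python
def is_eligible(response):
    """Keep complete responses with no comprehension errors from adult US citizens."""
    return (
        response["completed"]
        and response["comprehension_errors"] == 0
        and response["us_citizen"]
        and response["age"] >= 18
    )


# Toy responses for illustration only (not study data).
responses = [
    {"id": 1, "completed": True, "comprehension_errors": 0, "us_citizen": True, "age": 34},
    {"id": 2, "completed": True, "comprehension_errors": 1, "us_citizen": True, "age": 29},
    {"id": 3, "completed": False, "comprehension_errors": 0, "us_citizen": True, "age": 41},
    {"id": 4, "completed": True, "comprehension_errors": 0, "us_citizen": False, "age": 52},
    {"id": 5, "completed": True, "comprehension_errors": 0, "us_citizen": True, "age": 17},
]

final_sample = [r for r in responses if is_eligible(r)]
```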

Variables
We consider two dependent variables. First, we investigate participants' WDPD depending on the risk level and impact of the smart assistant. Second, we investigate their relative WDPD to different managing organizations and types of algorithms. To test hypotheses 1a and 1b, participants needed to indicate their WDPD on a 1-100 slider. We adapted the original wording of Bonnefon et al. (2016) to the activity of data donation. The variable is the response to the following question: "How inclined are you to upload your personal data to the database?" (0% = not at all likely; 100% = extremely likely).
To test hypotheses 2a and 2b, we measure the moral obligation to provide data. Participants indicated their moral obligation on a 5-point Likert scale. We adapted the original wording of Kahane et al. (2015) to the activity of data donation. The variable is the mean response to the following two questions: "Do you think that there is a moral obligation for people to upload their personal data to the database?" (1 = It would be wrong for people to upload their personal data to the database; 3 = People don't have to upload their personal data to the database, but it would be nice if they did; 5 = People must upload their personal data to the database) and "How morally wrong is it if people do not upload their personal data to the database?" (1 = perfectly fine; 3 = neither fine nor wrong; 5 = deeply wrong).
To test hypothesis 3, we investigate the relative WDPD to each of the presented operating organizations of the database (academic, governmental, and profit-oriented organizations).
WDPD and relative WDPD rely on the same question; however, the question items differ in their respective answering options (Bonnefon et al. 2016). For WDPD, participants responded using a single slider. To identify the relative WDPD variable, participants needed to use multiple sliders in relation to each other, with the sum of the sliders equaling 100. Thus, a participant's indicated willingness to provide data to one of the databases was negatively correlated with his or her willingness to provide data to the alternative database. We used related sliders because we are primarily interested in ranking the WDPD per operating organization rather than the absolute magnitude of WDPD per operating organization. Survey items that use fixed total budget partitioning are particularly suitable for examining rankings and differences between interdependent options (Conrad et al. 2005, Fabbris 2013). (Madsen and Gregor 2000), interpersonal distrust (Eurobarometer 2014), future time orientation (specifically time perspective and anticipation of future consequences) (Gjesme 1979, Steinberg et al. 2009), self-reported health (Idler and Angel 1990), self-reported environmentally friendly behavior (Idler and Angel 1990), and risk attitude (Weber et al. 2002).
Given their particular relevance in explaining cooperative behavior under uncertainty, we collected social value orientation (Murphy et al. 2011) and the anticipated behavior of others using monetarily incentivized tasks to encourage honest and realistic responses (see Supplementary Materials, B7). We collected the following demographic variables: gender, age, income, political and religious orientation, education, living standard, and citizenship.
All control variables are balanced across treatments and domains (see Appendices A and B).
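The constant-sum logic behind the relative WDPD sliders can be sketched as follows. The function name `rank_relative_wdpd` and the example allocation are hypothetical; the sketch only shows that the sliders must sum to 100 and that the resulting values are read as a ranking.

```python
def rank_relative_wdpd(allocation, total=100):
    """Validate a constant-sum slider response and rank the options by share."""
    if sum(allocation.values()) != total:
        raise ValueError("slider values must sum to the fixed total")
    # Highest share first; the ranking, not the absolute level, is of interest.
    return sorted(allocation, key=allocation.get, reverse=True)


# Hypothetical response of one participant in part 2a.
example = {"academia": 45, "government": 35, "private industry": 20}
ranking = rank_relative_wdpd(example)
```

Because the total is fixed, raising one slider necessarily lowers at least one other, which is why the relative WDPD values are interdependent within a participant.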

Empirical Approach
To test hypotheses 1a and 1b, we use analyses of variance (ANOVAs) to determine whether we can reject the following H0 in a between-subjects design (mean value of WDPD = $\mu$; treatments: R = risk of data getting leaked, I = impact of the smart assistant; treatment levels: l = low, m = medium, h = high):
$$H_0\colon\ \mu_{R_l} = \mu_{R_m} = \mu_{R_h} \quad \text{and} \quad \mu_{I_l} = \mu_{I_m} = \mu_{I_h}$$
The alternative hypotheses, which we test using Tukey's method, are
$$H_{1a}\colon\ \mu_{R_l} > \mu_{R_m} > \mu_{R_h} \quad \text{and} \quad H_{1b}\colon\ \mu_{I_l} < \mu_{I_m} < \mu_{I_h}$$
To test hypothesis 2a in a within-subject design, we use the following ordinary least squares regression:
$$(1)\quad WDPD = \alpha + \beta_1 R + \beta_2 I + \beta_3 MO + \gamma' C + \varepsilon$$
where MO is perceived moral obligation to provide data and C is a vector of control variables, such as risk attitude and social value orientation. Hypothesis 2a is identified when a high MO is associated with a greater WDPD.
We expect a low risk of a data leak and a larger positive impact of the smart assistant on social welfare to have a positive and direct effect on WDPD. However, MO may mediate the effect of risk of a data leak and the impact on social welfare on WDPD. We therefore perform a mediation analysis to investigate the extent to which the effects of these two explanatory variables on WDPD pass through MO in our baseline specification. For the mediation analysis, we also need to estimate the following two regressions:
$$(2)\quad WDPD = \alpha + \beta_1 R + \beta_2 I + \gamma' C + \varepsilon$$
$$(3)\quad MO = \alpha + \beta_1 R + \beta_2 I + \gamma' C + \varepsilon$$
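The logic of the three regressions can be illustrated with a minimal sketch on simulated data. This is not the authors' code or data: the impact variable I and the control vector C are omitted for brevity, the coefficients used to simulate the data are arbitrary, and the helper `ols` is a plain normal-equations solver written for this example. The sketch only compares coefficient sizes and leaves out the significance tests.

```python
import random


def ols(X, y):
    """OLS coefficients [intercept, b1, ...] via the normal equations."""
    rows = [[1.0] + list(x) for x in X]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda i: abs(A[i][c]))
        A[c], A[p] = A[p], A[c]
        for i in range(k):
            if i != c:
                f = A[i][c] / A[c][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Simulated toy data (assumption): risk lowers MO, and both affect WDPD.
rng = random.Random(1)
n = 2000
R = [rng.choice([0, 1, 2]) for _ in range(n)]
MO = [3.0 - 0.3 * r + rng.gauss(0, 0.8) for r in R]
WDPD = [20 + 10 * m - 2 * r + rng.gauss(0, 15) for r, m in zip(R, MO)]

m1 = ols(list(zip(R, MO)), WDPD)   # Model 1: WDPD on R and MO
m2 = ols([(r,) for r in R], WDPD)  # Model 2: WDPD on R only
m3 = ols([(r,) for r in R], MO)    # Model 3: MO on R only

direct, total = m1[1], m2[1]       # coefficient of R with and without MO
a_path, b_path = m3[1], m1[2]      # R -> MO and MO -> WDPD
```

In this linear setting, the coefficient of R without the mediator equals the coefficient with the mediator plus the product of the two paths (`total = direct + a_path * b_path`), so a coefficient of R that shrinks in absolute terms once MO is included indicates that part of the effect runs through MO.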
The mediation for the risk of a data leak (hypothesis 2b) is identified when four conditions are met. First, the risk of a data leak variable (R) has a significant effect on WDPD in Model 2. Second, the risk of a data leak (R) has a significant effect on the mediator variable MO in Model 3. Third, in Model 1 the mediator variable MO has a significant effect on WDPD. Fourth, the coefficient of R must be smaller in absolute terms in Model 1 than in Model 2. The mediation for the smart assistant's impact on social welfare is identified analogously.
To test hypothesis 3, we examine whether we can reject the following H0 in a between-subjects design:
$$H_0\colon\ \mu_{ac} = \mu_{gov} = \mu_{pr}$$
where ac, gov, and pr denote a database operated by academia, the government, and the private industry, respectively. We test the specified relationships of the parameters with a common two-step multiple comparison test procedure (Kao and Green 2008). First, we perform an ANOVA to check whether there are differences between the mean values. Second, we perform a post hoc analysis using Tukey's method to test the direction of the differences between mean values. Tukey's method has the special characteristic of keeping the type I error level constantly close to 5%, thus avoiding running into a type II error too often (Chen et al. 2017).
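The first step of the two-step procedure, the one-way ANOVA, can be sketched in a few lines. The function `one_way_anova_f` and the toy groups are illustrative only. The sketch stops at the F statistic: the p-value and Tukey's post hoc comparisons require the F and studentized range distributions, which in practice would come from a statistics library.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic and its degrees of freedom."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy groups (not study data): three treatment levels, three observations each.
f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```

A large F statistic indicates that at least one group mean differs; the post hoc step then identifies which pairs differ and in which direction.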
To test hypothesis 4, we use a one-sided t-test to examine whether we can reject the following H0 in the between-subjects design:
$$H_0\colon\ \mu_{HS} \le \mu_{SL}$$
where SL is the self-learning algorithm and HS is the human-supervised algorithm.
Figure 2 reports the mean willingness and the mean moral obligation to donate personal data.

The willingness and perceived moral obligation to donate data for public goods
The average willingness to provide data to a database to promote social welfare is 54.31 (SD = 30.30) on a scale from 1 to 100 (50 = neutral score), which is significantly different from 0 (p < .001). Thus, participants show a significant tendency to donate their data for a public good.
With an average of 55.10 (SD = 30.31), the WDPD in the environment domain is no greater than that in the health domain, with an average of 53.50 (SD = 30.29), as a t-test shows no significant differences (p = .139). The average perceived moral obligation to donate personal data to promote social welfare is 3.03 (SD = .935) on a scale of 1 to 5 (3 = neutral score). Thus, participants feel an above-average moral obligation to donate their data. Note that there are differences between the domains. The participants feel a slightly greater moral obligation to donate their data for a sustainable environment ($\mu_{MO\_env}$ = 3.06) than for a sustainable US health system ($\mu_{MO\_health}$ = 2.99), which is statistically significant at conventional levels (p = .046).
One potential reason for this difference is that the distribution of the perceived benefits to social welfare differs across domains. To account for the different characters of the two domains in explaining the results, we asked participants to indicate on a scale from 1 to 100 how much an individual smart-assistant user and the general public benefit in each domain of social welfare.
We find that participants believe that individual users benefit significantly more from a smart assistant that improves their personal health status ($\mu_{user\_health}$ = 47.25) than from a smart assistant that improves their carbon footprint ($\mu_{user\_env}$ = 44.80; p = .004). By contrast, the public apparently benefits significantly more from a smart assistant that promotes a sustainable environment ($\mu_{pub\_env}$ = 54.74) than from a smart assistant that promotes a sustainable US health system ($\mu_{pub\_health}$ = 52.15; p = .003). The higher ascribed utility for the general public could be an explanation for a greater perceived moral obligation to donate data in the environment domain, though this greater moral obligation does not translate into a greater WDPD.

Effects of risk of data leak and data's impact on social welfare when donating data
Is a low risk of a data leak and/or a high impact of the smart assistant on social welfare associated with a greater WDPD (H1a/H1b)? To answer this question, we compare the willingness to provide data to a database. Figure 3 summarizes the results per treatment and domain.
In the risk treatment, the ANOVA and Tukey's method results show that the WDPD across the three treatment conditions varies significantly depending on the level of risk (p = .007). The higher the risk that the data are leaked, the lower the WDPD. The willingness differs between a low risk level ($\mu_{R_l}$ = 57.60) and higher risk levels (medium: $\mu_{R_m}$ = 52.43, p = .012; high: $\mu_{R_h}$ = 52.91, p = .024). However, whether the risk of a data leak is medium or high is irrelevant for individuals' WDPD, as the Tukey's method results show no significant differences in the WDPD (p = .960) under medium (10%) or high (20%) risk levels. The insensitivity to higher risk levels is striking given that the percentage-point difference between the medium (10%) and high (20%) risk levels is nearly identical to the percentage-point difference between the low (0.001%) and medium (10%) risk levels. It seems that individuals consider the difference between 0.001% and higher risk levels binary (i.e., a 0.001% risk is considered "no" risk of a data leak, while a 10% and 20% risk are considered "some" risk of a data leak). While the WDPD differs between low and medium risk levels in the environmental domain (p = .033), we find no such differences in the US health system domain (p = .133).
Overall, hypothesis 1a receives support in the sustainable environment domain but not in the US health system domain.
In the impact treatment, the ANOVA and Tukey's method results show that WDPD does not differ for the varying treatment levels of the smart assistant's impact on social welfare, either overall (p = .808) or in either of the two domains (environment: p = .442; US health system: p = .438). The extent of the positive impact of the database-driven smart assistant on a sustainable environment or US health system appears largely irrelevant to the decision to donate data for realistic impact levels. Thus, we find no support for hypothesis 1b.

Effect of perceived moral obligation when donating data
Is a high perceived moral obligation to donate personal data associated with a greater WDPD (H2a)? If so, does the perceived moral obligation mediate the effect of risk or impact on the WDPD (H2b), or does it have a direct effect only? To answer these questions, we run three regressions. Table 2 gives an overview of the mediation analysis results from regression Models 2 and 3.
Overall and in both domains, three of the four mediation conditions are met in the risk treatment. Only the negative effect of risk on the perceived moral obligation is not statistically significant at conventional levels (overall: $\beta$ = .032; p = .112). In the impact treatment, mediation conditions 2 and 4 are mostly not met. A differentiation between the domains also shows that, though the results suggest that moral obligation mediates the effect of risk on the WDPD, there is no empirical support for hypothesis 2b.

How the operating organization of the database matters when donating data
Is the WDPD different for a database operated by academia, the government, or the private industry? Figure 4 reports the mean relative WDPD per domain and operating organization.
The ANOVA results show that the WDPD varies significantly depending on which organization manages and operates the database to develop a smart assistant (p < .001). The WDPD is significantly lower when the database is operated by the private industry ($\mu_{pr}$ = 29.54) than by academia or the government.
[Figure 4 around here]
Our data provide a potential reason for the significantly lower WDPD in case of an operation by the private industry. The control questions asked participants to indicate how skilled they believe each operating organization is to develop a smart assistant and how trustworthy and profit oriented each organization is. The results of an ANOVA show that participants perceive the private industry as more skilled in the development of a smart assistant than academia (p < .001) or the government (p < .001), but also as significantly less trustworthy (academia: p < .001; government: p = .002) and more profit oriented (academia and government: p < .001). A regression analysis on the WDPD to the private industry verifies that high trustworthiness and low profit orientation are significantly associated with a greater relative WDPD, but skill in developing a smart assistant is not (see Supplementary Materials, Table SM8). An analogous comparison of the relative WDPD to academia and government also contributes to the explanation of the results. Participants perceive academia as slightly more trustworthy and skilled than the government. However, the absolute differences are marginal, and there is no statistical difference in their profit orientation.
In summary, while there is no significant difference in the WDPD depending on whether the database is operated by academia or the government, operation by private industry is associated with a lower WDPD overall and in each domain. Overall, we find support for hypothesis 3, which cannot be empirically rejected.

How the type of algorithm matters when donating data
Is the WDPD greater for a database that is used to develop a human-supervised smart assistant than for a database that is used to develop a self-learning smart assistant? To answer this question, we compare the willingness to provide data to each database using a one-sided t-test.
The results show that the algorithm type does not affect the WDPD either overall (p = .553) or in the health domain (p = .553). In the environment domain, the results show a tendency for participants to prefer a self-learning over a human-supervised smart assistant (p = .080).
Overall, we find no empirical support for hypothesis 4. The type of algorithm is not decisive for the WDPD.

Robustness and manipulation checks
To understand how a database can increase social welfare and the role of data-driven technologies such as smart assistants, it is important that participants carefully read the introduction of the HIT. As a robustness test, we therefore conducted all analyses reported in this article for a restricted sample that excludes participants who spent less than half of the average time to participate. Given that it took participants an average of 12 minutes to finish our HIT, in the restricted sample we deleted 360 observations with a participation time of less than 6 minutes.
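The restriction rule can be sketched as follows, assuming participation times are available in minutes. The function name `restrict_sample` and the toy durations are illustrative only.

```python
from statistics import mean


def restrict_sample(durations_min):
    """Drop observations below half of the average participation time."""
    threshold = mean(durations_min) / 2
    kept = [d for d in durations_min if d >= threshold]
    return kept, threshold


# Toy durations with a mean of 12 minutes, giving a 6-minute cutoff.
kept, threshold = restrict_sample([4, 5, 12, 15, 24])
```

Note that the cutoff is computed from the average of the unrestricted sample, so it does not change after observations are removed.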
The ANOVA and Tukey's method results to identify hypotheses 1a and 1b are identical for the restricted sample but show stronger significance levels for the findings that were previously significant. In the restricted sample, the risk level of a data leak still affects the WDPD, while the smart assistant's level of impact does not. In the unrestricted sample, we found that the risk level is only decisive for the WDPD overall and in the environment domain. In the restricted sample, the risk level also affects the WDPD in the health domain. The Tukey's method results in the restricted sample are identical to the findings overall and in the environment domain.
Participants respond to an increase in risk level from low to medium and from low to high risk, but they show no sensitivity to changes from medium to high risk levels. In the restricted sample, this finding also holds for the health domain. The regression analyses, mediation analysis, ANOVAs, and t-test to identify hypotheses 2 to 4 show identical results in the restricted sample but, in general, show somewhat stronger levels of significance if treatment effects were previously significant.
As a manipulation check, we asked participants to assess the likelihood that their personal data would be leaked from the database and to assess the smart assistant's impact on a sustainable environment or health system. We used the following items on a 5-point Likert scale: "Assess the likelihood that your personal data will be leaked from the database" (1 = extremely unlikely; 5 = extremely likely) for the risk treatment, and "Assess the impact of the smart assistant on a sustainable environment [health system]" (1 = no impact; 5 = major impact) for the impact treatment. We ran ANOVAs followed by Tukey's method to check whether participants actually perceive the risk and impact levels to be different across treatment conditions (see Appendix C for details).
The results of the manipulation check are identical to those of the ANOVAs used to identify hypotheses 1a and 1b. In the manipulation check, we find that participants perceive the risk differently in case of a low risk level as compared with a medium risk level and high risk level. However, participants do not perceive a medium and high risk level to be different from one another, which indicates that participants indeed consider the difference between 0.001% and higher risk levels as binary. Participants, however, do not perceive the smart assistant's impact levels to be different. This finding is in line with the notion that the benefits of a smart assistant are attributed to society at large, and the individual share might be considered too small in large groups to make contributions worthwhile (Olson 1965). The perceived indifference between impact levels is also a potential reason for the results of the main analysis, according to which the impact is not decisive for the individual willingness to disclose data to promote social welfare. We chose all impact levels on the basis of realistic assumptions and in line with real-world conditions. Although we could have selected more strongly varying impact levels and calibrated them in a pretest, insights of potential effects would be questionable because of the impact levels' detachment from reality and participant deception.

Discussion and Conclusion
We provide empirical evidence that individuals would donate their data to a database to promote social welfare, even if their data is at risk of getting leaked. The evidence of our online experiment further shows that the risk level of a data leak is decisive for the WDPD but that varying levels of impact that data realistically can have on social welfare are not. A potential explanation for this finding is that for the individual, the consequences of a data leak are direct and privacy protection expires immediately in case of a data leak, while the positive impact of a sufficiently large database arises with a time delay, and the individual only benefits to a small degree from his or her contribution. We find the risk of a data leak is important for databases that are used to promote a sustainable environment but not for a sustainable health system. Moreover, the stronger an individual's perceived moral obligation to donate data, the greater is his or her WDPD. Furthermore, individuals are less willing to provide their data to profit-oriented organizations than to academia or the government. In contrast with the algorithm aversion literature (Önkal et al. 2009), individuals are not sensitive to whether the data is processed by a human-supervised or self-learning smart assistant.
Our online experiment is not without limitations. First, our findings rely on a sample of US citizens. A sample of participants from other countries and cultures might yield different results.
For example, cultural influences are crucial for individuals' self-disclosure on social media sites (Krasnova et al. 2012) and for general risk perception (Weber and Hsee 1998). We further anticipate that WDPD will be greater in collectivist societies than in individualist societies due to greater social value orientation (Shahrier et al. 2016) and willingness to cooperate in social dilemmas (Probst et al. 1999).
Second, we conducted our experiment using a hypothetical choice scenario. On the one hand, we expect the self-reported WDPD to be greater than the actual data donation behavior because information privacy research shows an intention-behavior gap in self-disclosure (e.g., Joinson et al. 2010, Liu et al. 2017). On the other hand, when data are donated in a real-world setting without a scenario description, individuals might disclose more information about themselves because the privacy risk is less salient (Marreiros et al. 2017). Conducting a field experiment in which participants actually donate their data would not have been possible from an ethical and regulatory standpoint, because putting individual data artificially under the risk of a data leak would violate relevant regulations, such as the General Data Protection Regulation. As more real-world data donation use cases such as coronavirus tracking apps emerge and their users suffer from accidental data leaks, ex post natural experiments might allow researchers to glean further insights into revealed user preferences regarding data donations to promote social welfare.
Third, participants indicated their WDPD from a specific list of data types to be donated and for two specific domains of social welfare. Because the willingness to disclose data varies with the type of data being disclosed (Phelps et al. 2000, Lim et al. 2018), we expect WDPD to be greater the fewer data types are required and the less sensitive the data is. We provide first evidence that the WDPD depends on the domain of social welfare by investigating a sustainable environment and health system. However, extant literature gives no empirical guidance that allows us to build assumptions on how WDPD might change with other domains of social welfare. Although the disclosure of data for different purposes has been studied in the past (Morse 2007, Pavone and Esposti 2012, Liu et al. 2017), survey designs and data types often vary, which prevents us from drawing conclusions about the role of the donated data's purpose.
To the best of our knowledge, our online experiment is one of the first empirical studies to keep the data types constant and vary only the different domains of social welfare. From our results, we would expect that individuals are more likely to donate data the more the promoted social good benefits society at large rather than any individual member of society.
Notwithstanding these limitations, insights from our online experiment extend the research on the disclosure of personal data by investigating data disclosure as a voluntary donation to promote social welfare. Because data donations involve privacy costs without providing personal benefits, individuals have an incentive not to cooperate, which results in the social dilemma of big data. Our results provide first evidence for how individuals donate personal data in a scenario in which a database resembles a public good and must be sufficiently large and diverse to enable technology to promote social welfare. Our results are novel in that they show that individuals would donate their data despite personal privacy costs, uncertainty about whether enough other people are donating their data, and uncertainty about the amount of data required for the database to increase social welfare.
Understanding the drivers and barriers of socially directed data donation is relevant not only for the research community but also for legislators and practitioners such as non-profit organization representatives. The COVID-19 pandemic illustrates the high potential of data in promoting public health as a domain of social welfare. However, the severity of the pandemic also underscores the failure of governments to adequately encourage public debate and educate the public on the prosocial use of citizen data before crises. The majority of governmental data collection measures were discussed and implemented in the middle of a global emergency, a time when people may be fearing for their health, the health of others, and the consequences for society. The timing of the discussion on data disclosure may raise ethical concerns because fear favors consent to voluntary and mandatory data disclosure to the state (Hillebrand 2021).
To ensure ethical use of citizen data, policy makers and legislators need to address population preferences and understand what factors should be considered when using data to promote social welfare. If data disclosure is voluntary, they need to create conditions that motivate individual data donation; if data disclosure is mandatory, they need to create conditions that reflect society's preferences to ensure ethical use (Ali and Bénabou 2020). According to our findings, legislators should particularly focus on the risk of a data leak, the organization that collects and manages the data, and the purpose for which the data will be used. With these insights, we hope to further support non-profit organizations such as the UN, which has been working for years to mobilize citizen data, in designing structures that encourage more individuals to voluntarily donate their data.

Supplementary Materials
The Social Dilemma of Big Data: Donating Personal Data to Promote Social Welfare

Research
This study is part of a research project that investigates ways to help the public benefit from its personal data. The study is run by Professor Lars Hornuf and Kirsten Hillebrand from the University of Bremen.
You will receive between $0.40 and $0.55 for taking part in this study, which can be completed in approx. 20 minutes. Your actual payment will depend on the choices you make during the study. On the next page, we start with a brief explanation of
• why individuals need to grant access to their personal data to promote public welfare,
• how individuals can support the promotion of public welfare by providing their personal data, and
• the risks associated with providing personal data.
We then show you two scenarios, in which you can choose between uploading and not uploading your personal data to a database, so that these data can be used to promote the public good. You will answer questions about moral obligation and your willingness to provide data.

Consent
Your decision to complete this study is voluntary. Your answers will be collected and analyzed in an anonymous form, which means that you cannot be reidentified. Any data we process will be encrypted with a random session ID. We will not collect your IP address, name, location, or any other data that could identify you. Besides your answers, we will exclusively collect the following data: duration of participation, the survey page on which participation ended, browser, mobile device, date and time of access.
Because this study is conducted by a German university, the collected data will be transferred to and stored on university servers in Bremen, Germany. The anonymized data will be stored for 10 years and deleted afterwards. All data is collected via the service provider Unipark. Hence, the collected data will be additionally stored on servers of this service provider. Unipark's server park is located in Frankfurt, Germany, is BSI-certified, and is subject to the security requirements of ISO 27001. To the best of our knowledge, Unipark will not collect any data beyond those stated above, and your anonymized data will be deleted from the Unipark servers as soon as the data collection is completed, i.e., the online survey is taken offline. The aggregated results of the research may be presented at scientific meetings, published in scientific journals, or shared in other ways with the scientific community in an anonymized form. Clicking on the checkbox below indicates that you are at least 18 years of age and agree to complete this study voluntarily.
At the end of the study you will receive a unique survey completion code. You can then enter this code at MTurk in order for us to verify your participation and release your payment. During this process, we can observe but will not store your worker ID assigned by MTurk. We will not observe your name, banking details, or any other individual-related data.

Questions/Concerns
Please contact the researchers behind the study if you have any questions or concerns via kihi(at)unibremen.de.

Personal data for public welfare
In the report "A World That Counts," the United Nations (UN) calls for the mobilization of data to promote public welfare. Data and new technologies have the potential to transform societies and to protect public goods such as a sustainable environment or health system. Thus, data can maximize individual and social welfare in the United States. Like the private industry or academia, US-American civil society is equally part of the global data ecosystem. Data-driven technologies such as smart assistants enable individuals to make choices that are good for them and the world in which they live.
However, available data for such technologies need improving. As the UN report states, whole groups of people and important aspects of their lives are still not captured digitally. More diverse, integrated, and trustworthy data lead to better decision making and real-time citizen feedback. According to the UN, providing access to such data is essential to promote social welfare.

This study: smart assistants & privacy risks
In this study, we examine if individuals provide their data to improve data quality and decision making.
We specifically focus on smart assistants, a data-driven tool that converts large amounts of data into personalized information. This information is available when and how the user wants it. A smart assistant could help users make more environmentally friendly daily choices and thus contribute to a sustainable environment, for example, by selecting relevant information according to consumption patterns and providing tips that are tailored to habits and easy to follow. (To demonstrate that you have carefully read the instruction, please do not tick the check box regarding your age below.) Although a smart assistant can promote eco-friendly choices, leading to a sustainable environment, it holds a privacy risk: providing your data to a smart assistant means risking the possibility that your data will be leaked. For example, your data could be hacked by a third party, you could be identified even though your data have been anonymized, or your data could be used for purposes other than what you agreed to.

Optional additional information
On the following pages, we present our best estimates on how the smart assistant might perform in helping users live environmentally friendlier everyday lives. The predictions are based on the following scientific studies: •

Presentation of the Scenario and Trade-off to Participants (Exemplary for Domain 1 with low Risk and high Impact)

Scenario 1
Imagine a smart assistant that supports US users in living environmentally friendlier everyday lives, thereby promoting a sustainable environment. Every English-speaking person with a smartphone in the United States could use the smart assistant.
However, to develop and operate an assistant that offers informed and comprehensive decision support on sustainable behavior, there must be access to a sufficiently large database of diverse and trustworthy data. The database requires the following data sets in an anonymized form:
• basic personal information such as age, gender, level of education, and job and contact information;
• personal purchase lists and payment information;
• personal medical records and information;
• personal posts and likes on social networking sites;
• personal browsing history; and
• personal location information.
Uploading your data to the database does not obligate you to ever use the smart assistant yourself. You have the right to obtain the erasure of your personal data at any time. Please read the following text carefully as the questions are related to this scenario.
By giving informed and relevant decision support, the smart assistant decreases the yearly CO2 emission of each user by approx. 30%. This corresponds to planting 264 trees per year per user. The risk of data being leaked from this type of database is approx. 0.001%. This corresponds to the leakage of data from 1 of 100,000 individuals.
Imagine that you could easily upload your personal data to the database anonymously. You have two options:
You upload your data to the database:
• You contribute to the development of the smart assistant that helps users decrease their CO2 emission by 30% and, thus, to a sustainable environment.
• You take a 0.001% risk of your data being leaked.
You do not upload your data to the database:
• You avoid a 0.001% risk of your data being leaked.
• You do not contribute to the development of the smart assistant that helps users decrease their CO2 emission by 30% or to a sustainable environment.
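The scenario states the leak risk both as a percentage (0.001%) and as a frequency (1 of 100,000 individuals). The following minimal sketch checks that the two statements agree. The function names `one_in_n` and `expected_leaks` are illustrative assumptions and are not part of the study materials; the donor count in the usage lines is a hypothetical round number.

```python
from fractions import Fraction


def one_in_n(risk_percent: str) -> Fraction:
    """Return N such that a risk of `risk_percent` percent equals 1 in N.

    The percentage is passed as a string so that Fraction parses it
    exactly, without binary floating-point error.
    """
    probability = Fraction(risk_percent) / 100
    return 1 / probability


def expected_leaks(risk_percent: str, donors: int) -> Fraction:
    """Expected number of donors affected, assuming independent, equal risk."""
    return Fraction(risk_percent) / 100 * donors


# The low-risk condition described in the scenario text.
print(one_in_n("0.001"))
# Hypothetical database of 100,000 donors.
print(expected_leaks("0.001", 100_000))
```

Under these assumptions, a 0.001% risk corresponds to one affected individual per 100,000 donors in expectation, which matches the wording shown to participants.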

Presentation of the Scenario and Trade-off to Participants (Exemplary for Domain 1 with low Risk and high Impact)

Scenario 2
Imagine that you can decide which particular database to upload your personal data to. You can choose between three available databases. All databases
• require the same personal data,
• have an identical risk of being leaked (0.001%), and
• are used to develop a smart assistant that helps users decrease their CO2 emission (by 30%).
Data-based personalization is one of the main added values of the smart assistant to give relevant decision support on living healthier. Only by personalizing information is the smart assistant available when and how the user needs it. It is important to consider that the technical nature of the assistant determines how the data are analyzed to derive personalized information (e.g., specific tips and action recommendations). In our scenario, two smart assistants are available.
• Both assistants require the same personal data.
• Both assistants are based on a database with a (20%) risk of being leaked.
• Both assistants help users decrease their probability of getting sick (by 10%).
However, the two assistants differ in their algorithmic rules to derive personalized information such as tips and action recommendations from the database. The initial algorithmic rules of both smart assistants are programmed by a human being. Please read the following descriptions carefully, as it is important that you understand the difference between the two smart assistants.
Smart assistant with a self-learning algorithm (based on database Φ):
• Rules for personalization autonomously change depending on how the user reacted to past information.
• Consequently, the selected personalized recommendations will also change over time, depending on the rules the smart assistant automatically modified.
Smart assistant with a human-supervised algorithm (based on database Ω):
• Rules for personalization do not autonomously change depending on how the user reacted to past information; however, a human can manually change the rules.
• Consequently, the selected personalized recommendations will change over time, depending on the rules a human manually modified.

Instructions
In this task, you are actually paired with another real person, whom we will refer to as the other. This other person is someone you do not know, and you will both remain anonymous to each other. All of your choices are completely confidential. You will be making a series of decisions about allocating resources between you and this other person. For each of the following questions, please indicate the distribution you prefer most by choosing the respective allocation in the dropdown menu. Your decisions will yield an additional payment for both yourself and the other person. The maximum additional payment for you and the other person is $0.10. Additional payments will be allocated between you and the other person depending on which allocations you choose in the dropdown menus below. 100 points represent $0.01.
In case of odd numbers, we will round up in your favor. There are no right or wrong answers; this is all about personal preferences.
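The conversion from points to payment can be sketched as follows. This is one possible reading of the instructions, not the procedure actually used in the study: we assume that "round up" means rounding any fraction of a cent up to the next full cent, and that the stated $0.10 maximum acts as a cap. The name `bonus_cents` and both constants are illustrative.

```python
POINTS_PER_CENT = 100  # from the instructions: 100 points represent $0.01
MAX_BONUS_CENTS = 10   # from the instructions: maximum additional payment is $0.10


def bonus_cents(points: int) -> int:
    """Convert allocated points to a bonus in whole cents.

    Assumption: fractions of a cent are rounded up in the participant's
    favor, and the result is capped at the stated maximum.
    """
    if points < 0:
        raise ValueError("points must be non-negative")
    # Integer ceiling division avoids floating-point rounding issues.
    cents = (points + POINTS_PER_CENT - 1) // POINTS_PER_CENT
    return min(cents, MAX_BONUS_CENTS)


for pts in (800, 850, 1000):
    print(pts, "points ->", bonus_cents(pts), "cents")
```

Under this reading, an allocation of 850 points would be paid out as 9 cents rather than 8.5 cents.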