SCOTT:BB26.G

=Overview=


 * WPs of interest
 * WP7 can be a core WP for Privacy Labels BB
 * WP21 is also good for applying Privacy Labels
 * WP11 mentions Privacy Labels.
 * It is also interesting for applying Privacy Labels because it works with complex systems that manipulate data of various kinds. Fine-grained access control is also applicable, like the 5th step in our S-ABAC work, "Query-based AC", which can also help achieve better privacy.


 * We will not be involved in WP12 or WP14

Activities

 * Related activities include those started in BB26.F, on Multi-Metrics and measurable aspects of Privacy
 * Privacy evaluation of the TellU Diabetics app demonstrator from WP21 is planned and under way, with a deadline in Spring 2019.

=Division of Work and Research Directions=

Planned Outcomes
The work on Privacy Labelling aims to provide the following tangible results.

Privacy Labelling for Decision Makers
The purpose is to have a lightweight version useful for people who make decisions in which privacy aspects are involved. Examples of use:
 * 1) A CEO or project manager, for example from Statsbygg, needs to decide whether to include many sensors in a Smart office building for monitoring the indoor air quality. What are the privacy implications of this? How should she make a decision, and based on what information?
   * This Result should give her the needed tools to help with the decision.
 * 2) Before undertaking any expensive and time-consuming PL certification actions, a CEO or investor needs initial indications of the Privacy Aspects of the new App, IoT device, or technology that is planned.
   * This Result should enable economic and feasibility calculations of the privacy implications for the business development.
   * This work will be done in collaboration with Smart Innovation Norway.

These examples clarify the Requirement RQ561.

Privacy Labelling for Technical privacy engineers
This is similar to other certification methods, like the SIL levels or the ANSSI security classifications. The goal is two-fold: (i) to have a methodology, i.e., a decision tree to guide various technological decisions; and (ii) to have guidelines for which technologies can achieve which privacy guarantees. This includes a good, semantically meaningful overview of the various privacy-relevant concepts, like anonymization, personally identifiable data, machine-learning information extraction, aggregation techniques, deletion, etc. It also includes methods and tools for measuring the various privacy-relevant techniques and how well they achieve their purposes.
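The decision-tree idea in (i) and the technology-to-guarantee guidelines in (ii) can be pictured with a small sketch. Everything below (the technique names, the guarantee descriptions, the two yes/no questions) is an invented illustration, not the actual PL methodology:

```python
# Hypothetical sketch: a two-question decision tree that guides a technology
# choice, plus a lookup of which technique achieves which privacy guarantee.
# All names and questions here are illustrative assumptions.

GUARANTEES = {
    "differential_privacy": "bounds what any released statistic reveals about one person",
    "k_anonymity": "each record is indistinguishable from at least k-1 others",
    "pseudonymization": "direct identifiers removed, but re-identification stays possible",
}

def recommend(releases_aggregates_only: bool, needs_record_level_data: bool) -> str:
    """Walk the (toy) decision tree from data-flow questions to a technique."""
    if releases_aggregates_only:
        return "differential_privacy"
    if needs_record_level_data:
        return "pseudonymization"
    return "k_anonymity"

technique = recommend(releases_aggregates_only=False, needs_record_level_data=False)
print(technique, "->", GUARANTEES[technique])  # prints the technique and its guarantee
```

The real methodology would of course have many more decision nodes and would attach measurable evidence to each answer, but the shape (questions at the nodes, techniques and guarantees at the leaves) is the same.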

NOTE that none of the above aspects are related to security, i.e., confidentiality is assumed to be provided by other means. Otherwise, if the confidentiality of data in transit or at rest is broken, then no privacy guarantees can be met any more, making any Privacy Labelling irrelevant.

Security evaluation should be done independently of the privacy evaluation.

Privacy Labelling for the Users
This is not present in classical standardisation methods, because those, like the goal above, are meant only for technical people and have a very simple outcome, like a number or a yes/no answer.

The main purpose of Privacy Labelling is to present the outcome of the privacy certification to Users. However, privacy is very difficult to present, compared to classical aspects like the Energy Consumption labels, where the range is the number of kWh consumed. Moreover, privacy is also highly personal, i.e., what is highly private for one person may not be for another, and thus does not impact her decisions. The goal is to study how to best present the various concepts involved in technically deciding a privacy label. This means that a User can understand, first, the privacy implications of some device/service/system that she wants to acquire, and second, how to compare it with similar products based on privacy aspects. Examples of use:
 * 1) An older lady wishes to buy some Smart Home Energy Management system, but she is concerned that all these new gadgets are too intrusive into her rather classical style of living. Even more so since several of her family members, or friends at the Bridge Club, tell her about <>, or how <>. She wants some simple indication given by some authority that she can trust (and she mostly trusts when government associations are involved) about how much this new system would expose her, so she can decide what to buy, restricted by her limited budget.
 * 2) A young adult who likes to have various new electronic devices, is interested in installing a new app on his phone for tracking his weekend hiking trips. He wants to know about how his location data is being used by the app provider so that he can make an informed decision regarding choosing the various different functionalities that the app promises against the privacy exposure that he would have to accept.

The first example clarifies the Requirement RQ558.

Privacy Labelling for certifying experts and certification bodies
These include minimal requirements, alignment with existing regulations like GDPR, and adoption of and relation to relevant existing standards. Examples of use:
 * 1) An expert needs to carry out a certification job for a specific new IoT device. She needs both methodological as well as tool support.
 * 2) A regulatory body, like a GDPR national authority, needs to make a compliance evaluation.

Sub-components
The work in the Privacy Labelling is divided into several sub-components, each trying to achieve one of the above goals.

PL4Decisions
This sub-component works towards the following measurable outcomes.
 * 1) A simplified decision process from PL-CERT, with different domain-specific versions
 * 2) Tools that are easy to use by decision makers, like questionnaires

This sub-component works to attain the Requirement RQ561.

PL-Methods
This sub-component works to attain the Requirement RQ560, and works towards the following measurable outcomes.
 * 1) A decision process, with associated tools like UIs and databases of resources
   * This identifies concepts relevant for privacy, like:
     * what personally identifiable data are,
     * what data (if any) is collected,
     * what inference algorithms (e.g., machine learning) are being applied, and what for.
   * These concepts need to be (co-)related to each other (e.g., which one influences another, and how), and maybe prioritized or weighted.
   * Identify which of the concepts can be measured in any way.
     * Identify what tools and techniques exist to measure the respective aspect.
     * To combine measurements from sub-components into a single measurement for the full system, one can use the Multi-metrics framework detailed and exemplified in the recent book Measurable and Composable Security, Privacy, and Dependability for Cyberphysical Systems: The SHIELD Methodology.
   * Building this process in full should be done iteratively: first a minimal process, applied to one use case; then further iterations that increase the number of concepts and decision nodes/questions, applied to subsequent use cases.
 * 2) Surveys of existing techniques that can be applied to answer the questions at any of the decision nodes. These include:
   * Anonymization techniques
   * De-anonymization methods, like machine learning algorithms
   * Results about how anonymization of data influences desired learning results, e.g., before anonymization some learning and profiling can be done, whereas after anonymization such profiling cannot be done any more, and thus the < > cannot be applied any more (see the survey chapter A General Survey of Privacy-Preserving Data Mining Models and Algorithms and the book it comes from, along with the standard book Data Mining: Concepts and Techniques, 2011)
   * Differential privacy (see a survey and more resources from a simple search)
   * Privacy-by-Design concepts and current works. Examples include applications:
     * in Programming,
     * in System architecture,
     * in Databases,
     * in Machine Learning
   * Privacy for Location data (see the older book Privacy, Security and Trust within the Context of Pervasive Computing, or the survey A Survey of Computational Location Privacy)
   * Privacy Patterns (starting from A Literature Study on Privacy Patterns Research), and the question of which such patterns involve measurable aspects
   * Privacy in Biometrics (see the book Biometrics: Theory, Methods, and Applications)
   * Clustering techniques for Privacy (see the 2017 Privacy and utility preserving data clustering for data anonymization and distribution on Hadoop, An overview of the use of clustering for data privacy, or The Effect of Clustering on Data Privacy)
 * 3) Techniques for measuring the various relevant privacy aspects, including tools to automatically do the measuring
   * Include the Multi-metrics framework
 * 4) Tools for aggregation of the privacy measurements
 * 5) Examples of use
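Since differential privacy appears among the surveyed techniques, a tiny, standard illustration may help fix ideas: a counting query (e.g., over sensor readings) answered with Laplace noise calibrated to the privacy parameter epsilon. This is the textbook Laplace mechanism, not a SCOTT-specific tool, and the data below is invented:

```python
import math
import random

def laplace(scale: float) -> float:
    """Sample Laplace(0, scale) noise via the inverse-CDF transform."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_count(values, predicate, epsilon: float) -> float:
    """Epsilon-DP count query: a count has sensitivity 1, so Laplace
    noise with scale 1/epsilon gives epsilon-differential privacy."""
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace(1.0 / epsilon)

# Invented example: how many rooms exceed a CO2 threshold, reported privately.
co2_ppm = [410, 950, 1200, 480, 1100]
noisy = dp_count(co2_ppm, lambda x: x > 800, epsilon=0.5)
```

Smaller epsilon means more noise and stronger privacy; the survey work would map such parameters onto the measurable scales discussed above.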

PL-UX
This sub-component works towards the following measurable outcomes.
 * 1) Privacy Labelling color range and visual cues, including icons
 * 2) User-friendly explanations of the Privacy Label, i.e., details
 * 3) Usability studies to evaluate the User response

This sub-component works to attain the Requirement RQ559.
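As one possible sketch of the color range in outcome 1), the fragment below maps an aggregated privacy score onto a letter band with a color, in the spirit of the EU energy-consumption labels mentioned earlier. The thresholds, letters, and colors are purely illustrative assumptions that the usability studies would replace:

```python
# Hypothetical A-G privacy label bands; thresholds and colors are invented.
BANDS = [
    (90, "A", "dark green"),
    (75, "B", "green"),
    (60, "C", "light green"),
    (45, "D", "yellow"),
    (30, "E", "orange"),
    (15, "F", "light red"),
    (0,  "G", "red"),
]

def privacy_label(score: float) -> tuple[str, str]:
    """Map a score in [0, 100] (higher = more private) to (letter, color)."""
    for threshold, letter, color in BANDS:
        if score >= threshold:
            return letter, color
    return "G", "red"  # scores below 0 fall into the worst band

print(privacy_label(82))  # ('B', 'green')
```

Whether seven bands, these colors, or a letter scale at all are the right visual cues is exactly what the usability studies in outcome 3) would evaluate.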

PL-CERT
This sub-component works towards the following measurable outcomes.
 * 1) A certification process with all decision points identified, and methods for making an evaluation and a decision at each point.
 * 2) Clearly described relationships with existing standards
 * 3) Method of aligning Privacy Labelling to existing regulations including GDPR and Norwegian National regulations from Datatilsynet

This sub-component works to attain the Requirement RQ560.

Baseline
The baseline for starting to develop this technology building block is formed of the existing technologies related to privacy. These are studied in the sub-component and are relevant to the respective outcomes. This BB will not attempt to enhance these methods in any way, but will instead survey and evaluate their performance, with the goal of providing informed decision support for each privacy aspect considered in the process developed in the sub-component.

Also part of the baseline for the work in this BB are various certification methods and processes, especially those related to security. These will form the starting point for the sub-component and the respective outcome.

There are no existing certification processes that can be used for certifying a service or IoT device wrt. privacy. Such a process will be created in this BB; in fact, several variants of the process, to accommodate our different objectives.

Various privacy guidelines exist, like privacy-by-design or the EU's General Data Protection Regulation (GDPR). We will use these, and the concepts that they identify and discuss, to build this BB.

Enhancements during SCOTT
During SCOTT we enhance existing privacy relevant technologies and guidelines in several aspects, described by the objectives of this BB. From the point of view of the work, these are divided into the four sub-components of this BB.

One enhancement goes in the direction of technologies and the evaluation of their suitability to provide a specific privacy property/functionality. The degree to which some technology can attain some privacy property will be measured on a scale specifically designed for that privacy property. In the certification process each property with its scale will be part of a decision process. Using the multi-metric approach of the SCOTT BB on Measuring SPD we will combine all the different scales and measures for each chosen privacy technology into a single value, which would correspond on the user side to a Privacy Label.
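A minimal sketch of this combination step, under the assumption that each property has already been measured on its own scale; the property names, scales, and weights below are invented, and the actual multi-metrics computation of the SHIELD methodology is more elaborate:

```python
# Toy multi-metric combination: normalize each property measurement from its
# own scale to [0, 1], then take a weighted mean as the single label value.
# Property names, scales, and weights are illustrative assumptions.

def normalize(value: float, lo: float, hi: float) -> float:
    """Map a value on the scale [lo, hi] to [0, 1]; lo/hi may be inverted."""
    return (value - lo) / (hi - lo)

def combine(measurements: dict[str, tuple[float, float, float, float]]) -> float:
    """measurements: name -> (value, scale_lo, scale_hi, weight)."""
    total_w = sum(w for *_, w in measurements.values())
    return sum(normalize(v, lo, hi) * w
               for v, lo, hi, w in measurements.values()) / total_w

score = combine({
    "anonymization_strength": (4, 0, 5, 0.5),    # expert rating on a 0-5 scale
    "data_minimization":      (70, 0, 100, 0.3), # % of collected fields needed
    "retention":              (12, 36, 0, 0.2),  # months kept; 36 is worst
})
```

The resulting value in [0, 1] is what would then be rendered on the user side as a Privacy Label.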

Another enhancement goes in the direction of certification for privacy, inspired by certifications for security. However, we want two certifications:
 * 1) A heavy-weight one (corresponding to sub-component ), which would be used on a service, IoT device, or product. This would evaluate the privacy of the service and assign a privacy label.
 * 2) A light-weight one (corresponding to sub-component ), which is novel in the certification domain and is intended to be used at decision time, to guide negotiations, purchase specifications, and decisions at higher levels. This would not have many technical aspects. However, the heavy technical certification process would be a detailed version of the light-weight one.

A final enhancement is on the user side. We need to present the certification to a user and explain the privacy label in such a way that it is useful for the end user. Usefulness is both in terms of helping the user make an informed decision when buying a service, and in terms of helping the user understand the privacy consequences of using the service. This is related to the sub-component and is both about graphical/visual presentation and about how to explain privacy concepts to non-security experts.

Future vision 2025
The vision is that by the end of SCOTT we will have demonstrated how privacy labelling can be done. We will discuss with authorities so as to have the PL taken up in their evaluation processes.

We expect by 2020 to have experimental methods for all planned sub-components.

We expect by 2025 to have such methods taken up in a standardization committee on the technology side, and to be adopted as a certification process by major certification companies like DNV-GL in Norway. We also expect regulatory bodies to have national regulations asking for such labeling, even if most products would only get a default lowest label (as it is often done when a new labeling scheme is taken up by a government, e.g., see the energy efficiency labeling of houses in Norway).

Hindrances and requirements
Scientific, technological, standard or political perspectives:

Scientific hindrances come from the fact that there are very many technical solutions both for providing some privacy property and for breaking the same property. For example, there are anonymization techniques, but there are also various de-anonymization techniques, e.g., based on machine learning or on aggregation of data. Gathering and evaluating all of these is a serious challenge for this BB.

Standardization requirements include those for certification processes, where both government and industry need to reach a common understanding.

Political hindrances come from the fact that privacy is a human right, demanded by citizens from their governments. However, private data is one of the most valuable assets nowadays, and thus companies are more and more interested in breaking individual privacy. As a consequence, there is a push-and-pull on the political arena between companies and citizens.

=RoadMap=

=Deliverables and Documents=

=Practical Aspects=

Implementations and User Testing

 * See the RoadMap

Demonstrations and Use Cases

 * In WP7 in an initial preliminary phase at M14
 * In WP21 in a more concrete phase at M24

Air Quality monitoring Use Case of WP7
The work in WP7 involves complex systems and algorithms for processing data coming from multiple sensors and other information sources (like weather reports) in order to achieve high-precision monitoring of indoor air quality (IAQ), both in real time and in the quality of the measurements. Thus WP7 provides good use cases for all components of the Privacy Labelling work. Refer to the respective Deliverable D7.1 for detailed scenario presentations. Consider the following aspects of the IAQ and the related privacy questions/concerns.

An office environment, or an industrial facility (e.g., for storage of goods or processing of food), is equipped with various sensors: temperature, humidity, pressure, various gasses, particles, sound, luminosity, 3D camera, motion, and window and door latch/open sensors (see slide 18 of the VTT presentation from the 14 Feb 2018 meeting). This sensor information is gathered in cloud-based systems (see the presentation of Centria from the 14 Feb 2018 meeting and their FOAM platform) that use powerful big-data processing tools like Apache Kafka, Spark, and Cassandra (see the presentation of ITI from the 14 Feb 2018 meeting). However, the data from the sensors first goes through a gateway/router inside the facility before reaching the cloud platform (see slide 5 of the Centria presentation, or the F-Secure presentation from the 14 Feb 2018 meeting). Therefore, various forms of processing can be done on the gateway, some relevant for privacy. The work in WP7 uses both poor-quality sensors (see the presentation of IMEC from the 14 Feb 2018 meeting) and high-grade sensors (see the presentations of QPLOX or RDVELHO from the 14 Feb 2018 meeting).
 * One Privacy question is: how many of these sensors are needed for each specific air-quality functionality that is desired? The more sensors, the more accurate the information that can be inferred about occupancy and about the processes/behaviours happening at some point in time in the respective indoor location.
 * One Privacy question is: how is this data being processed in these far-away cloud systems? Can transparency be achieved?
 * Another Privacy question is: what kind of information can these powerful systems extract from the sensor data? How much profiling can be done? These questions are related to research on privacy in static databases, like differential privacy.
 * Can the gateway do some anonymization pre-processing of the sensor data? This work should be done in accordance with the cloud system, so that the amount of anonymization does not interfere with the cloud functionality (e.g., the same profiling should remain possible, as needed for various price agreements).
 * One Privacy question is: how much information needs to be collected from one sensor to achieve the expected functionality? For example, with a high-grade sensor, too-frequent measurements can provide enough information to extract privacy-sensitive facts, like what activity is carried out in the office, i.e., typing at a computer, walking, sitting, reading (see recent research on this aspect from ESORICS 2017).
 * With poor-quality sensors one usually claims that more sensors are needed; however, how many there are and where they are placed matters, so as not to infer more information than needed.
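One way to picture the gateway-side pre-processing raised in these questions is temporal coarsening: averaging high-rate readings into coarse windows before they leave the building, so that fine-grained activity patterns (typing, walking) become harder to infer while IAQ trends survive. This is only an illustrative sketch under assumed data shapes and window sizes, not the WP7 design:

```python
# Toy gateway-side coarsening: replace per-second sensor readings with one
# averaged value per window before uploading to the cloud. Window size and
# the (timestamp, value) layout are illustrative assumptions.

def coarsen(readings: list[tuple[int, float]], window_s: int) -> list[tuple[int, float]]:
    """readings: (unix_timestamp, value) pairs; returns one mean per window."""
    buckets: dict[int, list[float]] = {}
    for ts, value in readings:
        buckets.setdefault(ts // window_s, []).append(value)
    return [(w * window_s, sum(vs) / len(vs)) for w, vs in sorted(buckets.items())]

raw = [(0, 400.0), (10, 410.0), (70, 450.0)]  # CO2 ppm every few seconds
print(coarsen(raw, window_s=60))              # [(0, 405.0), (60, 450.0)]
```

Choosing the window so that the cloud functionality (IAQ monitoring, profiling needed for price agreements) still works is exactly the trade-off the questions above ask about.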

The above example and questions are relevant for the PL-CERT and PL-Methods components. These components would be applied both at the sensor level and immediate gateway (processing unit), as well as on the overall system, i.e., including the cloud systems.
 * Work on clustering (see presentation of Nokia from 14 Feb 2018 meeting) is relevant for anonymization. Can this be performed by the gateway, or only by the cloud?

The scenarios described in Deliverable D7.1 include several aspects regarding the presentation of the data to users, including presenting privacy aspects to the concerned users. Here is where the work of the PL-UX component would be applicable.

SmartIO works with users who are at decision levels, i.e., those responsible for building a new smart office facility or for retrofitting one with smart-* capabilities. For these users, PL4Decisions would be applicable. Our work would focus on the air-quality functionality and the related privacy questions.

Smart Grid Use Case (not related to SCOTT)
Advanced Metering Infrastructures (AMI) and Smart Meters are deployed in Norway to automatically and continuously measure energy consumption. There are many Privacy Concerns around these:
 * 1) How much private information can be extracted from this data?
 * 2) How well is this data anonymized?
 * 3) How well can we measure the privacy implications of such Smart Systems?

Papers to start from (also see who cites these on scholar.google.com):
 * 1) "Smart grid privacy via anonymization of smart metering data" by Costas Efthymiou and Georgios Kalogridis, in IEEE International Conference on Smart Grid Communications (SmartGridComm), 2010.
 * 2) "Influence of data granularity on smart meter privacy" by Günther Eibl and Dominik Engel, in IEEE Transactions on Smart Grid 6.2 (2015): 930-939.
 * 3) "Do not snoop my habits: preserving privacy in the smart grid" by Félix Gómez Mármol, Christoph Sorge, Osman Ugus, and Gregorio Martínez Pérez, in IEEE Communications Magazine 50.5 (2012).
 * 4) "Achieving anonymity via clustering" by Aggarwal et al., in Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems. ACM, 2006.
 * 5) "An overview of the use of clustering for data privacy" by Vicenç Torra, Guillermo Navarro-Arribas, and Klara Stokes, in Unsupervised Learning Algorithms. Springer, Cham, 2016. 237-251.

There are well-known ways of using the Smart Meter data to extract behaviours and make profiles:
 * 1) Leveraging smart meter data to recognize home appliances
 * 2) Private memoirs of a smart meter
 * 3) Security and privacy challenges in the smart grid
 * 4) Privacy-Friendly Aggregation for the Smart-Grid
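As a toy illustration of the aggregation idea in the last title above, the sketch below reports only group totals over at least k meters instead of per-household load profiles. The group size, meter names, and readings are invented, and real privacy-friendly aggregation schemes use cryptographic protocols rather than a trusted aggregator:

```python
# Toy aggregation for smart-meter privacy: report per-group consumption
# totals, refusing any group with fewer than k meters, so that individual
# household profiles are hidden inside their group. All data is invented.

def aggregate_by_group(meter_kwh: dict[str, float],
                       groups: list[list[str]], k: int = 3) -> dict[int, float]:
    """Return total kWh per group; raise if any group is smaller than k."""
    totals: dict[int, float] = {}
    for i, group in enumerate(groups):
        if len(group) < k:
            raise ValueError(f"group {i} has fewer than {k} meters")
        totals[i] = sum(meter_kwh[m] for m in group)
    return totals

readings = {"m1": 1.2, "m2": 0.8, "m3": 2.1, "m4": 0.5, "m5": 1.9, "m6": 1.0}
print(aggregate_by_group(readings, [["m1", "m2", "m3"], ["m4", "m5", "m6"]]))
```

The papers above study how much such grouping (or coarser data granularity) actually protects against profiling, which is precisely the kind of measurable question the PL-Methods component targets.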

=Requirements=

=SCOTT status=

From Ramiro: An overview of the instructions for updating the building blocks and the collection of the requirements can be found in this presentation (slides 19-24). https://projects.avl.com/16/0094/WP26/Documents/02_Meetings%20and%20WebEx/20170703_SCOTT_Presentation_WP26.pptx?Web=1

The official and complete instructions can be found in the following presentation from SP1 requirements management. https://projects.avl.com/16/0094/WP01/Documents/03_Deliverables/SCOTT%20REQM%20Approach_Guidance_June2017.pptx?Web=1