Security and Privacy

Overview

This page describes the overall security architecture for the GPII as it was designed in the Cloud4All Project. Further information is available in the Security Dashboard and GPII OAuth 2 Guide.

As information and communication technologies (ICTs) increasingly pervade all aspects of our lives, attitudes to privacy are slowly changing. Consumers have readily adopted cloud-based technologies such as Google Apps, Apple's iCloud, Facebook, and more, enabling them to work on multiple devices and stay ubiquitously connected to friends and family. But these tools also bring with them significant concerns, including questions about how to effectively limit access to a user's personal information. This issue is even more important for people with disabilities, who may be discriminated against based on their medical status or personal needs.

The goal of this security and privacy architecture is to enable a robust system that can, as it grows, provide users with friendly and comprehensive tools for sharing their data with the web sites, access features, and assistive technologies that they trust.

The Cloud4All security and privacy architecture consists of several interrelated components that are closely aligned with the project's overall architectural approach, and are being integrated with the Cloud4All/GPII real-time framework. These components include:

  • A pluggable user authentication module, which is responsible for managing user accounts, and which will (outside the scope of the Cloud4All project) include the ability to connect to existing organizational identity management tools such as LDAP or Microsoft Active Directory
  • A two-tiered application authentication and authorization architecture that prevents direct access to the Preferences Server, requiring participating applications to be issued OAuth[1]-based credentials that verify their identity
  • A framework-level means for filtering needs and preferences sets, which will help avoid privacy leakage by ensuring that preferences and settings are shared only on a "need-to-know" basis

These components—and the overall approach with which they’ve been architected—are outlined in greater detail below.

Introduction

Objectives of this document

This document describes the Cloud4All architecture for managing the security and privacy of user needs and preferences sets, account information, and application authentication. It is intended as a tool for Cloud4All developers to use in implementing this approach to security and privacy ubiquitously throughout the system.

This architecture is intended to provide modular, framework-level supporting infrastructure to help preserve the confidentiality, integrity, and trusted use of user data within Cloud4All.

Since security and privacy impact the Cloud4All system as a whole, this document influences a large number of the technical activities of the project. Developers who are integrating their applications as part of the SP3 effort—especially those who have web-based tools—will be directly supported by this architecture, as will the creators of framework-level applications and services such as the Preferences Server, Matchmakers, and more. The ultimate goal is to provide reusable security and privacy tools to all components and developers of the Cloud4All system.


Approach to Security and Privacy

This activity takes a holistic approach to the security and privacy of user information—both its storage and usage—throughout the Cloud4All infrastructure. It outlines a general architecture and a set of framework-level services that can be used to:

  • promote overall project security by preventing unauthorized access to a user's needs and preferences set (N&P set)
  • provide a means by which users can maintain the confidentiality of personal data within the system

In reality, security and privacy are complex tasks that involve trade-offs between usability, convenience, and ubiquity. There is no single "silver bullet" approach or technology that will guarantee absolute privacy to a user. To help promote the values of user autonomy, privacy, and system integrity, we have adopted several overarching security and privacy principles when developing Cloud4All's code, including:

  1. Avoiding data leakage within the architecture as much as possible
  2. Limiting the amount of personally identifying information a user needs to provide to the system
  3. Providing users with ways to define their “trust networks” (i.e. which sites/applications they want to share preferences with, and to what extent)

Background

In speaking with users during several Cloud4All-based pilot tests and community forums, we have heard them repeatedly voice the desire for control over how their cloud-based personal needs and preferences are used and shared. Most users have said that they're willing to store and share their data in the cloud and allow it to be connected with their online identity, but that they want a clear and understandable way to control who has access to their GPII/Cloud4All needs and preferences sets. In addition, some users have requested the ability to use the system anonymously, without having to provide personal information such as an email address or username.

This architecture is intended to help provide users with the tools and infrastructure they need to monitor and trust that their needs and preferences are being stored responsibly and used in accordance with their wishes, and to control access to this information within the Cloud4All/GPII ecosystem.

This structure must and will be implemented in the context of an ethics infrastructure that will include both clear guidelines on how any user data is collected, stored, accessed and used, and a Data Ethics Oversight Committee composed of outside members chosen from the international ethics community to create, control and monitor compliance with these guidelines.

Overview of Components and Workflow

The following diagram illustrates the relationship of the key security and privacy components to the overall Cloud4All real-time framework.

Figure 1. Architecture for N&P Set Security and Privacy.

D105.2.architecture.png

This diagram shows the primary components within the system. It shows how the user can use the Preferences Management tool to authenticate and edit account settings and privacy rules. It also shows the personalization workflow for a device or third-party web application, which provides a client token to the Flow Manager and Auth Manager. Once the application is approved, the user preference set is filtered based on the privacy rules defined for the requesting application as well as its capabilities.


User Accounts and Privacy Rules

Along the right-hand side of the diagram, the user is given the ability to manage and maintain a personal GPII user account, which ensures that only they can:

  • Edit their needs and preferences sets
  • Change their security settings, passwords, and privacy rules
  • Provision new physical login devices such as RFID tags or USB sticks

Account management is handled by the Preferences Management Tool, and is described in further detail below. Support for anonymous usage is also covered.

In addition to managing their account, users will also be provided with a means to define privacy rules, which enable them to prevent specific applications or web sites from accessing some or all of their N&P set. These privacy rules will be stored alongside the user's N&P sets in the Preferences Server database.

Device and Web Application Personalization

The security and privacy infrastructure will introduce a new refinement of the real-time user login workflow that was documented in A Detailed Tour of the Cloud4all Architecture.

In this model, all applications will be issued credentials (in the form of a unique OAuth token) that verify their identity to the Cloud4All infrastructure. This reduces the risk that a malicious application or device impersonates another application in order to obtain preferences that it would not normally have access to.

When a user "keys in" to a GPII-enabled device or starts using a GPII-integrated web application, the application will send its client token to the Flow Manager along with the request for the user's N&P set. The Flow Manager will delegate to the Auth Manager component, which is responsible for verifying the requesting application's client token. This is done with help from the Solutions Registry, which is responsible for providing all information about applications that is needed for the real-time personalization process. The Solutions Registry will store the list of legitimate OAuth client tokens.
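The verification step described above can be sketched roughly as follows. This is a minimal illustration in Python, not the GPII implementation: the function names, the solution identifiers, the token values, and the in-memory dictionary standing in for the Solutions Registry are all assumptions made for the example.

```python
import hmac

# Hypothetical stand-in for the Solutions Registry's list of legitimate
# OAuth client tokens. Identifiers and token values are illustrative only.
SOLUTIONS_REGISTRY_TOKENS = {
    "org.example.screenreader": "token-abc123",
    "org.example.magnifier": "token-def456",
}


def verify_client_token(solution_id: str, client_token: str) -> bool:
    """Auth Manager role: check the presented token against the registry."""
    expected = SOLUTIONS_REGISTRY_TOKENS.get(solution_id)
    if expected is None:
        return False
    # Constant-time comparison avoids leaking token contents through timing.
    return hmac.compare_digest(expected.encode(), client_token.encode())


def handle_preferences_request(solution_id, client_token, user_token, preferences_store):
    """Flow Manager role: delegate verification before touching any N&P set."""
    if not verify_client_token(solution_id, client_token):
        raise PermissionError("unknown or invalid client token")
    return preferences_store[user_token]
```

In this sketch the N&P set is never read until the client token has been checked, which mirrors the ordering described in the text.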

Preferences Set Filtration

If the authentication process succeeds, the Flow Manager will perform its usual workflow[2], and solutions are matched to the user's needs and preferences.

Even if all applications must authenticate themselves, a potential risk to user privacy remains. This is the inadvertent leakage of N&P set data that can occur if parts of the set are shared with an application that doesn't have the capability to respond to those needs. A well-intentioned application may just disregard this extra information, but a malicious application may use this additional information to collect a broader picture of the user than they could otherwise acquire.

As a result, the real-time framework, as it is implemented today, automatically filters out any information in a preference set that an application is unable to productively use. This is accomplished using the Transformer component, which is provided as a common service throughout the architecture. During the personalization workflow, the real-time framework takes the application's stated capabilities (as described in the Solutions Registry) and removes any preferences/settings from the user's N&P set that aren't relevant to those capabilities.
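As a rough illustration of capability-based filtering, the following Python sketch keeps only the preference terms that an application declares it can act on. The flat key/value shape of the N&P set and the term names are assumptions for the example; the actual Transformer operates on its own declarative rule format.

```python
def filter_by_capabilities(np_set: dict, capabilities: set) -> dict:
    """Return a copy of the N&P set limited to terms the application can use.

    `np_set` maps preference terms to values; `capabilities` is the set of
    terms the application declares (hypothetically, in the Solutions Registry).
    """
    return {term: value for term, value in np_set.items() if term in capabilities}


# Hypothetical N&P set and capability declaration.
user_np_set = {"fontSize": 18, "highContrast": True, "speechRate": 160}
magnifier_capabilities = {"fontSize", "highContrast", "magnification"}

filtered = filter_by_capabilities(user_np_set, magnifier_capabilities)
```

Here the magnifier never sees `speechRate`, because it declared no capability that could use it.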

In addition, a second filtration stage will be added to the process, directly at the source of N&P sets: the Preferences Server. As mentioned above, users will be able to define rules that restrict access to all or some of their needs and preferences for specific applications. These rules will be stored as JSON-based transformation rules that can be operated by the Transformer, ensuring that a single common infrastructure is used to operate preference set filtration throughout the system. Reusing the transformation infrastructure as a single point of responsibility for preference set filtration reduces the code surface of the system, which in turn makes code review and security audits easier.


Figure 2. The needs and preferences set filtration process.

D105.2.preferences-filtration.png


Figure 2 illustrates the preference set filtration process, starting with the original needs and preferences set at the top. This JSON document is stored in the Preferences Server, alongside any privacy rules that the user has declared for the requesting application. Privacy rules are also stored in a declarative format. In the future, these rules will thus be editable by the users themselves, using friendly tools and templates in the Preferences Management Tool. Privacy rules are operated by the Transformer component of the real-time framework. After the document has been filtered by the user’s privacy rules, it is further filtered based on the capabilities of the requesting application to actually meet the needs and preferences in question. Practically speaking, if the application isn’t capable of doing something with a particular value in the user’s N&P set, that value will be removed from the set.
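The two filtration stages can be sketched together as follows. This is only an illustration under stated assumptions: the JSON rule shape (a per-application `deny` list, or `"*"` to deny everything), the application identifiers, and the function names are invented for the example and are not the Transformer's actual rule syntax.

```python
import json

# Hypothetical privacy rules, stored declaratively as JSON alongside the N&P set.
PRIVACY_RULES_JSON = """
{
  "org.example.webapp": {"deny": ["screenReaderEnabled"]},
  "org.example.tracker": {"deny": "*"}
}
"""


def apply_privacy_rules(np_set: dict, rules: dict, solution_id: str) -> dict:
    """Stage 1: remove whatever the user has denied to this application."""
    rule = rules.get(solution_id)
    if rule is None:
        return dict(np_set)  # no rule declared for this application
    denied = rule["deny"]
    if denied == "*":
        return {}
    return {term: value for term, value in np_set.items() if term not in denied}


def filter_np_set(np_set: dict, rules: dict, solution_id: str, capabilities: set) -> dict:
    """Stage 1 (privacy rules) followed by stage 2 (application capabilities)."""
    after_privacy = apply_privacy_rules(np_set, rules, solution_id)
    return {term: value for term, value in after_privacy.items() if term in capabilities}


rules = json.loads(PRIVACY_RULES_JSON)
np_set = {"fontSize": 18, "highContrast": True, "screenReaderEnabled": True}
```

Calling `filter_np_set(np_set, rules, "org.example.webapp", {"fontSize", "screenReaderEnabled"})` removes `screenReaderEnabled` in the first stage (denied by the user) and `highContrast` in the second (not a declared capability).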

User Accounts and Authentication

As mentioned above, a full single sign-on identity management system is outside the scope of the Cloud4All project. Nonetheless, other GPII-related projects are looking into techniques for connecting the infrastructure with emerging trusted computing initiatives such as the U.S. government's National Strategy for Trusted Identities in Cyberspace (NSTIC)[3], Internet2’s Scalable Privacy Project[4], The Kantara Initiative[5], and others.

Cloud4All, however, will provide a simple and workable system for managing user accounts and privacy settings for use within the Cloud4All project (where actual user data only exists within the Pilot Testing component). From the user’s perspective, this functionality will be available as part of the advanced features of the Preferences Management Tool. From the architectural perspective, the Cloud4All framework will provide an LDAP[6] directory or similar means for storing and managing user accounts in an interoperable way.

With this system, users will be able to specify:

  1. A user name (which the system could help generate for them)
  2. Their email address
  3. A password

An account with the Cloud4All system will enable users to manage their physical login devices (RFID tags, USB sticks, etc.), set privacy rules, and edit their preference sets on any device. The architecture will provide an indirection mechanism between the user’s token (i.e. the internal identifier used to store their preference set in the Preferences Server) and the tokens that are actually stored on the login device, such that if a login device gets lost, the user can disable access to it without having to start over again.


Figure 3. User token indirection.

D105.2.token-indirection.png

Figure 3 illustrates the indirection process for user tokens. The physical login device will be provisioned with one random, non-identifying token, but this token will not directly correspond to the identifier used to store the N&P set in the Preferences Server’s database. Instead, it will be used to look up another unique, non-identifying token that is the key for the N&P set. This ensures that a lost RFID tag, USB stick, or other login device can be disabled without requiring changes to the data stored in the Preferences Server. It also allows us to introduce login devices that create constantly changing login Key-Tokens.
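A minimal sketch of this indirection, assuming a simple in-memory mapping, might look like the following. The class and method names are hypothetical, and a real deployment would persist the mapping rather than hold it in a dictionary.

```python
import secrets


class TokenDirectory:
    """Maps device tokens (stored on RFID tags, USB sticks, etc.) to the
    internal user token that keys the N&P set. Hypothetical helper class."""

    def __init__(self):
        self._device_to_user = {}

    def provision_device(self, user_token: str) -> str:
        """Issue a fresh random, non-identifying token for a new login device."""
        device_token = secrets.token_urlsafe(16)
        self._device_to_user[device_token] = user_token
        return device_token

    def revoke_device(self, device_token: str) -> None:
        """Disable a lost device without touching the stored N&P set."""
        self._device_to_user.pop(device_token, None)

    def resolve(self, device_token: str):
        """Return the internal user token, or None if the device is unknown."""
        return self._device_to_user.get(device_token)
```

Because the N&P set is keyed only by the internal user token, revoking one device token and provisioning another leaves the stored preferences untouched.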

As a further option to protect privacy, users will be able to choose to always be asked for their password, a biometric, or another user-selected second-level authentication before any advanced settings or personally identifying information are shown or edited within the Preferences Management Tool. This will, among other things, help to ensure that even a lost USB stick or RFID tag will not expose personally identifying information.

Anonymous Usage

Complete anonymity in the cloud is an increasingly difficult prospect to achieve. Due to ubiquitous user tracking by web-based advertisers and the practice of selling user information connected to site-specific cookies[7], it is possible for web sites to develop a very elaborate picture of a user's habits and usage patterns. More pragmatically, users are often required—generally for a good reason—to provide personalized credentials to use third-party web applications and devices. This makes it easier for sites to correlate otherwise unidentified N&P sets with a specific user.

Addressing truly anonymous use of ICT in the cloud era is a substantial undertaking, and undoubtedly beyond the scope of the Cloud4All project alone. Nonetheless, we have endeavoured to provide a simple and workable means for users to avoid having to identify themselves to the Cloud4All system itself. This may serve as a building block for truly anonymous usage, and more immediately provides users with the reassurance that they aren't tracked by the GPII infrastructure.

Anonymous usage will be accomplished by bypassing the user account creation process altogether. A user will be able to use the PMT without specifying a username, password, email address, or other identifying information. Instead, they will be issued a unique, random token that can be directly stored on a physical login device such as an RFID tag or USB stick. This token will be the sole means for retrieving their preference set from the GPII Preferences Server.

In order to address the case where an anonymous user loses their physical login device, the system will allow them to provide an optional "reset code" or phrase to be able to retrieve their N&P set and re-provision a new physical login device.
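The anonymous flow, including recovery through a reset phrase, could be sketched along these lines. This is an illustration only: the class name, the use of PBKDF2 with a server-wide salt to index reset phrases, and the re-keying of the N&P set on recovery are all assumptions for the example, not a description of how the GPII handles reset codes.

```python
import hashlib
import secrets

# Assumption: a server-wide salt so that a reset phrase maps to a stable
# lookup key. A real deployment would need to persist and protect it.
SERVER_SALT = secrets.token_bytes(16)


def _digest(phrase: str) -> str:
    """Derive a lookup key from the reset phrase without storing the phrase."""
    return hashlib.pbkdf2_hmac("sha256", phrase.encode("utf-8"), SERVER_SALT, 100_000).hex()


class AnonymousAccounts:
    """Hypothetical store for anonymous N&P sets keyed by a random token."""

    def __init__(self):
        self.np_sets = {}        # login token -> N&P set
        self._reset_index = {}   # reset-phrase digest -> login token

    def create(self, reset_phrase=None) -> str:
        """Issue a random token; no username, email, or password is involved."""
        token = secrets.token_urlsafe(16)
        self.np_sets[token] = {}
        if reset_phrase is not None:
            self._reset_index[_digest(reset_phrase)] = token
        return token

    def recover(self, reset_phrase: str):
        """Re-key the N&P set under a new token, invalidating the lost one."""
        digest = _digest(reset_phrase)
        old_token = self._reset_index.get(digest)
        if old_token is None:
            return None
        new_token = secrets.token_urlsafe(16)
        self.np_sets[new_token] = self.np_sets.pop(old_token)
        self._reset_index[digest] = new_token
        return new_token
```

A user who never supplies a reset phrase has no recovery path in this sketch, which matches the text's description of the token as the sole means of retrieval.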

In this respect, a user can remain fully anonymous from the perspective of what Cloud4All knows about them.

Users can also choose to not have a personal set of preferences at all (or to not use their personal preferences on public/untrusted machines) and instead use one of the predefined preference sets that will be created and available to users. By choosing a popular preference set that closely matches their needs, they can further blur their identity by mixing their usage with that of others.


Two-Tiered Application Authentication

Initially, the Cloud4All Preferences Server was used by clients such as web applications and other environments that don't have a locally-installed Flow Manager instance running in them. As the real-time framework matured, the Cloud-Based Flow Manager was introduced as the primary server with which these applications should interact. The Cloud-Based Flow Manager provides full support for matchmaking, N&P set filtration, and automatic transformation of common terms into application-specific settings.

This migration to the cloud-based Flow Manager enables us to introduce a two-tiered architecture for authenticating applications. Figure 4 illustrates this architecture:

Figure 4. Two-tiered application authentication architecture.

D105.2.two-tiers.png

The first tier, which can access the Preferences Server directly, is restricted to applications and services that have been extensively reviewed and vetted by the GPII/Cloud4All community, ensuring that they respect their privileged access to the user's needs and preferences set. These applications currently include the Flow Manager, the Personal Control Panel, and the Preferences Management Tool. In the future, logging and metrics-gathering tools may also live in this tier alongside Matchmaker-related modules such as the Statistical Matchmaker's analysis engine, which requires access to a broader range of preference sets to derive usage patterns for statistical inference. Only applications that have been vetted and reviewed will be provided with client tokens for this tier. In addition, any use of user data beyond the specific person’s use (including use by the Statistical Matchmaker and overall statistics gathering) will be controlled by the Data Ethics Oversight Committee, and the user will be given the option to opt out if desired.

The second tier includes clients of the Flow Manager. This includes third-party web applications and assistive technologies. Client applications will be issued tokens that identify them to the Flow Manager, and they will receive only settings that have been filtered according to the user's privacy rules and the application's capabilities declaration in the Solutions Registry.
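The distinction between the two tiers can be sketched as a simple lookup from client token to tier. The token values, the `Tier` enum, and the function name below are hypothetical and serve only to illustrate the access rule described above.

```python
from enum import Enum


class Tier(Enum):
    TRUSTED = 1  # vetted components that may call the Preferences Server directly
    CLIENT = 2   # third-party applications that must go through the Flow Manager


# Hypothetical registry of issued client tokens and their tiers.
CLIENT_TIERS = {
    "token-flow-manager": Tier.TRUSTED,
    "token-preferences-management-tool": Tier.TRUSTED,
    "token-third-party-webapp": Tier.CLIENT,
}


def allowed_endpoint(client_token: str):
    """Return which service a client may talk to, or None if it has no token."""
    tier = CLIENT_TIERS.get(client_token)
    if tier is Tier.TRUSTED:
        return "preferences-server"
    if tier is Tier.CLIENT:
        return "flow-manager"
    return None
```

In this sketch a second-tier client can never reach the Preferences Server directly, so everything it receives has passed through the Flow Manager's filtration.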

In the long run, this two-tiered approach, combined with the declarative application specification stored in the Solutions Registry, will enable user-friendly tooling to be built that will help Cloud4All/GPII integrators, maintainers, and community stakeholders to review applications, ensuring they are only requesting access to user data that they can actually use for the personalization process.

Data Ethics Oversight Committee

Privacy and security cannot be assured by technology alone. They also depend on the behaviour of both the users and the operators. In this section we briefly discuss these issues. Note that for Cloud4All, user data are extremely limited and are only collected during the Pilot Testing, where strict experimental controls and policies on user data are in place. Most of what is discussed here therefore deals with user data outside Cloud4All, when GPII goes public.

User behaviour is always one of the key weak spots in any security system. While the operators of the GPII will not have control over user behaviour, they do plan to expend special effort to educate users so that they can make informed decisions and are aware of the potential implications of different options and behaviours that they might choose/use. User-related security vulnerabilities were outlined in D104.2, Security, privacy and ethical policy assurance system requirements and design and in D104.3, Security, privacy and ethical policy assurance gateway, and are not repeated here.

To govern the operators of the GPII, a special Data Ethics Oversight Committee is planned as noted above. This committee would be composed of external experts with backgrounds in privacy and human rights, including members with expertise in data mining and big data security. This committee will be charged with the creation and oversight of policies on data use for all user data in the GPII. It is expected that access for some uses (such as for the Statistical Matchmaker and for general population statistics) will be both sought after and judged beneficial to users, while other access (for directing advertising or profiling individual users) will be sought but judged inappropriate. Since the operators of the GPII could stand to benefit (not personally but as an entity) from uses of these data, the plan within GPII, as recommended by Cloud4All, is to vest decision-making regarding the use of user data with this outside committee.

Conclusion and Next Steps

This architecture provides a technical blueprint for the continued development and refinement of critical Cloud4All infrastructure such as the Preferences Server, Flow Manager, and Preferences Management Tool. It provides a broad perspective of our approach to ensuring privacy, integrity, and security throughout the GPII infrastructure, and as such it represents a larger scope than what can be accomplished by the Cloud4All project alone. Nonetheless, these principles, designs, and technical strategies will be applied in an incremental fashion to ensure that Cloud4All’s technical deliverables will provide users with a safe and reliable means for storing and sharing their accessibility-related data in the cloud.

References

[1] The OAuth 2.0 Authorization Framework, http://tools.ietf.org/html/rfc6749

[7] Mozilla’s Lightbeam, a user tracking visualization tool, http://www.mozilla.org/en-US/lightbeam/
