Quality Infrastructure

Overview

One of the goals of the Developer Space is to assist developers with discovering accessible components and frameworks. However, discovering new components and tools is only the first step; the Developer Space should also help developers find and choose amongst stable solutions that can be reliably integrated into production applications.

When confronted with the variety of tools available to them, developers need information about quality and stability to make an informed choice. The cost of using poorly tested or unsupported tools can be significant, and difficulties with such tools often arise only later in the software development lifecycle, which compounds both the cost and the risk to the project. While quality is partly subjective and always context-specific, a number of measures can give developers a more complete picture of the status, maturity, and activity of a project: criteria that allow them to make more informed choices about the tools they employ.

At the same time, it is very difficult for many projects and small open source communities to individually support the infrastructure that is required to provide sufficient indicators of quality, stability, and healthy community growth to their prospective users. Creating, configuring, and maintaining such infrastructure requires significant engineering effort. The Prosperity4All Quality Infrastructure (QI) will address this by providing automated tools for continuous integration, testing, configuration and deployment, as well as collecting and sharing information about the health of projects within the Developer Space.

For further background information from a functional perspective about how the QI relates to the Developer Space, and the Prosperity4All ecosystem in general, please refer to D103.1 Foundational Design for Prosperity4All, particularly Figure 7 titled 'Interaction flows when user is searching for products or services.'

Motivations

The open source ecosystem is thriving today in part due to the popularity of code sharing services such as GitHub and Bitbucket. Developers (such as those represented by the SP1 persona James McFarland) are able to find software projects online that others have provided and use them in their own projects. But the sheer volume of tools and libraries available presents a challenge for developers such as James, who have to devote a considerable amount of time to evaluating a library's quality before committing to its use. A software project's quality can be evaluated based on a variety of "health indicators," such as how active it is, how many contributors it has, whether it uses automated tests, and how often those tests fail when changes are integrated into the primary repository. For example, if James searched for "javascript charting library" on GitHub, he would have to evaluate each of the resulting projects against these criteria, a manual and time-consuming process. The P4A Quality Infrastructure (QI) will support the collection and organization of information that can be used to judge the "health" of a project, information that will help developers like James make decisions about projects in the Developer Space.

In addition to evaluating the quality of other projects, developers like James will be able to make use of Continuous Integration (CI) practices in their own projects, including automated builds and tests. Depending on the type of project and tests, there might be dependencies on third party software, a specific operating system, a particular desktop environment, etc. Configuring these independent pieces into a cohesive environment requires a considerable amount of engineering and automation, something that many developers may not have the time or experience to do themselves. The QI will allow developers to declare their build and testing requirements in JSON-based manifests, include them in their source code repositories, and, with minimal intervention, have the QI report the health of their project at any given time.

Component Architecture

The proposed architecture for the Quality Infrastructure includes custom-developed components to deliver functionality that existing tools do not offer, as well as existing, commonly-used open source solutions such as CI servers, virtual machine provisioners, source code hosting providers, etc. Components will be loosely coupled and wherever possible expose a REST API that other components can use for communication. Components will also abstract third party solution choices from other parts of the architecture to allow for flexibility and changes in the future.

Figure 1 provides a logical representation of the QI. The rest of this section describes the components in more detail.

Figure 1: A logical representation of the Quality Infrastructure

Manifest

A manifest can be considered a form of contract between a developer and the Quality Infrastructure. The developer creates a manifest using JSON and stores it in their source code repository. This manifest contains metadata about the project (such as author name, contact information, project name, project URL, keywords, etc.), as well as declarations of the project’s build and test requirements. A basic example of a project manifest file is provided below:

{
    "name": "account/project",
    "version": "1.3.1",
    "description": "Brief project description",
    "author": "Your Name",
    "email": "you.name@example.org",
    "support": {
        "source": "https://github.com/account/project.git",
        "issues": "https://github.com/account/project/issues/",
        "docs": "https://example.org/docs/"
    },
    "keywords": [
        "example-keyword",
        "another-example"
    ],
    "licenses": [
        "MIT"
    ],
    "platforms": {
        "fedora": "22",
        "windows": "8.1"
    },
    "applicationStack": {
        "nodejs": "0.10.36"
    },
    "buildCommands": [
        "npm install",
        "grunt"
    ],
    "testCommands": [
        "node tests/all-tests.js"
    ]
}

The manifest data will define the workflow of several of the QI’s subsystems. The benefit of this approach is that developers do not need to be aware of implementation details within the Quality Infrastructure in order to use it. The manifest file, located within the project’s own repository (and thus less likely to become stale or fall out of sync with the code), provides a primary, minimal interface for interacting with the Quality Infrastructure. Developers can simply declare their requirements and have build and test orchestration take place automatically.
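As a rough illustration of how a subsystem might check a manifest before acting on it, the following sketch parses the manifest text and reports missing fields. The choice of required fields is an assumption made for this example; the manifest specification is still to be developed, and `validateManifest` is a hypothetical helper rather than part of the QI.

```javascript
// ASSUMPTION: which fields are mandatory is not defined by the source;
// these two lists are illustrative only.
const REQUIRED_STRING_FIELDS = ["name", "version"];
const REQUIRED_LIST_FIELDS = ["buildCommands", "testCommands"];

// Parse manifest text and return { manifest, errors }.
// An empty errors array means the manifest passed these basic checks.
function validateManifest(text) {
  let manifest;
  try {
    manifest = JSON.parse(text);
  } catch (err) {
    return { manifest: null, errors: ["manifest is not valid JSON: " + err.message] };
  }
  if (manifest === null || typeof manifest !== "object" || Array.isArray(manifest)) {
    return { manifest: null, errors: ["manifest must be a JSON object"] };
  }

  const errors = [];
  for (const field of REQUIRED_STRING_FIELDS) {
    if (typeof manifest[field] !== "string" || manifest[field] === "") {
      errors.push(`"${field}" must be a non-empty string`);
    }
  }
  for (const field of REQUIRED_LIST_FIELDS) {
    const value = manifest[field];
    const ok =
      Array.isArray(value) &&
      value.length > 0 &&
      value.every((command) => typeof command === "string");
    if (!ok) {
      errors.push(`"${field}" must be a non-empty list of strings`);
    }
  }
  return { manifest, errors };
}
```

Returning a list of errors instead of throwing on the first problem would let the QI report every issue with a manifest in a single pass.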

Webhooks are event notifications generated by service providers. Code hosting providers such as Bitbucket or GitHub allow third parties to subscribe to various event notifications by specifying a URI. When an event takes place, such as source code being pushed to a repository, the provider delivers a payload to that URI via an HTTP request, typically a POST request with a JSON body. The third party, in this case the QI and its Orchestrator, can consume the JSON data and take further action.
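A minimal sketch of how a webhook request could be interpreted is given here, assuming a GitHub push event. The `X-GitHub-Event` header and the `ref`, `repository.full_name`, and per-commit `added`/`modified`/`removed` fields follow GitHub's push payload; other providers use different shapes. The function name `interpretWebhook` and the manifest file name `qi-manifest.json` are hypothetical, since the source does not name the manifest file.

```javascript
// Interpret an incoming webhook request and summarize what happened.
// `headers` is expected with lower-cased names, as Node.js provides them.
// ASSUMPTION: "qi-manifest.json" is a made-up manifest file name.
function interpretWebhook(method, headers, body, manifestPath = "qi-manifest.json") {
  if (method !== "POST") {
    return { accepted: false, reason: "expected a POST request" };
  }
  let payload;
  try {
    payload = JSON.parse(body);
  } catch (err) {
    return { accepted: false, reason: "body is not valid JSON" };
  }
  if (payload === null || typeof payload !== "object") {
    return { accepted: false, reason: "payload must be a JSON object" };
  }
  if (headers["x-github-event"] !== "push") {
    return { accepted: false, reason: "event type is not handled" };
  }

  // Collect every file path touched by a commit in this push.
  const touched = (commit) =>
    [].concat(commit.added || [], commit.modified || [], commit.removed || []);
  const commits = Array.isArray(payload.commits) ? payload.commits : [];

  return {
    accepted: true,
    repository: payload.repository ? payload.repository.full_name : undefined,
    ref: payload.ref,
    manifestChanged: commits.some((commit) => touched(commit).includes(manifestPath)),
  };
}
```

Keeping the interpretation separate from the HTTP server would make it straightforward to add a second provider, such as Bitbucket, behind the same return shape.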

Orchestrator

The Orchestrator will act as a gateway for all requests within the Quality Infrastructure. In a model-view-controller context, the Orchestrator can be considered the 'controller'. Its primary role is to abstract implementation-specific details and protocols from other components. It accomplishes this by providing a uniform API, which, when used, causes the Orchestrator to designate one or several components (using their own APIs) to fulfill requests behind the scenes.

The Orchestrator will process a number of possible forms of data:

  • webhook events: The Orchestrator will be responsible for ensuring that incoming webhook event information generated by external source code hosting providers is processed and that the necessary workflow is triggered in the Continuous Integration Service layer.
  • manifest changes: The Orchestrator will be responsible for updating the project's job configuration in the CI Service layer when the manifest changes and then triggering the job to run.
  • monitoring CI jobs: Once a job run has been triggered, the Orchestrator will monitor its status by polling the CI Service layer and then request that the Persistence Layer update a project's build history and health statistics.
  • routing metrics requests: The Orchestrator will route requests to the Persistence Layer when metrics need to be stored or retrieved.
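The four kinds of input listed above could be handled by a simple dispatch table, sketched here under stated assumptions. The request kinds and the `ciService` and `persistence` interfaces (`triggerJob`, `updateJob`, `recordBuild`, `getMetrics`) are hypothetical names invented for this example. Polling itself is left out; the `job-finished` kind stands for the result that polling would eventually produce.

```javascript
// Build an orchestrator around injected collaborators.
// ASSUMPTION: the collaborator method names are illustrative, not a real QI API.
function createOrchestrator({ ciService, persistence }) {
  const handlers = new Map([
    // A push was reported by the hosting provider: run the project's job.
    ["webhook-event", (request) => ciService.triggerJob(request.project)],
    // The manifest changed: refresh the job configuration, then run it.
    ["manifest-change", (request) => {
      ciService.updateJob(request.project, request.manifest);
      return ciService.triggerJob(request.project);
    }],
    // A monitored job completed: store its result.
    ["job-finished", (request) => persistence.recordBuild(request.project, request.result)],
    // Something asked for stored metrics.
    ["metrics-request", (request) => persistence.getMetrics(request.project)],
  ]);

  return {
    handle(request) {
      const handler = handlers.get(request.kind);
      if (!handler) {
        throw new Error(`unsupported request kind: ${request.kind}`);
      }
      return handler(request);
    },
  };
}
```

Because the collaborators are passed in, the Orchestrator never needs to know whether the CI Service is backed by Jenkins or by something else, which matches the abstraction goal described in this section.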

Continuous Integration Service

The role of the Continuous Integration Service is to execute builds and tests in the environment requested in a project's manifest. It is composed of several curated third party technologies. Its main entry point is the Jenkins REST API, also referred to as the 'JSON API' in the Jenkins documentation. Only the Orchestrator will have access to the Jenkins API, which it will use to create, update, delete, or trigger jobs provided by the CI Service.
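To make the create, update, delete, and trigger operations more concrete, the sketch below builds the HTTP requests the Orchestrator might send to Jenkins. The URL paths are those of the Jenkins remote access API; the helper names `jenkinsRequest` and `summarizeBuild` are hypothetical. Authentication, CSRF crumbs, and the `config.xml` body needed for create and update are omitted.

```javascript
// Describe the HTTP request for a Jenkins job operation without sending it.
function jenkinsRequest(baseUrl, action, jobName) {
  const base = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  const name = encodeURIComponent(jobName);
  const job = `${base}/job/${name}`;
  switch (action) {
    case "create":
      return { method: "POST", url: `${base}/createItem?name=${name}` };
    case "update":
      return { method: "POST", url: `${job}/config.xml` };
    case "delete":
      return { method: "POST", url: `${job}/doDelete` };
    case "trigger":
      return { method: "POST", url: `${job}/build` };
    case "status":
      return { method: "GET", url: `${job}/lastBuild/api/json` };
    default:
      throw new Error(`unknown action: ${action}`);
  }
}

// Reduce a Jenkins build object to a coarse state.
// SIMPLIFICATION: every result other than "SUCCESS" (for example
// "UNSTABLE" or "ABORTED") is treated as failed.
function summarizeBuild(build) {
  if (build.building) {
    return "running";
  }
  return build.result === "SUCCESS" ? "passed" : "failed";
}
```

The `status` request is the one the Orchestrator would repeat while polling a job that it has triggered.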

Persistence Layer

The Persistence Layer (the 'model' in the model-view-controller context) is responsible for storing and serving all the dashboard-related metrics that are fetched from sources such as the CI Service or a source code repository hosting service such as GitHub. It will work with a Data Store (described in the 'Technical Implementation' section) that will store data as JSON documents. The Persistence Layer will be concerned with metrics such as:

  • the number of contributors for a repository,
  • the number of repository commits,
  • the number of passing and failed builds for a project, and
  • the number of passing or failed tests for a project.
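The build and test counts listed above could be derived from stored build records along the following lines. The record shape (`passed`, `testsPassed`, `testsFailed`) and the function name `aggregateBuildMetrics` are assumptions made for this sketch; the source does not define how build results are stored.

```javascript
// Aggregate per-build records into the counts a dashboard would display.
// ASSUMPTION: each record looks like
//   { passed: boolean, testsPassed: number, testsFailed: number }
function aggregateBuildMetrics(builds) {
  const summary = {
    passingBuilds: 0,
    failedBuilds: 0,
    passingTests: 0,
    failedTests: 0,
    buildPassRate: null, // stays null when there is no build history
  };
  for (const build of builds) {
    if (build.passed) {
      summary.passingBuilds += 1;
    } else {
      summary.failedBuilds += 1;
    }
    summary.passingTests += build.testsPassed;
    summary.failedTests += build.testsFailed;
  }
  if (builds.length > 0) {
    summary.buildPassRate = summary.passingBuilds / builds.length;
  }
  return summary;
}
```

A project with no builds yields a `null` pass rate rather than zero, so that a dashboard can distinguish "never built" from "always failing".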

Dashboard

One of the goals of the Dashboard will be to provide a visual summary of a project's health and allow developers to compare several projects efficiently. The Dashboard (the ‘view’ in the model-view-controller context) will provide a web interface that can be used for the following tasks:

  • displaying project information,
  • performing DSpace project searches, and
  • visualizing quality metrics.

The Dashboard will be a component implemented in HTML, CSS, and JavaScript (using SP2’s Infusion toolkit), which can be embedded into content management systems such as the Developer Space. For its tasks, the Dashboard will use the Orchestrator's API. For exploring quality metrics from a source that is a resource on the Internet, the Dashboard will use the resource's API directly. Initially, the most significant third party API that the Dashboard will rely on will be GitHub's.
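As an example of using a resource's API directly, the sketch below derives GitHub REST API URLs for contributor and commit data. It assumes that the manifest's `name` field has the form `account/project` and matches the GitHub owner and repository, which the source does not guarantee; `githubMetricUrls` is a hypothetical helper.

```javascript
// Derive GitHub REST API URLs for a project's contributors and commits.
// ASSUMPTION: manifestName is "owner/repo" exactly as hosted on GitHub.
function githubMetricUrls(manifestName) {
  const parts = manifestName.split("/");
  if (parts.length !== 2 || parts[0] === "" || parts[1] === "") {
    throw new Error(`expected "owner/repo", got "${manifestName}"`);
  }
  const [owner, repo] = parts.map(encodeURIComponent);
  const base = `https://api.github.com/repos/${owner}/${repo}`;
  return {
    contributors: `${base}/contributors`,
    commits: `${base}/commits`,
  };
}
```

Both endpoints return paginated lists, so an accurate count would require following the pagination links and not just measuring the first page.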

For more information about which DSpace components would be involved in these scenarios, see the Integration with Developer Space section below.

Work related to the Dashboard user interface design is currently in progress. Please refer to Appendix A for mockups.

Integration with Developer Space

In addition to managing requests within the QI, the Orchestrator is also responsible for relaying requests to components within the Developer Space when data resides outside of the QI. The DSpace will provide web and search components that expose their own REST APIs. This will allow the Orchestrator, and in the future possibly other components, to issue requests to DSpace when necessary. For example, if metadata in a project's manifest is updated, the Orchestrator will request that the DSpace Search component reindex its data and that the DSpace web component reflect the changes in its data store.

Figure 2 illustrates this cooperation between the QI and the DSpace:

Figure 2: How the DSpace components interact with each other via the Orchestrator when project manifest changes take place

This diagram illustrates the Orchestrator’s workflow:

  • A developer makes changes to their project metadata in the manifest and pushes their commits to the project's repository.
  • The repository hosting provider notifies the QI that changes have taken place in the developer's repository by sending a webhook payload.
  • The Orchestrator receives the webhook payload and begins the process of delegating tasks to various components:
    • The DSpace Web component is asked to update its data store to reflect the changes in the project's manifest.
    • The DSpace Search component is asked to update its index as well.
    • Search requests originating from the Dashboard are routed to the DSpace Search component, which will provide data from its updated index.
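The delegation step in this workflow can be sketched as a small plan that the Orchestrator builds and then executes. The component keys (`dspaceWeb`, `dspaceSearch`), the action names, and the shape of the `change` object are hypothetical. Updating the data store before reindexing is an assumption of this sketch, since the source does not state an order.

```javascript
// Turn a detected change into an ordered list of delegated tasks.
// ASSUMPTION: `change` looks like { repository: string, manifestChanged: boolean }.
function planManifestChange(change) {
  if (!change.manifestChanged) {
    return []; // nothing for the DSpace components to do
  }
  return [
    { component: "dspaceWeb", action: "updateProject", project: change.repository },
    { component: "dspaceSearch", action: "reindex", project: change.repository },
  ];
}

// Execute each task against the injected component clients, in order,
// and return the results in the same order.
function runPlan(plan, components) {
  return plan.map((task) => components[task.component][task.action](task.project));
}
```

Separating the plan from its execution would let the delegation logic be checked without any running DSpace services.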

Technical Implementation

This section describes the technologies being considered for the various components of the QI. The technology choices listed in this section represent the current best candidates we’ve discovered for the task; as time progresses, more suitable choices might emerge. One of the objectives of the QI architecture is to be flexible enough that individual technology choices can be replaced without severely impacting other components.

Figure 3 provides an illustration of how DSpace components will be deployed alongside various types of third party technical solutions:

Figure 3: How DSpace components will be deployed alongside various types of third party technical solutions

The QI will be a distributed environment, using a diverse set of open source technologies. One of the objectives of the QI is to automate the provisioning of all these disparate solutions so that robust and error-free deployments can be achieved, saving developers from having to do this themselves. Configuration management is essential to achieve this goal, and we will be using Ansible for this task. Every component and technology choice will be deployable using Ansible playbooks and roles.

It will also be possible to deploy several instances of the Orchestrator and Persistence Layer components and use a reverse proxy such as Nginx to scale them horizontally. Using this approach, we can ensure the long-term viability of the QI by increasing compute capacity as more projects are hosted in the DSpace.

Both the Orchestrator and Persistence Layer will rely heavily on Node.js, a runtime environment for developing JavaScript applications.

At a basic level, the Continuous Integration Service will be composed of CentOS servers running Jenkins in Docker containers. An organization in the QI could have one or more projects associated with it. Using lightweight containers, we will be able to assign a Jenkins instance to each organization. This strategy will assist with scaling the CI Service since more active projects could have extra hardware resources allocated to their Jenkins containers. As mentioned earlier, the point of entry into the CI Service is the Jenkins REST API, which the Orchestrator will be able to use to start jobs. The Orchestrator will also use Jenkins Job Builder (JJB), which itself uses the Jenkins API, to create new jobs or update existing ones based on manifest changes. Leveraging JJB will reduce the implementation effort at the component level. Each Jenkins instance will use containers or virtual machines managed using Vagrant for its workloads.
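One way the manifest could be translated into input for Jenkins Job Builder is sketched below. The output mirrors the general structure of a JJB freestyle job definition (a list containing a `job` entry with `name`, `project-type`, and `shell` builders). The exact keys should be checked against the JJB version in use, and `manifestToJobDefinition` is a hypothetical helper. The `platforms` and `applicationStack` fields are ignored here.

```javascript
// Translate a QI manifest into a JJB-style job definition object.
// ASSUMPTION: build and test commands run as consecutive shell steps,
// and "/" in the project name is replaced to form a job name.
function manifestToJobDefinition(manifest) {
  const commands = [
    ...(manifest.buildCommands || []),
    ...(manifest.testCommands || []),
  ];
  return [
    {
      job: {
        name: manifest.name.replace(/\//g, "-"),
        "project-type": "freestyle",
        description: manifest.description || "",
        builders: commands.map((command) => ({ shell: command })),
      },
    },
  ];
}
```

Serializing the returned object with `JSON.stringify` would give a file that could be handed to JJB, on the assumption that the deployed JJB version accepts JSON as well as YAML definitions.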

The data store used by the Persistence Layer will be CouchDB. Its notable features are that it stores its documents as JSON objects, does not require a schema, and provides numerous replication options.
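Because CouchDB is accessed over HTTP, storing a metrics document amounts to a `PUT` request to `/{database}/{document id}` with a JSON body. The sketch below builds such a request without sending it. The database name, document fields, and the helper name `couchPutRequest` are hypothetical, and updating an existing document would additionally require its current `_rev`.

```javascript
// Describe the HTTP request that stores `doc` in a CouchDB database.
// The document must carry its own _id; CouchDB assigns the revision.
function couchPutRequest(baseUrl, database, doc) {
  if (typeof doc._id !== "string" || doc._id === "") {
    throw new Error("document needs a non-empty _id");
  }
  const base = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  return {
    method: "PUT",
    url: `${base}/${encodeURIComponent(database)}/${encodeURIComponent(doc._id)}`,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(doc),
  };
}
```

A Persistence Layer built this way could keep all CouchDB-specific details inside one module, which is consistent with the goal of being able to replace the data store later.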

The Dashboard will be a web application implemented using JavaScript and the Fluid Infusion framework.

Development Activities

The following list outlines the primary development activities that are currently underway to implement the Quality Infrastructure. We are using an iterative approach where a "minimum viable product" version will be implemented first, supporting Node.js and Web-based projects initially. From there, we will progressively add features and flesh out the Quality Infrastructure's full architecture.

  • Dashboard design - mockups of the dashboard have been designed. The first iteration of these mockups is included in Appendix A
  • Dashboard implementation - components and services responsible for managing and rendering the dashboard will be developed based on the designs. A prototype is available at https://qi.gpii.net
  • Manifest specification - a detailed schema for the manifest format to be used by projects tracked by the QI and DSpace will be developed and evaluated with potential users
  • Persistence Layer implementation - to reduce unnecessary coupling of code and systems, a persistence layer abstracting away the choice of the specific data store used by the QI (CouchDB) will be developed
  • Orchestrator implementation - components and services responsible for the overall coordination, collaboration, and control of the systems making up the QI will be developed
  • CI Service - the continuous integration (CI) service that will build, test, and report on tracked projects will be developed and evaluated with potential users

Projects Using the QI

The following projects are using development VM configuration and CI services provided by the QI:

Appendix A

This section contains mockups that give an idea of the design that will guide the development of the Dashboard and the Persistence Layer.

Figure 4: Dashboard displaying graphs of quality metrics

Figure 5: Dashboard displaying build history of a project

Figure 6: DSpace project overview page

Figure 7: Dashboard displaying DSpace search results with an option to compare quality metrics of several projects

Figure 8: Dashboard displaying a metrics comparison view