Registry - choice of database

From wiki.gpii
Jump to: navigation, search

Possibilities: MySQL,MongoDB,CouchDB
Final score(for every db solution) = Initial score(IS) from pros and cons + Criterias score;

Our requirements

Versioning (dated history of edits to each entry)

  • Multiple levels of permissions:
  • potentially on field level (eg. Read, Add new, Edit, all as separate items)
  • with (and without) moderation
  • limited to non-core
  • particular translation only

API providing access to Registry to allow read only access to terms for use with different ontologies (for displaying, sorting, browsing)

Should be able to handle the load

  • MySQL - “Look at Google or Facebook who deal with large amounts of data - they use mysql.”
  • MongoDB ~10 000 records would not be a problem
  • CouchDB ~10 000 records would not be a problem

A series of tools will be used to maintain the registry

An API should allow for this.

Ease of connection and work with Node

Limited amount of writes to the registry

For each term multiple translations can exist.

  • MySQL - Can be easily implemented with an additional table for the translations.

Load constraints & specifics:

  • The registry is expected to contain 1,000 - 10,000 terms.
  • Each term will have a set of properties (description, value range, name, ID, etc).
  • For each term, multiple translations can exist.
  • There will be a limited amount of writes to the registry - highest in the beginning, then dropping.
  • Generally more reads are expected, but even this is not expected to be that high at any point in time, as this is a basic registry used for looking up terms and their properties - not used programmatically by the GPII to lookup terms for each profile loaded, or the like.
  • Generally low usage

Questions to answer

  1. Do we need incremental versioning, knowing the number of common terms will be in the range 1000 - 10 000?
  2. How important is the scalability in our case?
  3. Will the choice of MySQL for the common terms registries add serious inconvenience, knowing that Preference server uses CouchDB?


Every one of the three variants will not be a bad choice. However, CouchDB provides a RESTfull HTTP API, which is a big advantage.To make final decision it will be quite important to have a better notion of concrete use cases.Making such a list is expected to be the subject of resent discussions.

General Comparison Relational/NoSQL databases

Relational database


  • Simplicity of relational model.
  • Solid theoretical basis and normalization rules.
  • Simple and very powerful SQL language which was resembling human language.
  • Easy data manipulation.
  • ACID properties.
  • High level of standardization.
  • Standardized API’s.
  • The more knowable way for the developers


  • Hard to store Operator fields(registries)
  • Performance problems associated with re-assembling simple data structures into their more complicated real-world representations.
  • Lack of support for complex base types, e.g., drawings.
  • SQL is limited when accessing complex data.
  • Knowledge of the database structure is required to create ad hoc queries.
  • Locking mechanisms defined by RDBMSs do not allow design transactions to be supported, e.g., the "check in" and "check out" type of feature that would allow an engineer to modify a drawing over the course of several working days.

NoSql databases


  • Good for storing and retrieving great quantities of data, not the relationships between the elements

(We don't expect many relationships between registries);

  • Mostly open source.
  • Horizontal scalability. There’s no need for complex joins and data can be easily shared and processed in parallel.
  • Support for Map/Reduce. This is a simple paradigm that allows for scaling computation on cluster of computing nodes.
  • No need to develop fine-grained data model – it saves development time.
  • Easy to use.
  • Very fast for adding new data and for simple operations/queries.
  • No need to make significant changes in code when data structure is modified.
  • Ability to store complex data types (for document based solutions) in a single item of storage.
  • Have no fixed schemas and allow schema migration without downtime
  • Queries can return specific fields of documents and also include user-defined JavaScript functions
  • JavaScript can interact with the database directly


  • Immaturity. Still lots of rough edges.
  • Possible database administration issues. NoSQL often sacrifices features that are present in SQL solutions “by default” for the sake of performance. For example, one needs to check different data durability modes and journaling in order not to be caught by surprise after a cold restart of the system. Memory consumption is one more important chapter to read up on in the database manual because memory is usually heavily used.
  • No indexing support (Some solutions like MongoDB have indexing but it’s not as powerful as in SQL solutions). It is expected that one of the most probable requests will be to see current user term preferences, so INDEXING is quite important! // For MongoDB indexing:
  • No ACID (Some solutions have just atomicity support on single object level). (I think atomicity in not so important in our case).
  • Bad reporting performance.
  • Complex consistency models (like eventual consistency). CAP theorem states that it’s not possible to achieve consistency, availability and partitioning tolerance at the same time. NoSQL vendors are trying to make their solutions as fast as possible and consistency is most typical trade-off.
  • Absence of standardization. No standard APIs or query language. It means that migration to a solution from different vendor is more costly. Also there are no standard tools (e.g. for reporting)
  • Still don’t have incremental backup

Useful links

  1. Intro to MongoDB:
  2. Intro to CouchDB(+RESTful API explanation):
  3. CouchDB advantages -
  4. Mongo vs CouchDB -
  5. databases in general -

Related Pages