Registry - choice of database
Final score(for every db solution) = Initial score(IS) from pros and cons + Criterias score;
- 1 Our requirements
- 1.1 Versioning (dated history of edits to each entry)
- 1.2 API providing access to Registry to allow read only access to terms for use with different ontologies (for displaying, sorting, browsing)
- 1.3 Should be able to handle the load
- 1.4 A series of tools will be used to maintain the registry
- 1.5 Ease of connection and work with Node
- 1.6 Limited amount of writes to the registry
- 1.7 For each term multiple translations can exist.
- 2 Load constraints & specifics:
- 3 Questions to answer
- 4 Conclusion
- 5 General Comparison Relational/NoSQL databases
- 6 Relational database
- 7 NoSql databases
- 8 Useful links
- 9 Related Pages
Versioning (dated history of edits to each entry)
- Multiple levels of permissions:
- potentially on field level (eg. Read, Add new, Edit, all as separate items)
- with (and without) moderation
- limited to non-core
- particular translation only
- MySQL - it is possible with aditional tools Some free of them are:
- Webmin : http://doxfer.webmin.com/Webmin/MySQLDatabaseServer);
- Liquibase: http://www.liquibase.org/
- MongoDB - Documental structure makes incremental versioning quite tricky, http://stackoverflow.com/questions/3507624/mongodb-nosql-keeping-document-change-history
- CouchDB - http://stackoverflow.com/questions/1333922/couchdb-versioning-strategy , each revision holds a full copy of the document(doesn’t support incremental versioning also)
API providing access to Registry to allow read only access to terms for use with different ontologies (for displaying, sorting, browsing)
- MySQL - Yes, it is possible using integrated mysql privileges (column-level and table-level): http://www.databasejournal.com/features/mysql/article.php/3311731/An-introduction-to-MySQL-permissions.htm
- MongoDB - “MongoDB access control is currently only available at the database level.”: http://stackoverflow.com/questions/7883649/can-you-have-collection-level-permissions-in-mongodb
- CouchDB - only on database level: http://marcgrabanski.com/articles/exiciting-features-in-couchdb
Should be able to handle the load
- MySQL - “Look at Google or Facebook who deal with large amounts of data - they use mysql.”
- MongoDB ~10 000 records would not be a problem
- CouchDB ~10 000 records would not be a problem
A series of tools will be used to maintain the registry
An API should allow for this.
Ease of connection and work with Node
- MySQL - There exists an extension: https://github.com/joyent/node/wiki/modules
- MongoDB - There exists an extension: https://github.com/joyent/node/wiki/modules; Node has a native MongoDB driver.
- CouchDB - There exists an extension: https://github.com/joyent/node/wiki/modules
Limited amount of writes to the registry
- MySQL - Implementation with some appropriately defined trigger or an extra small table.
- MongoDB - There aren’t any triggers right now: http://stackoverflow.com/questions/7883649/can-you-have-collection-level-permissions-in-mongodb The alternative is multi-updating.
For each term multiple translations can exist.
- MySQL - Can be easily implemented with an additional table for the translations.
Load constraints & specifics:
- The registry is expected to contain 1,000 - 10,000 terms.
- Each term will have a set of properties (description, value range, name, ID, etc).
- For each term, multiple translations can exist.
- There will be a limited amount of writes to the registry - highest in the beginning, then dropping.
- Generally more reads are expected, but even this is not expected to be that high at any point in time, as this is a basic registry used for looking up terms and their properties - not used programmatically by the GPII to lookup terms for each profile loaded, or the like.
- Generally low usage
Questions to answer
- Do we need incremental versioning, knowing the number of common terms will be in the range 1000 - 10 000?
- How important is the scalability in our case?
- Will the choice of MySQL for the common terms registries add serious inconvenience, knowing that Preference server uses CouchDB?
- Would it be worth adding a SPARQL front end to allow Semantic Web tools to query the Registry? Some tools for doing this exist, e.g. SquirrelRDF (for relational databases; see also RDF and SQL in the W3C wiki) and AllegroGraph (for MongoDB).
Every one of the three variants will not be a bad choice. However, CouchDB provides a RESTfull HTTP API, which is a big advantage.To make final decision it will be quite important to have a better notion of concrete use cases.Making such a list is expected to be the subject of resent discussions.
General Comparison Relational/NoSQL databases
- Simplicity of relational model.
- Solid theoretical basis and normalization rules.
- Simple and very powerful SQL language which was resembling human language.
- Easy data manipulation.
- ACID properties.
- High level of standardization.
- Standardized API’s.
- The more knowable way for the developers
- Hard to store Operator fields(registries)
- Performance problems associated with re-assembling simple data structures into their more complicated real-world representations.
- Lack of support for complex base types, e.g., drawings.
- SQL is limited when accessing complex data.
- Knowledge of the database structure is required to create ad hoc queries.
- Locking mechanisms defined by RDBMSs do not allow design transactions to be supported, e.g., the "check in" and "check out" type of feature that would allow an engineer to modify a drawing over the course of several working days.
- Good for storing and retrieving great quantities of data, not the relationships between the elements
(We don't expect many relationships between registries);
- Mostly open source.
- Horizontal scalability. There’s no need for complex joins and data can be easily shared and processed in parallel.
- Support for Map/Reduce. This is a simple paradigm that allows for scaling computation on cluster of computing nodes.
- No need to develop fine-grained data model – it saves development time.
- Easy to use.
- Very fast for adding new data and for simple operations/queries.
- No need to make significant changes in code when data structure is modified.
- Ability to store complex data types (for document based solutions) in a single item of storage.
- Have no fixed schemas and allow schema migration without downtime
- Immaturity. Still lots of rough edges.
- Possible database administration issues. NoSQL often sacrifices features that are present in SQL solutions “by default” for the sake of performance. For example, one needs to check different data durability modes and journaling in order not to be caught by surprise after a cold restart of the system. Memory consumption is one more important chapter to read up on in the database manual because memory is usually heavily used.
- No indexing support (Some solutions like MongoDB have indexing but it’s not as powerful as in SQL solutions). It is expected that one of the most probable requests will be to see current user term preferences, so INDEXING is quite important! // For MongoDB indexing: http://docs.mongodb.org/manual/core/indexes/
- No ACID (Some solutions have just atomicity support on single object level). (I think atomicity in not so important in our case).
- Bad reporting performance.
- Complex consistency models (like eventual consistency). CAP theorem states that it’s not possible to achieve consistency, availability and partitioning tolerance at the same time. NoSQL vendors are trying to make their solutions as fast as possible and consistency is most typical trade-off.
- Absence of standardization. No standard APIs or query language. It means that migration to a solution from different vendor is more costly. Also there are no standard tools (e.g. for reporting)
- Still don’t have incremental backup
- Intro to MongoDB: http://www.slideshare.net/drumwurzel/intro-to-mongodb#btnNext
- Intro to CouchDB(+RESTful API explanation): http://nefariousdesigns.co.uk/couchdbs-restful-api.html
- CouchDB advantages - http://marcgrabanski.com/articles/exiciting-features-in-couchdb
- Mongo vs CouchDB - http://stackoverflow.com/questions/3375494/nosql-mongodb-vs-couchdb?rq=1
- databases in general - http://www.aosabook.org/en/nosql.html