Different database methodologies

Solution

Oracle NoSQL: Key-Value Stores

In its simplest form, Oracle NoSQL Database implements a map from user-defined keys (formatted as Strings) to opaque data items. It records version numbers for key/data pairs, but maintains the single latest version in the store. Applications need never worry about reconciling incompatible versions because Oracle NoSQL Database uses single-master replication; the master node always has the most up-to-date value for a given key, while read-only replicas might have slightly older versions. Applications can use version numbers to ensure consistency for read-modify-write operations.
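As an illustration of version-based read-modify-write, the following sketch uses the oracle.kv Java driver; the store name ("kvstore"), helper host ("localhost:5000") and key name are assumptions made for this example, not values from the text.

    import oracle.kv.*;

    public class VersionedUpdate {
        public static void main(String[] args) {
            // Assumed store name and helper host; adjust for a real deployment.
            KVStore store = KVStoreFactory.getStore(
                new KVStoreConfig("kvstore", "localhost:5000"));

            Key key = Key.createKey("counter");
            ValueVersion vv = store.get(key);   // read the value and its version
                                                // (assumes the record already exists)
            byte[] updated = modify(vv.getValue().getValue());

            // The write succeeds only if the record is still at the version we read.
            Version result = store.putIfVersion(
                key, Value.createValue(updated), vv.getVersion());
            if (result == null) {
                // Another writer updated the record first; re-read and retry.
            }
            store.close();
        }

        // Placeholder for the application's own modification logic.
        static byte[] modify(byte[] old) { return old; }
    }

If putIfVersion returns null, the application simply repeats the get/modify/put cycle; this is how version numbers provide consistency for read-modify-write operations.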

Keys

Oracle NoSQL Database hashes keys to provide good distribution over a collection of computers that provide storage for the database. However, applications can take advantage of subkey capabilities to achieve data locality.

A key is the concatenation of a Major Key Path and a Minor Key Path, both of which are specified by the application. All records sharing a Major Key Path are co-located to achieve data locality. Within a co-located collection of records, the full key, comprising both the Major and Minor Key Paths, provides fast, indexed lookup.

For example, an application storing user profiles might use the profile-name as a Major Key Path and then have several Minor Key Paths for different components of that profile such as email address, name, phone number, etc. Because applications have complete control over the composition and interpretation of keys, different Major Key Paths can have entirely different Minor Key Path structures. Continuing our previous example, one might store user profiles and application profiles in the same store and maintain different Minor Key Paths for each.
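To make the key structure concrete, here is a small sketch using the oracle.kv Key class; the profile name "alice" and the component names are invented example values.

    import java.util.Arrays;
    import oracle.kv.Key;

    public class ProfileKeys {
        public static void main(String[] args) {
            // Major Key Path = profile name; Minor Key Paths = profile components.
            Key email = Key.createKey(Arrays.asList("users", "alice"),
                                      Arrays.asList("email"));
            Key phone = Key.createKey(Arrays.asList("users", "alice"),
                                      Arrays.asList("phone"));
            // Both keys share the Major Key Path, so their records are co-located
            // and can be fetched together with a single multi-get on that path.
            System.out.println(email);
            System.out.println(phone);
        }
    }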

Prefix key compression makes storage of key groups efficient. 

Values

The value field is stored as an arbitrary byte array. Oracle NoSQL Database makes no internal assumptions about the structure of the data stored within the byte array. Mapping of the byte arrays to data structures (serialization and deserialization) is left up to the application. Applications with very simple data requirements may use values containing simple, fixed record structures. Other applications may use values containing complex structures, a set of named properties (name-value pairs), or other types of self-describing data formats.
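Because serialization is left entirely to the application, one simple choice (shown here purely as an illustration using java.util.Properties; any encoding would do) is to store name-value pairs as bytes:

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.util.Properties;

    public class ValueCodec {
        // Encode a set of name-value pairs into the byte array stored as the value.
        static byte[] encode(Properties props) throws IOException {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            props.store(out, null);
            return out.toByteArray();
        }

        // Decode the byte array read from the store back into name-value pairs.
        static Properties decode(byte[] bytes) throws IOException {
            Properties props = new Properties();
            props.load(new ByteArrayInputStream(bytes));
            return props;
        }
    }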

There are no restrictions on the size or structure of the value field.

A key-value database is ideal when most of the access to data is done using a key, which is a unique identifier for some item of data. The key-value approach is somewhat similar to the document approach. Both offer flexible schemas, but the data in a key-value store isn’t structured in a self-describing format such as JSON. Instead, the key-value database uses a key to get access to a bunch of data, where the data can vary from record to record.

A key-value database is similar to a document store in many ways; however, a document store embeds metadata associated with the content, enabling the user to query the data based on its contents. Key-value databases excel at session management, serving ad content and managing user or product profiles. When data is encoded in many different ways without a rigorous schema, using a key-value database can make sense.

One of the leading key-value DBMSes is Redis, an open source, BSD-licensed, key-value data store. Redis is set up using a configuration file that contains parameters to specify the working directory and to control Redis’ behavior. At its core, Redis is a key-value store, but it also supports different kinds of data structures. Whereas with traditional key-value stores you associate string keys to string values, in Redis the value isn’t limited to a simple string but can also hold more complex data structures.
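As a brief illustration of values that go beyond plain strings, the sketch below uses the Jedis Java client (an arbitrary choice; any Redis client would do) against a Redis server assumed to be listening on localhost:6379. The key names are invented for the example.

    import redis.clients.jedis.Jedis;

    public class RedisStructures {
        public static void main(String[] args) {
            try (Jedis jedis = new Jedis("localhost", 6379)) {
                // A plain string value, as in a traditional key-value store.
                jedis.set("session:42:user", "alice");

                // A hash value: several named fields stored under one key.
                jedis.hset("user:alice", "email", "alice@example.com");
                jedis.hset("user:alice", "phone", "555-0100");

                // A list value: recently viewed items, newest first.
                jedis.lpush("user:alice:recent", "p-17", "p-3");

                System.out.println(jedis.hgetAll("user:alice"));
                System.out.println(jedis.lrange("user:alice:recent", 0, -1));
            }
        }
    }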

Another NoSQL key-value DBMS option is Riak from Basho Technologies. Riak is a fault-tolerant, highly available, scalable, distributed multimodel DBMS. The open source edition of Riak is free under the Apache 2 license, whereas Riak Enterprise requires a commercial license agreement sold by Basho Technologies.

Clusterpoint: Document Store

Clusterpoint is a document-oriented database server platform for storing and processing XML and JSON data in a distributed fashion on large clusters of commodity hardware. Its architecture blends ACID-compliant OLTP transactions, full-text search and analytics in the same codebase, delivering high availability, fault tolerance, data replication and security.

Clusterpoint enables transactions to be performed in a distributed document database model in the same way as in a SQL database. Users can perform secure real-time updates, free-text search, analytical SQL querying and reporting at high velocity in very large distributed databases containing XML or JSON document data. Transactions are implemented without the database consistency issues that plague most NoSQL databases and can safely run at the high performance previously available only with relational databases. Real-time Big data analytics, replication, load sharing and high availability are standard features of the Clusterpoint database software platform.

Clusterpoint enables web-style free-text search with natural-language keywords and programmable relevance sorting of results. Constant and predictable search response times, with latency in milliseconds, and high-quality search results are achieved using policy-based inverted indexation and a unique relevance ranking method. Clusterpoint version 4 supports a JS/SQL query language: classic SQL queries can be combined with free-text search and with custom distributed computing functions written in JavaScript, executed in a single REST API call.

Clusterpoint Server is database software for high-speed storage and large-scale processing of XML and JSON data on clusters of commodity hardware. It works as a schema-free, document-oriented DBMS platform with an open source API. Clusterpoint addresses the problem of latency in Big data: end users can instantly search billions of documents and run fast analytics on structured and unstructured data.

Clusterpoint provides distributed, ACID-compliant transactions, including basic SQL support, in a document model database that is massively scalable for Big data volumes. Distributed transactions, data storage, search and analytics can be performed at high performance and high availability, while delivering strong database consistency and security. This gives Clusterpoint a performance and scalability advantage over other NoSQL document databases that compromise on the security and integrity of customer data, typically providing only eventual consistency at high availability.

Another distinction is the programmable ranking index, which can be flexibly customized through relevance rules assigned in the Document Policy configuration file, a small XML configuration file accompanying each Clusterpoint database. Database search behavior can be changed quickly by configuring ranking index rules rather than by modifying software code. The increasing importance of ranking derives directly from the explosion in the volume of data handled by current applications: the user would be overwhelmed by too many unranked results, and the sheer amount of data makes it almost impossible to process queries in the traditional compute-then-sort approach.

Customer application code can be simplified by delegating most indexing and search-sorting details, including ranking algorithms, to the Document Policy configuration attributes in the Clusterpoint database. The Document Policy, when customized for a particular web or mobile application, determines the ranking index organization at the physical storage level by presorting the actual index data for custom relevance algorithms. Developers can avoid most complex SQL programming for data sorting and grouping in their application code, while the database hardware is freed from excessive Big data sorting on every query. Instead, the Clusterpoint ranking index delivers fast search and relevance-sorting functionality without the performance degradation characteristic of relational SQL databases.

The ranking index method, applied to the document database model, enables Clusterpoint to outperform SQL databases at search by several orders of magnitude. It solves the information overload and latency problem for interactive web and mobile applications processing Big data. Limited-size mobile device screens and network bandwidth restrictions prevent users from requesting and processing large data volumes per query, so database search and querying need to be interactive and transactional to satisfy Internet users. The Clusterpoint ranking index was designed for this computing model: it extracts the most relevant data first and returns information page by page in decreasing relevance. For instance, using only free-text search, latency in large databases containing billions of documents is measured in milliseconds, while relevance ranking prevents overwhelming the end user with too many low-quality search results. This is also a crucial design element for a distributed document database architecture: it makes the index scalable, so that it can be safely distributed across a large cluster of servers without significant performance loss during data ingestion, free-text search and access.

Additionally, the Clusterpoint ranking index can be fine-tuned by developers to match natural-language query terms to the most relevant textual content in a customer database. When querying a distributed database with free-text keywords or phrases in natural language, the ranking index sorts out the most relevant documents, those in which the query matches textual content in the database, taking into account natural-language density, word statistics and language-specific grammatical attributes (including stemming, spelling and collation), and performing automatic self-merge joins. Very few database products support this type of self-merge join.

By adjusting ranking rules, customers can configure various grouping, ordering and positioning algorithms for their search results through the ranking index, so that it delivers the best end-user search experience. A set of ranking configuration rules, once established for a particular database, is then applied and maintained automatically by the Clusterpoint database whenever customer data is loaded or updated through the Clusterpoint CRUD API commands.

Developers can freely use full-text search as the fastest information access method in Clusterpoint databases, while retaining the ability to query the database structure with standard analytics using SQL. In a Clusterpoint database both methods can be combined in a single query, enabling combined analytical and search queries over mixed structured and unstructured data content.

Bigtable: Wide Column Store

Bigtable clones are a type of NoSQL database that emerged from Google’s seminal Bigtable paper. Bigtables are a highly distributed way to manage tabular data. These tables of data are not related to each other as they would be in a traditional Relational Database Management System (RDBMS). The most important features of this class of database are described below.

A Bigtable is a sparse, distributed, persistent multidimensional sorted map. The map is indexed by a row key, column key, and a timestamp; each value in the map is an uninterpreted array of bytes.

Rows

Bigtable maintains data in lexicographic order by row key. The row range for a table is dynamically partitioned. Each row range is called a tablet, which is the unit of distribution and load balancing. As a result, reads of short row ranges are efficient and typically require communication with only a small number of machines. Clients can exploit this property by selecting their row keys so that they get good locality for their data accesses. For example, in Webtable, pages in the same domain are grouped together into contiguous rows by reversing the hostname components of the URLs: data for maps.google.com/index.html is stored under the key com.google.maps/index.html. Storing pages from the same domain near each other makes some host and domain analyses more efficient.
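The reversed-hostname trick is plain string manipulation; a minimal sketch (not Bigtable client code) looks like this:

    import java.util.Arrays;
    import java.util.Collections;
    import java.util.List;

    public class RowKeys {
        // e.g. "maps.google.com/index.html" -> "com.google.maps/index.html"
        static String reverseHostname(String url) {
            int slash = url.indexOf('/');
            String host = slash >= 0 ? url.substring(0, slash) : url;
            String path = slash >= 0 ? url.substring(slash) : "";
            List<String> parts = Arrays.asList(host.split("\\."));
            Collections.reverse(parts);
            return String.join(".", parts) + path;
        }

        public static void main(String[] args) {
            System.out.println(reverseHostname("maps.google.com/index.html"));
        }
    }

Because the reversed keys for one domain share a common prefix, they sort into contiguous rows and therefore usually land on only a few tablets.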

Column Families

Column keys are grouped into sets called column families, which form the basic unit of access control. All data stored in a column family is usually of the same type (data in the same column family is compressed together). A column family must be created before data can be stored under any column key in that family; after a family has been created, any column key within the family can be used. The intent is that the number of distinct column families in a table be small (in the hundreds at most), and that families rarely change during operation. In contrast, a table may have an unbounded number of columns.

A column key is named using the syntax family:qualifier. Column family names must be printable, but qualifiers may be arbitrary strings. An example column family for the Webtable is language, which stores the language in which a web page was written. Only one column key is used in the language family, and it stores each web page’s language ID. Another useful column family for this table is anchor; each column key in this family represents a single anchor, as shown in Figure 1. The qualifier is the name of the referring site; the cell contents is the link text.

Access control and both disk and memory accounting are performed at the column-family level. In the Webtable example, these controls allow several different types of applications to be managed: some that add new base data, some that read the base data and create derived column families, and some that are only allowed to view existing data.

Timestamps

Each cell in a Bigtable can contain multiple versions of the same data; these versions are indexed by timestamp. Bigtable timestamps are 64-bit integers. They can be assigned by Bigtable, in which case they represent “real time” in microseconds, or be explicitly assigned by client applications. Applications that need to avoid collisions must generate unique timestamps themselves. Different versions of a cell are stored in decreasing timestamp order, so that the most recent versions can be read first.
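A toy in-memory model (not Bigtable code) of the decreasing-timestamp ordering described above might look like this:

    import java.util.Comparator;
    import java.util.TreeMap;

    public class VersionedCell {
        // Versions keyed by timestamp, newest first, so the most recent
        // version is the first entry read.
        private final TreeMap<Long, byte[]> versions =
            new TreeMap<>(Comparator.reverseOrder());

        void put(long timestampMicros, byte[] value) {
            versions.put(timestampMicros, value);
        }

        byte[] latest() {
            return versions.isEmpty() ? null : versions.firstEntry().getValue();
        }
    }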

The Bigtable API provides functions for creating and deleting tables and column families. It also provides functions for changing cluster, table, and column family metadata, such as access control rights. Client applications can write or delete values in Bigtable, look up values from individual rows, or iterate over a subset of the data in a table. Figure 2 shows C++ code that uses a RowMutation abstraction to perform a series of updates. The call to Apply performs an atomic mutation to the Webtable: it adds one anchor to www.cnn.com and deletes a different anchor.

Bigtable is built on several other pieces of Google infrastructure. Bigtable uses the distributed Google File System (GFS) [17] to store log and data files. A Bigtable cluster typically operates in a shared pool of machines that run a wide variety of other distributed applications, and Bigtable processes often share the same machines with processes from other applications. Bigtable depends on a cluster management system for scheduling jobs, managing resources on shared machines, dealing with machine failures, and monitoring machine status.

Bigtable uses Chubby to keep track of tablet servers. When a tablet server starts, it creates, and acquires an exclusive lock on, a uniquely-named file in a specific Chubby directory. The master monitors this directory (the servers directory) to discover tablet servers. A tablet server stops serving its tablets if it loses its exclusive lock: e.g., due to a network partition that caused the server to lose its Chubby session. (Chubby provides an efficient mechanism that allows a tablet server to check whether it still holds its lock without incurring network traffic.) A tablet server will attempt to reacquire an exclusive lock on its file as long as the file still exists. If the file no longer exists, then the tablet server will never be able to serve again, so it kills itself. Whenever a tablet server terminates (e.g., because the cluster management system is removing the tablet server’s machine from the cluster), it attempts to release its lock so that the master will reassign its tablets more quickly.

Wide column stores have tables that contain columns. The difference is that wide column stores take a hybrid approach, mixing the declarative characteristics of relational databases with the key-value-pair-based, totally variable schema of key-value stores. Wide column databases store data tables as sections of columns of data rather than as rows of data.

And so that’s an exception, but if you take a look at a wide column store called Big Table which is proprietary at Google that Google tends to use with his MapReduce computation engine for working with large datasets. This is best example where we will start understanding why wide column stores are used in these situations at large business sites. Because Big Table and MapReduce are proprietary to Google, but Google also published details of each publically.

Apache Giraph: Graph-Oriented Stores

Apache Giraph is an Apache project to perform graph processing on big data. Giraph utilizes Apache Hadoop’s MapReduce implementation to process graphs. Facebook used Giraph, with some performance improvements, to analyze one trillion edges using 200 machines in 4 minutes. Giraph is based on a paper published by Google about its own graph processing system, called Pregel. It can be compared to other Big Graph processing libraries such as Cassovary.

Apache Giraph is an iterative graph processing framework, built on top of Apache Hadoop.

The input to a Giraph computation is a graph composed of vertices and directed edges; see Figure 1. For example, vertices can represent people and edges friend requests. Each vertex stores a value, as does each edge. The input thus determines not only the graph topology but also the initial values of vertices and edges.

As an example, consider a computation that finds the distance from a predetermined source person s to every person in the social graph. In this computation, the value of an edge e is a floating point number denoting the distance between adjacent people. The value of a vertex v is also a floating point number, representing an upper bound on the distance along a shortest path from the predetermined source vertex s to v. The initial value of the source vertex s is 0, and the initial value of any other vertex is infinity.

Computation proceeds as a sequence of iterations, called supersteps in BSP. Initially, every vertex is active. In each superstep, each active vertex invokes the Compute method provided by the user. The method implements the graph algorithm that will be executed on the input graph. Intuitively, you should think like a vertex when designing a Giraph algorithm; the model is vertex-oriented. A graph-oriented approach is discussed in GIRAPH-818.

The Compute method:

  • receives messages sent to the vertex in the previous superstep,
  • computes using the messages, and the vertex and outgoing edge values, which may result in modifications to the values, and
  • may send messages to other vertices.

The Compute method does not have direct access to the values of other vertices and their outgoing edges. Inter-vertex communication occurs by sending messages.

In our single-source shortest paths example, the Compute method will: (1) find the minimum value arriving on any message; (2) if that value is less than the current value of the vertex, then (3) adopt the minimum as the vertex value; and (4) send the new value plus the edge value along every outgoing edge.
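A Compute method along these lines, closely following the single-source shortest paths example that ships with Giraph (the hard-coded source id and the class name here are simplifications introduced for illustration), might look like this:

    import java.io.IOException;
    import org.apache.giraph.edge.Edge;
    import org.apache.giraph.graph.BasicComputation;
    import org.apache.giraph.graph.Vertex;
    import org.apache.hadoop.io.DoubleWritable;
    import org.apache.hadoop.io.FloatWritable;
    import org.apache.hadoop.io.LongWritable;

    public class ShortestPathsComputation extends BasicComputation<
        LongWritable, DoubleWritable, FloatWritable, DoubleWritable> {

      private static final long SOURCE_ID = 1;   // assumed source vertex id

      @Override
      public void compute(Vertex<LongWritable, DoubleWritable, FloatWritable> vertex,
                          Iterable<DoubleWritable> messages) throws IOException {
        if (getSuperstep() == 0) {
          vertex.setValue(new DoubleWritable(Double.MAX_VALUE));
        }
        // (1) Find the minimum value arriving on any message.
        double minDist = (vertex.getId().get() == SOURCE_ID) ? 0d : Double.MAX_VALUE;
        for (DoubleWritable message : messages) {
          minDist = Math.min(minDist, message.get());
        }
        // (2)-(4) If it improves the current bound, adopt it and propagate it
        // along every outgoing edge, adding the edge value.
        if (minDist < vertex.getValue().get()) {
          vertex.setValue(new DoubleWritable(minDist));
          for (Edge<LongWritable, FloatWritable> edge : vertex.getEdges()) {
            double distance = minDist + edge.getValue().get();
            sendMessage(edge.getTargetVertexId(), new DoubleWritable(distance));
          }
        }
        vertex.voteToHalt();   // reactivated automatically by any incoming message
      }
    }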

There is a barrier between consecutive supersteps. By this we mean that: (1) the messages sent in any current superstep get delivered to the destination vertices only in the next superstep, and (2) vertices start computing the next superstep after every vertex has completed computing the current superstep.

The graph can be mutated during computation by adding or removing vertices or edges. Our example shortest paths algorithm does not mutate the graph.

Values are retained across barriers. That is, when the graph topology is not mutated, the value of any vertex or edge at the beginning of a superstep is equal to the corresponding value at the end of the previous superstep. For example, when a vertex has set the distance upper bound to D, then at the beginning of the next superstep the distance upper bound will still be equal to D. Of course, the vertex can modify its own value and the values of its outgoing edges during any superstep.

Any vertex can stop computing after any superstep. The vertex simply declares that it does not want to be active anymore. However, any incoming message will make the vertex active again.

The computation halts after all vertices have voted to halt and there are no messages in flight. Each vertex outputs some local information, which usually amounts to the final vertex value.

At Facebook, Giraph directly interfaces with the company’s internal version of HDFS (since Giraph is written in Java) and talks directly to Hive. Since Giraph runs as a MapReduce job, Facebook can leverage its existing MapReduce (Corona) infrastructure stack with little operational overhead. With respect to performance, at the time of Facebook’s testing Giraph was faster than the other frameworks evaluated, and much faster than Hive. Finally, Giraph’s graph-based API, inspired by Google’s Pregel and Leslie Valiant’s bulk synchronous parallel computing model, supports a wide array of graph applications in a way that is easy to understand. Giraph also adds several useful features on top of the basic Pregel model that are beyond the scope of this article, including master computation and composable computation.