Graph Database
A graph database is a system that stores data elements as nodes and the relationships between them as edges, so that the relationships directly connect the elements of large data sets in the database. Large graphs are a key concern in a wide variety of machine learning tasks, for example in medicine, social network analysis, and the computational sciences in general. In contrast to traditional relational database systems, relationships link data elements together directly, so related elements can be retrieved in a single operation instead of through joins. Research on graph databases tries to reduce the size of stored graphs with techniques such as compression. Smaller graphs reduce expensive I/O and improve performance because a larger fraction of the data fits in caches. They also require less storage hardware.
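The compression idea can be illustrated with one common technique for adjacency lists: sort each node's neighbors, store only the gaps between consecutive IDs, and write each gap as a variable-length integer so that small gaps take a single byte. The sketch below is a minimal illustration under that assumption (non-negative integer node IDs), not the specific method of any system mentioned here; `encode_varint`, `compress_neighbors`, and `decompress_neighbors` are hypothetical names. Looking up a node in the dictionary also shows the direct, single-operation access to its neighbors.

```python
from typing import List


def encode_varint(value: int, out: bytearray) -> None:
    """Append a non-negative integer using 7 data bits per byte (LEB128 style)."""
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)  # high bit clear: last byte of this integer
            return


def compress_neighbors(neighbors: List[int]) -> bytes:
    """Sort a neighbor list, gap-encode it, and store each gap as a varint."""
    out = bytearray()
    previous = 0
    for node in sorted(neighbors):
        encode_varint(node - previous, out)  # gaps between sorted IDs are small
        previous = node
    return bytes(out)


def decompress_neighbors(data: bytes) -> List[int]:
    """Invert compress_neighbors: decode each varint and add the gaps back up."""
    neighbors: List[int] = []
    current = value = shift = 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7  # continuation byte: keep accumulating
        else:
            current += value  # gap complete: recover the absolute node ID
            neighbors.append(current)
            value = shift = 0
    return neighbors


if __name__ == "__main__":
    # Hypothetical store: node ID -> compressed neighbor list.
    store = {1000: compress_neighbors([1_000_005, 1_000_001, 1_000_130])}
    print(len(store[1000]))                   # 5 bytes instead of 3 x 8 bytes
    print(decompress_neighbors(store[1000]))  # [1000001, 1000005, 1000130]
```

The first gap is large (three bytes), but the following gaps of 4 and 125 fit in one byte each, which is why sorting neighbor IDs before encoding pays off.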
There is a wide variety of graph databases and graph processing frameworks available today. Pregel is a system for large-scale graph processing that was developed at Google. Its open-source counterpart, Apache Giraph, is used at Facebook and can be downloaded for free. Another well-known graph-based store from Facebook is TAO ('The Associations and Objects'). It runs on a large collection of geographically distributed server clusters, serves thousands of data types, and handles over a billion read requests and millions of write requests every second. Its goal is to support traversal of the Facebook social graph along specific social links. An interesting article related to TAO describes how Facebook likes are stored. Other systems include Galois, GraphBLAS, Comb-BLAS, and Green-Marl, many of which use arrays to store graphs. Neo4j is another open-source graph database that is freely available for download.
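Two ideas from the paragraph above, storing graphs in arrays and Pregel's processing model, can be sketched together in plain Python. The example assumes a compressed sparse row (CSR) layout, one common array format, and runs Pregel-style vertex-centric supersteps in a single process on a maximum-value propagation task similar to the example commonly used to introduce Pregel. It is a minimal illustration, not the API of Pregel, Giraph, or any other system named here; `build_csr` and `propagate_max` are hypothetical names.

```python
from typing import Dict, List, Tuple


def build_csr(num_vertices: int, edges: List[Tuple[int, int]]) -> Tuple[List[int], List[int]]:
    """Store a directed graph in two flat arrays (compressed sparse row, CSR).

    The out-neighbors of vertex v are targets[offsets[v]:offsets[v + 1]].
    """
    offsets = [0] * (num_vertices + 1)
    for src, _ in edges:
        offsets[src + 1] += 1  # count out-edges per vertex
    for v in range(num_vertices):
        offsets[v + 1] += offsets[v]  # prefix sum: counts become start positions
    targets = [0] * len(edges)
    cursor = offsets[:-1]  # slicing copies: next free slot for each vertex
    for src, dst in edges:
        targets[cursor[src]] = dst
        cursor[src] += 1
    return offsets, targets


def propagate_max(values: List[int], offsets: List[int], targets: List[int]) -> Tuple[List[int], int]:
    """Pregel-style supersteps that spread the maximum vertex value.

    In superstep 0 every vertex sends its value along its out-edges. Afterwards a
    vertex only sends when a received message raised its value. Every vertex votes
    to halt at the end of each superstep and is reactivated by incoming messages;
    the run ends when no messages are in flight. Returns (values, supersteps).
    """
    n = len(values)
    values = list(values)
    inbox: Dict[int, List[int]] = {v: [] for v in range(n)}
    active = set(range(n))  # all vertices start active
    superstep = 0
    while active:
        outbox: Dict[int, List[int]] = {v: [] for v in range(n)}
        for v in active:
            new_value = max([values[v]] + inbox[v])
            if superstep == 0 or new_value > values[v]:
                values[v] = new_value
                for i in range(offsets[v], offsets[v + 1]):
                    outbox[targets[i]].append(new_value)
        inbox = outbox  # messages are delivered at the start of the next superstep
        active = {v for v in range(n) if inbox[v]}
        superstep += 1
    return values, superstep


if __name__ == "__main__":
    # Directed cycle 0 -> 1 -> 2 -> 3 -> 0 with made-up initial values.
    offsets, targets = build_csr(4, [(0, 1), (1, 2), (2, 3), (3, 0)])
    print(offsets, targets)                               # [0, 1, 2, 3, 4] [1, 2, 3, 0]
    print(propagate_max([3, 6, 2, 1], offsets, targets))  # ([6, 6, 6, 6], 5)
```

Keeping all edges in two contiguous arrays rather than in per-vertex objects makes scans over a vertex's neighbors cache-friendly. A distributed framework would additionally partition the vertices across workers and exchange the messages over the network between supersteps.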
Graph Database Details
We refer to the following video about this subject: