GPU memory is essential for understanding why Graphics Processing Units (GPUs) are so successful at tackling big data problems. More general information about the architecture can be found in our article on the Graphics Processing Unit (GPU). This article explains the difference between host memory and GPU device memory and the problems that occur when working with both. The key problem is the large overhead incurred when copying parts of a dataset from host memory to GPU memory. This overhead needs to be taken into account when programming GPUs, since otherwise the GPU code will not be much faster than the corresponding CPU code. In the worst case, the GPU code can even be slower, for example when dataset elements are copied one after another instead of many at once.
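To see why copying elements one after another is so harmful, consider a toy cost model in which every host-to-device copy pays a fixed per-call overhead plus a per-element cost. The numbers below are assumptions chosen for illustration, not measurements of any real GPU:

```python
# Toy cost model for host-to-GPU transfers (illustrative numbers only):
# each copy call pays a fixed overhead, plus a cost per element moved.
OVERHEAD_PER_TRANSFER_US = 10.0   # assumed fixed latency per copy call (microseconds)
COST_PER_ELEMENT_US = 0.001       # assumed per-element transfer cost (microseconds)

def transfer_cost_us(n_elements, n_transfers):
    """Total time to move n_elements split evenly across n_transfers copy calls."""
    return n_transfers * OVERHEAD_PER_TRANSFER_US + n_elements * COST_PER_ELEMENT_US

n = 1_000_000
one_bulk_copy = transfer_cost_us(n, 1)   # all elements in a single call
element_wise = transfer_cost_us(n, n)    # one copy call per element

print(one_bulk_copy)  # 1010.0 microseconds
print(element_wise)   # 10001000.0 microseconds -- the fixed overhead dominates
```

Even with these made-up constants, the element-wise variant is about four orders of magnitude slower, which is why batching transfers is the first rule of GPU programming.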
One concrete example is big data analytics in the field of deep learning using a tool called Theano. Please refer to our Deep Learning Framework article for more information about Theano itself. A good technique is to divide the dataset into minibatches that are consumed by the (Stochastic) Gradient Descent optimization method of the deep learning tool. For the reason mentioned above, it makes sense to store the dataset in shared variables and access it via a minibatch index and a fixed batch size. When the data resides in Theano shared variables, the framework can copy the entire dataset to the GPU in a single call at the time the shared variables are constructed. With this programming technique, the GPU can access any minibatch by taking a defined slice of the shared variables. This in turn means no data needs to be copied from CPU memory during training, bypassing the overhead mentioned above.
GPU Memory Details
Have a look at the following video on this topic: