Movie Ratings
Movie ratings are datasets in which users rate the movies they have seen in order to advise other users. They are freely available from the MovieLens web site. Several datasets were collected over various periods of time, and they vary in size. The datasets described below are provided by GroupLens. One application of this data is as input to a recommendation engine.
MovieLens Latest Datasets
This dataset is recommended for education and development and can be downloaded in two versions. Both versions change over time and are therefore not appropriate for reporting research results.
The small dataset consists of 100,000 ratings and 1,300 tag applications applied to 9,000 movies by 700 users. It is a download of about 1 MB (zip).
The large dataset consists of 24,000,000 ratings and 670,000 tag applications applied to 40,000 movies by 260,000 users. It also includes tag genome data with 12 million relevance scores across 1,100 tags. It is a download of about 224 MB (zip).
We use R, the environment for statistical computing, to inspect the structure of the ratings.csv file. This file is included in both downloads and, alongside other files such as the one with the movie names, is the key file containing the movie ratings. The following R commands read this file into a data frame and print its first ten rows.
> ratings <- read.csv(file="../ratings.csv", header=TRUE, sep=",")
> head(ratings,10)
Output:
   userId movieId rating  timestamp
1       1      31    2.5 1260759144
2       1    1029    3.0 1260759179
3       1    1061    3.0 1260759182
4       1    1129    2.0 1260759185
5       1    1172    4.0 1260759205
6       1    1263    2.0 1260759151
7       1    1287    2.0 1260759187
8       1    1293    2.0 1260759148
9       1    1339    3.5 1260759125
10      1    1343    2.0 1260759131
This initial view of the data tells us that the file has four columns and that its first line is the header. Each row gives, for a given userId and movieId, a rating and a corresponding timestamp. Ratings are on a 5-star scale in half-star steps (0.5 to 5.0), which is why values such as 2.5 and 3.5 appear in the sample.
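To make the column structure more concrete, the following is a minimal sketch in R. It rebuilds the first five rows of the output above as an inline data frame so that it runs without the downloaded file. The column name `rated_at` and the variable `mean_by_movie` are hypothetical names chosen for illustration, and the sketch assumes that the timestamp column holds seconds since 1970-01-01 UTC.

```r
# First five rows of the output shown above, rebuilt inline
# so that the sketch does not depend on ratings.csv being present.
ratings <- data.frame(
  userId    = rep(1L, 5),
  movieId   = c(31L, 1029L, 1061L, 1129L, 1172L),
  rating    = c(2.5, 3.0, 3.0, 2.0, 4.0),
  timestamp = c(1260759144, 1260759179, 1260759182, 1260759185, 1260759205)
)

# Assumption: timestamps are seconds since 1970-01-01 UTC.
# rated_at is a hypothetical column name used only in this sketch.
ratings$rated_at <- as.POSIXct(ratings$timestamp,
                               origin = "1970-01-01", tz = "UTC")

# Inspect column types and the range of rating values in the sample.
str(ratings)
print(range(ratings$rating))

# Mean rating per movie; trivial for this sample, since every movie
# appears once, but the same call works on the full data frame.
mean_by_movie <- aggregate(rating ~ movieId, data = ratings, FUN = mean)
print(mean_by_movie)
```

On the full file, the same conversion and aggregation can be applied directly to the data frame returned by read.csv, although the large dataset may take noticeably longer to process.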
Movie ratings details