Movie Ratings

by www.big-data.tips · Published March 26, 2017 · Updated March 27, 2017

Movie ratings are data sets in which users rate the movies they have seen in order to give advice to other users. They are available as free datasets from the MovieLens web site. Several data sets exist; they were collected over different periods of time and vary in size. The datasets described below are provided by GroupLens. One application of this data is to use it with a recommendation engine.

MovieLens Latest Datasets
This dataset is recommended for education and development and can be downloaded in two versions. Both versions change over time and are therefore not appropriate for reporting research results.

The small dataset consists of 100,000 ratings and 1,300 tag applications applied to 9,000 movies by 700 users; it is a zip download of about 1 MB. The large dataset consists of 24,000,000 ratings and 670,000 tag applications applied to 40,000 movies by 260,000 users; it is a zip download of about 224 MB. The large dataset also includes tag genome data with 12 million relevance scores across 1,100 tags.

We use R, the environment for statistical computing, to understand the structure of the ratings.csv file. This file is included in both downloads and, alongside other files such as the one with the movie names, is the key file holding the movie ratings. The following R commands read this file into a data frame and show its first ten rows.

> # The file path must be a quoted string; adjust it to where the zip was unpacked
> ratings <- read.csv(file="../ratings.csv", header=TRUE, sep=",")
> head(ratings, 10)
Output:

    userId  movieId rating  timestamp
1        1       31    2.5 1260759144
2        1     1029    3.0 1260759179
3        1     1061    3.0 1260759182
4        1     1129    2.0 1260759185
5        1     1172    4.0 1260759205
6        1     1263    2.0 1260759151
7        1     1287    2.0 1260759187
8        1     1293    2.0 1260759148
9        1     1339    3.5 1260759125
10       1     1343    2.0 1260759131

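This initial view of the data tells us that each row has four values and that the first line of the file is the header. Each row gives, for a given userId and movieId, a rating and a corresponding timestamp. Ratings are made on a 5-star scale in half-star increments, from 0.5 to 5.0, which is why values such as 2.5 and 3.5 appear above. The timestamp is the number of seconds since midnight UTC of January 1, 1970.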
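To make the structure more tangible, the following is a minimal sketch of a few follow-up checks in base R. It rebuilds the ten rows shown in the output above as a small data frame so that it runs without the download; the names sample_ratings, rated_at and mean_by_movie are introduced here for illustration only and are not part of the dataset. The timestamp column is treated as seconds since 1970-01-01 UTC.

```r
# First ten rows from the output above, rebuilt inline so the sketch runs
# without the download. With the real file, use the `ratings` data frame instead.
sample_ratings <- data.frame(
  userId    = rep(1L, 10),
  movieId   = c(31L, 1029L, 1061L, 1129L, 1172L,
                1263L, 1287L, 1293L, 1339L, 1343L),
  rating    = c(2.5, 3.0, 3.0, 2.0, 4.0, 2.0, 2.0, 2.0, 3.5, 2.0),
  timestamp = c(1260759144, 1260759179, 1260759182, 1260759185, 1260759205,
                1260759151, 1260759187, 1260759148, 1260759125, 1260759131)
)

# Column types and dimensions of the data frame
str(sample_ratings)

# Observed rating range and how often each rating value occurs
rating_range  <- range(sample_ratings$rating)
rating_counts <- table(sample_ratings$rating)

# Convert epoch seconds into date-times (hypothetical helper column)
sample_ratings$rated_at <- as.POSIXct(sample_ratings$timestamp,
                                      origin = "1970-01-01", tz = "UTC")

# Mean rating per movie: a naive baseline, not a recommendation engine
mean_by_movie <- aggregate(rating ~ movieId, data = sample_ratings, FUN = mean)

print(rating_range)
print(rating_counts)
print(head(sample_ratings[, c("movieId", "rating", "rated_at")], 3))
```

On the full file the same calls apply to the ratings data frame. The per-movie mean then becomes meaningful, because each movie is rated by many users rather than by a single one as in this ten-row sample.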
