Big Data


Some call it mere ‘hype’, but it is important to understand big data and why it is one of the most difficult challenges we face today. In contrast to the many thick books that offer good general pointers or very detailed content but no videos, this page provides concrete big data examples, videos, and details to help you understand the topic as a whole. It also covers how to collect, analyze, and interpret big data.

Some of these insights explain how data analysis methods and data analytics techniques relate to big data mining and machine learning methods. The page presents the material in a relaxed and informal way without omitting important concepts. It demonstrates a wide range of relevant issues and questions that can be addressed with the help of big data analysis, analytics, and tools.

Through our Facebook page, this web site further aims to communicate with readers rather than to lecture them, and its content aims to convince readers that studying the hot topics in big data can be a lively, interesting, and rewarding experience.

Big Data Definition

One interesting article refers to a growing number of Vs: Volume, Velocity, Variety, Variability, Veracity, Visualization, and, most notably, Value. Today it is widely recognized that the challenges go beyond sheer volume, so the definition above, with its growing number of Vs, seems to be the more accurate one and the one in common use today.

While a fully accurate definition on which everyone agrees is hard to find, big data was originally defined as data that becomes large enough that it cannot be processed using traditional methods.

Big Data Challenges

Facebook users generate 100 TB of data every day, while YouTube users upload 48 hours of video every minute of every single day. Estimates put the total amount of available data at 2.7 zettabytes, and it keeps growing at an ever-increasing pace. Such a large amount of data brings a couple of challenges, some of which are interconnected. Usually a single machine cannot process or even store the whole dataset. And even if the dataset fits, processing it with machine learning or data mining methods may take a very long time.
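To get a feeling for these numbers, here is a small back-of-the-envelope calculation in Python. It only uses the two figures quoted above; the disk read speed of 150 MB/s is our own assumption for illustration and not a measured value, so treat the results as a rough sketch.

```python
import math

TB = 10**12  # one decimal terabyte in bytes

DAILY_BYTES = 100 * TB          # 100 TB generated per day (figure from the text)
VIDEO_HOURS_PER_MINUTE = 48     # hours of video uploaded per minute (figure from the text)

# ASSUMPTION (not from the text): sequential read speed of a single disk.
ASSUMED_DISK_BYTES_PER_SECOND = 150 * 10**6  # 150 MB/s


def video_hours_per_day(hours_per_minute):
    """Hours of video uploaded in one day at a constant per-minute rate."""
    return hours_per_minute * 60 * 24


def days_to_read(num_bytes, bytes_per_second):
    """Days one machine needs just to read the data once, without any analysis."""
    seconds = num_bytes / bytes_per_second
    return seconds / 86_400


def machines_needed(num_bytes, bytes_per_second, deadline_seconds):
    """Machines needed to read the data within the deadline, assuming a perfect split."""
    return math.ceil(num_bytes / (bytes_per_second * deadline_seconds))


if __name__ == "__main__":
    print(video_hours_per_day(VIDEO_HOURS_PER_MINUTE))
    print(round(days_to_read(DAILY_BYTES, ASSUMED_DISK_BYTES_PER_SECOND), 1))
    print(machines_needed(DAILY_BYTES, ASSUMED_DISK_BYTES_PER_SECOND, 3600))
```

Under this assumption a single machine would need about a week only to read one day of data, which is why storing and processing it on one machine is usually not an option.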

So the solution often applied is to distribute the big dataset over a large cluster, enabling parallel processing of the data. This works, but it raises further challenges and difficulties, since such technologies are distributed systems. One challenge is how to split work and analysis tasks efficiently across distributed machines. Another is that moving data over the network is expensive, so it is preferable to compute locally, at the location of the dataset, where possible. Finally, cluster nodes or network connections may fail, and the data analysis should not be significantly disturbed when they do.
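The idea of splitting the work and computing close to the data can be sketched in a few lines of Python. This is a toy illustration on a single machine, not a real cluster framework: threads stand in for cluster nodes, and the function names (split_into_partitions, local_summary, distributed_mean) are hypothetical names chosen for this example. Each "node" reduces its partition to a tiny (sum, count) pair, so only these small pairs would have to travel over the network. Failure handling is deliberately left out.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split_into_partitions(records, num_partitions):
    """Split a list into contiguous partitions of nearly equal size."""
    size, remainder = divmod(len(records), num_partitions)
    partitions, start = [], 0
    for i in range(num_partitions):
        end = start + size + (1 if i < remainder else 0)
        partitions.append(records[start:end])
        start = end
    return partitions


def local_summary(partition):
    """Runs next to the data: shrink a whole partition to a (sum, count) pair."""
    return (sum(partition), len(partition))


def merge(a, b):
    """Combine two partial summaries; cheap, because the inputs are tiny."""
    return (a[0] + b[0], a[1] + b[1])


def distributed_mean(records, num_partitions=4):
    """Mean computed from per-partition summaries instead of the raw data."""
    if not records:
        raise ValueError("records must not be empty")
    partitions = split_into_partitions(records, num_partitions)
    # Threads stand in for cluster nodes in this sketch.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(local_summary, partitions))
    total, count = reduce(merge, partials, (0, 0))
    return total / count


if __name__ == "__main__":
    print(distributed_mean(list(range(1, 101))))
```

The same pattern of a local step followed by a small merge step is what makes it possible to keep the large dataset where it is and move only the partial results.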

Big Data Impacts

Big data has a wide variety of impacts on data analysis and data analytics. Most are related to memory problems, where the necessary data does not fit into memory. It can also often be observed that big data is problematic for algorithms that would take too long to compute.

One example where big data has an impact is the training process of a classification algorithm named Support Vector Machines (SVMs). The algorithm often used to train SVMs is called Sequential Minimal Optimization (SMO). The big data impact of the SMO algorithm used in SVMs is discussed in our big data analysis section.

Another concrete impact of big data is found in discovering frequent itemsets in market basket transactions. This computationally intensive process is called association rule mining and the impact is discussed in our big data analysis section too.
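To see why this is computationally intensive, consider the following minimal Python sketch. It is a brute-force counter and not an optimized algorithm such as Apriori: it simply enumerates every itemset up to a given size in every basket. The baskets, the function name frequent_itemsets, and the min_support and max_size parameters are made up for this illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Count every itemset of up to max_size items and keep the frequent ones.

    Brute force: the number of candidate itemsets per basket grows
    combinatorially with basket size and max_size.
    """
    counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))  # ignore duplicates, fix a canonical order
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


# Hypothetical market basket transactions, for illustration only.
BASKETS = [
    ["bread", "milk"],
    ["bread", "diapers", "beer", "eggs"],
    ["milk", "diapers", "beer", "cola"],
    ["bread", "milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "cola"],
]

if __name__ == "__main__":
    result = frequent_itemsets(BASKETS, min_support=3)
    for itemset, n in sorted(result.items()):
        print(itemset, n)
```

With only five small baskets this runs instantly, but with millions of transactions and larger itemsets the number of candidates to count quickly becomes the bottleneck.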

Big Data Briefly Explained

We recommend watching the following video presentation to understand the basics much better:


Follow us on Facebook:

What is big data really? Let's have a look: http://goo.gl/yYbaJv

Posted by big-data.tips on Thursday, March 17, 2016