MAFIA stands for Maximal Frequent Itemset Algorithm. It is a relatively new data mining algorithm for big data, initially published around 2005, that mines maximal frequent itemsets in transaction datasets. Its key benefit is efficiency when the itemsets in the transactions are very long. It uses innovative pruning and compression techniques, is one of the fastest published algorithms for mining long itemsets, and outperforms previous work by up to an order of magnitude. A more general introduction to mining frequent itemsets can be found in our article Association rules.
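To make the term concrete: a frequent itemset is maximal if no proper superset of it is also frequent. The following minimal Python sketch computes maximal frequent itemsets by brute force on a tiny, made-up transaction dataset. It is not the MAFIA algorithm and uses none of its pruning or compression techniques; the function names, the min_support parameter (an absolute transaction count) and the sample data are assumptions chosen for illustration only.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return every itemset contained in at least min_support transactions."""
    items = sorted(set().union(*transactions))
    frequent = []
    for size in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, size):
            itemset = frozenset(candidate)
            # Support = number of transactions that contain the whole itemset.
            support = sum(1 for t in transactions if itemset <= t)
            if support >= min_support:
                frequent.append(itemset)
                found_any = True
        if not found_any:
            # No frequent itemset of this size, so no larger one can be frequent.
            break
    return frequent


def maximal_frequent_itemsets(transactions, min_support):
    """Keep only the frequent itemsets that have no frequent proper superset."""
    frequent = frequent_itemsets(transactions, min_support)
    return [s for s in frequent if not any(s < other for other in frequent)]


# Hypothetical sample data: four transactions over the items a, b, c, d.
transactions = [
    frozenset({"a", "b", "c"}),
    frozenset({"a", "b", "c"}),
    frozenset({"a", "b", "d"}),
    frozenset({"b", "d"}),
]

result = maximal_frequent_itemsets(transactions, min_support=2)
for itemset in result:
    print(sorted(itemset))
```

With a minimum support of two transactions, nine itemsets are frequent in this sample, but only {b, d} and {a, b, c} are maximal. This brute-force enumeration grows exponentially with the number of items, which is why dedicated algorithms such as MAFIA matter when itemsets are long.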
A serial implementation of the MAFIA algorithm is available as part of the open source project “Himalaya Data Mining Tools” and can be downloaded from SourceForge. There is no implementation available in the Python-based scikit-learn tool. There also seems to be no parallel and scalable MAFIA implementation available. As of January 2017, Apache Spark MLlib does not include a parallel implementation of the MAFIA algorithm.
One key research article is “MAFIA: A Maximal Frequent Itemset Algorithm”, published in 2005 in IEEE Transactions on Knowledge and Data Engineering. It covers the basics of the itemset mining algorithm and describes experiments showing that MAFIA performs best when mining long itemsets and outperforms other algorithms on dense data by a factor of three to 30. The algorithm is usable with transactional databases or, more generally, transaction datasets. The article can be referenced as follows:
Doug Burdick, Johannes Gehrke, Jason Flannick, Tomi Yiu, Manuel Calimlim, "MAFIA: A Maximal Frequent Itemset Algorithm", IEEE Transactions on Knowledge and Data Engineering, vol. 17, pp. 1490-1504, November 2005, doi:10.1109/TKDE.2005.183
More about the MAFIA Algorithm
We recommend watching the following video: