RapidMiner is a widely used commercial data science platform for data preparation, machine learning, deep learning, and predictive model deployment. The platform consists of the software packages Studio, Auto Model, Server, and Radoop. It is highly extensible through its Marketplace, which offers a wide variety of contributions ranging from data preparation to solutions for specific problems such as Web mining or time series analysis. More details about the software can be obtained from this Web page, while this article offers a short overview.
The central tool of the platform is Studio, a visual workflow designer (GUI) that enables easy prototyping of ideas and quick design of predictive models. More recently, RapidMiner Studio has also provided deep learning functionality through its Keras extension, which in turn uses the TensorFlow deep learning library as its backend.
RapidMiner Auto Model
An interesting element of the platform is the RapidMiner Auto Model tool, which provides automated machine learning to accelerate the modeling process. The idea is that users specify the type of data and the goal of the analysis, such as prediction, cluster identification, or outlier detection. Auto Model then applies data science best practices based on these rough specifications, without requiring the user to specify details such as concrete models or algorithms. The software also automatically identifies quality problems in the data, such as strongly correlated attributes or missing values.
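To illustrate the kind of data-quality screening such an automated step performs, here is a minimal pure-Python sketch; the function names, thresholds, and report format are hypothetical and do not reflect RapidMiner's actual implementation:

```python
# Hypothetical sketch of automated data-quality screening: flag columns
# with too many missing values and pairs of near-duplicate (highly
# correlated) numeric columns. Thresholds are illustrative only.

def missing_value_ratio(column):
    """Fraction of None entries in a column."""
    return sum(v is None for v in column) / len(column)

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric columns."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def quality_report(table, max_missing=0.2, max_corr=0.95):
    """Return a list of detected quality issues for a column-oriented table."""
    issues = []
    for name, col in table.items():
        if missing_value_ratio(col) > max_missing:
            issues.append(("missing", name))
    # Only complete numeric columns take part in the correlation check.
    numeric = {n: c for n, c in table.items() if all(v is not None for v in c)}
    names = list(numeric)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(numeric[a], numeric[b])) > max_corr:
                issues.append(("correlated", a, b))
    return issues
```

For example, a table containing both an `age` and an `age_months` column would be flagged as containing a correlated pair, and a column that is mostly `None` would be flagged as missing.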
The Server component of the platform enables collaboration, computation, and deployment for data analytics teams. One of its features is the ability to share knowledge and best practices across teams within an organization via a centralized repository with access control for specific users. Another is an advanced queuing system to manage computing resources for specific teams or projects.
The Radoop component enables parallel execution on distributed infrastructures based on Apache Hadoop and Apache Spark. A visual workflow design tool represents the data process flows of an application that can be executed in parallel. Internally, such a workflow is executed on the Apache Hadoop infrastructure using Apache Spark, without the need to write any infrastructure-specific code. Of particular interest is SparkRM, which augments Apache Spark with operators and data process flows from the Studio package so that they can run in parallel instead of purely serially. It therefore offers a broader set of algorithms than Apache Spark MLlib, which can in turn also be used if needed.
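The underlying pattern, applying the same data process flow independently to each partition of the data, can be sketched in a few lines of plain Python; a local thread pool stands in for the Hadoop/Spark cluster here, and the flow itself (`process_flow`) is a made-up stand-in, not a Radoop operator:

```python
from concurrent.futures import ThreadPoolExecutor

# Illustrative pattern only: the same workflow is applied independently
# to every data partition. In Radoop this distribution happens across a
# Hadoop/Spark cluster; here a local thread pool plays that role.

def process_flow(partition):
    """A stand-in workflow step: min-max normalize one partition."""
    lo, hi = min(partition), max(partition)
    return [(v - lo) / (hi - lo) for v in partition]

def run_in_parallel(partitions):
    """Apply the same flow to every partition concurrently."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(process_flow, partitions))
```

Because the partitions are processed independently, no coordination is needed between them, which is exactly what makes such flows attractive to execute on a distributed backend.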
We recommend the following video for more details: