Big Data Platforms

Below is a list of big data platforms and their interfaces with R.

Hadoop

  • Hadoop - a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models; YARN is its resource management and job scheduling layer

    • RHadoop - a collection of five R packages that allow users to manage and analyze data with Hadoop, developed by Revolution Analytics

    • RHIPE - the R and Hadoop Integrated Programming Environment

  • Hortonworks HDP - the Hortonworks Data Platform, a Hadoop distribution
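
The "simple programming models" mentioned for Hadoop usually refers to MapReduce. The sketch below imitates the map, shuffle, and reduce steps in plain Python on a single machine. It does not use Hadoop or any of the R packages listed above, and the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration only.

```python
from collections import defaultdict


def map_phase(line):
    # Map step: emit a (word, 1) pair for every word in one input line.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Shuffle step: group all values by key, which a MapReduce
    # framework does between the map and reduce steps.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    # Reduce step: combine all values for one key into a single count.
    return key, sum(values)


def word_count(lines):
    # Run the three steps in sequence on an in-memory list of lines.
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data with R", "R and big clusters"])
```

On a real cluster the map and reduce functions would run in parallel on different nodes, with the framework handling the shuffle; here everything runs in one process to keep the idea visible.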

Spark

  • Spark - a fast and general engine for large-scale data processing, which can be up to 100 times faster than Hadoop MapReduce for in-memory workloads

    • SparkR - an R frontend for Spark

H2O

  • H2O - an open source in-memory prediction engine for big data science

    • Algorithms provided in H2O: PCA, GBM, deep learning, random forest, BigData RF, GLM, k-means, Naive Bayes, anomaly detection

MongoDB