The big data ecosystem includes many libraries and frameworks that interoperate with each other. Libraries usually provide solutions to specific problems; for instance, applying neural-network methods to your data. Frameworks integrate multiple libraries to provide even more functionality. Here are a few examples:
- Frameworks: Hadoop Ecosystem, Apache Spark, Apache Storm, Apache Pig, Facebook Presto
- Patterns: MapReduce, Actor Model, Data Pipeline
- Platforms: Cloudera, Pivotal, Amazon Redshift, Hortonworks, IBM, Google Compute Engine
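The MapReduce pattern listed above can be sketched in a few lines of plain Python. This is a single-process illustration (real MapReduce frameworks such as Hadoop run the map and reduce phases in parallel across a cluster, with the shuffle handled by the framework); the function names here are our own:

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.lower().split():
            yield (word, 1)

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: combine the values for each key -- here, sum the counts per word."""
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["the quick brown fox", "the lazy dog", "the fox"]
counts = reduce_phase(shuffle_phase(map_phase(docs)))
# counts["the"] == 3, counts["fox"] == 2
```

The value of the pattern is that the map and reduce functions are independent of how the data is partitioned, so the same logic scales from one machine to thousands.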
The Hadoop ecosystem is surrounded and complemented by many other tools. Some of them are covered in our series of courses, such as:
- Apache Mahout: a scalable machine learning and data mining library
- Apache Pig: a high-level data-flow language and execution framework for parallel computation
- Apache Spark: a fast and general compute engine for Hadoop data. Spark provides a simple and expressive programming model that supports a wide range of applications, including Extract, Transform and Load (ETL), machine learning, stream processing, and graph computation.
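To give a feel for the expressive programming model described above without requiring a Spark installation, the sketch below chains Spark-style transformations (`flatMap`, `map`, `reduceByKey`, `collect`) over plain Python lists. The `ToyRDD` class and its methods mirror the names in Spark's RDD API, but this is a single-machine analogue we wrote for illustration, not the real library:

```python
from itertools import chain

class ToyRDD:
    """A minimal, single-machine stand-in for Spark's RDD, for illustration only."""

    def __init__(self, data):
        self.data = list(data)

    def flatMap(self, fn):
        # Apply fn to each element and flatten the results, like RDD.flatMap.
        return ToyRDD(chain.from_iterable(fn(x) for x in self.data))

    def map(self, fn):
        # Apply fn to each element, like RDD.map.
        return ToyRDD(fn(x) for x in self.data)

    def reduceByKey(self, fn):
        # Combine values that share a key, like RDD.reduceByKey.
        acc = {}
        for key, value in self.data:
            acc[key] = fn(acc[key], value) if key in acc else value
        return ToyRDD(acc.items())

    def collect(self):
        # Materialize the results, like RDD.collect.
        return list(self.data)

lines = ToyRDD(["spark makes etl simple", "spark scales etl"])
counts = (lines
          .flatMap(lambda line: line.split())
          .map(lambda word: (word, 1))
          .reduceByKey(lambda a, b: a + b)
          .collect())
# dict(counts)["spark"] == 2, dict(counts)["etl"] == 2
```

In real Spark the same chain of calls is distributed across a cluster and evaluated lazily; the point here is only the shape of the API, which reads like a data pipeline.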
Where to go for information
Each project's website is a good place to start when you need detailed information about a particular tool. The websites we've listed below provide user documentation and other sources of support. We've also compiled a list of recommended books: the kind you're likely to find on a big data ninja's bookshelf, and that you might like to borrow from your library.