Spark MLlib implements a wide variety of ML algorithms. Because resources are limited, we will refrain from using sophisticated methods and rely mainly on the simplest algorithms. The focus of this lab is on pipelines, transformations, estimators, and model selection.
The goal is to implement a naive text indexer and document retriever. These modules are commonly found in search engines. Conceptually, search engines were the first to tackle the problem of Big Data under the constraint of low-latency response. Imagine an average search engine with millions of documents in its index. Every second it receives hundreds to thousands of queries and must produce, for each one, a list of the most relevant documents at sub-millisecond speed.
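To make the indexer and retriever idea concrete, here is a minimal single-machine sketch in plain Python (not Spark). It builds an inverted index and ranks documents with a simple TF-IDF sum. The function names (`tokenize`, `build_index`, `retrieve`), the sample documents, and the particular IDF formula are illustrative assumptions, not the implementation the lab requires.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build an inverted index: term -> {doc_id: term count}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, count in Counter(tokenize(text)).items():
            index[term][doc_id] = count
    return index


def retrieve(query, index, n_docs, top_k=3):
    """Rank documents by a simple TF-IDF sum over the query terms."""
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue  # term does not occur in any document
        # Assumed smoothing: rarer terms get a larger weight.
        idf = math.log(1 + n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Highest score first; ties are broken by document id.
    return sorted(scores, key=lambda d: (-scores[d], d))[:top_k]


# Hypothetical toy corpus.
docs = {
    "d1": "Spark makes big data processing fast",
    "d2": "Search engines index millions of documents",
    "d3": "A search engine must answer queries with low latency",
}
index = build_index(docs)
results = retrieve("search engine latency", index, n_docs=len(docs))
print(results)
```

Only the documents listed under the query terms are touched at query time, which is what keeps lookups fast as the collection grows. Note that this sketch does no stemming, so "engine" and "engines" are treated as different terms.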
The goal of this task is to finalize the implementation of a movie recommendation system. In the process, you will get more experience with programming in Scala and working with RDDs.
Spark Streaming provides an abstraction over streams that lets you process stream data almost identically to RDDs or DataFrames. The most common (and easiest to work with) stream format is the Discretized Stream (DStream). Alternatively, you can convert your stream to a Spark DataFrame and process it using SQL operations. All the details on how to create a stream object, the list of transformations implemented for streams, and the ways to convert a DStream to a DataFrame are provided in the official programming guide. Read it through before beginning your work.
Automatic gain control generally consists of a loop, which can be either a feedback loop or a feed-forward loop. The components are always the same: a detector, a comparator, and a gain controller.
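The feedback variant of such a loop can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions: the detector is taken to be the mean absolute value of a block of samples, the gain update is a simple proportional step, and the names `detect_level` and `agc_feedback`, the target level, and the update rate are all hypothetical choices.

```python
def detect_level(samples):
    """Detector: estimate the signal level as the mean absolute value."""
    return sum(abs(s) for s in samples) / len(samples)


def agc_feedback(blocks, target=1.0, rate=0.5, gain=1.0):
    """Run a feedback gain-control loop over blocks of samples.

    The level is measured on the output, compared with the target,
    and the error is fed back to adjust the gain for the next block.
    """
    history = []
    for block in blocks:
        out = [gain * s for s in block]
        level = detect_level(out)             # detector
        error = target - level                # comparator
        gain = max(0.0, gain + rate * error)  # gain controller
        history.append((level, gain))
    return history


# A constant-amplitude input that is four times too quiet for the target.
blocks = [[0.25, -0.25, 0.25, -0.25]] * 40
history = agc_feedback(blocks)
final_level, final_gain = history[-1]
print(final_level, final_gain)
```

With this input the gain settles near 4, which brings the output level close to the target of 1. A feed-forward version would instead measure the level on the input block and compute the gain directly, without using the output.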