Learning to Key-Value-Store from Scratch

In the life of every Machine Learning researcher, there is a moment when datasets get out of hand. For me, this moment came when I had accumulated about 70 GB of text data and needed a way to access it efficiently, especially since this much data does not fit into the memory of a regular computer. When providing data for training an ML model, it is easy to stream the data from the disk. However, more advanced operations, like sorting texts by length or querying sentences with specific words, are not available when streaming.
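To make the gap between streaming and random access concrete, here is a minimal sketch of a byte-offset index over a line-oriented text file. It is an illustration only, not the key-value store built in the post: the function names are hypothetical, and it assumes one text per line, UTF-8 encoding, and an index that fits in memory even though the texts do not.

```python
import os


def build_offset_index(path):
    """Scan the file once and record the byte offset where each line starts."""
    offsets = []
    position = 0
    with open(path, "rb") as f:
        for line in f:
            offsets.append(position)
            position += len(line)
    return offsets


def read_line(path, offsets, i):
    """Fetch line i by seeking straight to it instead of streaming the file."""
    with open(path, "rb") as f:
        f.seek(offsets[i])
        return f.readline().decode("utf-8").rstrip("\n")


def line_ids_sorted_by_length(path, offsets):
    """Order line ids by byte length using only the index, not the texts."""
    file_size = os.path.getsize(path)
    ends = offsets[1:] + [file_size]
    lengths = [end - start for start, end in zip(offsets, ends)]
    return sorted(range(len(offsets)), key=lambda i: lengths[i])
```

With the index in hand, sorting by length touches only integers, and each text is read from disk when it is actually needed. Lengths here are in bytes and include the trailing newline, which is an approximation of text length for non-ASCII data.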

Graph Databases

The need for data management has existed since the dawn of computers, and over the years the amount of data to manage has kept growing. Relational Database Management Systems are mature tools that can manage millions of records and store them efficiently. However, SQL-compliant databases fall short when the number of records increases dramatically. Historically, data management solutions for Big Data were restrictive and did not implement all SQL functionality. This especially concerns joins, which require a lot of storage and computational resources. By dropping the requirement for SQL support, NoSQL solutions let people use a data model that suits their needs better than a traditional RDBMS.

Scalable ML with Spark

Epinions.com was an online marketplace where users posted customer reviews of goods. After reading each other’s reviews, users could mark each other as trusted. The resulting social network can be represented as a graph where each node represents a user and a directed edge represents trust. We would like to create a recommendation system on this graph. For each user (node), we want to recommend 10 other users (nodes) to trust.
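One simple baseline for this task can be sketched in plain Python. This is not necessarily the method used in the post, and it runs on a single machine rather than on Spark. It assumes the graph is given as (source, target) pairs meaning "source trusts target", and it scores each candidate by the number of users the target user already trusts who in turn trust that candidate. The function names are hypothetical.

```python
from collections import Counter, defaultdict


def build_trust_graph(edges):
    """Map each user to the set of users they trust (outgoing edges)."""
    trusts = defaultdict(set)
    for source, target in edges:
        trusts[source].add(target)
    return trusts


def recommend(trusts, user, k=10):
    """Recommend up to k users reachable in two trust steps from `user`."""
    already_trusted = trusts.get(user, set())
    scores = Counter()
    for intermediate in already_trusted:
        for candidate in trusts.get(intermediate, set()):
            # Skip the user themselves and anyone they already trust.
            if candidate != user and candidate not in already_trusted:
                scores[candidate] += 1
    # Highest score first; ties are broken by user id for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [candidate for candidate, _ in ranked[:k]]
```

A user who trusts nobody gets an empty list under this scheme, so a real system would need a fallback such as recommending the most trusted users overall.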



© 2022. Vitaly Romanov
