Search Engine Architecture

The Math Behind the Magic: Vector Space Models in Java

When I built DevShelf, I didn’t want to just “find” strings. I wanted to rank books by relevance. To do this, I implemented the Vector Space Model. The Core Problem: A naive search checks whether Book.contains("Java") is true. A real search engine asks: “How relevant is this book to the query ‘Java’ compared to all other books?” To solve this, I engineered the QueryProcessor class to treat every book as a vector in a multidimensional space. ...

January 10, 2024 · 1 min · Muhammad Qasim
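
To make the "book as a vector" idea concrete, here is a minimal sketch. It assumes raw term counts as vector weights and cosine similarity as the relevance score; the excerpt does not say which weighting or similarity measure DevShelf's QueryProcessor uses, so the class name VectorSpaceSketch and its methods are hypothetical and purely illustrative.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not DevShelf's QueryProcessor.
class VectorSpaceSketch {

    // Build a sparse vector: term -> number of occurrences in the token list.
    // Assumption: raw counts as weights; a real engine may weight terms differently.
    static Map<String, Double> termFrequencies(List<String> tokens) {
        Map<String, Double> vector = new HashMap<>();
        for (String token : tokens) {
            vector.merge(token, 1.0, Double::sum);
        }
        return vector;
    }

    // Euclidean length of a sparse vector.
    static double norm(Map<String, Double> vector) {
        double sumOfSquares = 0.0;
        for (double weight : vector.values()) {
            sumOfSquares += weight * weight;
        }
        return Math.sqrt(sumOfSquares);
    }

    // Cosine similarity between two sparse vectors; 0.0 if either one is empty.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        for (Map.Entry<String, Double> entry : a.entrySet()) {
            dot += entry.getValue() * b.getOrDefault(entry.getKey(), 0.0);
        }
        double normA = norm(a);
        double normB = norm(b);
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / (normA * normB);
    }

    public static void main(String[] args) {
        Map<String, Double> query = termFrequencies(List.of("java"));
        Map<String, Double> javaBook = termFrequencies(List.of("java", "java", "concurrency"));
        Map<String, Double> otherBook = termFrequencies(List.of("python", "cookbook"));

        // The book that shares terms with the query scores higher than the one that does not.
        System.out.println("java book:  " + cosine(query, javaBook));
        System.out.println("other book: " + cosine(query, otherBook));
    }
}
```

Sorting books by this score in descending order gives a ranked result list instead of the yes/no answer that a contains check produces.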

From O(N) to O(1): Building a Java Search Engine from Scratch

Note: This is an architectural deep-dive into the core indexing engine of DevShelf. The Problem with Linear Search: When I started building DevShelf, the naive approach was simple: load all books into a List<Book> and loop through them, checking if (book.contains(query)). For 10 books, this is fine. For 1,000 books, it’s slow. For 1,000,000 books, the system crashes. This is an $O(N \cdot M)$ operation, where $N$ is the number of books and $M$ is the number of words. We needed $O(1)$. ...

January 1, 2024 · 12 min · Muhammad Qasim
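
The excerpt above stops before showing how the lookup becomes $O(1)$. One common way to get expected constant-time lookup per term is a hash-based inverted index, sketched below. This is an assumption about the general technique, not DevShelf's actual indexing engine; the class InvertedIndexSketch, its methods, and the integer book IDs are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration, not DevShelf's indexing engine.
class InvertedIndexSketch {

    // term -> IDs of the books whose text contains that term
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Indexing cost is paid once per book: split the text into lowercase
    // tokens and record the book ID under each token.
    // Assumption: splitting on non-word characters is a good enough tokenizer here.
    void add(int bookId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (token.isEmpty()) {
                continue; // split can yield a leading empty string
            }
            postings.computeIfAbsent(token, key -> new LinkedHashSet<>()).add(bookId);
        }
    }

    // Query cost is one hash lookup (expected O(1)), independent of how many
    // books were indexed. Returns an empty set for unknown terms.
    Set<Integer> lookup(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "Effective Java");
        index.add(2, "Fluent Python");
        index.add(3, "Java Concurrency in Practice");
        System.out.println(index.lookup("java"));
    }
}
```

The trade-off is that the work moves from query time to indexing time: every book is tokenized once up front, and memory grows with the number of distinct terms.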