DevShelf — Search Engine from First Principles (Java)

Overview

DevShelf is a classical information retrieval system designed to demonstrate how modern search engines work beneath abstraction layers like Lucene or Elasticsearch.

It serves as the foundational counterpart to my RAG work, focusing on lexical retrieval, indexing theory, and ranking mechanics rather than language models.

DevShelf indexes and ranks Computer Science literature using offline preprocessing and deterministic scoring, prioritizing predictability, explainability, and performance.

⬇️ Download

DevShelf is distributed as a self-contained desktop application for Windows.

Windows Installer

64-bit installer (Windows 10 / 11)
No external dependencies required

Download (Windows)

Latest stable release · Windows 10 / 11 · 64-bit

DevShelf is designed as a read-only search system.
All indexing is performed offline; the runtime application only performs in-memory querying.

⚙️ System Architecture

DevShelf follows a split execution model that separates heavy computation from query-time execution.

Architectural Layers

Offline Indexing Layer (IndexerMain)
- Corpus traversal
- Text normalization
- Index construction
Online Query Layer (BookSearchEngine)
- In-memory retrieval
- Sub-millisecond response times
- Deterministic ranking

This separation mirrors how real-world search engines maintain low latency at scale.
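The split can be illustrated with a minimal sketch. It assumes the offline layer writes a Java-serialized term-to-document map and the query layer loads it once at startup, so every query afterwards is a pure in-memory lookup. The class and method names (SplitModelSketch, buildIndex, serialize, deserialize) are hypothetical and are not taken from IndexerMain or BookSearchEngine; using built-in Java object serialization is also an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.HashMap;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch of the split execution model (names are illustrative only).
class SplitModelSketch {

    // Offline layer: build a term -> sorted document-ID set from raw titles.
    static HashMap<String, TreeSet<Integer>> buildIndex(List<String> docs) {
        HashMap<String, TreeSet<Integer>> index = new HashMap<>();
        for (int docId = 0; docId < docs.size(); docId++) {
            for (String token : docs.get(docId).toLowerCase().split("\\W+")) {
                if (token.isEmpty()) continue; // split() can yield a leading empty token
                index.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
            }
        }
        return index;
    }

    // Offline layer: serialize the finished index (a byte array stands in for a file here).
    static byte[] serialize(HashMap<String, TreeSet<Integer>> index) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(index);
        }
        return bytes.toByteArray();
    }

    // Online layer: load the index once at startup; no indexing work happens at query time.
    @SuppressWarnings("unchecked")
    static HashMap<String, TreeSet<Integer>> deserialize(byte[] data)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (HashMap<String, TreeSet<Integer>>) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> docs = List.of("Introduction to Algorithms", "Algorithms in Java");
        byte[] snapshot = serialize(buildIndex(docs));                   // batch, offline
        HashMap<String, TreeSet<Integer>> index = deserialize(snapshot); // runtime startup
        System.out.println(index.getOrDefault("algorithms", new TreeSet<>())); // [0, 1]
    }
}
```

The design choice being illustrated is that all tokenization and map construction cost is paid once, before any user query arrives.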

Architecture Diagram

[Figure: DevShelf System Architecture]

🏗️ Offline Indexing

Index construction is treated as a batch operation to remove expensive computation from the runtime path.

Text Processing
Tokenization, stop-word removal, and stemming via a custom preprocessing pipeline.
Primary Data Structure
A positional inverted index that maps each term to the documents containing it and the token positions where it occurs, serialized for fast keyword-to-document lookup.
Design Goal
Shift complexity out of the query path to guarantee predictable performance.
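The preprocessing steps and the positional index can be sketched together. This is a minimal sketch under stated assumptions: the stop-word list is a tiny hypothetical sample, the stemmer is a crude suffix stripper standing in for whatever the custom pipeline actually does (it is not the Porter algorithm), and positions are counted before stop-word removal so that adjacency stays meaningful. The phrase method shows why positions are stored at all; all names are illustrative.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: preprocessing pipeline feeding a positional inverted index.
class PositionalIndexSketch {

    // Assumed stop-word list; a real pipeline would use a larger, curated list.
    static final Set<String> STOP_WORDS = Set.of("a", "an", "and", "in", "of", "the", "to");

    // term -> (docId -> positions of that term in the document)
    final Map<String, TreeMap<Integer, List<Integer>>> index = new HashMap<>();

    // Deliberately crude suffix stripper standing in for a real stemmer.
    static String stem(String word) {
        if (word.length() > 5 && word.endsWith("ing")) return word.substring(0, word.length() - 3);
        if (word.length() > 3 && word.endsWith("s") && !word.endsWith("ss"))
            return word.substring(0, word.length() - 1);
        return word;
    }

    // Tokenize -> drop stop words -> stem; returns (stem, original token position) pairs.
    static List<Map.Entry<String, Integer>> preprocess(String text) {
        List<Map.Entry<String, Integer>> terms = new ArrayList<>();
        int position = 0;
        for (String token : text.toLowerCase().split("[^a-z0-9]+")) {
            if (token.isEmpty()) continue;
            // Every token advances the position, so stop-word gaps are preserved.
            if (!STOP_WORDS.contains(token)) terms.add(Map.entry(stem(token), position));
            position++;
        }
        return terms;
    }

    void addDocument(int docId, String text) {
        for (Map.Entry<String, Integer> t : preprocess(text)) {
            index.computeIfAbsent(t.getKey(), k -> new TreeMap<>())
                 .computeIfAbsent(docId, k -> new ArrayList<>())
                 .add(t.getValue());
        }
    }

    // Documents where `first` is immediately followed by `second` (two-word phrase query).
    Set<Integer> phrase(String first, String second) {
        Map<Integer, List<Integer>> a = index.getOrDefault(stem(first.toLowerCase()), new TreeMap<>());
        Map<Integer, List<Integer>> b = index.getOrDefault(stem(second.toLowerCase()), new TreeMap<>());
        Set<Integer> hits = new TreeSet<>();
        for (Map.Entry<Integer, List<Integer>> e : a.entrySet()) {
            List<Integer> next = b.get(e.getKey());
            if (next == null) continue; // second term absent from this document
            for (int p : e.getValue()) {
                if (next.contains(p + 1)) { hits.add(e.getKey()); break; }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        PositionalIndexSketch idx = new PositionalIndexSketch();
        idx.addDocument(1, "Designing Data-Intensive Applications");
        idx.addDocument(2, "Data Structures and Algorithms");
        idx.addDocument(3, "Structures of Data");
        System.out.println(idx.index.get("data"));            // {1=[1], 2=[0], 3=[2]}
        System.out.println(idx.phrase("data", "structures")); // [2]
    }
}
```

In this example, document 3 contains both "data" and "structures" but is not returned by the phrase query, because the terms are not adjacent in that order. A plain (non-positional) index could not make that distinction.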

🔎 Query Processing

At runtime, DevShelf executes a multi-stage retrieval pipeline:

Lexical Retrieval
Candidate documents are retrieved directly from the inverted index.
Fuzzy Matching
Typographical errors are handled with Levenshtein distance-based correction.
Autocomplete
Query suggestions are generated using a Trie (prefix tree); reaching the node for a typed prefix takes O(L) time, where L is the prefix length.
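The two query-time helpers can be sketched in a few lines of Java. This is a hypothetical illustration, not DevShelf's code: the correction policy (replace a term with the closest vocabulary term within maxEdits, otherwise keep it) and all names are assumptions. The Levenshtein function is the standard dynamic-programming recurrence using two rolling rows; the trie shows that walking to the prefix node costs O(L), while listing completions adds work proportional to the part of the subtree visited.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of two query-time helpers: typo correction and prefix autocomplete.
class QueryHelpersSketch {

    // Standard DP edit distance (insert, delete, substitute each cost 1), two rolling rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp; // prev now holds row i
        }
        return prev[b.length()];
    }

    // Assumed policy: closest vocabulary term within maxEdits, else the original term.
    static String correct(String term, Collection<String> vocabulary, int maxEdits) {
        String best = term;
        int bestDist = maxEdits + 1;
        for (String candidate : vocabulary) {
            int d = levenshtein(term, candidate);
            if (d < bestDist) { best = candidate; bestDist = d; }
        }
        return best;
    }

    // Prefix tree; TreeMap children give alphabetical, deterministic completions.
    static final class Trie {
        private final Map<Character, Trie> children = new TreeMap<>();
        private boolean isWord;

        void insert(String word) {
            Trie node = this;
            for (char c : word.toCharArray()) node = node.children.computeIfAbsent(c, k -> new Trie());
            node.isWord = true;
        }

        // O(L) walk to the prefix node, then collect up to `limit` words below it.
        List<String> complete(String prefix, int limit) {
            Trie node = this;
            for (char c : prefix.toCharArray()) {
                node = node.children.get(c);
                if (node == null) return List.of(); // no indexed word has this prefix
            }
            List<String> out = new ArrayList<>();
            node.collect(new StringBuilder(prefix), out, limit);
            return out;
        }

        private void collect(StringBuilder path, List<String> out, int limit) {
            if (out.size() >= limit) return;
            if (isWord) out.add(path.toString());
            for (Map.Entry<Character, Trie> e : children.entrySet()) {
                path.append(e.getKey());
                e.getValue().collect(path, out, limit);
                path.deleteCharAt(path.length() - 1); // backtrack
            }
        }
    }

    public static void main(String[] args) {
        List<String> vocab = List.of("algorithm", "compiler", "database", "databases");
        System.out.println(levenshtein("algoritm", "algorithm")); // 1
        System.out.println(correct("compilr", vocab, 2));         // compiler
        Trie trie = new Trie();
        vocab.forEach(trie::insert);
        System.out.println(trie.complete("data", 5));             // [database, databases]
    }
}
```

Scanning the whole vocabulary in correct() is the simplest option; a larger vocabulary would typically need candidate pruning before computing edit distances.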

🧠 Hybrid Ranking Strategy

Document relevance is computed using a weighted scoring model that blends lexical relevance with user behavior.
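Using the weights listed in the breakdown, the blend can be written as a single formula. This is a sketch under assumptions: hats denote signals normalized to [0, 1] before weighting, and TF-IDF is shown in its plain summed form, where N is the number of documents and df(t) is the number of documents containing term t. The exact normalization and TF-IDF variant DevShelf uses are not specified here.

```latex
\text{score}(q, d) = 0.6\,\widehat{\text{tfidf}}(q, d) + 0.2\,\widehat{\text{pop}}(d) + 0.2\,\widehat{\text{rating}}(d),
\qquad
\text{tfidf}(q, d) = \sum_{t \in q} \text{tf}(t, d)\,\log\frac{N}{\text{df}(t)}
```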

Conceptual Scoring Breakdown:

Signal	Weight
TF-IDF	0.6
Popularity	0.2
User Rating	0.2

This approach demonstrates how classical IR systems evolve beyond pure keyword matching.
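The weighted blend can be made concrete with a small Java sketch. It assumes raw TF-IDF is scaled by the highest score in the corpus, popularity is a view count scaled by the maximum view count, and ratings use a 0-5 star scale. The Book record, its fields, and these normalizations are hypothetical. TF-IDF here is summed tf·ln(N/df) without cosine length normalization.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the hybrid score: 0.6 * TF-IDF + 0.2 * popularity + 0.2 * rating.
class HybridRankingSketch {

    // Weights from the scoring breakdown; every signal is assumed to be scaled to [0, 1].
    static final double W_TFIDF = 0.6, W_POPULARITY = 0.2, W_RATING = 0.2;

    // Illustrative document: preprocessed tokens plus assumed behavioral metadata.
    record Book(String title, List<String> tokens, int views, double rating) {}

    // Raw TF-IDF: sum over query terms of tf(t, d) * ln(N / df(t)).
    static double tfIdf(List<String> query, Book doc, List<Book> corpus) {
        double score = 0;
        for (String term : query) {
            long df = corpus.stream().filter(b -> b.tokens().contains(term)).count();
            if (df == 0) continue; // term absent from the corpus contributes nothing
            long tf = doc.tokens().stream().filter(term::equals).count();
            score += tf * Math.log((double) corpus.size() / df);
        }
        return score;
    }

    // (title, score) pairs for documents matching at least one query term, best first.
    static List<Map.Entry<String, Double>> rank(List<String> query, List<Book> corpus) {
        double maxTfIdf = corpus.stream().mapToDouble(b -> tfIdf(query, b, corpus)).max().orElse(0);
        int maxViews = corpus.stream().mapToInt(Book::views).max().orElse(0);
        List<Map.Entry<String, Double>> results = new ArrayList<>();
        for (Book b : corpus) {
            if (query.stream().noneMatch(b.tokens()::contains)) continue; // not a lexical candidate
            double lexical = maxTfIdf > 0 ? tfIdf(query, b, corpus) / maxTfIdf : 0;
            double popularity = maxViews > 0 ? (double) b.views() / maxViews : 0;
            double rating = b.rating() / 5.0; // assumes a 0-5 star scale
            results.add(Map.entry(b.title(),
                    W_TFIDF * lexical + W_POPULARITY * popularity + W_RATING * rating));
        }
        results.sort(Map.Entry.<String, Double>comparingByValue().reversed());
        return results;
    }

    public static void main(String[] args) {
        List<Book> corpus = List.of(
                new Book("Algorithms Unlocked", List.of("algorithms", "unlocked"), 100, 4.0),
                new Book("Introduction to Algorithms", List.of("introduction", "algorithms"), 900, 4.5),
                new Book("Clean Code", List.of("clean", "code"), 500, 4.8));
        // Both matches tie on TF-IDF, so popularity and rating decide the order.
        for (Map.Entry<String, Double> e : rank(List.of("algorithms"), corpus)) {
            System.out.printf("%.3f  %s%n", e.getValue(), e.getKey());
        }
    }
}
```

With this sample data, both matching titles receive the same normalized TF-IDF score, so the behavioral signals decide the order: "Introduction to Algorithms" ranks first because of its higher view count and rating, and "Clean Code" is excluded because it matches no query term.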

👥 Engineering Ownership

DevShelf was built as a focused systems project with clear ownership boundaries.

Role	Engineer	Scope
Lead Architect	Muhammad Qasim	Search engine design, indexing algorithms, ranking logic
Frontend Engineer	Nancy Chawla	JavaFX UI, application flow
Feature Engineer	Ritika Lund	Recommendations, filtering, sorting

Positioning Within My Work

DevShelf represents my foundation in classical search systems:

Inverted indices
Vector space models
Ranking theory
Query optimization

These principles directly inform my work on modern Retrieval-Augmented Generation systems, where retrieval quality determines downstream LLM accuracy.

Related System

MQNotebook: Enterprise-Grade RAG System
https://kas-sim.github.io/systems/mqnotebook/

Documentation

For detailed implementation notes, algorithms, and design rationale:

https://github.com/Kas-sim/DevShelf/tree/main/documentation
