Overview

DevShelf is a classical information retrieval system designed to demonstrate how modern search engines work beneath abstraction layers like Lucene or Elasticsearch.

It serves as the foundational counterpart to my Retrieval-Augmented Generation (RAG) work, focusing on lexical retrieval, indexing theory, and ranking mechanics rather than language models.

DevShelf indexes and ranks Computer Science literature using offline preprocessing and deterministic scoring, prioritizing predictability, explainability, and performance.


⬇️ Download

DevShelf is distributed as a self-contained desktop application for Windows.

Windows Installer

  • 64-bit installer (Windows 10 / 11)
  • No external dependencies required
  • Latest stable release

DevShelf is designed as a read-only search system.
All indexing is performed offline; the runtime application only performs in-memory querying.


⚙️ System Architecture

DevShelf follows a split execution model that separates heavy computation from query-time execution.

Architectural Layers

  1. Offline Indexing Layer (IndexerMain)

    • Corpus traversal
    • Text normalization
    • Index construction
  2. Online Query Layer (BookSearchEngine)

    • In-memory retrieval
    • Sub-millisecond response times
    • Deterministic ranking

This separation mirrors how production search engines keep query latency low: index builds run as batch jobs, and the serving path only reads prebuilt structures.
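As a rough illustration of the split, the sketch below writes a prebuilt index to a stream in one step and reads it back in another. It assumes Java's built-in object serialization and a simple term-to-document-id map; the class name `IndexStore` and both method names are hypothetical and not taken from the DevShelf code base.

```java
import java.io.*;
import java.util.*;

/** Hypothetical helper showing the offline/online hand-off; not DevShelf's actual API. */
final class IndexStore {

    /** Offline side: persist a finished index. In practice the sink would be a file stream. */
    static void save(HashMap<String, ArrayList<Integer>> index, OutputStream sink) {
        try (ObjectOutputStream out = new ObjectOutputStream(sink)) {
            out.writeObject(index);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Online side: load the index into memory once at startup; the corpus is never touched. */
    @SuppressWarnings("unchecked")
    static HashMap<String, ArrayList<Integer>> load(InputStream source) {
        try (ObjectInputStream in = new ObjectInputStream(source)) {
            return (HashMap<String, ArrayList<Integer>>) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With this shape, the indexer would call `save` at the end of a batch run and the query application would call `load` once before serving any request.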


Architecture Diagram

Figure: DevShelf System Architecture


🏗️ Offline Indexing

Index construction is treated as a batch operation to remove expensive computation from the runtime path.

  • Text Processing
    Tokenization, stop-word removal, and stemming via a custom preprocessing pipeline.

  • Primary Data Structure
    A positional inverted index that maps each term to the documents and token positions where it occurs, serialized so the runtime can load it without re-indexing.

  • Design Goal
    Shift complexity out of the query path to guarantee predictable performance.
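The indexing steps above can be sketched as follows. This is a minimal illustration, not the project's implementation: the class name `PositionalIndex`, the tiny stop-word list, and the tokenization rule are assumptions, and stemming is left out entirely.

```java
import java.util.*;

/** Hypothetical positional inverted index; names and stop-word list are illustrative. */
final class PositionalIndex {
    private static final Set<String> STOP_WORDS = Set.of("a", "an", "and", "of", "the", "to", "in");

    // term -> (docId -> positions of the term in that document, counted after stop-word removal)
    private final Map<String, Map<Integer, List<Integer>>> postings = new HashMap<>();

    /** Lowercases, splits on non-alphanumerics, and drops stop words. Stemming is omitted. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!raw.isEmpty() && !STOP_WORDS.contains(raw)) tokens.add(raw);
        }
        return tokens;
    }

    /** Adds one document, recording the position of every surviving token. */
    void add(int docId, String text) {
        List<String> tokens = tokenize(text);
        for (int pos = 0; pos < tokens.size(); pos++) {
            postings.computeIfAbsent(tokens.get(pos), t -> new TreeMap<>())
                    .computeIfAbsent(docId, d -> new ArrayList<>())
                    .add(pos);
        }
    }

    /** Positions of a term inside one document, or an empty list if it does not occur. */
    List<Integer> positions(String term, int docId) {
        return postings.getOrDefault(term, Map.of()).getOrDefault(docId, List.of());
    }

    /** Documents containing every query term (boolean AND over the posting lists). */
    Set<Integer> search(String query) {
        Set<Integer> result = null;
        for (String term : tokenize(query)) {
            Set<Integer> docs = postings.getOrDefault(term, Map.of()).keySet();
            if (result == null) result = new TreeSet<>(docs);
            else result.retainAll(docs);
        }
        return result == null ? Set.of() : result;
    }
}
```

Storing positions costs more memory than a plain term-to-document map, but it is what makes phrase and proximity checks possible at query time.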


🔎 Query Processing

At runtime, DevShelf executes a multi-stage retrieval pipeline:

  • Lexical Retrieval
    Candidate documents are retrieved directly from the inverted index.

  • Fuzzy Matching
    Typographical errors are handled using Levenshtein Distance–based correction.

  • Autocomplete
    Query suggestions are generated using a Trie (Prefix Tree) with O(L) lookup complexity, where L is the length of the typed prefix.
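The fuzzy-matching and autocomplete stages can be illustrated with a short sketch. Everything here is an assumption made for illustration: the class `QueryAssist`, the `maxDistance` threshold, and the choice to return suggestions in lexicographic order are not taken from DevShelf, which may rank or limit candidates differently.

```java
import java.util.*;

/** Hypothetical query helpers: edit-distance correction and trie-based autocomplete. */
final class QueryAssist {

    /** Dynamic-programming Levenshtein distance using two rolling rows. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int substitution = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                curr[j] = Math.min(substitution, Math.min(prev[j], curr[j - 1]) + 1);
            }
            int[] swap = prev; prev = curr; curr = swap;
        }
        return prev[b.length()];
    }

    /** Closest vocabulary term within maxDistance edits, or the input itself if none qualifies. */
    static String correct(String term, Collection<String> vocabulary, int maxDistance) {
        String best = term;
        int bestDistance = maxDistance + 1;
        for (String candidate : vocabulary) {
            int d = levenshtein(term, candidate);
            if (d < bestDistance) { best = candidate; bestDistance = d; }
        }
        return best;
    }

    /** Minimal trie; reaching the prefix node costs O(L) for a prefix of length L. */
    static final class Trie {
        private final Map<Character, Trie> children = new TreeMap<>();
        private boolean terminal;

        void insert(String word) {
            Trie node = this;
            for (char c : word.toCharArray()) node = node.children.computeIfAbsent(c, k -> new Trie());
            node.terminal = true;
        }

        /** Up to `limit` completions of `prefix`, in lexicographic order. */
        List<String> suggest(String prefix, int limit) {
            Trie node = this;
            for (char c : prefix.toCharArray()) {
                node = node.children.get(c);
                if (node == null) return List.of();
            }
            List<String> out = new ArrayList<>();
            node.collect(new StringBuilder(prefix), out, limit);
            return out;
        }

        private void collect(StringBuilder path, List<String> out, int limit) {
            if (out.size() >= limit) return;
            if (terminal) out.add(path.toString());
            for (Map.Entry<Character, Trie> e : children.entrySet()) {
                path.append(e.getKey().charValue());
                e.getValue().collect(path, out, limit);
                path.setLength(path.length() - 1);
            }
        }
    }
}
```

Note that the O(L) bound covers only the walk to the prefix node; collecting the completions beneath it adds work proportional to the size of the subtree visited, which is why the sketch caps results with `limit`.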


🧠 Hybrid Ranking Strategy

Document relevance is computed using a weighted scoring model that blends lexical relevance with user behavior.

Conceptual Scoring Breakdown:

Signal         Weight
TF-IDF         0.6
Popularity     0.2
User Rating    0.2

This approach demonstrates how classical IR systems evolve beyond pure keyword matching.
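The weighted blend in the table above can be sketched as a small helper. The weights come from the table; everything else is an assumption: the class name `HybridScore`, the specific TF-IDF variant (raw term frequency times the natural log of N/df), and the requirement that each signal be normalized to the range [0, 1] before blending.

```java
/** Hypothetical scoring helper using the weights from the conceptual breakdown. */
final class HybridScore {
    static final double W_TFIDF = 0.6;
    static final double W_POPULARITY = 0.2;
    static final double W_RATING = 0.2;

    /** One common TF-IDF variant: raw term frequency times ln(totalDocs / docFrequency). */
    static double tfIdf(int termFrequency, int docFrequency, int totalDocs) {
        if (termFrequency == 0 || docFrequency == 0) return 0.0;
        return termFrequency * Math.log((double) totalDocs / docFrequency);
    }

    /** Weighted blend; every input is assumed to be pre-normalized to [0, 1]. */
    static double blend(double tfIdfNorm, double popularityNorm, double ratingNorm) {
        return W_TFIDF * tfIdfNorm + W_POPULARITY * popularityNorm + W_RATING * ratingNorm;
    }
}
```

Because the weights sum to 1.0, a document that scores 1.0 on every normalized signal gets a final score of 1.0, and the lexical signal alone can contribute at most 0.6.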


👥 Engineering Ownership

DevShelf was built as a focused systems project with clear ownership boundaries.

Role                Engineer          Scope
Lead Architect      Muhammad Qasim    Search engine design, indexing algorithms, ranking logic
Frontend Engineer   Nancy Chawla      JavaFX UI, application flow
Feature Engineer    Ritika Lund       Recommendations, filtering, sorting

Positioning Within My Work

DevShelf represents my foundation in classical search systems:

  • Inverted indices
  • Vector space models
  • Ranking theory
  • Query optimization

These principles directly inform my work on modern Retrieval-Augmented Generation systems, where retrieval quality determines downstream LLM accuracy.



Documentation

For detailed implementation notes, algorithms, and design rationale:

https://github.com/Kas-sim/DevShelf/tree/main/documentation