Archivarius: Intelligent File Librarian
2024-2026
AI, VLM, LLM, Semantic Search, Archive Analysis, Computer Vision, Metadata, Deduplication, Python
Archivarius is an intelligent platform designed to transform chaotic file archives into structured, searchable knowledge bases. Using a hybrid approach of fast heuristics and multimodal AI (VLM/LLM), it 'sees' photos, 'watches' videos, and 'reads' documents to determine their value and context.
Core Capabilities:
- Multimodal AI Analysis:
  - Vision: Uses vision-language models (VLMs) to generate detailed natural-language descriptions of images.
  - Video: Automatically extracts keyframes and assembles them into collages for AI-driven content classification (for example, detecting drone or dashcam footage).
  - Books & Docs: Full support for DJVU, FB2, EPUB, and PDF, including metadata extraction and semantic content summaries.
  - RAW Support: Processes Nikon NEF files by extracting the embedded preview image.
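As a rough illustration of the video step, the sketch below picks a handful of frames and works out a collage layout for them. It assumes evenly spaced sampling as a stand-in for true keyframe detection, and both function names (`sample_frame_indices`, `collage_grid`) are hypothetical, not taken from Archivarius itself.

```python
import math


def sample_frame_indices(total_frames, count):
    """Pick `count` frame indices spread evenly across the video.

    Assumption: even spacing is a simple stand-in for keyframe detection.
    Each index is taken from the middle of its segment.
    """
    if total_frames <= 0 or count <= 0:
        return []
    count = min(count, total_frames)
    step = total_frames / count
    return [int(step * i + step / 2) for i in range(count)]


def collage_grid(n_tiles):
    """Return (rows, cols) for a near-square grid that fits n_tiles images."""
    if n_tiles <= 0:
        return (0, 0)
    cols = math.ceil(math.sqrt(n_tiles))
    rows = math.ceil(n_tiles / cols)
    return (rows, cols)


indices = sample_frame_indices(total_frames=100, count=4)
print(indices, collage_grid(len(indices)))
```

The selected frames would then be decoded, tiled into a single image using the grid shape, and passed to the classifier as one input.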
- Intelligent Quality Control (QA):
  - Automatic detection of out-of-focus (blurry) shots using Laplacian variance.
  - Exposure analysis to identify underexposed or overexposed frames.
  - 'Junk' detection to filter out lens cap shots, accidental pocket photos, or empty textures.
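The blur and exposure checks can be sketched in a few lines. The example below works on a grayscale image given as a plain list of rows (values 0 to 255) so that it runs without extra dependencies; a real pipeline would more likely use an image library. The thresholds (`100.0`, `40`, `215`) and function names are illustrative assumptions, not the project's actual settings.

```python
from statistics import fmean, pvariance


def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian over the interior pixels.

    Sharp images have strong edges, so the Laplacian response varies a lot;
    blurry images give a response close to zero everywhere.
    """
    h, w = len(gray), len(gray[0])
    responses = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1]
        - 4 * gray[y][x]
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    ]
    return pvariance(responses)


def is_blurry(gray, threshold=100.0):
    """Flag the image as blurry when the Laplacian variance is low.

    The threshold is an assumed value and would need tuning per archive.
    """
    return laplacian_variance(gray) < threshold


def exposure_label(gray, dark=40, bright=215):
    """Classify exposure from mean brightness (assumed cut-offs)."""
    mean = fmean([v for row in gray for v in row])
    if mean < dark:
        return "underexposed"
    if mean > bright:
        return "overexposed"
    return "ok"


flat = [[128] * 8 for _ in range(8)]  # uniform gray: no edges at all
checker = [[255 if (x + y) % 2 else 0 for x in range(8)] for y in range(8)]
print(is_blurry(flat), is_blurry(checker), exposure_label(flat))
```

A frame that is both very dark and nearly edge-free is a plausible lens-cap candidate, so the two measures can be combined for simple junk detection.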
- Semantic Search & UI:
  - Natural Language Search: A dedicated Web UI section for finding files by meaning (e.g., 'mountain landscapes' or 'utility bills') using vector embeddings (ChromaDB).
  - Interactive Dashboard: Real-time statistics on archive health, value density scores, and processing status.
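To show what searching by meaning amounts to, here is a minimal sketch of nearest-neighbour ranking by cosine similarity. It is a plain-Python stand-in for the lookup a vector store such as ChromaDB performs, not the ChromaDB API. The file paths and three-dimensional vectors are made up for illustration; real embeddings come from a model and have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vec, index, top_k=3):
    """Return the top_k paths whose embeddings are closest to the query."""
    ranked = sorted(
        index.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [path for path, _ in ranked[:top_k]]


# Hypothetical toy index: path -> embedding of the file's AI-generated description.
index = {
    "photos/alps_sunrise.nef": [0.9, 0.1, 0.0],
    "scans/electricity_bill.pdf": [0.0, 0.1, 0.9],
    "video/dashcam_0412.mp4": [0.1, 0.9, 0.1],
}
query = [0.8, 0.2, 0.1]  # hypothetical embedding of 'mountain landscapes'
print(search(query, index, top_k=1))
```

Because the query and the file descriptions are embedded into the same vector space, a search can match a file even when none of the query words appear in its name.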
- Advanced Deduplication: Identifies exact duplicates across drives and directories and designates one copy as the primary version, so that redundant copies can be removed to reclaim storage space.
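Exact-duplicate detection is commonly done by grouping files by size and then comparing content hashes. The sketch below follows that approach under stated assumptions: SHA-256 as the hash, non-overlapping root directories, and a primary copy chosen as the shallowest path (ties broken alphabetically). The rule Archivarius actually uses to pick the primary version is not specified here, and the function names are hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large files do not fill memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()


def find_exact_duplicates(roots):
    """Map each primary file to the list of its byte-identical copies.

    Assumes the roots do not overlap, so no file is visited twice.
    """
    # Cheap first pass: only files of equal size can be identical.
    by_size = defaultdict(list)
    for root in roots:
        for p in Path(root).rglob("*"):
            if p.is_file():
                by_size[p.stat().st_size].append(p)

    # Expensive second pass: hash only the size collisions.
    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) > 1:
            for p in paths:
                by_hash[file_digest(p)].append(p)

    plan = {}
    for paths in by_hash.values():
        if len(paths) > 1:
            # Assumed primary rule: shallowest path, then alphabetical order.
            primary, *redundant = sorted(
                paths, key=lambda p: (len(p.parts), str(p))
            )
            plan[primary] = redundant
    return plan
```

Calling `find_exact_duplicates(["/mnt/disk1", "/mnt/disk2"])` (example paths) returns a plan that can be reviewed before anything is deleted, which keeps the destructive step separate from detection.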
Archivarius acts as a digital archaeologist, uncovering forgotten value in terabytes of data while automating the tedious task of manual sorting.