
Smart Search Feature Documentation

Version: 1.0 | Date: December 23, 2025 | Status: ✅ Implemented


Overview

Smart Search combines company search with data synchronization. When a user searches for a company, the service looks it up in the database, checks how fresh its data is, and automatically triggers scraping jobs if the data is stale or missing.


Architecture

┌─────────────────────────────────────────────────────────────────┐
│                      Frontend (Header.jsx)                      │
│                          Search Input                           │
└───────────────────────────┬─────────────────────────────────────┘
                            │ GET /api/search/smart?q=Audi
                            ▼
┌─────────────────────────────────────────────────────────────────┐
│                      Backend API (main.py)                      │
│                   /api/search/smart endpoint                    │
└───────────────────────────┬─────────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────────┐
│              SmartSearchService (smart_search.py)               │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐  │
│  │  DB Lookup  │  │  Freshness  │  │  Auto-trigger Scraping  │  │
│  │  (find co.) │  │    Check    │  │   (if stale/missing)    │  │
│  └─────────────┘  └─────────────┘  └─────────────────────────┘  │
└───────────────────────────┬─────────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────────┐
│                     Cloud SQL (PostgreSQL)                      │
│         reviews, company_profiles, scraping_jobs tables         │
└─────────────────────────────────────────────────────────────────┘

API Endpoint

GET /api/search/smart

Parameters:

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| q | string | Yes | - | Company name to search for |
| country | string | No | "de" | Country code (de, at, ch) |
| auto_scrape | bool | No | true | Auto-trigger scraping if stale |
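As a rough illustration of how a client might assemble this request, the sketch below builds the query string with the standard library. The helper name `build_smart_search_url` and the `base_url` argument are hypothetical; only the path, parameter names, and defaults come from the table above.

```python
from urllib.parse import urlencode


def build_smart_search_url(query, country="de", auto_scrape=True, base_url=""):
    """Return the URL for GET /api/search/smart (hypothetical helper)."""
    if not query or not query.strip():
        raise ValueError("q is required")
    params = {
        "q": query.strip(),
        "country": country,
        # Lowercase "true"/"false" mirrors the value the frontend sends.
        "auto_scrape": "true" if auto_scrape else "false",
    }
    # urlencode percent-encodes values and turns spaces into "+".
    return f"{base_url}/api/search/smart?{urlencode(params)}"


print(build_smart_search_url("Audi"))
print(build_smart_search_url("Deutsche Bahn", country="at", auto_scrape=False))
```

Passing `auto_scrape=False` is useful when a caller only wants to inspect the data status without starting new scraping jobs.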

Response:

{
  "status": "found|refreshing|not_found|error",
  "company_name": "Audi",
  "company_slug": "audi",
  "data_status": "fresh|stale|missing|unknown",
  "review_count": 380,
  "last_updated": "2025-12-13",
  "scraping_jobs": ["uuid1", "uuid2"],
  "message": "Human-readable status message"
}

Status Values:

  • found - Company exists with fresh data
  • refreshing - Data is stale; scraping jobs triggered
  • not_found - Company not in database and not on Kununu
  • error - An error occurred during search

Data Status Values:

  • fresh - Data updated within last 7 days
  • stale - Data older than 7 days, or fewer than 50 reviews
  • missing - No data exists for this company
  • unknown - Unable to determine data status
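A consumer of this endpoint may want to validate the two status fields instead of comparing raw strings. The following is a minimal sketch of one way to do that; the class and function names are hypothetical, and which fields may be absent from a response is an assumption, since this document does not specify it.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class SearchStatus(str, Enum):
    FOUND = "found"            # company exists with fresh data
    REFRESHING = "refreshing"  # data is stale; scraping jobs triggered
    NOT_FOUND = "not_found"    # not in the database and not on Kununu
    ERROR = "error"            # an error occurred during search


class DataStatus(str, Enum):
    FRESH = "fresh"
    STALE = "stale"
    MISSING = "missing"
    UNKNOWN = "unknown"


@dataclass
class SmartSearchResult:
    status: SearchStatus
    data_status: DataStatus = DataStatus.UNKNOWN
    company_name: Optional[str] = None
    company_slug: Optional[str] = None
    review_count: int = 0
    last_updated: Optional[str] = None  # date string, e.g. "2025-12-13"
    scraping_jobs: List[str] = field(default_factory=list)
    message: str = ""


def parse_result(payload: dict) -> SmartSearchResult:
    """Build a typed result; unrecognized status values raise ValueError."""
    return SmartSearchResult(
        status=SearchStatus(payload["status"]),
        # Assumption: a missing data_status is treated as "unknown".
        data_status=DataStatus(payload.get("data_status") or "unknown"),
        company_name=payload.get("company_name"),
        company_slug=payload.get("company_slug"),
        review_count=payload.get("review_count") or 0,
        last_updated=payload.get("last_updated"),
        scraping_jobs=list(payload.get("scraping_jobs") or []),
        message=payload.get("message") or "",
    )


result = parse_result({
    "status": "refreshing",
    "company_name": "Audi",
    "company_slug": "audi",
    "data_status": "stale",
    "review_count": 380,
    "last_updated": "2025-12-13",
    "scraping_jobs": ["uuid1", "uuid2"],
    "message": "Human-readable status message",
})
if result.status is SearchStatus.REFRESHING:
    print(f"{result.company_name}: {len(result.scraping_jobs)} scraping jobs triggered")
```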

SmartSearchService Methods

search(query, country, auto_scrape)

Main entry point. Searches for a company and returns data status.

_find_company_in_db(query)

Searches the database for a matching company, trying in order:

  1. Exact match on company_name
  2. Fuzzy match using ILIKE
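The two-step lookup can be sketched as follows. This is not the service's actual query: it runs against an in-memory SQLite database so the example is self-contained, and SQLite's `LIKE` (case-insensitive for ASCII) stands in for PostgreSQL's `ILIKE`. The choice of the `company_profiles` table, the wildcard escaping, and the "shortest name wins" tie-break are all assumptions made for illustration.

```python
import sqlite3
from typing import Optional


def escape_like(term: str) -> str:
    """Escape LIKE wildcards so user input is matched literally."""
    return term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def find_company(conn: sqlite3.Connection, query: str) -> Optional[str]:
    query = query.strip()
    if not query:
        return None
    # Step 1: exact match on company_name.
    row = conn.execute(
        "SELECT company_name FROM company_profiles WHERE company_name = ? LIMIT 1",
        (query,),
    ).fetchone()
    if row:
        return row[0]
    # Step 2: fuzzy, case-insensitive substring match.
    # On PostgreSQL the operator here would be ILIKE instead of LIKE.
    row = conn.execute(
        "SELECT company_name FROM company_profiles "
        "WHERE company_name LIKE ? ESCAPE '\\' "
        "ORDER BY length(company_name), company_name LIMIT 1",
        (f"%{escape_like(query)}%",),
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE company_profiles (company_slug TEXT, company_name TEXT, kununu_slug TEXT)"
)
conn.executemany(
    "INSERT INTO company_profiles VALUES (?, ?, ?)",
    [("audi", "Audi", "audi"), ("audi-sport", "Audi Sport GmbH", "audi-sport")],
)
print(find_company(conn, "Audi"))
print(find_company(conn, "sport"))
print(find_company(conn, "bmw"))
```

Escaping `%` and `_` matters because a raw query such as `100%` would otherwise act as a wildcard pattern and match unrelated rows.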

_check_freshness(company_name)

Determines if company data is fresh based on:

  • Last review date (less than 7 days old = fresh)
  • Total review count (fewer than 50 = more reviews needed)
  • Sentiment analysis coverage (below 80% triggers analysis)
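The three criteria above can be combined roughly as in the sketch below, which uses the thresholds documented under Configuration (7 days, 50 reviews, 80% coverage). The function signature and the `Freshness` type are hypothetical. Two details are assumptions: data aged exactly 7 days is treated as still fresh, and low sentiment coverage is reported as a separate flag instead of changing the data status.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_AGE = timedelta(days=7)
MIN_REVIEWS = 50
MIN_SENTIMENT_COVERAGE = 0.80


@dataclass
class Freshness:
    data_status: str       # "fresh", "stale", or "missing"
    needs_sentiment: bool  # True when labeled reviews fall below 80%


def check_freshness(
    last_collected: Optional[datetime],
    review_count: int,
    labeled_count: int,
    now: Optional[datetime] = None,
) -> Freshness:
    now = now or datetime.now(timezone.utc)
    if review_count == 0 or last_collected is None:
        return Freshness("missing", False)
    too_old = now - last_collected > MAX_AGE
    too_few = review_count < MIN_REVIEWS
    coverage = labeled_count / review_count
    return Freshness(
        data_status="stale" if (too_old or too_few) else "fresh",
        needs_sentiment=coverage < MIN_SENTIMENT_COVERAGE,
    )


now = datetime(2025, 12, 23, tzinfo=timezone.utc)
print(check_freshness(now - timedelta(days=2), 380, 350, now))
print(check_freshness(now - timedelta(days=10), 380, 380, now))
print(check_freshness(now - timedelta(days=1), 20, 20, now))
```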

_trigger_scraping(company_name, slug, country)

Creates scraping jobs for:

  • kununu - Employee reviews from Kununu.com
  • google - Google Maps reviews
  • reddit - Reddit discussions
  • vacancies - Job vacancy tracking
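As a sketch of what "one job per source" could look like, the snippet below builds job records in memory without writing to the database. The function name `build_scraping_jobs` is hypothetical. The `job_id`, `company_name`, `source`, and `status` keys follow the `scraping_jobs` table described in this document; carrying `slug` and `country` on each record is an assumption.

```python
import uuid
from typing import Dict, List, Sequence

DEFAULT_SOURCES = ("kununu", "google", "reddit", "vacancies")


def build_scraping_jobs(
    company_name: str,
    slug: str,
    country: str = "de",
    sources: Sequence[str] = DEFAULT_SOURCES,
) -> List[Dict[str, str]]:
    """Return one pending job record per source (nothing is persisted here)."""
    return [
        {
            "job_id": str(uuid.uuid4()),
            "company_name": company_name,
            "source": source,
            "status": "pending",
            # Assumption: slug and country travel with the job; the table
            # listing in this document shows no columns for them.
            "slug": slug,
            "country": country,
        }
        for source in sources
    ]


jobs = build_scraping_jobs("Audi", "audi")
for job in jobs:
    print(job["source"], job["job_id"])
```

The list of `job_id` values is what a caller would place in the `scraping_jobs` field of the API response.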

Database Tables Used

reviews

| Column | Description |
|--------|-------------|
| review_id | Unique identifier |
| company_name | Company name |
| created_at | When review was collected |
| ai_sentiment_label | Sentiment (positive/neutral/negative) |

company_profiles

| Column | Description |
|--------|-------------|
| company_slug | URL-safe identifier |
| company_name | Display name |
| kununu_slug | Kununu.com URL slug |

scraping_jobs

| Column | Description |
|--------|-------------|
| job_id | UUID |
| company_name | Target company |
| source | kununu/google/reddit/vacancies |
| status | pending/running/completed/failed |

Frontend Integration

Header.jsx

The search input in the dashboard header uses Smart Search:

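const handleSearch = async (query) => {
  // Ask the backend for the company; it triggers scraping if data is stale.
  const response = await fetch(
    `/api/search/smart?q=${encodeURIComponent(query)}&auto_scrape=true`,
  );
  if (!response.ok) {
    // Non-2xx response: keep the current selection unchanged.
    return;
  }
  const data = await response.json();

  // Stale data: scraping jobs are running in the background.
  if (data.status === "refreshing") {
    setSearchStatus("Syncing...");
  }

  // Select the company whenever the backend resolved a name.
  if (data.company_name) {
    setSelectedCompany(data.company_name);
  }
};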

Configuration

Freshness Thresholds

  • Max Age: 7 days (older data is considered stale)
  • Min Reviews: 50 (fewer reviews trigger a refresh)
  • Sentiment Coverage: 80% (lower coverage triggers analysis)

Scraping Sources

Configurable in _trigger_scraping():

sources = ['kununu', 'google', 'reddit', 'vacancies']

Error Handling

  1. Database connection errors - Returns status: "error"
  2. Company not found on Kununu - Returns status: "not_found"
  3. Slug finder failures - Logs warning, skips job creation
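The three cases above could map onto control flow roughly like this. It is a simplified sketch, not the service's code: the lookup, slug finder, and job creator are passed in as hypothetical callables, the freshness check for companies already in the database is omitted, and a slug finder is assumed to return `None` when the company is not on Kununu and to raise when the lookup itself fails. The status returned after a slug finder failure is not documented, so the `"error"` used there is an assumption.

```python
import logging
from typing import Callable, List, Optional

logger = logging.getLogger("smart_search")


def search_with_fallbacks(
    query: str,
    find_in_db: Callable[[str], Optional[str]],
    find_kununu_slug: Callable[[str], Optional[str]],
    create_jobs: Callable[[str, str], List[str]],
) -> dict:
    # Case 1: database connection errors -> status "error".
    try:
        company = find_in_db(query)
    except ConnectionError as exc:
        logger.error("Database lookup failed for %r: %s", query, exc)
        return {"status": "error", "message": "Database unavailable"}

    if company is not None:
        return {"status": "found", "company_name": company}

    # Case 3: slug finder failures -> log a warning and skip job creation.
    try:
        slug = find_kununu_slug(query)
    except Exception as exc:  # broad on purpose: any finder failure is non-fatal
        logger.warning("Slug finder failed for %r: %s", query, exc)
        # Assumption: the returned status for this case is not documented.
        return {"status": "error", "scraping_jobs": [], "message": "Slug lookup failed"}

    # Case 2: company not found on Kununu -> status "not_found".
    if slug is None:
        return {"status": "not_found", "message": f"No company matches {query!r}"}

    return {
        "status": "refreshing",
        "company_slug": slug,
        "scraping_jobs": create_jobs(query, slug),
    }
```

Injecting the three collaborators as callables keeps the error paths easy to exercise in tests without a database or network access.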

Related Files

| File | Purpose |
|------|---------|
| backend/services/smart_search.py | Core service logic |
| backend/main.py | API endpoint definition |
| dashboard_app/src/components/Header.jsx | Frontend integration |
| backend/scraping_service.py | Executes triggered jobs |

Future Improvements

  1. WebSocket notifications - Real-time status updates
  2. Batch search - Multiple companies at once
  3. Search history - Recent searches for quick access
  4. Predictive refresh - Pre-emptive scraping for popular companies

Created: December 23, 2025 | Vartovii Engineering Team