Smart Search Feature Documentation
Version: 1.0 | Date: December 23, 2025 | Status: ✅ Implemented
Overview
Smart Search is an intelligent company search and data synchronization feature that automatically ensures data freshness when users search for companies. It combines database lookup, data freshness checking, and automatic scraping job triggering.
Architecture
┌─────────────────────────────────────────────────────────────────┐
│                      Frontend (Header.jsx)                      │
│                          Search Input                           │
└───────────────────────────┬─────────────────────────────────────┘
                            │ GET /api/search/smart?q=Audi
                            ▼
┌─────────────────────────────────────────────────────────────────┐
│                      Backend API (main.py)                      │
│                   /api/search/smart endpoint                    │
└───────────────────────────┬─────────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────────┐
│              SmartSearchService (smart_search.py)               │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐  │
│  │ DB Lookup   │  │ Freshness   │  │ Auto-trigger Scraping   │  │
│  │ (find co.)  │  │ Check       │  │ (if stale/missing)      │  │
│  └─────────────┘  └─────────────┘  └─────────────────────────┘  │
└───────────────────────────┬─────────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────────┐
│                     Cloud SQL (PostgreSQL)                      │
│         reviews, company_profiles, scraping_jobs tables         │
└─────────────────────────────────────────────────────────────────┘
API Endpoint
GET /api/search/smart
Parameters:
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| q | string | Yes | - | Company name to search for |
| country | string | No | "de" | Country code (de, at, ch) |
| auto_scrape | bool | No | true | Auto-trigger scraping if stale |
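A minimal sketch of how a Python client might build the request URL from the parameters above. The helper name `build_smart_search_url` and the `base_url` argument are illustrative, not part of the service, and the sketch assumes the endpoint accepts lowercase `true`/`false` for `auto_scrape`, as the frontend call does.

```python
from urllib.parse import urlencode

# Country codes listed in the parameter table.
SUPPORTED_COUNTRIES = ("de", "at", "ch")


def build_smart_search_url(base_url: str, q: str, country: str = "de",
                           auto_scrape: bool = True) -> str:
    """Build a GET URL for /api/search/smart (hypothetical client helper)."""
    query = q.strip()
    if not query:
        raise ValueError("q is required")
    if country not in SUPPORTED_COUNTRIES:
        raise ValueError(f"unsupported country: {country!r}")
    params = {
        "q": query,
        "country": country,
        # Assumption: booleans are sent as lowercase strings.
        "auto_scrape": "true" if auto_scrape else "false",
    }
    # urlencode percent-encodes the values, so company names with spaces
    # or special characters are safe to pass through.
    return f"{base_url.rstrip('/')}/api/search/smart?{urlencode(params)}"
```

Validating `country` on the client side is a design choice for this sketch; the document does not say how the backend treats unsupported codes.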
Response:
{
  "status": "found|refreshing|not_found|error",
  "company_name": "Audi",
  "company_slug": "audi",
  "data_status": "fresh|stale|missing|unknown",
  "review_count": 380,
  "last_updated": "2025-12-13",
  "scraping_jobs": ["uuid1", "uuid2"],
  "message": "Human-readable status message"
}
Status Values:
- found - Company exists with fresh data
- refreshing - Data is stale; scraping jobs triggered
- not_found - Company not in database and not on Kununu
- error - An error occurred during search
Data Status Values:
- fresh - Data updated within last 7 days
- stale - Data older than 7 days or insufficient reviews
- missing - No data exists for this company
- unknown - Unable to determine data status
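To illustrate the response shape and the two sets of status values, here is a hedged sketch of a client-side parser. The `SmartSearchResult` dataclass and `parse_result` function are hypothetical names; the field defaults are assumptions about which keys may be absent or null, which the document does not specify.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional

# Allowed values, as listed in this document.
STATUSES = {"found", "refreshing", "not_found", "error"}
DATA_STATUSES = {"fresh", "stale", "missing", "unknown"}


@dataclass
class SmartSearchResult:
    status: str
    data_status: str
    company_name: Optional[str] = None
    company_slug: Optional[str] = None
    review_count: int = 0
    last_updated: Optional[str] = None
    scraping_jobs: List[str] = field(default_factory=list)
    message: str = ""


def parse_result(body: str) -> SmartSearchResult:
    """Parse a JSON response body and reject unknown status values."""
    data = json.loads(body)
    if data.get("status") not in STATUSES:
        raise ValueError(f"unexpected status: {data.get('status')!r}")
    if data.get("data_status") not in DATA_STATUSES:
        raise ValueError(f"unexpected data_status: {data.get('data_status')!r}")
    # Keep only documented keys; null values fall back to the defaults.
    fields = SmartSearchResult.__dataclass_fields__
    known = {k: v for k, v in data.items() if k in fields and v is not None}
    return SmartSearchResult(**known)
```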
SmartSearchService Methods
search(query, country, auto_scrape)
Main entry point. Searches for a company and returns data status.
_find_company_in_db(query)
Searches the database for a matching company by:
- Exact match on company_name
- Fuzzy match using ILIKE
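The two-step lookup can be sketched as follows. This is not the service's code: it runs against an in-memory SQLite database, using SQLite's `LIKE` (case-insensitive for ASCII) as a stand-in for PostgreSQL's `ILIKE`. Querying `company_profiles`, escaping wildcard characters, and preferring the shortest fuzzy match are all assumptions made for the example.

```python
import sqlite3
from typing import Optional, Tuple


def escape_like(term: str) -> str:
    """Escape LIKE wildcards so user input is matched literally."""
    return term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def find_company(conn: sqlite3.Connection,
                 query: str) -> Optional[Tuple[str, str]]:
    """Return (company_name, company_slug), or None if nothing matches."""
    q = query.strip()
    if not q:
        return None
    # Step 1: exact match on company_name.
    row = conn.execute(
        "SELECT company_name, company_slug FROM company_profiles "
        "WHERE company_name = ?",
        (q,),
    ).fetchone()
    if row:
        return row
    # Step 2: fuzzy substring match (ILIKE in PostgreSQL, LIKE here).
    # Assumption: the shortest matching name is the closest one.
    return conn.execute(
        "SELECT company_name, company_slug FROM company_profiles "
        "WHERE company_name LIKE ? ESCAPE '\\' "
        "ORDER BY length(company_name) LIMIT 1",
        (f"%{escape_like(q)}%",),
    ).fetchone()
```

Passing the search term as a bound parameter, rather than formatting it into the SQL string, keeps the lookup safe from SQL injection.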
_check_freshness(company_name)
Determines if company data is fresh based on:
- Last review date (collected within the last 7 days = fresh)
- Total review count (fewer than 50 = more reviews needed)
- Sentiment analysis coverage (below 80% triggers analysis)
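The three criteria above can be sketched as a pure function. The thresholds come from the Configuration section of this document; the function name, its inputs, and the `Freshness` return type are hypothetical, and treating low sentiment coverage as a separate flag (rather than as "stale") is an assumption based on the note that low coverage triggers analysis.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Thresholds taken from the Configuration section of this document.
MAX_AGE = timedelta(days=7)
MIN_REVIEWS = 50
MIN_SENTIMENT_COVERAGE = 0.80


@dataclass
class Freshness:
    data_status: str                # "fresh", "stale" or "missing"
    needs_sentiment_analysis: bool  # True when coverage is below 80%


def check_freshness(last_collected: Optional[datetime], review_count: int,
                    labelled_count: int,
                    now: Optional[datetime] = None) -> Freshness:
    """Classify company data from aggregate review statistics."""
    if review_count == 0 or last_collected is None:
        return Freshness("missing", False)
    now = now or datetime.now(timezone.utc)
    too_old = now - last_collected > MAX_AGE
    too_few = review_count < MIN_REVIEWS
    # labelled_count: reviews that already have an ai_sentiment_label.
    low_coverage = labelled_count / review_count < MIN_SENTIMENT_COVERAGE
    return Freshness("stale" if too_old or too_few else "fresh", low_coverage)
```

Accepting `now` as a parameter keeps the function deterministic in tests; production code would normally leave it unset.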
_trigger_scraping(company_name, slug, country)
Creates scraping jobs for:
- kununu - Employee reviews from Kununu.com
- google - Google Maps reviews
- reddit - Reddit discussions
- vacancies - Job vacancy tracking
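As a rough illustration of job creation, the sketch below builds one pending job record per source. The function name `build_scraping_jobs` is hypothetical and it only constructs records; it does not write to `scraping_jobs`. Two assumptions are made: a missing slug skips only the `kununu` job, and `country` is carried on the record even though the table description in this document does not list such a column.

```python
import logging
import uuid
from typing import Dict, List, Optional, Sequence

logger = logging.getLogger("smart_search")

# Sources listed in this document.
DEFAULT_SOURCES = ("kununu", "google", "reddit", "vacancies")


def build_scraping_jobs(company_name: str, slug: Optional[str], country: str,
                        sources: Sequence[str] = DEFAULT_SOURCES
                        ) -> List[Dict[str, str]]:
    """Return one pending job record per source."""
    jobs = []
    for source in sources:
        # Assumption: only the Kununu scraper needs the slug.
        if source == "kununu" and not slug:
            logger.warning("No Kununu slug for %s; skipping job", company_name)
            continue
        jobs.append({
            "job_id": str(uuid.uuid4()),
            "company_name": company_name,
            "source": source,
            "status": "pending",
            "country": country,  # assumption: not a documented column
        })
    return jobs
```

The returned `job_id` values correspond to the `scraping_jobs` array in the API response.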
Database Tables Used
reviews
| Column | Description |
|---|---|
| review_id | Unique identifier |
| company_name | Company name |
| created_at | When review was collected |
| ai_sentiment_label | Sentiment (positive/neutral/negative) |
company_profiles
| Column | Description |
|---|---|
| company_slug | URL-safe identifier |
| company_name | Display name |
| kununu_slug | Kununu.com URL slug |
scraping_jobs
| Column | Description |
|---|---|
| job_id | UUID |
| company_name | Target company |
| source | kununu/google/reddit/vacancies |
| status | pending/running/completed/failed |
Frontend Integration
Header.jsx
The search input in the dashboard header uses Smart Search:
const handleSearch = async (query) => {
  const response = await fetch(
    `/api/search/smart?q=${encodeURIComponent(query)}&auto_scrape=true`,
  );
  const data = await response.json();
  if (data.status === "refreshing") {
    setSearchStatus("Syncing...");
  }
  if (data.company_name) {
    setSelectedCompany(data.company_name);
  }
};
Configuration
Freshness Thresholds
- Max Age: 7 days (data older is considered stale)
- Min Reviews: 50 (fewer triggers refresh)
- Sentiment Coverage: 80% (below triggers analysis)
Scraping Sources
Configurable in _trigger_scraping():
sources = ['kununu', 'google', 'reddit', 'vacancies']
Error Handling
- Database connection errors - Returns status: "error"
- Company not found on Kununu - Returns status: "not_found"
- Slug finder failures - Logs warning, skips job creation
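One way to picture this mapping from failures to response statuses is a thin wrapper around the search call. Everything named here is hypothetical: the two exception classes, the `safe_search` wrapper, and the choice of `data_status` values for the failure cases are assumptions for illustration, not the service's actual error types.

```python
import logging
from typing import Callable, Dict

logger = logging.getLogger("smart_search")


class CompanyNotFoundError(Exception):
    """Hypothetical: company is neither in the database nor on Kununu."""


class DatabaseUnavailableError(Exception):
    """Hypothetical: the database connection failed."""


def safe_search(search: Callable[[str], Dict], query: str) -> Dict:
    """Run a search callable and turn known failures into status payloads."""
    try:
        return search(query)
    except CompanyNotFoundError:
        return {
            "status": "not_found",
            "data_status": "missing",  # assumption
            "message": f"No company found for '{query}'.",
        }
    except DatabaseUnavailableError:
        # Log the traceback, but keep the response body generic.
        logger.exception("Database error while searching for %r", query)
        return {
            "status": "error",
            "data_status": "unknown",  # assumption
            "message": "Search is temporarily unavailable.",
        }
```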
Related Files
| File | Purpose |
|---|---|
| backend/services/smart_search.py | Core service logic |
| backend/main.py | API endpoint definition |
| dashboard_app/src/components/Header.jsx | Frontend integration |
| backend/scraping_service.py | Executes triggered jobs |
Future Improvements
- WebSocket notifications - Real-time status updates
- Batch search - Multiple companies at once
- Search history - Recent searches for quick access
- Predictive refresh - Pre-emptive scraping for popular companies
Created: December 23, 2025 | Vartovii Engineering Team