Data Collection
How Vartovii collects and processes data from multiple sources.
📊 Data Sources
| Source | Data Type | Method | Update Frequency |
|---|---|---|---|
| Kununu | Employee reviews | Platform Integration | On-demand |
| Google | Business reviews | Places API | On-demand |
| Reddit | Discussions | Reddit API (PRAW) | On-demand |
| SerpAPI | Job vacancies | API | On-demand |
🔧 Collection Architecture
User Request → Smart Search → Collection Queue → 4 Parallel Jobs → Database
                                                        ↓
                                          ┌─────────────┴─────────────┐
                                          ↓        ↓         ↓        ↓
                                       Kununu   Google    Reddit    Jobs
                                          ↓        ↓         ↓        ↓
                                          └─────────────┬─────────────┘
                                                        ↓
                                             Sentiment Analysis (AI)
                                                        ↓
                                             Topic Extraction (ABSA)
                                                        ↓
                                             Trust Score Calculation
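The fan-out step in the diagram can be illustrated with a minimal sketch. It assumes a thread pool runs the four jobs; the page does not say how Vartovii actually schedules them. The collector functions, `COLLECTORS`, and `run_collection` are hypothetical names, and the collectors are stubs that return placeholder records instead of calling any platform.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Record = dict[str, str]


def _stub_collector(source: str) -> Callable[[str], list[Record]]:
    """Build a placeholder collector (hypothetical); a real one would call the platform."""
    def collect(company: str) -> list[Record]:
        return [{"source": source, "company": company}]
    return collect


# One collector per source shown in the diagram (names are assumptions).
COLLECTORS: dict[str, Callable[[str], list[Record]]] = {
    name: _stub_collector(name) for name in ("kununu", "google", "reddit", "jobs")
}


def run_collection(company: str) -> dict[str, list[Record]]:
    """Run all collectors in parallel and gather their results by source."""
    with ThreadPoolExecutor(max_workers=len(COLLECTORS)) as pool:
        futures = {name: pool.submit(fn, company) for name, fn in COLLECTORS.items()}
        # result() blocks until that job finishes and re-raises its exception, if any.
        return {name: future.result() for name, future in futures.items()}
```

Calling `run_collection("BMW")` returns one list of records per source, which the later stages (sentiment analysis, topic extraction, Trust Score) would then consume.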
Data Points
1. Employee Reviews (Kununu)
| Field | Description |
|---|---|
| Review text | Full employee feedback |
| Rating | 1-5 star score |
| Pros/Cons | Structured feedback |
| Job role | Position title |
| Date | Review timestamp |
2. Customer Reviews (Google)
| Field | Description |
|---|---|
| Rating | 1-5 stars |
| Review text | Customer feedback |
| Author | Reviewer name |
| Date | Review timestamp |
3. Community Discussions (Reddit)
Uses official Reddit API (PRAW) with OAuth2.
| Field | Description |
|---|---|
| Post title | Discussion topic |
| Content | Post body + top comments |
| Score | Community upvotes |
| Subreddit | Source community |
Target Communities:
- r/jobs, r/careerguidance
- r/cscareerquestions
- Industry-specific subreddits
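As a rough illustration, the fields in the table above could be read from PRAW submissions like this. It is a sketch, not Vartovii's collector: `Discussion`, `to_discussion`, `search_company`, and the `TARGET_SUBREDDITS` list are hypothetical names, and the `reddit` argument is assumed to be a `praw.Reddit` instance already configured with OAuth2 credentials.

```python
from dataclasses import dataclass
from typing import Any, Iterable

# Named communities from the list above; industry-specific ones would be added per company.
TARGET_SUBREDDITS = ["jobs", "careerguidance", "cscareerquestions"]


@dataclass
class Discussion:
    title: str
    content: str
    score: int
    subreddit: str


def to_discussion(submission: Any, top_comments: Iterable[str] = ()) -> Discussion:
    """Map a PRAW-style submission to the four fields in the table."""
    body = submission.selftext or ""
    # Content is the post body followed by any top comments passed in.
    content = "\n\n".join([body, *top_comments]).strip()
    return Discussion(
        title=submission.title,
        content=content,
        score=submission.score,
        subreddit=submission.subreddit.display_name,
    )


def search_company(reddit: Any, company: str, limit: int = 10) -> list[Discussion]:
    """Search the target communities for a company name (assumes a praw.Reddit instance)."""
    combined = reddit.subreddit("+".join(TARGET_SUBREDDITS))
    return [to_discussion(s) for s in combined.search(company, limit=limit)]
```

Keeping `to_discussion` separate from the network call makes the field mapping easy to check with a stand-in object, without Reddit credentials.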
4. Job Vacancies
| Field | Description |
|---|---|
| Title | Job position |
| Location | Office location |
| Salary | Range if available |
| Posted date | Listing date |
📦 Smart Search API
One-click analysis from all sources:
POST /api/search/magic-search
{
  "company_name": "BMW",
  "country": "de"
}
Response:
{
  "status": "analysis_started",
  "jobs_started": 4,
  "message": "🚀 Analysis started!"
}
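A client call to this endpoint might look like the sketch below, using only the Python standard library. The host in `BASE_URL` is a placeholder, the function names are hypothetical, and the sketch assumes no authentication header is needed, which this page does not confirm.

```python
import json
import urllib.request

BASE_URL = "https://vartovii.example"  # placeholder host (assumption)


def build_magic_search_request(
    company_name: str, country: str, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build the POST request for the Smart Search endpoint without sending it."""
    payload = json.dumps({"company_name": company_name, "country": country})
    return urllib.request.Request(
        url=f"{base_url}/api/search/magic-search",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def start_analysis(company_name: str, country: str) -> dict:
    """Send the request and return the decoded JSON response."""
    request = build_magic_search_request(company_name, country)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a real host, `start_analysis("BMW", "de")` would return a dictionary shaped like the response shown above.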
🔄 Job Statuses
| Status | Meaning |
|---|---|
| pending | In queue |
| running | Currently collecting |
| completed | Finished |
| failed | Error occurred |
| cancelled | Manually stopped |
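Client code that polls jobs could model these statuses as an enum. The sketch below treats `completed`, `failed`, and `cancelled` as terminal, which is an inference from the meanings in the table rather than documented behaviour. `JobStatus`, `is_terminal`, and `all_done` are hypothetical names.

```python
from enum import Enum
from typing import Iterable


class JobStatus(str, Enum):
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELLED = "cancelled"


# Assumption: a job in any of these states will not change state again.
TERMINAL_STATUSES = frozenset(
    {JobStatus.COMPLETED, JobStatus.FAILED, JobStatus.CANCELLED}
)


def is_terminal(status: str) -> bool:
    """Return True if the status string denotes a finished job."""
    return JobStatus(status) in TERMINAL_STATUSES


def all_done(statuses: Iterable[str]) -> bool:
    """Return True once every collection job has reached a terminal state."""
    return all(is_terminal(s) for s in statuses)
```

An unknown status string raises `ValueError` from the `JobStatus(...)` lookup, so a typo or a newly added status surfaces immediately instead of being silently treated as unfinished.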
⚡ Post-Processing Pipeline
After collection completes:
- Deduplication - Remove duplicate entries
- Sentiment Analysis - AI categorization (Gemini 2.5)
- Topic Extraction - ABSA for aspect analysis
- Trust Score - Recalculate company score
- Views Refresh - Update materialized views
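The deduplication step can be illustrated with a small sketch. It assumes duplicates are detected by hashing the source name together with whitespace- and case-normalized text; the matching rule Vartovii actually uses is not described here. The `source` and `text` record keys and both function names are hypothetical.

```python
import hashlib
import re


def fingerprint(source: str, text: str) -> str:
    """Hash the source plus the text with whitespace collapsed and case folded."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(f"{source}:{normalized}".encode("utf-8")).hexdigest()


def deduplicate(records: list[dict]) -> list[dict]:
    """Keep the first record for each fingerprint, preserving input order."""
    seen: set[str] = set()
    unique: list[dict] = []
    for record in records:
        key = fingerprint(record["source"], record["text"])
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique
```

Including the source in the fingerprint means identical text on two platforms is kept as two records; dropping it from the hash would merge them instead.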
📋 Data Usage
- All data is from public sources
- Employee reviews are anonymous on source platforms
- We comply with each platform's Terms of Service
- Data is used for aggregate analysis only
Data collection is triggered on-demand via Smart Search or API.