Data Collection

How Vartovii collects and processes data from multiple sources.

📊 Data Sources

| Source | Data Type | Method | Update Frequency |
| --- | --- | --- | --- |
| Kununu | Employee reviews | Platform Integration | On-demand |
| Google | Business reviews | Places API | On-demand |
| Reddit | Discussions | Reddit API (PRAW) | On-demand |
| SerpAPI | Job vacancies | API | On-demand |

🔧 Collection Architecture

User Request → Smart Search → Collection Queue → 4 Parallel Jobs → Database

    ┌───────────┬─────┴─────┬───────────┐
    ↓           ↓           ↓           ↓
 Kununu      Google      Reddit       Jobs
    ↓           ↓           ↓           ↓
    └───────────┴─────┬─────┴───────────┘
                      ↓
           Sentiment Analysis (AI)
                      ↓
           Topic Extraction (ABSA)
                      ↓
           Trust Score Calculation
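The fan-out into four parallel jobs can be sketched with `asyncio`. This is only an illustration of the pattern, not Vartovii's actual implementation: the `collect` coroutine, the `SOURCES` tuple, and the shape of the returned dictionaries are hypothetical placeholders.

```python
import asyncio

# Hypothetical source identifiers mirroring the four jobs in the diagram.
SOURCES = ("kununu", "google", "reddit", "jobs")


async def collect(source: str, company: str) -> dict:
    """Placeholder collector; a real one would call the source's API."""
    await asyncio.sleep(0)  # stand-in for network I/O
    return {"source": source, "company": company, "items": []}


async def run_collection(company: str) -> list[dict]:
    # Run one job per source concurrently. return_exceptions=True keeps
    # a failure in one job from cancelling the other three.
    results = await asyncio.gather(
        *(collect(source, company) for source in SOURCES),
        return_exceptions=True,
    )
    return [
        {"source": source, "company": company, "error": str(result)}
        if isinstance(result, BaseException)
        else result
        for source, result in zip(SOURCES, results)
    ]


results = asyncio.run(run_collection("BMW"))
```

Capturing exceptions per job fits the job status list in this document, where an individual job can end as `failed` without implying that the others did.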

Data Points

1. Employee Reviews (Kununu)

| Field | Description |
| --- | --- |
| Review text | Full employee feedback |
| Rating | 1-5 star score |
| Pros/Cons | Structured feedback |
| Job role | Position title |
| Date | Review timestamp |

2. Customer Reviews (Google)

| Field | Description |
| --- | --- |
| Rating | 1-5 stars |
| Review text | Customer feedback |
| Author | Reviewer name |
| Date | Review timestamp |

3. Community Discussions (Reddit)

Discussions are collected through the official Reddit API using PRAW (the Python Reddit API Wrapper), authenticated with OAuth2.

| Field | Description |
| --- | --- |
| Post title | Discussion topic |
| Content | Post body + top comments |
| Score | Community upvotes |
| Subreddit | Source community |

Target Communities:

  • r/jobs, r/careerguidance
  • r/cscareerquestions
  • Industry-specific subreddits
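As a rough sketch of how a Reddit post could be mapped onto the four fields above, the function below reads the attributes that PRAW submissions expose (`title`, `selftext`, `score`, `subreddit.display_name`, `comments`). The function name, the `top_n` cutoff, and the output keys are assumptions made for illustration. A stand-in object is used here so the example runs without credentials.

```python
from types import SimpleNamespace


def to_discussion_record(submission, top_n: int = 3) -> dict:
    """Map a PRAW-style submission to the documented fields (hypothetical helper)."""
    # PRAW's MoreComments placeholders have no body, so skip them.
    comments = [c for c in submission.comments if hasattr(c, "body")]
    top = sorted(comments, key=lambda c: c.score, reverse=True)[:top_n]
    parts = [submission.selftext] + [c.body for c in top]
    return {
        "post_title": submission.title,
        "content": "\n\n".join(p for p in parts if p),
        "score": submission.score,
        "subreddit": submission.subreddit.display_name,
    }


# Stand-in for a real submission, for demonstration only.
fake_submission = SimpleNamespace(
    title="Working at BMW?",
    selftext="Any experiences to share?",
    score=42,
    subreddit=SimpleNamespace(display_name="jobs"),
    comments=[
        SimpleNamespace(body="Good benefits.", score=10),
        SimpleNamespace(body="Long hours.", score=25),
        SimpleNamespace(count=5),  # mimics a MoreComments placeholder
    ],
)
record = to_discussion_record(fake_submission, top_n=2)
```

With a real PRAW client, the same function would receive submissions returned by a subreddit search; how many comments count as "top" is not stated in this document.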

4. Job Vacancies

| Field | Description |
| --- | --- |
| Title | Job position |
| Location | Office location |
| Salary | Range if available |
| Posted date | Listing date |

📦 Smart Search API

A single request starts analysis across all four sources:

POST /api/search/magic-search
{
  "company_name": "BMW",
  "country": "de"
}

Response:

{
  "status": "analysis_started",
  "jobs_started": 4,
  "message": "🚀 Analysis started!"
}
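A client call might be assembled as follows, using only the Python standard library. The base URL `https://vartovii.example` and the helper name are placeholders, and any authentication headers the real API may require are not covered in this document, so none are shown.

```python
import json
import urllib.request


def build_magic_search_request(
    base_url: str, company_name: str, country: str
) -> urllib.request.Request:
    """Build the POST request for the documented endpoint (hypothetical helper)."""
    body = json.dumps({"company_name": company_name, "country": country})
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/search/magic-search",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder host; replace with the actual deployment URL.
request = build_magic_search_request("https://vartovii.example", "BMW", "de")

# To send the request:
# with urllib.request.urlopen(request) as response:
#     reply = json.load(response)  # expected to contain "status" and "jobs_started"
```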

🔄 Job Statuses

| Status | Meaning |
| --- | --- |
| pending | In queue |
| running | Currently collecting |
| completed | Finished |
| failed | Error occurred |
| cancelled | Manually stopped |
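On the client side, these statuses could be modeled as an enum so that polling code can tell when a job will no longer change. Treating `completed`, `failed`, and `cancelled` as terminal is an assumption inferred from their meanings. The class and helper names are illustrative.

```python
from enum import Enum


class JobStatus(str, Enum):
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELLED = "cancelled"


# Assumed terminal states: a job in one of these is not expected to change again.
TERMINAL_STATUSES = frozenset(
    {JobStatus.COMPLETED, JobStatus.FAILED, JobStatus.CANCELLED}
)


def is_finished(status: str) -> bool:
    """Return True if the status string names a terminal state."""
    return JobStatus(status) in TERMINAL_STATUSES


def all_finished(statuses: list[str]) -> bool:
    """Return True once every job in a collection run has stopped."""
    return all(is_finished(s) for s in statuses)
```

Passing an unknown status string raises `ValueError`, which surfaces unexpected API values early instead of silently treating them as unfinished.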

⚡ Post-Processing Pipeline

After collection completes:

  1. Deduplication - Remove duplicate entries
  2. Sentiment Analysis - AI categorization (Gemini 2.5)
  3. Topic Extraction - ABSA for aspect analysis
  4. Trust Score - Recalculate company score
  5. Views Refresh - Update materialized views
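The deduplication step could look like the sketch below. The document does not say how duplicates are detected, so this assumes two entries are duplicates when they share a source and have the same text after case and whitespace normalization. The record keys `source` and `text` are hypothetical.

```python
import hashlib


def fingerprint(source: str, text: str) -> str:
    """Hash the source plus the lowercased, whitespace-collapsed text."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(f"{source}\x00{normalized}".encode("utf-8")).hexdigest()


def deduplicate(records: list[dict]) -> list[dict]:
    """Keep the first occurrence of each fingerprint, preserving order."""
    seen: set[str] = set()
    unique: list[dict] = []
    for record in records:
        key = fingerprint(record["source"], record["text"])
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


reviews = [
    {"source": "kununu", "text": "Great team,  good pay."},
    {"source": "kununu", "text": "great team, good pay."},
    {"source": "google", "text": "Great team, good pay."},
]
unique_reviews = deduplicate(reviews)
```

Including the source in the fingerprint keeps identical text from different platforms as separate entries; whether cross-platform duplicates should be merged is a design choice this document leaves open.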

📋 Data Usage

  • All data is from public sources
  • Employee reviews are anonymous on source platforms
  • We comply with each platform's Terms of Service
  • Data is used for aggregate analysis only

Data collection is triggered on-demand via Smart Search or API.