
ABSA (Aspect-Based Sentiment Analysis)

Overview

ABSA (Aspect-Based Sentiment Analysis) extracts specific topics/aspects mentioned in employee reviews and their associated sentiment. This enables the "Topics" page in the dashboard to show what employees are talking about (salary, management, work-life balance, etc.) and whether sentiment is positive, neutral, or negative for each topic.

Architecture

┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│  Reviews Table  │────▶│  ABSA Analyzer   │────▶│ review_aspects  │
│   (raw text)    │     │   (extraction)   │     │    (results)    │
└─────────────────┘     └────────┬─────────┘     └─────────────────┘
                                 │
                      ┌──────────┴──────────┐
                      │                     │
                 Rule-based            AI-powered
               (fast, offline)     (accurate, Gemini)

Database Schema

Table: review_aspects

Column       Type     Description
id           INTEGER  Primary key
review_id    VARCHAR  FK to reviews.review_id
aspect       VARCHAR  Topic category (e.g., "salary and benefits")
sentiment    VARCHAR  positive, neutral, or negative
confidence   FLOAT    Confidence score, 0.0 to 1.0
snippet      TEXT     Optional text snippet
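As a minimal sketch, the row shape and the value constraints described in the table can be mirrored in Python before rows are written. The `ReviewAspect` class and its validation rules are illustrative assumptions, not code taken from `absa_analyzer.py`; `id` is left out on the assumption that the database assigns it.

```python
from dataclasses import dataclass
from typing import Optional

# Allowed values for the sentiment column, as listed in the schema table.
SENTIMENTS = {"positive", "neutral", "negative"}


@dataclass(frozen=True)
class ReviewAspect:
    """Hypothetical in-memory form of one review_aspects row (id omitted)."""
    review_id: str
    aspect: str
    sentiment: str
    confidence: float
    snippet: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject values outside the documented ranges before they reach the DB.
        if self.sentiment not in SENTIMENTS:
            raise ValueError(f"invalid sentiment: {self.sentiment!r}")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence out of range: {self.confidence}")
```

Validating in application code is only a convenience; a CHECK constraint on the table would enforce the same rules for every writer.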

Predefined Aspect Categories

The system recognizes these topics:

  1. salary and benefits - Pay, bonuses, insurance, pension
  2. work-life balance - Hours, overtime, remote work, flexibility
  3. management - Leadership, bosses, supervision
  4. company culture - Atmosphere, values, team spirit
  5. career growth - Promotions, opportunities, development
  6. job security - Stability, layoffs, restructuring
  7. work environment - Office, facilities, equipment
  8. colleagues - Coworkers, team dynamics
  9. training and development - Learning, courses, skills
  10. communication - Transparency, information flow, feedback
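The category list maps naturally onto a keyword dictionary. The sketch below assumes keywords taken from the descriptions in the list; the analyzer's actual keyword lists are not documented here, so `ASPECT_KEYWORDS` and `detect_aspects` are hypothetical.

```python
import re

# Illustrative keywords drawn from the category descriptions; the real
# analyzer's lists may be longer or different.
ASPECT_KEYWORDS: dict[str, list[str]] = {
    "salary and benefits": ["pay", "salary", "bonus", "bonuses", "insurance", "pension"],
    "work-life balance": ["hours", "overtime", "remote", "flexibility", "flexible"],
    "management": ["management", "leadership", "boss", "bosses", "supervision"],
    "company culture": ["culture", "atmosphere", "values", "team spirit"],
    "career growth": ["promotion", "promotions", "career", "opportunities"],
    "job security": ["stability", "layoff", "layoffs", "restructuring"],
    "work environment": ["office", "facilities", "equipment"],
    "colleagues": ["colleagues", "coworkers", "teammates"],
    "training and development": ["training", "course", "courses", "learning", "skills"],
    "communication": ["communication", "transparency", "feedback"],
}


def detect_aspects(text: str) -> set[str]:
    """Return every category with at least one whole-word keyword hit."""
    lowered = text.lower()
    found = set()
    for aspect, keywords in ASPECT_KEYWORDS.items():
        # \b anchors avoid matching "pay" inside unrelated words like "repaying".
        if any(re.search(rf"\b{re.escape(kw)}\b", lowered) for kw in keywords):
            found.add(aspect)
    return found
```

For example, `detect_aspects("Good salary, but my boss ignores feedback.")` returns the salary, management, and communication categories.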

Usage

Manual Run

cd /Users/vitaliiradionov/Desktop/Vartovii/backend
source venv/bin/activate

# Analyze a specific company
python absa_analyzer.py --company "Audi" --limit 200

# Use rule-based only (faster, no AI costs)
python absa_analyzer.py --company "Audi" --no-ai

# Analyze all companies
python absa_analyzer.py --limit 500
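The commands above use three flags: `--company`, `--limit`, and `--no-ai`. A minimal `argparse` sketch of how such a command line could be parsed is shown here; the parser is a hypothetical reconstruction, and the default limit of 100 is an assumption because the real default is not documented.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser for the documented absa_analyzer.py flags."""
    parser = argparse.ArgumentParser(description="Run ABSA over stored reviews.")
    # Omitting --company analyzes all companies.
    parser.add_argument("--company", default=None,
                        help="company name to analyze (default: all companies)")
    # Assumed default; the real script's default limit is not documented.
    parser.add_argument("--limit", type=int, default=100,
                        help="maximum number of reviews to process")
    # AI extraction stays on unless --no-ai is passed.
    parser.add_argument("--no-ai", dest="use_ai", action="store_false",
                        help="use rule-based extraction only")
    return parser
```

The parsed values line up with the programmatic call, e.g. `analyzer.analyze(company=args.company, limit=args.limit, use_ai=args.use_ai)`.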

Programmatic Usage

from absa_analyzer import ABSAAnalyzer

analyzer = ABSAAnalyzer()

# With AI (more accurate)
analyzer.analyze(company="Audi", limit=100, use_ai=True)

# Rule-based only (faster)
analyzer.analyze(company="Audi", limit=100, use_ai=False)

Automation

ABSA runs automatically after scraping:

  1. Scraping Service (scraping_service.py) completes a job
  2. Triggers AI sentiment analysis
  3. Triggers ABSA for the company (rule-based, 100 reviews)
  4. Refreshes materialized views

No manual intervention is needed for new companies.
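The post-scrape sequence can be sketched as a single hook. The three callables are hypothetical stand-ins for whatever `scraping_service.py` actually invokes; only the order and the ABSA parameters (rule-based, 100 reviews) come from the steps listed above.

```python
from typing import Callable


def on_scrape_complete(
    company: str,
    run_sentiment: Callable[[str], None],
    run_absa: Callable[..., None],
    refresh_views: Callable[[], None],
) -> None:
    """Run the documented post-scrape steps in order (hypothetical hook)."""
    run_sentiment(company)                              # step 2: AI sentiment analysis
    run_absa(company=company, limit=100, use_ai=False)  # step 3: rule-based ABSA, 100 reviews
    refresh_views()                                     # step 4: refresh materialized views
```

Passing the steps in as functions keeps the ordering easy to test; in a real service `run_absa` could plausibly be `ABSAAnalyzer().analyze`, which accepts the same keyword arguments shown under Programmatic Usage.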

API Endpoints

GET /api/aspects

Returns the sentiment distribution across aspects, as mention counts.

Query Parameters:

  • company - Company name filter (case-insensitive)

Response:

{
"aspects": [
{ "aspect": "salary and benefits", "sentiment": "neutral", "count": 21 },
{ "aspect": "management", "sentiment": "neutral", "count": 18 },
{ "aspect": "work-life balance", "sentiment": "positive", "count": 14 }
]
}
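A client that wants one row per topic can pivot this response. The sketch assumes each entry is one (aspect, sentiment) pair with its count, as in the sample; `pivot_aspects` is a hypothetical helper, and the payload would typically come from `json.loads` on the GET response body.

```python
from collections import defaultdict


def pivot_aspects(payload: dict) -> dict[str, dict[str, int]]:
    """Turn /api/aspects rows into {aspect: {sentiment: count}}."""
    table: dict[str, dict[str, int]] = defaultdict(
        lambda: {"positive": 0, "neutral": 0, "negative": 0}
    )
    for row in payload.get("aspects", []):
        # Sum in case the same (aspect, sentiment) pair appears more than once.
        table[row["aspect"]][row["sentiment"]] += row["count"]
    return dict(table)
```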

Extraction Methods

1. Rule-Based (default for automated runs)

Fast keyword matching:

  • Scans review text for topic keywords
  • Uses simple sentiment word detection
  • ~60-70% accuracy
  • No API costs
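A minimal sketch of this kind of keyword matching, scoring each sentence separately so one review can carry mixed opinions. The word lists and the `rule_based_absa` function are illustrative assumptions, not the analyzer's actual lexicons.

```python
import re

# Tiny illustrative lexicons; the real analyzer's word lists are not documented.
POSITIVE = {"good", "great", "fair", "flexible", "friendly", "excellent"}
NEGATIVE = {"bad", "poor", "low", "toxic", "stressful", "unfair"}
KEYWORDS = {
    "salary and benefits": {"salary", "pay", "bonus", "pension"},
    "management": {"management", "boss", "leadership", "supervisor"},
}


def rule_based_absa(text: str) -> list[tuple[str, str, str]]:
    """Return (aspect, sentiment, snippet) triples, one per sentence hit."""
    results = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        # Net count of positive minus negative words decides the label.
        score = len(words & POSITIVE) - len(words & NEGATIVE)
        sentiment = "positive" if score > 0 else "negative" if score < 0 else "neutral"
        for aspect, keys in KEYWORDS.items():
            if words & keys:
                results.append((aspect, sentiment, sentence))
    return results
```

Because it only counts words, a phrase such as "not good" still scores as positive; missing context like this is one reason keyword matching is less accurate than the AI mode.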

2. AI-Powered (Gemini)

Uses Gemini AI for extraction:

  • More accurate context understanding
  • Better sentiment detection
  • ~85-90% accuracy
  • Costs ~$0.001 per review
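The Gemini call itself is not shown here. This sketch only illustrates one way to guard against malformed model output before it is stored, assuming the prompt asks for a JSON list of objects with `aspect`, `sentiment`, and `confidence` keys. That output format, the `parse_model_output` name, and the 0.5 fallback confidence are all hypothetical.

```python
import json

ASPECTS = {
    "salary and benefits", "work-life balance", "management", "company culture",
    "career growth", "job security", "work environment", "colleagues",
    "training and development", "communication",
}
SENTIMENTS = {"positive", "neutral", "negative"}


def parse_model_output(raw: str) -> list[dict]:
    """Keep only well-formed items that use the predefined categories."""
    try:
        items = json.loads(raw)
    except json.JSONDecodeError:
        return []  # treat unparseable output as "no aspects found"
    if not isinstance(items, list):
        return []
    cleaned = []
    for item in items:
        if not isinstance(item, dict):
            continue
        aspect = str(item.get("aspect", "")).strip().lower()
        sentiment = str(item.get("sentiment", "")).strip().lower()
        # Drop topics the model invented outside the fixed category list.
        if aspect not in ASPECTS or sentiment not in SENTIMENTS:
            continue
        try:
            confidence = float(item.get("confidence", 0.5))  # assumed fallback
        except (TypeError, ValueError):
            continue
        # Clamp into the 0.0-1.0 range stored in review_aspects.confidence.
        confidence = min(max(confidence, 0.0), 1.0)
        cleaned.append({"aspect": aspect, "sentiment": sentiment, "confidence": confidence})
    return cleaned
```

At roughly $0.001 per review, the `--limit 500` run from Manual Run would cost on the order of $0.50 in AI mode, compared with nothing for rule-based extraction.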

Dashboard Integration

The Topics page (/app → Topics) displays:

  • Top 10 Discussed Topics - Bar chart
  • Topic Insights - Most positive, most negative, top mentioned
  • Topic Details - Drill-down by company
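The insight cards can be derived from the `/api/aspects` rows. In this sketch, "most positive" and "most negative" are assumed to mean the highest share of positive or negative mentions; the dashboard's actual rule is not documented, and `topic_insights` is a hypothetical helper.

```python
def topic_insights(rows: list[dict], top_n: int = 10) -> dict:
    """Derive chart and insight data from /api/aspects rows (hypothetical)."""
    # Sum mention counts per aspect and sentiment.
    totals: dict[str, dict[str, int]] = {}
    for row in rows:
        counts = totals.setdefault(
            row["aspect"], {"positive": 0, "neutral": 0, "negative": 0}
        )
        counts[row["sentiment"]] += row["count"]

    def share(aspect: str, sentiment: str) -> float:
        total = sum(totals[aspect].values())
        return totals[aspect][sentiment] / total if total else 0.0

    # Most-mentioned first: one plausible ordering for the "Top 10" bar chart.
    ranked = sorted(totals, key=lambda a: sum(totals[a].values()), reverse=True)
    return {
        "top_topics": ranked[:top_n],
        "top_mentioned": ranked[0] if ranked else None,
        "most_positive": max(totals, key=lambda a: share(a, "positive"), default=None),
        "most_negative": max(totals, key=lambda a: share(a, "negative"), default=None),
    }
```

Ranking by share is only one choice; requiring a minimum number of mentions would keep a topic with a single positive review from topping the list.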

Troubleshooting

No topics showing

  1. Check if reviews exist:
     SELECT COUNT(*) FROM reviews WHERE company_name ILIKE '%CompanyName%';
  2. Check if aspects were extracted:
     SELECT COUNT(*) FROM review_aspects ra
     JOIN reviews r ON ra.review_id = r.review_id
     WHERE r.company_name ILIKE '%CompanyName%';
  3. Run ABSA manually:
     python absa_analyzer.py --company "CompanyName" --limit 200

Low accuracy

Switch to AI mode:

python absa_analyzer.py --company "CompanyName"  # AI enabled by default

Related Files

  • backend/absa_analyzer.py - Main analyzer script
  • backend/api_repositories/aspect_repository.py - Data access layer
  • backend/main.py - API endpoints (/api/aspects)
  • dashboard_app/src/components/TopicAnalysis.jsx - Frontend component

Future Improvements

  1. Multilingual support - German keyword detection
  2. Custom aspects - User-defined topics
  3. Trend analysis - Aspect sentiment over time
  4. Aspect clustering - AI-discovered topics