Data Harvesters

The Crypto module uses three data harvesters to collect information from external APIs and caches each source's results in Redis.

🏗️ Architecture

┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│  CoinGecko   │     │   GitHub     │     │  DefiLlama   │
│  Harvester   │     │  Harvester   │     │  Harvester   │
└──────┬───────┘     └──────┬───────┘     └──────┬───────┘
       │                    │                    │
       ▼                    ▼                    ▼
┌─────────────────────────────────────────────────────────┐
│                     Redis Cache                          │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐     │
│  │ coingecko:  │  │ github:     │  │ defillama:  │     │
│  │ SLUG        │  │ REPO        │  │ PROTOCOL    │     │
│  │ TTL: 24h    │  │ TTL: 24h    │  │ TTL: 6h     │     │
│  └─────────────┘  └─────────────┘  └─────────────┘     │
└─────────────────────────────────────────────────────────┘
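
The key layout in the diagram can be sketched as a pair of small helpers. This is only an illustration: the names `CACHE_TTLS`, `cache_key`, and `cache_ttl_seconds` are hypothetical and not taken from the module; the prefixes and TTL values come from the diagram.

```python
from datetime import timedelta

# Per-source TTLs as shown in the architecture diagram.
CACHE_TTLS = {
    "coingecko": timedelta(hours=24),
    "github": timedelta(hours=24),
    "defillama": timedelta(hours=6),
}


def cache_key(source: str, identifier: str) -> str:
    """Build a key such as "defillama:aave" (hypothetical helper)."""
    if source not in CACHE_TTLS:
        raise ValueError(f"unknown source: {source}")
    return f"{source}:{identifier}"


def cache_ttl_seconds(source: str) -> int:
    """Return the TTL for a source in seconds, the unit Redis expiry expects."""
    return int(CACHE_TTLS[source].total_seconds())
```

Keeping the TTLs in one mapping makes it easy to shorten the DefiLlama window without touching the harvesters themselves.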

1. CoinGecko Harvester

File: coingecko.py

Purpose: Fetches price, market data, and metadata.

Data Collected

Field	Description	Example
`price_usd`	Current price in USD	245.50
`market_cap`	Total market capitalization	3,650,000,000
`fdv`	Fully Diluted Valuation	4,100,000,000
`price_change_24h`	% change in 24 hours	-2.5
`github_url`	Main repository URL	github.com/aave
`homepage`	Project website	aave.com

API Details

Endpoint: https://api.coingecko.com/api/v3/coins/ID
Rate Limit: 50 requests/minute (free tier)
Cache TTL: 24 hours
Cost: Free
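
As a rough sketch of how the fields in the table could be pulled out of the endpoint's JSON, the function below walks a nested payload defensively. The nested key names (`market_data`, `current_price`, `links`, `repos_url`, and so on) are assumptions about the response shape and should be checked against the CoinGecko API reference; `extract_coingecko_fields` is a hypothetical helper, not the harvester's actual code.

```python
from typing import Any, Dict, List, Optional


def first_non_empty(values: Optional[List[str]]) -> Optional[str]:
    """Return the first non-empty string in a list, or None."""
    for value in values or []:
        if value:
            return value
    return None


def extract_coingecko_fields(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Map an (assumed) /coins/ID payload onto the fields listed above."""
    market = payload.get("market_data") or {}
    links = payload.get("links") or {}
    repos = links.get("repos_url") or {}
    return {
        "price_usd": (market.get("current_price") or {}).get("usd"),
        "market_cap": (market.get("market_cap") or {}).get("usd"),
        "fdv": (market.get("fully_diluted_valuation") or {}).get("usd"),
        "price_change_24h": market.get("price_change_percentage_24h"),
        "github_url": first_non_empty(repos.get("github")),
        "homepage": first_non_empty(links.get("homepage")),
    }


# Hand-written sample in the assumed shape, using the example values above.
sample = {
    "market_data": {
        "current_price": {"usd": 245.50},
        "market_cap": {"usd": 3650000000},
        "fully_diluted_valuation": {"usd": 4100000000},
        "price_change_percentage_24h": -2.5,
    },
    "links": {
        "homepage": ["https://aave.com", ""],
        "repos_url": {"github": ["https://github.com/aave"]},
    },
}
fields = extract_coingecko_fields(sample)
```

Missing sections yield `None` rather than raising, so a partially populated listing still produces a usable record.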

Usage

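import asyncio

from harvesters.coingecko import CoinGeckoHarvester


async def main():
    harvester = CoinGeckoHarvester()
    # fetch() is a coroutine, so it must be awaited inside an async function
    data = await harvester.fetch("aave")
    print(data)


asyncio.run(main())

# Returns:
# {
#     "price_usd": 245.50,
#     "market_cap": 3650000000,
#     "fdv": 4100000000,
#     "github_url": "https://github.com/aave"
# }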

Known Issues

Outdated GitHub URLs: CoinGecko sometimes returns old repository URLs. We maintain an override mapping:

GITHUB_OVERRIDES = {
    "aave": "https://github.com/aave/aave-v3-core",
    "compound": "https://github.com/compound-finance/compound-v2",
}
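
One way the override mapping might be applied is to let it take precedence over whatever CoinGecko returned. The helper name `resolve_github_url` is hypothetical; the mapping is the one shown above.

```python
from typing import Optional

GITHUB_OVERRIDES = {
    "aave": "https://github.com/aave/aave-v3-core",
    "compound": "https://github.com/compound-finance/compound-v2",
}


def resolve_github_url(slug: str, coingecko_url: Optional[str]) -> Optional[str]:
    """Prefer a manual override; otherwise keep CoinGecko's URL (or None)."""
    return GITHUB_OVERRIDES.get(slug) or coingecko_url or None
```

An empty string from CoinGecko is normalized to `None`, which keeps the "no repository" check downstream to a single truthiness test.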

2. GitHub Harvester

File: github.py

Purpose: Fetches developer activity metrics.

Data Collected

Field	Description	Example
`commits_30d`	Commits in last 30 days	45
`active_devs`	Unique contributors (30d)	12
`last_commit_date`	Most recent commit	2025-12-25
`total_stars`	Repository stars	2,500
`open_issues`	Open issue count	48

API Details

Endpoint: https://api.github.com/repos/OWNER/REPO
Stats Endpoint: https://api.github.com/repos/OWNER/REPO/stats/participation
Rate Limit: 5,000 requests/hour (authenticated)
Cache TTL: 24 hours
Cost: Free
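
Both endpoints need an OWNER/REPO pair, while the URL stored from CoinGecko may be a full repository URL or only an organization page (such as github.com/aave). A minimal sketch of extracting the pair, assuming a hypothetical helper `parse_github_repo`:

```python
from typing import Optional, Tuple
from urllib.parse import urlparse


def parse_github_repo(url: str) -> Optional[Tuple[str, str]]:
    """Return (owner, repo) from a GitHub URL, or None if it names no repo."""
    if "://" not in url:
        url = "https://" + url  # tolerate scheme-less values like "github.com/aave"
    parsed = urlparse(url)
    if parsed.netloc.lower() not in ("github.com", "www.github.com"):
        return None
    parts = [part for part in parsed.path.split("/") if part]
    if len(parts) < 2:
        return None  # organization or user page only, no repository segment
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return owner, repo
```

Returning `None` for organization-only URLs lets the caller fall back to the override mapping or to the missing-repo response instead of requesting a path that does not exist.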

Async Stats Handling

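Large repositories initially return 202 Accepted with an empty body while GitHub computes the stats in the background, so the request has to be retried after a short delay:

import asyncio

async def fetch_stats(self, repo, max_retries=5):
    # repo is the "OWNER/REPO" path segment, e.g. "aave/aave-v3-core"
    for _ in range(max_retries):
        response = await self.client.get(f"/repos/{repo}/stats/participation")

        if response.status_code != 202:
            return response.json()

        # GitHub is still computing stats, retry after a delay
        await asyncio.sleep(2)

    # Stats were not ready after max_retries attempts
    return None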

Fallback for Missing Repos

if not github_url:  # covers both None and an empty string
    return {
        "commits_30d": 0,
        "active_devs": 0,
        "error": "No public repository found"
    }

3. DefiLlama Harvester

File: defillama.py

Purpose: Fetches DeFi-specific metrics (TVL, treasury).

Data Collected

Field	Description	Example
`tvl`	Total Value Locked (USD)	32,400,000,000
`chain_tvls`	TVL breakdown by chain	ethereum: 25B, polygon: 5B
`category`	DeFi category	"Lending"
`mcap_to_tvl_ratio`	Market Cap / TVL	0.09

API Details

Endpoint: https://api.llama.fi/protocol/SLUG
TVL History: https://api.llama.fi/protocol/SLUG/tvl
Rate Limit: Unlimited (no key required)
Cache TTL: 6 hours (TVL changes faster)
Cost: Free ✅

Slug Mapping

DefiLlama uses different slugs than CoinGecko. We maintain a mapping:

DEFILLAMA_SLUGS = {
    "aave": "aave",
    "uniswap": "uniswap",
    "compound-governance-token": "compound",
}
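
A lookup over this mapping might look like the sketch below. Falling back to the CoinGecko slug when no entry exists is an assumption made for illustration, and `to_defillama_slug` and `protocol_url` are hypothetical names.

```python
DEFILLAMA_SLUGS = {
    "aave": "aave",
    "uniswap": "uniswap",
    "compound-governance-token": "compound",
}


def to_defillama_slug(coingecko_slug: str) -> str:
    """Translate a CoinGecko slug; assume the same slug if it is unmapped."""
    return DEFILLAMA_SLUGS.get(coingecko_slug, coingecko_slug)


def protocol_url(coingecko_slug: str) -> str:
    """Build the protocol endpoint URL for a CoinGecko slug."""
    return f"https://api.llama.fi/protocol/{to_defillama_slug(coingecko_slug)}"
```

With the fallback in place, identity entries such as "aave" are optional and only the slugs that actually differ need to be listed.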

Example Response

{
  "tvl": 32451269361,
  "chainTvls": {
    "Ethereum": 25000000000,
    "Polygon": 5000000000,
    "Avalanche": 2451269361
  },
  "category": "Lending",
  "mcap": 2900000000
}
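
The `mcap_to_tvl_ratio` field from the table can be derived from a response like this one. The sketch below assumes the ratio is simply `mcap / tvl` rounded to two decimals; the function names are hypothetical.

```python
from typing import Any, Dict, Optional


def mcap_to_tvl_ratio(payload: Dict[str, Any]) -> Optional[float]:
    """Market cap divided by TVL, or None when either value is unusable."""
    tvl = payload.get("tvl")
    mcap = payload.get("mcap")
    if not tvl or mcap is None:
        return None  # avoid dividing by zero or by a missing TVL
    return round(mcap / tvl, 2)


def chain_shares(payload: Dict[str, Any]) -> Dict[str, float]:
    """Fraction of the summed chain TVL that sits on each chain."""
    chain_tvls = payload.get("chainTvls") or {}
    total = sum(chain_tvls.values())
    if not total:
        return {}
    return {chain: value / total for chain, value in chain_tvls.items()}


# The example response shown above.
response = {
    "tvl": 32451269361,
    "chainTvls": {
        "Ethereum": 25000000000,
        "Polygon": 5000000000,
        "Avalanche": 2451269361,
    },
    "category": "Lending",
    "mcap": 2900000000,
}
ratio = mcap_to_tvl_ratio(response)  # 0.09, matching the table's example
shares = chain_shares(response)
```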

🔄 Harvesting Pipeline

When a project is requested, the CryptoService orchestrates all harvesters:

async def fetch_project(self, slug: str):
    # 1. Check cache first
    cached = await self.cache.get(f"project:{slug}")
    if cached:
        return cached
    
    # 2. Fetch from all sources in parallel
    coingecko_data, github_data, defillama_data = await asyncio.gather(
        self.coingecko.fetch(slug),
        self.github.fetch(slug),
        self.defillama.fetch(slug)
    )
    
    # 3. Merge data
    project = self.merge_data(coingecko_data, github_data, defillama_data)
    
    # 4. Calculate Trust Score
    project["trust_score"] = self.risk_engine.calculate(project)
    project["risk_level"] = self.risk_engine.get_risk_level(project["trust_score"])
    
    # 5. Save to database
    await self.db.upsert(project)
    
    # 6. Cache for next request
    await self.cache.set(f"project:{slug}", project, ttl=3600)
    
    return project
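
As written, `asyncio.gather` propagates the first exception, so one failing source fails the whole request. If partial results are preferable, a variant using `return_exceptions=True` could look like the sketch below. `gather_sources` and the stub coroutines are hypothetical stand-ins for the real harvesters, and whether a partial result should be scored or cached is a separate design decision.

```python
import asyncio
from typing import Any, Awaitable, Dict


async def gather_sources(**sources: Awaitable[dict]) -> Dict[str, Any]:
    """Await all sources concurrently; turn failures into error records."""
    names = list(sources)
    results = await asyncio.gather(*sources.values(), return_exceptions=True)
    merged: Dict[str, Any] = {}
    for name, result in zip(names, results):
        if isinstance(result, Exception):
            merged[name] = {"error": str(result)}
        else:
            merged[name] = result
    return merged


async def _ok(data: dict) -> dict:
    """Stub harvester that succeeds immediately."""
    return data


async def _fail() -> dict:
    """Stub harvester that simulates an upstream failure."""
    raise RuntimeError("upstream timeout")


async def demo() -> Dict[str, Any]:
    return await gather_sources(
        coingecko=_ok({"price_usd": 245.50}),
        github=_fail(),
        defillama=_ok({"tvl": 32451269361}),
    )


result = asyncio.run(demo())
```

Here the GitHub failure is recorded under its own key while the other two sources still return their data.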

⚡ Performance

Metric	Fresh Fetch	Cached
Total Time	2-3 seconds	under 50ms
CoinGecko	~800ms	-
GitHub	~1.2s	-
DefiLlama	~500ms	-

Cache Hit Rates

CoinGecko: ~90%
GitHub: ~85%
DefiLlama: ~75%

🚀 Future Harvesters (Planned)

CryptoRank Harvester

Data: Fundraising rounds, VC investors, token unlocks
Method: Web scraping (no free API)
Purpose: Tokenomics pillar

TwitterScore Harvester

Data: Follower quality, bot detection, engagement
Purpose: Community pillar

Discord/Telegram Harvester

Data: Member count, activity, sentiment
Purpose: Community pillar

All harvesters are designed for zero-cost operation using free API tiers.
