Data Harvesters

The Crypto module uses three data harvesters to collect real-time information from external APIs. Each harvester caches its results in Redis.

🏗️ Architecture

┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│  CoinGecko   │     │    GitHub    │     │  DefiLlama   │
│  Harvester   │     │  Harvester   │     │  Harvester   │
└──────┬───────┘     └──────┬───────┘     └──────┬───────┘
       │                    │                    │
       ▼                    ▼                    ▼
┌─────────────────────────────────────────────────────────┐
│                       Redis Cache                       │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐  │
│  │ coingecko:  │    │ github:     │    │ defillama:  │  │
│  │ SLUG        │    │ REPO        │    │ PROTOCOL    │  │
│  │ TTL: 24h    │    │ TTL: 24h    │    │ TTL: 6h     │  │
│  └─────────────┘    └─────────────┘    └─────────────┘  │
└─────────────────────────────────────────────────────────┘
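
The key layout in the diagram can be sketched as a small helper. This is only an illustration, not the module's actual code: the names `CACHE_TTL_SECONDS` and `cache_key` are hypothetical, and the `source:identifier` key format is inferred from the diagram.

```python
# Hypothetical helper: names and key format are assumptions inferred from the diagram.
CACHE_TTL_SECONDS = {
    "coingecko": 24 * 60 * 60,  # 24h
    "github": 24 * 60 * 60,     # 24h
    "defillama": 6 * 60 * 60,   # 6h
}


def cache_key(source: str, identifier: str) -> str:
    """Build a namespaced cache key such as 'coingecko:aave'."""
    if source not in CACHE_TTL_SECONDS:
        raise ValueError(f"unknown source: {source}")
    return f"{source}:{identifier}"


print(cache_key("defillama", "aave"), CACHE_TTL_SECONDS["defillama"])
```

Keeping the TTLs in one table makes it easy to see that DefiLlama entries expire four times as often as the other two.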

1. CoinGecko Harvester

File: coingecko.py

Purpose: Fetches price, market data, and metadata.

Data Collected

| Field | Description | Example |
| --- | --- | --- |
| price_usd | Current price in USD | 245.50 |
| market_cap | Total market capitalization | 3,650,000,000 |
| fdv | Fully Diluted Valuation | 4,100,000,000 |
| price_change_24h | % change in 24 hours | -2.5 |
| github_url | Main repository URL | github.com/aave |
| homepage | Project website | aave.com |

API Details

  • Endpoint: https://api.coingecko.com/api/v3/coins/ID
  • Rate Limit: 50 requests/minute (free tier)
  • Cache TTL: 24 hours
  • Cost: Free
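
The mapping from the endpoint's JSON to the fields listed under Data Collected might look roughly like the sketch below. The function name `extract_fields` is hypothetical, and the nested payload keys (`market_data`, `current_price`, `links`, `repos_url`, and so on) reflect an assumption about the shape of the `/coins/ID` response that should be checked against the CoinGecko API reference. The sample payload is abbreviated and reuses the example values from the table.

```python
from typing import Any, Optional


def extract_fields(payload: dict[str, Any]) -> dict[str, Optional[Any]]:
    """Map a /coins/ID payload to the harvester's flat field names."""
    market = payload.get("market_data") or {}
    links = payload.get("links") or {}

    def usd(section: str) -> Optional[float]:
        # Price-like values are assumed to be keyed by currency code, e.g. {"usd": 245.5}.
        return (market.get(section) or {}).get("usd")

    def first(urls: Optional[list]) -> Optional[str]:
        # Link lists may contain empty strings; take the first non-empty URL.
        return next((u for u in urls or [] if u), None)

    return {
        "price_usd": usd("current_price"),
        "market_cap": usd("market_cap"),
        "fdv": usd("fully_diluted_valuation"),
        "price_change_24h": market.get("price_change_percentage_24h"),
        "github_url": first((links.get("repos_url") or {}).get("github")),
        "homepage": first(links.get("homepage")),
    }


# Abbreviated, hand-written sample payload (not a real API response).
sample = {
    "market_data": {
        "current_price": {"usd": 245.50},
        "market_cap": {"usd": 3650000000},
        "fully_diluted_valuation": {"usd": 4100000000},
        "price_change_percentage_24h": -2.5,
    },
    "links": {
        "homepage": ["https://aave.com", ""],
        "repos_url": {"github": ["https://github.com/aave"]},
    },
}

fields = extract_fields(sample)
print(fields["price_usd"], fields["github_url"])
```

Missing sections resolve to `None` instead of raising, so a partially populated payload still yields a usable record.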

Usage

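import asyncio

from harvesters.coingecko import CoinGeckoHarvester


async def main():
    harvester = CoinGeckoHarvester()
    # fetch() is a coroutine, so it must be awaited inside an async function
    data = await harvester.fetch("aave")
    print(data)


asyncio.run(main())

# Example result:
# {
#     "price_usd": 245.50,
#     "market_cap": 3650000000,
#     "fdv": 4100000000,
#     "github_url": "https://github.com/aave"
# }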

Known Issues

Outdated GitHub URLs: CoinGecko sometimes returns old repository URLs. We maintain an override mapping:

GITHUB_OVERRIDES = {
    "aave": "https://github.com/aave/aave-v3-core",
    "compound": "https://github.com/compound-finance/compound-v2",
}
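
One way such a mapping could be applied is to let an override win whenever one exists and otherwise keep the URL CoinGecko returned. The helper name `resolve_github_url` is hypothetical, and the precedence rule is an assumption about how the mapping is used.

```python
from typing import Optional

GITHUB_OVERRIDES = {
    "aave": "https://github.com/aave/aave-v3-core",
    "compound": "https://github.com/compound-finance/compound-v2",
}


def resolve_github_url(slug: str, coingecko_url: Optional[str]) -> Optional[str]:
    """Prefer a curated override; otherwise keep what CoinGecko returned."""
    return GITHUB_OVERRIDES.get(slug, coingecko_url)


print(resolve_github_url("aave", "https://github.com/aave"))
# https://github.com/aave/aave-v3-core
```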

2. GitHub Harvester

File: github.py

Purpose: Fetches developer activity metrics.

Data Collected

| Field | Description | Example |
| --- | --- | --- |
| commits_30d | Commits in last 30 days | 45 |
| active_devs | Unique contributors (30d) | 12 |
| last_commit_date | Most recent commit | 2025-12-25 |
| total_stars | Repository stars | 2,500 |
| open_issues | Open issue count | 48 |

API Details

  • Endpoint: https://api.github.com/repos/OWNER/REPO
  • Stats Endpoint: https://api.github.com/repos/OWNER/REPO/stats/participation
  • Rate Limit: 5,000 requests/hour (authenticated)
  • Cache TTL: 24 hours
  • Cost: Free
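
Both endpoints need an OWNER/REPO pair, while the harvester starts from a full URL. A minimal sketch of that conversion, using only the standard library, is shown below. The function name `parse_owner_repo` is hypothetical, and returning `None` for organization-level URLs such as github.com/aave is an assumption about the desired behavior.

```python
from typing import Optional
from urllib.parse import urlparse


def parse_owner_repo(github_url: str) -> Optional[str]:
    """Return 'OWNER/REPO' from a GitHub URL, or None if it names no repository."""
    # Tolerate URLs stored without a scheme, e.g. "github.com/aave".
    if "://" not in github_url:
        github_url = "https://" + github_url
    parsed = urlparse(github_url)
    if parsed.netloc.lower() not in ("github.com", "www.github.com"):
        return None
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        return None  # organization or user page only, no repository segment
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return f"{owner}/{repo}"


print(parse_owner_repo("https://github.com/aave/aave-v3-core"))  # aave/aave-v3-core
print(parse_owner_repo("github.com/aave"))                       # None
```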

Async Stats Handling

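GitHub computes repository statistics asynchronously. If the stats are not cached yet (common for large repositories), the endpoint first returns 202 Accepted with an empty body, and the request must be retried after a short delay: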

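import asyncio

async def fetch_stats(self, repo, max_retries=5):
    # repo is the "OWNER/REPO" path segment, e.g. "aave/aave-v3-core"
    for _ in range(max_retries):
        response = await self.client.get(f"/repos/{repo}/stats/participation")

        if response.status_code != 202:
            return response.json()

        # 202 Accepted: GitHub is still computing stats, retry after a delay
        await asyncio.sleep(2)

    # Stats were not ready after max_retries attempts
    return None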
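
Once the stats are ready, the participation payload contains weekly commit counts, ordered from the oldest week to the most recent. The sketch below sums the last four weekly buckets as a rough stand-in for commits_30d. Note the assumptions: `recent_commits` is a hypothetical helper, four weeks cover 28 days rather than 30, and the harvester may derive the metric differently. The sample data is made up to match the example value in the table.

```python
def recent_commits(participation: dict, weeks: int = 4) -> int:
    """Sum commits over the most recent `weeks` weekly buckets."""
    weekly = participation.get("all") or []
    # Guard against weeks <= 0, because weekly[-0:] would select the whole list.
    return sum(weekly[-weeks:]) if weeks > 0 else 0


# Hand-written sample: 52 weekly buckets, only the last four are non-zero.
sample_participation = {"all": [0] * 48 + [10, 12, 8, 15]}

print(recent_commits(sample_participation))  # 45
```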

Fallback for Missing Repos

# Inside the harvester's fetch method: skip the API call when no repository is known
if not github_url:
    return {
        "commits_30d": 0,
        "active_devs": 0,
        "error": "No public repository found",
    }

3. DefiLlama Harvester

File: defillama.py

Purpose: Fetches DeFi-specific metrics (TVL, treasury).

Data Collected

| Field | Description | Example |
| --- | --- | --- |
| tvl | Total Value Locked (USD) | 32,400,000,000 |
| chain_tvls | TVL breakdown by chain | ethereum: 25B, polygon: 5B |
| category | DeFi category | "Lending" |
| mcap_to_tvl_ratio | Market Cap / TVL | 0.09 |

API Details

  • Endpoint: https://api.llama.fi/protocol/SLUG
  • TVL History: https://api.llama.fi/protocol/SLUG/tvl
  • Rate Limit: Unlimited (no key required)
  • Cache TTL: 6 hours (shorter than the other harvesters because TVL changes faster)
  • Cost: Free ✅

Slug Mapping

DefiLlama uses different slugs than CoinGecko. We maintain a mapping:

# CoinGecko slug -> DefiLlama slug
DEFILLAMA_SLUGS = {
    "aave": "aave",
    "uniswap": "uniswap",
    "compound-governance-token": "compound",
}
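
A lookup over this mapping could fall back to the CoinGecko slug when no entry exists, which works for projects whose slugs match on both services. The helper `to_defillama_slug` is hypothetical, and the identity fallback is an assumption. A stricter implementation might raise an error for unmapped slugs instead.

```python
DEFILLAMA_SLUGS = {
    "aave": "aave",
    "uniswap": "uniswap",
    "compound-governance-token": "compound",
}


def to_defillama_slug(coingecko_slug: str) -> str:
    """Translate a CoinGecko slug, falling back to the slug unchanged."""
    return DEFILLAMA_SLUGS.get(coingecko_slug, coingecko_slug)


print(to_defillama_slug("compound-governance-token"))  # compound
```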

Example Response

{
    "tvl": 32451269361,
    "chainTvls": {
        "Ethereum": 25000000000,
        "Polygon": 5000000000,
        "Avalanche": 2451269361
    },
    "category": "Lending",
    "mcap": 2900000000
}
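
As a worked example, the mcap_to_tvl_ratio field can be derived from the `mcap` and `tvl` values in this response. The function below is a hypothetical sketch, and rounding to two decimals is an assumption chosen to match the example value in the Data Collected table.

```python
from typing import Optional

# The example response from above, as a Python dict.
protocol = {
    "tvl": 32451269361,
    "chainTvls": {
        "Ethereum": 25000000000,
        "Polygon": 5000000000,
        "Avalanche": 2451269361,
    },
    "category": "Lending",
    "mcap": 2900000000,
}


def mcap_to_tvl_ratio(data: dict) -> Optional[float]:
    """Return market cap divided by TVL, or None when either value is unusable."""
    tvl = data.get("tvl")
    mcap = data.get("mcap")
    if not tvl or mcap is None:
        return None  # avoid dividing by zero or by a missing TVL
    return round(mcap / tvl, 2)


print(mcap_to_tvl_ratio(protocol))  # 0.09
```

In this example the per-chain values also add up exactly to the top-level `tvl`, which makes a convenient sanity check on the breakdown.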

🔄 Harvesting Pipeline

When a project is requested, the CryptoService orchestrates all harvesters:

async def fetch_project(self, slug: str):
    # 1. Check cache first
    cached = await self.cache.get(f"project:{slug}")
    if cached:
        return cached

    # 2. Fetch from all sources in parallel
    coingecko_data, github_data, defillama_data = await asyncio.gather(
        self.coingecko.fetch(slug),
        self.github.fetch(slug),
        self.defillama.fetch(slug)
    )

    # 3. Merge data
    project = self.merge_data(coingecko_data, github_data, defillama_data)

    # 4. Calculate Trust Score
    project["trust_score"] = self.risk_engine.calculate(project)
    project["risk_level"] = self.risk_engine.get_risk_level(project["trust_score"])

    # 5. Save to database
    await self.db.upsert(project)

    # 6. Cache for next request
    await self.cache.set(f"project:{slug}", project, ttl=3600)

    return project
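
The cache-first and parallel-fetch steps can be exercised in isolation with stand-ins for Redis and the harvesters. Everything in this sketch (`InMemoryCache`, `StubHarvester`, and the simplified free function `fetch_project`) is hypothetical scaffolding. It leaves out the trust score and database steps and merges results with a plain dict update, which may differ from what `merge_data` does.

```python
import asyncio


class InMemoryCache:
    """Stand-in for Redis; the TTL is accepted but not enforced in this sketch."""

    def __init__(self):
        self._store = {}

    async def get(self, key):
        return self._store.get(key)

    async def set(self, key, value, ttl=None):
        self._store[key] = value


class StubHarvester:
    """Hypothetical harvester that returns canned data and counts its calls."""

    def __init__(self, data):
        self.data = data
        self.calls = 0

    async def fetch(self, slug):
        self.calls += 1
        await asyncio.sleep(0)  # yield control, as a real network call would
        return dict(self.data)


async def fetch_project(cache, harvesters, slug):
    cached = await cache.get(f"project:{slug}")
    if cached:
        return cached
    # Fetch from all sources concurrently, then merge the partial results.
    results = await asyncio.gather(*(h.fetch(slug) for h in harvesters))
    project = {"slug": slug}
    for partial in results:
        project.update(partial)
    await cache.set(f"project:{slug}", project, ttl=3600)
    return project


async def demo():
    cache = InMemoryCache()
    harvesters = [
        StubHarvester({"price_usd": 245.50}),
        StubHarvester({"commits_30d": 45}),
        StubHarvester({"tvl": 32451269361}),
    ]
    first = await fetch_project(cache, harvesters, "aave")
    second = await fetch_project(cache, harvesters, "aave")  # served from cache
    return first, second, [h.calls for h in harvesters]


first, second, calls = asyncio.run(demo())
print(calls)  # [1, 1, 1]
```

Each stub is called exactly once even though the project is requested twice, because the second request is answered from the cache.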

⚡ Performance

| Metric | Fresh Fetch | Cached |
| --- | --- | --- |
| Total Time | 2-3 seconds | under 50 ms |
| CoinGecko | ~800 ms | - |
| GitHub | ~1.2 s | - |
| DefiLlama | ~500 ms | - |

Cache Hit Rates

  • CoinGecko: ~90%
  • GitHub: ~85%
  • DefiLlama: ~75%

🚀 Future Harvesters (Planned)

CryptoRank Harvester

  • Data: Fundraising rounds, VC investors, token unlocks
  • Method: Web scraping (no free API)
  • Purpose: Tokenomics pillar

TwitterScore Harvester

  • Data: Follower quality, bot detection, engagement
  • Purpose: Community pillar

Discord/Telegram Harvester

  • Data: Member count, activity, sentiment
  • Purpose: Community pillar

All harvesters are designed for zero-cost operation using free API tiers.