
Data Harvesters

The Crypto module uses 8 data harvesters to collect real-time information from external APIs.

🏗️ Architecture

 ┌────────────┐  ┌────────────┐  ┌────────────┐  ┌────────────┐  ┌────────────┐
 │ CoinGecko  │  │   GitHub   │  │ DefiLlama  │  │ DefiLlama  │  │  Dropstab  │
 │ Harvester  │  │ Harvester  │  │    TVL     │  │   Raises   │  │ Investors  │
 └─────┬──────┘  └─────┬──────┘  └─────┬──────┘  └─────┬──────┘  └─────┬──────┘
       │               │               │               │               │
       ▼               ▼               ▼               ▼               ▼
┌──────────────────────────────────────────────────────────────────────────────┐
│                                 Redis Cache                                  │
│ ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐ │
│ │coingecko:│    │ github:  │    │defillama:│    │ raises:  │    │dropstab: │ │
│ │   SLUG   │    │   REPO   │    │ PROTOCOL │    │   ALL    │    │   SLUG   │ │
│ │ TTL:24h  │    │ TTL:24h  │    │ TTL:6h   │    │ TTL:24h  │    │ TTL:24h  │ │
│ └──────────┘    └──────────┘    └──────────┘    └──────────┘    └──────────┘ │
└──────────────────────────────────────────────────────────────────────────────┘
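
The key prefixes and TTLs in the diagram can be written down as a small lookup table. The sketch below is illustrative only: `CACHE_POLICY` and `cache_key` are hypothetical names that do not come from the codebase, and the real harvesters may build their keys differently.

```python
# Hypothetical helper: maps each harvester to the key prefix and TTL shown in the diagram.
HOUR = 3600

CACHE_POLICY = {
    "coingecko": {"prefix": "coingecko", "ttl": 24 * HOUR},
    "github": {"prefix": "github", "ttl": 24 * HOUR},
    "defillama": {"prefix": "defillama", "ttl": 6 * HOUR},
    "raises": {"prefix": "raises", "ttl": 24 * HOUR},
    "dropstab": {"prefix": "dropstab", "ttl": 24 * HOUR},
}


def cache_key(harvester: str, identifier: str) -> tuple[str, int]:
    """Return (redis_key, ttl_seconds) for one cached harvester payload."""
    policy = CACHE_POLICY[harvester]
    return f"{policy['prefix']}:{identifier}", policy["ttl"]
```

For example, `cache_key("defillama", "aave")` yields the key `defillama:aave` with a 6-hour TTL expressed in seconds.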

1. CoinGecko Harvester

File: coingecko.py

Purpose: Fetches price, market data, and metadata.

Data Collected

| Field | Description | Example |
|-------|-------------|---------|
| price_usd | Current price in USD | 245.50 |
| market_cap | Total market capitalization | 3,650,000,000 |
| fdv | Fully Diluted Valuation | 4,100,000,000 |
| price_change_24h | % change in 24 hours | -2.5 |
| github_url | Main repository URL | github.com/aave |
| homepage | Project website | aave.com |

API Details

  • Endpoint: https://api.coingecko.com/api/v3/coins/ID
  • Rate Limit: 50 requests/minute (free tier)
  • Cache TTL: 24 hours
  • Cost: Free

Usage

from harvesters.coingecko import CoinGeckoHarvester

# fetch() is a coroutine: call it from inside an async function
harvester = CoinGeckoHarvester()
data = await harvester.fetch("aave")

# Returns:
{
    "price_usd": 245.50,
    "market_cap": 3650000000,
    "fdv": 4100000000,
    "github_url": "https://github.com/aave"
}

Known Issues

Outdated GitHub URLs: CoinGecko sometimes returns old repository URLs. We maintain an override mapping:

GITHUB_OVERRIDES = {
    "aave": "https://github.com/aave/aave-v3-core",
    "compound": "https://github.com/compound-finance/compound-v2",
}
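
As a rough illustration of how such an override mapping might be applied, the sketch below prefers the manual entry and otherwise keeps the URL CoinGecko returned. The function name `resolve_github_url` is an assumption made for this example, not a name taken from coingecko.py.

```python
GITHUB_OVERRIDES = {
    "aave": "https://github.com/aave/aave-v3-core",
    "compound": "https://github.com/compound-finance/compound-v2",
}


def resolve_github_url(slug, coingecko_url):
    """Prefer the manual override; otherwise keep whatever CoinGecko returned."""
    # Hypothetical helper: the real harvester may apply overrides elsewhere.
    return GITHUB_OVERRIDES.get(slug, coingecko_url)
```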

2. GitHub Harvester

File: github.py

Purpose: Fetches developer activity metrics.

Data Collected

| Field | Description | Example |
|-------|-------------|---------|
| commits_30d | Commits in last 30 days | 45 |
| active_devs | Unique contributors (30d) | 12 |
| last_commit_date | Most recent commit | 2025-12-25 |
| total_stars | Repository stars | 2,500 |
| open_issues | Open issue count | 48 |

API Details

  • Endpoint: https://api.github.com/repos/OWNER/REPO
  • Stats Endpoint: https://api.github.com/repos/OWNER/REPO/stats/participation
  • Rate Limit: 5,000 requests/hour (authenticated)
  • Cache TTL: 24 hours
  • Cost: Free

Async Stats Handling

Large repositories often return 202 Accepted at first, because GitHub computes the stats asynchronously:

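import asyncio

async def fetch_stats(self, repo, retries=5):
    # `repo` is interpolated into the path (e.g. "aave/aave-v3-core")
    response = await self.client.get(f"/repos/{repo}/stats/participation")

    if response.status_code == 202:
        if retries == 0:
            return None  # stats still not ready; stop instead of recursing forever
        # GitHub is computing stats, retry after delay
        await asyncio.sleep(2)
        return await self.fetch_stats(repo, retries - 1)

    return response.json()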
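
The same retry idea can be written as a loop with a hard attempt limit, which is easy to test without touching the network. This is a minimal sketch under stated assumptions: `poll_until_ready` and the `(status_code, payload)` return shape of `fetch` are invented for the example and are not part of github.py.

```python
import asyncio


async def poll_until_ready(fetch, max_attempts=5, delay=2.0):
    """Call `fetch` until it stops returning HTTP 202, up to `max_attempts` times.

    `fetch` is any zero-argument coroutine function returning (status_code, payload).
    Returns the payload, or None if the stats were never ready.
    """
    for attempt in range(max_attempts):
        status, payload = await fetch()
        if status != 202:
            return payload
        if attempt < max_attempts - 1:
            await asyncio.sleep(delay)  # give the server time to compute the stats
    return None


async def _demo():
    # Fake endpoint: answers 202 twice, then 200 with a payload.
    responses = iter([(202, None), (202, None), (200, {"all": [3, 5, 8]})])

    async def fake_fetch():
        return next(responses)

    return await poll_until_ready(fake_fetch, delay=0)


demo_result = asyncio.run(_demo())
```

Bounding the attempts keeps a repository whose stats never become available from stalling the whole pipeline.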

Fallback for Missing Repos

# No repository URL available: return zeroed metrics instead of raising
if not github_url:
    return {
        "commits_30d": 0,
        "active_devs": 0,
        "error": "No public repository found"
    }

3. DefiLlama Harvester

File: defillama.py

Purpose: Fetches DeFi-specific metrics (TVL, treasury).

Data Collected

| Field | Description | Example |
|-------|-------------|---------|
| tvl | Total Value Locked (USD) | 32,400,000,000 |
| chain_tvls | TVL breakdown by chain | ethereum: 25B, polygon: 5B |
| category | DeFi category | "Lending" |
| mcap_to_tvl_ratio | Market Cap / TVL | 0.09 |

API Details

  • Endpoint: https://api.llama.fi/protocol/SLUG
  • TVL History: https://api.llama.fi/protocol/SLUG/tvl
  • Rate Limit: Unlimited (no key required)
  • Cache TTL: 6 hours (TVL changes faster)
  • Cost: Free ✅

Slug Mapping

DefiLlama uses different slugs than CoinGecko. We maintain a mapping:

DEFILLAMA_SLUGS = {
    "aave": "aave",
    "uniswap": "uniswap",
    "compound-governance-token": "compound",
}

Example Response

{
    "tvl": 32451269361,
    "chainTvls": {
        "Ethereum": 25000000000,
        "Polygon": 5000000000,
        "Avalanche": 2451269361
    },
    "category": "Lending",
    "mcap": 2900000000
}
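
The mcap_to_tvl_ratio field from the table can be derived from a response shaped like this one. The sketch below is an assumption about how that derivation might look; the helper names and the rounding are illustrative and do not come from defillama.py.

```python
def mcap_to_tvl_ratio(protocol):
    """Market cap divided by TVL, or None when either value is missing or TVL is zero."""
    tvl = protocol.get("tvl")
    mcap = protocol.get("mcap")
    if not tvl or mcap is None:
        return None
    return round(mcap / tvl, 2)


def chain_shares(protocol):
    """Each chain's share of the summed chain TVL, as a percentage (one decimal)."""
    chain_tvls = protocol.get("chainTvls", {})
    total = sum(chain_tvls.values())
    if total == 0:
        return {}
    return {chain: round(100 * value / total, 1) for chain, value in chain_tvls.items()}


example = {
    "tvl": 32451269361,
    "chainTvls": {"Ethereum": 25000000000, "Polygon": 5000000000, "Avalanche": 2451269361},
    "category": "Lending",
    "mcap": 2900000000,
}
```

With the example values, 2.9B of market cap against roughly 32.45B of TVL gives the 0.09 ratio shown in the Data Collected table.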

🔄 Harvesting Pipeline

When a project is requested, the CryptoService orchestrates all harvesters:

async def fetch_project(self, slug: str):
    # 1. Check cache first
    cached = await self.cache.get(f"project:{slug}")
    if cached:
        return cached

    # 2. Fetch from all sources in parallel
    coingecko_data, github_data, defillama_data = await asyncio.gather(
        self.coingecko.fetch(slug),
        self.github.fetch(slug),
        self.defillama.fetch(slug)
    )

    # 3. Merge data
    project = self.merge_data(coingecko_data, github_data, defillama_data)

    # 4. Calculate Trust Score
    project["trust_score"] = self.risk_engine.calculate(project)
    project["risk_level"] = self.risk_engine.get_risk_level(project["trust_score"])

    # 5. Save to database
    await self.db.upsert(project)

    # 6. Cache for next request
    await self.cache.set(f"project:{slug}", project, ttl=3600)

    return project
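
By default, asyncio.gather propagates the first exception raised by any awaited coroutine, so a single failing source would fail the whole request. One possible variation, sketched here with stub harvesters, passes return_exceptions=True and turns each failure into an error entry. `gather_sources` and the stubs are hypothetical and are not part of CryptoService.

```python
import asyncio


async def gather_sources(sources):
    """Run every harvester coroutine concurrently; map failures to an error dict."""
    names = list(sources)
    results = await asyncio.gather(
        *(sources[name]() for name in names), return_exceptions=True
    )
    merged = {}
    for name, result in zip(names, results):
        if isinstance(result, Exception):
            merged[name] = {"error": str(result)}  # keep going with partial data
        else:
            merged[name] = result
    return merged


async def _demo():
    # Stub harvesters standing in for the real network calls.
    async def coingecko():
        return {"price_usd": 245.50}

    async def github():
        raise RuntimeError("rate limited")

    async def defillama():
        return {"tvl": 32451269361}

    return await gather_sources(
        {"coingecko": coingecko, "github": github, "defillama": defillama}
    )


demo_result = asyncio.run(_demo())
```

Whether partial results are acceptable depends on how the Trust Score treats missing inputs, which this sketch does not address.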

⚡ Performance

| Metric | Fresh Fetch | Cached |
|--------|-------------|--------|
| Total Time | 2-3 seconds | under 50ms |
| CoinGecko | ~800ms | - |
| GitHub | ~1.2s | - |
| DefiLlama | ~500ms | - |

Cache Hit Rates

  • CoinGecko: ~90%
  • GitHub: ~85%
  • DefiLlama: ~75%

4. DefiLlama Raises Harvester ✨ NEW

File: defillama_raises.py

Purpose: Fetches fundraising data (rounds, investors, amounts) for the Tokenomics pillar.

Data Collected

| Field | Description | Example |
|-------|-------------|---------|
| total_raised | Total funding in millions USD | 150.4 |
| funding_rounds | List of funding rounds | [Seed, Series A, ...] |
| lead_investors | Lead investors per round | ["a16z", "Paradigm"] |
| all_investors | All investors with tiers | [{name, tier, weight}] |
| backer_tier_score | Investor quality score (0-100) | 80 |

API Details

  • Endpoint: https://api.llama.fi/raises
  • Rate Limit: Unlimited
  • Cache TTL: 24 hours (data rarely changes)
  • Cost: Free ✅
  • Data Size: 6,723 funding rounds, 9,311 investors

VC Tier Classification

The harvester automatically classifies investors into tiers:

VC_TIERS = {
    "tier_1": ["a16z", "paradigm", "pantera", "polychain", "coinbase ventures"],
    "tier_2": ["delphi digital", "galaxy", "blockchain capital"],
    "tier_3": ["dwf labs", "wintermute", "gsr"]  # Red flags
}
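
A minimal sketch of how a name could be matched against these tiers is shown below. The case-insensitive substring match and the `classify_investor` name are assumptions made for illustration; the actual matching rule in defillama_raises.py is not documented here.

```python
VC_TIERS = {
    "tier_1": ["a16z", "paradigm", "pantera", "polychain", "coinbase ventures"],
    "tier_2": ["delphi digital", "galaxy", "blockchain capital"],
    "tier_3": ["dwf labs", "wintermute", "gsr"],  # Red flags
}


def classify_investor(name):
    """Return 1, 2, or 3 for a listed investor, or None when the name is unlisted."""
    normalized = name.strip().lower()
    for tier_name, keywords in VC_TIERS.items():
        # Assumed rule: a keyword anywhere in the name counts as a match.
        if any(keyword in normalized for keyword in keywords):
            return int(tier_name.split("_")[1])
    return None
```

Under this rule, "Polychain Capital" maps to tier 1 because it contains the keyword "polychain".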

Backer Score Calculation

score = 50  # Base
score += min(tier_1_count * 10, 30) # Tier 1 bonus (max +30)
score -= min(tier_3_count * 10, 20) # Tier 3 penalty (max -20)
score += min(total_investors * 2, 20) # Investor count bonus
score += min(lead_count * 5, 10) # Lead investor bonus
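
Wrapped in a function, the calculation above might look like the following sketch. The raw formula can reach 110, while the Data Collected table documents a 0-100 range, so the final clamp is an assumption added here rather than confirmed behavior.

```python
def backer_tier_score(tier_1_count, tier_3_count, total_investors, lead_count):
    """Investor quality score built from the documented bonuses and penalties."""
    score = 50  # Base
    score += min(tier_1_count * 10, 30)  # Tier 1 bonus (max +30)
    score -= min(tier_3_count * 10, 20)  # Tier 3 penalty (max -20)
    score += min(total_investors * 2, 20)  # Investor count bonus (max +20)
    score += min(lead_count * 5, 10)  # Lead investor bonus (max +10)
    # Assumption: clamp to the documented 0-100 range.
    return max(0, min(score, 100))
```

For instance, one tier 1 backer, two investors in total, and two leads give 50 + 10 + 4 + 10 = 74.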

Example Response

{
    "project_name": "Polkadot",
    "total_raised": 150.4,
    "backer_tier_score": 80,
    "funding_rounds": [
        { "round_type": "Private", "amount_raised": 100, "date": "2020-06-01" },
        { "round_type": "Series A", "amount_raised": 50.4, "date": "2021-02-15" }
    ],
    "investors": [
        { "name": "Polychain Capital", "tier": 1, "is_lead": true },
        { "name": "Hashed", "tier": 2, "is_lead": true }
    ],
    "lead_investors": ["Polychain Capital", "Hashed"]
}

Tier 3 Investors: If a project is backed by DWF Labs, Wintermute, or GSR, the backer_tier_score is automatically reduced to flag potential market manipulation concerns.


5. Dropstab Harvester (All-in-One) ✨ NEW

File: dropstab.py

Purpose: Aggregates security, social, and investor data that is usually hidden behind paywalls or complex UIs.

Data Collected

| Field | Source | Description |
|-------|--------|-------------|
| Certik Score | Certik Skynet | Security score (0-100) and Tier (AAA-Bronze) |
| TweetScout | TweetScout.io | Community Influence Level (1-5) and Score |
| Scam Alert | Dropstab Risk | CRITICAL flag if project is a known scam |
| Investors | CryptoRank/Crunchbase | List of VCs and fundraising rounds |
| Vesting | VestLab | Token unlock schedules |

Key Logic: The "Dropstab Object"

We extract the large JSON object that Dropstab embeds in its page source (the __NEXT_DATA__ script tag). This exposes data from its partner ecosystem (Certik, TweetScout) without needing a separate API key for each service.
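
Next.js pages typically ship this payload in a script tag with the id __NEXT_DATA__, so the extraction can be sketched with the standard library alone. The sample HTML and the nested keys under pageProps are made-up placeholders, and `extract_next_data` is a hypothetical name, not the function used in dropstab.py.

```python
import json
import re

NEXT_DATA_RE = re.compile(
    r'<script[^>]*id="__NEXT_DATA__"[^>]*>(.*?)</script>',
    re.DOTALL,
)


def extract_next_data(html):
    """Return the parsed __NEXT_DATA__ JSON from a page, or None if the tag is absent."""
    match = NEXT_DATA_RE.search(html)
    if match is None:
        return None
    return json.loads(match.group(1))


# Placeholder page: the keys below "pageProps" are invented for this example.
sample_html = (
    '<html><body><script id="__NEXT_DATA__" type="application/json">'
    '{"props": {"pageProps": {"coin": {"slug": "aave"}}}}'
    "</script></body></html>"
)
page_data = extract_next_data(sample_html)
```

Because this relies on the page's internal structure rather than a published API, it can break whenever the site changes its markup.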


🔄 Harvesting Pipeline

When a project is requested, the CryptoService orchestrates all harvesters:

def _fetch_and_save(self, cur, slug: str):
    # 1. Fetch from CoinGecko
    cg_data = self.coingecko.get_coin_data(slug)

    # 2. Fetch GitHub metrics
    gh_data = self.github.get_repo_metrics(cg_data.get("github_org"))

    # 3. Fetch DefiLlama TVL
    dl_data = self.defillama.get_protocol_data(slug)

    # 4. Fetch DefiLlama Raises (fundraising)
    raises_data = self.defillama_raises.harvest_project_funding(cg_data.get("name"))

    # 5. Merge all data
    merged_data = {**cg_data, **gh_data, **dl_data, **raises_data}

    # 6. Calculate 6-Pillar Trust Score
    score_result = self.risk_engine.calculate_trust_score(merged_data)

    # 7. Save to database
    cur.execute("INSERT INTO crypto_projects ...")

⚡ Performance

| Metric | Fresh Fetch | Cached |
|--------|-------------|--------|
| Total Time | 2-3 seconds | < 50ms |
| CoinGecko | ~800ms | - |
| GitHub | ~1.2s | - |
| DefiLlama TVL | ~500ms | - |
| DefiLlama Raises | ~100ms* | - |

*Raises data is bulk-cached (all 6,723 rounds at once).

Cache Hit Rates

| Harvester | Hit Rate |
|-----------|----------|
| CoinGecko | ~90% |
| GitHub | ~85% |
| DefiLlama TVL | ~75% |
| DefiLlama Raises | ~95%* |

*Raises cache is shared across all projects.
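
The shared raises cache can be pictured as a single in-memory index built once from the bulk payload and then queried per project. The sketch below assumes a simplified record shape with hypothetical `name`, `round`, and `amount` fields; the real API fields and the harvester's own indexing may differ.

```python
from collections import defaultdict


def index_raises(rounds):
    """Group funding rounds by lower-cased project name."""
    index = defaultdict(list)
    for entry in rounds:
        index[entry["name"].strip().lower()].append(entry)
    return dict(index)


def total_raised(index, project_name):
    """Sum the amounts (millions USD) for one project; 0 when it has no rounds."""
    rounds = index.get(project_name.strip().lower(), [])
    # Treat a missing or null amount as zero.
    return round(sum(r.get("amount") or 0 for r in rounds), 1)


# Hypothetical records, shaped loosely after the Polkadot example response.
sample_rounds = [
    {"name": "Polkadot", "round": "Private", "amount": 100},
    {"name": "Polkadot", "round": "Series A", "amount": 50.4},
    {"name": "Aave", "round": "Seed", "amount": None},
]
raises_index = index_raises(sample_rounds)
```

Building the index once and reusing it for every lookup is what would make per-project queries cheap after the first bulk fetch.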


6. DefiLlama Emissions Harvester ✨ NEW

File: defillama_emissions.py

Purpose: Fetches token vesting/unlock schedules with allocation breakdown.

Data Collected

| Field | Description | Example |
|-------|-------------|---------|
| categories | Allocation by category | [Airdrop, Team, Investors...] |
| unlock_progress_pct | % of tokens unlocked | 65.4% |
| total_supply | Maximum supply | 10,000,000,000 |
| next_unlock | Next unlock event | {date, amount} |

API Details

  • Endpoint: https://api.llama.fi/emission/SLUG
  • Rate Limit: Unlimited
  • Cache TTL: 24 hours
  • Cost: Free ✅

Example Output

{
    "name": "Arbitrum",
    "categories": [
        { "name": "Airdrop", "unlocked": 1162000000 },
        { "name": "Foundation", "unlocked": 750000000 },
        { "name": "Advisors Team OffchainLabs", "unlocked": 2694000000 }
    ],
    "unlock_progress_pct": 100.0,
    "total_supply": 10000000000
}

Test Results

| Token | Categories | Unlock Progress |
|-------|------------|-----------------|
| Arbitrum | 5 | 100% |
| Aptos | 4 | 100% |
| Sui | 8 | 0% |
| ApeCoin | 5 | 100% |

Coverage: Not all tokens have emissions data. For tokens without data (e.g., Optimism), we gracefully return a "not_tracked" status.


7. ICODrops Harvester ✨ NEW

File: icodrops.py

Purpose: Fallback vesting data source when DefiLlama Emissions lacks coverage. Extracts detailed token allocation breakdowns from ICODrops SSR pages.

Data Collected

| Field | Description | Example |
|-------|-------------|---------|
| vesting_schedule | List of vesting categories | [Team, Investors, Airdrop...] |
| tge_date | Token Generation Event date | "2025-04-28" |
| investors | List of project investors | [Paradigm, a16z, ...] |
| vesting_progress | Overall unlock percentage | 45.5% |

Key Features

  • Fallback Integration: Only called when DefiLlama Emissions returns no data
  • days_remaining: Calculates days until each category unlocks
  • locked_value_m: Dollar value of locked tokens per category
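
The fallback order and the days_remaining calculation can be sketched as follows. The function names, the stub fetchers, and the rule that past unlock dates count as 0 days are assumptions made for illustration, not details taken from icodrops.py.

```python
from datetime import date


def days_remaining(unlock_date, today):
    """Days until a category unlocks; 0 once the unlock date has passed (assumed)."""
    return max((unlock_date - today).days, 0)


def get_vesting(slug, fetch_emissions, fetch_icodrops):
    """Try DefiLlama Emissions first, then ICODrops; otherwise report not_tracked."""
    data = fetch_emissions(slug)
    if data:
        return {"source": "defillama_emissions", **data}
    data = fetch_icodrops(slug)
    if data:
        return {"source": "icodrops", **data}
    return {"status": "not_tracked"}


# Stub fetchers standing in for the real harvesters.
def no_emissions(slug):
    return None


def icodrops_stub(slug):
    if slug == "monad":
        return {"coin_name": "Monad", "tge_date": "2025-04-28"}
    return None
```

Passing the fetchers in as arguments keeps the fallback logic testable without network access.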

Example Output

{
    "coin_name": "Monad",
    "vesting_schedule": [
        {
            "category": "Team",
            "status": "locked",
            "days_remaining": 1453,
            "locked_value_m": 606.99
        },
        { "category": "Public Sale", "status": "unlocked", "unlock_pct": 100 }
    ],
    "investor_count": 15,
    "tge_date": "2025-04-28"
}

Vesting Fallback: ICODrops provides detailed vesting breakdowns for new projects that DefiLlama may not yet track. This ensures comprehensive coverage for recently launched tokens.


8. CryptoRank Harvester (Disabled)

File: cryptorank.py

Purpose: Official API for fundraising data (requires paid API key).

Status: Currently disabled due to API cost.


🚀 Future Harvesters (Planned)

LunarCrush Harvester

  • Data: Social metrics, influencer engagement
  • Purpose: Community pillar
  • Status: Planned for Q1 2026

TwitterScore Harvester

  • Data: Follower quality, bot detection
  • Purpose: Community pillar
  • Status: Planned

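The seven active harvesters rely on free API tiers or public pages, so they run at zero API cost. The eighth, CryptoRank, stays disabled because its API requires a paid key.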