IPBlocklist

Aggregates IP/ASN threat intelligence into release artifacts.

  • intel.bin: 164 sources, 20 flags, ~30 MB mmap
  • 0 to 100 maliciousness scoring (consumer-side)

Artifacts

file               role
intel.bin          primary, columnar mmap, 20-flag bitmask
blocklist.txt      scored, CIDR-minimized text for firewalls
asns.json          ASN lists keyed by feed name
asn_prefixes.json  ASN to announced prefixes (RIPEstat-cached)
blocklist.bin      legacy IPBL v2 (scoring + categories)

wget https://github.com/tn3w/IPBlocklist/releases/latest/download/intel.bin

Live demo: ipblocklist.tn3w.dev.

intel.bin

Built by builder/ (Rust) from feeds-intel.json (164 sources). The file uses a struct-of-arrays (SoA) columnar layout that can be mmapped directly. It stores no scores or categories; scoring happens on the consumer side.

Layout

128-byte header, little-endian throughout:

offset  size  field
0       u32   version (4)
4       u32   reserved
8       u64   v4_count
16      u64   v6_count
24      u64   val_count
32      u64   str_count
40      u64   v4_starts_off
48      u64   v4_ends_off
56      u64   v4_vals_off
64      u64   v6_starts_off
72      u64   v6_ends_off
80      u64   v6_vals_off
88      u64   val_table_off
96      u64   str_index_off
104     u64   str_data_off
112     u64   str_data_len
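
The header table maps directly onto a single struct format. A minimal Python sketch, assuming the fields are packed back to back with no gaps (as the offsets suggest); parse_header is a hypothetical helper name, not part of lookup_intel.py, and bytes 120..127 are not described in the table, so they are ignored here:

```python
import struct

# Field order follows the header table; names are copied from it.
HEADER_FIELDS = (
    "version", "reserved",
    "v4_count", "v6_count", "val_count", "str_count",
    "v4_starts_off", "v4_ends_off", "v4_vals_off",
    "v6_starts_off", "v6_ends_off", "v6_vals_off",
    "val_table_off", "str_index_off", "str_data_off", "str_data_len",
)
HEADER_STRUCT = struct.Struct("<II14Q")  # 2 x u32 + 14 x u64 = 120 bytes
HEADER_SIZE = 128  # bytes 120..127 are not documented above; ignored here


def parse_header(buf):
    """Return the header fields of an intel.bin buffer as a dict."""
    if len(buf) < HEADER_SIZE:
        raise ValueError("buffer shorter than the 128-byte header")
    header = dict(zip(HEADER_FIELDS, HEADER_STRUCT.unpack_from(buf, 0)))
    if header["version"] != 4:
        raise ValueError(f"unsupported version {header['version']}")
    return header
```

buf can be a bytes object or a read-only mmap.mmap of the file; both support the buffer protocol that struct.unpack_from needs.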

Sections:

  • v4_starts, v4_ends: v4_count × u32, sorted by start
  • v4_vals: v4_count × u16 (index into value table)
  • v6_starts, v6_ends: v6_count × u128, sorted
  • v6_vals: v6_count × u16
  • value table: val_count × {flags u32, provider_id u32, source_id u32, _pad u32}
  • string index: str_count × {offset u32, len u32} into str_data
  • string data: raw UTF-8
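
The two record-based sections above can be read with fixed-size structs. A sketch, assuming the offsets come from the header and that string offsets are relative to str_data_off; read_value and read_string are hypothetical helper names. Whether provider_id and source_id are indices into the string index is not stated here, so the sketch returns them as raw integers:

```python
import struct

VALUE_RECORD = struct.Struct("<IIII")  # flags, provider_id, source_id, _pad
STRING_ENTRY = struct.Struct("<II")    # offset, len (into str_data)


def read_value(buf, val_table_off, index):
    """Return (flags, provider_id, source_id) for one value-table entry."""
    flags, provider_id, source_id, _pad = VALUE_RECORD.unpack_from(
        buf, val_table_off + index * VALUE_RECORD.size
    )
    return flags, provider_id, source_id


def read_string(buf, str_index_off, str_data_off, index):
    """Return the UTF-8 string stored at position `index` of the string index."""
    offset, length = STRING_ENTRY.unpack_from(
        buf, str_index_off + index * STRING_ENTRY.size
    )
    start = str_data_off + offset
    return bytes(buf[start:start + length]).decode("utf-8")
```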

Flag bits (LSB to MSB): vpn, proxy, tor, malware, c2, scanner, brute_force, spammer, compromised, datacenter, cdn, anycast, crawler, bot, cloud, private_relay, anonymizer, mobile, isp, government.
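
Decoding the bitmask is a matter of walking the list in order. A short sketch, reading "LSB to MSB" as bit 0 = vpn through bit 19 = government; decode_flags is a hypothetical helper name:

```python
FLAG_NAMES = (
    "vpn", "proxy", "tor", "malware", "c2", "scanner", "brute_force",
    "spammer", "compromised", "datacenter", "cdn", "anycast", "crawler",
    "bot", "cloud", "private_relay", "anonymizer", "mobile", "isp",
    "government",
)


def decode_flags(mask):
    """Return the flag names whose bits are set in a u32 flags value."""
    return [name for bit, name in enumerate(FLAG_NAMES) if (mask >> bit) & 1]


print(decode_flags(0b101))  # bits 0 and 2 -> ['vpn', 'tor']
```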

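Lookup: bisect *_starts for the last entry with start <= ip, then scan backward while prefix_max(ends)[i] >= ip, where prefix_max(ends)[i] is the largest end among entries 0..i. Every entry visited whose own end is >= ip contains the address; the backward scan is what lets earlier, wider ranges that overlap later ones still match.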
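
The lookup rule can be sketched on plain Python lists. This is an illustration, not the code in lookup_intel.py: it assumes range ends are inclusive, addresses are plain integers, and the prefix maximum is computed by the reader rather than stored in the file. build_prefix_max and lookup are hypothetical names:

```python
from bisect import bisect_right
from itertools import accumulate


def build_prefix_max(ends):
    """prefix_max[i] = max(ends[0..i]), computed once per column."""
    return list(accumulate(ends, max))


def lookup(starts, ends, prefix_max, ip):
    """Return indices of all ranges containing ip (ends assumed inclusive)."""
    hits = []
    i = bisect_right(starts, ip) - 1  # last range with start <= ip
    # Stop as soon as no range at or before i can reach ip.
    while i >= 0 and prefix_max[i] >= ip:
        if ends[i] >= ip:
            hits.append(i)
        i -= 1
    return hits


starts = [10, 20, 30]
ends = [100, 25, 35]
print(lookup(starts, ends, build_prefix_max(ends), 32))  # [2, 0]
```

An IPv4 string converts to the integer form used here with int(ipaddress.IPv4Address("1.2.3.4")). Each hit index would then select an entry in the matching *_vals column.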

Build

cd builder
cargo build --release
PEERINGDB_API_KEY=... FEEDS_FILE=../feeds-intel.json OUT_FILE=../intel.bin \
  ./target/release/builder update
./target/release/builder check 1.2.3.4

NO_CACHE=1 bypasses HTTP + RIPEstat caches (~/.cache/ipblocklist-builder/).

Source model (feeds-intel.json)

Top-level: { "flags": [...], "feeds": [...] }.

Per-source fields:

  • name (required)
  • flags (required, subset of the 20 canonical flags)
  • url + regex for IP/CIDR sources
  • is_asn: true with asns (static) or url+regex (remote)
  • provider (optional)

Canonical flags: vpn, proxy, tor, malware, c2, scanner, brute_force, spammer, compromised, datacenter, cdn, anycast, crawler, bot, cloud, private_relay, anonymizer, mobile, isp, government.
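
The per-source rules above can be checked mechanically before running the builder. A hedged sketch: validate_source is a hypothetical helper, not the builder's own validation, and the example entry in the usage note uses a placeholder URL and regex:

```python
CANONICAL_FLAGS = {
    "vpn", "proxy", "tor", "malware", "c2", "scanner", "brute_force",
    "spammer", "compromised", "datacenter", "cdn", "anycast", "crawler",
    "bot", "cloud", "private_relay", "anonymizer", "mobile", "isp",
    "government",
}


def validate_source(src):
    """Return a list of problems found in one feeds-intel.json source entry."""
    errors = []
    if not src.get("name"):
        errors.append("name is required")
    flags = src.get("flags") or []
    if not flags:
        errors.append("flags is required")
    unknown = sorted(set(flags) - CANONICAL_FLAGS)
    if unknown:
        errors.append("unknown flags: " + ", ".join(unknown))
    has_remote = "url" in src and "regex" in src
    if src.get("is_asn"):
        # ASN sources carry either a static list or a remote url + regex.
        if "asns" not in src and not has_remote:
            errors.append("ASN source needs asns or url + regex")
    elif not has_remote:
        errors.append("IP/CIDR source needs url + regex")
    return errors
```

For example, validate_source({"name": "example_tor", "flags": ["tor"], "url": "https://example.com/list.txt", "regex": "(.+)"}) returns an empty list.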

Python lookup and scoring

lookup_intel.py mmaps intel.bin and emits a 0 to 100 maliciousness score.

python3 lookup_intel.py 185.220.101.1
python3 lookup_intel.py --benchmark

Scoring model

  • Per-flag base severity: malware/c2=95, compromised=75, brute_force=70, spammer=65, scanner=55, tor=45, bot=40, anonymizer=35, vpn=30, proxy=25, private_relay=15, datacenter=15, crawler=10, cloud=10, cdn=5; anycast/mobile/isp/government=0.
  • IDF-style rarity bonus: severity × (1 + log2(1/prevalence) / 24).
  • Top contribution plus 15% of the remaining contributions, which limits double-counting of correlated flags (e.g. proxy + anonymizer).
  • Multi-source confidence: × (1 + 0.08 · log2(sources+1)).
  • Capped at 100. Levels: critical >=80, high >=60, medium >=35, low >=15, else minimal.
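
The steps above can be sketched as follows. This is an illustration of the stated formulas, not the code in lookup_intel.py. Two things are assumptions: prevalence is passed in as a per-flag fraction in (0, 1], since its exact definition is not given here, and the multi-source factor is applied to the combined value rather than per flag. score and level are hypothetical names:

```python
import math

SEVERITY = {
    "malware": 95, "c2": 95, "compromised": 75, "brute_force": 70,
    "spammer": 65, "scanner": 55, "tor": 45, "bot": 40, "anonymizer": 35,
    "vpn": 30, "proxy": 25, "private_relay": 15, "datacenter": 15,
    "crawler": 10, "cloud": 10, "cdn": 5,
    "anycast": 0, "mobile": 0, "isp": 0, "government": 0,
}
LEVELS = ((80, "critical"), (60, "high"), (35, "medium"), (15, "low"))


def score(flags, prevalence, sources):
    """Combine per-flag severities into a 0..100 score."""
    # Severity with the IDF-style rarity bonus, largest first.
    contributions = sorted(
        (SEVERITY[f] * (1 + math.log2(1 / prevalence[f]) / 24) for f in flags),
        reverse=True,
    )
    if not contributions:
        return 0.0
    # Top contribution plus 15% of the rest.
    combined = contributions[0] + 0.15 * sum(contributions[1:])
    # Multi-source confidence multiplier, then the cap.
    combined *= 1 + 0.08 * math.log2(sources + 1)
    return min(100.0, combined)


def level(value):
    """Map a score to its named level."""
    for threshold, name in LEVELS:
        if value >= threshold:
            return name
    return "minimal"
```

With both prevalences set to 1.0 and sources set to 0 (so neither bonus applies), proxy + anonymizer comes out at about 38.75: the anonymizer severity of 35 plus 15% of the proxy severity of 25.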

Validation

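On 20,000 sampled IPv4 addresses, the mean score tracks top-flag severity closely but not strictly monotonically (Spearman 0.94, Pearson 0.83): tor and vpn, for example, average higher than the more severe scanner and anonymizer.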

top flag     severity  mean score
c2           95        100.0
malware      95        100.0
compromised  75        100.0
brute_force  70        99.5
spammer      65        99.9
tor          45        91.5
scanner      55        91.2
bot          40        82.3
vpn          30        58.3
anonymizer   35        44.4
datacenter   15        21.4

blocklist.txt

Scored ranges, thresholded, CIDR-promoted, non-routable stripped.

Forms per line: 1.2.3.4, 1.2.3.0/24, 1.2.3.1-1.2.3.254, 2001:db8::1, 2001:db8::/32, 2001:db8::1-2001:db8::ff.

ipset create blocklist hash:net
grep -v '^#' blocklist.txt | xargs -n1 ipset add blocklist
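
For consumers that do not use ipset, the three line forms can be normalized with the standard ipaddress module. A minimal sketch; parse_entry is a hypothetical helper, and skipping lines that start with # follows the grep in the ipset example rather than a documented format rule:

```python
import ipaddress


def parse_entry(line):
    """Return (first, last) addresses for one blocklist.txt line, or None."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None  # blank line or comment
    if "/" in line:
        net = ipaddress.ip_network(line, strict=False)
        return net.network_address, net.broadcast_address
    if "-" in line:
        first, last = (ipaddress.ip_address(p) for p in line.split("-", 1))
        return first, last
    addr = ipaddress.ip_address(line)
    return addr, addr


for sample in ("1.2.3.4", "1.2.3.0/24", "2001:db8::1-2001:db8::ff"):
    print(sample, "->", parse_entry(sample))
```

Returning an inclusive (first, last) pair for every form keeps IPv4 and IPv6 handling uniform; the address objects compare and sort within their own family.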

asns.json and asn_prefixes.json

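asns.json maps each feed name to a list of ASNs; asn_prefixes.json maps each ASN to its announced prefixes. Example shapes, in that order:

{"datacenter_asns": ["16509", "15169"], "bgptools_c2_asns": ["14618"]}
{"16509": ["192.0.2.0/24"], "15169": ["203.0.113.0/24"]}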
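
The two files join on the ASN string. A small sketch using the example data above; prefixes_for_feed is a hypothetical helper name:

```python
import json


def prefixes_for_feed(asns, asn_prefixes, feed):
    """Collect announced prefixes for every ASN listed under one feed name."""
    prefixes = []
    for asn in asns.get(feed, []):
        # ASNs without a cached prefix list contribute nothing.
        prefixes.extend(asn_prefixes.get(asn, []))
    return prefixes


asns = json.loads(
    '{"datacenter_asns": ["16509", "15169"], "bgptools_c2_asns": ["14618"]}'
)
asn_prefixes = json.loads(
    '{"16509": ["192.0.2.0/24"], "15169": ["203.0.113.0/24"]}'
)
print(prefixes_for_feed(asns, asn_prefixes, "datacenter_asns"))
# ['192.0.2.0/24', '203.0.113.0/24']
```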

Pipeline

flowchart LR
    F[feeds-intel.json] --> B[builder/]
    B --> I[intel.bin]
    A[feeds.json] --> P[aggregator.py]
    P --> T[blocklist.txt]
    P --> J[asns.json]
    P --> AP[asn_prefixes.json]
    P --> L[blocklist.bin]

aggregator.py resolves ASNs to prefixes via RIPEstat (cached in asn_prefixes.json), merges overlapping ranges, and writes the legacy artifacts. Delete the cache to force a full refresh.

Deprecated: blocklist.bin

IPBL v2 self-describing binary with scoring and categories. Kept for the demo page and legacy consumers; new integrations should use intel.bin.

Built from feeds.json (163 feeds).

[4 magic "IPBL"][1 version=2][4 timestamp LE]
[1 flag_count] flag_count × { [1 len][N utf-8] }
[1 cat_count]  cat_count  × { [1 len][N utf-8] }
[2 feed_count LE]
feed_count × {
  [1 len][N name]
  [1 base_score 0-200][1 confidence 0-200]
  [4 flags bitmask LE][1 cats bitmask]
  [4 range_count LE]
  range_count × { [varint start_delta][varint size] }
}
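
The fixed part of the layout, up to feed_count, can be read without knowing the varint encoding. A sketch under stated assumptions: the timestamp is treated as an unsigned 32-bit value, and the per-feed records are left unparsed because the varint format and the meaning of start_delta are not specified here. read_names and parse_ipbl_preamble are hypothetical names; the implementations in examples/ are the reference:

```python
import struct


def read_names(buf, pos):
    """Read [1 count] then count x {[1 len][N utf-8]}; return (names, new_pos)."""
    count = buf[pos]
    pos += 1
    names = []
    for _ in range(count):
        length = buf[pos]
        pos += 1
        names.append(bytes(buf[pos:pos + length]).decode("utf-8"))
        pos += length
    return names, pos


def parse_ipbl_preamble(buf):
    """Parse magic, version, timestamp, flag/category names and feed_count."""
    if bytes(buf[:4]) != b"IPBL":
        raise ValueError("bad magic")
    if buf[4] != 2:
        raise ValueError(f"unsupported version {buf[4]}")
    (timestamp,) = struct.unpack_from("<I", buf, 5)
    flags, pos = read_names(buf, 9)
    categories, pos = read_names(buf, pos)
    (feed_count,) = struct.unpack_from("<H", buf, pos)
    return {
        "timestamp": timestamp,
        "flags": flags,
        "categories": categories,
        "feed_count": feed_count,
        "feeds_offset": pos + 2,  # first byte of the first feed record
    }
```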

Lookup implementations for 25 languages live in examples/. Python:

python lookup.py 8.8.8.8

License

LICENSE