⚠️ Draft Standards Track: ERC

ERC-8117: Anti-Poisoning Compact EVM Address Format

Subscript (0x0₈) and ASCII (0x0(8)) notation for leading-zero EVM addresses to prevent address-poisoning by surfacing the unique suffix.

Authors	Zainan Victor Zhou (@xinbenlv)
Created	2025-12-30
Discussion Link	https://ethereum-magicians.org/t/erc-8117-compressed-display-format-for-evm-addresses/27360
Requires	EIP-55, EIP-1191

Abstract
Motivation
Specification
Rationale
Backwards Compatibility
Reference Implementation
Security Considerations
Copyright

Abstract

This ERC proposes a standard presentation-layer transformation whose primary purpose is to prevent address-poisoning attacks. Addresses with long leading zeros — increasingly common due to gas-optimization techniques and CREATE2 address mining — are a favoured target: the human eye skips over a monotonous run of zeros, allowing an attacker to substitute a crafted address that shares only the first few and last few visible characters. By condensing the low-entropy zero prefix into a compact subscript count, this standard forces the high-entropy suffix into immediate visual prominence, making poisoned addresses far easier to detect. It defines two display formats:

Subscript Notation (Unicode) — for wallets, block explorers, and mobile apps: 0x0₈abcd…1234
Parenthesis Notation (ASCII fallback) — for logs, terminals, and legacy APIs: 0x0(8)abcd…1234

In both formats the subscript or parenthesised integer n encodes the total count of leading zero nibbles present in the full address after the 0x prefix. This standard is a purely visual UX transformation; it does not modify the underlying address data and is fully compatible with EIP-55 checksumming.

Motivation

Address Poisoning Attacks: Address poisoning exploits the fact that wallets typically display addresses in truncated form (0xd28b…6922), so users learn to verify only the first few and last few visible characters. An attacker can generate a lookalike address matching those characters in seconds on commodity hardware:

Real:     0xd28bE19170C22Bf3a5eD2E46890C9F394e3c6922
Attacker: 0xd28ba7F38c1D5e9B3f2A6c8E0d7b1F4a9C3e6922
            ^^^^                                ^^^^
            match  ←  32 completely different  →  match
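
The collision shown above can be reproduced with a minimal sketch. The helper name `truncate_display` and the first-4/last-4 widths are assumptions chosen to mirror the `0xd28b…6922` display mentioned earlier; individual wallets may truncate differently.

```python
def truncate_display(address: str, head: int = 4, tail: int = 4) -> str:
    """Abbreviate an address the way many wallet UIs do: 0x + head…tail."""
    body = address[2:]  # drop the 0x prefix
    return f"0x{body[:head]}…{body[-tail:]}"


real = "0xd28bE19170C22Bf3a5eD2E46890C9F394e3c6922"
attacker = "0xd28ba7F38c1D5e9B3f2A6c8E0d7b1F4a9C3e6922"

# Both addresses collapse to the same abbreviated string ...
print(truncate_display(real))      # → 0xd28b…6922
print(truncate_display(attacker))  # → 0xd28b…6922

# ... even though the full addresses differ.
print(real.lower() == attacker.lower())  # → False
```

A user who compares only the abbreviated form has no way to tell the two apart.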

The attack proceeds in four steps: (1) the attacker watches the victim’s transaction history; (2) generates a lookalike address matching the target’s prefix and suffix; (3) sends a zero-value transaction from that spoofed address, planting it in the victim’s wallet history; (4) the victim copies the wrong address from their history and sends funds to the attacker. A single such attack has resulted in losses of $68M; aggregate losses across the ecosystem run into the hundreds of millions.

Addresses with Long Leading Zeros and Exponential Cost: Major protocols defend against address poisoning by deliberately mining addresses with long leading zeros. To poison such an address, an attacker must match all $n$ leading zeros plus the suffix. The search space for the zero prefix alone is $16^n$, so each additional zero multiplies the attacker's work by 16:

Protocol	Address	Leading zeros	Approx. poisoning cost (consumer GPU)
ERC-4337 EntryPoint	`0x0000000071727De22E5E9d8BAf0edAc6f37da032`	8	Hours
Namefi NFT	`0x0000000000cf80E7Cf8Fa4480907f692177f8e06`	10	Days
Uniswap V4 PoolManager	`0x000000000004444c5dc75cB358380D2e3dE08A90`	11	~42 days
ENS Registry	`0x00000000000C2E074eC69A0dFb2997BA6C7d2e1e`	11	~42 days
Seaport	`0x0000000000000068F116a894984e2DB1123eB395`	14	Years

Addresses with long leading zeros make poisoning economically irrational — but only if users can efficiently verify the zero count. Counting eleven zeros in 0x00000000000C2E… by eye is error-prone and tedious; 0x0₁₁C2E… communicates it instantaneously. This ERC standardises that notation.

CREATE2 and Deliberate Address Mining: Factories and developers frequently mine addresses with long leading zeros for branding, recognisability, or on-chain identification (e.g., 0x0000000071727De22E5E9d8BAf0edAc6f37da032, the ERC-4337 EntryPoint).

Gas Optimization (EIP-7939): Proposals like EIP-7939 reduce calldata gas costs proportional to leading zero bytes, creating a strong economic incentive to deploy contracts at addresses with large zero prefixes.

Industry Adoption Gap: Leading block explorers — Etherscan, PolygonScan, BscScan, and DexScreener — already display token balances with subscript notation for leading decimal zeros (e.g., 0.0₆9 for very small token amounts). Extending this well-understood UX pattern to hexadecimal addresses gives users a familiar, immediately interpretable signal.

Specification

The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC 2119.

The compression transformation applies only to the visual representation of the address.

1. Trigger Condition

The notation SHOULD be applied when an EVM address has $n$ consecutive leading zero nibbles immediately after the 0x prefix, where $n \geq 4$.

The value $n$ is the total count of leading zero nibbles — it includes every 0 nibble from the start of the address body up to (but not including) the first non-zero nibble.

Scope: This ERC deliberately limits its scope to leading zeros in order to focus on the primary security concern: address poisoning. Non-leading repeated sequences (trailing, middle) and non-zero repeated characters are out of scope. Implementations that nevertheless choose to compress those cases are RECOMMENDED to follow the same subscript/parenthesis convention defined here.

2. Formatting Rules

This standard defines two display modes. Both retain the literal 0x0 prefix and encode the total leading-zero count $n$ compactly after it, followed by the remainder of the address (starting at the first non-zero nibble).

Mode A: Subscript Notation (UI/Frontend)

Recommended for: Wallets, block explorers (Etherscan, PolygonScan, BscScan), mobile apps, and any environment that can render Unicode.

Syntax: 0x0 + [subscript_n] + [remainder]

Encoding: The integer $n$ is converted to Unicode Subscript Digit characters (U+2080–U+2089).

Raw address (excerpt)	$n$	Subscript display
`0x0000abcd…1234`	4	`0x0₄abcd…1234`
`0x00000000abcd…1234`	8	`0x0₈abcd…1234`
`0x000000000004444c5dc75cB358380D2e3dE08A90`	11	`0x0₁₁4444c5dc75cB358380D2e3dE08A90`
`0x000000000004444c5dc75cB358380D2e3dE08A90`	11	`0x0₁₁4444…8A90` (truncated)
`0x0000000000000000000000000000000000000001`	39	`0x0₃₉1`

Mode B: Parenthesis Notation — ASCII Fallback (CLI/Logs/APIs)

Recommended for: Developer consoles, log files, copy-paste operations, automated trading bot notifications, cross-platform messaging, and any environment where consistent Unicode rendering cannot be guaranteed.

The (n) parenthesis notation is the emerging de facto standard in automated trading bot alerts and cross-platform messaging systems precisely because it is safe in every ASCII context while still being unambiguous.

Syntax: 0x0 + ( + [n] + ) + [remainder]

Raw address (excerpt)	$n$	ASCII display
`0x0000abcd…1234`	4	`0x0(4)abcd…1234`
`0x00000000abcd…1234`	8	`0x0(8)abcd…1234`
`0x000000000004444c5dc75cB358380D2e3dE08A90`	11	`0x0(11)4444c5dc75cB358380D2e3dE08A90`
`0x000000000004444c5dc75cB358380D2e3dE08A90`	11	`0x0(11)4444…8A90` (truncated)
`0x0000000000000000000000000000000000000001`	39	`0x0(39)1`

3. Case Sensitivity and EIP-55

To preserve the checksum integrity defined in EIP-55 and EIP-1191:

The compression MUST strictly match identical ASCII characters.
The compression MUST NOT be applied to mixed-case sequences (e.g., aAaA cannot be compressed).
All non-compressed characters MUST retain their original casing.

Rationale

Subscript vs. Superscript (UI Mode)

This ERC selects Subscript (0x0₈) rather than superscript notation for the following reasons:

Industry Alignment: As described in the Motivation’s “Industry Adoption Gap”, block explorers already use subscript for token amounts. Reusing that convention here gives users a single, consistent mental model across the ecosystem.

Chemical/Scientific Metaphor: Subscript notation in chemistry (H₂O) universally means “count of the preceding atom.” Applied here — 0x0₈ — it intuitively reads as “eight zeros.” Superscripts, by contrast, carry a mathematical power-of metaphor (x⁸ = x to the power of 8) that conflicts with the intended meaning.

Baseline Conflict Avoidance: Superscripts can visually collide with EIP-55 checksummed capitals when rendered in compact fonts or small UI elements. Subscripts sit below the text baseline and remain visually distinct.

Parenthesis vs. Curly-Brace (ASCII Mode)

This ERC selects parenthesis (n) rather than curly-brace {n} for the ASCII fallback:

De Facto Adoption: Automated trading bots, Telegram/Discord notification systems, and cross-platform alert pipelines have independently converged on 0x0(n) as the shorthand for addresses with long leading zeros. Standardising on existing practice minimises adoption friction.

Copy-Paste Safety: Parentheses () avoid the brace expansion that curly braces {} can trigger in shells. Unquoted parentheses are still special characters in POSIX shells, so a notated address such as 0x0(8)abcd… should be quoted when pasted into terminals and scripts.

Divergence from IPv6: Unlike IPv6’s implicit :: fill, this notation always carries an explicit count, preserving the address-identity property that the exact number of zeros matters.

Threshold of $n \geq 4$

A threshold of four leading zeros was chosen because:

Fewer than four leading zeros (0x000…) are common enough that compression would add noise without meaningful readability gain.
Four or more zeros (0x0000…) already push the boundary of reliable human counting; subscript compression provides measurable UX benefit at this point.
The most security-relevant deliberately mined and gas-optimised addresses (e.g., ERC-4337 EntryPoint at 8 zeros, Uniswap V4 at 11 zeros) are all well above the threshold.

Asymmetric Security: Easy to Verify, Hard to Attack

The use of addresses with long leading zeros, as described in the Motivation, creates a quantifiable cost asymmetry. With a modern consumer GPU at ~180 MH/s (double-keccak via WebGPU):

Defender (one-time): Mining 9 leading zeros takes on the order of minutes.
Attacker (per victim): Must match 9 zeros and a specific 6-nibble suffix, an expected $16^{9+6} = 16^{15} \approx 1.15 \times 10^{18}$ hashes, or roughly 200 years on the same hardware.

Even with hardware 10× more powerful than the defender’s, the attacker still requires years per victim. This aligns perfectly with Ethereum’s broader security principle: operations should be easy to verify but hard to forge.
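
The arithmetic behind this asymmetry can be checked with a short sketch. The function names are hypothetical, and the model is an assumption: a uniform random search costing one hash per candidate at the ~180 MH/s rate quoted above. Real vanity-address miners may differ in per-candidate cost.

```python
HASH_RATE = 180e6  # hashes per second (the ~180 MH/s figure quoted above)


def expected_attempts(nibbles: int) -> int:
    """Expected tries to match `nibbles` fixed hex characters: 16**nibbles."""
    return 16 ** nibbles


def expected_seconds(nibbles: int, hash_rate: float = HASH_RATE) -> float:
    """Expected search time in seconds at the given hash rate."""
    return expected_attempts(nibbles) / hash_rate


defender = expected_seconds(9)       # 9 leading zeros
attacker = expected_seconds(9 + 6)   # same zeros plus a 6-nibble suffix

print(f"defender: {defender / 60:.1f} minutes")
# → defender: 6.4 minutes
print(f"attacker/defender work ratio: {attacker / defender:,.0f}")
# → attacker/defender work ratio: 16,777,216
```

The ratio depends only on the suffix length ($16^6$), not on the hash rate, so faster hardware shifts both sides equally.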

This ERC is the display complement to that approach. Without it, a user holding an address with long leading zeros cannot efficiently communicate or verify its zero count. With it, 0x0₉suffix signals both the count and the security posture at a glance — turning a cryptographic property into a human-readable security guarantee.

Gas Optimization Context

With the introduction of logic similar to EIP-7939 (scaling calldata gas costs by zero bytes), the ecosystem will see a proliferation of addresses engineered for maximum leading-zero bytes. A display format that handles these addresses gracefully is a necessary proactive measure for user experience.

Backwards Compatibility

This ERC is strictly a presentation layer standard.

Wallets: MUST expand the notation back to the full 40-nibble hexadecimal address before signing or broadcasting transactions.

Safety: Neither parentheses () nor Unicode subscript characters (U+2080–U+2089) are valid hexadecimal characters. If a user blindly copies a notated address into a legacy system, the system will reject the input as invalid rather than processing a wrong address. This acts as an automatic fail-safe mechanism.
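
Expanding a notated address back to full hex could look like the sketch below. This ERC does not define a decoder, so the function name `expand_address`, the regular expression, and the strictness checks are all assumptions made for illustration.

```python
import re

# Map Unicode subscript digits U+2080..U+2089 back to ASCII digits.
_SUBSCRIPT_TO_ASCII = {chr(0x2080 + i): str(i) for i in range(10)}

# "0x0", then either subscript digits or "(n)", then the remaining hex nibbles.
_NOTATED = re.compile(
    r"0x0(?:([\u2080-\u2089]+)|\(([0-9]+)\))([0-9a-fA-F]*)"
)


def expand_address(notated: str) -> str:
    """Expand a notated address back to 0x + 40 hex nibbles.

    Raises ValueError for anything that is not a complete (untruncated)
    notated address.
    """
    match = _NOTATED.fullmatch(notated)
    if match is None:
        raise ValueError(f"not a compact address: {notated!r}")
    subscript, ascii_count, remainder = match.groups()
    if subscript is not None:
        n = int("".join(_SUBSCRIPT_TO_ASCII[ch] for ch in subscript))
    else:
        n = int(ascii_count)
    # The remainder must start at the first non-zero nibble.
    if remainder.startswith("0"):
        raise ValueError("count does not cover all leading zeros")
    if n + len(remainder) != 40:
        raise ValueError("expanded address is not 40 nibbles long")
    return "0x" + "0" * n + remainder  # remainder casing is left untouched


full = "0x" + "0" * 39 + "1"
print(expand_address("0x0₃₉1") == full)    # → True
print(expand_address("0x0(39)1") == full)  # → True
```

Truncated displays that contain `…` fail the match and raise `ValueError`, since the elided nibbles cannot be recovered.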

Reference Implementation

The following Python implementation demonstrates the encoding logic for both Subscript (Unicode) and Parenthesis (ASCII) modes, including full-display and truncated variants.

# Subscript digit Unicode codepoints: U+2080 (₀) … U+2089 (₉)
SUBSCRIPTS = {
    '0': '\u2080', '1': '\u2081', '2': '\u2082', '3': '\u2083', '4': '\u2084',
    '5': '\u2085', '6': '\u2086', '7': '\u2087', '8': '\u2088', '9': '\u2089',
}

LEADING_ZERO_THRESHOLD = 4  # only apply notation when n >= 4


def _to_subscript(n: int) -> str:
    """Convert a non-negative integer to a Unicode subscript string."""
    return "".join(SUBSCRIPTS[d] for d in str(n))


def count_leading_zeros(address: str) -> int:
    """Return the number of leading zero nibbles after the 0x prefix."""
    body = address[2:] if address.startswith("0x") else address
    count = 0
    for ch in body:
        if ch == '0':
            count += 1
        else:
            break
    return count


def format_address(address: str, mode: str = 'unicode', truncate: bool = False) -> str:
    """
    Format an EVM address using Hexadecimal Subscript Notation.

    Parameters
    ----------
    address  : Full 42-character EVM address (with 0x prefix).
    mode     : 'unicode' → subscript notation  (0x0₈abcd…)
               'ascii'   → parenthesis notation (0x0(8)abcd…)
    truncate : If True, abbreviate the remainder to first4…last4 characters.

    Returns
    -------
    Formatted address string, or the original address if n < LEADING_ZERO_THRESHOLD.
    """
    n = count_leading_zeros(address)
    if n < LEADING_ZERO_THRESHOLD:
        return address  # below threshold — no transformation

    body = address[2:]       # strip 0x
    remainder = body[n:]     # everything after the leading zeros

    if mode == 'unicode':
        count_str = _to_subscript(n)
        compact_prefix = f"0x0{count_str}"
    else:  # ascii / parenthesis
        compact_prefix = f"0x0({n})"

    if not truncate or len(remainder) <= 8:
        return f"{compact_prefix}{remainder}"

    # Truncated: show first 4 and last 4 nibbles of the remainder
    return f"{compact_prefix}{remainder[:4]}…{remainder[-4:]}"


# ---------------------------------------------------------------------------
# Test Cases
# ---------------------------------------------------------------------------

# 1. ERC-4337 EntryPoint: 8 leading zeros
addr1 = "0x0000000071727De22E5E9d8BAf0edAc6f37da032"
print(format_address(addr1, mode='unicode'))
# → 0x0₈71727De22E5E9d8BAf0edAc6f37da032

print(format_address(addr1, mode='ascii'))
# → 0x0(8)71727De22E5E9d8BAf0edAc6f37da032

# 2. Uniswap V4 PoolManager — 11 leading zeros (truncated)
addr2 = "0x000000000004444c5dc75cB358380D2e3dE08A90"
print(format_address(addr2, mode='unicode', truncate=True))
# → 0x0₁₁4444…8A90

print(format_address(addr2, mode='ascii', truncate=True))
# → 0x0(11)4444…8A90

# 3. Ethereum precompile — 39 leading zeros
addr3 = "0x0000000000000000000000000000000000000001"
print(format_address(addr3, mode='unicode'))
# → 0x0₃₉1

print(format_address(addr3, mode='ascii'))
# → 0x0(39)1

# 4. Below-threshold address — no transformation applied
addr4 = "0x000abcdef1234567890abcdef1234567890abcde"
print(format_address(addr4, mode='unicode'))
# → 0x000abcdef1234567890abcdef1234567890abcde  (unchanged: n=3 < 4)

Security Considerations

Address Poisoning Mitigation: The attack mechanism and the use of addresses with long leading zeros as a mitigation are described in the Motivation section. From an implementation standpoint, the critical requirement is that the notation MUST be rendered prominently and legibly — a subscript that is too small or too faint to read at a glance defeats the purpose of this standard entirely.

Subscript Legibility: Implementations MUST choose a font in which subscript digits (₀–₉) are clearly distinguishable from their full-size counterparts. 0x0₇ MUST NOT be renderable as 0x07 through font choice or rendering pipeline.

Copy-Paste Validation: Applications that accept address input MUST prioritise standard hexadecimal parsing. If an input contains (n) parenthesis notation or Unicode subscript characters, the application MUST either:

Explicitly offer to decompress the notation before processing, or
Reject the input as invalid.

Neither subscript Unicode characters nor parentheses are valid hexadecimal. This acts as an automatic fail-safe: a user who blindly pastes a notated address into a legacy system will receive an “invalid address” error rather than a silent wrong-address transaction.

Preservation of Entropy: This standard operates exclusively on the low-entropy leading-zero prefix. All non-zero nibbles — which carry the cryptographic uniqueness of the address — are always displayed verbatim and uncompressed, preserving the full information needed for security verification.

Copyright

Citation

Please cite this document as:

Zainan Victor Zhou (@xinbenlv), "ERC-8117: Anti-Poisoning Compact EVM Address Format [DRAFT]," Ethereum Improvement Proposals, no. 8117, December 2025. [Online serial]. Available: https://eips.ethereum.org/EIPS/eip-8117.