
Explanations

Kav, Kaw, or Cav followed by names

The neuron consistently activates on word fragments beginning with “Kaw” or “Kav”, i.e., substrings of proper names that start with one of these prefixes.
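The explanation makes a testable prediction: tokens where the neuron fires should start with one of these prefixes. Below is a minimal sketch that scores the explanation against a list of (token, activation) pairs. The prefix tuple combines “Kav” and “Kaw” from the explanation with “Cav” from its title. The function names, the threshold, and the sample records are all hypothetical, and this is not the scoring method the dashboard's auto-interp uses.

```python
from typing import List, Tuple

# Prefixes taken from the explanation ("Kaw", "Kav") and its title ("Cav").
PREFIXES = ("Kaw", "Kav", "Cav")


def matches_explanation(fragment: str) -> bool:
    """True if a token fragment, ignoring leading whitespace, starts with a listed prefix.

    Case-sensitive on purpose: the explanation is about proper names.
    """
    return fragment.lstrip().startswith(PREFIXES)


def explained_fraction(records: List[Tuple[str, float]], threshold: float = 0.0) -> float:
    """Fraction of active tokens (activation > threshold) that the explanation predicts."""
    active = [tok for tok, act in records if act > threshold]
    if not active:
        return 0.0
    return sum(matches_explanation(tok) for tok in active) / len(active)


# Hypothetical records: made-up tokens and activations, not data from this neuron.
records = [("Kaw", 3.1), (" Kav", 2.4), ("asaki", 0.0), (" the", 0.0), ("Cav", 1.2), (" Tok", 0.5)]
print(explained_fraction(records))  # → 0.75
```

Three of the four active tokens match, so the hypothetical score is 0.75. Raising `threshold` would restrict the check to the strongest activations.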


Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted


Negative Logits

ňte     -0.82
할      -0.78
aliere  -0.69
🦵      -0.69
ńcz     -0.69
 ramo   -0.69
辛      -0.69
 chua   -0.69
 خواه   -0.69
ы       -0.69

Positive Logits

tius        0.85
/***        0.84
hept        0.77
chero       0.75
すじ        0.75
参り        0.75
 shrinking  0.75
BT          0.74
 "}";       0.74
疽          0.73
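The page does not say how the logit lists are produced. A common approach, assumed here, is to take the dot product of the neuron's output direction with each token's unembedding vector and report the highest and lowest scores. The sketch below shows only that ranking step. The direction, the vectors, and the token names are toy values, not this model's weights.

```python
from typing import Dict, List, Tuple

# Toy data (hypothetical): a 3-dimensional "output direction" and four token unembeddings.
direction = [1.0, -0.5, 0.25]
unembed = {
    "tok_a": [0.8, 0.0, 0.2],
    "tok_b": [0.0, 1.0, 0.0],
    "tok_c": [0.1, 0.1, 0.1],
    "tok_d": [-0.2, 0.4, 0.0],
}


def logit_effects(direction: List[float], unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot product of the direction with each token's unembedding vector."""
    return {tok: sum(d * w for d, w in zip(direction, vec)) for tok, vec in unembed.items()}


def top_and_bottom(effects: Dict[str, float], k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, score) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k], ranked[::-1][:k]


effects = logit_effects(direction, unembed)
top, bottom = top_and_bottom(effects, k=2)
print([tok for tok, _ in top])     # → ['tok_a', 'tok_c']
print([tok for tok, _ in bottom])  # → ['tok_b', 'tok_d']
```

With a real model, the same ranking would run over the full vocabulary. The negative list is read from the bottom of the ranking, so its first entry is the most negative score.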

Activation Density

0.015%
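As a rough sanity check, the dashboard configuration implies 24,576 × 128 = 3,145,728 tokens. Suppose the 0.015% activation density was measured over exactly those tokens. That is an assumption, since the density could be computed over a different sample. Under it, the neuron would fire on roughly 472 tokens:

```python
PROMPTS = 24_576           # from the dashboard configuration
TOKENS_PER_PROMPT = 128    # from the dashboard configuration
DENSITY_PERCENT = 0.015    # activation density shown on the page

total_tokens = PROMPTS * TOKENS_PER_PROMPT
# Assumes the density refers to these dashboard tokens (not confirmed by the page).
expected_active = total_tokens * DENSITY_PERCENT / 100

print(total_tokens)            # → 3145728
print(round(expected_active))  # → 472
```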