Explanations
The neuron fires on tokens naming social parties or relationships (e.g. crew, strains). In other words, it detects words that refer to groups of people or to their interpersonal dynamics.
Negative Logits
Token      Logit
McCorm     -0.06
617        -0.06
.same      -0.06
Νο         -0.06
hits       -0.06
fis        -0.06
알고       -0.06
lsruhe     -0.06
onTap      -0.06
knot       -0.06
Positive Logits
Token      Logit
Johnny     0.07
guarda     0.07
.get       0.07
dying      0.06
ByID       0.06
.getToken  0.06
(point     0.06
Config     0.06
)?↵↵       0.06
Double     0.06
Activations Density 0.000%