Explanations
Self-reference
This neuron detects first-person self-referential words and role/identity declarations (tokens like "I", "I'm", "am" and similar self-identifying phrases).
Negative Logits
Token      Logit
IVING      -0.08
ario       -0.08
/list      -0.07
OVE        -0.07
IRE        -0.07
ITA        -0.07
_sphere    -0.07
ijo        -0.07
ARIO       -0.07
.Hidden    -0.06
Positive Logits
Token      Logit
licz       0.07
FIFA       0.06
];↵↵↵      0.06
να         0.06
.getApp    0.06
neby       0.06
essa       0.06
porte      0.06
'nde       0.05
pochop     0.05
Activations Density: 0.175%