Explanations
This neuron specifically detects mentions of personal connections, most notably the phrase “friend of.”
Negative Logits
Mild          -0.07
Buscar        -0.06
Filip         -0.06
cmp           -0.06
představ      -0.06
як            -0.06
acas          -0.06
bsite         -0.06
fasting       -0.06
windshield    -0.06
Positive Logits
oring         0.07
IONS          0.07
Undo          0.06
ANTED         0.06
$status       0.06
edir          0.06
game          0.06
.Qt           0.06
tink          0.06
issue         0.06
Activations Density: 0.016%