Explanations
This neuron activates on mentions of people going missing or disappearing (e.g., “missing,” “disappearance,” “went missing”).
Negative Logits
Token            Logit
jenom            -0.06
mocker           -0.06
tạp              -0.06
'?               -0.06
chiefs           -0.06
ITS              -0.06
маль             -0.06
профессиональ    -0.06
onces            -0.06
Lamb             -0.06
Positive Logits
Token            Logit
disappeared      0.09
(steps           0.07
Scalar           0.07
phil             0.07
locking          0.07
vanished         0.07
-bl              0.06
起               0.06
แจ               0.06
得               0.06
Activations Density: 0.018%
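
The page does not define how the density figure is computed. A common reading, assumed here, is the fraction of token positions on which the neuron's activation is above zero. The sketch below illustrates that reading only; the function name `activation_density`, the threshold, and the sample data are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions on which the
    neuron fires at all. The source page does not state its definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic example: the neuron fires on 18 of 100,000 token positions.
sample = [0.0] * 99_982 + [1.0] * 18
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.018%
```

Under this reading, a density of 0.018% means the neuron is active on roughly 18 of every 100,000 tokens, which fits a neuron tied to a narrow topic such as disappearances.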