INDEX
Explanations
The neuron specifically fires on mentions of “panic,” particularly in the context of “panic attacks.”
Negative Logits
ràng       -0.07
лист       -0.07
Rows       -0.07
_slave     -0.07
rib        -0.06
slee       -0.06
�          -0.06
isors      -0.06
_bounds    -0.06
کل         -0.06
Positive Logits
panic      0.14
Panic      0.13
panicked   0.09
libc       0.08
Quick      0.08
political  0.07
_SYSTEM    0.07
frantic    0.07
panic      0.07
pry        0.07
Activations Density 0.001%
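
For readers unfamiliar with the density figure, here is a minimal sketch of how such a number could be computed, assuming density means the fraction of sampled tokens on which the neuron's activation is above zero. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" is the share of sampled tokens on which the
    neuron fires at all; the dashboard's exact definition may differ.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: the neuron fires on 1 token out of 100,000.
acts = [0.0] * 99_999 + [3.2]
density = activation_density(acts)
print(f"{density:.3%}")
```

Under that assumption, a density of 0.001% would correspond to roughly one firing token per 100,000, which fits a neuron tied to a single narrow concept such as "panic."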