Explanations
scores and maximum values
The neuron does not consistently activate on any tokens and thus does not detect a specific pattern.
Negative Logits
employee    -0.07
ну          -0.07
root        -0.07
لم          -0.07
더          -0.07
Permission  -0.06
blob        -0.06
allen       -0.06
gw          -0.06
_transport  -0.06
Positive Logits
двиг        0.07
(/^         0.07
desc        0.06
ีช          0.06
_possible   0.06
Clients     0.06
_nama       0.06
_limit      0.06
oji         0.06
//----------------------------------------------------------------------------↵  0.06
Activations Density: 0.007%