Explanations
politics
This neuron activates on tokens related to politics, power struggles, and intrigue.
Negative Logits
PKK    -0.07
ماد    -0.07
ธ    -0.07
Kang    -0.07
anguish    -0.07
.csrf    -0.06
yak    -0.06
/train    -0.06
(customer    -0.06
대구    -0.06
Positive Logits
aspirations    0.07
самостоятельно    0.07
_URI    0.06
static    0.06
Queens    0.06
После    0.06
oooooooo    0.06
perimeter    0.06
墙    0.06
Runtime    0.06
Activations Density: 0.027%