INDEX
Explanations
phrases related to legal matters and accusations
Negative Logits
Böl        -0.17
usz        -0.16
poil       -0.16
-www       -0.15
Hint       -0.14
plevel     -0.14
ีเอ        -0.13
/apis      -0.13
apis       -0.13
codegen    -0.13
Positive Logits
witch      0.19
partisan   0.19
Witch      0.18
political  0.17
witch      0.17
lies       0.17
isol       0.17
nothing    0.16
Political  0.15
politics   0.15
Activations Density: 0.046%