INDEX
Explanations
phrases related to violent or threatening actions
Negative Logits
ÐĴС      -0.15
tel      -0.15
wire     -0.15
PACE     -0.15
wire     -0.15
ewire    -0.14
лÑıд     -0.14
nova     -0.14
iple     -0.14
Nova     -0.14
Positive Logits
ssel     0.16
JNI      0.16
Patch    0.15
.patch   0.14
yonel    0.14
037      0.14
patch    0.14
Nut      0.13
Å¡tÃŃ    0.13
keys     0.13
Activations Density: 0.085%