INDEX
Explanations
references to aggressive or violent actions
Negative Logits
lán          -0.16
cr           -0.15
.gwt         -0.15
aphrag       -0.14
"<?          -0.14
.sap         -0.14
bilt         -0.14
AppBundle    -0.14
icio         -0.14
atoi         -0.14
Positive Logits
robat        0.15
airs         0.14
ávÄĽ         0.14
.gdx         0.14
elerik       0.14
endant       0.14
жд           0.13
sooner       0.13
unsch        0.13
resher       0.13
Activations Density 0.017%