INDEX
Explanations
phrases expressing threats or harmful intentions
NEGATIVE LOGITS
akan       -0.16
OLUMNS     -0.15
reau       -0.15
asi        -0.15
etic       -0.15
amel       -0.14
__("       -0.14
-END       -0.14
overrides  -0.14
_featured  -0.14
POSITIVE LOGITS
arih       0.17
orgh       0.16
OrNull     0.16
krom       0.16
ôi         0.15
engo       0.15
CTest      0.14
urum       0.14
ovit       0.14
Gulf       0.14
Activations Density 0.334%