Explanations
Phrases and terms related to danger and conflict
Negative Logits
ober     -0.20
IBC      -0.15
omba     -0.15
лит      -0.15
hi       -0.14
cano     -0.14
uther    -0.14
mpi      -0.14
če       -0.14
oth      -0.14
Positive Logits
iveau     0.15
ous       0.14
xmm       0.14
strup     0.14
_UID      0.14
imson     0.14
igar      0.14
-offset   0.13
resse     0.13
bor       0.13
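The Negative and Positive Logits lists rank vocabulary tokens by how strongly this feature pushes their output logits down or up. The page does not say how the scores are computed. The sketch below assumes the approach common in sparse-autoencoder feature dashboards: take the dot product of the feature's decoder direction with each token's unembedding vector, then sort. The names `logit_effects`, `decoder_dir`, `unembed`, and `vocab` are hypothetical, and any normalization the dashboard may apply is omitted.

```python
def logit_effects(decoder_dir, unembed, vocab, k=10):
    """Return (most negative, most positive) lists of (token, score) pairs.

    Assumed (hypothetical) inputs:
      decoder_dir: list of floats, length d_model (the feature's decoder row)
      unembed:     d_model rows, each of length len(vocab) (the unembedding matrix W_U)
      vocab:       list of token strings
    """
    d_model = len(decoder_dir)
    # Score each token: dot product of the decoder direction with that token's unembedding column.
    scores = [
        sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    # Sort ascending by score: the head holds the most suppressed tokens, the tail the most promoted.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]         # most negative first
    positive = ranked[::-1][:k]   # most positive first
    return negative, positive


# Toy example with d_model = 2 and a three-token vocabulary.
neg, pos = logit_effects([1.0, -1.0], [[0.2, 0.0, -0.1], [0.0, 0.3, 0.1]], ["a", "b", "c"], k=1)
print(neg, pos)
```

With this toy data, token "b" scores -0.3 and "a" scores 0.2, so each one tops its respective list.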
Activation Density: 0.158%
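Activation density presumably gives the percentage of token positions on which the feature fires. The page does not define it, so the sketch below assumes "fires" means an activation strictly above a threshold of 0. The function name `activation_density` and its `threshold` parameter are hypothetical. Under this reading, 0.158% works out to roughly 1 token in every 630.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose feature activation exceeds `threshold` (assumed 0)."""
    if not activations:
        raise ValueError("need at least one activation")
    # Count positions where the feature is active.
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# 158 active positions out of 100,000 tokens gives a density of about 0.158%.
acts = [0.0] * 99_842 + [1.0] * 158
print(activation_density(acts))
```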