Negative Logits
[token not recovered]  -0.08
Haven  -0.07
(frame  -0.07
Data  -0.07
(image  -0.07
kiss  -0.07
adoras  -0.07
_MULTI  -0.07
Verlet  -0.07
Multi  -0.07
Positive Logits
осторож (Russian stem of "cautious")  0.14
cautious  0.14
caution  0.13
cautiously  0.13
precaution  0.12
harmless  0.12
_safe  0.12
responsibly  0.12
safely  0.11
legít  0.11
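Dashboards like this one often compute the positive and negative logit lists by projecting a feature's decoder direction through the model's unembedding matrix. They then sort the resulting per-token scores. The sketch below illustrates that idea under these assumptions:

- The unembedding `unembed` is a d_model x vocab_size matrix.
- Any final layer norm is ignored.
- The function names `logit_effects` and `top_and_bottom` are hypothetical.

It is not necessarily the exact method behind the numbers above.

```python
from typing import List, Sequence, Tuple

def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]],
                  vocab: Sequence[str]) -> List[Tuple[str, float]]:
    """Score every vocabulary token by the feature's direct logit effect.

    Assumption: unembed is laid out as d_model rows x vocab_size columns,
    and the final layer norm is ignored. Returns (token, score) pairs
    sorted from most positive to most negative.
    """
    d_model = len(decoder_dir)
    effects = []
    for j, token in enumerate(vocab):
        # Dot product of the decoder direction with column j of the unembedding.
        score = sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        effects.append((token, score))
    effects.sort(key=lambda pair: pair[1], reverse=True)
    return effects

def top_and_bottom(effects: List[Tuple[str, float]], k: int):
    """Split sorted effects into the k most positive and k most negative.

    The negative list is reversed so the most negative token comes first,
    matching the ordering of the Negative Logits list.
    """
    return effects[:k], effects[-k:][::-1]

# Toy example with d_model = 2 and a four-token vocabulary (made-up weights).
decoder_dir = [1.0, -0.5]
unembed = [[0.2, 0.0, -0.1, 0.3],
           [0.0, 0.2, 0.1, -0.2]]
vocab = ["cautious", "Haven", "Multi", "safely"]
effects = logit_effects(decoder_dir, unembed, vocab)
positive, negative = top_and_bottom(effects, 1)
```

With the toy weights, `safely` ranks as the most positive token and `Multi` as the most negative.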
Activation density: 0.672%
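Activation density is commonly defined as the fraction of token positions on which the feature is active. The minimal sketch below assumes that definition and treats "active" as an activation above a threshold of 0. The function name `activation_density` is hypothetical. The example uses made-up counts: 672 firing positions out of 100,000 reproduce a density of 0.672%.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions where the activation exceeds the threshold.

    Assumption: "active" means strictly greater than `threshold` (default 0).
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)

# Hypothetical data: the feature fires on 672 of 100,000 token positions.
acts = [0.0] * 99_328 + [1.0] * 672
density = activation_density(acts)
print(f"Activation density: {density:.3%}")
```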