INDEX
NEGATIVE LOGITS
included        -0.08
EB              -0.08
akt             -0.08
ather           -0.08
SYS             -0.08
ünkü            -0.07
JAVA            -0.07
半年            -0.07
,row            -0.07
орал            -0.07
POSITIVE LOGITS
zomaar          0.13
overly          0.12
frivol          0.11
unless          0.10
tenzij          0.10
unrealistic     0.10
preconce        0.09
contradictory   0.09
misleading      0.09
deceptive       0.09
Activations Density: 0.001%