INDEX
NEGATIVE LOGITS
IL         -0.07
_INST      -0.07
꽃         -0.07
robbed     -0.06
Fury       -0.06
owy        -0.06
iptables   -0.06
puppy      -0.06
рост       -0.06
kittens    -0.06
POSITIVE LOGITS
change     0.24
Change     0.20
-change    0.14
Change     0.14
change     0.13
CHANGE     0.12
CHANGE     0.10
changes    0.10
_change    0.09
chang      0.09
Activations Density 0.044%