Negative Logits
weakening        -0.10
bruis            -0.09
inability        -0.08
strengthening    -0.08
asakan           -0.08
weakened         -0.08
weaker           -0.08
weaken           -0.08
אד               -0.08
itching          -0.08
Positive Logits
mirror      0.10
Mirror      0.10
.flip       0.10
_flip       0.10
Mirror      0.09
Flip        0.09
mirror      0.09
Flip        0.09
mirrored    0.09
flip        0.09
Activations Density: 0.021%