INDEX
Negative Logits
Token        Logit
Pun          -0.06
Reward       -0.06
lland        -0.06
Fmt          -0.06
继           -0.06
.questions   -0.06
Raf          -0.06
isEqual      -0.06
Kur          -0.06
Rin          -0.06
Positive Logits
Token        Logit
correctly    0.07
similar      0.07
TICK         0.07
generator    0.06
assaults     0.06
čist         0.06
incorrectly  0.06
↵ ↵          0.06
этих         0.06
replicate    0.06
Activations Density: 0.009%