INDEX
Negative Logits
ют         -0.07
praising   -0.07
bunun      -0.07
ynet       -0.06
_but       -0.06
erect      -0.06
reas       -0.06
indows     -0.06
こう       -0.06
loat       -0.06
Positive Logits
ethical        0.07
refrigerator   0.07
missing        0.06
line           0.06
↵ ↵            0.06
easier         0.06
basketball     0.06
.metric        0.06
username       0.06
skilled        0.06
Activations Density 0.002%