Negative Logits
adviser      -0.07
onders       -0.07
TEGR         -0.06
Leopard      -0.06
along        -0.06
Owners       -0.06
spouse       -0.06
ocial        -0.06
reinforce    -0.06
.it          -0.06
Positive Logits
wy           0.07
out          0.07
ousted       0.07
вып          0.07
выб          0.07
вывод        0.07
.shortcuts   0.07
heraus       0.06
выступ       0.06
بیرون        0.06
Activation density: 0.031%