Explanations
comparisons and contrasts between concepts
Negative Logits
.BLL     -0.15
earn     -0.15
acco     -0.14
сло      -0.14
faux     -0.14
ulle     -0.14
nors     -0.14
ωνα      -0.13
arn      -0.13
silver   -0.13
Positive Logits
--)          0.16
.parameter   0.15
593          0.14
gger         0.14
ugin         0.14
atego        0.14
ераль        0.14
ابت          0.14
jt           0.14
сов          0.13
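Feature dashboards of this kind usually build the positive and negative logit lists by projecting the feature's decoder direction through the model's unembedding matrix and reading off the largest and smallest entries. The page does not state its exact procedure, so what follows is only a minimal sketch under that assumption. The names `logit_effects` and `top_and_bottom`, and the toy weights in the usage example, are hypothetical. No normalization (for example, folding in a final layer norm) is attempted.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_vec: Sequence[float],
                  w_unembed: Sequence[Sequence[float]]) -> List[float]:
    """Direct effect of a feature direction on every vocabulary logit.

    Assumption: w_unembed is indexed [d_model][vocab], so the result is
    decoder_vec @ w_unembed, with one entry per token.
    """
    d_model = len(decoder_vec)
    vocab_size = len(w_unembed[0])
    return [
        sum(decoder_vec[i] * w_unembed[i][t] for i in range(d_model))
        for t in range(vocab_size)
    ]


def top_and_bottom(effects: Sequence[float], vocab: Sequence[str], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, effect) pairs."""
    order = sorted(range(len(effects)), key=lambda t: effects[t])  # ascending
    bottom = [(vocab[t], effects[t]) for t in order[:k]]  # most negative first
    top = [(vocab[t], effects[t]) for t in reversed(order[-k:])]  # most positive first
    return top, bottom


if __name__ == "__main__":
    # Hypothetical toy model: d_model = 2, vocabulary of 4 tokens.
    decoder = [1.0, 0.5]
    w_u = [[0.2, -0.1, 0.0, 0.3],
           [0.0, 0.4, -0.6, 0.1]]
    top, bottom = top_and_bottom(logit_effects(decoder, w_u), ["a", "b", "c", "d"], 2)
    print("positive:", top)
    print("negative:", bottom)
```

Sorting the full vocabulary is fine for a sketch; for a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` would avoid the full sort.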
Activation density: 0.155%
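Activation density is normally read as the percentage of token positions on which the feature fires, so 0.155% works out to roughly 1.55 firing tokens per 1,000 (about 1 in 645). The page does not say which dataset or threshold it used. The sketch below assumes a feature "fires" when its activation is strictly greater than zero. The function name `activation_density` and the `threshold` parameter are hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions where the feature's activation exceeds `threshold`.

    Assumption: firing means activation > threshold, with threshold = 0 by default.
    """
    total = 0
    firing = 0
    for a in activations:
        total += 1
        if a > threshold:
            firing += 1
    if total == 0:
        raise ValueError("no activations given")
    return 100.0 * firing / total


if __name__ == "__main__":
    # Toy sample: 31 firing positions out of 20,000 tokens gives a density of 0.155%.
    acts = [1.0] * 31 + [0.0] * 19_969
    print(f"{activation_density(acts):.3f}%")
```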