Explanations
Different expressions of, or references to, "face" and "way".
Negative Logits
568        -0.16
sque       -0.15
pez        -0.15
Zap        -0.15
loh        -0.14
aque       -0.14
mary       -0.13
figure     -0.13
against    -0.13
bur        -0.13
Positive Logits
rieve      0.17
mpp        0.16
ビー       0.15
crown      0.15
ерин       0.14
mented     0.14
gin        0.14
стин       0.14
krev       0.14
ĽĪ (partial UTF-8 bytes 0x9B 0x88)    0.14
Activation Density: 0.253%
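A minimal sketch of how a density figure like this could be computed. It assumes that "activation density" means the fraction of token positions where the feature's activation is strictly positive. The dashboard does not state its exact definition, and the function name `activation_density` and its `threshold` parameter are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions where the feature fires.

    Assumption: a position counts as active when its activation is
    strictly greater than `threshold` (hypothetical parameter).
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 2,530 active positions out of 1,000,000 tokens.
acts = [1.0] * 2530 + [0.0] * (1_000_000 - 2530)
print(f"{activation_density(acts):.3%}")  # prints 0.253%
```

Under this assumption, a density of 0.253% means the feature is active on roughly 1 token in 400, which makes it a fairly sparse feature.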