INDEX
Explanations
phrases related to superficial or initial impressions
references to appearances or superficial characteristics
Negative Logits
icer        -0.79
ashington   -0.69
rench       -0.68
tailed      -0.66
quar        -0.65
ãĥĺ         -0.65
rus         -0.65
Joined      -0.64
die         -0.60
cel         -0.60
Positive Logits
glance       1.18
blush        0.81
superf       0.70
pedia        0.69
resemb       0.68
sounds       0.66
LOOK         0.66
standpoint   0.65
clues        0.65
superficial  0.65
Activations Density 0.111%
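
The density figure is read here as the fraction of token positions on which the feature activates at all. That reading is an assumption, since the page does not define the term. A minimal sketch under that assumption follows; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and only illustrate the arithmetic.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means (number of active positions) / (total positions).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10 active positions out of 9,000 tokens.
sample = [0.0] * 9000
for i in range(10):
    sample[i * 900] = 1.5

density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.111%
```

Under this reading, a density of 0.111% means the feature fires on roughly one token in every 900.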