INDEX
Explanations
phrases related to the outer appearance or surface characteristics
phrases that indicate superficiality or appearances
Negative Logits
rons    -0.73
Els     -0.67
iband   -0.66
rys     -0.64
adr     -0.63
igers   -0.63
iru     -0.63
CHQ     -0.62
ĸļ      -0.61
WR      -0.61
Positive Logits
nutshell    0.88
glance      0.73
standpoint  0.70
ij士         0.69
hindsight   0.68
Appearance  0.64
speaking    0.64
aside       0.63
synopsis    0.63
though      0.61
Activations Density: 0.186%