INDEX
Explanations
adjectives related to visual appearance and comparisons
phrases related to appearances and perceptions
Negative Logits
oute        -0.67
oulos       -0.66
cember      -0.62
railing     -0.60
ciating     -0.60
Strikes     -0.60
Perspect    -0.60
Variant     -0.60
Ways        -0.59
ensu        -0.59
Positive Logits
stronger        0.84
nicer           0.80
invincible      0.79
bigger          0.79
attractive      0.78
smarter         0.78
insignificant   0.77
thood           0.76
weaker          0.75
worse           0.74
Activations Density 0.109%