Explanations
mentions of appearance or visual resemblance
phrases that describe appearance or perception
Negative Logits
arius         -0.76
ufact         -0.72
ItemTracker   -0.65
Variant       -0.64
railing       -0.63
olars         -0.63
gnu           -0.62
toget         -0.59
NEWS          -0.58
riad          -0.58
Positive Logits
vind          0.76
lier          0.73
smarter       0.70
stronger      0.68
louder        0.67
invincible    0.65
operative     0.64
uate          0.64
ingly         0.62
="#           0.61
Activations Density 0.081%