INDEX
Explanations
exaggerated or extreme adjectives
phrases that convey a sense of near completeness or approximation
Negative Logits
tein      -0.73
yi        -0.68
ioch      -0.68
alez      -0.66
lest      -0.65
ems       -0.64
iere      -0.64
igans     -0.64
seller    -0.63
aley      -0.63
Positive Logits
unchanged            0.81
indistinguishable    0.80
etheless             0.79
thood                0.78
identical            0.76
unemploy             0.76
illiter              0.76
electric             0.75
unheard              0.71
unint                0.68
Activations Density: 0.017%