INDEX
Explanations
phrases indicating disappointment or dissatisfaction
expressions of disappointment or lamentation
Negative Logits
pyramid        -0.61
guiActiveUn    -0.61
subordinate    -0.61
disguise       -0.57
horr           -0.57
anat           -0.56
Picture        -0.55
opian          -0.54
complicit      -0.54
part           -0.54
Positive Logits
been           1.33
been           1.08
gotten         0.97
gotten         0.90
Been           0.90
shown          0.83
ered           0.80
lled           0.79
fallen         0.78
ished          0.76
Activations Density 0.057%