INDEX
Explanations
words related to strong opinions or emphasis
Negative Logits
ioned       -0.77
bourg       -0.71
ulton       -0.71
ulative     -0.71
isson       -0.71
jri         -0.70
engers      -0.70
iem         -0.69
tein        -0.69
NetMessage  -0.69
Positive Logits
ove         0.71
reme        0.71
bananas     0.67
urious      0.67
nuts        0.67
fucking     0.66
delighted   0.65
Vader       0.65
ĪĴ          0.65
adore       0.64
Activations Density: 0.025%