INDEX
Explanations
Positive adjectives related to pleasant experiences
The word "nice" and its variations in different contexts
Negative Logits
Token       Logit
rained      -0.82
arian       -0.80
arians      -0.79
aer         -0.76
ogens       -0.75
inant       -0.75
wear        -0.74
uilding     -0.73
arers       -0.73
alone       -0.70
Positive Logits
Token        Logit
nice         0.88
additions    0.86
enough       0.84
reception    0.81
bye          0.78
touches      0.77
consolation  0.77
bonus        0.76
breeze       0.74
fluffy       0.73
Activation Density: 0.014%