INDEX
Explanations
words related to happiness or positive outcomes
occurrences of the word "happy" and its variations
Negative Logits
impulse        -0.73
Nile           -0.68
ħĭ             -0.67
ctica          -0.65
trl            -0.64
Scotia         -0.64
guiActiveUn    -0.64
heses          -0.63
IMAGES         -0.60
oute           -0.60
Positive Logits
ened           1.27
ening          1.23
iness          1.11
iest           1.08
ily            1.07
ier            1.00
eners          0.92
icial          0.90
ert            0.88
eret           0.87
Activations Density: 0.025%