INDEX
Explanations
mentions of celebrations or well wishes
the word "Happy" in various contexts
Negative Logits
arin      -0.79
rovers    -0.72
ij        -0.72
IDER      -0.72
uing      -0.71
ickr      -0.68
urally    -0.68
aeda      -0.68
urer      -0.68
enses     -0.67
Positive Logits
ness       0.90
Birthday   0.86
birthday   0.81
Gilmore    0.80
Happy      0.79
joy        0.78
Happy      0.76
Meal       0.76
bies       0.72
happy      0.69
Activations Density 0.023%