INDEX
Explanations
the word "Happy"
occurrences of the word "Happy" in various contexts
Negative Logits
Token        Logit
IDER         -0.87
arin         -0.87
urally       -0.86
ents         -0.82
ental        -0.77
andestine    -0.76
ently        -0.74
encers       -0.72
encies       -0.72
Ernst        -0.72
Positive Logits
Token        Logit
Happy        0.96
Happy        0.91
Birthday     0.91
joy          0.84
Meal         0.83
happy        0.81
ness         0.81
happy        0.81
birthday     0.78
Joy          0.72
Activations Density: 0.012%
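
For readers unfamiliar with the density figure, the sketch below shows one common way such a number is obtained: the fraction of sampled tokens on which the feature's activation is above zero. This definition is an assumption, since the page does not state how the density was computed. The function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: density means "share of sampled tokens where the feature
    fires"; the page itself does not define the term.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 1 of 10,000 tokens.
sample = [0.0] * 9999 + [3.2]
print(f"{activation_density(sample):.3%}")
```

Under this reading, a density of 0.012% would mean the feature fires on roughly 12 of every 100,000 sampled tokens, which is consistent with a feature tied to a single word.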