INDEX
Explanations
words related to positive feelings or emotions
expressions of positive feelings or happiness
Negative Logits
distinguished   -0.68
ring            -0.67
favoured        -0.65
ngth            -0.62
Lif             -0.62
inguished       -0.61
disob           -0.59
eming           -0.58
favored         -0.57
rown            -0.56
Positive Logits
enough          0.82
stories         0.82
waves           0.69
enough          0.67
paren           0.66
Enough          0.64
lapt            0.64
Textures        0.63
aloud           0.63
alright         0.62
Activations Density: 0.065%