INDEX
Explanations
adjectives and nouns related to positive emotional memories
expressions of fondness or nostalgia
Negative Logits
Token      Logit
irrel      -0.74
adesh      -0.69
UGH        -0.68
udder      -0.67
pta        -0.64
helicop    -0.63
FT         -0.63
opers      -0.62
NG         -0.62
IDER       -0.62
Positive Logits
Token      Logit
fond       1.13
uously     0.99
nesses     0.94
ness       0.94
memories   0.89
iously     0.88
entimes    0.84
remem      0.83
uous       0.80
ré         0.79
Activations Density: 0.012%