Explanations
words relating to strong positive emotions, particularly a high degree of liking or affection
mentions of fondness or positive feelings
Negative Logits
irrel     -0.78
adesh     -0.74
UGH       -0.69
Tube      -0.67
udder     -0.67
opers     -0.65
ħĭ        -0.64
atts      -0.64
KT        -0.64
IDER      -0.64
Positive Logits
fond      1.20
ness      0.94
memories  0.88
uously    0.87
farewell  0.83
remem     0.82
nesses    0.81
rait      0.79
entimes   0.79
ries      0.76
Activations Density: 0.010%