INDEX
Explanations
phrases expressing strong positive emotions about things such as recipes, songs, and experiences
expressions of strong personal preferences or affections
Negative Logits
uthor          -0.86
river          -0.74
ufact          -0.72
transpired     -0.71
ailable        -0.68
Corp           -0.67
imester        -0.66
disadvantage   -0.64
ascript        -0.63
Args           -0.62
Positive Logits
dearly         0.86
uncond         0.76
idea           0.72
spont          0.69
guts           0.67
reputation     0.63
hobby          0.63
outdoors       0.62
vibe           0.62
decoration     0.62
Activations Density 0.291%