Explanations
adjectives describing positive qualities or outcomes
expressions of moderate to high enthusiasm or positivity about subjects, particularly using the word "pretty."
Negative Logits
upon      -0.82
venant    -0.80
APD       -0.79
FTA       -0.73
arta      -0.72
KY        -0.71
pent      -0.69
HAEL      -0.68
jen       -0.68
alez      -0.67
Positive Logits
darn               1.26
nifty              0.97
much               0.89
damn               0.87
tasty              0.85
harmless           0.85
nice               0.84
straightforward    0.83
neat               0.81
awesome            0.80
Activations Density: 0.019%