Explanations
words related to positivity or desirability
words expressing positive or favorable evaluations and opinions
Negative Logits
hid       -0.88
grave     -0.85
bus       -0.85
lang      -0.82
liam      -0.79
hod       -0.78
driver    -0.75
jack      -0.74
iq        -0.73
drivers   -0.72
Positive Logits
favorable      1.40
avorable       1.31
favourable     1.20
unfavorable    1.12
matchups       0.97
favorably      0.93
agre           0.88
advantageous   0.86
favors         0.85
ratings        0.81
Activations Density 0.007%