INDEX
Explanations
phrases related to expressing opinions or beliefs
Negative Logits
nered       -0.77
kus         -0.73
emouth      -0.73
gage        -0.67
uncture     -0.66
ezvous      -0.65
ourning     -0.64
ngth        -0.63
ortment     -0.63
Aware       -0.63
Positive Logits
smoothly     0.86
WARD         0.72
nicely       0.71
dividends    0.70
moot         0.68
unnoticed    0.67
downhill     0.67
smoother     0.64
unden        0.63
precedence   0.63
Activations Density: 0.089%