Explanations
words indicating likelihood or possibility
phrases indicating perceived qualities or characteristics
Negative Logits
estern      -0.98
ests        -0.77
ilts        -0.76
iding       -0.72
orthern     -0.72
atform      -0.68
loads       -0.67
atching     -0.67
aign        -0.66
leasing     -0.65
Positive Logits
rils            0.88
oddly           0.81
awfully         0.79
strangely       0.79
innocuous       0.78
plaus           0.77
mysteriously    0.76
Pause           0.76
poised          0.73
like            0.72
Activations Density 0.059%