INDEX
Explanations
words related to certainty or confidence
Negative Logits
cial          -0.76
idelines      -0.74
utch          -0.74
jab           -0.72
vert          -0.69
AK            -0.67
enta          -0.67
hes           -0.66
thinkable     -0.66
Rated         -0.66
Positive Logits
someday       0.85
whoever       0.83
Rasmussen     0.74
sooner        0.71
Admin         0.70
readers       0.67
none          0.64
someone       0.62
historians    0.62
CCP           0.61
Activations Density: 0.313%