INDEX
Explanations
phrases indicating initial impressions or evaluations
Negative Logits
esser      -0.16
ational    -0.16
ady        -0.16
oje        -0.15
جد         -0.14
绾         -0.14
PIP        -0.14
dub        -0.14
isky       -0.14
overe      -0.14
Positive Logits
glance       0.49
sight        0.37
quick        0.30
-gl          0.29
Sight        0.29
blush        0.27
look         0.26
quick        0.26
inspection   0.26
Gl           0.25
Activation density: 0.028%