Explanations
phrases that express uncertainty or questioning
NEGATIVE LOGITS
طب          -0.16
TintColor   -0.15
zed         -0.14
-selector   -0.14
elon        -0.14
SSI         -0.14
ecure       -0.14
annes       -0.14
jíž         -0.14
rement      -0.14
POSITIVE LOGITS
tell        0.79
Tell        0.70
telling     0.68
tell        0.66
tells       0.66
Tell        0.61
Tells       0.53
told        0.52
告诉        0.47
.tell       0.44
Activations Density 0.083%
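
For scale, a density of 0.083% corresponds to roughly one active token position in every 1,200. The sketch below shows one way such a figure could be computed. It assumes that "activation density" means the fraction of token positions where the feature's activation exceeds a threshold; the page does not state the definition, and the function name, the threshold, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions where the feature is active.

    Assumption: a position counts as active when its activation is
    strictly greater than `threshold`.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 12,000 token positions, 10 of which activate the feature.
sample = [0.0] * 11990 + [1.5] * 10
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.083%
```

A feature this sparse fires on very few tokens, which fits a feature whose top positive logits are all forms of a single word.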