Explanations
expressions of agreement or validation in conversation
NEGATIVE LOGITS
elles       -0.15
ropp        -0.15
είο         -0.14
upy         -0.14
ophone      -0.14
opis        -0.13
hud         -0.13
γο          -0.13
_managed    -0.13
STATS       -0.13
POSITIVE LOGITS
correct     0.60
spot        0.50
spot        0.47
correct     0.46
Correct     0.43
Spot        0.42
Correct     0.41
Spot        0.40
accurate    0.40
-spot       0.38
Activations Density 0.200%