Explanations
affirmations and expressions of agreement
Negative Logits
елич      -0.16
arella    -0.15
ushman    -0.15
ocop      -0.14
Bounty    -0.14
vič       -0.14
ルク      -0.14
ount      -0.14
olest     -0.14
éal       -0.13
Positive Logits
correct   0.59
Correct   0.45
yes       0.45
Correct   0.45
right     0.44
correct   0.43
right     0.39
đúng      0.39
Yes       0.38
yes       0.36
Activations Density 0.232%