Explanations
affirmative responses and expressions of agreement
Negative Logits
för  -0.38
Towns  -0.35
Αν  -0.34
ⓧ  -0.34
gider  -0.34
ARROLL  -0.34
帖最后由  -0.34
んですけど  -0.33
Token  -0.33
currentColor  -0.33
Positive Logits
yes  1.06
Yes  0.95
Yes  0.94
yes  0.90
YES  0.82
yep  0.77
YES  0.76
Yep  0.73
Yep  0.68
yep  0.68
Activations Density 0.214%