Explanations
the word "No" in various contexts and formats
Negative Logits
ⓧ             -0.85
imagui        -0.76
<pad>         -0.73
𑄮             -0.73
(not shown)   -0.72
AISSEE        -0.72
<unused52>    -0.72
<unused14>    -0.72
<unused8>     -0.72
<unused3>     -0.72
Positive Logits
No            0.51
No            0.38
NO            0.34
Neither       0.33
neither       0.32
URL           0.31
RE            0.29
image         0.29
Neither       0.29
no            0.28
Activations Density 0.047%