Explanations
end punctuation marks, particularly periods and quotes
Negative Logits
↵        -0.39
↵↵       -0.24
↵ ↵      -0.19
         -0.18
↵ ↵      -0.18
↵ ↵      -0.18
.        -0.17
ses      -0.16
↵ ↵      -0.16
↵ ↵      -0.16
Positive Logits
jpg      0.25
This     0.21
The      0.20
         0.20
It       0.19
png      0.19
They     0.18
These    0.18
"↵       0.18
And      0.17
Activations Density 0.122%