INDEX
Explanations
punctuation, particularly periods and question marks
Negative Logits
idis              -0.16
tags              -0.15
share             -0.15
click             -0.14
aforementioned    -0.14
link              -0.14
source            -0.14
quick             -0.13
type              -0.13
number            -0.13
Positive Logits
Iron              0.23
↵↵                0.18
It                0.17
Iron              0.17
That              0.17
That              0.17
iron              0.17
But               0.17
There             0.16
Of                0.16
Activations Density 0.181%