INDEX
Explanations
mentions of political figures
instances of the end of a document
Negative Logits
Azerb      -0.14
thous      -0.10
indo       -0.10
citiz      -0.10
ĪĴ         -0.10
miscon     -0.10
pilgr      -0.10
newcom     -0.10
millenn    -0.09
elsius     -0.09
Positive Logits
↵                0.13
The              0.12
G                0.12
I                0.11
B                0.11
In               0.11
<|endoftext|>    0.11
S                0.11
Share            0.11
This             0.11
Activations Density 0.781%