Explanations
specific items in a list
colons followed by lists or items
Negative Logits
gow          -0.65
orate        -0.64
oland        -0.63
ashington    -0.62
pty          -0.61
veland       -0.59
agon         -0.58
iliate       -0.58
ritten       -0.58
hed          -0.57
Positive Logits
<|endoftext|>    1.15
↵                1.14
âĹı              1.08
âĢ¢              1.04
↵↵               1.03
·                0.99
Firstly          0.93
↵Âł              0.90
âĢ¢              0.87
âĹı              0.86
Activations Density 0.120%
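
Several positive-logit tokens above (for example âĹı and âĢ¢) look like raw byte-level BPE vocabulary strings rather than readable text. The sketch below is one way to decode them, assuming the model uses a GPT-2-style byte-to-unicode table and that the dashboard renders a newline as ↵; neither assumption is stated on this page. The names `gpt2_byte_decoder` and `decode_token` are hypothetical helpers, not part of any library. Under these assumptions the two strings decode to bullet characters, which would agree with the list-related explanations.

```python
def gpt2_byte_decoder():
    """Build the inverse of the GPT-2-style byte-to-unicode table (assumed tokenizer)."""
    # Bytes that the table maps to themselves (printable Latin-1 ranges).
    printable = set(
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    decoder = {chr(b): b for b in printable}
    # Every other byte is shifted to code point 256, 257, ... in byte order.
    offset = 0
    for b in range(256):
        if b not in printable:
            decoder[chr(256 + offset)] = b
            offset += 1
    return decoder


def decode_token(token, decoder=None):
    """Turn a displayed vocabulary string back into readable text."""
    decoder = decoder or gpt2_byte_decoder()
    raw = bytearray()
    for ch in token:
        if ch == "↵":
            # Assumption: the dashboard shows the newline byte as this arrow.
            raw.append(0x0A)
        elif ch in decoder:
            raw.append(decoder[ch])
        else:
            # Character outside the table: keep it as plain UTF-8.
            raw.extend(ch.encode("utf-8"))
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["âĹı", "âĢ¢", "↵Âł", "Firstly"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Tokens that are plain ASCII, such as Firstly, pass through unchanged. A string that is not valid UTF-8 after the byte mapping decodes to a replacement character instead of raising.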