INDEX
Explanations
proper nouns
instances of the placeholder token and instances of the word "THE"
Negative Logits
stood     -0.72
Bulg      -0.69
Schwar    -0.67
lets      -0.65
gy        -0.65
sup       -0.65
ãĤ£       -0.65
Gong      -0.64
Murdoch   -0.63
tons      -0.61
Positive Logits
BOOK      1.32
MAN       1.30
ERSON     1.27
VERS      1.27
ING       1.25
ION       1.23
FORE      1.22
FER       1.22
VER       1.22
IN        1.22
Activations Density: 0.161%