INDEX
Explanations
the beginning of a document or section
ending with "self"
s Encyclopedia
Negative Logits
(        -0.55
b        -0.53
ne       -0.53
k        -0.52
Hall     -0.51
g        -0.51
never    -0.51
tri      -0.51
for      -0.50
ro       -0.50
Positive Logits
myſelf        1.09
AndEndTag     0.99
ViewImports   0.96
itſelf        0.95
pleaſure      0.92
crdi          0.90
ſeveral       0.85
purpoſe       0.85
themſelves    0.85
himſelf       0.84
Activations Density: 0.032%