INDEX
Explanations
the presence of various numerical representations and symbols in the text
Negative Logits
Token       Logit
baugh       -0.17
okable      -0.16
ographer    -0.15
xee         -0.15
NST         -0.14
стру        -0.14
icontrol    -0.14
ражд        -0.14
_UNUSED     -0.13
xm          -0.13
Positive Logits
Token       Logit
181         0.28
178         0.27
179         0.27
186         0.25
184         0.25
183         0.24
177         0.23
191         0.23
190         0.23
182         0.23
Activations Density 0.085%
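
The page does not say how the density figure is calculated. As a rough sketch, assuming it means the share of token positions on which the feature's activation is above zero, it could be computed as below. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions where the
    feature fires at all; the source page does not define the term.
    """
    values = list(activations)
    if not values:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in values if a > threshold)
    return active / len(values)


# Hypothetical sample: the feature fires on 17 of 20,000 token positions.
sample = [0.0] * 19_983 + [1.5] * 17
density = activation_density(sample)
print(f"{density:.3%}")  # formatted as a percentage with three decimals
```

Under that assumption, a density of 0.085% corresponds to the feature firing on roughly 17 of every 20,000 tokens.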