INDEX
Explanations
punctuation and formatting elements in the text
Negative Logits
оÑĢд      -0.19
Truy      -0.16
etrofit   -0.15
edor      -0.15
ithub     -0.14
.mj       -0.14
ioxid     -0.14
ÑĢиз      -0.14
imer      -0.13
viders    -0.13
Positive Logits
eam       0.16
lean      0.15
eya       0.15
imo       0.14
�         0.14
Rifle     0.14
ä¹İ       0.14
_AUX      0.14
TURE      0.13
rå        0.13
Activations Density: 0.006%