INDEX
Explanations
punctuation and question marks in the text
NEGATIVE LOGITS
ordion     -0.18
/tos       -0.17
onymous    -0.16
ìłĪ        -0.15
ATAB       -0.15
zier       -0.15
_Meta      -0.15
ajo        -0.15
oggle      -0.14
lsru       -0.14
POSITIVE LOGITS
Knox       0.16
Solo       0.15
ificate    0.14
acy        0.14
atti       0.14
Chest      0.14
drib       0.13
ä¾         0.13
602        0.13
avar       0.13
Activations Density: 0.005%