INDEX
Explanations
special characters and punctuation in text
Negative Logits
ufe       -0.18
nr        -0.17
ighton    -0.16
Ïħν       -0.15
757       -0.15
nun       -0.14
óg        -0.14
ummies    -0.13
cy        -0.13
Norris    -0.13
Positive Logits
WXYZ       0.14
auer       0.14
utsch      0.14
anytime    0.14
ktion      0.13
li         0.13
relude     0.13
Bilg       0.13
ιαν        0.13
tran       0.13
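The two lists above can be read as the ten most negative and ten most positive entries of a single per-token logit-effect vector. That reading is an assumption; the page does not say how the vector is computed. The sketch below shows only the ranking step. The function name `extreme_logits` and the small `effects` dictionary are hypothetical; the dictionary reuses a few values from the lists plus one made-up neutral token.

```python
import heapq

def extreme_logits(effects, k=10):
    """Return the k most negative and k most positive (token, effect) pairs.

    Assumption: `effects` maps each vocabulary token to the feature's effect
    on that token's logit. Ties keep the dictionary's insertion order.
    """
    negative = heapq.nsmallest(k, effects.items(), key=lambda kv: kv[1])
    positive = heapq.nlargest(k, effects.items(), key=lambda kv: kv[1])
    return negative, positive

# Hypothetical subset of the effect vector ("li": 0.0 is made up).
effects = {
    "ufe": -0.18, "nr": -0.17, "ighton": -0.16,
    "WXYZ": 0.14, "auer": 0.14, "ktion": 0.13, "li": 0.0,
}
neg, pos = extreme_logits(effects, k=2)
print(neg)  # [('ufe', -0.18), ('nr', -0.17)]
print(pos)  # [('WXYZ', 0.14), ('auer', 0.14)]
```

`heapq.nsmallest`/`nlargest` avoid a full sort. This matters for a real vocabulary of tens of thousands of tokens when only the top and bottom ten are needed.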
Activation Density: 0.118%
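A density of 0.118% means the feature is active on roughly 118 of every 100,000 token positions. The page does not define "active". The minimal sketch below assumes it means an activation strictly above zero and exposes the threshold as a parameter. The function name `activation_density` is hypothetical.

```python
import math

def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold`.

    Assumption: a feature counts as firing when its activation is > threshold
    (0.0 by default); the source page does not state its exact rule.
    """
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)

# Illustrative data: 118 firing positions out of 100,000.
acts = [0.0] * 99_882 + [1.0] * 118
density = activation_density(acts)
print(f"{density:.3f}%")  # 0.118%
```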