Explanations
occurrences of formatting and stylistic elements in text
Negative Logits
anut      -0.17
_AES      -0.15
lien      -0.14
anship    -0.14
ublik     -0.14
uell      -0.14
undi      -0.14
ä»ĺ       -0.14
ennes     -0.14
ppe       -0.14
Positive Logits
IJ         0.16
Ùħر       0.15
290       0.14
аÑĢод   0.14
itan      0.14
ικο    0.14
_bar      0.14
izer      0.13
onen      0.13
ordial    0.13
Activations Density 0.010%