INDEX
Explanations
occurrences of specific non-standard characters or symbols
Negative Logits
bbe         -0.15
ietet       -0.14
ifi         -0.14
cap         -0.14
alice       -0.14
RAW         -0.14
onta        -0.14
ourt        -0.14
roman       -0.14
ytt         -0.13
Positive Logits
ÏĮÏĤ        0.15
Semantic    0.15
enticator   0.14
peater      0.14
itar        0.14
åŃ          0.14
ÑĤий        0.14
nid         0.14
ictory      0.14
phere       0.14
Activations Density: 0.004%