INDEX
Explanations
occurrences of special characters or symbols
Negative Logits
corner  -0.17
wed     -0.17
ling    -0.17
274     -0.16
ochen   -0.15
567     -0.15
cher    -0.15
Holden  -0.15
ose     -0.15
opa     -0.14
Positive Logits
Ń       0.30
820     0.20
Ī       0.19
¬       0.17
®       0.17
823     0.17
Ĥæķ°    0.16
¯u      0.16
bert    0.16
ĥn      0.16
Activations Density 0.002%