INDEX
Explanations
text sequences containing special characters followed by numbers
the presence of a specific character or symbol
Negative Logits
decomp        -0.82
interf        -0.79
mathemat      -0.75
fortun        -0.71
Downloadha    -0.71
Palest        -0.68
horizont      -0.68
photoc        -0.67
transc        -0.66
Osw           -0.66
Positive Logits
¬    1.45
Ļ    1.36
£    1.24
ª    1.23
į    1.21
ĸ    1.21
Ī    1.21
ı    1.20
ĵ    1.20
ħ    1.19
Activations Density 0.382%