Explanations
religious or mystical symbols
occurrences of the character "ŀ"
Negative Logits
Token          Logit
ufact          -0.69
abwe           -0.69
pton           -0.68
Spit           -0.64
Spice          -0.63
pher           -0.62
trainers       -0.61
Downloadha     -0.61
disenfranch    -0.60
manag          -0.60
Positive Logits
Token          Logit
ŀ              1.16
Ĺ              1.03
·              0.96
ĵ              0.95
ļ              0.94
ĺ              0.92
IJ             0.91
¬              0.91
³              0.90
ł              0.89
Activations Density: 0.006%