Explanations
names of individuals or groups, possibly from news articles
occurrences of the special character Ļ in different contexts
Negative Logits
disadvant   -1.07
vulner      -0.87
mathemat    -0.86
comprom     -0.86
princ       -0.84
incorpor    -0.81
agre        -0.77
constitu    -0.77
fundament   -0.75
raints      -0.75
Positive Logits
ï¸ı    1.28
ï¸     1.07
女     0.93
à¥     0.92
æľ     0.86
½      0.84
ı      0.81
Ĩ      0.80
ãĤŃ    0.80
\":    0.79
Activations Density: 0.330%
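
Several of the token strings in the logit lists above (for example ï¸ı, ãĤŃ, and the Ļ named in the second explanation) look like the printable stand-ins that a GPT-2-style byte-level BPE vocabulary uses for raw bytes. The source does not say which tokenizer produced them, so the following is only a minimal sketch under that assumption; `token_to_bytes` and `describe` are hypothetical helper names, not part of any dashboard API. It maps each stand-in character back to its byte and tries to decode the result as UTF-8.

```python
def bytes_to_unicode():
    """Byte -> printable-character table in the style of GPT-2's byte-level BPE.

    ASSUMPTION: the dashboard's tokens come from a vocabulary that uses this
    table; the source does not confirm the tokenizer.
    """
    # Bytes that are already printable map to themselves.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(0xA1, 0xAC + 1))
          + list(range(0xAE, 0xFF + 1)))
    cs = bs[:]
    n = 0
    # Every remaining byte is assigned the next code point from 256 upward.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def token_to_bytes(token):
    """Return the raw bytes behind a token, or None if any character
    falls outside the table (i.e. the token is already decoded text)."""
    if any(ch not in CHAR_TO_BYTE for ch in token):
        return None
    return bytes(CHAR_TO_BYTE[ch] for ch in token)


def describe(token):
    """Return (hex_bytes, decoded_text); decoded_text is None when the
    bytes are only a fragment of a UTF-8 sequence."""
    raw = token_to_bytes(token)
    if raw is None:
        return None, token
    try:
        return raw.hex(), raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.hex(), None


if __name__ == "__main__":
    for tok in ["ï¸ı", "ãĤŃ", "Ļ", "à¥", "女"]:
        print(repr(tok), describe(tok))
```

Under this assumption, ï¸ı corresponds to the bytes EF B8 8F, which is U+FE0F (variation selector-16), and ãĤŃ to E3 82 AD, the katakana キ. Ļ, à¥ and æľ are incomplete UTF-8 fragments, which would be consistent with a feature that promotes multi-byte, non-Latin continuations rather than English word stems.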