INDEX
Explanations
mentions of countries and political figures
specific letter combinations, particularly those resembling the character "â" and its variants
Negative Logits
Token          Logit
accessory      -0.71
ifications     -0.66
aimon          -0.66
giveaway       -0.66
NEC            -0.65
implant        -0.65
ellipt         -0.65
isode          -0.63
commissions    -0.63
scatter        -0.63
Positive Logits
Token          Logit
¬              1.36
£              1.15
ħ              1.13
Ń              1.12
²              1.10
ı              1.10
Ĵ              1.10
ª              1.08
º              1.05
¹              1.05
Activations Density 0.165%
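
The density figure above is commonly read as the fraction of token positions on which the feature fires. The sketch below shows one way such a number could be computed. It assumes that "fires" means an activation strictly above zero. The function name activation_density, the threshold parameter, and the sample data are hypothetical and not taken from the page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: a feature counts as active when its value is strictly
    greater than `threshold` (0.0 by default).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 20,000 token positions, 33 of which activate the feature.
sample = [0.0] * 19967 + [0.5] * 33
density = activation_density(sample)
print(f"{density:.3%}")
```

With these made-up counts, 33 active positions out of 20,000 give a density on the same scale as the one reported here. The real value depends on the dataset and threshold the dashboard used, which this page does not state.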