INDEX
Explanations
proper nouns or names that end with "-amer"
references to American identity or context
Negative Logits
Token         Logit
ŃĶ            -0.80
ĵĺ            -0.73
erest         -0.68
fax           -0.65
margin        -0.64
pressed       -0.63
ird           -0.62
WAR           -0.61
fermentation  -0.61
ģ«            -0.61
Positive Logits
Token         Logit
icans         1.26
ican          1.24
ICAN          1.19
gdala         1.03
icity         0.94
icas          0.93
ica           0.92
gency         0.90
ilial         0.89
agic          0.80
Activations Density 0.012%