Explanations
names and titles associated with people and organizations in cultural and political contexts
Negative Logits
ittle     -0.26
ORA       -0.17
uur       -0.15
flen      -0.15
inish     -0.15
ç£        -0.15
plode     -0.14
å·        -0.14
ichick    -0.14
glm       -0.14
Positive Logits
co        0.16
ign       0.15
ini       0.15
vo        0.15
ela       0.15
affair    0.15
ola       0.14
beck      0.14
à¹ģà¸ķ    0.14
ercial    0.13
Activations Density 0.297%