Explanations
names of political figures and locations
mentions of government officials or authorities
Negative Logits
french      -0.77
vacuum      -0.73
Europe      -0.71
comparable  -0.71
colder      -0.71
Cologne     -0.71
femin       -0.69
neoc        -0.69
cheese      -0.68
chees       -0.68
Positive Logits
wana        1.59
uku         1.47
onga        1.42
ulu         1.42
atu         1.41
yip         1.41
angan       1.40
amba        1.34
anu         1.32
arat        1.30
Activations Density 0.367%