Explanations
phrases related to semantics and political discourse
Negative Logits
alla       -0.15
chalk      -0.14
ilar       -0.13
ifton      -0.13
lik        -0.13
ione       -0.13
prior      -0.13
diam       -0.13
rances     -0.13
           -0.13
Positive Logits
Hell        0.19
roulette    0.18
hell        0.18
kab         0.16
equivalent  0.16
Gord        0.16
Equivalent  0.16
Kab         0.15
baise       0.15
etine       0.15
Activation density: 0.260%