INDEX
Explanations
references to political events and policies
Negative Logits
Joined          -0.68
allah           -0.57
ohan            -0.56
emp             -0.55
autions         -0.54
veland          -0.54
arten           -0.53
ventures        -0.52
hari            -0.52
obe             -0.51
Positive Logits
aforementioned   0.78
latter           0.74
dreaded          0.69
latest           0.59
remainder        0.59
largest          0.58
sexes            0.57
entire           0.56
nation           0.56
elusive          0.55
Activations Density: 8.215%