INDEX
Explanations
references to political ideologies and socio-political promises within narratives
Negative Logits
ardes  -0.59
né  -0.51
anal  -0.46
affects  -0.45
場合があります  -0.43
y  -0.42
тена  -0.42
のですか  -0.42
Mary  -0.41
estre  -0.41
Positive Logits
Hochspringen  0.78
SequentialGroup  0.77
myſelf  0.73
pleaſure  0.72
hoped  0.71
Theſe  0.69
$_"  0.69
Biôgrafia  0.68
raiſ  0.67
ſever  0.65
Activations Density 0.408%