Explanations
References to key figures and concepts related to political and social issues
Negative Logits
ſever   -0.59
ſou     -0.58
fallu   -0.57
ſta     -0.57
fubject -0.56
myſelf  -0.56
iſter   -0.56
ſol     -0.54
tranſ   -0.54
the     -0.54
Positive Logits
es      0.84
ses     0.80
ES      0.63
ness    0.62
ess     0.60
hes     0.52
sed     0.48
nes     0.47
NESS    0.47
sing    0.47
Activation Density 1.342%
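As a rough illustration of where numbers like these usually come from, here is a minimal sketch. It assumes the common SAE-dashboard convention, which this page does not itself confirm: a token's logit score is the dot product of the latent's decoder vector with that token's unembedding column, and activation density is the percentage of sampled token positions where the latent's activation is above zero. The function names, the zero threshold, and the toy vectors are all hypothetical.

```python
# Hypothetical sketch: names, threshold, and toy data are illustrative assumptions,
# not the dashboard's actual implementation.

def logit_effects(decoder_vec, unembed, vocab):
    """Score each token by dot(decoder_vec, unembed[:, j]) (assumed convention).

    unembed is a d_model x vocab_size matrix given as a list of rows.
    """
    d_model = len(decoder_vec)
    scores = {}
    for j, tok in enumerate(vocab):
        scores[tok] = sum(decoder_vec[i] * unembed[i][j] for i in range(d_model))
    return scores


def top_k(scores, k, positive=True):
    """Return the k highest-scoring (positive=True) or lowest-scoring tokens."""
    ordered = sorted(scores.items(), key=lambda kv: kv[1], reverse=positive)
    return ordered[:k]


def activation_density(activations, threshold=0.0):
    """Percentage of token positions where the activation exceeds threshold."""
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # Toy 2-dimensional model with a 4-token vocabulary (made-up numbers).
    vocab = ["es", "ness", "the", "ſou"]
    decoder_vec = [1.0, -0.5]
    unembed = [
        [0.6, 0.4, -0.2, -0.5],
        [-0.4, -0.2, 0.6, 0.3],
    ]
    scores = logit_effects(decoder_vec, unembed, vocab)
    print([tok for tok, _ in top_k(scores, 2)])                  # → ['es', 'ness']
    print([tok for tok, _ in top_k(scores, 2, positive=False)])  # → ['ſou', 'the']

    # Latent fires on 2 of 10 sampled positions.
    acts = [0.0, 0.0, 1.2, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.5]
    print(activation_density(acts))                              # → 20.0
```

Under these assumptions, the lists above would be the ten highest and ten lowest entries of such a score table over the full vocabulary, and 1.342% would mean the latent is active on roughly 13 of every 1,000 sampled tokens.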