INDEX
Explanations
references to political figures and their relationships
Negative Logits
SError     -0.16
\Bridge    -0.15
iona       -0.15
XObject    -0.14
Tar        -0.14
iggers     -0.14
oger       -0.14
zdrav      -0.14
seen       -0.14
tery       -0.14
Positive Logits
dro        0.29
var        0.28
blev       0.26
fick       0.25
tog        0.25
tog        0.24
sat        0.23
dog        0.20
val        0.20
slog       0.20
Activations Density 0.030%