INDEX
Explanations
political figures and related events
references to political figures and their interactions
Negative Logits
zee       -0.79
achine    -0.74
notation  -0.68
abwe      -0.67
zees      -0.67
pta       -0.66
earable   -0.63
haar      -0.63
entin     -0.62
BMC       -0.61
Positive Logits
Lago          0.71
TAMADRA       0.69
fman          0.68
oglu          0.68
BTC           0.63
Os            0.62
counterparts  0.62
NK            0.61
Invalid       0.61
counterpart   0.61
Activations Density 0.581%