Explanations
references to political events and figures
Negative Logits
agna     -0.17
.sax     -0.15
_RB      -0.14
adow     -0.14
eton     -0.13
obar     -0.13
hed      -0.13
orro     -0.13
isses    -0.13
akra     -0.13
Positive Logits
代           0.15
avern       0.15
mandates    0.15
distant     0.14
natural     0.14
mandate     0.14
/renderer   0.14
xic         0.14
βι          0.13
incip       0.13
Activations Density: 0.034%