INDEX
Explanations
references to specific geopolitical locations and their implications
Negative Logits
ument     -0.15
pires     -0.15
vern      -0.14
heiten    -0.14
duk       -0.14
859       -0.13
lement    -0.13
sag       -0.13
dag       -0.13
834       -0.13
Positive Logits
oned          0.16
extrem        0.14
osite         0.14
bung          0.14
env           0.14
igg           0.14
andes         0.14
Chronicles    0.14
ascript       0.13
-head         0.13
Activations Density 0.037%