INDEX
Explanations
names of political figures
completely empty sections or gaps in the text
Negative Logits
iology    -0.91
ivalent   -0.85
icans     -0.85
ysis      -0.81
iguous    -0.76
ienne     -0.76
ulous     -0.76
ican      -0.76
vier      -0.76
isers     -0.75
Positive Logits
pling     0.86
sqor      0.77
cha       0.73
oway      0.72
PIN       0.70
plings    0.67
eca       0.66
urities   0.65
pixel     0.64
OTA       0.64
Activations Density: 0.053%