INDEX
Explanations
mentions of government officials, specifically Prime Ministers
Negative Logits
olu            -0.17
tsky           -0.16
mdi            -0.16
dde            -0.15
UNUSED         -0.14
urette         -0.14
ì              -0.14
.httpClient    -0.14
елич           -0.14
IRECTION       -0.14
Positive Logits
yal            0.15
hift           0.15
lava           0.15
Schiff         0.15
ček            0.14
cy             0.14
τισ            0.14
ERV            0.14
roke           0.14
Reputation     0.14
Activations Density 0.005%