Explanations
mentions of government officials or political figures
Negative Logits
Token          Logit
ازÛĮ           -0.16
Marathon       -0.15
ERİ            -0.14
ynchronous     -0.14
ERY            -0.14
rière          -0.14
uft            -0.14
dden           -0.14
ards           -0.14
ictor          -0.14
Positive Logits
Token          Logit
iors           0.30
egal           0.30
eca            0.28
pai            0.23
iores          0.22
ior            0.22
ile            0.21
IOR            0.20
esch           0.19
Sen            0.19
Activations Density: 0.009%
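
As a rough illustration of what a density figure like this could mean, the sketch below assumes that activation density is the fraction of sampled tokens on which the feature's activation is above zero. That definition is an assumption, not something stated on this page, and the function name activation_density and the toy data are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens on which the
    feature fires at all. The dashboard may define it differently.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data (not taken from the dashboard): the feature fires on 9 of 100,000 tokens.
toy_activations = [0.0] * 99_991 + [1.5] * 9
density = activation_density(toy_activations)
print(f"{density:.3%}")  # prints "0.009%"
```

Under that assumption, a density of 0.009% would correspond to the feature firing on roughly 9 tokens in every 100,000 sampled.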