INDEX
Explanations
phrases related to political or official positions
references to bureaucratic or administrative contexts involving underlying issues or systemic criticisms
Negative Logits
Token       Logit
aceutical   -0.72
Gaul        -0.71
jri         -0.67
agher       -0.66
chat        -0.65
Flip        -0.61
arij        -0.61
raltar      -0.61
ophon       -0.61
ichael      -0.60
Positive Logits
Token       Logit
ãģĨ         0.76
EGIN        0.76
xual        0.74
ATCH        0.73
uary        0.73
OLD         0.72
riage       0.70
LECT        0.70
FontSize    0.69
IAL         0.68
Activations Density: 0.058%