Explanations
elements related to power dynamics and authority
Negative Logits
loh     -0.16
olina   -0.16
аран    -0.16
lan     -0.15
Nib     -0.15
brill   -0.15
Fcn     -0.15
Wahl    -0.15
171     -0.14
731     -0.14
Positive Logits
Å           0.18
wy          0.18
ificacion   0.17
sond        0.17
bih         0.17
ũ           0.15
vail        0.15
rych        0.15
adow        0.15
Estr        0.15
Activations Density 0.014%
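
To make the density figure concrete: assuming it denotes the fraction of token positions on which the feature has a nonzero activation (the page itself does not define it), 0.014% corresponds to roughly 14 active positions per 100,000 tokens. The sketch below illustrates that reading; the function name `activation_density`, the threshold, and the sample data are hypothetical and not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions where the
    feature fires at all; the dashboard does not state its exact definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 100,000 token positions, 14 of which are active.
acts = [0.0] * 100_000
for i in range(14):
    acts[i * 7000] = 1.5  # arbitrary positive activation value

density = activation_density(acts)
print(f"{density:.3%}")
```

A feature this sparse fires on very few tokens, so the logit lists above are best read alongside the activating examples rather than on their own.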