INDEX
Explanations
references to social dynamics and structures that affect power and control
Negative Logits
instead   -0.15
inger     -0.15
awl       -0.15
raith     -0.14
Ỽt        -0.14
neau      -0.14
NAT       -0.14
jde       -0.14
aker      -0.14
ADATA     -0.13
Positive Logits
physical    0.33
overt       0.32
Physical    0.30
physical    0.29
obvious     0.27
Physical    0.27
explicit    0.26
direct      0.26
explicitly  0.25
visible     0.24
Activations Density 0.333%