INDEX
Explanations
references to specific organizations and their associated power dynamics
Negative Logits
fir       -0.15
quil      -0.15
ptune     -0.15
ringe     -0.14
ynos      -0.14
prung     -0.14
город     -0.14
bane      -0.14
341       -0.13
aliyet    -0.13
Positive Logits
power     0.53
Power     0.52
Power     0.45
power     0.43
.Power    0.43
/power    0.42
(power    0.41
POWER     0.41
-power    0.40
_power    0.38
Activations Density 0.042%