Negative Logits
outcomes    -0.08
outcome     -0.07
recharge    -0.07
speaks      -0.06
soldiers    -0.06
succeed     -0.06
Outcome     -0.06
aktu        -0.06
Companies   -0.06
Comp        -0.06
Positive Logits
_STANDARD   0.07
Utilities   0.07
fell        0.07
DISCLAIMS   0.06
_execute    0.06
بیشتری      0.06
eylem       0.06
Hover       0.06
еним        0.06
Vampire     0.06
Activations Density: 0.150%