Negative Logits
norms          1.09
ambiguity      1.05
enforce        1.02
公正           1.02
फ्टी            0.99
disagreement   0.98
ambiguities    0.98
entitlement    0.98
MDL            0.98
engagement     0.97
Positive Logits
grande         1.13
media          1.08
orche          1.05
Media          1.04
ust            1.03
ret            1.02
ándose         1.00
ult            0.99
woods          0.98
irk            0.98
Activations Density 0.014%