Explanations
societal and economic actors' attributes
Negative Logits
?!          0.16
étant       0.16
;           0.15
عوامی       0.14
="          0.14
是对        0.14
)=          0.13
,           0.13
=\          0.13
violating   0.13
Positive Logits
anxieties       0.22
dynamism        0.21
attitudes       0.21
sensibilities   0.21
motivations     0.20
priorities      0.20
sentiment       0.20
discourse       0.19
ingenuity       0.19
idiosync        0.19
Activations Density: 0.484%