INDEX
Explanations
negative perceptions about societal norms and behaviors
Negative Logits
Билгалдахарш    -0.69
ValueStyle      -0.68
utafitiHapana   -0.59
хьтан           -0.57
gaver           -0.57
}}}}            -0.55
valsty          -0.54
militare        -0.54
TextAppearance  -0.53
swig            -0.53
Positive Logits
people          0.64
saites          0.58
often           0.57
thinking        0.56
minds           0.53
كويكب           0.50
Ska             0.49
ople            0.49
folks           0.49
people          0.49
Activations Density 0.406%