INDEX
Explanations
negative characterizations or judgments towards individuals or groups
Opinions or judgments
negative personal judgment
Negative Logits
correctes          -0.74
Билгалдахарш       -0.71
StateList          -0.70
StreetMap          -0.68
AssemblyProduct    -0.66
                   -0.66
出版年             -0.65
ագրություններ      -0.62
surla              -0.61
">//               -0.61
Positive Logits
selfish            0.92
selfish            0.73
bias               0.72
ego                0.71
hypocrisy          0.71
arrogant           0.70
lazy               0.70
vanity             0.70
selfishness        0.70
frivolous          0.69
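Lists of positive and negative logits like these are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and ranking the resulting per-token scores. The sketch below assumes that convention; the function names `logit_effects` and `top_and_bottom_tokens`, and the toy vocabulary and weights, are hypothetical and not taken from the tool that produced this card.

```python
from typing import List, Sequence, Tuple


def logit_effects(direction: Sequence[float],
                  w_unembed: Sequence[Sequence[float]]) -> List[float]:
    """Project a feature's decoder direction through the unembedding.

    Assumption: `direction` has length d_model and `w_unembed` has
    d_model rows and one column per vocabulary token.
    Returns one score per vocabulary token.
    """
    vocab_size = len(w_unembed[0])
    return [
        sum(direction[j] * w_unembed[j][t] for j in range(len(direction)))
        for t in range(vocab_size)
    ]


def top_and_bottom_tokens(scores: Sequence[float], vocab: Sequence[str], k: int
                          ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (k most positive, k most negative) tokens as (token, score) pairs."""
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]        # ascending order: most negative first
    positive = ranked[::-1][:k]  # descending order: most positive first
    return positive, negative


if __name__ == "__main__":
    # Hypothetical toy model: d_model = 2, vocabulary of 4 tokens.
    vocab = ["selfish", "bias", "StreetMap", "surla"]
    direction = [1.0, -0.5]
    w_unembed = [[0.8, 0.6, -0.4, -0.2],
                 [-0.2, 0.1, 0.6, 0.9]]
    pos, neg = top_and_bottom_tokens(logit_effects(direction, w_unembed), vocab, k=2)
    print("positive:", pos)
    print("negative:", neg)
```

Sorting once and slicing from both ends keeps the top-k and bottom-k lists consistent with each other; a real dashboard over a large vocabulary would typically use a vectorized partial sort instead.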
Activation Density: 0.459%
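Activation density is usually reported as the percentage of sampled tokens on which the feature's activation is above zero. The sketch below assumes that definition; the helper name `activation_density` and its `threshold` parameter are hypothetical, and the sample size behind this card's figure is not stated.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens on which the feature's activation exceeds `threshold`.

    Assumption: `activations` holds one activation value per sampled token.
    """
    total = 0
    firing = 0
    for value in activations:
        total += 1
        if value > threshold:
            firing += 1
    if total == 0:
        raise ValueError("need at least one activation")
    return 100.0 * firing / total
```

For instance, a feature that fires on 459 of 100,000 sampled tokens has a density of 100 × 459 / 100,000 = 0.459%, i.e. it is active on fewer than 1 token in 200.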