INDEX
Explanations
weak, inept, or negative traits
Negative Logits
robuste       0.48
robustness    0.46
艰            0.43
असामान्य      0.43
艱            0.40
Robust        0.39
defy          0.39
unheard       0.39
شہریوں        0.38
Robust        0.37
Positive Logits
incompetent     0.98
inept           0.90
opportunistic   0.88
greedy          0.87
opportun        0.85
manipulative    0.82
scheming        0.81
incompet        0.80
clueless        0.80
unreliable      0.79
Activations Density: 0.068%