INDEX
Explanations
negative portrayals of human behavior and judgment
Negative Logits
Liberties     -0.17
znik          -0.17
ứa            -0.16
okus          -0.15
leen          -0.15
ederland      -0.14
åŀ            -0.14
InnerText     -0.14
unky          -0.14
ishlist       -0.14
Positive Logits
intelligence   0.35
judgment       0.34
wisdom         0.33
judgement      0.31
logic          0.30
æĻº (智)       0.29
rational       0.28
commons        0.28
brains         0.28
brain          0.27
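The two token lists rank vocabulary items by how strongly this feature pushes their output logits up or down. This page does not say how the lists were produced. One common way to compute such a list, assumed here, is to project the feature's decoder direction through the model's unembedding matrix and sort the per-token scores. The sketch below uses a toy model with 2 residual dimensions and a 4-token vocabulary. The helpers `logit_effects` and `top_and_bottom` and all of the numbers are hypothetical. A real model may also apply a final layer norm before unembedding.

```python
from typing import List, Sequence, Tuple

Ranked = List[Tuple[str, float]]

def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Score every vocabulary token by projecting a feature's decoder direction
    (length d_model) through an unembedding matrix of shape d_model x vocab."""
    d_model = len(decoder_dir)
    vocab_size = len(unembed[0])
    return [sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
            for t in range(vocab_size)]

def top_and_bottom(scores: Sequence[float], vocab: Sequence[str],
                   k: int) -> Tuple[Ranked, Ranked]:
    """Return the k most positive and k most negative (token, score) pairs."""
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]        # ascending order: most negative first
    positive = ranked[::-1][:k]  # descending order: most positive first
    return positive, negative

# Hypothetical toy model: d_model = 2, vocabulary of 4 tokens.
vocab = ["intelligence", "wisdom", "Liberties", "znik"]
decoder_dir = [1.0, 0.5]
unembed = [
    [0.3, 0.2, -0.1, -0.2],   # row for residual dimension 0
    [0.1, 0.2, -0.1, -0.1],   # row for residual dimension 1
]

scores = logit_effects(decoder_dir, unembed)
positive, negative = top_and_bottom(scores, vocab, k=2)
print("positive:", [(tok, round(s, 2)) for tok, s in positive])
print("negative:", [(tok, round(s, 2)) for tok, s in negative])
```

Sorting once and slicing from both ends gives both lists from a single pass. For a full vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` would avoid sorting everything.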
Activation Density: 0.300%
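The density figure is read here as the share of token positions on which the feature fires at all. That interpretation is an assumption for illustration, as are the hypothetical `activation_density` helper and its zero threshold. Under this reading, 0.300% corresponds to roughly 3 active positions per 1,000 tokens:

```python
from typing import Iterable

def activation_density(activations: Iterable[float],
                       threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation is strictly above threshold."""
    total = 0
    active = 0
    for value in activations:
        total += 1
        if value > threshold:
            active += 1
    if total == 0:
        raise ValueError("activation_density needs at least one activation")
    return 100.0 * active / total

# Hypothetical sample: 1,000 token positions, of which 3 are active.
sample = [0.0] * 997 + [1.2, 0.4, 2.0]
print(f"density = {activation_density(sample):.3f}%")  # prints: density = 0.300%
```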