INDEX
Explanations
phrases related to selfishness
references to selfishness and self-centered behavior
Negative Logits
Token         Logit
Downloadha    -0.93
gdala         -0.88
enez          -0.77
apter         -0.77
APH           -0.75
代            -0.74
ngth          -0.71
phies         -0.71
Genie         -0.71
Gaga          -0.70
Positive Logits
Token         Logit
ly            1.21
ness          1.09
nesses        1.04
fold          0.85
liness        0.84
comings       0.82
esse          0.82
wich          0.81
sum           0.80
ridge         0.79
Activations Density: 0.024%
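For readers unfamiliar with the density figure, the sketch below shows one common way such a number can be read: the fraction of token positions on which the feature's activation is above zero. This is an assumption about the metric, not something the page states. The function name `activation_density`, the zero threshold, and the sample data are all hypothetical and chosen only so that the arithmetic lands on the value shown above.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions on which the
    feature fires at all. The source page does not define the metric.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 100,000 token positions, 24 of which activate the feature.
sample = [0.0] * 99_976 + [1.5] * 24
density = activation_density(sample)
print(f"{density:.3%}")  # prints "0.024%"
```

Under this reading, a density of 0.024% corresponds to roughly 24 active positions per 100,000 tokens, which is typical of a sparse feature.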