INDEX
Explanations
positive sentiments associated with the concept of "good."
Negative Logits
Token           Logit
ean             -0.17
attern          -0.16
/Dk             -0.15
laz             -0.14
chers           -0.14
ĵåIJį            -0.14
ynchronously    -0.14
ltk             -0.14
atoire          -0.14
Equality        -0.14
Positive Logits
Token           Logit
intentions      0.27
deeds           0.24
intention       0.24
fortune         0.24
intent          0.23
Intent          0.23
Samar           0.22
works           0.21
citizenship     0.21
reads           0.20
Activations Density 0.055%