Explanations
positively or negatively charged adjectives or phrases indicating approval or disapproval
expressions related to the concepts of good and better, as well as phrases indicating a moral evaluation
Negative Logits
Token     Logit
ned       -0.57
gall      -0.56
Glac      -0.56
Notting   -0.55
Clancy    -0.53
pat       -0.53
egu       -0.52
Guam      -0.52
Rolls     -0.52
Tornado   -0.51
Positive Logits
Token     Logit
sake      1.62
purposes  1.35
reasons   1.21
ummies    1.20
reason    0.94
Reasons   0.90
aughs     0.87
purpose   0.87
ages      0.84
instance  0.80
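One common way to produce token lists like these is to project the feature's decoder direction through the model's unembedding matrix and take the most negative and most positive entries. The sketch below assumes exactly that (and ignores the final layer norm, which real dashboards may or may not apply). The function `top_logits`, the toy vocabulary, and the weights are hypothetical illustrations, not this dashboard's code or this feature's data.

```python
import numpy as np

def top_logits(decoder_dir, W_U, vocab, k=10):
    # Direct logit effect of the feature: project its decoder direction
    # through the unembedding. Assumption: the final layer norm is ignored.
    logits = decoder_dir @ W_U            # shape: (vocab_size,)
    order = np.argsort(logits)            # token indices, most negative first
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example with hypothetical tokens and weights (not the real feature).
vocab = ["tok_a", "tok_b", "tok_c", "tok_d", "tok_e"]
W_U = np.array([[1.6, 1.3, -0.6, -0.5, 0.0],
                [0.2, 0.1,  0.3, -0.4, 0.9]])
decoder_dir = np.array([1.0, 0.0])
negative, positive = top_logits(decoder_dir, W_U, vocab, k=2)
print(negative)  # [('tok_c', -0.6), ('tok_d', -0.5)]
print(positive)  # [('tok_a', 1.6), ('tok_b', 1.3)]
```

Sorting once and slicing from both ends yields the negative and positive lists from a single pass over the vocabulary.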
Activation Density: 0.187%
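Activation density is typically the fraction of token positions on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above zero; the helper `activation_density` and the activation values are hypothetical, chosen only so the result matches the figure shown:

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    # Fraction of token positions on which the feature fires.
    # Assumption: "active" means activation strictly above `threshold`.
    acts = np.asarray(acts)
    return float(np.mean(acts > threshold))

# Hypothetical activations: 187 of 100,000 tokens are nonzero.
acts = np.zeros(100_000)
acts[:187] = 0.5
density = activation_density(acts)
print(f"{density:.3%}")  # 0.187%
```

At this density the feature is active on roughly 1 in every 535 tokens, which is sparse, as expected for a dictionary feature.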