INDEX
Explanations
expressions indicating desire or necessity
Negative Logits
LET        -0.77
updated    -0.76
BLIC       -0.71
icum       -0.71
Bridge     -0.70
olicited   -0.68
Edited     -0.66
ENTION     -0.64
EDIT       -0.64
ELS        -0.64
Positive Logits
yourself     1.32
yourselves   1.16
somebody     0.83
your         0.82
guys         0.81
kidding      0.74
oneself      0.74
extremes     0.69
Yourself     0.67
pudding      0.67
Activations Density 0.339%