INDEX
Explanations
- phrases related to desires or intentions
- phrases indicating a desire or motivation to take action
Negative Logits
gart          -0.71
laughter      -0.64
ONES          -0.62
LAN           -0.61
summary       -0.60
henko         -0.58
Manufacturer  -0.57
memos         -0.56
Juliet        -0.54
pause         -0.54
Positive Logits
themselves     0.97
selves         0.80
cov            0.73
careers        0.71
otherwise      0.71
footing        0.69
oppressed      0.69
menstru        0.67
warr           0.66
gyn            0.66
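The two lists above rank vocabulary tokens by how strongly this feature pushes their logits up or down. This page does not say how the numbers were produced; a common convention for SAE dashboards, assumed here, is to multiply the feature's decoder direction by the model's unembedding matrix and keep the k highest and lowest entries. The sketch below illustrates that assumption in plain Python; the function names and the toy matrix are hypothetical, and any normalization the real dashboard may apply is not modeled.

```python
from typing import Dict, List, Sequence, Tuple


def feature_logit_effects(
    decoder_row: Sequence[float],
    unembed: Sequence[Sequence[float]],  # assumed layout: [d_model][vocab]
    vocab: Sequence[str],
) -> Dict[str, float]:
    """Hypothetical helper: dot the feature's decoder direction with each
    token's unembedding column to get that token's direct logit effect."""
    d_model = len(decoder_row)
    effects: Dict[str, float] = {}
    for j, token in enumerate(vocab):
        effects[token] = sum(decoder_row[k] * unembed[k][j] for k in range(d_model))
    return effects


def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    bottom = ranked[:k]        # most negative first
    top = ranked[::-1][:k]     # most positive first
    return top, bottom


# Toy example with a 2-dimensional residual stream and a 3-token vocabulary.
effects = feature_logit_effects(
    [1.0, -0.5],
    [[0.9, -0.2, 0.1],
     [-0.2, 0.8, 0.0]],
    ["themselves", "gart", "pause"],
)
top, bottom = top_and_bottom(effects, 1)
print(top, bottom)  # "themselves" ranks highest, "gart" lowest
```

Under this convention a feature can look incoherent in logit space (for example "cov" and "gyn" sitting next to "themselves") while still having a clean activation pattern, because the ranking only reflects the feature's direct path to the unembedding.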
Activation Density: 0.457%
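Activation density is usually reported as the percentage of sampled tokens on which the feature's activation is above zero; that definition is an assumption here, since the page does not state it or the sample size. Under it, 0.457% means roughly 457 active tokens per 100,000. A minimal sketch, with a hypothetical function name and a `threshold` parameter defaulting to zero:

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of tokens whose activation exceeds
    `threshold` (assumed definition of activation density)."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


# Synthetic data: 457 firing tokens out of 100,000.
acts = [1.0] * 457 + [0.0] * (100_000 - 457)
density = activation_density(acts)
print(f"{density:.3f}%")
```

A density this low indicates a sparse feature, which is consistent with the narrow "desire or intention" explanations listed above, though density alone does not confirm an interpretation.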