INDEX
Explanations
conditional phrases indicating desire or intent
Negative Logits
isset     -0.18
rud       -0.17
gles      -0.15
would     -0.15
would     -0.15
gle       -0.14
ái        -0.14
surely    -0.14
asio      -0.14
qv        -0.14
Positive Logits
prefer     0.22
rather     0.21
likes      0.21
prefers    0.20
Rather     0.19
rather     0.18
LIK        0.17
Prefer     0.17
Rather     0.17
prefer     0.16
Activations Density: 0.043%