INDEX
Explanations
requests or expressions of interest
phrases expressing desire or conditional statements
Negative Logits
覚醒       -0.64
Brill      -0.63
Claus      -0.58
Kiw        -0.57
juven      -0.56
Chal       -0.56
RAM        -0.55
Jarrett    -0.54
bound      -0.54
Timber     -0.54
Positive Logits
prefer       1.47
rather       1.20
rather       1.19
gladly       1.16
dearly       1.15
like         1.10
love         1.04
LOVE         1.00
appreciate   0.98
ideally      0.96
Activations Density 0.167%
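
As a rough illustration of how a density figure like this could be computed, here is a minimal sketch. It assumes that density means the fraction of token positions on which the feature's activation is above zero; the dashboard does not state its definition. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" is the share of token positions where the feature
    fires at all. The source dashboard does not define the term.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the feature fires on 3 of 1,800 token positions.
sample = [0.0] * 1797 + [0.4, 1.2, 2.5]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.167%
```

Under this reading, a density of 0.167% corresponds to the feature firing on roughly one token in 600.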