INDEX
Explanations
negations and questions about preferences or intentions
Negative Logits
Token            Logit
.central         -0.14
Gatt             -0.14
GV               -0.14
кост             -0.14
.scalablytyped   -0.14
canf             -0.14
atis             -0.14
uke              -0.14
ove              -0.14
inati            -0.14
Positive Logits
Token            Logit
wouldn           0.23
want             0.20
mind             0.20
dream            0.20
rather           0.19
mind             0.17
trade            0.17
nt               0.17
Dream            0.17
梦               0.17
Activations Density 0.069%