INDEX
Explanations
expressions of reluctance or avoidance
Negative Logits
ament     -0.17
baugh     -0.15
ente      -0.14
eam       -0.14
ENTE      -0.14
meli      -0.14
umpt      -0.14
outs      -0.14
_NM       -0.14
Leisure   -0.14
Positive Logits
spared    0.19
ngại      0.19
å¬        0.17
çľ        0.16
sugar     0.16
sparing   0.16
minced    0.15
çľ        0.15
ispers    0.15
punches   0.15
Activations Density 0.059%
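
To give a sense of scale for the density figure, here is a minimal sketch. It assumes that "Activations Density" means the percentage of sampled tokens on which the feature has a nonzero activation; the page itself does not define the term. The function name `expected_active_tokens` and the one-million-token sample size are hypothetical, chosen only for illustration.

```python
def expected_active_tokens(density_percent: float, n_tokens: int) -> float:
    """Expected count of tokens where the feature fires.

    Assumption: density_percent is the share of tokens (in percent)
    with a nonzero activation, as read off the dashboard.
    """
    return density_percent / 100.0 * n_tokens


# Density value taken from the dashboard; the sample size is illustrative.
density_percent = 0.059
sample_size = 1_000_000

count = expected_active_tokens(density_percent, sample_size)
print(f"about {round(count)} active tokens per {sample_size:,} tokens")
```

Under that assumption the feature is sparse: it fires on roughly 6 tokens in every 10,000.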