INDEX
Explanations
expressions of strong personal feelings or preferences
Negative Logits
really    -0.16
very      -0.15
说        -0.15
Say       -0.15
quite     -0.14
Colleg    -0.14
ndx       -0.14
aed       -0.14
IPS       -0.14
saying    -0.14
Positive Logits
dig       0.20
enjoyed   0.17
luck      0.17
digging   0.16
üz        0.16
asel      0.16
dig       0.15
μι        0.15
connect   0.15
eshire    0.15
Activations Density 0.046%