INDEX
Explanations
phrases that indicate minimal or insufficient quantities
Negative Logits
Token       Logit
umer        -0.17
somehow     -0.14
vil         -0.14
redirect    -0.13
pha         -0.13
ivate       -0.13
خر          -0.13
apesh       -0.13
azor        -0.13
ysa         -0.13
Positive Logits
Token       Logit
/no         0.29
else        0.25
chance      0.19
except      0.18
regard      0.17
est         0.17
progress    0.16
effort      0.15
-to         0.15
else        0.15
Activations Density: 0.031%