INDEX
Explanations
phrases indicating comparisons or distinctions
Negative Logits
iez         -0.16
hazi        -0.15
thừa        -0.14
indsight    -0.14
motion      -0.13
amel        -0.13
ronym       -0.13
asic        -0.13
ombat       -0.13
gest        -0.13
Positive Logits
anymore      0.32
nor          0.25
necessarily  0.24
ani          0.17
ogany        0.17
agon         0.15
nor          0.15
Nor          0.15
usual        0.15
nip          0.14
Activations Density 0.045%