Explanations
phrases indicating changes or improvements in various contexts
Negative Logits
heavier     -0.18
heavily     -0.16
harder      -0.16
worse       -0.16
ardash      -0.14
worsening   -0.14
heavy       -0.13
ÙĬÙĪÙĨ      -0.13
erdale      -0.13
agini       -0.13
Positive Logits
sign        0.33
Sign        0.31
SIGN        0.31
sign        0.30
SIGN        0.26
stant       0.24
Sign        0.24
apprec      0.23
-sign       0.23
Dram        0.23
Activations Density: 0.152%