INDEX
Explanations
comparative phrases and terms related to performance or quality
Negative Logits
efs         -0.20
.effects    -0.18
änger       -0.15
атки        -0.15
bih         -0.15
achen       -0.14
reff        -0.14
_GF         -0.14
ellung      -0.14
куль        -0.14
Positive Logits
fare            0.32
Fare            0.28
performance     0.26
fares           0.25
better          0.24
performances    0.24
fare            0.23
well            0.23
worse           0.22
performance     0.22
Activations Density 0.093%