INDEX
Explanations
phrases indicating negative outcomes or hindrances
Negative Logits
oler     -0.17
vron     -0.16
oldem    -0.15
lea      -0.15
RLF      -0.14
resher   -0.14
ENUM     -0.14
otes     -0.14
anca     -0.14
cole     -0.14
Positive Logits
//{{      0.18
umno      0.16
/conf     0.16
AGO       0.15
asic      0.14
akh       0.14
attempts  0.14
efforts   0.13
etti      0.13
inte      0.13
Activations Density 0.222%