INDEX
Explanations
negative constructs or words indicating lack and avoidance
Negative Logits
Deal      -0.15
mey       -0.14
Ell       -0.14
kate      -0.14
borough   -0.14
irl       -0.14
.grp      -0.14
phin      -0.13
irim      -0.13
andest    -0.13
Positive Logits
ADVERTISEMENT   0.15
ÙİÙĤ            0.15
Jaw             0.15
/format         0.15
ruce            0.15
imes            0.15
ner             0.15
ovaly           0.14
Wert            0.14
ILON            0.14
Activations Density 0.000%