Explanations
contractions with negations
negations or expressions of inability or refusal
Negative Logits
newcom       -0.76
anwhile      -0.74
princ        -0.72
gobl         -0.71
satell       -0.65
referen      -0.62
populated    -0.61
Powered      -0.61
revolving    -0.61
juven        -0.60
Positive Logits
't           1.48
´            0.82
────         0.82
itely        0.82
n            0.81
í            0.81
Õ            0.79
probably     0.79
€            0.78
bare         0.78
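The two lists above rank vocabulary tokens by how strongly this feature pushes their logits up or down. A minimal sketch of how such lists could be produced, assuming (this page does not say so) that each value is the dot product of the feature's decoder direction with a token's unembedding vector. The names `logit_effects`, `top_and_bottom`, and the toy tokens and numbers are hypothetical illustrations, not this dashboard's code or data.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed_cols: Sequence[Sequence[float]]) -> List[float]:
    # ASSUMPTION: a token's value is the dot product of the feature's
    # decoder direction with that token's unembedding vector.
    return [sum(d * u for d, u in zip(decoder_dir, col)) for col in unembed_cols]


def top_and_bottom(effects: Sequence[float], vocab: Sequence[str],
                   k: int = 10) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    # Sort (token, value) pairs ascending by value.
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]          # k most negative values, most negative first
    positive = ranked[::-1][:k]    # k most positive values, most positive first
    return positive, negative


# Toy example with hypothetical tokens and a 2-dimensional model.
vocab = ["tok_a", "tok_b", "tok_c", "tok_d"]
unembed_cols = [[0.5, 9.0], [-0.2, 9.0], [1.5, 9.0], [-0.8, 9.0]]
decoder_dir = [1.0, 0.0]
pos, neg = top_and_bottom(logit_effects(decoder_dir, unembed_cols), vocab, k=2)
print(pos)  # [('tok_c', 1.5), ('tok_a', 0.5)]
print(neg)  # [('tok_d', -0.8), ('tok_b', -0.2)]
```

Sorting once and slicing both ends keeps the two lists consistent with each other; for a real vocabulary of tens of thousands of tokens a partial selection (e.g. `heapq.nlargest`) would avoid the full sort.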
Activation Density: 0.032%
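Activation density is read here as the share of token positions on which the feature fires, so 0.032% means roughly 32 active positions per 100,000 tokens. The sketch below assumes "fires" means an activation strictly above zero; the function name `activation_density` and the `threshold` parameter are hypothetical.

```python
from typing import Sequence


def activation_density(acts: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions where the feature activation exceeds `threshold`.

    ASSUMPTION: "active" means activation > threshold (default 0.0).
    """
    if len(acts) == 0:
        raise ValueError("need at least one activation")
    active = sum(1 for a in acts if a > threshold)
    return 100.0 * active / len(acts)


# Toy example: 32 active positions out of 100,000 tokens.
acts = [0.0] * 100_000
acts[:32] = [1.0] * 32
print(activation_density(acts))  # 0.032
```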