Explanations
terms that indicate rejection or disapproval
Negative Logits
Token    Logit
الإنجليزية    -0.76
ố    -0.66
Din    -0.65
Hv    -0.63
двига    -0.61
Baile    -0.60
Duff    -0.60
oblig    -0.58
huff    -0.58
Dien    -0.57
Positive Logits
Token    Logit
reject    1.38
Rejection    1.33
Reject    1.32
rejection    1.28
Rejection    1.20
rejected    1.19
rejects    1.16
rejected    1.16
Reject    1.14
rejecting    1.13
Activations Density: 0.012%