INDEX
Explanations
negations and conditional phrases indicating limitations or restrictions in a context
NEGATIVE LOGITS
tro     -0.22
tro     -0.21
高速    -0.17
tl      -0.15
pole    -0.15
edin    -0.15
rich    -0.15
Tro     -0.15
def     -0.14
رخ      -0.14
POSITIVE LOGITS
acceptance   0.18
Settlement   0.18
Start        0.16
Accept       0.16
opup         0.16
accepted     0.16
accepted     0.16
rell         0.15
accept       0.15
oldt         0.15
Activations Density 0.030%