INDEX
Explanations
ambiguous words or phrases that can be open to interpretation
negations and phrases indicating what should or should not happen
Negative Logits
Corona          -0.70
Wid             -0.70
satisfactory    -0.63
restores        -0.61
Niet            -0.60
clears          -0.60
restoration     -0.59
Fug             -0.59
enhancements    -0.59
Xuan            -0.57
Positive Logits
be              0.80
underestimate   0.80
bother          0.76
underestimated  0.75
blindly         0.75
interfere       0.75
erest           0.74
oga             0.74
reated          0.74
hesitate        0.73
Activations Density: 0.116%