INDEX
Explanations
negative statements or contradictions, where the latter part of the sentence contradicts the earlier part
negations and expressions of impossibility or inadequacy
Negative Logits
only             -0.89
merely           -0.82
doubtless        -0.73
PLUS             -0.66
chiefly          -0.66
unintentionally  -0.65
rimination       -0.64
alternatively    -0.64
inadvertently    -0.64
falsely          -0.63
Positive Logits
darn             0.77
fuckin           0.76
wow              0.74
enough           0.70
iability         0.67
hin              0.67
enough           0.66
íķ               0.65
Õ                0.64
fucking          0.63
Activations Density 0.068%