INDEX
Explanations
negative statements containing the word "not"
negations or denials of existence or relevance
Negative Logits
Token           Logit
Vers            -0.68
Unfortunately   -0.62
Unfortunately   -0.59
Less            -0.59
Sadly           -0.58
Tor             -0.55
utt             -0.55
essim           -0.55
Less            -0.54
aughs           -0.53
Positive Logits
Token           Logit
nor             0.90
even            0.90
anywhere        0.88
anybody         0.85
whatsoever      0.82
remotely        0.81
anything        0.80
ever            0.78
anyone          0.78
anything        0.77
Activations Density 0.358%