Explanations
negations and situations with negative/potentially harmful connotations
negations and expressions of inability or lack
Negative Logits
Token      Logit
sis        -0.72
Invalid    -0.70
wording    -0.69
tein       -0.68
inning     -0.65
Reasons    -0.65
gar        -0.63
tons       -0.63
Cancel     -0.62
omen       -0.62
Positive Logits
Token        Logit
necessarily  0.99
ever         0.98
bothered     0.95
bother       0.95
any          0.94
anyone       0.93
anything     0.91
anybody      0.90
overtly      0.89
remotely     0.87
Activations Density: 0.424%