Explanations
instances of the word 'no' used strongly in a negative context or to negate an action
phrases indicating negation or absence
Negative Logits
Token      Logit
RAFT       -0.84
mosp       -0.65
romy       -0.63
inese      -0.58
jet        -0.58
WATCHED    -0.57
chev       -0.57
aleb       -0.56
ioch       -0.56
çͰ         -0.56
Positive Logits
Token        Logit
xious        1.26
longer       1.10
matter       0.91
ct           0.87
doubt        0.86
except       0.85
obs          0.84
indication   0.84
consolation  0.79
otrop        0.78
Activations Density: 0.038%
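
The dashboard does not say how the density figure is computed. A minimal sketch, assuming it is the fraction of sampled token positions on which the feature has a positive activation; the function name and the sample values are hypothetical and only illustrate the arithmetic:

```python
from typing import Sequence


def activation_density(activations: Sequence[float]) -> float:
    """Return the fraction of token positions where the feature fires.

    Assumption: "fires" means activation > 0. The dashboard itself
    does not state its threshold or sampling procedure.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    fired = sum(1 for a in activations if a > 0)
    return fired / len(activations)


# Hypothetical activations for one feature over 8 token positions.
sample = [0.0, 0.0, 0.0, 2.1, 0.0, 0.0, 0.0, 0.0]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under that assumption, a density of 0.038% would mean the feature fires on roughly 38 of every 100,000 sampled tokens, which fits a feature keyed to specific negation phrases.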