INDEX
Explanations
the word "No", where high activation suggests negative sentiment or disagreement
the word "No" used in negative assertions or dismissals
Negative Logits
iership   -0.79
RAFT      -0.69
rn        -0.68
rog       -0.67
ript      -0.66
leaning   -0.66
ALE       -0.64
ijuana    -0.62
ˈ         -0.61   (IPA stress mark)
tnc       -0.60
Positive Logits
kidding    1.12
etheless   1.08
matter     1.03
wonder     0.97
doubt      0.97
longer     0.96
xious      0.90
zzle       0.83
sooner     0.83
except     0.82
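The page does not say how the logit lists are produced. A common convention for sparse-autoencoder feature dashboards, assumed here rather than confirmed by this page, is to take the dot product of the feature's decoder direction with each token's unembedding vector and report the highest and lowest scores. The sketch below uses hypothetical helper names (`logit_effects`, `top_and_bottom`) and a toy two-dimensional "model"; it ignores any final LayerNorm scaling a real pipeline might apply.

```python
from typing import Dict, List, Sequence, Tuple

def logit_effects(decoder_dir: Sequence[float],
                  unembed: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Hypothetical helper: score each token by the dot product of the
    feature's decoder direction with that token's unembedding vector."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}

def top_and_bottom(effects: Dict[str, float], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical helper: return the k most positive and k most negative tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by score
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative

# Toy data, not taken from the real model.
decoder = [1.0, -0.5]
unembed = {
    "kidding": [1.0, 0.0],
    "matter": [0.5, 0.2],
    "RAFT": [-0.5, 0.5],
    "rog": [0.0, 1.0],
}
pos, neg = top_and_bottom(logit_effects(decoder, unembed), k=2)
print("positive:", pos)
print("negative:", neg)
```

Sorting once and slicing from both ends keeps the two lists consistent with each other; for a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` would avoid the full sort.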
Activation Density: 0.077%
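The page does not define "density". Assuming it means the share of token positions on which the feature's activation is above zero, 0.077% corresponds to about 77 firings per 100,000 tokens, or roughly 1 token in 1,300. A minimal sketch with a hypothetical helper, `activation_density`:

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of token positions whose activation
    exceeds `threshold` (assumed definition of density)."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)

# Toy data: 77 nonzero activations out of 100,000 token positions.
acts = [0.0] * 99_923 + [2.5] * 77
print(f"{activation_density(acts):.3f}%")
```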