Explanations
words related to surprise or contradiction
the word "actually" to emphasize certainty or reality
Negative Logits
lain         -0.79
wich         -0.79
cit          -0.76
fu           -0.68
bye          -0.68
heid         -0.66
newsletters  -0.66
illed        -0.65
tailed       -0.65
nan          -0.61
Positive Logits
metic        0.79
netflix      0.77
comprom      0.76
bothering    0.74
meant        0.73
ional        0.72
WRITE        0.71
okia         0.70
amn          0.70
REALLY       0.69
Activations Density 0.026%