INDEX
Explanations
comparisons or evaluations followed by a positive or negative sentiment
phrases that assert a state or identity
Negative Logits
itches    -0.70
ffe       -0.70
actory    -0.65
atre      -0.64
ees       -0.62
ievers    -0.61
gg        -0.61
lex       -0.60
Gir       -0.60
agon      -0.60
Positive Logits
admittedly     1.02
comprised      0.94
basically      0.92
supposed       0.91
essentially    0.90
meant          0.89
unlikely       0.88
undoubtedly    0.87
presumably     0.86
obviously      0.86
Activations Density: 0.164%