INDEX
Explanations
phrases indicating undermining or weakening something
phrases that undermine authority or credibility
NEGATIVE LOGITS
————          -0.79
saw           -0.68
replace       -0.67
followed      -0.67
gat           -0.66
differs       -0.65
avoids        -0.63
bg            -0.63
tackle        -0.62
aido          -0.62
POSITIVE LOGITS
entirety       1.29
entire         1.17
seriousness    1.15
effectiveness  1.13
possibility    1.12
notion         1.12
usefulness     1.07
ability        1.06
validity       1.05
credibility    1.04
Activations Density 0.265%
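
A density figure like the one above is usually read as the fraction of sampled tokens on which the feature fires. The sketch below shows that reading under stated assumptions: the function name `activation_density`, the zero threshold, and the sample data are all hypothetical and synthetic, not taken from this dashboard or its tooling.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means (number of active tokens) / (total tokens).
    The dashboard's exact definition is not stated in the source.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic example: 20,000 tokens, of which 53 have a nonzero activation.
sample = [0.0] * 20000
for i in range(53):
    sample[i * 300] = 1.0  # hypothetical activation values

density = activation_density(sample)
print(f"{density:.3%}")
```

With these made-up numbers, 53 active tokens out of 20,000 gives a density of the same order as the value reported above; the real figure depends on the dataset and threshold the dashboard used.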