Explanations
words related to actions or situations that involve undermining something
mentions of actions that threaten or weaken authority or integrity
Negative Logits
Token         Logit
gran          -0.79
enne          -0.71
sa            -0.68
gone          -0.67
ones          -0.67
eb            -0.66
NetMessage    -0.65
Flo           -0.64
iser          -0.63
area          -0.63
Positive Logits
Token         Logit
undermin      1.07
undermine     0.98
guiActiveUn   0.96
undermining   0.89
undermines    0.87
undermined    0.80
xual          0.77
undercut      0.76
havoc         0.73
incent        0.72
Activations Density: 0.018%