Explanations
words related to actions of undermining or discrediting
phrases related to attempts to weaken or sabotage
Negative Logits
Token       Logit
lov         -0.71
area        -0.67
ather       -0.66
andro       -0.65
Spoiler     -0.63
cise        -0.63
////////    -0.62
onna        -0.62
çĦ          -0.61
flo         -0.60
Positive Logits
Token           Logit
havoc           0.83
perceptions     0.76
expectations    0.73
undermin        0.73
morale          0.72
livelihood      0.70
undermining     0.68
attempts        0.67
ments           0.67
deterrence      0.67
Activations Density: 0.045%
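
As a rough illustration of how a density figure of this kind could be computed, the sketch below assumes that "activations density" means the fraction of sampled tokens on which the feature fires (activation above a threshold of zero). That definition, the function name `activation_density`, and the toy data are all assumptions made for illustration; none of them come from the dashboard itself.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: density is defined as (firing tokens) / (all sampled tokens).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy sample: the feature fires on 9 of 20,000 tokens.
toy_activations = [0.0] * 19_991 + [1.3] * 9
density = activation_density(toy_activations)
print(f"{density:.3%}")
```

Under this reading, a density of 0.045% would correspond to the feature firing on roughly 9 tokens in every 20,000 sampled.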