Explanations
verbs related to actions that have a negative impact or intent
activities or phrases suggesting subversion or damage to credibility or authority
Negative Logits
area          -0.65
cise          -0.64
onna          -0.63
ëĭ            -0.63
sa            -0.61
lov           -0.61
Flo           -0.61
ather         -0.60
enne          -0.59
ann           -0.59
Positive Logits
havoc          0.89
undermin       0.81
lessly         0.70
xual           0.70
undermining    0.68
amental        0.67
ments          0.66
livelihood     0.66
morale         0.64
Mn             0.64
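For context, logit lists like the two above are commonly produced by projecting a feature's decoder direction onto the model's unembedding matrix and sorting the per-token scores. The sketch below assumes exactly that raw product (no final LayerNorm or other rescaling), uses plain nested lists in place of real weight tensors, and the helper names `logit_effects` and `top_tokens` are hypothetical; it is not necessarily how this dashboard computes its values. The toy numbers are made up for illustration.

```python
from typing import List, Tuple


def logit_effects(decoder_row: List[float], unembed: List[List[float]]) -> List[float]:
    """Project one feature's decoder direction (length d_model) onto every
    vocabulary column of an unembedding matrix shaped d_model x vocab.
    Assumption: raw dot product, no final LayerNorm or scaling applied."""
    d_model = len(decoder_row)
    vocab = len(unembed[0])
    return [
        sum(decoder_row[i] * unembed[i][t] for i in range(d_model))
        for t in range(vocab)
    ]


def top_tokens(effects: List[float], tokens: List[str], k: int,
               largest: bool = True) -> List[Tuple[str, float]]:
    """Return the k tokens with the largest (or, if largest=False, the most
    negative) logit effects, rounded to two decimals like the dashboard."""
    order = sorted(range(len(effects)), key=lambda t: effects[t], reverse=largest)
    return [(tokens[t], round(effects[t], 2)) for t in order[:k]]


# Hypothetical toy example: d_model = 2, vocabulary of 4 tokens.
tokens = ["havoc", "area", "morale", "sa"]
decoder_row = [1.0, -0.5]
unembed = [[0.9, -0.6, 0.6, -0.5],
           [0.0, 0.1, -0.1, 0.2]]
effects = logit_effects(decoder_row, unembed)
print("positive:", top_tokens(effects, tokens, 2))
print("negative:", top_tokens(effects, tokens, 2, largest=False))
```

Sorting indices rather than the scores themselves keeps each score paired with its token string, which is all the dashboard lists need.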
Activations Density 0.044%
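Activation density is typically reported as the share of sampled tokens on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above a threshold of zero; `activation_density` is a hypothetical helper, and the token counts in the usage lines are illustrative, not taken from this dashboard's sample.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`.
    Assumption: a token counts as 'firing' only if activation > threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Illustrative only: a feature firing on 44 of 100,000 sampled tokens.
sample = [0.0] * 99_956 + [1.2] * 44
print(f"{activation_density(sample):.3f}%")
```

At a density around 0.044%, the feature is silent on the overwhelming majority of tokens, which is consistent with the narrow themes in the explanations above.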