INDEX
Explanations
words related to negative impact or harm
expressions related to causing harm or damage
Negative Logits
mad       -0.72
uls       -0.71
dding     -0.67
uesday    -0.66
ーン      -0.66
igers     -0.66
ricks     -0.65
ellen     -0.65
odor      -0.63
leans     -0.63
Positive Logits
havoc            1.11
credibility      1.02
delicate         0.96
morale           0.94
livelihood       0.90
sensibilities    0.90
friendships      0.88
integrity        0.88
innocent         0.87
morals           0.85
Activations Density 0.229%
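
The density figure can be read as the share of sampled token positions on which the feature fires. The sketch below assumes that definition (activation strictly above zero counts as firing); the function name `activation_density`, the `threshold` parameter, and the sample counts are hypothetical and chosen only to reproduce the 0.229% shown above, not taken from the dashboard's actual data.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means (positions where the feature fires) / (all positions).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 229 firing positions out of 100,000 tokens.
sample = [0.0] * 99_771 + [1.0] * 229
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.229%
```

A density this low means the feature is sparse: it is silent on the vast majority of tokens and fires only in specific contexts, which fits the harm-related explanations listed above.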