Explanations
words indicating a strong negative opinion or action
Negative Logits
Token        Logit
crow         -0.72
essed        -0.65
Pressure     -0.64
eman         -0.64
nings        -0.63
ammy         -0.61
hma          -0.60
ILY          -0.59
isson        -0.58
imir         -0.58
Positive Logits
Token        Logit
tering       0.80
ç¥ŀ          0.73
:(           0.71
guiActiveUn  0.71
ascript      0.71
ainer        0.69
heartedly    0.66
ingu         0.65
except       0.64
onne         0.63
Activations Density: 0.024%