INDEX
Explanations
negative adjectives and insults
derogatory terms aimed at individuals and their characteristics
Negative Logits
iscover          -0.71
ideally          -0.69
outdoors         -0.68
Enc              -0.67
incorporating    -0.67
arger            -0.65
incorporate      -0.65
Fold             -0.63
consulted        -0.63
conserv          -0.63
Positive Logits
pathetic         2.20
worthless        1.89
idiots           1.89
stupidity        1.89
hypocritical     1.86
meaningless      1.83
bullshit         1.83
laughable        1.82
pointless        1.81
shitty           1.79
Activations Density: 0.090%