INDEX
Explanations
words related to criticism or negativity
instances of the word "stupid" in various contexts
Negative Logits
AUT            -0.95
APH            -0.80
largeDownload  -0.77
cussion        -0.71
soType         -0.70
aver           -0.70
Reviewed       -0.67
akings         -0.67
apers          -0.67
ILA            -0.66
Positive Logits
nesses         0.95
stupid         0.84
ishly          0.83
ly             0.83
ulously        0.81
itude          0.80
enough         0.79
silly          0.77
gery           0.76
liest          0.75
Activations Density: 0.014%