INDEX
Explanations
words related to negative judgment of intelligence or actions
the word "stupid" and its variations in various contexts
Negative Logits
AUT            -0.93
apers          -0.83
largeDownload  -0.83
APH            -0.82
accompan       -0.76
riott          -0.75
aver           -0.75
orthy          -0.74
rigan          -0.73
arnaev         -0.71
Positive Logits
nesses         1.04
ly             0.95
ness           0.89
itude          0.81
gery           0.80
glers          0.77
stupid         0.77
founded        0.77
ged            0.71
shit           0.71
Activations Density: 0.035%
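
As a rough illustration of how a density figure like this can be read, the sketch below assumes that "activations density" means the fraction of token positions on which the feature fires at all (activation above zero). That definition is an assumption and is not stated on this page. The function name `activation_density` and the toy activation list are hypothetical and chosen only so the arithmetic is easy to follow.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: density = (number of firing positions) / (total positions).
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical toy data: the feature fires on 7 of 20,000 token positions.
toy_activations = [0.0] * 19_993 + [1.2, 0.4, 3.1, 0.9, 2.2, 0.1, 5.0]

density = activation_density(toy_activations)
print(f"{density:.3%}")  # formatted as a percentage with three decimals
```

Under this reading, a density of 0.035% corresponds to the feature firing on about 35 of every 100,000 tokens, which is the same ratio as the 7 in 20,000 used in the toy data.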