INDEX
Explanations
phrases related to criticism or judgment, especially focused on labeling things as "stupid."
references to the concept of stupidity
Negative Logits
AUT         -0.80
riott       -0.77
APH         -0.76
ILA         -0.73
apers       -0.73
accompan    -0.73
Reviewed    -0.72
orthy       -0.71
chwitz      -0.69
avez        -0.67
Positive Logits
founded     1.09
found       0.96
nesses      0.92
ness        0.86
est         0.86
ly          0.84
fuck        0.83
itude       0.83
shit        0.81
asses       0.80
Activations Density: 0.047%