Explanations
words related to strong negative expressions or criticism
negative evaluations or criticisms
Negative logits
Token        Logit
uclear       -0.76
isations     -0.70
ouf          -0.67
teness       -0.66
utherford    -0.66
Lago         -0.65
ATIONS       -0.65
ATIONAL      -0.64
isation      -0.63
nm           -0.62
Positive logits
Token        Logit
dick          1.05
sylvania      0.95
asses         0.95
suck          0.93
eries         0.89
shit          0.84
hots          0.83
bowl          0.83
driver        0.82
loads         0.81
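The dashboard does not state how these logit values are computed. On SAE dashboards they are usually the feature's decoder direction projected through the model's unembedding matrix (a direct logit effect), and the positive and negative lists are the tokens with the largest and smallest projections. Assuming that convention, a minimal sketch of the ranking step follows; the three-dimensional vectors and the token names `tok_a` to `tok_d` are hypothetical, so the numbers do not reproduce the values above.

```python
from typing import Dict, List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Dot product of the feature's decoder direction with each token's unembedding vector."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}


def top_and_bottom(effects: Dict[str, float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    bottom = ranked[:k]        # most negative first
    top = ranked[::-1][:k]     # most positive first
    return top, bottom


# Hypothetical toy data: real models use vectors of size d_model and a full vocabulary.
decoder_dir = [1.0, -0.5, 0.0]
unembed = {
    "tok_a": [0.9, -0.1, 0.3],
    "tok_b": [0.7, -0.2, 0.0],
    "tok_c": [-0.6, 0.4, 0.1],
    "tok_d": [-0.5, 0.2, 0.0],
}

effects = logit_effects(decoder_dir, unembed)
top, bottom = top_and_bottom(effects, k=2)
print("positive:", [(t, round(v, 2)) for t, v in top])
print("negative:", [(t, round(v, 2)) for t, v in bottom])
```

Only the sign and ordering of the projections matter for building the two lists; the absolute scale depends on the model's unembedding norms.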
Activation density: 0.013%
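Assuming activation density means the fraction of evaluated tokens on which the feature's activation is above zero (the usual definition on SAE dashboards; the token sample and threshold used here are not stated), 0.013% corresponds to about 13 firing tokens per 100,000, or roughly one in 7,700. A minimal sketch of that calculation, using a hypothetical synthetic activation list:

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens on which the feature's activation exceeds the threshold."""
    if not activations:
        raise ValueError("need at least one token")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical example: 13 firing tokens spread over 100,000 tokens.
acts = [0.0] * 100_000
for i in range(13):
    acts[i * 7000] = 2.5  # arbitrary positive activation at distinct positions

density = activation_density(acts)
print(f"{density:.3%}")  # 0.013%
```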