INDEX
Explanations
words related to grotesque or horrifying imagery
terms related to extreme or violent situations
Negative Logits
BALL      -0.76
horn      -0.72
ACP       -0.72
WARE      -0.68
MQ        -0.66
roads     -0.66
fields    -0.64
Logged    -0.64
STD       -0.62
wine      -0.62
Positive Logits
ities     1.25
ity       1.24
itous     1.18
acies     1.08
inals     1.08
als       1.00
itors     0.98
acy       0.97
ians      0.97
agi       0.96
Activations Density 0.035%