Explanations
references to furniture, specifically couches and sofas
references to couches or sofas
Negative Logits
utenberg     -0.74
hypers       -0.71
Resistance   -0.68
iferation    -0.66
uality       -0.65
abad         -0.64
iyah         -0.63
cease        -0.62
Uz           -0.61
icity        -0.60
Positive Logits
cush         1.22
washer       1.12
tops         1.07
cushion      1.06
chair        0.94
sofa         0.91
couch        0.90
potato       0.88
stairs       0.88
chairs       0.84
Activations Density 0.017%