Explanations
mentions of furniture, specifically couches and sofas
references to couches and sofas
Negative Logits
Token      Logit
uality     -0.71
utenberg   -0.68
Abel       -0.67
Morse      -0.66
hypers     -0.66
REDACTED   -0.65
heny       -0.63
Ames       -0.62
Uz         -0.62
eters      -0.62
Positive Logits
Token      Logit
cush       1.25
cushion    1.11
sofa       0.95
washer     0.95
tops       0.93
mattress   0.93
estinal    0.85
couch      0.85
visor      0.85
chair      0.85
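The positive and negative logit lists above rank vocabulary tokens by how strongly this feature pushes the model's next-token prediction toward or away from them. A minimal sketch of how such lists are commonly produced, assuming (not confirmed by this page) that they come from projecting the feature's decoder direction through the model's unembedding matrix and taking the top-k entries in each direction. The function names and all numbers below are hypothetical toy values, not this feature's real weights.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    # Assumption: unembed is laid out as d_model rows x vocab columns (W_U),
    # so each output entry is the dot product of decoder_dir with one column.
    d_model = len(decoder_dir)
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        for t in range(vocab_size)
    ]


def top_logits(effects: Sequence[float], vocab: Sequence[str], k: int,
               positive: bool = True) -> List[Tuple[str, float]]:
    # Sort token indices by effect: descending for the positive list,
    # ascending (most negative first) for the negative list.
    order = sorted(range(len(effects)), key=lambda t: effects[t], reverse=positive)
    return [(vocab[t], effects[t]) for t in order[:k]]


# Toy example with made-up numbers (not the values from the dashboard).
vocab = ["sofa", "couch", "Morse", "Abel"]
decoder_dir = [1.0, 0.5]
unembed = [
    [0.9, 0.8, -0.4, -0.3],
    [0.2, 0.1, -0.6, -0.5],
]
effects = logit_effects(decoder_dir, unembed)
print(top_logits(effects, vocab, k=2, positive=True))
print(top_logits(effects, vocab, k=2, positive=False))
```

In this toy setup "sofa" and "couch" head the positive list while "Morse" and "Abel" head the negative one, mirroring the shape (not the values) of the lists above.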
Activation Density: 0.027%
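An activation density of 0.027% means the feature fires on roughly 27 of every 100,000 tokens. The sketch below assumes density is the percentage of sampled tokens whose activation is strictly above zero; the actual threshold and token sample behind the dashboard figure are not stated here, and `activation_density` is a hypothetical helper.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    # Percentage of tokens whose activation exceeds the threshold.
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Toy data: 27 nonzero activations out of 100,000 tokens.
acts = [0.0] * 100_000
for i in range(0, 27_000, 1_000):
    acts[i] = 1.5
print(f"{activation_density(acts):.3f}%")  # prints 0.027%
```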