INDEX
Explanations
words related to household appliances
references to different types of appliances and their features
Negative Logits
lot      -0.77
bos      -0.77
ffiti    -0.77
pac      -0.75
tering   -0.72
nda      -0.71
lock     -0.70
Pad      -0.70
maps     -0.69
Po       -0.69
Positive Logits
appliance     0.91
Appl          0.82
itness        0.74
appliances    0.73
simultane     0.70
^^^^          0.68
uous          0.67
rador         0.67
^^            0.67
transformer   0.66
Activations Density: 0.035%