INDEX
Explanations
references to helmets
Negative Logits
Token       Logit
ween        -0.69
tery        -0.68
VD          -0.68
quart       -0.68
atoes       -0.67
uality      -0.67
agents      -0.67
Roosevelt   -0.66
hower       -0.66
aceae       -0.66
Positive Logits
Token       Logit
helmets     1.08
helmet      1.04
worn        1.02
goggles     0.88
wearer      0.87
Helmet      0.84
equipped    0.80
adorned     0.79
mask        0.76
masks       0.75
Activations Density: 0.024%