INDEX
Explanations
words related to wheels or wheel-related activities
repeated references to wheels and related terms indicating mechanical features
Negative Logits
Token         Logit
uates         -0.79
uated         -0.77
ABE           -0.76
ropolitan     -0.75
orescent      -0.70
raviolet      -0.70
Grounds       -0.67
ocado         -0.67
Suc           -0.67
Sunshine      -0.67
Positive Logits
Token         Logit
chairs        1.34
chair         1.25
wright        1.18
wheel         1.11
base          0.96
wash          0.92
house         0.89
bar           0.86
horn          0.85
Wheel         0.84
Activations Density 0.025%
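
The page does not define the density figure. One common reading is the fraction of token positions on which the feature fires at all. The sketch below illustrates that reading only: the function name `activation_density`, the zero threshold, and the sample data are hypothetical and not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions on which the
    feature is active. The dashboard's own definition is not stated.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 4,000 token positions, exactly one of which is active.
example = [0.0] * 3999 + [2.7]
density = activation_density(example)
print(f"{density:.3%}")  # prints 0.025%
```

Under this reading, a density of 0.025% means the feature is active on roughly 1 in every 4,000 tokens.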