INDEX
Explanations
words related to directional indications like "right" or "left"
references to the concept of "right" in various contexts
Negative Logits
aton        -0.85
anned       -0.82
eson        -0.78
angan       -0.72
odium       -0.69
Clar        -0.68
atically    -0.68
atis        -0.67
rolet       -0.66
ensable     -0.65
Positive Logits
wing         1.13
flank        1.00
wing         0.94
eye          0.89
most         0.89
ward         0.86
hand         0.84
hemisphere   0.84
side         0.84
eous         0.82
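The two lists above rank vocabulary tokens by how strongly this feature pushes the model's output logits up or down. Dashboards of this kind commonly compute these lists by projecting the feature's decoder direction through the model's unembedding matrix and sorting the result. The sketch below shows that projection under that assumption. The function name `top_logit_tokens`, the array shapes, and the toy vocabulary are illustrative only and are not taken from this page.

```python
import numpy as np

def top_logit_tokens(decoder_direction, unembed, vocab, k=10):
    """Assumed method: project a feature's decoder direction through the
    unembedding matrix and return the k most positive and k most negative tokens."""
    # decoder_direction: shape (d_model,); unembed: shape (d_model, vocab_size)
    logits = decoder_direction @ unembed          # shape (vocab_size,)
    order = np.argsort(logits)                    # ascending by logit value
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy example with a hypothetical 2-dimensional model and 4-token vocabulary.
vocab = ["left", "right", "wing", "aton"]
unembed = np.array([[1.0, 0.0, 0.5, -1.0],
                    [0.0, 1.0, 0.0,  0.0]])
direction = np.array([1.0, 0.0])
positive, negative = top_logit_tokens(direction, unembed, vocab, k=2)
print(positive)  # [('left', 1.0), ('wing', 0.5)]
print(negative)  # [('aton', -1.0), ('right', 0.0)]
```

Because only the direction of the feature is used, these scores describe what the feature promotes or suppresses at the output, not which inputs make it fire.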
Activations Density 0.037%
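An activation density of 0.037% is a fraction of 0.00037, so the feature fires on roughly 1 token in every 2,700. The sketch below assumes that density is defined as the share of sampled tokens on which the feature's activation is above zero. The threshold and the helper name `activation_density` are illustrative assumptions, not definitions given on this page.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Assumed definition: fraction of tokens whose feature activation exceeds threshold."""
    acts = np.asarray(activations, dtype=float)
    return float(np.mean(acts > threshold))

# Hypothetical activations for one feature over 8 tokens.
acts = [0.0, 0.0, 1.3, 0.0, 0.0, 0.0, 0.7, 0.0]
density = activation_density(acts)
print(f"{density:.3%}")  # 25.000%

# Converting the reported 0.037% back into "one token in N".
print(round(1 / 0.00037))  # 2703
```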