INDEX
Explanations
words related to stairs and stairwells
references to stairs and staircases
Negative Logits
Token       Logit
eer         -0.80
istic       -0.77
ust         -0.73
nesota      -0.72
Samoa       -0.72
itarian     -0.72
uala        -0.70
emo         -0.70
nar         -0.70
roid        -0.68
Positive Logits
Token       Logit
stairs      1.18
stairs      1.16
stair       1.01
staircase   0.99
slope       0.89
steps       0.86
upstairs    0.84
slopes      0.80
door        0.80
step        0.79
Activations Density: 0.027%