Explanations
names containing the sequence "ih" with varying activations
instances of the substring "ih" in various contexts
Negative Logits
Token      Logit
pop        -0.71
pie        -0.69
aster      -0.67
fixture    -0.64
snap       -0.64
Pop        -0.64
popup      -0.62
Pie        -0.61
dotted     -0.59
plaster    -0.59
Positive Logits
Token      Logit
ih         4.16
iy         1.52
ij         1.40
uh         1.27
iw         1.25
oh         1.21
iq         1.20
ika        1.19
ich        1.14
ihu        1.13
Activations Density: 0.013%