INDEX
Explanations
references to the number three and its variations in context
Negative Logits
th        -0.17
omer      -0.17
rect      -0.15
stuff     -0.15
ween      -0.15
rew       -0.14
mar       -0.14
erable    -0.14
tery      -0.14
絡        -0.14
Positive Logits
Musk      0.25
peats     0.22
peater    0.21
peat      0.20
cheers    0.19
amigos    0.18
Cs        0.17
eyed      0.17
mus       0.17
strikes   0.17
Activations Density: 0.061%