INDEX
Explanations
repeated uses of the word "my" in various contexts
Negative Logits
ories     -0.16
孝        -0.16
pig       -0.15
oves      -0.15
ries      -0.15
ible      -0.15
gan       -0.14
pig       -0.14
agen      -0.14
s         -0.14
Positive Logits
own       0.19
own       0.16
adow      0.16
shake     0.15
ombine    0.15
Own       0.15
lyn       0.15
èŃ        0.14
Shore     0.14
一页      0.14
Activations Density 0.051%
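
The density figure above can be read as the share of token positions on which the feature fires. The following is a minimal sketch of that reading, not the dashboard's actual computation: the function name `activation_density`, the zero threshold, and the sample data are all hypothetical and chosen only for illustration.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions with a
    non-zero (above-threshold) activation. The source does not define it.
    """
    if len(activations) == 0:
        raise ValueError("activations must not be empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 10,000 token positions, 5 of which activate the feature.
sample = [0.0] * 9995 + [0.7, 1.2, 0.3, 2.1, 0.9]
density = activation_density(sample)
print(f"{density:.3%}")  # prints "0.050%"
```

A density near 0.05% means the feature is sparse: under this reading it fires on roughly one token in two thousand.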