Explanations
instances of the word "share" and its variations
Negative Logits
Token       Logit
ape         -0.18
ritz        -0.18
ago         -0.16
erness      -0.15
собі        -0.14
ockets      -0.14
go          -0.14
orphic      -0.14
amientos    -0.13
_GU         -0.13
Positive Logits
Token        Logit
stories      0.20
secrets      0.20
experiences  0.19
knowledge    0.19
with         0.17
knowledge    0.17
thoughts     0.16
information  0.16
story        0.16
μαζί         0.16
Activations Density 0.030%