INDEX
Explanations
the word "go" in various contexts with different activation levels
instances of the word "go" in varying contexts
Negative Logits
Token        Logit
Aram         -0.59
Interested   -0.56
aples        -0.55
Tribune      -0.52
Purs         -0.52
referring    -0.51
Contracts    -0.51
Mead         -0.51
FML          -0.50
Alchemist    -0.49
Positive Logits
Token        Logit
ggle         1.39
lems         1.23
vt           1.19
lem          1.14
ogly         1.10
fund         1.04
ogl          1.04
verning      1.04
vier         0.98
ading        0.96
Activations Density: 0.030%