INDEX
Explanations
instances of the word "go" in various contexts
Negative Logits
lake      -0.67
picking   -0.65
ammon     -0.64
ament     -0.64
mith      -0.63
race      -0.62
Aram      -0.61
urgy      -0.61
role      -0.59
lain      -0.59
Positive Logits
vernment  1.07
verning   1.04
ven       0.96
ffic      0.92
ogly      0.87
lems      0.84
etz       0.83
zzi       0.83
zzo       0.83
ppe       0.80
Activations Density 0.008%