INDEX
Explanations
instances of the word "goes" followed by a number indicating the strength of the activation
instances of the phrase "goes" in various contexts
Negative Logits
role          -0.74
eers          -0.71
uctor         -0.71
rient         -0.67
icon          -0.67
cos           -0.66
essor         -0.65
icons         -0.62
eer           -0.62
itionally     -0.62
Positive Logits
Ń·            0.96
verning       0.86
vt            0.83
Forth         0.83
lems          0.81
itters        0.80
OHN           0.73
ashore        0.73
uten          0.73
ģĸ            0.71
Activations Density 0.014%