Explanations
instances of the verb "go" and its variations
Negative Logits
idis        -0.18
inz         -0.16
athon       -0.15
tehdy       -0.15
uh          -0.15
GAN         -0.14
vil         -0.14
riel        -0.14
-dismiss    -0.14
ahren       -0.14
Positive Logits
with        0.27
ahead       0.20
for         0.18
old         0.17
avec        0.16
ult         0.16
586         0.15
old         0.15
agon        0.15
.with       0.14
Activations Density: 0.047%