INDEX
Explanations
references to the word "ant" in various contexts
Negative Logits
rint    -0.17
rl      -0.17
ryo     -0.17
ra      -0.15
lear    -0.15
trl     -0.15
opup    -0.14
baş     -0.14
र       -0.14
ru      -0.14
Positive Logits
ucket   0.23
woord   0.20
ropic   0.20
y       0.19
elope   0.18
yne     0.18
ucky    0.18
rop     0.18
enna    0.17
ing     0.17
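Lists like the two above are commonly produced by projecting a feature's decoder direction onto the model's unembedding matrix and keeping the highest and lowest scoring tokens. The page does not say exactly how its numbers were computed, so the sketch below rests on that assumption. The function name `logit_effects` and its arguments are hypothetical, and real pipelines may also apply a final layer norm or other scaling first.

```python
def logit_effects(decoder_dir, w_u, vocab, k=10):
    """Hypothetical sketch: score every token by decoder_dir @ W_U.

    decoder_dir: list of d_model floats (the feature's decoder direction)
    w_u: d_model rows, each a list of vocab_size floats (unembedding matrix)
    vocab: list of vocab_size token strings
    Returns (positive, negative): lists of (token, score), strongest first.
    """
    vocab_size = len(vocab)
    scores = [0.0] * vocab_size
    # Accumulate the dot product of decoder_dir with each column of w_u.
    for d_i, row in zip(decoder_dir, w_u):
        for j in range(vocab_size):
            scores[j] += d_i * row[j]
    ranked = sorted(zip(vocab, scores), key=lambda t: t[1])
    negative = ranked[:k]          # most negative scores first
    positive = ranked[::-1][:k]    # most positive scores first
    return positive, negative


# Toy usage with a 2-dimensional model and a 3-token vocabulary.
pos, neg = logit_effects([1.0, 0.5],
                         [[0.2, -0.1, 0.0],
                          [0.1, 0.0, -0.4]],
                         ["a", "b", "c"], k=1)
print(pos, neg)
```

Here token "a" scores 1.0 × 0.2 + 0.5 × 0.1 = 0.25 and token "c" scores 0.5 × (-0.4) = -0.2, so they head the positive and negative lists respectively.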
Activation density: 0.037%
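Activation density is typically read as the percentage of token positions on which the feature fires. At 0.037%, that is roughly 37 active positions per 100,000 tokens. The sketch below assumes "fires" means an activation strictly above a threshold of 0. The function `activation_density` and its `threshold` parameter are hypothetical, since the page does not state the exact rule it uses.

```python
def activation_density(activations, threshold=0.0):
    """Hypothetical sketch: percentage of positions with activation > threshold."""
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# One active position out of four gives a density of 25%.
print(activation_density([0.0, 0.0, 1.2, 0.0]))
```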