INDEX
Explanations
references to horses and horse-related activities or concepts
Negative Logits
eration   -0.18
lain      -0.17
ewis      -0.17
rive      -0.16
elper     -0.16
rvine     -0.16
éro       -0.15
erset     -0.15
ests      -0.15
ombs      -0.15
Positive Logits
hair      0.27
back      0.26
men       0.23
play      0.20
BACK      0.20
power     0.19
stable    0.19
hoe       0.19
women     0.18
fly       0.18
Activations Density: 0.012%