INDEX
Explanations
references to horses and related terms
Negative Logits
eration   -0.18
lain      -0.18
enty      -0.15
راف       -0.15
dog       -0.15
orp       -0.15
ewis      -0.15
enance    -0.15
ests      -0.14
617       -0.14
POSITIVE LOGITS
hair      0.27
back      0.27
men       0.25
power     0.23
play      0.20
fly       0.20
BACK      0.20
hoe       0.20
women     0.20
stable    0.19
Activations Density 0.013%
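
As a rough illustration of how a density figure like the one above can be read, here is a minimal sketch. It assumes that "activations density" means the percentage of sampled tokens on which the feature's activation is greater than zero; the source does not define the term. The function name `activation_density_percent` and the sample data are hypothetical, not taken from the dashboard.

```python
def activation_density_percent(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold`.

    Assumption: density = (active tokens) / (all sampled tokens) * 100.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 13 active tokens out of 100,000 sampled tokens.
sample = [1.0] * 13 + [0.0] * 99_987
print(f"{activation_density_percent(sample):.3f}%")  # prints 0.013%
```

Under that assumption, a density of 0.013% corresponds to the feature firing on roughly 13 of every 100,000 tokens.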