Explanations
references to the brand "Polo" and related fashion terminology
Negative Logits
martial       -0.16
aina          -0.14
wrestling     -0.14
raft          -0.14
Martial       -0.14
rips          -0.14
Projectile    -0.14
okie          -0.13
lottery       -0.13
é϶            -0.13
Positive Logits
polo          0.50
Polo          0.45
pony          0.26
pon           0.26
Pony          0.24
pol           0.23
pol           0.22
horses        0.22
polar         0.21
polarization  0.21
Activations Density: 0.002%