Explanations
references to shirts or t-shirts
Negative Logits
Token        Logit
ⓧ            -0.62
sağlay       -0.48
muhte        -0.44
المعيارى     -0.42
Combined     -0.41
nového       -0.41
Likely       -0.41
nieuw        -0.41
contenir     -0.41
GetAxis      -0.40
Positive Logits
Token        Logit
shirt        1.41
Shirt        1.21
shirts       1.21
tshirt       1.04
tshirt       1.02
Shirts       0.93
hirt         0.86
shirt        0.84
футболка     0.80
camiseta     0.80
Activation Density: 0.002%