INDEX
Explanations
references to brand names and their attributes
Negative Logits
bral      -0.17
ikh       -0.16
edes      -0.15
Eig       -0.15
bane      -0.15
riba      -0.15
ëŀ        -0.14
bras      -0.14
Lennon    -0.14
elig      -0.14
Positive Logits
-name      0.27
enburg     0.24
-new       0.24
ishing     0.24
ished      0.22
name       0.21
name       0.20
spanking   0.20
/type      0.17
t          0.17
Activations Density 0.034%
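
The density figure above can be read as the share of token positions on which the feature fires. The sketch below illustrates that reading only. It assumes "density" means the fraction of positions whose activation exceeds a threshold. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and do not come from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: density is counted per token position, and a position counts
    as active when its activation is strictly greater than the threshold.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, 3 of which activate the feature.
sample = [0.0] * 10_000
for position, value in ((5, 1.2), (42, 0.4), (977, 2.0)):
    sample[position] = value

density = activation_density(sample)
print(f"{density:.3%}")
```

A density this low means the feature is sparse: it is silent on almost all tokens and fires only on a narrow set of contexts.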