Explanations
references to stainless steel and its various attributes
Negative Logits
Token     Logit
inders    -0.19
udi       -0.17
tones     -0.16
kk        -0.15
adlo      -0.14
loub      -0.14
brun      -0.14
asurer    -0.14
udies     -0.14
inn       -0.13
Positive Logits
Token     Logit
steel     0.63
Steel     0.56
steel     0.52
Steel     0.48
steal     0.41
_ste      0.40
Steele    0.39
éĴ¢       0.38
ste       0.36
éĭ¼       0.35
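The tokens éĴ¢ and éĭ¼ in the positive-logit list look like raw byte-level BPE strings rather than readable text. The sketch below shows one way to decode them, assuming the model uses a GPT-2 style byte-level tokenizer (the page does not state this). The helper names byte_to_unicode_table and decode_token are hypothetical and exist only for illustration.

```python
def byte_to_unicode_table() -> dict[int, str]:
    """Rebuild the byte -> printable-character table of GPT-2 style byte-level BPE.

    Assumption: the tokenizer follows the GPT-2 convention, where printable
    Latin-1 bytes map to themselves and all other bytes map to code points
    starting at U+0100.
    """
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    table = {b: chr(b) for b in printable}
    shift = 0
    for b in range(256):
        if b not in table:
            # Non-printable bytes are assigned the next free code point from 256 up.
            table[b] = chr(256 + shift)
            shift += 1
    return table


# Inverse table: displayed character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in byte_to_unicode_table().items()}


def decode_token(token: str) -> str:
    """Map each displayed character back to its byte, then decode as UTF-8."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


for tok in ["éĴ¢", "éĭ¼"]:
    print(tok, "->", decode_token(tok))
```

Under that assumption the two tokens decode to 钢 and 鋼, the simplified and traditional Chinese characters for steel, which is consistent with the explanation given for this feature.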
Activations Density 0.004%