INDEX
Explanations
references to furniture or furnishings
Negative Logits
ennes    -0.16
Sharper  -0.15
dül      -0.15
uyến     -0.15
orient   -0.15
dete     -0.15
jom      -0.15
opers    -0.14
done     -0.14
jual     -0.14
Positive Logits
aces     0.35
ishing   0.32
ace      0.32
ished    0.28
itures   0.24
isher    0.23
ISHED    0.21
ACES     0.21
aced     0.21
itur     0.20
Activations Density: 0.012%