INDEX
Explanations
words related to colors
references to colors
Negative Logits
plug       -0.81
visor      -0.75
wered      -0.70
liest      -0.70
WARE       -0.70
nomine     -0.68
ELF        -0.68
EStream    -0.68
Boot       -0.65
compr      -0.65
Positive Logits
iseum      1.02
ossus      0.95
estial     0.84
onel       0.82
isions     0.81
s          0.79
ophon      0.77
icol       0.75
umn        0.75
isco       0.74
Activations Density 0.025%