Explanations
words related to colors or visual descriptors
Negative Logits
glers      -0.74
earchers   -0.74
hiba       -0.73
Starr      -0.72
undai      -0.72
perty      -0.70
cemic      -0.69
pering     -0.69
xon        -0.68
Preview    -0.67
Positive Logits
е          1.52
а          1.52
о          1.50
и          1.43
оÐ         1.39
Ñĭ         1.37
Ñĥ         1.30
л          1.23
Ñı         1.22
ÑĢ         1.21
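Several of the positive-logit tokens (Ñĭ, Ñĥ, Ñı, ÑĢ) look like raw byte-level BPE strings rather than readable text. The sketch below shows one way to decode them. It assumes the underlying model uses the GPT-2 style byte-to-character table, which this page does not state. The helper name `decode_token` is hypothetical and used only for illustration.

```python
def bytes_to_unicode():
    """GPT-2 style table: maps each byte value (0-255) to a printable character."""
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            # Non-printable bytes are shifted to code points 256 and above.
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: printable character -> original byte value.
BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token):
    """Hypothetical helper: turn a byte-level token string into readable text.

    Assumption: the token uses the GPT-2 byte table. Characters outside the
    table (for example Cyrillic letters the page already decoded) are kept
    by re-encoding them as UTF-8.
    """
    raw = bytearray()
    for ch in token:
        if ch in BYTE_DECODER:
            raw.append(BYTE_DECODER[ch])
        else:
            raw.extend(ch.encode("utf-8"))
    # An incomplete multi-byte sequence becomes U+FFFD instead of raising.
    return raw.decode("utf-8", errors="replace")


for tok in ["Ñĭ", "Ñĥ", "Ñı", "ÑĢ"]:
    print(tok, "->", decode_token(tok))
```

Under that assumption the four tokens decode to the Cyrillic letters ы, у, я and р, in line with the other positive-logit tokens. The token оÐ ends in a lone lead byte (0xD0), so it is an incomplete character and cannot be decoded further.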
Activations Density 0.005%